An event-driven, fully automated AWS cloud security platform that detects threats across your AWS environment and responds within seconds — no human intervention required for containment. Built on GuardDuty, Security Hub, Inspector, Macie, IAM Access Analyzer, and AWS Config, with Lambda functions that handle the full incident response lifecycle: detect → contain → forensics → alert → audit.
- Architecture Overview
- Why Automated Response
- Repository Structure
- Security Services
- Lambda Remediation Functions
- EventBridge Routing
- Infrastructure
- Testing
- Compliance Mapping
- Prerequisites
- Deployment Guide
- Runbook — Common Operations
- Troubleshooting
- Security Considerations
┌──────────────────────────────────────────────────────────────────────────────────┐
│ AWS SECURITY AUTOMATION PLATFORM │
│ │
│ DETECTION LAYER │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ GuardDuty │ │ Inspector │ │ Macie │ │ IAM Access │ │
│ │ │ │ v2 │ │ │ │ Analyzer │ │
│ │ ML-powered │ │ CVE scan: │ │ S3 sensitive │ │ External │ │
│ │ threat │ │ EC2/ECR/ │ │ data (PII, │ │ access to │ │
│ │ detection │ │ Lambda │ │ credentials) │ │ resources │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬───────┘ └──────┬──────┘ │
│ │ │ │ │ │
│ ┌──────▼────────────────▼────────────────▼──────────────────▼──────┐ │
│ │ AWS Security Hub │ │
│ │ Aggregates all findings, CIS Benchmark + PCI DSS standards │ │
│ │ Single pane of glass for security posture │ │
│ └──────────────────────────────┬────────────────────────────────────┘ │
│ │ │
│ ROUTING LAYER │ │
│ ┌──────────────────────────────▼────────────────────────────────────┐ │
│ │ Amazon EventBridge │ │
│ │ │ │
│ │ Rule: EC2 findings ──────────────────────────────────────────► │ │
│ │ Rule: IAM findings ──────────────────────────────────────────► │ │
│ │ Rule: Macie findings ─────────────────────────────────────────► │ │
│ │ Rule: IP-based findings ─────────────────────────────────────► │ │
│ │ Rule: Schedule (every 6h) ────────────────────────────────────► │ │
│ └──────────────────────────────┬────────────────────────────────────┘ │
│ │ │
│ RESPONSE LAYER │ │
│ ┌──────────┐ ┌──────────┐ ┌─┴────────┐ ┌──────────┐ ┌──────────────────┐ │
│ │isolate- │ │revoke- │ │ block-ip │ │ s3- │ │findings- │ │
│ │ec2 │ │iam-keys │ │ │ │remediat- │ │aggregator │ │
│ │ │ │ │ │ │ │ion │ │(scheduled) │ │
│ │1. Attach │ │1. Disable│ │1. WAFv2 │ │1. Block │ │• GuardDuty │ │
│ │ quarant│ │ key │ │ IP set │ │ public │ │• Security Hub │ │
│ │ ine SG │ │2. Deny │ │2. NACL │ │ access │ │• Inspector │ │
│ │2. Remove │ │ policy │ │ rule │ │2. Enable │ │• Access Analyzer │ │
│ │ all SGs│ │3. Tag │ │3. Threat │ │ KMS │ │• CloudWatch │ │
│ │3. Tag │ │ user │ │ intel │ │ encrypt│ │ metrics │ │
│ │4. Snap- │ │4. Revoke │ │ table │ │3. Enable │ │• Slack digest │ │
│ │ shot │ │ session│ │4. Alert │ │ version│ │• S3 report │ │
│ │5. Alert │ │5. Alert │ │ │ │4. Bucket │ │ │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │ policy │ └──────────────────┘ │
│ │ │ │ │5. Alert │ │
│ AUDIT / NOTIFICATION LAYER │ └──────────┘ │
│ ┌────▼─────────────▼─────────────▼──────────────────────────────────────────┐ │
│ │ DynamoDB Audit Trail SNS → Slack / PagerDuty / Email │ │
│ │ - Every action logged - Alert on every CRITICAL/HIGH finding │ │
│ │ - 90-day retention - Daily digest summary │ │
│ │ - KMS encrypted - Lambda DLQ for failed remediations │ │
│ └────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ COMPLIANCE / AUDIT LAYER │
│ ┌──────────────┐ ┌──────────────────────────────────────────────────────────┐ │
│ │ CloudTrail │ │ AWS Config │ │
│ │ Multi-region│ │ Rules: s3-public-access, s3-encryption, root-mfa, │ │
│ │ KMS encrypt │ │ iam-password-policy, cloudtrail-enabled │ │
│ │ S3 + CW logs│ │ Continuous compliance evaluation │ │
│ └──────────────┘ └──────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────────┘
Industry breach reports have put the average time to detect a breach at 207 days and the average time to contain it at 73 days. For the scenarios it covers, automated response cuts containment from days to seconds.
A compromised EC2 instance running crypto mining software can cost thousands of dollars per day. A leaked IAM key can be exploited within minutes of being discovered (attackers scan GitHub for leaked keys in near real-time). A publicly exposed S3 bucket with PII is a data breach that must be reported to regulators within 72 hours under GDPR.
This platform responds to all three scenarios in seconds:
- Compromised EC2: GuardDuty fires → EventBridge routes → Lambda attaches quarantine security group (zero egress/ingress) and creates forensic EBS snapshots → all within 30 seconds of detection
- Leaked IAM key: GuardDuty fires → EventBridge routes → Lambda disables the access key and attaches explicit-deny policy → credential is useless within 30 seconds
- Public S3 bucket: Macie fires → EventBridge routes → Lambda enables block-public-access, KMS encryption, versioning, and access logging → remediated within 60 seconds
Every action is logged to a DynamoDB audit table with TTL=90 days, and every response function has a Lambda Dead Letter Queue (DLQ) so failed remediations are not silently dropped.
aws-security-automation/
│
├── README.md
├── pytest.ini # Test configuration (80% coverage gate)
│
├── lambda/
│ ├── requirements.txt # boto3, pytest, moto, coverage
│ │
│ ├── layers/
│ │ └── common/
│ │ └── utils.py # Shared layer: logging, SNS, DynamoDB audit,
│ │ # AWS client factory, finding parsers, decorators
│ │
│ ├── functions/
│ │ ├── isolate-ec2/
│ │ │ └── handler.py # EC2 containment: quarantine SG, EBS snapshots
│ │ │
│ │ ├── revoke-iam-keys/
│ │ │ └── handler.py # IAM containment: disable key, deny policy,
│ │ │ # revoke sessions, handle root/assumed roles
│ │ │
│ │ ├── block-ip/
│ │ │ └── handler.py # Network containment: WAFv2, NACL, threat intel
│ │ │
│ │ ├── s3-remediation/
│ │ │ └── handler.py # S3 hardening: public access, KMS, versioning,
│ │ │ # access logging, restrictive bucket policy
│ │ │
│ │ └── findings-aggregator/
│ │ └── handler.py # Scheduled: aggregate all sources, CloudWatch
│ │ # metrics, daily Slack digest, S3 report
│ │
│ └── tests/
│ ├── conftest.py # Shared fixtures, env vars, boto3 mocking
│ ├── test_isolate_ec2.py # 12 tests: routing, isolation, SG, snapshots
│ ├── test_revoke_iam_keys.py # 9 tests: disable key, deny policy, root, sessions
│ └── test_s3_remediation.py # 15 tests: block-ip (7) + S3 remediation (8)
│
└── terraform/
└── environments/
└── prod/
├── main.tf # All security services + Lambda + EventBridge
├── variables.tf
└── outputs.tf
GuardDuty uses machine learning to analyze CloudTrail API logs, VPC Flow Logs, and DNS query logs continuously. It detects anomalies that rule-based tools miss — for example, an IAM user that has never called ec2:RunInstances suddenly launching 50 instances at 3am is anomalous behavior that GuardDuty catches even without a specific rule for it.
What it monitors:
| Data Source | Detections |
|---|---|
| CloudTrail management events | Unusual API calls, policy changes, new IAM users |
| CloudTrail S3 data events | Unusual S3 access patterns, mass data exfiltration |
| VPC Flow Logs | Port scanning, brute force, C2 communication |
| DNS logs | Malware C2 domain communication, DNS data exfiltration |
| EKS audit logs | Privilege escalation, container escapes |
| EBS volumes | Malware scanning via malware protection feature |
Finding types this platform responds to:
| Finding Type | MITRE ATT&CK | Response |
|---|---|---|
| `UnauthorizedAccess:EC2/SSHBruteForce` | T1110.001 | EC2 isolation |
| `UnauthorizedAccess:EC2/RDPBruteForce` | T1110.001 | EC2 isolation |
| `CryptoCurrency:EC2/BitcoinTool.B!DNS` | T1496 | EC2 isolation |
| `Backdoor:EC2/C&CActivity.B!DNS` | T1071 | EC2 isolation |
| `Trojan:EC2/BlackholeTraffic` | T1071 | EC2 isolation |
| `UnauthorizedAccess:EC2/TorClient` | T1090 | EC2 isolation |
| `UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration` | T1552.005 | Revoke credentials |
| `UnauthorizedAccess:IAMUser/ConsoleLoginSuccess.B` | T1078 | Revoke credentials |
| `Policy:IAMUser/RootCredentialUsage` | T1078.004 | Critical alert |
| `Stealth:IAMUser/CloudTrailLoggingDisabled` | T1562.008 | Re-enable CloudTrail |
| `Impact:IAMUser/AnomalousBehavior` | T1486 | Revoke credentials |
| `Recon:EC2/PortProbing` | T1595 | Block IP (WAF + NACL) |
Terraform configuration:
resource "aws_guardduty_detector" "main" {
enable = true
datasources {
s3_logs { enable = true }
kubernetes { audit_logs { enable = true } }
malware_protection {
scan_ec2_instance_with_findings {
ebs_volumes { enable = true }
}
}
}
}

Security Hub is the single pane of glass. It receives findings from GuardDuty, Inspector, Macie, IAM Access Analyzer, and Config, normalizes them into a standard format (ASFF, the AWS Security Finding Format), and evaluates them against security standards.
Standards enabled:
| Standard | Coverage |
|---|---|
| CIS AWS Foundations Benchmark v1.4 | 58 controls — IAM, logging, monitoring, networking |
| AWS Foundational Security Best Practices | 200+ controls across all AWS services |
| PCI DSS v3.2.1 | Payment card data protection controls |
Why Security Hub matters for this platform:
Every Lambda function can call securityhub:BatchUpdateFindings to mark findings as RESOLVED after remediation. This keeps the Security Hub dashboard clean and provides an audit trail showing which findings were auto-remediated vs. which required human intervention.
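A minimal sketch of that call, assuming the caller passes in a boto3 Security Hub client and already has the finding's `Id` and `ProductArn`. The function name `resolve_finding` and the `UpdatedBy` value are illustrative, not taken from the repository.

```python
from typing import Any


def resolve_finding(securityhub: Any, finding_id: str, product_arn: str, note: str) -> dict:
    """Mark one Security Hub finding as RESOLVED and attach an audit note."""
    response = securityhub.batch_update_findings(
        FindingIdentifiers=[{"Id": finding_id, "ProductArn": product_arn}],
        Workflow={"Status": "RESOLVED"},
        # "security-automation" is a hypothetical label for the updater.
        Note={"Text": note, "UpdatedBy": "security-automation"},
    )
    # BatchUpdateFindings reports per-finding failures in the response body.
    if response.get("UnprocessedFindings"):
        raise RuntimeError(f"Security Hub rejected the update for {finding_id}")
    return response
```

In a handler this would be called last, once containment has succeeded, for example `resolve_finding(boto3.client("securityhub"), finding_id, product_arn, "Instance quarantined")`.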
Inspector v2 continuously scans:
- EC2 instances (OS packages, software CVEs via SSM Agent)
- ECR container images (layer-by-layer CVE scanning — runs on push AND continuously on stored images)
- Lambda functions (package dependencies)
Unlike a one-time pipeline scan (like Trivy in the Jenkins pipeline), Inspector monitors running workloads. If a new CVE is published for a package already installed on your EC2 fleet, Inspector fires a finding within hours.
Integration with the platform:
- New CRITICAL CVEs appear in Security Hub → findings-aggregator picks them up → included in daily digest
- findings-aggregator tracks a `critical_cves` list with CVE IDs and CVSS scores
- CloudWatch metric `FindingsBySource/inspector` tracks CVE count over time
Macie scans S3 buckets for sensitive data using ML classifiers:
- Personally Identifiable Information (PII) — names, addresses, SSNs, passport numbers
- Financial data — credit card numbers, bank account numbers
- Credentials — AWS access keys, private keys, passwords in files
- Healthcare data — PHI under HIPAA
Why this matters: A developer accidentally uploads a CSV with 10,000 customer email addresses to an S3 bucket they're using for testing. Without Macie, this goes undetected. Once a classification job has scanned the object, Macie publishes the finding within 15 minutes (the publishing frequency is set to FIFTEEN_MINUTES), which triggers the s3-remediation Lambda. The Lambda blocks public access and applies a restrictive bucket policy to cut off access from outside the account.
Scheduled classification job:
resource "aws_macie2_classification_job" "s3_scan" {
job_type = "SCHEDULED"
schedule_frequency { weekly_schedule = "MONDAY" }
s3_job_definition {
bucket_definitions {
account_id = local.account_id
buckets = var.macie_scan_buckets # Bucket names, not ARNs; wildcards are not accepted here (hypothetical variable)
}
}
}

IAM Access Analyzer continuously evaluates resource policies to identify resources accessible from outside your AWS account. It analyzes:
- S3 bucket policies
- IAM role trust policies
- KMS key policies
- Lambda function policies
- SQS queue policies
- Secrets Manager secrets
An active Access Analyzer finding means an external entity (another AWS account, the public, or a third-party service) has been granted access to one of your resources. This is sometimes intentional (a cross-account role for a vendor) but is often a misconfiguration.
Findings-aggregator tracks the count of active Access Analyzer findings and includes them in the daily digest. Any new findings are surfaced in the CloudWatch dashboard.
AWS Config continuously evaluates resource configurations against rules. Unlike GuardDuty (which detects active threats), Config detects drift — when a resource was correctly configured and then changed.
Config rules deployed:
| Rule | What It Checks | Severity |
|---|---|---|
| `s3-bucket-public-access-prohibited` | S3 Block Public Access enabled | HIGH |
| `s3-bucket-server-side-encryption-enabled` | Default encryption on all S3 buckets | HIGH |
| `root-account-mfa-enabled` | Root account has MFA | CRITICAL |
| `iam-password-policy` | Password policy meets complexity requirements | MEDIUM |
| `cloud-trail-enabled` | CloudTrail is active in all regions | CRITICAL |
Config + Lambda = automatic remediation:
Config violations flow to Security Hub → EventBridge → s3-remediation Lambda (for S3 rules). CloudTrail disabling is handled by revoke-iam-keys Lambda which re-enables it immediately.
CloudTrail records every API call made in the account. This is the raw material for forensic investigation after an incident. Configuration:
- Multi-region trail (catches API calls in every region, including ones you're not using)
- S3 and Lambda data events enabled (records object-level access)
- KMS encrypted logs
- CloudWatch Logs delivery (enables CloudWatch Insights queries on API activity)
- Log file validation enabled (detects if logs have been tampered with)
Why log file validation matters: If an attacker compromises your account and disables CloudTrail logging to cover their tracks, the Stealth:IAMUser/CloudTrailLoggingDisabled GuardDuty finding fires and the revoke-iam-keys Lambda re-enables CloudTrail. Even if they also delete or alter log files, the SHA-256 hash chain in the digest files (log file validation) shows which logs are missing or modified.
All Lambda functions share:
- A common layer (`lambda/layers/common/utils.py`) with logging, SNS notifications, DynamoDB audit writer, AWS client factory
- The `@remediation_handler` decorator, which provides execution timing, error handling, and structured logging
- `reserved_concurrent_executions = 10`, which prevents runaway remediation loops
- X-Ray tracing enabled
- Dead Letter Queue (SQS) for failed invocations
- Python 3.12 runtime
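The decorator's three jobs (timing, error handling, structured logging) can be sketched as follows. This is an illustration, not the code in `utils.py`: the logger name and the log field names are assumptions.

```python
import functools
import json
import logging
import time

logger = logging.getLogger("remediation")  # hypothetical logger name


def remediation_handler(func):
    """Wrap a Lambda handler with timing, error capture, and one JSON log line."""

    @functools.wraps(func)
    def wrapper(event, context):
        started = time.monotonic()
        record = {"handler": func.__name__, "success": False}
        try:
            result = func(event, context)
            record["success"] = True
            return result
        except Exception as exc:
            record["error"] = f"{type(exc).__name__}: {exc}"
            raise  # re-raise so the failed invocation can still reach the DLQ
        finally:
            record["duration_ms"] = round((time.monotonic() - started) * 1000, 1)
            logger.info(json.dumps(record))

    return wrapper
```

Re-raising matters here: swallowing the exception would make the invocation look successful, and the event would never be delivered to the Dead Letter Queue.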
Trigger: GuardDuty EC2 findings (SSH/RDP brute force, crypto mining, C2 communication, Tor)
Decision logic:
- Finding type in `ISOLATION_FINDING_TYPES` OR severity HIGH/CRITICAL → full isolation
- Finding type in `ALERT_ONLY_TYPES` → notification only, no isolation
- Everything else → notification only
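The decision logic above can be sketched as a small pure function. The set contents are placeholders, the 7.0 threshold follows GuardDuty's numeric scale (7.0 and above is High or Critical), and letting high severity win over an alert-only type is an assumption based on the order of the rules.

```python
# Placeholder contents; the real sets live in the isolate-ec2 handler.
ISOLATION_FINDING_TYPES = {
    "UnauthorizedAccess:EC2/SSHBruteForce",
    "CryptoCurrency:EC2/BitcoinTool.B!DNS",
}
ALERT_ONLY_TYPES = {"Recon:EC2/PortProbing"}

HIGH_SEVERITY_THRESHOLD = 7.0  # GuardDuty: 7.0-8.9 is High, 9.0-10.0 is Critical


def decide_action(finding_type: str, severity: float) -> tuple[str, str]:
    """Return (action, reason), where action is "isolate" or "notify"."""
    if finding_type in ISOLATION_FINDING_TYPES:
        return ("isolate", "listed-isolation-type")
    if severity >= HIGH_SEVERITY_THRESHOLD:
        return ("isolate", "high-severity")
    if finding_type in ALERT_ONLY_TYPES:
        return ("notify", "alert-only-type")
    return ("notify", "unlisted-type")
```

Returning the reason alongside the action makes the audit record and the alert text self-explanatory without re-deriving why an instance was isolated.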
Isolation steps:
- Describe the instance: get current security groups, VPC ID, EBS volumes, state
- Skip if already terminated
- Get or create quarantine security group (no inbound, no outbound rules, tagged with `security:purpose=quarantine`)
- `modify_instance_attribute`: replace all security groups with quarantine SG only
- `create_tags`: mark instance as quarantined with finding ID and timestamp
- `create_snapshot`: forensic EBS snapshot of every attached volume
- Write audit record to DynamoDB
- Send SNS notification (→ Slack)
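The EC2 part of these steps could look roughly like the sketch below. It assumes a boto3 EC2 client is passed in and that the quarantine group already exists. The tag keys are the ones used in the runbook commands. The audit write and SNS notification are left out, and the return shape is an assumption.

```python
from datetime import datetime, timezone
from typing import Any


def isolate_instance(ec2: Any, instance_id: str, quarantine_sg_id: str, finding_id: str) -> dict:
    """Swap security groups, tag the instance, and snapshot its EBS volumes."""
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance = reservations[0]["Instances"][0]
    if instance["State"]["Name"] == "terminated":
        return {"isolated": False, "reason": "terminated"}

    original_groups = [group["GroupId"] for group in instance["SecurityGroups"]]
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[quarantine_sg_id])
    ec2.create_tags(
        Resources=[instance_id],
        Tags=[
            {"Key": "security:quarantined", "Value": "true"},
            {"Key": "security:quarantine-finding-id", "Value": finding_id},
            {"Key": "security:original-security-groups", "Value": ",".join(original_groups)},
            {"Key": "security:quarantined-at", "Value": datetime.now(timezone.utc).isoformat()},
        ],
    )

    snapshot_ids = []
    for mapping in instance.get("BlockDeviceMappings", []):
        volume_id = mapping.get("Ebs", {}).get("VolumeId")
        if not volume_id:
            continue
        snapshot = ec2.create_snapshot(
            VolumeId=volume_id,
            Description=f"Forensic snapshot for finding {finding_id}",
            TagSpecifications=[{
                "ResourceType": "snapshot",
                "Tags": [{"Key": "security:source-instance", "Value": instance_id}],
            }],
        )
        snapshot_ids.append(snapshot["SnapshotId"])
    return {"isolated": True, "original_groups": original_groups, "snapshots": snapshot_ids}
```

The security groups are swapped before the snapshots are requested, so network containment does not wait on snapshot creation. The `security:quarantined-at` key is a hypothetical name for the timestamp tag.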
The quarantine SG design:
The quarantine SG has absolutely no rules — not even a default allow. This is different from a "deny all" rule: there simply are no rules, so no traffic is permitted. AWS evaluates security groups as whitelists — if nothing is explicitly allowed, nothing is allowed. The key detail is that revoke_security_group_egress removes the default outbound allow-all rule that AWS creates automatically on new SGs.
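A sketch of the get-or-create step, assuming a boto3 EC2 client is passed in. The group name scheme is hypothetical; the `security:purpose=quarantine` tag and the `revoke_security_group_egress` call come from the description above.

```python
from typing import Any

QUARANTINE_TAG = {"Key": "security:purpose", "Value": "quarantine"}


def get_or_create_quarantine_sg(ec2: Any, vpc_id: str) -> str:
    """Return the ID of a rule-free security group in the given VPC."""
    existing = ec2.describe_security_groups(
        Filters=[
            {"Name": "vpc-id", "Values": [vpc_id]},
            {"Name": "tag:security:purpose", "Values": ["quarantine"]},
        ]
    )["SecurityGroups"]
    if existing:
        return existing[0]["GroupId"]

    group_id = ec2.create_security_group(
        GroupName=f"quarantine-{vpc_id}",  # hypothetical naming scheme
        Description="Quarantine group with no inbound or outbound rules",
        VpcId=vpc_id,
        TagSpecifications=[{"ResourceType": "security-group", "Tags": [QUARANTINE_TAG]}],
    )["GroupId"]

    # New groups start with an allow-all egress rule; remove it so no rules remain.
    ec2.revoke_security_group_egress(
        GroupId=group_id,
        IpPermissions=[{"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}],
    )
    return group_id
```

Looking the group up by tag rather than by name keeps the function idempotent: repeated findings in the same VPC reuse one quarantine group.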
Trigger: GuardDuty IAM findings (credential exfiltration, anomalous behavior, root usage, policy changes)
Decision tree:
Root credential usage?
→ Cannot disable root → critical SNS alert with escalation instructions
→ Manual investigation required
CloudTrail disabled?
→ Re-enable CloudTrail for all trails in the region
IAM finding + user principal?
→ Disable access key (Status: Inactive)
→ Attach explicit-deny inline policy (blocks ALL actions)
→ Tag user as quarantined
→ Force password reset (invalidates console session)
Assumed role?
→ Attach deny policy with DateLessThan condition
(revokes all tokens issued before this timestamp)
Why explicit-deny + disable key:
Disabling an access key is not enough. The key might have been used to create other credentials or to establish long-term sessions. The explicit-deny policy (Effect: Deny, Action: *, Resource: *) blocks all actions regardless of what other policies allow. Belt and suspenders.
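The two policy documents from the decision tree can be built as plain JSON. This sketch assumes the session-revocation policy keys on `aws:TokenIssueTime`, the standard condition key for revoking older role sessions. The function names and `Sid` values are illustrative.

```python
import json
from datetime import datetime, timezone


def deny_all_policy() -> str:
    """Explicit deny of every action on every resource."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "DenyEverything", "Effect": "Deny", "Action": "*", "Resource": "*"}
        ],
    })


def revoke_older_sessions_policy(cutoff: datetime) -> str:
    """Deny everything for role sessions whose token was issued before `cutoff`.

    `cutoff` must be timezone-aware; it is converted to UTC before formatting.
    """
    issued_before = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "RevokeOlderSessions",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {"DateLessThan": {"aws:TokenIssueTime": issued_before}},
        }],
    })
```

The first document would be attached with `iam.put_user_policy(UserName=..., PolicyName="SecurityAutomationExplicitDeny", PolicyDocument=deny_all_policy())`, and the second with `iam.put_role_policy` on the assumed role. Sessions issued after the cutoff are unaffected, so the role keeps working for legitimate callers that re-assume it.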
Root credential usage: Root credentials cannot be disabled programmatically. The function sends a CRITICAL alert with specific investigation instructions and escalates to PagerDuty via SNS message attributes.
Trigger: GuardDuty findings with a remote IP (port probing, brute force, C2)
Safety check — private IP detection: Before blocking any IP, the function checks if it falls in RFC 1918 private ranges (10/8, 172.16/12, 192.168/16). A private IP in a GuardDuty finding usually means lateral movement — blocking it in the NACL would break internal connectivity. The function alerts with "private-ip-lateral-movement-possible" instead.
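The private-range check can be done with the standard library alone. A minimal sketch, with hypothetical function names; it tests the three RFC 1918 networks explicitly rather than relying on `ipaddress`'s broader `is_private` flag, which also covers loopback and link-local ranges.

```python
import ipaddress

RFC1918_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(ip: str) -> bool:
    """True if `ip` is an IPv4 address inside one of the RFC 1918 ranges."""
    address = ipaddress.ip_address(ip)
    return address.version == 4 and any(address in network for network in RFC1918_NETWORKS)


def plan_block(ip: str) -> dict:
    """Decide whether to block an IP or only alert on it."""
    if is_rfc1918(ip):
        return {"action": "alert", "reason": "private-ip-lateral-movement-possible"}
    # ip_network() on a bare address yields a host route (/32 or /128).
    return {"action": "block", "cidr": str(ipaddress.ip_network(ip))}
```

`ipaddress.ip_address` raises `ValueError` on malformed input, so a garbled IP in a finding fails loudly instead of being passed to WAF or the NACL.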
Blocking mechanism:
- WAFv2: adds the IP to an IP set referenced by a block rule, so traffic is dropped at CloudFront/ALB before it reaches EC2. This is the most effective layer because traffic is dropped at the edge.
- NACL: adds a DENY rule to the default VPC NACL. NACLs are stateless and operate at the subnet level. Uses rule numbers 200-299 (reserved for automated blocks) to avoid conflict with manual rules. NACL rules are evaluated in ascending rule-number order, so a deny in this range only takes effect if no lower-numbered rule already allows the traffic.
- DynamoDB threat intel table: records the IP with 30-day TTL. Can be queried by other Lambda functions to check if an IP is known-bad before establishing connections.
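One way to allocate a rule number inside the reserved 200-299 range is sketched below. It assumes the input is the `Entries` list returned by `describe_network_acls` for the target NACL; the function name is hypothetical.

```python
def next_block_rule_number(existing_entries: list[dict], start: int = 200, end: int = 299) -> int:
    """Return the lowest unused ingress rule number in [start, end]."""
    used = {
        entry["RuleNumber"]
        for entry in existing_entries
        if not entry.get("Egress", False)  # ingress and egress rules are numbered separately
    }
    for number in range(start, end + 1):
        if number not in used:
            return number
    raise RuntimeError("No free NACL rule numbers left in the automated block range")
```

The result would feed `ec2.create_network_acl_entry(NetworkAclId=..., RuleNumber=number, Protocol="-1", RuleAction="deny", Egress=False, CidrBlock="203.0.113.7/32")`. With only 100 slots, a full range is a real possibility during a wide scan, which is why the function raises instead of silently reusing a number.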
Triggers: Macie findings (sensitive data), Security Hub S3 controls, Config violations
Remediation actions:
| Condition | Action |
|---|---|
| Public access enabled | put_public_access_block (BlockPublicAcls, IgnorePublicAcls, BlockPublicPolicy, RestrictPublicBuckets all = true) |
| No default encryption | put_bucket_encryption with SSE-KMS |
| No versioning | put_bucket_versioning (Status: Enabled) |
| No access logging | put_bucket_logging → central security logging bucket |
| Macie PII finding | put_bucket_policy with DenyNonHTTPS + DenyPublicAccess statements |
The restrictive bucket policy: Applied only when Macie finds PII/sensitive data. Forces HTTPS (prevents eavesdropping on bucket traffic) and restricts access to the account's own principals only (no cross-account reads of PII data).
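A sketch of what such a policy document could contain. The two `Sid` values come from the table above. Expressing "own principals only" with `aws:PrincipalAccount` is an assumption about how the handler does it, and the function name is hypothetical.

```python
import json


def restrictive_bucket_policy(bucket: str, account_id: str) -> str:
    """Build a bucket policy that forces HTTPS and denies other accounts."""
    resources = [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyNonHTTPS",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                "Sid": "DenyPublicAccess",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                # Denies any caller whose account is not the bucket owner's.
                "Condition": {"StringNotEquals": {"aws:PrincipalAccount": account_id}},
            },
        ],
    })
```

The document would be applied with `s3.put_bucket_policy(Bucket=bucket, Policy=restrictive_bucket_policy(bucket, account_id))`. A deny keyed on `aws:PrincipalAccount` can also block AWS service principals that write to the bucket, so buckets that receive service-delivered logs may need an exception.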
Trigger: EventBridge scheduled rule — every 6 hours
What it aggregates:
- GuardDuty: all active findings in the last 24 hours, grouped by severity and type
- Security Hub: all NEW/ACTIVE findings, grouped by severity and product
- IAM Access Analyzer: all active external access findings
- Inspector v2: CRITICAL CVEs across EC2/ECR/Lambda
Outputs:
- CloudWatch metrics: `FindingsBySeverity/{CRITICAL,HIGH,MEDIUM,LOW}` and `FindingsBySource/{guardduty,security_hub,inspector,iam_access_analyzer}`. These power CloudWatch dashboards and alarms.
- Slack daily digest: summary of finding counts with color-coded severity (red = critical, yellow = medium, green = clean). Sent via SNS.
- S3 JSON report: full aggregated report stored at `s3://findings-bucket/findings-summaries/YYYY/MM/DD/HHMMSS.json`. Lifecycle rule transitions to Glacier after 90 days, expires after 365 days.
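The report key and the severity metrics can be derived with two small helpers. This is a sketch: the function names are hypothetical, and treating `FindingsBySeverity/CRITICAL` as a literal metric name (rather than a metric plus a dimension) is an assumption.

```python
from datetime import datetime, timezone

SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM", "LOW")


def report_key(now: datetime) -> str:
    """Build the S3 object key findings-summaries/YYYY/MM/DD/HHMMSS.json in UTC."""
    return now.astimezone(timezone.utc).strftime("findings-summaries/%Y/%m/%d/%H%M%S.json")


def severity_metric_data(counts: dict[str, int]) -> list[dict]:
    """Build one CloudWatch datum per severity, reporting 0 for absent severities."""
    return [
        {
            "MetricName": f"FindingsBySeverity/{severity}",
            "Value": float(counts.get(severity, 0)),
            "Unit": "Count",
        }
        for severity in SEVERITIES
    ]
```

The list would be passed as `MetricData` to `cloudwatch.put_metric_data` together with a namespace of your choosing. Emitting an explicit zero for every severity keeps alarms from sitting in an insufficient-data state on quiet days.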
EventBridge is the routing layer between security services and Lambda functions. Each rule uses event pattern matching to route specific finding types to the correct Lambda.
Rule patterns:
// EC2 findings → isolate-ec2
{
"source": ["aws.guardduty"],
"detail-type": ["GuardDuty Finding"],
"detail": {
"type": [
{"prefix": "UnauthorizedAccess:EC2/"},
{"prefix": "CryptoCurrency:EC2/"},
{"prefix": "Backdoor:EC2/"},
{"prefix": "Trojan:EC2/"},
{"prefix": "Recon:EC2/"}
]
}
}

The prefix matching is deliberate: it catches new GuardDuty finding subtypes in these categories without requiring rule updates. When AWS adds a new CryptoCurrency:EC2/MoneroTool finding type, the existing rule routes it to the right Lambda automatically.
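To see why a new subtype matches without a rule change, here is a small local simulation of the `detail.type` part of the pattern. It only models exact values and `prefix` matchers, not the full EventBridge pattern language, and the function name is hypothetical.

```python
EC2_TYPE_RULE = [
    {"prefix": "UnauthorizedAccess:EC2/"},
    {"prefix": "CryptoCurrency:EC2/"},
    {"prefix": "Backdoor:EC2/"},
    {"prefix": "Trojan:EC2/"},
    {"prefix": "Recon:EC2/"},
]


def matches_type_rule(finding_type: str, rule_values: list) -> bool:
    """True if the finding type matches any exact value or prefix matcher."""
    for value in rule_values:
        if isinstance(value, dict) and "prefix" in value:
            if finding_type.startswith(value["prefix"]):
                return True
        elif value == finding_type:
            return True
    return False
```

For example, `matches_type_rule("CryptoCurrency:EC2/MoneroTool", EC2_TYPE_RULE)` is true even though that subtype is not listed anywhere, while an IAM finding type falls through to the other rules.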
Everything is provisioned by Terraform. Key design decisions:
Single KMS key, multiple uses: One KMS key encrypts CloudTrail logs, DynamoDB tables, S3 buckets, SQS DLQ, and Lambda environment variables. The key policy explicitly allows CloudTrail to use it for log encryption. Annual rotation is enabled.
Lambda DLQ: Every Lambda has a Dead Letter Queue (SQS). If an asynchronous invocation fails (for example an unhandled exception or a timeout), Lambda retries it twice and then delivers the event to the DLQ. A CloudWatch alarm monitors DLQ message count and alerts if any messages arrive, so failed remediations are not silently dropped.
Reserved concurrency = 10:
Each Lambda has reserved_concurrent_executions = 10. This prevents a scenario where a flood of GuardDuty findings (e.g., a large-scale brute force attack) spawns thousands of simultaneous Lambda invocations that exhaust the account's concurrency limit and starve other Lambda functions.
S3 lifecycle policy: The findings reports bucket has a lifecycle rule: findings are accessible for 90 days (standard storage), transition to Glacier after 90 days (cost optimization), and expire after 365 days (compliance data retention).
The test suite uses pytest with moto for mocking AWS services. Coverage requirement is 80% (enforced in pytest.ini).
Test structure:
| Test File | Functions Under Test | Test Count |
|---|---|---|
| `test_isolate_ec2.py` | handler, isolate_instance, get_or_create_quarantine_sg, create_forensic_snapshots | 12 |
| `test_revoke_iam_keys.py` | handler, revoke_credentials, handle_root_credential_usage, handle_policy_change | 9 |
| `test_s3_remediation.py` | block-ip handler + helpers, s3-remediation handler + helpers | 15 |
Key test patterns:
All tests mock boto3 via conftest.py autouse fixture — no real AWS calls are ever made in unit tests. The mock_boto3_session fixture patches boto3.client and boto3.resource globally.
Happy path + failure handling — every function has tests for both success and graceful failure (API errors, missing resources, unexpected input).
Security invariants tested:
- Private IPs are never blocked (would break internal connectivity)
- Root credential usage never attempts to disable root (impossible)
- Terminated instances are skipped (cannot isolate a terminated instance)
- Explicit deny policy is a true Deny-All (not just a deny of specific actions)
Run tests:
cd aws-security-automation
# Install dependencies
pip install -r lambda/requirements.txt
# Run all tests with coverage
pytest
# Run specific test file
pytest lambda/tests/test_isolate_ec2.py -v
# Run with coverage report
pytest --cov=lambda --cov-report=html
open htmlcov/index.html
# Run only fast unit tests
pytest -m "not integration" -v

| Control | NIST 800-53 | CIS AWS | PCI DSS | Implemented By |
|---|---|---|---|---|
| Threat detection | SI-3, SI-4 | 3.x | 11.5 | GuardDuty |
| Vulnerability management | RA-5, SI-2 | — | 6.3 | Inspector v2 |
| Sensitive data discovery | SC-28, MP-4 | — | 3.4 | Macie |
| Audit logging | AU-2, AU-12 | 2.1 | 10.2 | CloudTrail |
| Compliance monitoring | CA-7 | 1.x, 2.x | 2.2 | AWS Config |
| Incident response | IR-4, IR-5 | — | 12.9 | Lambda (all) |
| IAM access review | AC-2, AC-6 | 1.x | 7.1 | IAM Access Analyzer |
| Credential management | IA-5 | 1.14 | 8.x | revoke-iam-keys |
| Network protection | SC-7 | 4.x | 1.x | block-ip |
| Encryption at rest | SC-28 | 2.1.1 | 3.4 | s3-remediation |
| Automated remediation | SI-7, IR-4 | — | 12.10 | All Lambda functions |
- AWS account with admin permissions
- Terraform 1.7+ (`terraform version`)
- Python 3.12+ (`python3 --version`)
- AWS CLI v2 configured (`aws sts get-caller-identity`)
- S3 bucket for Terraform state
- DynamoDB table for Terraform lock
# 1. Clone repository
git clone https://github.com/YOUR_USERNAME/aws-security-automation.git
cd aws-security-automation
# 2. Create Terraform state backend (one-time)
aws s3 mb s3://enterprise-security-tfstate --region us-east-1
aws s3api put-bucket-versioning \
--bucket enterprise-security-tfstate \
--versioning-configuration Status=Enabled
aws dynamodb create-table \
--table-name enterprise-security-tflock \
--attribute-definitions AttributeName=LockID,AttributeType=S \
--key-schema AttributeName=LockID,KeyType=HASH \
--billing-mode PAY_PER_REQUEST
# 3. Install Python dependencies and run tests
pip install -r lambda/requirements.txt
pytest
# All tests must pass before deploying
# 4. Initialize and deploy Terraform
cd terraform/environments/prod
terraform init
terraform plan -var="alert_email=security@company.com" -out=tfplan
# Review the plan carefully
terraform apply tfplan
# 5. Verify services are active
aws guardduty list-detectors
aws securityhub describe-hub
aws macie2 get-macie-session
aws accessanalyzer list-analyzers
# 6. Test the pipeline with a GuardDuty sample finding
aws guardduty create-sample-findings \
--detector-id $(aws guardduty list-detectors --query 'DetectorIds[0]' --output text) \
--finding-types "UnauthorizedAccess:EC2/SSHBruteForce"
# Watch Lambda logs
aws logs tail /aws/lambda/security-auto-prod-isolate-ec2 --follow

# Find quarantined instances
aws ec2 describe-instances \
--filters "Name=tag:security:quarantined,Values=true" \
--query 'Reservations[].Instances[].{ID:InstanceId,State:State.Name,Reason:Tags[?Key==`security:quarantine-reason`].Value|[0]}'
# Get the finding that triggered isolation
INSTANCE_ID="i-0abc123"
aws ec2 describe-tags --filters "Name=resource-id,Values=${INSTANCE_ID}" \
--query 'Tags[?Key==`security:quarantine-finding-id`].Value' --output text
# View DynamoDB audit record
aws dynamodb get-item \
--table-name security-remediation-audit \
--key '{"finding_id": {"S": "FINDING_ID_HERE"}, "timestamp": {"S": "TIMESTAMP_HERE"}}'
# Access forensic snapshot
aws ec2 describe-snapshots \
--filters "Name=tag:security:source-instance,Values=${INSTANCE_ID}"
# Once investigation is complete — restore instance
# 1. Remove quarantine tag
aws ec2 delete-tags --resources ${INSTANCE_ID} \
--tags Key=security:quarantined
# 2. Restore original security groups (stored in tag)
aws ec2 describe-tags --filters "Name=resource-id,Values=${INSTANCE_ID}" \
--query 'Tags[?Key==`security:original-security-groups`].Value' --output text

IAM_USER="compromised-user"
# View all inline policies (find the deny policy)
aws iam list-user-policies --user-name ${IAM_USER}
# Remove the security quarantine policy after investigation
aws iam delete-user-policy \
--user-name ${IAM_USER} \
--policy-name "SecurityAutomationExplicitDeny"
# Re-enable access key if cleared
aws iam update-access-key \
--user-name ${IAM_USER} \
--access-key-id AKIAIOSFODNN7EXAMPLE \
--status Active
# Remove quarantine tags
aws iam untag-user \
--user-name ${IAM_USER} \
--tag-keys "security:quarantined" "security:quarantine-reason"

# Query all blocked IPs
aws dynamodb scan \
--table-name security-threat-intel \
--query 'Items[].{IP:ip_address.S, Type:finding_type.S, Severity:severity.S, Seen:first_seen.S}'
# Remove an IP that was false-positive
aws dynamodb delete-item \
--table-name security-threat-intel \
--key '{"ip_address": {"S": "1.2.3.4"}}'
# Also remove from WAF
IP_SET_ID=$(aws wafv2 list-ip-sets --scope REGIONAL \
--query 'IPSets[?Name==`security-auto-prod-blocked-ips`].Id' --output text)
aws wafv2 get-ip-set --name security-auto-prod-blocked-ips \
--scope REGIONAL --id ${IP_SET_ID}
# Then update-ip-set removing the IP

# Trigger isolate-ec2 manually
# (AWS CLI v2 needs --cli-binary-format raw-in-base64-out to accept a raw JSON payload)
aws lambda invoke \
  --function-name security-auto-prod-isolate-ec2 \
  --cli-binary-format raw-in-base64-out \
  --payload '{"source":"aws.guardduty","detail-type":"GuardDuty Finding","detail":{"id":"manual-test-001","type":"UnauthorizedAccess:EC2/SSHBruteForce","severity":8.0,"accountId":"123456789012","region":"us-east-1","title":"Manual test","description":"Manual trigger","resource":{"instanceDetails":{"instanceId":"i-YOUR_INSTANCE_ID","networkInterfaces":[]}},"service":{"action":{}}}}' \
  output.json
cat output.json | jq

# Manually trigger the findings aggregator
aws lambda invoke \
  --function-name security-auto-prod-findings-aggregator \
  --cli-binary-format raw-in-base64-out \
  --payload '{}' \
  aggregator-output.json
cat aggregator-output.json | jq
# Check the S3 reports
TODAY=$(date +%Y/%m/%d)
aws s3 ls s3://security-auto-prod-findings-ACCOUNT_ID/findings-summaries/${TODAY}/
aws s3 cp s3://security-auto-prod-findings-ACCOUNT_ID/findings-summaries/${TODAY}/LATEST.json - | jq

Lambda is not triggering on GuardDuty findings:
# Verify EventBridge rule is enabled
aws events describe-rule --name security-auto-prod-guardduty-ec2
# Check Lambda permissions
aws lambda get-policy --function-name security-auto-prod-isolate-ec2
# Test EventBridge rule with a sample event
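# (the event must include id, account, source, time, region, resources, and detail-type)
aws events test-event-pattern \
  --event-pattern '{"source":["aws.guardduty"],"detail":{"type":[{"prefix":"UnauthorizedAccess:EC2/"}]}}' \
  --event '{"id":"manual-test-001","account":"123456789012","source":"aws.guardduty","time":"2024-01-01T00:00:00Z","region":"us-east-1","resources":[],"detail-type":"GuardDuty Finding","detail":{"type":"UnauthorizedAccess:EC2/SSHBruteForce"}}'

Lambda is failing (check the DLQ):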
# Get DLQ URL
DLQ_URL=$(aws sqs get-queue-url --queue-name security-auto-prod-lambda-dlq --query QueueUrl --output text)
# Receive failed messages
aws sqs receive-message --queue-url ${DLQ_URL} --max-number-of-messages 10

Lambda cannot modify EC2 instance:
- Verify the Lambda execution role has `ec2:ModifyInstanceAttribute` permission
- Check whether a service control policy (SCP) or permissions boundary blocks the modification
- Verify the Lambda is in a VPC/region that can reach the EC2 API endpoint
GuardDuty not generating findings in test:
# Generate sample findings for all types
aws guardduty create-sample-findings \
--detector-id DETECTOR_ID \
--finding-types \
"UnauthorizedAccess:EC2/SSHBruteForce" \
"CryptoCurrency:EC2/BitcoinTool.B!DNS" \
"UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration.OutsideAWS"

Principle of least privilege (Lambda IAM role):
The Lambda execution role grants only the specific actions needed for each remediation type. For example, it allows ec2:ModifyInstanceAttribute (to change security groups) but not ec2:TerminateInstances (intentional — terminating an instance destroys forensic evidence). Review the IAM policy in main.tf before deploying.
Reserved concurrency as a safety control:
reserved_concurrent_executions = 10 prevents automated runaway. If 1000 GuardDuty findings fire simultaneously (e.g., during a large-scale attack), only 10 Lambda invocations run concurrently. The rest queue. This prevents both AWS account concurrency exhaustion and inadvertent mass remediation.
Private IP safety check:
The block-ip function refuses to block RFC 1918 addresses. A private source IP in a GuardDuty finding often indicates lateral movement (an attacker who already has a foothold in your network) — blocking it in the NACL would block legitimate internal traffic and potentially prevent your own incident response team from accessing affected systems.
Root credential handling:
Root credentials cannot be disabled via API. The revoke-iam-keys function never attempts to call iam:UpdateAccessKey for root — doing so would throw an exception that could mask the critical alert. Instead, it sends a maximum-urgency SNS notification and explicitly lists manual investigation steps.
Audit trail integrity:
Every DynamoDB audit record includes lambda_function (function name), timestamp, finding_id, actions_taken, and success. The table has point-in-time recovery enabled. Records have a 90-day TTL — long enough for compliance audits but not forever (cost control).
MIT