effiekarinea/aws-security-automation

AWS Security Automation Platform

Python Terraform AWS Lambda GuardDuty Security Hub pytest Coverage

An event-driven, fully automated AWS cloud security platform that detects threats across your AWS environment and responds within seconds — no human intervention required for containment. Built on GuardDuty, Security Hub, Inspector, Macie, IAM Access Analyzer, and AWS Config, with Lambda functions that handle the full incident response lifecycle: detect → contain → forensics → alert → audit.


Table of Contents

  1. Architecture Overview
  2. Why Automated Response
  3. Repository Structure
  4. Security Services
  5. Lambda Remediation Functions
  6. EventBridge Routing
  7. Infrastructure
  8. Testing
  9. Compliance Mapping
  10. Prerequisites
  11. Deployment Guide
  12. Runbook — Common Operations
  13. Troubleshooting
  14. Security Considerations

Architecture Overview

┌──────────────────────────────────────────────────────────────────────────────────┐
│                    AWS SECURITY AUTOMATION PLATFORM                              │
│                                                                                  │
│  DETECTION LAYER                                                                 │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐           │
│  │  GuardDuty  │  │  Inspector  │  │    Macie     │  │  IAM Access │           │
│  │             │  │    v2       │  │              │  │  Analyzer   │           │
│  │ ML-powered  │  │ CVE scan:   │  │ S3 sensitive │  │ External    │           │
│  │ threat      │  │ EC2/ECR/    │  │ data (PII,   │  │ access to   │           │
│  │ detection   │  │ Lambda      │  │ credentials) │  │ resources   │           │
│  └──────┬──────┘  └──────┬──────┘  └──────┬───────┘  └──────┬──────┘           │
│         │                │                │                  │                  │
│  ┌──────▼────────────────▼────────────────▼──────────────────▼──────┐           │
│  │                  AWS Security Hub                                 │           │
│  │  Aggregates all findings, CIS Benchmark + PCI DSS standards       │           │
│  │  Single pane of glass for security posture                        │           │
│  └──────────────────────────────┬────────────────────────────────────┘           │
│                                 │                                                │
│  ROUTING LAYER                  │                                                │
│  ┌──────────────────────────────▼────────────────────────────────────┐           │
│  │                    Amazon EventBridge                             │           │
│  │                                                                   │           │
│  │  Rule: EC2 findings ──────────────────────────────────────────►  │           │
│  │  Rule: IAM findings ──────────────────────────────────────────►  │           │
│  │  Rule: Macie findings ─────────────────────────────────────────► │           │
│  │  Rule: IP-based findings ─────────────────────────────────────►  │           │
│  │  Rule: Schedule (every 6h) ────────────────────────────────────► │           │
│  └──────────────────────────────┬────────────────────────────────────┘           │
│                                 │                                                │
│  RESPONSE LAYER                 │                                                │
│  ┌──────────┐  ┌──────────┐  ┌─┴────────┐  ┌──────────┐  ┌──────────────────┐  │
│  │isolate-  │  │revoke-   │  │ block-ip │  │   s3-    │  │findings-         │  │
│  │ec2       │  │iam-keys  │  │          │  │remediat- │  │aggregator        │  │
│  │          │  │          │  │          │  │ion       │  │(scheduled)       │  │
│  │1. Attach │  │1. Disable│  │1. WAFv2  │  │1. Block  │  │• GuardDuty       │  │
│  │   quarant│  │   key    │  │   IP set │  │   public │  │• Security Hub    │  │
│  │   ine SG │  │2. Deny   │  │2. NACL   │  │   access │  │• Inspector       │  │
│  │2. Remove │  │   policy │  │   rule   │  │2. Enable │  │• Access Analyzer │  │
│  │   all SGs│  │3. Tag    │  │3. Threat │  │   KMS    │  │• CloudWatch      │  │
│  │3. Tag    │  │   user   │  │   intel  │  │   encrypt│  │  metrics         │  │
│  │4. Snap-  │  │4. Revoke │  │   table  │  │3. Enable │  │• Slack digest    │  │
│  │   shot   │  │   session│  │4. Alert  │  │   version│  │• S3 report       │  │
│  │5. Alert  │  │5. Alert  │  │          │  │4. Bucket │  │                  │  │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  │   policy │  └──────────────────┘  │
│       │             │             │         │5. Alert  │                         │
│  AUDIT / NOTIFICATION LAYER      │         └──────────┘                         │
│  ┌────▼─────────────▼─────────────▼──────────────────────────────────────────┐   │
│  │  DynamoDB Audit Trail      SNS → Slack / PagerDuty / Email               │   │
│  │  - Every action logged     - Alert on every CRITICAL/HIGH finding        │   │
│  │  - 90-day retention        - Daily digest summary                        │   │
│  │  - KMS encrypted           - Lambda DLQ for failed remediations          │   │
│  └────────────────────────────────────────────────────────────────────────────┘   │
│                                                                                  │
│  COMPLIANCE / AUDIT LAYER                                                        │
│  ┌──────────────┐  ┌──────────────────────────────────────────────────────────┐  │
│  │  CloudTrail  │  │  AWS Config                                              │  │
│  │  Multi-region│  │  Rules: s3-public-access, s3-encryption, root-mfa,       │  │
│  │  KMS encrypt │  │  iam-password-policy, cloudtrail-enabled                  │  │
│  │  S3 + CW logs│  │  Continuous compliance evaluation                        │  │
│  └──────────────┘  └──────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────────────┘

Why Automated Response

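Figures commonly cited from IBM's Cost of a Data Breach research put the average time to identify a breach at 207 days and the average time to contain it at 73 days. Automated response targets the containment window: for the scenarios this platform covers, containment happens in seconds rather than days.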

A compromised EC2 instance running crypto mining software can cost thousands of dollars per day. A leaked IAM key can be exploited within minutes of being discovered (attackers scan GitHub for leaked keys in near real-time). A publicly exposed S3 bucket with PII is a data breach that must be reported to regulators within 72 hours under GDPR.

This platform responds to all three scenarios in seconds:

  • Compromised EC2: GuardDuty fires → EventBridge routes → Lambda attaches quarantine security group (zero egress/ingress) and creates forensic EBS snapshots → all within 30 seconds of detection
  • Leaked IAM key: GuardDuty fires → EventBridge routes → Lambda disables the access key and attaches explicit-deny policy → credential is useless within 30 seconds
  • Public S3 bucket: Macie fires → EventBridge routes → Lambda enables block-public-access, KMS encryption, versioning, and access logging → remediated within 60 seconds

Every action is logged to a DynamoDB audit table with TTL=90 days, and every response function has a Lambda Dead Letter Queue (DLQ) so failed remediations are not silently dropped.


Repository Structure

aws-security-automation/
│
├── README.md
├── pytest.ini                              # Test configuration (80% coverage gate)
│
├── lambda/
│   ├── requirements.txt                    # boto3, pytest, moto, coverage
│   │
│   ├── layers/
│   │   └── common/
│   │       └── utils.py                    # Shared layer: logging, SNS, DynamoDB audit,
│   │                                       # AWS client factory, finding parsers, decorators
│   │
│   ├── functions/
│   │   ├── isolate-ec2/
│   │   │   └── handler.py                  # EC2 containment: quarantine SG, EBS snapshots
│   │   │
│   │   ├── revoke-iam-keys/
│   │   │   └── handler.py                  # IAM containment: disable key, deny policy,
│   │   │                                   # revoke sessions, handle root/assumed roles
│   │   │
│   │   ├── block-ip/
│   │   │   └── handler.py                  # Network containment: WAFv2, NACL, threat intel
│   │   │
│   │   ├── s3-remediation/
│   │   │   └── handler.py                  # S3 hardening: public access, KMS, versioning,
│   │   │                                   # access logging, restrictive bucket policy
│   │   │
│   │   └── findings-aggregator/
│   │       └── handler.py                  # Scheduled: aggregate all sources, CloudWatch
│   │                                       # metrics, daily Slack digest, S3 report
│   │
│   └── tests/
│       ├── conftest.py                     # Shared fixtures, env vars, boto3 mocking
│       ├── test_isolate_ec2.py             # 12 tests: routing, isolation, SG, snapshots
│       ├── test_revoke_iam_keys.py         # 9 tests: disable key, deny policy, root, sessions
│       └── test_s3_remediation.py          # 15 tests: block-ip (7) + S3 remediation (8)
│
└── terraform/
    └── environments/
        └── prod/
            ├── main.tf                     # All security services + Lambda + EventBridge
            ├── variables.tf
            └── outputs.tf

Security Services

GuardDuty — Threat Detection

GuardDuty uses machine learning to analyze CloudTrail API logs, VPC Flow Logs, and DNS query logs continuously. It detects anomalies that rule-based tools miss — for example, an IAM user that has never called ec2:RunInstances suddenly launching 50 instances at 3am is anomalous behavior that GuardDuty catches even without a specific rule for it.

What it monitors:

| Data Source | Detections |
|---|---|
| CloudTrail management events | Unusual API calls, policy changes, new IAM users |
| CloudTrail S3 data events | Unusual S3 access patterns, mass data exfiltration |
| VPC Flow Logs | Port scanning, brute force, C2 communication |
| DNS logs | Malware C2 domain communication, DNS data exfiltration |
| EKS audit logs | Privilege escalation, container escapes |
| EBS volumes | Malware scanning via malware protection feature |

Finding types this platform responds to:

| Finding Type | MITRE ATT&CK | Response |
|---|---|---|
| UnauthorizedAccess:EC2/SSHBruteForce | T1110.001 | EC2 isolation |
| UnauthorizedAccess:EC2/RDPBruteForce | T1110.001 | EC2 isolation |
| CryptoCurrency:EC2/BitcoinTool.B!DNS | T1496 | EC2 isolation |
| Backdoor:EC2/C&CActivity.B!DNS | T1071 | EC2 isolation |
| Trojan:EC2/BlackholeTraffic | T1071 | EC2 isolation |
| UnauthorizedAccess:EC2/TorClient | T1090 | EC2 isolation |
| UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration | T1552.005 | Revoke credentials |
| UnauthorizedAccess:IAMUser/ConsoleLoginSuccess.B | T1078 | Revoke credentials |
| Policy:IAMUser/RootCredentialUsage | T1078.004 | Critical alert |
| Stealth:IAMUser/CloudTrailLoggingDisabled | T1562.008 | Re-enable CloudTrail |
| Impact:IAMUser/AnomalousBehavior | T1486 | Revoke credentials |
| Recon:EC2/PortProbing | T1595 | Block IP (WAF + NACL) |

Terraform configuration:

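resource "aws_guardduty_detector" "main" {
  enable = true

  datasources {
    s3_logs { enable = true }

    # HCL's single-line block form cannot contain nested blocks,
    # so the Kubernetes audit log setting is written out in full.
    kubernetes {
      audit_logs {
        enable = true
      }
    }

    malware_protection {
      scan_ec2_instance_with_findings {
        ebs_volumes { enable = true }
      }
    }
  }
}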

Security Hub — Finding Aggregation

Security Hub is the single pane of glass. It receives findings from GuardDuty, Inspector, Macie, IAM Access Analyzer, and Config, normalizes them into a standard format (ASFF — Amazon Security Finding Format), and evaluates them against security standards.

Standards enabled:

| Standard | Coverage |
|---|---|
| CIS AWS Foundations Benchmark v1.4 | 58 controls: IAM, logging, monitoring, networking |
| AWS Foundational Security Best Practices | 200+ controls across all AWS services |
| PCI DSS v3.2.1 | Payment card data protection controls |

Why Security Hub matters for this platform:

Every Lambda function can call securityhub:BatchUpdateFindings to mark findings as RESOLVED after remediation. This keeps the Security Hub dashboard clean and provides an audit trail showing which findings were auto-remediated vs. which required human intervention.
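
As a rough illustration of that call, the sketch below builds the BatchUpdateFindings request for a single finding. The helper names (build_resolve_request, mark_resolved) and the note wording are assumptions made for this example, not the repository's actual code. The securityhub argument is expected to be a boto3 Security Hub client, or any object exposing batch_update_findings.

```python
def build_resolve_request(finding_id, product_arn, function_name, actions):
    """Build keyword arguments for Security Hub's BatchUpdateFindings call."""
    return {
        "FindingIdentifiers": [{"Id": finding_id, "ProductArn": product_arn}],
        # Workflow status RESOLVED marks the finding as handled.
        "Workflow": {"Status": "RESOLVED"},
        # The note records which function remediated the finding and how.
        "Note": {
            "Text": "Auto-remediated: " + ", ".join(actions),
            "UpdatedBy": function_name,
        },
    }


def mark_resolved(securityhub, finding_id, product_arn, function_name, actions):
    """Mark one finding RESOLVED; return True if nothing was left unprocessed."""
    request = build_resolve_request(finding_id, product_arn, function_name, actions)
    response = securityhub.batch_update_findings(**request)
    return not response.get("UnprocessedFindings")
```

Keeping the request builder separate from the API call makes the payload easy to unit test without any AWS client.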

Inspector v2 — Vulnerability Scanning

Inspector v2 continuously scans:

  • EC2 instances (OS packages, software CVEs via SSM Agent)
  • ECR container images (layer-by-layer CVE scanning — runs on push AND continuously on stored images)
  • Lambda functions (package dependencies)

Unlike a one-time pipeline scan (for example, Trivy in a Jenkins pipeline), Inspector monitors running workloads. If a new CVE is published for a package already installed on your EC2 fleet, Inspector fires a finding within hours.

Integration with the platform:

  • New CRITICAL CVEs appear in Security Hub → findings-aggregator picks them up → included in daily digest
  • Findings-aggregator tracks critical_cves list with CVE IDs and CVSS scores
  • CloudWatch metric FindingsBySource/inspector tracks CVE count over time

Macie — Sensitive Data Discovery

Macie scans S3 buckets for sensitive data using ML classifiers:

  • Personally Identifiable Information (PII) — names, addresses, SSNs, passport numbers
  • Financial data — credit card numbers, bank account numbers
  • Credentials — AWS access keys, private keys, passwords in files
  • Healthcare data — PHI under HIPAA

Why this matters: A developer accidentally uploads a CSV with 10,000 customer email addresses to an S3 bucket they're using for testing. Without Macie, this goes undetected. Once a classification job discovers the data, Macie publishes the finding within 15 minutes (finding publishing frequency is set to FIFTEEN_MINUTES), which triggers the s3-remediation Lambda. The Lambda blocks public access and applies a restrictive bucket policy to cut off access from outside the account.

Scheduled classification job:

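resource "aws_macie2_classification_job" "s3_scan" {
  job_type = "SCHEDULED"
  schedule_frequency { weekly_schedule = "MONDAY" }
  s3_job_definition {
    bucket_definitions {
      account_id = local.account_id
      # Macie expects bucket names here, not ARNs, and does not expand wildcards.
      # var.macie_bucket_names is a hypothetical list(string) of buckets to scan.
      buckets = var.macie_bucket_names
    }
  }
}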

IAM Access Analyzer

IAM Access Analyzer continuously evaluates resource policies to identify resources accessible from outside your AWS account. It analyzes:

  • S3 bucket policies
  • IAM role trust policies
  • KMS key policies
  • Lambda function policies
  • SQS queue policies
  • Secrets Manager secrets

An active Access Analyzer finding means an external entity (another AWS account, the public, or a third-party service) has been granted access to one of your resources. This is sometimes intentional (a cross-account role for a vendor) but is often a misconfiguration.

Findings-aggregator tracks the count of active Access Analyzer findings and includes them in the daily digest. Any new findings are surfaced in the CloudWatch dashboard.

AWS Config — Compliance Monitoring

AWS Config continuously evaluates resource configurations against rules. Unlike GuardDuty (which detects active threats), Config detects drift — when a resource was correctly configured and then changed.

Config rules deployed:

| Rule | What It Checks | Severity |
|---|---|---|
| s3-bucket-public-access-prohibited | S3 Block Public Access enabled | HIGH |
| s3-bucket-server-side-encryption-enabled | Default encryption on all S3 buckets | HIGH |
| root-account-mfa-enabled | Root account has MFA | CRITICAL |
| iam-password-policy | Password policy meets complexity requirements | MEDIUM |
| cloud-trail-enabled | CloudTrail is active in all regions | CRITICAL |

Config + Lambda = automatic remediation: Config violations flow to Security Hub → EventBridge → s3-remediation Lambda (for S3 rules). CloudTrail disabling is handled by revoke-iam-keys Lambda which re-enables it immediately.

CloudTrail — Audit Logging

CloudTrail records every API call made in the account. This is the raw material for forensic investigation after an incident. Configuration:

  • Multi-region trail (catches API calls in every region, including ones you're not using)
  • S3 and Lambda data events enabled (records object-level access)
  • KMS encrypted logs
  • CloudWatch Logs delivery (enables CloudWatch Insights queries on API activity)
  • Log file validation enabled (detects if logs have been tampered with)

Why log file validation matters: If an attacker compromises your account and stops the trail to cover their tracks, the Stealth:IAMUser/CloudTrailLoggingDisabled GuardDuty finding fires and the revoke-iam-keys Lambda re-enables CloudTrail. If they instead delete or alter delivered log files, the signed digest files produced by log file validation (a chain of SHA-256 hashes) show which log files are missing or changed.


Lambda Remediation Functions

All Lambda functions share:

  • A common layer (lambda/layers/common/utils.py) with logging, SNS notifications, DynamoDB audit writer, AWS client factory
  • The @remediation_handler decorator which provides execution timing, error handling, and structured logging
  • reserved_concurrent_executions = 10 — prevents runaway remediation loops
  • X-Ray tracing enabled
  • Dead Letter Queue (SQS) for failed invocations
  • Python 3.12 runtime
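
To make the decorator's role concrete, here is a minimal sketch of what a wrapper with those three responsibilities could look like. It uses only the standard library. The logger name and the log field names are assumptions for this example, and the real @remediation_handler in utils.py may differ.

```python
import functools
import json
import logging
import time

logger = logging.getLogger("remediation")  # hypothetical logger name


def remediation_handler(func):
    """Wrap a Lambda handler with timing, error handling, and JSON logging."""

    @functools.wraps(func)
    def wrapper(event, context):
        started = time.monotonic()
        finding_id = event.get("detail", {}).get("id", "unknown")
        status = "error"  # assume failure until the handler returns normally
        try:
            result = func(event, context)
            status = "success"
            return result
        except Exception as exc:
            logger.error(json.dumps({"finding_id": finding_id, "error": str(exc)}))
            # Re-raise so Lambda's async retry and DLQ handling still apply.
            raise
        finally:
            elapsed_ms = round((time.monotonic() - started) * 1000, 1)
            logger.info(json.dumps({
                "handler": func.__name__,
                "finding_id": finding_id,
                "status": status,
                "duration_ms": elapsed_ms,
            }))

    return wrapper
```

Re-raising after logging is a deliberate choice in this sketch: swallowing the exception would make the invocation look successful and the event would never reach the Dead Letter Queue.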

isolate-ec2

Trigger: GuardDuty EC2 findings (SSH/RDP brute force, crypto mining, C2 communication, Tor)

Decision logic:

  • Finding type in ISOLATION_FINDING_TYPES OR severity HIGH/CRITICAL → full isolation
  • Finding type in ALERT_ONLY_TYPES → notification only, no isolation
  • Everything else → notification only
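
The decision logic above can be sketched as a small pure function. The set contents below are illustrative placeholders (the real ISOLATION_FINDING_TYPES and ALERT_ONLY_TYPES live in the handler), and the numeric threshold assumes GuardDuty's convention that severities of 7.0 and higher are High. The rules are applied in the order listed, which is also an assumption.

```python
# Placeholder contents for illustration; not the handler's actual sets.
ISOLATION_FINDING_TYPES = {
    "UnauthorizedAccess:EC2/SSHBruteForce",
    "CryptoCurrency:EC2/BitcoinTool.B!DNS",
    "Backdoor:EC2/C&CActivity.B!DNS",
}
ALERT_ONLY_TYPES = {"Recon:EC2/PortProbing"}

HIGH_SEVERITY_THRESHOLD = 7.0  # GuardDuty's High band starts at 7.0


def decide_action(finding_type, severity):
    """Return 'isolate' or 'notify' for a GuardDuty EC2 finding."""
    if finding_type in ISOLATION_FINDING_TYPES or severity >= HIGH_SEVERITY_THRESHOLD:
        return "isolate"
    if finding_type in ALERT_ONLY_TYPES:
        return "notify"
    # Anything unrecognized is surfaced to humans rather than acted on.
    return "notify"
```

Defaulting to notification for unknown types keeps the blast radius of a misrouted event small: the worst case is an extra alert, not an isolated production instance.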

Isolation steps:

  1. Describe the instance — get current security groups, VPC ID, EBS volumes, state
  2. Skip if already terminated
  3. Get or create quarantine security group (no inbound, no outbound rules, tagged with security:purpose=quarantine)
  4. modify_instance_attribute — replace all security groups with quarantine SG only
  5. create_tags — mark instance as quarantined with finding ID and timestamp
  6. create_snapshot — forensic EBS snapshot of every attached volume
  7. Write audit record to DynamoDB
  8. Send SNS notification (→ Slack)
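
A condensed sketch of steps 1 to 6 follows, assuming the EC2 client is passed in (a boto3 EC2 client in production, a stub in tests). The function signature and return shape are hypothetical. The tag keys are the ones used in the runbook commands later in this README. Audit logging and SNS notification are left out to keep the example short.

```python
def isolate_instance(ec2, instance_id, quarantine_sg_id, finding_id):
    """Swap security groups, tag the instance, and snapshot its volumes."""
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance = reservations[0]["Instances"][0]

    # Step 2: a terminated instance cannot be isolated.
    if instance["State"]["Name"] == "terminated":
        return {"instance_id": instance_id, "skipped": "terminated", "snapshots": []}

    original_sgs = [sg["GroupId"] for sg in instance.get("SecurityGroups", [])]

    # Step 4: replace every security group with the quarantine group only.
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[quarantine_sg_id])

    # Step 5: record why the instance was quarantined and how to undo it.
    ec2.create_tags(
        Resources=[instance_id],
        Tags=[
            {"Key": "security:quarantined", "Value": "true"},
            {"Key": "security:quarantine-finding-id", "Value": finding_id},
            {"Key": "security:original-security-groups", "Value": ",".join(original_sgs)},
        ],
    )

    # Step 6: forensic snapshot of every attached EBS volume.
    snapshots = []
    for mapping in instance.get("BlockDeviceMappings", []):
        volume_id = mapping.get("Ebs", {}).get("VolumeId")
        if not volume_id:
            continue
        snapshot = ec2.create_snapshot(
            VolumeId=volume_id,
            Description=f"Forensic snapshot for finding {finding_id}",
            TagSpecifications=[{
                "ResourceType": "snapshot",
                "Tags": [{"Key": "security:source-instance", "Value": instance_id}],
            }],
        )
        snapshots.append(snapshot["SnapshotId"])

    return {"instance_id": instance_id, "original_sgs": original_sgs, "snapshots": snapshots}
```

Network containment happens before the snapshots in this sketch so that the instance is cut off as early as possible, since snapshot creation calls can be slower or throttled.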

The quarantine SG design: The quarantine SG has no rules at all, not even a default allow. This is different from a "deny all" rule: security groups cannot express deny rules, only allows. AWS evaluates security groups as allow lists, so if nothing is explicitly allowed, nothing is allowed. The key detail is that revoke_security_group_egress removes the default outbound allow-all rule that AWS creates automatically on new SGs.
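
The get-or-create behavior described above might look roughly like the following. The function name matches the one listed in the test table, but its signature, the group naming scheme, and the description text are assumptions. The ec2 argument is a boto3 EC2 client or a compatible stub.

```python
QUARANTINE_TAG = {"Key": "security:purpose", "Value": "quarantine"}

# The rule AWS adds to every new security group: allow all outbound IPv4 traffic.
DEFAULT_ALLOW_ALL_EGRESS = [
    {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}
]


def get_or_create_quarantine_sg(ec2, vpc_id):
    """Return the ID of the VPC's quarantine SG, creating it if needed."""
    existing = ec2.describe_security_groups(
        Filters=[
            {"Name": "vpc-id", "Values": [vpc_id]},
            {"Name": "tag:security:purpose", "Values": ["quarantine"]},
        ]
    )["SecurityGroups"]
    if existing:
        return existing[0]["GroupId"]

    created = ec2.create_security_group(
        GroupName=f"quarantine-{vpc_id}",  # hypothetical naming scheme
        Description="Quarantine: no inbound or outbound rules",
        VpcId=vpc_id,
        TagSpecifications=[{"ResourceType": "security-group", "Tags": [QUARANTINE_TAG]}],
    )
    group_id = created["GroupId"]

    # New groups have no inbound rules, so only the default egress rule
    # has to be removed to end up with a group that permits nothing.
    ec2.revoke_security_group_egress(GroupId=group_id, IpPermissions=DEFAULT_ALLOW_ALL_EGRESS)
    return group_id
```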

revoke-iam-keys

Trigger: GuardDuty IAM findings (credential exfiltration, anomalous behavior, root usage, policy changes)

Decision tree:

Root credential usage?
  → Cannot disable root → critical SNS alert with escalation instructions
  → Manual investigation required

CloudTrail disabled?
  → Re-enable CloudTrail for all trails in the region

IAM finding + user principal?
  → Disable access key (Status: Inactive)
  → Attach explicit-deny inline policy (blocks ALL actions)
  → Tag user as quarantined
  → Force password reset at next console sign-in

Assumed role?
  → Attach deny policy with DateLessThan condition
    (revokes all tokens issued before this timestamp)

Why explicit-deny + disable key: Disabling an access key is not enough. The key might have been used to create other credentials or to establish long-term sessions. The explicit-deny policy (Effect: Deny, Action: *, Resource: *) blocks all actions regardless of what other policies allow. Belt and suspenders.
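
For reference, the two policy documents mentioned above can be sketched as plain dictionaries. The function names and Sid values are assumptions for illustration. The aws:TokenIssueTime condition key with DateLessThan is the standard IAM mechanism for invalidating temporary credentials issued before a cutoff.

```python
import json
from datetime import datetime, timezone


def explicit_deny_policy():
    """Inline policy that denies every action on every resource."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "DenyAll", "Effect": "Deny", "Action": "*", "Resource": "*"}
        ],
    }


def revoke_sessions_policy(cutoff):
    """Deny requests made with tokens issued before `cutoff` (timezone-aware)."""
    issued_before = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "RevokeOlderSessions",
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {"DateLessThan": {"aws:TokenIssueTime": issued_before}},
            }
        ],
    }


def as_policy_document(policy):
    """Serialize a policy dict to the JSON string the IAM API expects."""
    return json.dumps(policy)
```

The serialized document would be passed as PolicyDocument to put_user_policy (for users) or put_role_policy (for assumed roles), for example under the SecurityAutomationExplicitDeny policy name that the runbook removes after investigation.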

Root credential usage: Root credentials cannot be disabled programmatically. The function sends a CRITICAL alert with specific investigation instructions and escalates to PagerDuty via SNS message attributes.

block-ip

Trigger: GuardDuty findings with a remote IP (port probing, brute force, C2)

Safety check — private IP detection: Before blocking any IP, the function checks if it falls in RFC 1918 private ranges (10/8, 172.16/12, 192.168/16). A private IP in a GuardDuty finding usually means lateral movement — blocking it in the NACL would break internal connectivity. The function alerts with "private-ip-lateral-movement-possible" instead.
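
A minimal version of this check can be written with the standard library's ipaddress module. The function names and the shape of the returned plan are assumptions for this example. The three networks and the alert reason string come from the description above.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = tuple(
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
)


def is_rfc1918(ip):
    """Return True if `ip` is an IPv4 address inside an RFC 1918 range."""
    address = ipaddress.ip_address(ip)
    return address.version == 4 and any(address in net for net in RFC1918_NETWORKS)


def plan_block(ip):
    """Decide whether to block the IP or only alert on it."""
    if is_rfc1918(ip):
        return {"action": "alert", "reason": "private-ip-lateral-movement-possible"}
    prefix = 32 if ipaddress.ip_address(ip).version == 4 else 128
    return {"action": "block", "cidr": f"{ip}/{prefix}"}
```

Listing the three networks explicitly, rather than relying on the is_private property, keeps the check limited to exactly the RFC 1918 ranges named above. is_private also covers loopback, link-local, and other reserved ranges.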

Blocking mechanism:

  1. WAFv2 — adds IP to a managed IP set that blocks traffic at CloudFront/ALB before it reaches EC2. This is the most effective layer — traffic is dropped at the edge.
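  2. NACL: adds a DENY rule to the default VPC NACL. NACLs are stateless and operate at the subnet level. Uses rule numbers 200-299 (reserved for automated blocks) to avoid conflict with manual rules. Note that NACL rules are evaluated in ascending rule-number order and the first match wins, so these deny entries only take effect if no lower-numbered rule already allows the traffic (an unmodified default NACL allows everything at rule 100).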
  3. DynamoDB threat intel table — records the IP with 30-day TTL. Can be queried by other Lambda functions to check if an IP is known-bad before establishing connections.
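
The reserved rule-number range for NACL entries can be managed with a small allocator like the one below. Both helper names are hypothetical. The dictionary keys in build_nacl_deny_entry correspond to the parameters of the EC2 CreateNetworkAclEntry API.

```python
AUTOMATED_RULE_RANGE = range(200, 300)  # rule numbers 200-299


def next_free_rule_number(existing_rule_numbers):
    """Return the lowest unused rule number in 200-299, or None if full."""
    used = set(existing_rule_numbers)
    for number in AUTOMATED_RULE_RANGE:
        if number not in used:
            return number
    return None


def build_nacl_deny_entry(nacl_id, ip, rule_number):
    """Build CreateNetworkAclEntry arguments for an inbound IPv4 deny rule."""
    return {
        "NetworkAclId": nacl_id,
        "RuleNumber": rule_number,
        "Protocol": "-1",      # all protocols
        "RuleAction": "deny",
        "Egress": False,       # inbound rule
        "CidrBlock": f"{ip}/32",
    }
```

With only 100 slots available, a caller has to handle the None case, for instance by evicting the oldest automated entry or by relying on the WAF IP set alone.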

s3-remediation

Triggers: Macie findings (sensitive data), Security Hub S3 controls, Config violations

Remediation actions:

| Condition | Action |
|---|---|
| Public access enabled | put_public_access_block (BlockPublicAcls, IgnorePublicAcls, BlockPublicPolicy, RestrictPublicBuckets all = true) |
| No default encryption | put_bucket_encryption with SSE-KMS |
| No versioning | put_bucket_versioning (Status: Enabled) |
| No access logging | put_bucket_logging → central security logging bucket |
| Macie PII finding | put_bucket_policy with DenyNonHTTPS + DenyPublicAccess statements |

The restrictive bucket policy: Applied only when Macie finds PII/sensitive data. Forces HTTPS (prevents eavesdropping on bucket traffic) and restricts access to the account's own principals only (no cross-account reads of PII data).
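
One way such a policy could be expressed is sketched below. The statement Sids match the names in the table above, but the exact conditions are assumptions: aws:SecureTransport is the usual key for enforcing HTTPS, and aws:PrincipalAccount is one possible way to restrict access to the owning account.

```python
def restrictive_bucket_policy(bucket, account_id):
    """Bucket policy that denies plain HTTP and principals from other accounts."""
    resources = [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyNonHTTPS",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                # Matches any request that did not arrive over TLS.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                "Sid": "DenyPublicAccess",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                # Matches callers whose account is not the bucket owner's.
                "Condition": {"StringNotEquals": {"aws:PrincipalAccount": account_id}},
            },
        ],
    }
```

A deny keyed on aws:PrincipalAccount can also affect AWS services that write to the bucket on your behalf, so a real policy may need exceptions for those service principals.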

findings-aggregator

Trigger: EventBridge scheduled rule — every 6 hours

What it aggregates:

  • GuardDuty: all active findings in the last 24 hours, grouped by severity and type
  • Security Hub: all NEW/ACTIVE findings, grouped by severity and product
  • IAM Access Analyzer: all active external access findings
  • Inspector v2: CRITICAL CVEs across EC2/ECR/Lambda

Outputs:

  1. CloudWatch metrics: FindingsBySeverity/{CRITICAL,HIGH,MEDIUM,LOW} and FindingsBySource/{guardduty,security_hub,inspector,iam_access_analyzer}. These power CloudWatch dashboards and alarms.
  2. Slack daily digest — summary of finding counts with color-coded severity (red = critical, yellow = medium, green = clean). Sent via SNS.
  3. S3 JSON report — full aggregated report stored at s3://findings-bucket/findings-summaries/YYYY/MM/DD/HHMMSS.json. Lifecycle rule transitions to Glacier after 90 days, expires after 365 days.
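
As a small illustration of outputs 1 and 3, the sketch below derives the S3 object key from a timestamp and turns severity counts into CloudWatch metric data. The function names are hypothetical, and modeling severity as a dimension on a FindingsBySeverity metric is an assumption about how the metric names above map onto CloudWatch.

```python
from datetime import datetime, timezone


def report_key(now):
    """S3 key for a report: findings-summaries/YYYY/MM/DD/HHMMSS.json (UTC)."""
    return now.astimezone(timezone.utc).strftime(
        "findings-summaries/%Y/%m/%d/%H%M%S.json"
    )


def severity_metric_data(counts):
    """Build PutMetricData entries, one per severity level."""
    return [
        {
            "MetricName": "FindingsBySeverity",
            "Dimensions": [{"Name": "Severity", "Value": severity}],
            "Value": counts.get(severity, 0),
            "Unit": "Count",
        }
        for severity in ("CRITICAL", "HIGH", "MEDIUM", "LOW")
    ]
```

Emitting a zero for severities with no findings keeps the CloudWatch time series continuous, which makes alarms on missing data easier to reason about.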

EventBridge Routing

EventBridge is the routing layer between security services and Lambda functions. Each rule uses event pattern matching to route specific finding types to the correct Lambda.

Rule patterns:

// EC2 findings → isolate-ec2
{
  "source": ["aws.guardduty"],
  "detail-type": ["GuardDuty Finding"],
  "detail": {
    "type": [
      {"prefix": "UnauthorizedAccess:EC2/"},
      {"prefix": "CryptoCurrency:EC2/"},
      {"prefix": "Backdoor:EC2/"},
      {"prefix": "Trojan:EC2/"},
      {"prefix": "Recon:EC2/"}
    ]
  }
}

The prefix matching is deliberate: it catches new GuardDuty finding subtypes in these categories without requiring rule updates. If AWS adds a new subtype in one of these families (say, a hypothetical CryptoCurrency:EC2/MoneroTool), the existing rule routes it to the right Lambda automatically.
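
The effect of prefix matching can be reproduced locally with a few lines of Python. This is only an emulation of the rule above for reasoning and testing. EventBridge performs the real matching, and matches_ec2_rule is a hypothetical helper name.

```python
EC2_RULE_PREFIXES = (
    "UnauthorizedAccess:EC2/",
    "CryptoCurrency:EC2/",
    "Backdoor:EC2/",
    "Trojan:EC2/",
    "Recon:EC2/",
)


def matches_ec2_rule(event):
    """Emulate the EC2 rule: source, detail-type, and a type prefix must match."""
    if event.get("source") != "aws.guardduty":
        return False
    if event.get("detail-type") != "GuardDuty Finding":
        return False
    finding_type = event.get("detail", {}).get("type", "")
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return finding_type.startswith(EC2_RULE_PREFIXES)
```

A finding type that does not exist today but shares a listed prefix would match, while IAM findings would not, which is the behavior the rule relies on.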


Infrastructure

Everything is provisioned by Terraform. Key design decisions:

Single KMS key, multiple uses: One KMS key encrypts CloudTrail logs, DynamoDB tables, S3 buckets, SQS DLQ, and Lambda environment variables. The key policy explicitly allows CloudTrail to use it for log encryption. Annual rotation is enabled.

Lambda DLQ: Every Lambda has a Dead Letter Queue (SQS). EventBridge invokes the functions asynchronously, so if an invocation fails (network error, throttle, unhandled exception), Lambda retries it twice and then delivers the event to the DLQ. A CloudWatch alarm monitors the DLQ message count and alerts if any messages arrive, so failed remediations are not silently dropped.

Reserved concurrency = 10: Each Lambda has reserved_concurrent_executions = 10. This prevents a scenario where a flood of GuardDuty findings (e.g., a large-scale brute force attack) spawns thousands of simultaneous Lambda invocations that exhaust the account's concurrency limit and starve other Lambda functions.

S3 lifecycle policy: The findings reports bucket has a lifecycle rule: findings are accessible for 90 days (standard storage), transition to Glacier after 90 days (cost optimization), and expire after 365 days (compliance data retention).


Testing

The test suite uses pytest with moto for mocking AWS services. Coverage requirement is 80% (enforced in pytest.ini).

Test structure:

| Test File | Functions Under Test | Test Count |
|---|---|---|
| test_isolate_ec2.py | handler, isolate_instance, get_or_create_quarantine_sg, create_forensic_snapshots | 12 |
| test_revoke_iam_keys.py | handler, revoke_credentials, handle_root_credential_usage, handle_policy_change | 9 |
| test_s3_remediation.py | block-ip handler + helpers, s3-remediation handler + helpers | 15 |

Key test patterns:

All tests mock boto3 via conftest.py autouse fixture — no real AWS calls are ever made in unit tests. The mock_boto3_session fixture patches boto3.client and boto3.resource globally.

Happy path + failure handling — every function has tests for both success and graceful failure (API errors, missing resources, unexpected input).

Security invariants tested:

  • Private IPs are never blocked (would break internal connectivity)
  • Root credential usage never attempts to disable root (impossible)
  • Terminated instances are skipped (cannot isolate a terminated instance)
  • Explicit deny policy is a true Deny-All (not just a deny of specific actions)

Run tests:

cd aws-security-automation

# Install dependencies
pip install -r lambda/requirements.txt

# Run all tests with coverage
pytest

# Run specific test file
pytest lambda/tests/test_isolate_ec2.py -v

# Run with coverage report
pytest --cov=lambda --cov-report=html
open htmlcov/index.html

# Run only fast unit tests
pytest -m "not integration" -v

Compliance Mapping

| Control | NIST 800-53 | CIS AWS | PCI DSS | Implemented By |
|---|---|---|---|---|
| Threat detection | SI-3, SI-4 | 3.x | 11.5 | GuardDuty |
| Vulnerability management | RA-5, SI-2 | - | 6.3 | Inspector v2 |
| Sensitive data discovery | SC-28, MP-4 | - | 3.4 | Macie |
| Audit logging | AU-2, AU-12 | 2.1 | 10.2 | CloudTrail |
| Compliance monitoring | CA-7 | 1.x, 2.x | 2.2 | AWS Config |
| Incident response | IR-4, IR-5 | - | 12.9 | Lambda (all) |
| IAM access review | AC-2, AC-6 | 1.x | 7.1 | IAM Access Analyzer |
| Credential management | IA-5 | 1.14 | 8.x | revoke-iam-keys |
| Network protection | SC-7 | 4.x | 1.x | block-ip |
| Encryption at rest | SC-28 | 2.1.1 | 3.4 | s3-remediation |
| Automated remediation | SI-7, IR-4 | - | 12.10 | All Lambda functions |

Prerequisites

  • AWS account with admin permissions
  • Terraform 1.7+ (terraform version)
  • Python 3.12+ (python3 --version)
  • AWS CLI v2 configured (aws sts get-caller-identity)
  • S3 bucket for Terraform state
  • DynamoDB table for Terraform lock

Deployment Guide

# 1. Clone repository
git clone https://github.com/YOUR_USERNAME/aws-security-automation.git
cd aws-security-automation

# 2. Create Terraform state backend (one-time)
aws s3 mb s3://enterprise-security-tfstate --region us-east-1
aws s3api put-bucket-versioning \
  --bucket enterprise-security-tfstate \
  --versioning-configuration Status=Enabled

aws dynamodb create-table \
  --table-name enterprise-security-tflock \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST

# 3. Install Python dependencies and run tests
pip install -r lambda/requirements.txt
pytest
# All tests must pass before deploying

# 4. Initialize and deploy Terraform
cd terraform/environments/prod
terraform init
terraform plan -var="alert_email=security@company.com" -out=tfplan
# Review the plan carefully
terraform apply tfplan

# 5. Verify services are active
aws guardduty list-detectors
aws securityhub describe-hub
aws macie2 get-macie-session
aws accessanalyzer list-analyzers

# 6. Test the pipeline with a GuardDuty sample finding
aws guardduty create-sample-findings \
  --detector-id $(aws guardduty list-detectors --query 'DetectorIds[0]' --output text) \
  --finding-types "UnauthorizedAccess:EC2/SSHBruteForce"

# Watch Lambda logs
aws logs tail /aws/lambda/security-auto-prod-isolate-ec2 --follow

Runbook — Common Operations

Investigating a quarantined EC2 instance

# Find quarantined instances
aws ec2 describe-instances \
  --filters "Name=tag:security:quarantined,Values=true" \
  --query 'Reservations[].Instances[].{ID:InstanceId,State:State.Name,Reason:Tags[?Key==`security:quarantine-reason`].Value|[0]}'

# Get the finding that triggered isolation
INSTANCE_ID="i-0abc123"
aws ec2 describe-tags --filters "Name=resource-id,Values=${INSTANCE_ID}" \
  --query 'Tags[?Key==`security:quarantine-finding-id`].Value' --output text

# View DynamoDB audit record
aws dynamodb get-item \
  --table-name security-remediation-audit \
  --key '{"finding_id": {"S": "FINDING_ID_HERE"}, "timestamp": {"S": "TIMESTAMP_HERE"}}'

# Access forensic snapshot
aws ec2 describe-snapshots \
  --filters "Name=tag:security:source-instance,Values=${INSTANCE_ID}"

# Once investigation is complete — restore instance
# 1. Remove quarantine tag
aws ec2 delete-tags --resources ${INSTANCE_ID} \
  --tags Key=security:quarantined

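# 2. Restore original security groups (stored in tag)
aws ec2 describe-tags --filters "Name=resource-id,Values=${INSTANCE_ID}" \
  --query 'Tags[?Key==`security:original-security-groups`].Value' --output text
# Then re-attach the group IDs read from the tag (replace the placeholders):
# aws ec2 modify-instance-attribute --instance-id ${INSTANCE_ID} --groups sg-ORIGINAL_1 sg-ORIGINAL_2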

Restoring an IAM user after investigation

IAM_USER="compromised-user"

# View all inline policies (find the deny policy)
aws iam list-user-policies --user-name ${IAM_USER}

# Remove the security quarantine policy after investigation
aws iam delete-user-policy \
  --user-name ${IAM_USER} \
  --policy-name "SecurityAutomationExplicitDeny"

# Re-enable access key if cleared
aws iam update-access-key \
  --user-name ${IAM_USER} \
  --access-key-id AKIAIOSFODNN7EXAMPLE \
  --status Active

# Remove quarantine tags
aws iam untag-user \
  --user-name ${IAM_USER} \
  --tag-keys "security:quarantined" "security:quarantine-reason"

Checking the threat intelligence table

# Query all blocked IPs
aws dynamodb scan \
  --table-name security-threat-intel \
  --query 'Items[].{IP:ip_address.S, Type:finding_type.S, Severity:severity.S, Seen:first_seen.S}'

# Remove an IP that was false-positive
aws dynamodb delete-item \
  --table-name security-threat-intel \
  --key '{"ip_address": {"S": "1.2.3.4"}}'

# Also remove from WAF
IP_SET_ID=$(aws wafv2 list-ip-sets --scope REGIONAL \
  --query 'IPSets[?Name==`security-auto-prod-blocked-ips`].Id' --output text)

aws wafv2 get-ip-set --name security-auto-prod-blocked-ips \
  --scope REGIONAL --id ${IP_SET_ID}
# Then update-ip-set removing the IP

Manually triggering a remediation

# Trigger isolate-ec2 manually
# AWS CLI v2 treats --payload as base64 by default; this flag allows raw JSON
aws lambda invoke \
  --function-name security-auto-prod-isolate-ec2 \
  --cli-binary-format raw-in-base64-out \
  --payload '{"source":"aws.guardduty","detail-type":"GuardDuty Finding","detail":{"id":"manual-test-001","type":"UnauthorizedAccess:EC2/SSHBruteForce","severity":8.0,"accountId":"123456789012","region":"us-east-1","title":"Manual test","description":"Manual trigger","resource":{"instanceDetails":{"instanceId":"i-YOUR_INSTANCE_ID","networkInterfaces":[]}},"service":{"action":{}}}}' \
  output.json

cat output.json | jq

Reviewing the daily security digest

# Manually trigger the findings aggregator
aws lambda invoke \
  --function-name security-auto-prod-findings-aggregator \
  --cli-binary-format raw-in-base64-out \
  --payload '{}' \
  aggregator-output.json

cat aggregator-output.json | jq

# Check the S3 reports
TODAY=$(date +%Y/%m/%d)
aws s3 ls s3://security-auto-prod-findings-ACCOUNT_ID/findings-summaries/${TODAY}/
aws s3 cp s3://security-auto-prod-findings-ACCOUNT_ID/findings-summaries/${TODAY}/LATEST.json - | jq

Troubleshooting

Lambda is not triggering on GuardDuty findings:

# Verify EventBridge rule is enabled
aws events describe-rule --name security-auto-prod-guardduty-ec2

# Check Lambda permissions
aws lambda get-policy --function-name security-auto-prod-isolate-ec2

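# Test EventBridge rule with a sample event
# (test-event-pattern needs a full event envelope: id, account, source, time, region, resources, detail-type)
aws events test-event-pattern \
  --event-pattern '{"source":["aws.guardduty"],"detail":{"type":[{"prefix":"UnauthorizedAccess:EC2/"}]}}' \
  --event '{"id":"test-0001","account":"123456789012","source":"aws.guardduty","time":"2024-01-01T00:00:00Z","region":"us-east-1","resources":[],"detail-type":"GuardDuty Finding","detail":{"type":"UnauthorizedAccess:EC2/SSHBruteForce"}}'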

Lambda is failing — check DLQ:

# Get DLQ URL
DLQ_URL=$(aws sqs get-queue-url --queue-name security-auto-prod-lambda-dlq --query QueueUrl --output text)

# Receive failed messages
aws sqs receive-message --queue-url ${DLQ_URL} --max-number-of-messages 10

Lambda cannot modify EC2 instance:

  • Verify the Lambda execution role has ec2:ModifyInstanceAttribute permission
  • Check whether a service control policy (SCP) or permissions boundary denies the modification (EC2 instances do not have resource-based policies)
  • Verify the Lambda is in a VPC/region that can reach the EC2 API endpoint

GuardDuty not generating findings in test:

# Generate sample findings for all types
aws guardduty create-sample-findings \
  --detector-id DETECTOR_ID \
  --finding-types \
    "UnauthorizedAccess:EC2/SSHBruteForce" \
    "CryptoCurrency:EC2/BitcoinTool.B!DNS" \
    "UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration.OutsideAWS"

Security Considerations

Principle of least privilege — Lambda IAM role: The Lambda execution role grants only the specific actions needed for each remediation type. For example, it allows ec2:ModifyInstanceAttribute (to change security groups) but not ec2:TerminateInstances (intentional — terminating an instance destroys forensic evidence). Review the IAM policy in main.tf before deploying.

Reserved concurrency as a safety control: reserved_concurrent_executions = 10 prevents automated runaway. If 1000 GuardDuty findings fire simultaneously (e.g., during a large-scale attack), only 10 Lambda invocations run concurrently. The rest queue. This prevents both AWS account concurrency exhaustion and inadvertent mass remediation.

Private IP safety check: The block-ip function refuses to block RFC 1918 addresses. A private source IP in a GuardDuty finding often indicates lateral movement (an attacker who already has a foothold in your network) — blocking it in the NACL would block legitimate internal traffic and potentially prevent your own incident response team from accessing affected systems.

Root credential handling: Root credentials cannot be disabled via API. The revoke-iam-keys function never attempts to call iam:UpdateAccessKey for root — doing so would throw an exception that could mask the critical alert. Instead, it sends a maximum-urgency SNS notification and explicitly lists manual investigation steps.

Audit trail integrity: Every DynamoDB audit record includes lambda_function (function name), timestamp, finding_id, actions_taken, and success. The table has point-in-time recovery enabled. Records have a 90-day TTL — long enough for compliance audits but not forever (cost control).
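
A sketch of how such a record could be assembled is shown below. The field names come from the list above and the key attributes (finding_id, timestamp) from the runbook's get-item command. The expires_at attribute name and the build_audit_item helper are assumptions for this example. DynamoDB TTL expects the expiry as a Unix epoch time in seconds.

```python
import time
from datetime import datetime, timezone

AUDIT_RETENTION_DAYS = 90
SECONDS_PER_DAY = 86400


def build_audit_item(function_name, finding_id, actions_taken, success, now_epoch=None):
    """Build one audit record with a 90-day TTL attribute."""
    now = int(time.time()) if now_epoch is None else int(now_epoch)
    timestamp = datetime.fromtimestamp(now, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "finding_id": finding_id,          # partition key
        "timestamp": timestamp,            # sort key (ISO 8601, UTC)
        "lambda_function": function_name,
        "actions_taken": list(actions_taken),
        "success": bool(success),
        # Hypothetical TTL attribute name; DynamoDB deletes the item after this time.
        "expires_at": now + AUDIT_RETENTION_DAYS * SECONDS_PER_DAY,
    }
```

The resulting dictionary uses native Python types, so it could be passed as Item to the put_item method of a boto3 DynamoDB Table resource.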


License

MIT
