AWS DVA-C02 Developer Associate Exam Guide
> A complete study guide covering every DVA-C02 domain: compute, storage, databases, serverless, containers, CI/CD, monitoring, security, and more — with 100+ self-exam questions.
Table of Contents
| Section | Topics |
|---------|--------|
| AWS Global Infrastructure | Regions, AZs, Edge Locations |
| IAM | Users, Groups, Policies, Roles |
| EC2 | Instance Types, Purchasing, Security Groups |
| AMI | Custom AMIs, Cross-region |
| EBS | Volume Types, Snapshots, Multi-Attach |
| EFS | Performance Modes, Storage Tiers |
| Storage Comparison | EBS vs EFS vs Instance Store |
| ELB & ASG | ALB, NLB, GLB, Health Checks, Scaling |
| RDS & Aurora | Read Replicas, Multi-AZ, Aurora, Proxy |
| ElastiCache | Redis vs Memcached, Caching Strategies |
| S3 | Storage Classes, Security, Replication, Performance |
| Lambda | Invocations, Concurrency, Layers, VPC |
| API Gateway | Endpoints, Integrations, Security, Caching |
| DynamoDB | Capacity, Indexes, Streams, DAX |
| SQS | Standard/FIFO, Visibility, DLQ |
| SNS | Fan-out, Filtering, FIFO |
| Kinesis | Streams, Firehose, Analytics |
| Step Functions | State Machines, Workflows |
| Containers | ECS, Fargate, ECR |
| CloudFormation | Templates, Functions, Stacks |
| SAM | Serverless Templates, CLI |
| CI/CD | CodeCommit, CodeBuild, CodeDeploy, CodePipeline |
| CloudWatch | Metrics, Logs, Alarms |
| X-Ray | Distributed Tracing, Sampling |
| Cognito | User Pools, Identity Pools |
| KMS | Encryption, Key Types, Envelope Encryption |
| Secrets & Parameters | Secrets Manager, SSM Parameter Store |
| EventBridge | Event Bus, Rules, Targets |
| Elastic Beanstalk | Deployment Policies, Extensions |
| Self-Exam Questions | 100+ questions across all DVA-C02 topics |
AWS Global Infrastructure
Regions
A Region is a geographic area with multiple data centers (e.g., us-east-1, eu-west-2).
| Consideration | Description |
|---------------|-------------|
| Compliance | Data may need to stay in specific countries |
| Latency | Deploy closer to users for better performance |
| Service availability | Not all services available in all regions |
| Pricing | Varies by region |
Availability Zones (AZs)
Each Region has at least 3 AZs, typically 3-6 (e.g., us-east-1a, us-east-1b). Each AZ is one or more discrete data centers with independent power, networking, and connectivity.
| Key Point | Detail |
|-----------|--------|
| Isolation | AZs are physically separated (disaster protection) |
| Low latency | Connected via high-bandwidth, low-latency networking |
| HA design | Distribute resources across AZs for fault tolerance |
Edge Locations & Global Services
| Concept | Description |
|---------|-------------|
| Edge Locations | CDN endpoints for CloudFront (200+ worldwide) |
| Global services | IAM, Route 53, CloudFront, WAF (not region-specific) |
| Regional services | EC2, RDS, EBS, etc. (bound to a region) |
> 💡 Exam tip: Know which services are global vs regional. IAM is global; EC2 is regional; an EBS volume is bound to a single AZ within its region.
IAM (Identity & Access Management)
> Global service — not region-specific
Core Concepts
| Concept | Description |
|---------|-------------|
| Users | Individual identities, can belong to multiple groups |
| Groups | Collections of users (cannot nest groups) |
| Policies | JSON documents defining permissions |
| Inline Policy | Policy embedded directly in one user, group, or role (not reusable) |
Policy Structure
{
"Version": "2012-10-17", // Policy language version
"Statement": [{
"Sid": "StatementId", // Optional identifier
"Effect": "Allow|Deny",
"Principal": "arn:...", // Account/user/role this applies to
"Action": ["s3:Get*"], // API actions
"Resource": ["arn:..."] // Target resources
}]
}
Roles
Security Tools
| Tool | Purpose |
|------|---------|
| Credentials Report | CSV of all users + credential status |
| Access Advisor | Shows service access history per user |
EC2 (Elastic Compute Cloud)
EC2 encompasses: Instances • EBS (drives) • ELB (load balancing) • ASG (auto-scaling)
Configuration Options
User Data — Bootstrap script that runs at launch with root privileges. Use for updates, software installation, config.
Security Groups
Acts as a firewall for EC2 instances.
| Rule Type | Default | Description |
|-----------|---------|-------------|
| Inbound | Blocked | Controls incoming traffic |
| Outbound | Allowed | Controls outgoing traffic |
> ⚠️ Timeout = Security Group issue — If you can't connect (SSH/HTTP/HTTPS), check SG first
Key points:
Common Ports
| Port | Protocol |
|------|----------|
| 22 | SSH / SFTP |
| 21 | FTP |
| 80 | HTTP |
| 443 | HTTPS |
| 3389 | RDP (Windows) |
Instance Types
Naming: m5.2xlarge → m (class) + 5 (generation) + 2xlarge (size)
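The naming rule can be checked with a few lines of Python. This is an illustrative parser, not an AWS API: the function name is hypothetical, and the optional `attributes` group (letters between the generation and the dot, as in `c6gn.large`) goes beyond the three parts described above.

```python
import re

# class letters, generation digits, optional attribute letters, then ".size"
_INSTANCE_TYPE = re.compile(r"^([a-z]+)(\d+)([a-z-]*)\.([a-z0-9]+)$")


def parse_instance_type(name):
    """Split an instance type such as 'm5.2xlarge' into its named parts."""
    match = _INSTANCE_TYPE.match(name)
    if match is None:
        raise ValueError(f"unrecognized instance type: {name!r}")
    family, generation, attributes, size = match.groups()
    return {
        "class": family,
        "generation": int(generation),
        "attributes": attributes,
        "size": size,
    }


print(parse_instance_type("m5.2xlarge"))
print(parse_instance_type("t3.micro"))
```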
| Type | Prefix | Use Case |
|------|--------|----------|
| General Purpose | t3, m5 | Balanced workloads |
| Compute Optimized | c5 | Batch processing, high-performance computing |
| Memory Optimized | r5 | In-memory databases, caching |
| Storage Optimized | i3 | High IOPS, data warehousing |
| GPU | p3 | ML/AI, graphics |
| FPGA | f1 | Custom hardware acceleration |
Purchasing Options
| Option | Discount | Commitment | Best For |
|--------|----------|------------|----------|
| On-Demand | None | None | Short, unpredictable workloads |
| Reserved | Up to 72% | 1-3 years | Steady-state workloads |
| Savings Plan | Up to 72% | $/hour for 1-3 years | Flexible long workloads |
| Spot | Up to 90% | None (can be interrupted) | Batch jobs, fault-tolerant |
| Dedicated Host | Varies | Optional 1-3 year | Licensing, compliance |
| Dedicated Instance | Varies | None | Compliance, isolation |
| Capacity Reservation | None | Pay regardless of use | Guaranteed availability |
On-Demand: Linux/Windows billed per second (after 1st min), other OS per hour
Reserved Instances: Reserve specific attributes (type, region, OS). Paying more upfront earns a larger discount. Unused Standard RIs can be sold on the Reserved Instance Marketplace.
Savings Plan: Commit to $/hour spend, locked to instance family + region (e.g., m5 in us-east-1). Excess usage = On-Demand pricing.
Spot: Cheapest option. AWS can reclaim the instance with a 2-minute warning when the Spot price rises above your maximum price or it needs the capacity back. Never use for critical workloads.
Dedicated Host vs Instance:
AMI (Amazon Machine Image)
> Region-specific — must copy to use in another region
AMI = Pre-configured EC2 template (OS + software + config)
| AMI Type | Description |
|----------|-------------|
| Public | AWS-provided (Amazon Linux, Ubuntu, etc.) |
| AWS Marketplace | Third-party, often pre-configured software |
| Custom | Your own, built from an EC2 instance |
Creating a Custom AMI:
1. Launch EC2 → configure/install software
2. Stop instance (for data integrity)
3. Create AMI → creates EBS snapshots automatically
4. Launch new instances from your AMI
> 💡 AMIs speed up boot time since software is pre-baked, not installed via User Data
EBS (Elastic Block Store)
Network-attached storage for EC2 — like a USB stick over the network.
Key characteristics:
EBS Snapshots
Backup mechanism for EBS volumes — can restore to any AZ.
| Feature | Description |
|---------|-------------|
| Cross-AZ restore | Snapshot in us-east-1a → restore in us-east-1b |
| Recycle Bin | Deleted snapshots recoverable (configurable retention) |
| Fast Snapshot Restore | No latency on first use, but expensive |
EC2 Instance Store
Hardware-attached storage (physically on the host) — not network-based.
| Pros | Cons |
|------|------|
| Extremely high IOPS (millions) | Ephemeral — data lost on stop/terminate/hardware failure |
| Low latency (direct attached) | Cannot detach and reattach |
| Included in instance cost | Size tied to instance type |
Use cases: Buffer, cache, scratch data, temporary content
> ⚠️ You are responsible for backups/replication — AWS won't recover this data
Delete on Termination
| Volume | Default Behavior |
|--------|------------------|
| Root EBS | Deleted on termination |
| Additional EBS | Preserved on termination |
Can be changed via console or CLI at launch time.
EBS Volume Types
| Type | Category | IOPS | Throughput | Size | Boot? |
|------|----------|------|------------|------|-------|
| gp3 | General SSD | 3,000–16,000 | 125–1,000 MiB/s | 1 GiB–16 TiB | ✅ |
| gp2 | General SSD | 3 IOPS/GiB (max 16,000) | Linked to IOPS | 1 GiB–16 TiB | ✅ |
| io2 Block Express | Provisioned IOPS | Up to 256,000 | 4,000 MiB/s | 4 GiB–64 TiB | ✅ |
| io1 | Provisioned IOPS | Up to 64,000 | 1,000 MiB/s | 4 GiB–16 TiB | ✅ |
| st1 | Throughput HDD | Max 500 | 500 MiB/s | 125 GiB–16 TiB | ❌ |
| sc1 | Cold HDD | Max 250 | 250 MiB/s | 125 GiB–16 TiB | ❌ |
> 💡 Only SSD types (gp2/gp3/io1/io2) can be boot volumes
gp3 vs gp2: gp3 allows independent IOPS/throughput scaling; gp2 links IOPS to size
Provisioned IOPS (io1/io2): For sustained IOPS needs — databases, critical apps. io2 Block Express offers sub-millisecond latency.
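The gp2 rule of 3 IOPS per GiB is easy to check with arithmetic. The helper below is a sketch with a hypothetical function name. It also applies a 100 IOPS floor for small volumes, which is not stated in the table above and is included here as an assumption about gp2 behavior. The gp3 constant is the baseline from the table.

```python
GP3_BASELINE_IOPS = 3_000  # gp3 baseline from the table, independent of size


def gp2_baseline_iops(size_gib):
    """Baseline IOPS for a gp2 volume: 3 per GiB, capped at 16,000."""
    if not 1 <= size_gib <= 16_384:  # 1 GiB to 16 TiB
        raise ValueError("gp2 volumes range from 1 GiB to 16 TiB")
    # Assumed 100 IOPS floor for small volumes; 16,000 cap from the table.
    return min(max(3 * size_gib, 100), 16_000)


for size in (20, 100, 1_000, 6_000):
    print(size, "GiB ->", gp2_baseline_iops(size), "IOPS")
```

Under these assumptions a gp2 volume needs 1,000 GiB to match the 3,000 IOPS that gp3 provides at any size, which is the practical meaning of "gp2 links IOPS to size".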
EBS Multi-Attach
EFS (Elastic File System)
Managed NFS that can be mounted on multiple EC2 instances across multiple AZs.
| Feature | Value |
|---------|-------|
| Compatibility | Linux only (POSIX) |
| Scaling | Automatic, up to petabytes |
| Throughput | Up to 10+ GB/s |
| Pricing | Pay per GB used |
Performance Modes
| Mode | Use Case |
|------|----------|
| General Purpose | Latency-sensitive (web servers, CMS) |
| Max I/O | Higher latency, highly parallel (big data) |
Throughput Modes
| Mode | Description |
|------|-------------|
| Bursting | Scales with storage size |
| Provisioned | Fixed throughput regardless of size |
| Elastic | Auto-scales based on workload (recommended) |
Storage Tiers
| Tier | Cost | Access |
|------|------|--------|
| Standard | Higher | Frequent |
| Infrequent Access (IA) | Lower storage, pay per retrieval | Occasional |
| Archive | ~50% cheaper | Rare |
> 💡 Use lifecycle policies to auto-move files between tiers
Availability
| Option | Description |
|--------|-------------|
| Standard (Multi-AZ) | Production, HA |
| One Zone | Dev/backup, cheaper, single AZ |
EBS vs EFS vs Instance Store
| Feature | EBS | EFS | Instance Store |
|---------|-----|-----|----------------|
| Attach to | 1 instance (io1/io2: multi) | 100s of instances | 1 instance |
| AZ scope | Single AZ | Multi-AZ | Single AZ |
| Persistence | Persists | Persists | Ephemeral |
| Use case | Boot volumes, databases | Shared content, web serving | Cache, temp data |
| Cost | Per provisioned GB | Per used GB | Included |
ELB & ASG (Load Balancing & Auto Scaling)
> Terminology: ELB (Elastic Load Balancing) is the service name, not a load balancer type. The actual LB types are ALB, NLB, GLB, and CLB.
OSI Model Quick Reference
| Layer | Name | Protocol/Example | AWS LB |
|-------|------|------------------|--------|
| 7 | Application | HTTP, HTTPS, WebSocket | ALB |
| 4 | Transport | TCP, UDP, TLS | NLB |
| 3 | Network | IP, ICMP | GLB |
| 2 | Data Link | Ethernet, MAC | — |
| 1 | Physical | Cables, signals | — |
Load Balancer Types
| Type | Layer | Protocols | Use Case |
|------|-------|-----------|----------|
| ALB | 7 | HTTP, HTTPS, WebSocket | Web apps, microservices |
| NLB | 4 | TCP, UDP, TLS | Extreme performance, static IP |
| GLB | 3 | IP (GENEVE) | Firewalls, packet inspection |
| CLB | 4/7 | HTTP, HTTPS, TCP, SSL | Legacy (avoid) |
> ⚠️ CLB = Classic Load Balancer, sometimes called "Classic ELB" — adds to the ELB naming confusion. Avoid for new projects.
ELB Health Checks
LB periodically pings targets to verify they're healthy.
| Setting | Description |
|---------|-------------|
| Protocol | HTTP, HTTPS, TCP |
| Path | e.g., /health (HTTP/HTTPS only) |
| Interval | Time between checks (default: 30s) |
| Threshold | Consecutive successes/failures to change state |
| Timeout | Time to wait for response |
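The Threshold setting is the least intuitive one: a target changes state only after enough consecutive results contradict its current state. The simulation below illustrates that idea in plain Python. The function name and the default thresholds (3 successes, 2 failures) are assumptions chosen for the example, not AWS defaults.

```python
def track_health(results, healthy_threshold=3, unhealthy_threshold=2,
                 initially_healthy=True):
    """Return the target's state after each check (True = healthy)."""
    healthy = initially_healthy
    streak = 0  # consecutive results that contradict the current state
    states = []
    for passed in results:
        if passed == healthy:
            streak = 0  # result agrees with current state, streak is broken
        else:
            streak += 1
            needed = healthy_threshold if passed else unhealthy_threshold
            if streak >= needed:
                healthy = passed
                streak = 0
        states.append(healthy)
    return states


checks = [True, False, False, True, True, True]
print(track_health(checks))
```

With these inputs the target turns unhealthy after the second consecutive failure and needs three consecutive passes to return to service. A single failed check between passes never changes the state.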
> ⚠️ ELB does NOT terminate unhealthy targets — it only stops routing traffic to them
ASG + ELB Health Checks
ASG can use ELB health status to decide when to terminate/replace instances.
| Health Check Type | Default | Termination Trigger |
|-------------------|---------|---------------------|
| EC2 | ✅ Yes | Instance stopped, impaired, or terminated |
| ELB | ❌ No | Target fails LB health check |
> 💡 Enable ELB health checks on ASG for automatic replacement of app-level failures
Application Load Balancer (ALB)
Layer 7 (HTTP) — routes to target groups:
| Target Type | Example |
|-------------|--------|
| EC2 instances | i-0123... |
| Lambda functions | my-function |
| Private IPs | On-prem servers |
Routing rules based on:
- URL path (/api/, /images/)
- Hostname (api.example.com)
- Query string (?platform=mobile)
Key points:
- Client IP is passed to targets in the X-Forwarded-For header
Network Load Balancer (NLB)
Layer 4 (TCP/UDP) — highest performance LB.
| Feature | Value |
|---------|-------|
| Performance | Millions of requests/sec |
| Latency | ~100ms (vs ~400ms ALB) |
| Static IP | One per AZ |
Target groups: EC2 instances, Private IPs, ALB (NLB → ALB combo)
NLB provides:
> 💡 When to use NLB: Gaming servers, IoT backends, financial trading platforms — anywhere you need ultra-low latency, millions of requests/sec, or must whitelist a static IP for clients/firewalls.
ALB vs NLB Routing
| Routing By | ALB | NLB |
|------------|-----|-----|
| URL path | ✅ | ❌ |
| Hostname | ✅ | ❌ |
| Query strings | ✅ | ❌ |
| HTTP headers | ✅ | ❌ |
| Port | ✅ | ✅ |
> NLB = Layer 4 (sees packets, not HTTP). ALB = Layer 7 (sees HTTP content). Content-based routing → ALB. Static IP + performance → NLB.
Gateway Load Balancer (GLB)
Layer 3 (IP) — for network appliances (firewalls, IDS, packet inspection).
Flow: Traffic → GLB → Security appliances → GLB → Your app
| Feature | Detail |
|---------|--------|
| Protocol | GENEVE (port 6081) |
| Use case | Third-party virtual appliances |
| Layer | 3 (Network) |
> GENEVE encapsulates packets in UDP for cross-host VM/container communication
Sticky Sessions (Session Affinity)
Same client always routed to same target instance.
| Cookie Type | Who Creates | Cookie Name |
|-------------|-------------|-------------|
| Duration-based | ALB | AWSALB (reserved) |
| Application-based (LB) | ALB | AWSALBAPP (reserved) |
| Application-based (App) | Your app | Custom (e.g., SESSIONID) |
> ⚠️ AWSALB* names are AWS-reserved — cannot be used by your app
> 💡 Use for stateful apps; avoid if possible (prefer stateless + external session store)
Cross-Zone Load Balancing
Distributes traffic evenly across all targets in all AZs, regardless of AZ distribution.
| LB Type | Default | Cost |
|---------|---------|------|
| ALB | Enabled | Free |
| NLB | Disabled | Charged |
| GLB | Disabled | Charged |
> Without cross-zone: If AZ-1 has 2 instances and AZ-2 has 8, each AZ gets 50% of traffic (unfair distribution)
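The 2-versus-8 example can be worked through numerically. The sketch below assumes traffic is split evenly across AZs first and then evenly across the instances in each AZ. The function name and AZ labels are illustrative.

```python
def traffic_share_per_instance(instances_per_az, cross_zone):
    """Return {az: percent of total traffic each instance in that AZ gets}."""
    if any(count <= 0 for count in instances_per_az.values()):
        raise ValueError("every AZ needs at least one instance")
    total_instances = sum(instances_per_az.values())
    shares = {}
    for az, count in instances_per_az.items():
        if cross_zone:
            # All instances form one pool, regardless of AZ.
            shares[az] = 100 / total_instances
        else:
            # Each AZ gets an equal slice, split among its own instances.
            shares[az] = 100 / len(instances_per_az) / count
    return shares


fleet = {"az-1": 2, "az-2": 8}
print(traffic_share_per_instance(fleet, cross_zone=False))
print(traffic_share_per_instance(fleet, cross_zone=True))
```

Without cross-zone balancing each AZ-1 instance carries 25% of the traffic while each AZ-2 instance carries 6.25%. With it enabled, every instance carries 10%.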
SSL/TLS & SNI
SSL Termination: LB decrypts HTTPS traffic, forwards HTTP to targets (offloads CPU from instances).
| Concept | Description |
|---------|-------------|
| SSL Certificate | Loaded on LB via ACM (AWS Certificate Manager) |
| SNI (Server Name Indication) | Allows multiple SSL certs on one LB — client indicates hostname, LB selects correct cert |
SNI Support:
> 💡 Use ACM for free, auto-renewing public certificates
Connection Draining / Deregistration Delay
Time allowed for in-flight requests to complete when a target is deregistering or unhealthy.
| Setting | Default | Range |
|---------|---------|-------|
| Deregistration Delay | 300s (5 min) | 0–3600s |
> 💡 Set to 0 for short-lived requests; increase for long uploads/connections
Auto Scaling Group (ASG)
Automatically adjusts EC2 capacity to match demand. ASG is free — you pay only for instances.
Capacity Settings
| Setting | Description |
|---------|-------------|
| Minimum | Never go below this |
| Desired | Target number of instances |
| Maximum | Never exceed this |
Launch Template
Defines what to launch:
| Setting | Example |
|---------|---------|
| AMI | ami-0123456789 |
| Instance Type | t3.micro |
| IAM Role | MyEC2Role |
| Security Groups | sg-web |
| User Data | Bootstrap script |
| Key Pair | my-key |
| EBS Volumes | gp3, 20 GiB |
> 💡 CloudWatch alarms can trigger ASG scale-out/in based on metrics (CPU, network, or custom metrics such as memory, which requires the CloudWatch agent)
ASG Scaling Policies
| Policy | Description | Example |
|--------|-------------|---------|
| Target Tracking | Maintain a target metric value | Keep avg CPU at 40% |
| Step Scaling | Scale based on threshold ranges | CPU > 70% → +2, CPU < 30% → -1 |
| Scheduled | Scale at specific times | Add 3 instances every Friday 5PM |
| Predictive | ML-based forecasting | Pre-scale for predicted daily peaks |
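The step scaling example from the table, combined with the minimum and maximum capacity settings, can be sketched as a small function. The thresholds come from the table row. The function name and the sample capacities are hypothetical, and real step scaling policies are driven by CloudWatch alarms rather than a direct function call.

```python
def step_scale(desired, cpu_percent, minimum, maximum):
    """Apply the example step policy, then clamp to the ASG bounds."""
    if cpu_percent > 70:
        change = 2    # scale out by two instances
    elif cpu_percent < 30:
        change = -1   # scale in by one instance
    else:
        change = 0    # inside the comfortable band, do nothing
    return max(minimum, min(maximum, desired + change))


print(step_scale(desired=4, cpu_percent=85, minimum=2, maximum=5))
print(step_scale(desired=2, cpu_percent=10, minimum=2, maximum=10))
print(step_scale(desired=4, cpu_percent=50, minimum=2, maximum=10))
```

The first call wants 6 instances but is capped at the maximum of 5. The second wants 1 but is held at the minimum of 2.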
Scaling Metrics
| Metric | Best For |
|--------|----------|
| CPUUtilization | Compute-bound apps |
| RequestCountPerTarget | Web servers behind ALB |
| NetworkIn/Out | Network-bound apps |
| Custom (CloudWatch) | App-specific (queue depth, etc.) |
ASG Cooldown
Prevents rapid successive scaling actions. Default: 300 seconds.
> 💡 Use shorter cooldown with faster-booting AMIs; longer for slow startup apps
ASG Instance Refresh
Rolling update when you change Launch Template — replaces instances gradually.
| Setting | Description |
|---------|-------------|
| Min Healthy % | % of instances that must stay running (e.g., 90%) |
| Warm-up | Time before new instance counts as healthy |
> 💡 Enables zero-downtime deployments for Launch Template changes
RDS (Relational Database Service)
Managed relational database — AWS handles patching, backups, scaling, HA, monitoring.
| Feature | Included |
|---------|----------|
| OS/DB patching | ✅ |
| Automated backups | ✅ |
| Multi-AZ failover | ✅ |
| Read replicas | ✅ (up to 15) |
| Encryption (at-rest & in-flight) | ✅ |
| Performance Insights | ✅ |
Supported engines: MySQL, PostgreSQL, MariaDB, Oracle, Microsoft SQL Server, and Aurora.
> ⚠️ No SSH access to the underlying instance
Storage Auto Scaling
RDS automatically increases storage when free space runs low. Set a Maximum Storage Threshold (MaxAllocatedStorage in the API) to cap growth.
Read Replicas vs Multi-AZ
| Feature | Read Replicas | Multi-AZ |
|---------|---------------|----------|
| Purpose | Read scaling | Disaster recovery |
| Replication | ASYNC (eventually consistent) | SYNC (immediate) |
| Readable? | ✅ Yes | ❌ Standby only |
| Cross-region? | ✅ Yes (with cost) | ❌ Same region |
| Failover | Manual (promote to standalone) | Automatic |
| Max count | 15 | 1 standby |
Cost: Same-region RR replication = free. Cross-region = network charges.
Multi-AZ setup: Enable in console → snapshot taken → restored to standby AZ → sync begins. Zero downtime.
> 💡 Read replicas can also be Multi-AZ (common exam question)
Amazon Aurora
AWS-built relational DB, compatible with MySQL and PostgreSQL.
| Feature | Value |
|---------|-------|
| Performance | 5x MySQL, 3x PostgreSQL |
| Storage | Auto-scales 10 GB → 128 TiB |
| Replicas | Up to 15 (faster replication than RDS) |
| Failover | < 30 seconds |
| Copies | 6 copies across 3 AZs |
| Cost | ~20% more than RDS |
Self-healing: Corrupted data blocks repaired via peer-to-peer replication.
Aurora Endpoints
| Endpoint | Purpose |
|----------|--------|
| Writer Endpoint | Always points to current master (for writes) |
| Reader Endpoint | Load-balanced across all read replicas |
| Custom Endpoint | Route to specific subset of instances |
> 💡 Use Writer for writes, Reader for reads — endpoints auto-update on failover
RDS & Aurora Security
| Layer | Implementation |
|-------|----------------|
| At-rest encryption | KMS key at launch (encrypts master + replicas + snapshots) |
| In-flight encryption | TLS by default (use AWS TLS root certs) |
| Authentication | Username/password OR IAM DB authentication |
| Network | Security groups control access |
> To encrypt an unencrypted DB: snapshot → copy with encryption → restore
RDS Proxy
Serverless connection pooler in front of RDS/Aurora.
| Benefit | Description |
|---------|-------------|
| Connection pooling | Reduces DB load from many connections |
| Failover | Reduces failover time by up to 66% |
| IAM auth | Enforce IAM authentication |
| VPC only | Never publicly accessible |
> 💡 Great for Lambda → RDS (Lambda opens many short-lived connections)
AWS ElastiCache
Managed in-memory caching — Redis or Memcached.
Redis vs Memcached
| Feature | Redis | Memcached |
|---------|-------|----------|
| Multi-AZ | ✅ | ❌ |
| Auto Failover | ✅ | ❌ |
| Replication | ✅ | ❌ |
| Persistence | ✅ | ❌ |
| Backup & Restore | ✅ | Serverless only |
| Data structures | Complex (lists, sets, sorted sets) | Simple key-value |
| Sharding | Cluster mode | Multi-node |
> 💡 Exam tip: Use Redis for HA, persistence, complex data. Use Memcached for simple caching, multi-threaded, horizontal scaling.
Caching Considerations
| Question | Consider |
|----------|----------|
| Safe to cache? | What if stale data causes security/business issues? |
| Effective? | Best for slow-changing, frequently-read data |
| Structure fit? | Key-value lookups work best; complex joins may not |
| TTL strategy? | How long before data expires? |
Caching Design Patterns
Lazy Loading (Cache-Aside)
App → Cache (miss?) → DB → Cache → App
| Pros | Cons |
|------|------|
| Only requested data cached | Cache miss = 3 network calls |
| Node failure not fatal | Stale data possible |
| Simple to implement | Must handle cache invalidation |
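The lazy loading flow can be sketched without any AWS dependency by using a dict as the database and another dict as the cache. The class name, the TTL default, and the injectable `clock` parameter are illustrative choices. A real application would talk to Redis or Memcached through a client library.

```python
import time


class LazyLoadingCache:
    """Cache-aside: read the cache first, fall back to the DB on a miss."""

    def __init__(self, database, ttl_seconds=60.0, clock=time.monotonic):
        self._db = database
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)
        self.db_reads = 0   # counts how often we had to hit the database

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]  # cache hit
        # Cache miss or expired entry: read the DB, then populate the cache.
        self.db_reads += 1
        value = self._db[key]
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        self._entries.pop(key, None)


db = {"user:1": "Ada"}
cache = LazyLoadingCache(db)
cache.get("user:1")
cache.get("user:1")
print(cache.db_reads)        # second read was served from the cache
db["user:1"] = "Grace"
print(cache.get("user:1"))   # still the cached value until invalidated
cache.invalidate("user:1")
print(cache.get("user:1"))
```

The stale read after the database update is the "stale data possible" drawback from the table. Invalidation or a short TTL bounds how long it lasts.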
Write-Through
App → DB + Cache (write both)
| Pros | Cons |
|------|------|
| Cache always current | Write penalty (2 writes) |
| No stale data | Cache churn (data may never be read) |
| | Missing data until first write |
> 💡 Combine Write-Through + Lazy Loading for best results
TTL (Time-To-Live)
Set expiration on cached items. Balance between:
Write-Behind (Write-Back)
App → Cache → (async) → DB
| Pros | Cons |
|------|------|
| Fast writes (async to DB) | Data loss risk if cache fails |
| Reduces DB load | Complex to implement |
| Good for write-heavy workloads | Eventually consistent |
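A minimal write-behind sketch makes the trade-off visible: writes land in the cache at once and reach the database only when a flush runs. Here the flush is an explicit method call so the example stays deterministic. In a real system it would run asynchronously in the background. The class and method names are hypothetical.

```python
from collections import deque


class WriteBehindCache:
    """Accept writes in memory and persist them to the DB later."""

    def __init__(self, database):
        self._db = database
        self._cache = {}
        self._pending = deque()  # writes accepted but not yet persisted

    def put(self, key, value):
        self._cache[key] = value
        self._pending.append((key, value))  # the DB is not touched here

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        return self._db.get(key)

    def flush(self):
        """Persist queued writes in order; return how many were written."""
        flushed = 0
        while self._pending:
            key, value = self._pending.popleft()
            self._db[key] = value
            flushed += 1
        return flushed


db = {}
cache = WriteBehindCache(db)
cache.put("score", 10)
print(db)                  # still empty: the write has not reached the DB
print(cache.get("score"))
print(cache.flush(), db)
```

Anything still sitting in the pending queue is lost if the cache node fails before the flush, which is the data loss risk listed in the table.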
Read-Through
App → Cache (auto-fetches from DB on miss)
Cache sits between app and DB. On miss, cache itself fetches from DB and stores. Simpler app logic, but requires cache to understand DB.
ElastiCache Use Cases
| Use Case | Pattern |
|----------|--------|
| Session storage | Redis with TTL |
| Database query caching | Lazy Loading + TTL |
| Real-time leaderboards | Redis Sorted Sets |
| Pub/Sub messaging | Redis Pub/Sub |
| Rate limiting | Redis counters with TTL |
S3 (Simple Storage Service)
Object storage with unlimited storage, highly durable (99.999999999% — 11 9s).
Key Concepts
| Concept | Description |
|---------|-------------|
| Bucket | Container for objects, globally unique name |
| Object | File + metadata, identified by key (full path) |
| Key | Full path including "folders" (e.g., images/2024/photo.jpg) |
| Max object size | 5 TB (use multipart upload for >100 MB, required >5 GB) |
Storage Classes
| Class | Durability | Availability | Use Case |
|-------|------------|--------------|----------|
| S3 Standard | 11 9s | 99.99% | Frequently accessed data |
| S3 Intelligent-Tiering | 11 9s | 99.9% | Unknown/changing access patterns |
| S3 Standard-IA | 11 9s | 99.9% | Infrequent access, rapid retrieval |
| S3 One Zone-IA | 11 9s | 99.5% | Infrequent, non-critical, reproducible |
| S3 Glacier Instant | 11 9s | 99.9% | Archive, millisecond retrieval |
| S3 Glacier Flexible | 11 9s | 99.99% | Archive, minutes to hours retrieval |
| S3 Glacier Deep Archive | 11 9s | 99.99% | Long-term archive, 12-48 hour retrieval |
> 💡 Use Lifecycle Policies to automatically transition objects between classes
S3 Security
| Layer | Mechanism |
|-------|-----------|
| User-based | IAM policies |
| Resource-based | Bucket policies (JSON), Object ACLs, Bucket ACLs |
| Encryption | SSE-S3, SSE-KMS, SSE-C, client-side |
Bucket Policy Structure:
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::my-bucket/*"
}]
}
S3 Encryption
| Type | Key Management | Use Case |
|------|----------------|----------|
| SSE-S3 | AWS managed | Default encryption |
| SSE-KMS | KMS key | Audit trail, fine control |
| SSE-C | Customer-provided | Full key control |
| Client-side | Encrypt before upload | Maximum control |
> 💡 SSE-KMS has API call limits (quota). For high throughput, consider SSE-S3.
S3 Versioning
- Objects uploaded before versioning was enabled have a null version ID
S3 Replication
| Type | Description |
|------|-------------|
| CRR (Cross-Region) | Compliance, lower latency, replication across accounts |
| SRR (Same-Region) | Log aggregation, live replication between prod/test |
Requirements: Versioning enabled on both buckets, proper IAM permissions
> ⚠️ Only new objects replicated after enabling. Use S3 Batch Replication for existing objects.
S3 Event Notifications
Trigger actions on bucket events (PUT, DELETE, etc.):
| Target | Use Case |
|--------|----------|
| SNS | Fan-out to multiple subscribers |
| SQS | Queue for processing |
| Lambda | Real-time processing |
| EventBridge | Advanced filtering, multiple destinations |
S3 Performance
| Feature | Description |
|---------|-------------|
| Multi-part upload | Parallelize uploads, recommended >100 MB |
| Transfer Acceleration | Use CloudFront edge locations for faster uploads |
| Byte-range fetches | Parallelize downloads by requesting byte ranges |
| S3 Select / Glacier Select | Retrieve subset of data using SQL |
Baseline: 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix.
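Byte-range fetches work by sending several GET requests, each with its own HTTP Range header. The helper below only computes those header values. It makes no network calls, and the function name and chunk size are illustrative.

```python
def byte_ranges(object_size, chunk_size):
    """Return HTTP Range header values covering an object in fixed chunks."""
    if object_size < 0 or chunk_size <= 0:
        raise ValueError("object_size must be >= 0 and chunk_size > 0")
    ranges = []
    start = 0
    while start < object_size:
        # Range end offsets are inclusive, hence the -1.
        end = min(start + chunk_size, object_size) - 1
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges


print(byte_ranges(object_size=10, chunk_size=4))
```

Each range can be requested on its own connection, and a failed chunk can be retried without downloading the whole object again.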
S3 Pre-signed URLs
Temporary access to private objects without changing bucket policy.
aws s3 presign s3://bucket/object --expires-in 3600
| Parameter | Description |
|-----------|-------------|
| Expires | 1 second to 7 days (default: 1 hour) |
| Permissions | Inherits permissions of the user who generated it |
AWS Lambda
Serverless compute — run code without managing servers.
Key Limits
| Limit | Value |
|-------|-------|
| Memory | 128 MB – 10,240 MB (10 GB) |
| Timeout | Max 15 minutes (900 seconds) |
| Environment variables | 4 KB total |
| /tmp storage | 512 MB – 10,240 MB |
| Deployment package | 50 MB zipped, 250 MB unzipped |
| Concurrent executions | 1,000 default (can increase) |
| Layers | Up to 5 per function |
> 💡 CPU scales proportionally with memory. More memory = more CPU = faster execution.
Lambda Invocation Types
| Type | Behavior | Retries | Examples |
|------|----------|---------|----------|
| Synchronous | Caller waits for response | None (caller handles) | API Gateway, SDK |
| Asynchronous | Fire and forget | 2 retries (3 total) | S3, SNS, EventBridge |
| Event Source Mapping | Lambda polls source | Depends on source | SQS, Kinesis, DynamoDB Streams |
Asynchronous Invocation
Event → Lambda (internal queue) → [Retry 1] → [Retry 2] → DLQ/Destination
| Setting | Description |
|---------|-------------|
| Retries | 2 retries with exponential backoff |
| DLQ | Dead Letter Queue (SQS or SNS) for failed events |
| Destinations | Route success/failure to SQS, SNS, Lambda, or EventBridge |
> 💡 Destinations are preferred over DLQ — more features, supports success events
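The retry flow above can be simulated locally to see why a failing event is attempted three times before it is parked. This is a toy model, not Lambda's implementation: the function name is hypothetical, a plain list stands in for the DLQ or failure destination, and the waits between retries are omitted.

```python
def invoke_async(handler, event, dead_letter_queue, max_retries=2):
    """Run handler up to 1 + max_retries times; park the event on failure."""
    last_error = None
    for attempt in range(1, max_retries + 2):
        try:
            result = handler(event)
            return {"status": "success", "attempts": attempt, "result": result}
        except Exception as exc:  # any handler error counts as a failed attempt
            last_error = exc
    dead_letter_queue.append({"event": event, "error": str(last_error)})
    return {"status": "failed", "attempts": max_retries + 1, "result": None}


def always_fails(event):
    raise RuntimeError("boom")


dlq = []
print(invoke_async(always_fails, {"id": 1}, dlq))
print(dlq)
```

Because the same event can be delivered more than once, handlers invoked asynchronously should be idempotent.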
Event Source Mapping
Lambda polls from:
| Source | Behavior |
|--------|----------|
| SQS | Batch processing, long polling |
| SQS FIFO | Lambda scales to # of message groups |
| Kinesis/DynamoDB Streams | Process in order per shard |
Error Handling:
Lambda in VPC
By default, Lambda runs in an AWS-managed VPC with internet access. To reach private resources in your own VPC:
1. Configure VPC, subnets, security groups
2. Lambda creates ENIs in your subnets
3. Use NAT Gateway for internet access from private subnet
> ⚠️ Lambda in VPC has no internet unless you have NAT Gateway
Lambda Concurrency
| Type | Description |
|------|-------------|
| Unreserved | Shared pool, up to account limit |
| Reserved | Capacity set aside for one function; also acts as that function's maximum concurrency |
| Provisioned | Pre-initialized instances, no cold start |
Cold Start: First invocation initializes execution environment (can add seconds). Provisioned concurrency eliminates cold starts.
Lambda Layers
Share code/dependencies across functions:
Function → Layer 1 (libs) → Layer 2 (common code)
Lambda@Edge / CloudFront Functions
| Type | Location | Max Duration | Use Case |
|------|----------|--------------|----------|
| CloudFront Functions | Edge locations | < 1 ms | Simple request/response manipulation |
| Lambda@Edge | Regional edge cache | 5-30 seconds | Complex logic, external calls |
API Gateway
Managed API service — create, publish, secure, and monitor APIs.
Endpoint Types
| Type | Description |
|------|-------------|
| Edge-optimized | Routed through CloudFront (default) |
| Regional | For clients in same region |
| Private | Accessible only from VPC via VPC endpoint |
API Types
| Type | Features | Cost |
|------|----------|------|
| REST API | Full features (caching, API keys, usage plans, request validation) | Higher |
| HTTP API | Simpler, faster; JWT, IAM, and Lambda authorizers; no caching, API keys, or usage plans | ~70% cheaper |
| WebSocket API | Real-time two-way communication | Per message |
Integration Types
| Type | Description |
|------|-------------|
| Lambda Proxy | Request passed as-is to Lambda, Lambda returns full response |
| Lambda Custom | Transform request/response with mapping templates |
| HTTP Proxy | Pass through to HTTP endpoint |
| HTTP Custom | Transform with mapping templates |
| AWS Service | Direct integration with AWS services |
| Mock | Return response without backend |
> 💡 Lambda Proxy is most common — simplest setup, Lambda controls response format
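With proxy integration the function receives the raw request as the event and must return a response object with `statusCode`, `headers`, and a string `body`. The handler below is a minimal sketch. The `name` query parameter and the greeting are invented for the example.

```python
import json


def lambda_handler(event, context):
    """Minimal handler for an API Gateway Lambda proxy integration."""
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),  # must be a string
    }


# Local smoke test with a hand-built event; no AWS account needed.
response = lambda_handler({"queryStringParameters": {"name": "Ada"}}, None)
print(response["statusCode"], response["body"])
```

Returning a body that is not a string, or omitting `statusCode`, is a common cause of 502 responses from API Gateway with this integration type.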
API Gateway Security
| Method | Description |
|--------|-------------|
| IAM | AWS Sig v4, good for internal/AWS clients |
| Lambda Authorizer | Custom auth logic (JWT, OAuth, etc.) |
| Cognito User Pools | JWT validation with Cognito |
| API Keys + Usage Plans | Rate limiting per client |
Stages and Deployment
| Concept | Description |
|---------|-------------|
| Stage | Named reference to deployment (dev, prod, v1) |
| Stage Variables | Key-value pairs, like environment variables |
| Canary Deployment | Route % of traffic to new deployment |
Throttling
| Limit | Value |
|-------|-------|
| Account limit | 10,000 requests/second |
| Per-stage limit | Configurable |
| Per-client (Usage Plans) | API key-based throttling |
> 429 Too Many Requests when throttled. Client should retry with exponential backoff.
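A client-side retry loop with exponential backoff might look like the sketch below. `ThrottledError` is a stand-in for whatever exception your HTTP client raises on a 429, and the delay parameters are arbitrary example values. Random jitter is added so that many throttled clients do not retry in lockstep.

```python
import random
import time


class ThrottledError(Exception):
    """Placeholder for the error a client raises on HTTP 429."""


def call_with_backoff(request, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep):
    """Call request(); on throttling, wait with backoff and jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return request()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))  # full jitter


# Demo: fail twice, then succeed. Delays are recorded instead of slept.
outcomes = iter([ThrottledError(), ThrottledError(), "ok"])
recorded_delays = []


def fake_request():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


print(call_with_backoff(fake_request, sleep=recorded_delays.append))
print(len(recorded_delays), "retries")
```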
Caching
> 💡 Reduce backend calls, improve latency. Invalidate with Cache-Control: max-age=0 header.
DynamoDB
Fully managed NoSQL database — millisecond latency at any scale.
Core Concepts
| Concept | Description |
|---------|-------------|
| Table | Collection of items |
| Item | Row (max 400 KB) |
| Attribute | Column (nested up to 32 levels) |
| Primary Key | Partition key (required) + optional sort key |
Primary Key Options
| Type | Components | Use Case |
|------|------------|----------|
| Partition key | Single attribute | Unique identifier |
| Composite | Partition + Sort key | One-to-many relationships |
> 💡 Choose partition key with high cardinality for even distribution
Capacity Modes
| Mode | Description | Use Case |
|------|-------------|----------|
| Provisioned | Set RCU/WCU, auto-scaling available | Predictable workloads |
| On-Demand | Pay per request | Unpredictable, new tables |
Throughput units:
| Unit | Capacity |
|------|----------|
| 1 RCU | 1 strongly consistent read/sec (4 KB) OR 2 eventually consistent |
| 1 WCU | 1 write/sec (1 KB) |
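Capacity questions on the exam usually reduce to rounding the item size up to the next 4 KB (reads) or 1 KB (writes) and multiplying by the request rate. The helpers below encode that arithmetic. The function names are illustrative and sizes are given in bytes, with 1 KB taken as 1,024 bytes.

```python
import math


def read_capacity_units(item_size_bytes, reads_per_second,
                        strongly_consistent=True):
    """RCUs needed: 4 KB per unit; eventually consistent reads cost half."""
    units_per_read = math.ceil(item_size_bytes / 4096)
    total = units_per_read * reads_per_second
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_size_bytes, writes_per_second):
    """WCUs needed: 1 KB per unit, rounded up per item."""
    return math.ceil(item_size_bytes / 1024) * writes_per_second


# 10 reads/sec of 6 KB items, then 5 writes/sec of 2.5 KB items.
print(read_capacity_units(6 * 1024, 10))
print(read_capacity_units(6 * 1024, 10, strongly_consistent=False))
print(write_capacity_units(2560, 5))
```

A 6 KB item rounds up to two 4 KB units, so 10 strongly consistent reads per second need 20 RCUs and the eventually consistent equivalent needs 10. A 2.5 KB item rounds up to 3 KB, so 5 writes per second need 15 WCUs.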
Read Consistency
| Type | Description |
|------|-------------|
| Eventually consistent | Default, might return stale data |
| Strongly consistent | Returns most recent data, uses 2x RCU |
Secondary Indexes
| Type | Partition Key | Sort Key | When Created | Throughput |
|------|---------------|----------|--------------|------------|
| LSI | Same as table | Different | Table creation only | Shares table's |
| GSI | Any attribute (usually different) | Optional, any attribute | Anytime | Separate (provision separately) |
> ⚠️ GSI throttling can throttle main table writes. Provision GSI capacity carefully.
DynamoDB Streams
Ordered stream of item modifications (insert, update, delete).
| View Type | Content |
|-----------|---------|
| KEYS_ONLY | Just the key attributes |
| NEW_IMAGE | Item after modification |
| OLD_IMAGE | Item before modification |
| NEW_AND_OLD_IMAGES | Both images |
Use cases: Trigger Lambda, replicate to other tables, analytics
DynamoDB Operations
| Operation | Description | Cost |
|-----------|-------------|------|
| GetItem | Single item by primary key | Uses RCU |
| Query | Items by partition key + optional sort key | Efficient, uses RCU |
| Scan | Entire table | Expensive, avoid in production |
| BatchGetItem | Up to 100 items | Parallel GetItem |
| BatchWriteItem | Up to 25 PutItem/DeleteItem | Parallel writes |
Conditional Writes
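# Optimistic locking example
import boto3

table = boto3.resource('dynamodb').Table('MyTable')  # hypothetical table name

response = table.update_item(
    Key={'pk': 'item1'},
    UpdateExpression='SET #v = :newval, version = version + :inc',
    ConditionExpression='version = :expectedVersion',
    # '#v' is a placeholder and must be mapped; 'value' is an assumed attribute name
    ExpressionAttributeNames={'#v': 'value'},
    ExpressionAttributeValues={':expectedVersion': 1, ':newval': 'updated', ':inc': 1}
)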
> 💡 Use for optimistic concurrency control — no locking overhead
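To see why a version check replaces locking, here is a small in-memory model of the same idea. It does not call DynamoDB: the `store` dict and `VersionConflict` exception are assumptions standing in for the table and for the conditional-check failure.

```python
class VersionConflict(Exception):
    """Raised when the stored version differs from the expected one."""


def conditional_update(store, key, new_value, expected_version):
    """Write only if the item's version is unchanged since it was read."""
    item = store[key]
    if item["version"] != expected_version:
        raise VersionConflict(key)
    store[key] = {"value": new_value, "version": expected_version + 1}


store = {"item1": {"value": "original", "version": 1}}
conditional_update(store, "item1", "writer A", expected_version=1)  # succeeds
try:
    # Writer B read version 1 earlier, so its write is rejected.
    conditional_update(store, "item1", "writer B", expected_version=1)
except VersionConflict:
    print("writer B must re-read and retry")
```

Neither writer ever holds a lock; the loser simply re-reads the item and tries again.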
DynamoDB Accelerator (DAX)
In-memory cache for DynamoDB — microsecond latency.
| Feature | Value |
|---------|-------|
| Latency | Microseconds (vs milliseconds) |
| Cache | Item cache + query cache |
| Compatibility | Drop-in replacement (same API) |
> Use case: Read-heavy workloads, hot keys
Global Tables
Multi-region, multi-active replication.
| Feature | Description |
|---------|-------------|
| Active-Active | Read/write in any region |
| Replication | Sub-second across regions |
| Requirement | DynamoDB Streams must be enabled |
TTL (Time-To-Live)
Auto-delete expired items (no WCU cost).
Set TTL attribute → Store expiry timestamp (epoch seconds) → DynamoDB deletes the item in the background after expiry (typically within a few days, not instantly)
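A minimal sketch of building the expiry value, assuming a TTL attribute named `expires_at` (the attribute name and the item shape are hypothetical):

```python
import time


def ttl_timestamp(days, now=None):
    """Return an expiry time as epoch seconds, the format TTL expects."""
    now = time.time() if now is None else now
    return int(now) + days * 24 * 60 * 60


# 'expires_at' would be the attribute configured as the table's TTL attribute.
item = {"pk": "session#123", "expires_at": ttl_timestamp(7)}
```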
SQS (Simple Queue Service)
Fully managed message queue — decouple applications.
Queue Types
| Type | Throughput | Ordering | Delivery |
|------|------------|----------|----------|
| Standard | Unlimited | Best-effort | At-least-once |
| FIFO | 300 msg/s (3000 batched) | Strict | Exactly-once |
Key Settings
| Setting | Default | Description |
|---------|---------|-------------|
| Visibility Timeout | 30 seconds | Time message is hidden after receive |
| Message Retention | 4 days | Max: 14 days |
| Max Message Size | 256 KB | Use S3 for larger payloads |
| Delay Queue | 0 seconds | Delay before message is visible |
| Long Polling | Disabled | Wait for messages (reduces API calls) |
Visibility Timeout
Receive → Message hidden → Process → Delete
↓
(If timeout expires before delete)
↓
Message reappears in queue
> 💡 If processing takes longer than visibility timeout, call ChangeMessageVisibility
Dead Letter Queue (DLQ)
Messages that fail processing after maxReceiveCount go to DLQ.
| Setting | Description |
|---------|-------------|
| maxReceiveCount | # of receives before sending to DLQ |
| Redrive | Move DLQ messages back to main queue |
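The interaction between the visibility timeout and maxReceiveCount is easier to follow in a toy model. The class below is an in-memory illustration written for this guide, not the SQS API; every name in it is an assumption.

```python
from collections import deque


class ToyQueue:
    """In-memory model of a queue with a redrive policy (not the SQS API)."""

    def __init__(self, max_receive_count):
        self.max_receive_count = max_receive_count
        self.messages = deque()   # (body, receive_count) pairs
        self.dead_letters = []    # stands in for the DLQ
        self.in_flight = None     # message currently hidden from consumers

    def send(self, body):
        self.messages.append((body, 0))

    def receive(self):
        """Return the next message body, or None if nothing is deliverable."""
        while self.messages:
            body, count = self.messages.popleft()
            if count >= self.max_receive_count:
                self.dead_letters.append(body)  # received too often: dead-letter it
                continue
            self.in_flight = (body, count + 1)
            return body
        return None

    def timeout(self):
        """Model the visibility timeout expiring before a delete."""
        self.messages.append(self.in_flight)
        self.in_flight = None


queue = ToyQueue(max_receive_count=2)
queue.send("bad-payload")
for _ in range(3):
    if queue.receive() is not None:
        queue.timeout()  # the consumer never deletes the message

print(queue.dead_letters)  # ['bad-payload']
```

The message is delivered twice, and on the third receive attempt it is moved aside instead of being delivered again.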
FIFO Queues
| Feature | Description |
|---------|-------------|
| MessageGroupId | Messages in same group processed in order |
| MessageDeduplicationId | Prevent duplicates within 5-minute window |
| Naming | Queue name must end with .fifo |
SQS + Lambda
Lambda polls SQS and processes batches:
| Setting | Description |
|---------|-------------|
| Batch size | Up to 10 messages per invocation for FIFO queues; up to 10,000 for standard queues |
| Batch window | Time to wait for batch to fill |
| Concurrency | One invocation per message group (FIFO) |
SNS (Simple Notification Service)
Pub/sub messaging — push to multiple subscribers.
Subscribers
| Type | Use Case |
|------|----------|
| SQS | Queue for processing |
| Lambda | Serverless processing |
| HTTP/S | Webhook endpoints |
| Email/SMS | User notifications |
| Kinesis Data Firehose | Stream to S3, Redshift |
Fan-Out Pattern
Producer → SNS Topic → SQS Queue 1 → Consumer 1
                     → SQS Queue 2 → Consumer 2
                     → Lambda → Process
> 💡 Decouple, parallel processing, different consumption rates
Message Filtering
Filter messages per subscriber using filter policies:
{
"eventType": ["order_placed"],
"store": [{"prefix": "us-"}]
}
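As a rough mental model of how such a policy is evaluated: every key in the policy must match (AND), and any one value listed for a key is enough (OR). The matcher below is a simplified sketch that handles only exact strings and `prefix` rules; it is not SNS's implementation, and the function names are made up for illustration.

```python
def _rule_matches(rule, value):
    """Match one rule: either {"prefix": ...} or an exact string."""
    if isinstance(rule, dict) and "prefix" in rule:
        return value.startswith(rule["prefix"])
    return rule == value


def matches_filter_policy(policy, attributes):
    """True if every policy key is satisfied by the message attributes."""
    for key, rules in policy.items():
        value = attributes.get(key)
        if value is None:
            return False  # a required attribute is missing
        if not any(_rule_matches(rule, value) for rule in rules):
            return False
    return True


policy = {"eventType": ["order_placed"], "store": [{"prefix": "us-"}]}
print(matches_filter_policy(policy, {"eventType": "order_placed", "store": "us-east"}))
```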
FIFO Topics
FIFO topic names must end with .fifo.
Kinesis
Real-time streaming data at scale.
Kinesis Services
| Service | Purpose |
|---------|---------|
| Kinesis Data Streams | Collect and process real-time data |
| Kinesis Data Firehose | Load streams into AWS data stores |
| Kinesis Data Analytics | SQL/Flink analytics on streams |
| Kinesis Video Streams | Stream video for analytics |
Kinesis Data Streams
| Concept | Description |
|---------|-------------|
| Shard | Unit of capacity (1 MB/s in, 2 MB/s out) |
| Partition Key | Determines which shard receives record |
| Sequence Number | Unique ID per record within shard |
| Retention | 1-365 days (default: 24 hours) |
Capacity:
| Direction | Per Shard |
|-----------|-----------|
| Write | 1 MB/s or 1,000 records/s |
| Read | 2 MB/s (shared by all consumers) |
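Shard sizing follows directly from these per-shard limits: take whichever limit demands the most shards. A small sketch, assuming shared (not enhanced fan-out) consumers and a hypothetical helper name:

```python
import math


def shards_needed(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Smallest shard count that satisfies all three per-shard limits."""
    return max(
        1,
        math.ceil(write_mb_per_sec / 1),    # 1 MB/s write per shard
        math.ceil(records_per_sec / 1000),  # 1,000 records/s per shard
        math.ceil(read_mb_per_sec / 2),     # 2 MB/s shared read per shard
    )


print(shards_needed(3.5, 2500, 10))  # read throughput dominates: 5
```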
Consumer Types
| Type | Description |
|------|-------------|
| Shared | Multiple consumers share 2 MB/s per shard |
| Enhanced Fan-Out | 2 MB/s per consumer per shard (push model) |
Kinesis Data Firehose
Near real-time delivery (60-900 second buffer) to:
| Destination | Description |
|-------------|-------------|
| S3 | Most common |
| Redshift | Via S3 copy |
| OpenSearch | Search/analytics |
| HTTP endpoint | Custom destinations |
> 💡 Firehose = managed, auto-scaling, no capacity planning. Streams = more control, real-time.
Streams vs Firehose
| Feature | Data Streams | Data Firehose |
|---------|--------------|---------------|
| Latency | ~200 ms | 60-900 seconds |
| Capacity | Provision shards | Auto-scaling |
| Data retention | 1-365 days | No storage |
| Consumer | Custom (Lambda, apps) | Built-in destinations |
| Data transformation | External | Built-in Lambda |
Step Functions
Orchestrate Lambda functions and AWS services with visual workflows.
Key Concepts
| Concept | Description |
|---------|-------------|
| State Machine | Workflow definition (JSON/YAML) |
| State | Individual step in workflow |
| Execution | Running instance of state machine |
| Task | Unit of work (Lambda, AWS service, HTTP) |
State Types
| State | Description |
|-------|-------------|
| Task | Execute work (Lambda, AWS API) |
| Choice | Branch based on condition |
| Parallel | Execute branches in parallel |
| Map | Iterate over array |
| Wait | Delay execution |
| Pass | Pass input to output, inject data |
| Succeed/Fail | End execution |
Workflow Types
| Type | Max Duration | Pricing | Use Case |
|------|--------------|---------|----------|
| Standard | 1 year | Per state transition | Long-running, auditing |
| Express | 5 minutes | Per execution + duration | High-volume, event processing |
Error Handling
| Mechanism | Description |
|-----------|-------------|
| Retry | Retry failed states with backoff |
| Catch | Handle errors, transition to fallback |
"Retry": [{
"ErrorEquals": ["States.TaskFailed"],
"MaxAttempts": 3,
"IntervalSeconds": 1,
"BackoffRate": 2.0
}],
"Catch": [{
"ErrorEquals": ["States.ALL"],
"Next": "HandleError"
}]
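With the Retry settings above, the wait before each retry grows geometrically: the first retry waits IntervalSeconds, and each later wait is multiplied by BackoffRate. A quick way to check the numbers (the function name is illustrative):

```python
def retry_delays(interval_seconds, backoff_rate, max_attempts):
    """Seconds waited before each retry attempt."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


print(retry_delays(1, 2.0, 3))  # [1.0, 2.0, 4.0]
```

If all three retries fail, the Catch block takes over and the execution moves to HandleError.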
Service Integrations
| Pattern | Description |
|---------|-------------|
| Request Response | Call service, wait for response |
| Run a Job (.sync) | Wait for job completion (Batch, ECS, Glue) |
| Wait for Callback | Pause until external callback (Human approval) |
ECS, Fargate & ECR (Containers)
ECS (Elastic Container Service)
Container orchestration on AWS.
| Launch Type | Description |
|-------------|-------------|
| EC2 | You manage EC2 instances, more control |
| Fargate | Serverless, AWS manages infrastructure |
ECS Concepts
| Concept | Description |
|---------|-------------|
| Task Definition | Blueprint for containers (image, CPU, memory, ports) |
| Task | Running instance of Task Definition |
| Service | Maintains desired count of tasks, load balancing |
| Cluster | Logical grouping of tasks/services |
Task Definition Settings
| Setting | Description |
|---------|-------------|
| Image | Docker image (from ECR or public) |
| CPU/Memory | Resource allocation |
| Port Mappings | Container port to host port |
| Environment | Variables, secrets from SSM/Secrets Manager |
| IAM Role | Task role (permissions for containers) |
| Logging | CloudWatch Logs integration |
Fargate
| Feature | Description |
|---------|-------------|
| Serverless | No EC2 management |
| Pricing | Per vCPU + memory per second |
| Scaling | Auto-scaling on CPU/memory metrics |
ECR (Elastic Container Registry)
Private Docker registry:
| Feature | Description |
|---------|-------------|
| Encryption | Images encrypted at rest |
| Scanning | Vulnerability scanning |
| Lifecycle Policies | Auto-delete old images |
| Cross-region | Replicate to other regions |
ECS IAM Roles
| Role | Purpose |
|------|---------|
| Task Execution Role | Pulls images from ECR, sends logs to CloudWatch |
| Task Role | Permissions for the application running in container |
> 💡 Task Role = what container can do. Execution Role = what ECS agent can do.
ECS + Load Balancing
| Feature | Description |
|---------|-------------|
| ALB | Dynamic port mapping, path-based routing |
| NLB | High throughput, static IP |
| Service Discovery | Route 53 DNS for service-to-service |
CloudFormation
Infrastructure as Code — define AWS resources in templates.
Template Structure
AWSTemplateFormatVersion: "2010-09-09"
Description: String
Parameters: # Input values
Resources: # AWS resources (REQUIRED)
Outputs: # Export values
Mappings: # Static variables
Conditions: # Conditional resource creation
Intrinsic Functions
| Function | Purpose | Example |
|----------|---------|---------|
| !Ref | Reference resource/parameter | !Ref MyBucket |
| !GetAtt | Get resource attribute | !GetAtt MyBucket.Arn |
| !Sub | String substitution | !Sub "arn:aws:s3:::${BucketName}" |
| !Join | Join strings | !Join ["-", [a, b, c]] → "a-b-c" |
| !If | Conditional value | !If [Prod, m5.large, t3.micro] |
| !ImportValue | Import from another stack | !ImportValue VPCId |
| !FindInMap | Lookup in Mappings | !FindInMap [RegionMap, !Ref 'AWS::Region', AMI] |
Pseudo Parameters
| Parameter | Value |
|-----------|-------|
| AWS::AccountId | Account ID |
| AWS::Region | Current region |
| AWS::StackName | Stack name |
| AWS::StackId | Stack ID |
| AWS::NoValue | Remove property conditionally |
Cross-Stack References
Stack A (export):
Outputs:
  VPCId:
    Value: !Ref MyVPC
    Export:
      Name: SharedVPC
Stack B (import):
VpcId: !ImportValue SharedVPC
Nested Stacks
Reusable components embedded in parent stack:
Resources:
  NetworkStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://s3.amazonaws.com/mybucket/network.yaml
> 💡 Nested = component reuse. Cross-stack = share values between independent stacks.
Change Sets
Preview changes before executing:
aws cloudformation create-change-set --stack-name MyStack --change-set-name MyChangeSet --template-body file://template.yaml
aws cloudformation describe-change-set --stack-name MyStack --change-set-name MyChangeSet
aws cloudformation execute-change-set --stack-name MyStack --change-set-name MyChangeSet
Drift Detection
Detect if actual resources differ from template definition.
AWS SAM (Serverless Application Model)
Simplified CloudFormation for serverless.
SAM Template
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31 # SAM transform
Globals:
  Function:
    Timeout: 30
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: python3.9
      CodeUri: ./src
      Events:
        Api:
          Type: Api
          Properties:
            Path: /hello
            Method: GET
SAM Resource Types
| Type | Creates |
|------|---------|
| AWS::Serverless::Function | Lambda + execution role |
| AWS::Serverless::Api | API Gateway REST API |
| AWS::Serverless::HttpApi | API Gateway HTTP API |
| AWS::Serverless::SimpleTable | DynamoDB table |
| AWS::Serverless::LayerVersion | Lambda Layer |
SAM CLI Commands
| Command | Description |
|---------|-------------|
| sam init | Initialize new project |
| sam build | Build and package |
| sam local invoke | Test locally |
| sam local start-api | Local API Gateway |
| sam deploy --guided | Interactive deployment |
| sam sync | Fast sync for development |
SAM Policy Templates
Built-in policies for common patterns:
Policies:
  - S3ReadPolicy:
      BucketName: !Ref MyBucket
  - DynamoDBCrudPolicy:
      TableName: !Ref MyTable
CI/CD: CodeCommit, CodeBuild, CodeDeploy, CodePipeline
CodeCommit
AWS Git repository hosting.
| Feature | Description |
|---------|-------------|
| Auth | HTTPS (Git credentials), SSH (keys), IAM roles |
| Triggers | Lambda, SNS on repository events |
| Notifications | CloudWatch Events/EventBridge |
CodeBuild
Managed build service — compile, test, produce artifacts.
buildspec.yml:
version: 0.2
phases:
  install:
    runtime-versions:
      python: 3.9
  pre_build:
    commands:
      - pip install -r requirements.txt
  build:
    commands:
      - python -m pytest
      - sam build
  post_build:
    commands:
      - sam package --s3-bucket $BUCKET
artifacts:
  files:
    - template.yaml
    - '**/*'
cache:
  paths:
    - '/root/.cache/pip/**/*'
| Section | Purpose |
|---------|---------|
| phases | install, pre_build, build, post_build |
| artifacts | Files to output |
| cache | Speed up builds |
| env | Environment variables |
CodeDeploy
Automated deployment to EC2, Lambda, ECS.
appspec.yml (EC2):
version: 0.0
os: linux
files:
  - source: /
    destination: /var/www/html
hooks:
  BeforeInstall:
    - location: scripts/install_dependencies.sh
  AfterInstall:
    - location: scripts/start_server.sh
Lifecycle Hooks (EC2):
ApplicationStop → DownloadBundle → BeforeInstall → Install → AfterInstall → ApplicationStart → ValidateService
Deployment Types
| Platform | Types | Description |
|----------|-------|-------------|
| EC2 | In-Place, Blue/Green | Rolling update or swap target groups |
| Lambda | AllAtOnce, Canary, Linear | Traffic shifting |
| ECS | Blue/Green | Traffic shifting with ALB |
Lambda deployment:
| Type | Description |
|------|-------------|
| AllAtOnce | Immediate shift to new version |
| Canary | x% for n minutes, then 100% |
| Linear | x% every n minutes |
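The difference between Canary and Linear is easiest to see as a schedule of (minute, percent of traffic on the new version). The functions below are an illustrative sketch, not a CodeDeploy API; predefined configurations such as CodeDeployDefault.LambdaCanary10Percent5Minutes and CodeDeployDefault.LambdaLinear10PercentEvery1Minute follow the same shape.

```python
def canary_schedule(percent, interval_minutes):
    """Canary: shift `percent` first, then the remainder after the interval."""
    return [(0, percent), (interval_minutes, 100)]


def linear_schedule(percent, interval_minutes):
    """Linear: shift `percent` more every interval until all traffic has moved."""
    steps, shifted, minute = [], 0, 0
    while shifted < 100:
        shifted = min(100, shifted + percent)
        steps.append((minute, shifted))
        minute += interval_minutes
    return steps


print(canary_schedule(10, 5))   # [(0, 10), (5, 100)]
print(linear_schedule(25, 2))   # [(0, 25), (2, 50), (4, 75), (6, 100)]
```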
CodePipeline
Orchestrate CI/CD workflow:
Source → Build → Test → Deploy
↓
[Manual Approval]
| Feature | Description |
|---------|-------------|
| Stages | Sequential groups of actions |
| Actions | Individual tasks (source, build, deploy) |
| Artifacts | Files passed between stages (stored in S3) |
| Manual Approval | Human gate between stages |
CloudWatch
Monitoring, logging, and alarms.
CloudWatch Metrics
| Concept | Description |
|---------|-------------|
| Namespace | Container for metrics (e.g., AWS/EC2) |
| Dimension | Attribute of metric (InstanceId, AutoScalingGroupName) |
| Resolution | Standard (1 min) or High-res (1 sec) |
| Custom Metrics | Your own metrics via PutMetricData API |
EC2 Default Metrics:
CloudWatch Alarms
| State | Description |
|-------|-------------|
| OK | Metric within threshold |
| ALARM | Metric breached threshold |
| INSUFFICIENT_DATA | Not enough data points |
Actions: SNS notification, Auto Scaling, EC2 actions (stop, terminate, reboot)
CloudWatch Logs
| Concept | Description |
|---------|-------------|
| Log Group | Collection of log streams (e.g., per application) |
| Log Stream | Sequence of events from same source |
| Retention | Never expire by default, configure 1 day to 10 years |
| Metric Filters | Extract metrics from log data |
| Subscription Filters | Stream logs to Lambda, Kinesis, OpenSearch |
CloudWatch Logs Insights
Query logs with SQL-like syntax:
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20
CloudWatch Agent
Install on EC2/on-premises for:
CloudWatch Container Insights
Monitoring for ECS, EKS, Kubernetes — metrics per container, task, service.
X-Ray
Distributed tracing for debugging and performance analysis.
Key Concepts
| Concept | Description |
|---------|-------------|
| Trace | End-to-end request journey |
| Segment | Work done by one service |
| Subsegment | Granular breakdown (HTTP calls, DB queries) |
| Annotations | Indexed key-value pairs (searchable) |
| Metadata | Non-indexed key-value pairs |
X-Ray Integration
| Service | Setup |
|---------|-------|
| Lambda | Enable active tracing |
| API Gateway | Enable tracing in stage settings |
| EC2/ECS | Install X-Ray daemon + SDK |
| Elastic Beanstalk | Extension configuration |
X-Ray Daemon
Runs on EC2/ECS, buffers and sends trace data to X-Ray API.
App (X-Ray SDK) → UDP port 2000 → X-Ray Daemon → X-Ray API
X-Ray Sampling
Control volume of requests traced:
| Setting | Description |
|---------|-------------|
| Reservoir | Fixed # requests per second traced |
| Rate | Percentage of additional requests traced |
Default: 1 request/sec + 5% additional
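To estimate how many traces the default rule produces, treat the reservoir as a guaranteed quota per second and apply the rate to whatever is left over. A small sketch of that arithmetic (the function name is an assumption, and real sampling is probabilistic, so this is the expected value only):

```python
def expected_traced(requests_per_sec, reservoir=1, rate=0.05):
    """Expected traces per second under a reservoir-plus-rate sampling rule."""
    guaranteed = min(requests_per_sec, reservoir)
    extra = (requests_per_sec - guaranteed) * rate
    return guaranteed + extra


# 101 requests in one second: 1 from the reservoir plus 5% of the other 100.
print(expected_traced(101))
```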
X-Ray APIs
| API | Used By |
|-----|---------|
| PutTraceSegments | App/SDK uploads segments |
| GetTraceSummaries | Get list of traces |
| BatchGetTraces | Get full trace details |
Cognito
User identity and access management.
Cognito User Pools (CUP)
Authentication — Sign-up, sign-in, returns JWT tokens.
| Feature | Description |
|---------|-------------|
| Sign-up/Sign-in | Email, phone, username |
| MFA | SMS, TOTP |
| Social login | Google, Facebook, SAML, OIDC |
| Hosted UI | Pre-built login pages |
| Triggers | Lambda on auth events |
JWT Tokens:
Cognito Identity Pools (Federated Identities)
Authorization — Exchange tokens for temporary AWS credentials.
[User] → [CUP/Social] → [ID Token] → [Identity Pool] → [Temp AWS Credentials]
| Feature | Description |
|---------|-------------|
| Federation | CUP, Google, Facebook, SAML, OpenID |
| IAM Roles | Map users to authenticated/unauthenticated roles |
| Fine-grained | Policy variables for row-level access |
User Pools vs Identity Pools
| Feature | User Pools | Identity Pools |
|---------|------------|----------------|
| Purpose | Authentication | Authorization |
| Returns | JWT tokens | AWS credentials |
| Use with | API Gateway, ALB | AWS SDK (S3, DynamoDB) |
KMS (Key Management Service)
Managed encryption keys.
Key Types
| Type | Managed By | Cost | Rotation |
|------|------------|------|----------|
| AWS Owned | AWS | Free | Varies |
| AWS Managed | AWS | Free | Auto yearly |
| Customer Managed | You | $/month + $/API call | Optional/yearly |
KMS API Operations
| API | Purpose |
|-----|---------|
| Encrypt | Encrypt data up to 4 KB |
| Decrypt | Decrypt data |
| GenerateDataKey | Returns plaintext + encrypted data key |
| GenerateDataKeyWithoutPlaintext | Returns only encrypted data key |
Envelope Encryption
For data > 4 KB:
1. GenerateDataKey → plaintext DEK + encrypted DEK
2. Encrypt data with plaintext DEK
3. Store encrypted DEK with encrypted data
4. Decrypt: Use KMS to decrypt DEK → use DEK to decrypt data
KMS Key Policies
| Policy Type | Description |
|-------------|-------------|
| Default | Created automatically, grants access to root user |
| Custom | Define who can access key, required for cross-account |
Encryption Context
Additional authenticated data for extra security:
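import boto3

kms = boto3.client('kms')
response = kms.encrypt(
    KeyId='alias/my-key',
    Plaintext=data,  # bytes to encrypt (up to 4 KB), defined elsewhere
    EncryptionContext={'department': 'engineering'}
)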
> Decryption must include same encryption context
Secrets Manager & SSM Parameter Store
Secrets Manager
| Feature | Description |
|---------|-------------|
| Purpose | Store secrets (passwords, API keys, tokens) |
| Rotation | Automatic rotation with Lambda |
| Integration | RDS, Redshift, DocumentDB automatic rotation |
| Cost | Per secret + per API call |
SSM Parameter Store
| Feature | Description |
|---------|-------------|
| Purpose | Configuration and secrets |
| Types | String, StringList, SecureString (encrypted) |
| Hierarchy | /app/prod/db-connection |
| Cost | Free (standard) or paid (advanced) |
When to Use Which
| Use Case | Service |
|----------|---------|
| Secrets with rotation | Secrets Manager |
| RDS/database credentials | Secrets Manager |
| Configuration values | Parameter Store |
| Cost-sensitive | Parameter Store |
| Simple secrets without rotation | Parameter Store (SecureString) |
EventBridge
Serverless event bus — route events to targets.
Event Sources
| Source | Examples |
|--------|----------|
| AWS Services | EC2, S3, CodePipeline state changes |
| Custom Apps | Your applications via PutEvents API |
| SaaS Partners | Zendesk, Datadog, Auth0 |
| Scheduled | Cron expressions |
Event Rules
| Type | Description |
|------|-------------|
| Event Pattern | Match events by pattern (source, detail-type, etc.) |
| Schedule | Cron or rate expression |
Event Targets
Lambda, SQS, SNS, Step Functions, Kinesis, ECS Tasks, CodePipeline, EC2 Actions, API Gateway, EventBridge in another account/region...
Event Pattern Example
{
"source": ["aws.ec2"],
"detail-type": ["EC2 Instance State-change Notification"],
"detail": {
"state": ["stopped", "terminated"]
}
}
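Pattern matching can be pictured as a recursive comparison: nested objects in the pattern are matched against the same nested fields in the event, and each leaf list names the accepted values. The matcher below is a simplified sketch covering exact-value matching only; it is not EventBridge's implementation and the sample event is made up.

```python
def matches_pattern(pattern, event):
    """True if the event satisfies the pattern (exact-value matching only)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested part of the event.
            if not isinstance(actual, dict) or not matches_pattern(expected, actual):
                return False
        elif actual not in expected:
            # Leaf: the event value must equal one of the listed values.
            return False
    return True


pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}
event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}
print(matches_pattern(pattern, event))  # True
```

Fields that appear in the event but not in the pattern, such as instance-id here, are ignored.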
Schema Registry
Elastic Beanstalk
PaaS for deploying web applications.
Deployment Policies
| Policy | Downtime | Description |
|--------|----------|-------------|
| All at once | Yes | Fastest, brief outage |
| Rolling | No | Deploy batch by batch |
| Rolling with additional batch | No | Maintain capacity during deployment |
| Immutable | No | New ASG, swap when healthy |
| Blue/Green | No | Create new environment, swap URL |
Beanstalk Extensions
.ebextensions/*.config files customize environment:
option_settings:
  aws:elasticbeanstalk:application:environment:
    MY_ENV_VAR: value
packages:
  yum:
    git: []
container_commands:
  01_migrate:
    command: "python manage.py migrate"
    leader_only: true
Lifecycle Policy
Limit stored application versions (max 1000):
Self-Exam Questions
Includes key DVA-C02 topics beyond the notes above.
AWS Global Infrastructure
Is IAM a global or regional service?
> ✅ Global — IAM users, groups, roles, and policies are not region-specific.
Is EBS regional or AZ-specific?
> ✅ AZ-specific — EBS volumes are bound to a single Availability Zone.
How many AZs does a Region typically have?
> ✅ Usually 3 AZs per Region (minimum 3, maximum 6).
IAM
Can an IAM group contain another group?
> ✅ No — Groups can only contain users, not other groups.
What are IAM Roles used for?
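> ✅ Temporary credentials for trusted entities. Most often an AWS service (e.g., EC2, Lambda) assumes a role to perform actions, but IAM users, other accounts, and federated identities can assume roles too.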
EC2
You're trying to SSH into your EC2 and getting a timeout. What's the most likely issue?
> ✅ Security Group. A timeout almost always means traffic is being blocked before it reaches the instance. Check the inbound rules for port 22.
Which EC2 purchasing option offers up to 90% discount but can be interrupted?
> ✅ Spot Instances. Cheapest option, but AWS can reclaim the instance with a two-minute warning when it needs the capacity back or when the Spot price rises above your maximum price.
What's the difference between Dedicated Host and Dedicated Instance?
> ✅ Dedicated Host — Full server control, see sockets/cores (for BYOL licensing)
>
> ✅ Dedicated Instance — Dedicated hardware, no host visibility
Storage (EBS, EFS, Instance Store)
What happens to Instance Store data when you stop an EC2 instance?
> ✅ Data is lost — Instance Store is ephemeral. Data is lost on stop, terminate, or hardware failure.
Which EBS volume types can be used as boot volumes?
> ✅ SSD types only — gp2, gp3, io1, io2. HDD types (st1, sc1) cannot be boot volumes.
What is the max IOPS for gp3?
> ✅ 16,000 IOPS — Can be provisioned independently of volume size.
Can you attach an EBS volume to multiple EC2 instances?
> ✅ Only io1/io2 with Multi-Attach — up to 16 instances, same AZ only.
EFS is compatible with which operating systems?
> ✅ Linux only — EFS is POSIX-compliant, not compatible with Windows.
AMI
Are AMIs region-specific or global?
> ✅ Region-specific — Must copy an AMI to use it in another region.
ELB & ASG
What does ELB stand for and is it a load balancer type?
> ✅ Elastic Load Balancing — It's the service name, not a LB type. Actual types are ALB, NLB, GLB, CLB.
Which load balancer provides a static IP address?
> ✅ NLB — Network Load Balancer provides one static IP per AZ. ALB only provides a static DNS hostname.
NLB operates at which OSI layer? ALB?
> ✅ NLB — Layer 4 (Transport: TCP, UDP)
>
> ✅ ALB — Layer 7 (Application: HTTP, HTTPS)
Will ELB terminate an unhealthy target?
> ✅ No — ELB only stops routing traffic. ASG with ELB health checks enabled will terminate/replace unhealthy instances.
Is Cross-Zone Load Balancing enabled by default for ALB? NLB?
> ✅ ALB — Enabled by default (free)
>
> ✅ NLB — Disabled by default (charged if enabled)
What is the default ASG cooldown period?
> ✅ 300 seconds (5 minutes) — Prevents rapid successive scaling actions.
What scaling policy uses ML to predict load patterns?
> ✅ Predictive Scaling — Analyzes historical patterns and pre-provisions capacity.
RDS & Aurora
Read Replicas use sync or async replication?
> ✅ ASYNC — Data is eventually consistent across read replicas.
Multi-AZ uses sync or async replication?
> ✅ SYNC — Changes are immediately replicated to standby for disaster recovery.
Can you read from a Multi-AZ standby database?
> ✅ No — Standby is only for failover. Use Read Replicas for read scaling.
How many Read Replicas can RDS have? Aurora?
> ✅ Both can have up to 15 Read Replicas.
What's the failover time for Aurora?
> ✅ Less than 30 seconds.
How do you encrypt an existing unencrypted RDS database?
> ✅ Snapshot → Copy with encryption → Restore from encrypted snapshot.
What is RDS Proxy and when should you use it?
> ✅ Serverless connection pooler. Use with Lambda to reduce DB connections (Lambda opens many short-lived connections).
Is RDS Proxy publicly accessible?
> ✅ No — It lives inside your VPC only, never publicly accessible.
Lambda
What is the maximum Lambda execution timeout?
> ✅ 15 minutes (900 seconds).
What is the maximum Lambda memory allocation?
> ✅ 10,240 MB (10 GB). CPU scales proportionally with memory.
What is the /tmp directory size limit in Lambda?
> ✅ 512 MB by default, configurable up to 10,240 MB (10 GB). Use for temporary file processing.
What happens if Lambda runs out of memory?
> ✅ Execution fails with "Process exited before completing request" or OutOfMemoryError.
What are Lambda Layers used for?
> ✅ Share code/dependencies across multiple functions. Up to 5 layers per function.
How do you give Lambda access to resources in a VPC?
> ✅ Configure VPC settings (subnets + security groups). Lambda creates ENIs in your VPC.
What's the difference between synchronous and asynchronous Lambda invocation?
> ✅ Sync — Caller waits for response (API Gateway, SDK invoke)
>
> ✅ Async — Caller doesn't wait, Lambda handles retries (S3, SNS, EventBridge)
How many retries does Lambda do for async invocations?
> ✅ 2 retries (3 total attempts). Failed events can go to DLQ or on-failure destination.
API Gateway
What are the three API Gateway endpoint types?
> ✅ Edge-optimized (CloudFront), Regional, Private (VPC only)
What is the API Gateway default timeout?
> ✅ 29 seconds — Cannot exceed this even if Lambda timeout is higher.
How do you handle CORS in API Gateway?
> ✅ Enable CORS on the resource/method. API Gateway adds Access-Control-Allow-Origin headers.
What's the difference between REST API and HTTP API in API Gateway?
> ✅ HTTP API — Cheaper, faster, simpler (JWT auth, Lambda proxy)
>
> ✅ REST API — Full features (caching, request validation, usage plans, API keys)
How do you implement rate limiting in API Gateway?
> ✅ Usage Plans + API Keys — Set throttling limits per client.
DynamoDB
What are the two capacity modes in DynamoDB?
> ✅ Provisioned (set RCU/WCU) and On-Demand (pay per request).
What is the maximum item size in DynamoDB?
> ✅ 400 KB per item.
What's the difference between Query and Scan?
> ✅ Query — Efficient, uses partition key (and optionally sort key)
>
> ✅ Scan — Reads entire table, expensive, use sparingly
What are DynamoDB Streams used for?
> ✅ Capture item-level changes (insert, update, delete). Trigger Lambda, replicate data, etc.
What is a GSI vs LSI in DynamoDB?
> ✅ GSI — Different partition key, can be added anytime, has own throughput
>
> ✅ LSI — Same partition key, must be created at table creation, shares table throughput
How do you implement optimistic locking in DynamoDB?
> ✅ Use conditional writes with a version attribute. Write fails if version doesn't match.
S3
What is the maximum object size in S3?
> ✅ 5 TB. Use multipart upload for objects > 100 MB (required > 5 GB).
What is S3 Transfer Acceleration?
> ✅ Uses CloudFront edge locations to speed up uploads over long distances.
What's the difference between S3 Standard-IA and S3 One Zone-IA?
> ✅ Standard-IA — Multi-AZ, for infrequent access
>
> ✅ One Zone-IA — Single AZ, cheaper, data lost if AZ fails
What is S3 Object Lock?
> ✅ WORM model (Write Once Read Many). Prevents object deletion/modification for retention period.
How do you enable versioning on an S3 bucket?
> ✅ Enable at bucket level. Once enabled, can only be suspended (not disabled). Protects against accidental deletes.
SQS & SNS
What is the default visibility timeout for SQS?
> ✅ 30 seconds — Time a message is hidden after being read.
What is the maximum retention period for SQS messages?
> ✅ 14 days (default: 4 days).
What's the difference between Standard and FIFO SQS queues?
> ✅ Standard — Unlimited throughput, at-least-once delivery, best-effort ordering
>
> ✅ FIFO — 300 msg/s (3000 with batching), exactly-once, strict ordering
What is a Dead Letter Queue (DLQ)?
> ✅ Queue for messages that failed processing after max retries. Helps debug failures.
What's the difference between SQS and SNS?
> ✅ SQS — Queue, pull-based, messages persist until processed
>
> ✅ SNS — Pub/sub, push-based, messages sent immediately to all subscribers
What is the SNS + SQS fan-out pattern?
> ✅ SNS topic pushes to multiple SQS queues. Decouples publishers from consumers, enables parallel processing.
CI/CD (CodeCommit, CodeBuild, CodeDeploy, CodePipeline)
What is the buildspec.yml file?
> ✅ CodeBuild configuration file. Defines build phases (install, pre_build, build, post_build) and artifacts.
What is the appspec.yml/appspec.yaml file?
> ✅ CodeDeploy configuration. Defines deployment lifecycle hooks and file mappings.
What deployment types does CodeDeploy support for EC2?
> ✅ In-place (rolling) and Blue/Green (traffic shift to new instances).
What deployment types does CodeDeploy support for Lambda?
> ✅ AllAtOnce, Canary (x% then 100%), Linear (x% every n minutes).
CloudFormation & SAM
What is the intrinsic function to reference another resource in CloudFormation?
> ✅ !Ref or Ref: — Returns the physical ID of the resource.
What does !GetAtt do in CloudFormation?
> ✅ Gets an attribute from a resource (e.g., !GetAtt MyBucket.Arn).
What is AWS SAM?
> ✅ Serverless Application Model — Simplified CloudFormation for serverless (Lambda, API Gateway, DynamoDB).
What command packages and deploys a SAM application?
> ✅ sam build → sam deploy (or sam deploy --guided for interactive).
CloudWatch & X-Ray
What is the minimum resolution for CloudWatch custom metrics?
> ✅ 1 second (high-resolution). Standard is 1 minute.
How long are CloudWatch Logs retained by default?
> ✅ Forever (never expire). Must set retention policy to auto-delete.
What is X-Ray used for?
> ✅ Distributed tracing — Visualize requests as they travel through your application. Debug latency issues.
What is the X-Ray daemon?
> ✅ Runs on EC2/ECS, collects trace data from SDK and sends to X-Ray service. Lambda has it built-in.
What are X-Ray segments and subsegments?
> ✅ Segment — Work done by a service/resource
>
> ✅ Subsegment — Granular breakdown (e.g., external HTTP call, DB query)
Cognito
What's the difference between Cognito User Pools and Identity Pools?
> ✅ User Pools — Authentication (sign-up, sign-in, get JWT tokens)
>
> ✅ Identity Pools — Authorization (exchange tokens for temporary AWS credentials)
How do you authenticate API Gateway with Cognito?
> ✅ Use Cognito User Pool Authorizer — Validates JWT tokens from User Pool.
KMS & Encryption
What are the types of KMS keys?
> ✅ AWS owned (used by AWS services behind the scenes, free), AWS managed (aws/service-name, free), and Customer managed (you control rotation, policies).
What is envelope encryption?
> ✅ Data encrypted with data key, data key encrypted with KMS key. Used for large data.
What is the GenerateDataKey API?
> ✅ Returns a plaintext data key + encrypted copy. Use plaintext to encrypt data, store encrypted key with data.
EventBridge
What is EventBridge (formerly CloudWatch Events)?
> ✅ Serverless event bus. Route events from AWS services, SaaS, custom apps to targets (Lambda, SQS, etc.).
What is an EventBridge rule?
> ✅ Matches incoming events (by pattern or schedule) and routes to target(s).
ElastiCache
What's the difference between Redis and Memcached in ElastiCache?
> ✅ Redis — Multi-AZ, replication, persistence, complex data types
>
> ✅ Memcached — Simple key-value, multi-threaded, no persistence, horizontal scaling
What is Lazy Loading (Cache-Aside) pattern?
> ✅ App checks cache first → on miss, fetches from DB → stores in cache → returns. Only requested data is cached.
What is Write-Through caching?
> ✅ Write to cache AND DB on every update. Cache always current, but write penalty and cache churn.
What is the main drawback of Lazy Loading?
> ✅ Cache miss = 3 network calls (check cache, query DB, write cache). Also, data can become stale.
When would you use Redis over Memcached?
> ✅ When you need: Multi-AZ, persistence, complex data structures (sorted sets, lists), pub/sub, or backup/restore.
What is TTL in caching?
> ✅ Time-To-Live — Automatic expiration of cached items. Balance freshness vs cache hit rate.