AWS Certified Generative AI Developer - Professional (AIP-C01) Ultimate Cheat Sheet
Your Quick Reference Study Guide
This cheat sheet covers the core concepts, terms, and definitions you need to know for the AWS Certified Generative AI Developer - Professional (AIP-C01). We've distilled the most important domains, topics, and critical details to help your exam preparation.
💡 Note: While this study guide highlights essential concepts, it's designed to complement—not replace—comprehensive learning materials. Use it for quick reviews, last-minute prep, or to identify areas that need deeper study before your exam.
Foundation Model Integration, Data Management, and Compliance (31%)
Smart Chunking & Provenance Anchors
Split and normalize docs (fixed, semantic, overlapping, adaptive); attach anchors/timestamps for precise retrieval.
Key Insight
Overlap boosts recall but raises cost and duplicate noise; chunk size must preserve semantics and allow provenance tracing.
Common Mistakes
- Assuming more overlap always helps — increases cost and retrieval noise.
- Using tiny chunks that lose context and reduce relevance.
- Skipping source IDs/timestamps — destroys provenance and freshness checks.
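The overlap/provenance tradeoff above can be sketched in a few lines. This is a minimal illustration, not a production chunker; the `source_id` and field names are illustrative placeholders.

```python
def chunk_text(text, chunk_size=200, overlap=50, source_id="doc-001"):
    """Fixed-size chunking with overlap; each chunk carries provenance anchors."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        chunks.append({
            "text": piece,
            "source_id": source_id,             # provenance: which document
            "char_start": start,                # provenance: exact span for citation
            "char_end": start + len(piece),
        })
        if start + chunk_size >= len(text):     # last window already covered the tail
            break
    return chunks
```

Note how a 50-token overlap means each boundary region is indexed twice: better recall, but more storage and duplicate hits to dedupe at query time.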
Foundation Model Selection & Sizing
Choose FMs by modality, context window, latency, throughput, cost, license and tuning options to meet accuracy and risk.
Key Insight
Match model size and context window to latency/cost/compliance needs; use RAG or PEFT before escalating to a bigger model.
Common Mistakes
- Defaulting to the largest model — ignores latency, cost, and diminishing returns.
- Skipping retrieval or PEFT and expecting domain accuracy out‑of‑the‑box.
- Treating latency and throughput as the same — optimizing one can hurt the other.
Model Lifecycle & Retirement
Version models, provenance, and compatibility; use canaries, validated rollbacks, and retention policies for safe FM ops.
Key Insight
True reproducibility = model binary + training-data snapshot + metadata + runtime environment, not just a version number.
Common Mistakes
- Assuming a numeric version or binary-only versioning guarantees reproducibility.
- Swapping an alias without canary/validation — causes silent behavior changes in production.
- Treating rollback as redeploying a binary — ignores schema, feature-store, and downstream compatibility.
Amazon Bedrock (Managed FM API Layer)
Managed API access to third‑party and AWS foundation models — handles inference/routing, not vector storage or arbitrary model hosting.
Key Insight
Bedrock exposes provider FMs via API — it is not a vector DB or arbitrary model host; you must implement retries, fallbacks, and governance.
Common Mistakes
- Expecting Bedrock to act as a managed vector DB that stores/indexes your app data for RAG.
- Assuming you can upload and run arbitrary model binaries in Bedrock.
- Thinking Bedrock removes the need for retries, fallbacks, IAM controls, or audit integration.
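Since retries and fallbacks are your responsibility, a provider-agnostic sketch of retry-with-fallback helps; the invoker callables here would wrap bedrock-runtime InvokeModel calls, and the broad `except` is a simplification (production code should catch throttling errors specifically).

```python
import time

def invoke_with_fallback(invokers, prompt, max_retries=2, backoff_s=0.01):
    """Try each model invoker in order; retry transient failures with
    exponential backoff before falling back to the next model."""
    last_err = None
    for invoke in invokers:                      # ordered: primary first, fallback last
        for attempt in range(max_retries + 1):
            try:
                return invoke(prompt)
            except Exception as err:             # simplification; scope this in production
                last_err = err
                time.sleep(backoff_s * (2 ** attempt))
    raise RuntimeError("all models failed") from last_err
```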
FM Data-Gate: Validation Workflows
Automated structural, semantic, and safety checks plus synthetic tests to catch tokenization, preprocessing, and multimodal errors.
Key Insight
Combine targeted synthetic edge-cases with unit + integration + regression tests: unit finds parser/tokenizer bugs; integration/regression catch drift.
Common Mistakes
- Relying on synthetic data alone to validate real-world FM behavior.
- Using uniform/noise perturbations as adversaries — misses realistic modality corruptions.
- Skipping integration/regression tests because unit tests passed.
Bedrock Data Automation (BDA) — Async Multimodal Extractor
Managed async extractor that converts multimodal unstructured content into structured outputs written to S3; needs post‑processing and validation.
Key Insight
BDA is inference/processing only and writes results to S3 asynchronously — it does NOT train models, act as a low‑latency endpoint, or auto‑index into vector stores.
Common Mistakes
- Expecting BDA to fine‑tune or train foundation models.
- Treating BDA as a low‑latency/synchronous inference endpoint.
- Assuming outputs are auto‑indexed into vector stores or are PII‑clean without validation.
Similarity Metrics & Normalization
Compare and preprocess embeddings—pick metric and norm to trade retrieval quality, index size, and latency.
Key Insight
Dot-product == cosine only if vectors are L2-normalized; choose metric to match embedding geometry and index engine.
Common Mistakes
- Treating dot-product and cosine as always interchangeable — only equal with L2-normalized vectors.
- Applying L2-normalization blindly — it can hurt models that encode useful magnitude information.
- Calling PCA supervised — PCA is unsupervised and preserves variance, not class separability.
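The dot-product/cosine equivalence is worth seeing concretely — a pure-Python sketch:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    # cosine similarity = dot product of the L2-normalized vectors
    return dot(l2_normalize(a), l2_normalize(b))
```

On raw vectors the two metrics disagree (dot product scales with magnitude); after L2 normalization they are identical — which is why many index engines store normalized vectors and use the cheaper dot product.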
OpenSearch + Neural Plugin for Bedrock
OpenSearch vector search (knn_vector/dense_vector): index precomputed embeddings, tune engine, space_type, and shard/replica settings.
Key Insight
knn_vector uses k‑NN engines (FAISS/HNSW) and index.knn params; dense_vector needs script scoring—mapping controls metric/latency tradeoffs.
Common Mistakes
- Assuming OpenSearch auto-generates embeddings — embeddings must be produced externally (Bedrock, SageMaker, etc.).
- Treating knn_vector and dense_vector as interchangeable — they require different index settings, plugins, and scoring.
- Thinking more shards always reduce latency — extra shards increase fan‑out and coordination overhead.
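A representative `knn_vector` index body for the OpenSearch k-NN plugin, shown as a Python dict; the dimension (768) must match your external embedding model, and the engine/space_type/parameter values are tuning choices, not defaults to copy blindly.

```python
# Index body for the OpenSearch k-NN plugin; embeddings are produced
# externally (Bedrock, SageMaker, etc.) and indexed precomputed.
knn_index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 768,
                "method": {
                    "name": "hnsw",                 # graph-based ANN
                    "engine": "faiss",
                    "space_type": "l2",             # pick to match embedding geometry
                    "parameters": {"ef_construction": 128, "m": 16},
                },
            },
            "source_id": {"type": "keyword"},       # provenance metadata beside the vector
        }
    },
}
```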
ANN — HNSW, IVF & Quantization Tuning
Approximate nearest-neighbor indexes (HNSW/IVF/quant) trade tiny recall for big latency and memory gains in RAG.
Key Insight
Tuning is a 3‑way tradeoff: recall vs latency vs memory — adjust efConstruction/efSearch, nprobe/cluster count, and quantization accordingly.
Common Mistakes
- Expecting ANN to always match exact k-NN top-k — small ranking differences are normal.
- Assuming efConstruction only affects build time — poor construction hurts query recall.
- Increasing IVF partitions without raising nprobe can reduce, not improve, recall.
Bedrock KB — Managed Vector Store & Provenance
Bedrock provides a hosted vector store with hierarchical docs and provenance-aware retrieval to ground LLM answers with traceable sources.
Key Insight
Managed vectors speed integration but still require semantic chunking, a metadata schema, and explicit sync to ensure accurate, provable retrieval.
Common Mistakes
- Treating Bedrock's store as a drop‑in replacement for custom sharding or advanced index tuning.
- Assuming hierarchical grouping alone guarantees relevance — chunking and metadata design determine quality.
- Skipping ingestion/sync setup — Bedrock won't auto-sync source systems unless configured.
Prompt Engineering (Templates & Context Windows)
Design, iterate, and validate instruction templates and context flows to produce predictable, testable FM outputs.
Key Insight
Control intent, output schema, and context order — well-structured templates + retrieval beat just longer prompts.
Common Mistakes
- Assuming longer prompts always improve output quality
- Expecting a single prompt to work unchanged across models or contexts
- Believing prompting alone can replace fine-tuning or retrieval augmentation
Hallucination Detection & Mitigation
Detect and reduce fabricated outputs using grounding (RAG), verification chains, provenance, and conservative fallbacks.
Key Insight
Grounding + automated verification (source checks, answer validation, conservative responses) is the primary defense — decoding tweaks alone won't fix hallucinations.
Common Mistakes
- Treating model token-level confidence as factual correctness
- Assuming retrieval guarantees elimination of hallucinations
- Relying on greedy/low-temperature decoding alone to prevent fabrications
Implementation and Integration (26%)
Agentic AI — Router & Orchestrator
Models + routing rules that map intent to tools/agents, coordinate steps, and manage short‑term state.
Key Insight
Routing, state, and validation are distinct responsibilities — good routing sends work, memory preserves context, validation guarantees correctness.
Common Mistakes
- Assuming agents share one implicit global memory — syncing/consistency must be designed explicitly.
- Skipping runtime safeguards — omitting timeouts, circuit breakers, or result validation.
- Swapping rule routing for a learned router without benchmarking latency, cost, and error modes.
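The routing-vs-validation distinction can be made concrete with a minimal rule-based sketch; the intent keywords and agent names are hypothetical.

```python
ROUTES = {                      # hypothetical intent -> agent mapping
    "billing": "billing_agent",
    "search": "retrieval_agent",
}

def route_intent(utterance, routes=ROUTES, default="human_escalation"):
    """Routing decides *where* work goes; unmatched intents hit a safe default."""
    text = utterance.lower()
    for keyword, agent in routes.items():
        if keyword in text:
            return agent
    return default

def validate_result(result, max_len=4000):
    """Validation is a separate responsibility: it decides whether the
    routed agent's output is actually usable."""
    return isinstance(result, str) and 0 < len(result) <= max_len
```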
Bedrock Agents — AWS Managed Orchestration
AWS-managed agent orchestrator (default ReAct) that connects FMs, APIs, and data — integrations required for memory, databases, and guardrails.
Key Insight
Bedrock provides orchestration and connectors, not built‑in persistent memory or automatic production guardrails.
Common Mistakes
- Assuming Bedrock Agents include persistent long‑term memory or a built‑in vector DB.
- Deploying without developer guardrails — skipping timeouts, validation, or circuit breakers.
- Thinking Bedrock is closed‑box and can't call external APIs or third‑party models.
Batching Strategies — Static / Dynamic / Micro / Continuous
Group inference inputs to trade throughput vs per-item latency; pick by SLOs, token variance, and API limits.
Key Insight
Larger batches raise throughput but add queuing and tail latency; use micro/dynamic batching with size/time caps for tight SLOs.
Common Mistakes
- Assuming bigger batches always improve throughput — memory/context or API rate limits often cap gains.
- Believing batching always reduces per-request latency — queuing and tail-latency can increase end-to-end time.
- Ignoring token-length variance and padding — variable lengths inflate compute and ruin throughput estimates.
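A greedy dynamic-batching sketch that enforces both a batch-size cap and a token budget; a production server would add a time cap so short queues still flush quickly.

```python
def build_batches(requests, max_batch=8, max_tokens=1024):
    """Greedy dynamic batching: close the current batch when adding the next
    request would exceed the size cap or the token budget."""
    batches, current, current_tokens = [], [], 0
    for request_id, tokens in requests:      # requests = (id, token_count) pairs
        if current and (len(current) >= max_batch
                        or current_tokens + tokens > max_tokens):
            batches.append(current)          # close batch before it overflows
            current, current_tokens = [], 0
        current.append(request_id)
        current_tokens += tokens
    if current:
        batches.append(current)
    return batches
```

Token-length variance shows up immediately here: one long request can force a batch of size 1, which is exactly why throughput estimates based on average length mislead.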
Bedrock Provisioned Throughput (Model Units — MUs)
Buy dedicated Bedrock Model Units for guaranteed tokens/minute throughput; intended for sustained, high-volume inference.
Key Insight
Provisioned MUs reserve throughput and are billed hourly per model/region — they guarantee capacity, not fixed per-request latency.
Common Mistakes
- Treating provisioned throughput as the same as on‑demand serverless — provisioned is reserved, not per-call autoscale.
- Expecting billing per API call — MUs are hourly charges (commit terms alter price), not per-invocation fees.
- Assuming provisioned removes latency variability — input size and model compute still cause per-request latency differences.
Vector Stores (RAG Indexes)
Vector DBs for RAG: index build/update, chunking, similarity tuning, access control, and monitoring.
Key Insight
Embedding-model version + chunking choices set retrieval quality — change either and you must reindex or retune.
Common Mistakes
- Swapping embeddings from a new model into an old index without reindexing breaks nearest-neighbor relevance.
- Treating reindexing as instant — plan for long rebuilds; use versioned indices and atomic index swaps.
- Over-chunking to boost recall — tiny chunks fragment context and reduce coherent answers; match chunk size to query/context length.
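The versioned-index/atomic-swap pattern above can be sketched as a tiny in-memory model (it mirrors the alias-swap pattern in OpenSearch and similar engines; the class and index names are illustrative):

```python
class AliasedIndexStore:
    """Versioned indices behind an alias: rebuild off the serving path,
    then repoint the alias in one atomic step."""

    def __init__(self):
        self.indices = {}    # version name -> index contents
        self.alias = None    # the name queries actually resolve to

    def build(self, name, data):
        self.indices[name] = data    # slow rebuild, not yet visible to queries

    def swap(self, name):
        if name not in self.indices:
            raise KeyError(name)
        self.alias = name            # single pointer update = atomic cutover

    def query(self):
        return self.indices[self.alias]
```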
GenAI Security & Governance
Runtime sandboxing, IAM/VPC isolation, KMS/secrets controls, tenant routing, quotas, and telemetry partitioning.
Key Insight
Isolation must be layered: network + IAM + runtime sandbox + telemetry scoping — missing any layer enables leakage.
Common Mistakes
- Relying on namespaces/ACLs alone — logical separation doesn't stop shared-pipeline leaks.
- Assuming encryption prevents prompt-injection or inference-time data exfiltration.
- Creating per-tenant endpoints but sharing logs/metrics — telemetry still mixes tenant data unless partitioned.
Cost‑Aware Model Cascades
Route requests across models (static or dynamic) to balance cost, latency, and output quality.
Key Insight
Start cheap and escalate using calibrated confidence/telemetry — avoid calling every model in parallel to save compute.
Common Mistakes
- Always start with the smallest model — triggers costly fallbacks and higher end-to-end latency.
- Routing purely by per-call token cost — ignores model capability and confidence, causing quality drops.
- Designing dynamic routing that invokes all candidate models in parallel — wastes compute and inflates cost.
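A sequential cascade in miniature — note the tiers are tried in order, never in parallel; each tier is a callable returning `(answer, confidence)`, where the calibrated-confidence source is an assumption of this sketch.

```python
def cascade(prompt, tiers, threshold=0.8):
    """Cost-aware cascade: call the cheapest tier first and escalate only
    when calibrated confidence falls below the threshold."""
    answer, confidence = None, 0.0
    for model in tiers:                  # ordered cheap -> expensive
        answer, confidence = model(prompt)
        if confidence >= threshold:
            break                        # good enough: don't pay for larger models
    return answer, confidence
```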
API Gateway: FM Front Door
Use API Gateway to validate/transform requests, enforce auth/throttling, and normalize errors for Bedrock FMs.
Key Insight
Gateway offloads shaping/early validation and consistent errors, but it doesn't replace backend safety, retries, or fine-grained auth.
Common Mistakes
- Relying on API Gateway to prevent hallucinations or guarantee model input safety.
- Skipping backend validation because transformations ran at the gateway.
- Expecting API Gateway to auto-retry model invocations or fully hide timeouts from clients.
Token Streaming & Backpressure
Send model output token-by-token (SSE/WebSocket/HTTP stream) to lower perceived latency for real-time UIs.
Key Insight
Streaming improves perceived latency, not always total latency — you must frame/reassemble chunks, enforce backpressure, and integrate with gateways.
Common Mistakes
- Assuming streaming reduces total compute or end-to-end latency
- Treating SSE and WebSocket as functionally identical
- Expecting each chunk to be complete JSON — ignoring framing/reassembly
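The framing/reassembly point is easy to demonstrate: streamed chunks split a JSON payload at arbitrary character boundaries, so intermediate buffers fail to parse and only the complete buffer succeeds.

```python
import json

def reassemble_stream(chunks):
    """Buffer streamed text fragments and parse only the full payload.
    Returns the parsed result and how many mid-stream buffers failed."""
    parsed, failed = None, 0
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        try:
            parsed = json.loads(buffer)   # succeeds only once complete
        except json.JSONDecodeError:
            failed += 1                   # expected mid-stream; keep buffering
    return parsed, failed
```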
OpenAPI for GenAI (API‑First)
Define FM-facing HTTP/JSON endpoints, schemas and metadata (rate limits, streaming hints) so integrations are consistent.
Key Insight
OpenAPI documents the API surface (and can carry streaming/extensions metadata) but does NOT implement runtime enforcement or reveal model internals.
Common Mistakes
- Believing an OpenAPI file enforces rate limits/auth at runtime
- Thinking OpenAPI describes model internals or training data
- Assuming versioned spec auto-resolves backward compatibility
AI Safety, Security, and Governance (20%)
Bedrock Guardrails (ApplyGuardrails)
Runtime pre/post checks that block, redact, label, and enforce decoupled safety policies on Bedrock model calls.
Key Insight
Guardrails are an enforcement middleware around model calls—not model fine‑tuning—and must be orchestrated with logging, redaction, and human review.
Common Mistakes
- Thinking guardrails change model weights—they only intercept and transform I/O.
- Assuming ApplyGuardrails removes all hallucinations or guarantees accuracy.
- Expecting ApplyGuardrails to auto-provide full audit/compliance records without extra config.
Prompt Injection & Jailbreak Defense
Detect and block attempts to override system instructions — apply layered runtime detectors, context separation, and RBAC.
Key Insight
Injection payloads can come from user input, RAG-retrieved docs, or tool outputs — treat all context as untrusted and enforce provenance and signed tool outputs.
Common Mistakes
- Believing keyword removal or a single regex will stop all injections.
- Only checking external user prompts—ignoring model-generated context, RAG hits, or tool outputs.
- Assuming encrypted logs or 'immutable' system prompts alone prevent runtime exfiltration.
Least‑Privilege for Foundation Models (Bedrock)
Scope Bedrock/FM access with IAM/ABAC, resource policies, and short‑lived scoped tokens to separate inference, tuning, and administration.
Key Insight
AuthN ≠ AuthZ: combine ABAC attributes, resource policies and scoped STS tokens to restrict inference vs customization.
Common Mistakes
- Relying on TLS or a shared API key as 'secure enough' — still need fine‑grained authorization
- Using one broad IAM role or wildcard (e.g., "bedrock:*") across tenants to 'simplify' access
- Assuming ABAC alone removes the need for scoped policy statements or explicit denies
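A scoped policy shape, shown as a Python dict: inference allowed on one specific model, customization explicitly denied. The region, model ID, and Sid values are illustrative placeholders.

```python
# Least-privilege sketch: Allow inference on one model, Deny customization.
least_privilege_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowInvokeOneModel",
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream",
            ],
            "Resource": "arn:aws:bedrock:us-east-1::foundation-model/EXAMPLE-MODEL-ID",
        },
        {
            "Sid": "DenyCustomization",
            "Effect": "Deny",    # an explicit Deny overrides any Allow
            "Action": ["bedrock:CreateModelCustomizationJob"],
            "Resource": "*",
        },
    ],
}
```

Contrast this with `"bedrock:*"` on `"*"`: the explicit Deny plus a narrow Resource ARN is what actually separates inference from customization.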
IAM for GenAI: Roles, Policies & MFA
Apply least‑privilege JSON policies, use short‑lived roles/STS, federation for users, and MFA for humans in GenAI flows.
Key Insight
Policy eval rules matter: permissions union; explicit Deny overrides. Use role separation, STS sessions and SCPs for boundaries.
Common Mistakes
- Using the root account or long‑lived IAM user keys for routine tasks
- Assuming MFA changes authorization or reduces granted permissions
- Thinking multiple attached policies conflict — they combine; explicit Deny still beats Allows
Forensic Traceability — Hash Chains, Merkle, WORM & KMS
Provable, append-only records of prompts/interactions using hash chains, Merkle proofs, signatures, WORM storage, and KMS.
Key Insight
Integrity requires cryptographic anchors + secure key separation + independent notarization; storage alone isn't proof.
Common Mistakes
- Assuming raw, unredacted prompts are safe to store for debugging or compliance.
- Believing a single S3 bucket automatically makes logs immutable and tamper-proof.
- Thinking encryption or a hash chain alone proves integrity without key separation or notarization.
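The hash-chain idea above in miniature: each entry commits to the previous entry's hash, so tampering with any record breaks every later link. (Key separation, signatures, and notarization are still needed on top, as the insight says.)

```python
import hashlib
import json

GENESIS = "0" * 64

def append_record(chain, record):
    """Append an entry that commits to the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    payload = json.dumps(record, sort_keys=True)   # canonical serialization
    entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    chain.append({"record": record, "prev": prev_hash, "hash": entry_hash})
    return chain

def verify_chain(chain):
    """Recompute every link; any tampered record invalidates the chain."""
    prev = GENESIS
    for entry in chain:
        payload = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev + payload).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True
```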
CloudTrail — API & Config Audit Trail
AWS service that records API calls and config changes; enable data events to capture S3/Lambda and FM-related activity.
Key Insight
CloudTrail records and retains events but doesn't enforce actions; data events and full payload capture must be explicitly enabled.
Common Mistakes
- Expecting CloudTrail to include full request/response payloads by default.
- Relying on CloudTrail to prevent or block unauthorized actions.
- Assuming data-plane events (S3/Lambda object details) are recorded without enabling them.
Data Masking & Privacy Tech (DP, SMPC, HE)
Pick masking, differential privacy, or crypto (SMPC/HE) by data type — balance re‑id risk vs utility.
Key Insight
Noise must be calibrated to sensitivity and ε,δ; cryptographic methods protect computation/privacy but add cost and limit analytic utility.
Common Mistakes
- Assuming any added noise = differential privacy — noise must match sensitivity and ε,δ.
- Believing DP eliminates all re‑identification risk — it bounds worst‑case leakage, not absolute safety.
- Treating masking/token removal as the same as DP — masking hides fields; DP provides statistical privacy guarantees.
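The calibration point — noise scale is fixed by sensitivity and ε, not chosen ad hoc — can be shown with the Laplace mechanism (pure ε-DP; the δ term applies to Gaussian-style mechanisms not shown here):

```python
import random

def laplace_noise(sensitivity, epsilon):
    """Sample Laplace(0, b) with scale b = sensitivity / epsilon.
    |X| for a Laplace(0, b) variable is Exponential with mean b."""
    b = sensitivity / epsilon
    magnitude = random.expovariate(1.0 / b)
    return magnitude if random.random() < 0.5 else -magnitude

def dp_count(true_count, epsilon):
    """A counting query has sensitivity 1, so the noise scale is 1/epsilon --
    that calibration is what makes this differential privacy, not the mere
    presence of noise."""
    return true_count + laplace_noise(1.0, epsilon)
```

Smaller ε means a larger noise scale and stronger privacy; the utility cost is explicit in `b = sensitivity / epsilon`.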
Grounding & Source Attribution (RAG Provenance)
Ensure outputs link to retrieved passages, citations and retrieval metadata so answers are verifiable and auditable.
Key Insight
A citation or URL alone isn't proof — surface the supporting passage, retrieval context, and metadata (score, timestamp, doc id).
Common Mistakes
- Assuming any citation or URL guarantees the answer is correct.
- Thinking grounding removes all hallucinations so human review isn't needed.
- Relying on a high retrieval score alone as proof that a source is authoritative.
Operational Efficiency and Optimization for GenAI Applications (12%)
Inference Deployment Patterns (SageMaker & Bedrock)
Map real‑time, async/queue, serverless and multi‑model patterns to SageMaker/Bedrock using latency vs cost tradeoffs.
Key Insight
Pick by traffic shape: steady high‑QPS → dedicated real‑time; bursty/low‑QPS → serverless or async; MMEs save on model loads but don't remove GPU/memory limits.
Common Mistakes
- Assuming MMEs are always cheaper — ignores request frequency, model load latency and caching
- Believing MMEs remove GPU/memory limits — instance sizing, sharding, and model size still constrain you
- Treating asynchronous inference as streaming/real‑time — async adds queueing and higher end‑to‑end latency
Token Accounting & Tracking
Log and reconcile input + output + system tokens per request with the model tokenizer, and monitor aggregated trends for cost drift.
Key Insight
Billing = input + output + hidden/system tokens; use the model's tokenizer server‑side and time‑series alerts to catch drift and leaks.
Common Mistakes
- Estimating tokens from character count — tokenizer rules (byte‑pair) differ widely
- Trusting client‑side estimates without server reconciliation — billed tokens may differ
- Ignoring output and system tokens — they can be a large portion of cost
Token Window & Quota Control (TPM/RPM)
Measure and control every token (system, assistant, retrieved) to meet context windows, cost, and quota SLAs.
Key Insight
All messages and retrieved context consume tokens — use the model tokenizer to count, then truncate, compress, chunk, cache, or stream to fit the context window.
Common Mistakes
- Treating character count as tokens — always measure with the target model's tokenizer.
- Over-compressing/truncating context and losing required facts — test output quality after each reduction.
- Caching prompts without versioning/validation — caches go stale or leak private data.
Model & Infra Right-Sizing (GPU, Inferentia, Graviton)
Match model variant and accelerator to SLAs: benchmark accuracy vs latency/cost, then optimize with quantization and compilation.
Key Insight
The cheapest/fastest real-world choice is empirical — benchmark model variants on the target instance/accelerator, then use quantization/compilation and right‑size instances.
Common Mistakes
- Deploying the largest model by default — may break latency and cost SLAs without benchmark data.
- Choosing hardware by peak FLOPS only — memory bandwidth, drivers, and kernel support change real latency.
- Assuming batching always lowers latency — batching can increase per-request and tail latency if misused.
GenAI Observability — Tokens, Traces & SLOs
Collect traces, metrics, and token-aware telemetry (time-to-first-token, per-token latency/cost, quality) tied by causal IDs for SLO-led alerting.
Key Insight
Correlate prompt → tokens → response with causal IDs; instrument model latency, time-to-first-token, per-token latency, hallucination rates, and token‑level costs.
Common Mistakes
- Logging prompts alone — that won't reconstruct flows; use traces, causal IDs, and metrics.
- Equating SLOs with SLAs — alert on measurable SLOs and error budgets, not legal SLAs.
- Watching only latency/availability — misses quality and token-cost signals (hallucination, relevance, per-token cost).
Vector DB Monitoring — Latency, Recall & Index Health
Track p50/p95/p99 latency, QPS, embedding-similarity distributions, recall@k/MRR/nDCG, freshness, and ingestion/compaction health.
Key Insight
Low latency ≠ good retrieval — combine operational metrics (p95/p99, compaction, replication, ingestion lag) with retrieval quality (recall@k, MRR, nDCG).
Common Mistakes
- Only monitoring query latency — ignoring recall, freshness, and embedding drift.
- Assuming high similarity scores guarantee correct answers — ranking and context matter.
- Believing an existing index equals a healthy one — check staleness, compaction, and replication status.
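The core quality metric is small enough to compute inline — recall@k against a labeled relevance set:

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of truly relevant documents that appear in the top-k results.
    Monitor alongside p95/p99 latency: a fast index can still return the
    wrong documents."""
    if not relevant_ids:
        return 0.0
    hits = set(retrieved_ids[:k]) & set(relevant_ids)
    return len(hits) / len(relevant_ids)
```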
Testing, Validation, and Troubleshooting (11%)
Responsible Model Validation (CI + H2H)
CI-driven automated + human-in-the-loop tests for quality, safety, hallucination checks, and controlled rollouts.
Key Insight
Automated metrics catch numeric regressions; human review and adversarial/synthetic tests expose hallucinations and safety failures.
Common Mistakes
- Relying only on automated metrics to declare production readiness.
- Treating p < 0.05 as proof of business impact without effect-size/context.
- Relying only on unit tests; skipping integration/canary and human review.
Drift Detection & Remediation
Continuously detect input, concept, and performance drift via stats, embedding divergence, telemetry, and triggerable SLO alerts.
Key Insight
Correlate embedding/statistical shifts with labeled performance and infra telemetry to separate transient anomalies from real drift.
Common Mistakes
- Retraining immediately on any statistical shift without impact analysis.
- Monitoring inputs only; ignoring outputs and labeled performance.
- Treating a single spike as persistent drift; not correlating with infra or user-change signals.
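One common statistical-shift detector is the Population Stability Index (PSI); a sketch for a bounded score/feature, with the usual rule-of-thumb thresholds noted as assumptions to tune per feature (< 0.1 stable, 0.1–0.25 moderate, > 0.25 investigate before retraining):

```python
import math

def psi(baseline, live, bins=10, lo=0.0, hi=1.0, eps=1e-6):
    """Population Stability Index between a baseline and a live sample."""
    width = (hi - lo) / bins

    def proportions(xs):
        counts = [0] * bins
        for x in xs:
            idx = min(bins - 1, max(0, int((x - lo) / width)))
            counts[idx] += 1
        return [max(c / len(xs), eps) for c in counts]   # eps avoids log(0)

    base_p, live_p = proportions(baseline), proportions(live)
    return sum((lp - bp) * math.log(lp / bp)
               for bp, lp in zip(base_p, live_p))
```

Per the insight above, a high PSI alone is a trigger for investigation (correlate with labeled performance and infra telemetry), not an automatic retrain.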
RAG (Retrieval‑Augmented Generation): Index→Embed→Ground
End-to-end retrieval + grounding: index, chunk, embed, and trace to locate and stop hallucinations.
Key Insight
Hallucinations often originate in retrieval—indexing, chunking, embedding model or similarity params—not only the LLM.
Common Mistakes
- Blaming the LLM first — skipping inspection of index quality and retrieval logs.
- Defaulting to larger chunks — oversized chunks dilute context and reduce relevance.
- Treating retrieved sources as authoritative — citations ≠ correctness.
Prompt Governance: Versioning, Testing & Rollouts
Manage prompts like code: version, template, test, stage rollouts, and monitor metrics to prevent regressions.
Key Insight
Treat prompts as deployable artifacts—use CI/CD, unit/regression tests and canary/A‑B rollouts to trace regressions to wording.
Common Mistakes
- Treating prompts as informal — skipping versioning and approvals.
- Hot-fixing tiny wording changes in prod without tests or a canary rollout.
- Relying only on raw I/O logs — skipping structured tests and metrics for regressions.
Similar Cheat Sheets
- CCNA Exam v1.1 (200-301) Cheat Sheet
- AWS Certified Cloud Practitioner (CLF-C02) Cheat Sheet
- AWS Certified AI Practitioner (AIF-C01) Cheat Sheet
- Exam AI-900: Microsoft Azure AI Fundamentals Cheat Sheet
- Google Cloud Professional Cloud Architect Cheat Sheet
- Google Cloud Security Operations Engineer Exam Cheat Sheet