AWS Certified AI Practitioner (AIF-C01) Ultimate Cheat Sheet
Your Quick Reference Study Guide
This cheat sheet covers the core concepts, terms, and definitions you need to know for the AWS Certified AI Practitioner (AIF-C01). We've distilled the most important domains, topics, and critical details to help your exam preparation.
💡 Note: While this study guide highlights essential concepts, it's designed to complement—not replace—comprehensive learning materials. Use it for quick reviews, last-minute prep, or to identify areas that need deeper study before your exam.
About This Cheat Sheet: This study guide covers core concepts for AWS Certified AI Practitioner (AIF-C01). It highlights key terms, definitions, common mistakes, and frequently confused topics to support your exam preparation.
Use this as a quick reference alongside comprehensive study materials.
Fundamentals of AI and ML (20%)
Data Preprocessing (ETL vs ELT & Pipelines)
Clean, validate and transform raw data; pick ETL/ELT and maintain identical pipelines for training and inference.
Key Insight
Training and inference must use the same transforms; choose ETL vs ELT by scale, latency and storage strategy.
Common Mistakes
- Assume ETL must transform before load; ELT (load-first) is valid for data lakes.
- Skip preprocessing at inference — mismatched transforms break predictions.
- Believe more transformations always improve models — they can overfit or slow pipelines.
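The "identical pipelines for training and inference" point can be sketched in plain Python (a toy illustration, not an AWS API): statistics are learned from the training data once, then reused verbatim at inference.

```python
import numpy as np

# Illustrative sketch: learn preprocessing statistics from the TRAINING
# data only, then reuse those exact statistics at inference so both
# stages apply identical transforms.
def fit_preprocessor(X_train):
    median = np.nanmedian(X_train, axis=0)          # imputation value per column
    filled = np.where(np.isnan(X_train), median, X_train)
    return {"median": median, "mean": filled.mean(axis=0), "std": filled.std(axis=0)}

def transform(X, stats):
    X = np.where(np.isnan(X), stats["median"], X)   # same imputation as training
    return (X - stats["mean"]) / stats["std"]       # same scaling as training

X_train = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0]])
stats = fit_preprocessor(X_train)                      # fit once, on training data
X_infer = transform(np.array([[2.0, np.nan]]), stats)  # reuse at inference
```

Re-fitting the statistics on inference data (or skipping the transform entirely) is exactly the mismatch the bullets above warn about.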
Feature Engineering & Leakage Avoidance
Create predictive inputs: encode, scale, impute and avoid leakage so reported metrics reflect real performance.
Key Insight
Never include target-derived data in features — leakage inflates metrics; choose encodings by cardinality.
Common Mistakes
- Adding features indiscriminately hurts generalization and increases noise.
- Defaulting to one-hot for high-cardinality categories — use hashing, target encoding or embeddings.
- Assume features must be numeric — encode/bucket categorical and time features appropriately.
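Target leakage is easiest to see with target encoding. A hypothetical sketch (the split and data are made up for illustration): the category-to-mean-target map must be built from training rows only.

```python
import numpy as np

# Hypothetical target-encoding sketch: if the encoding is computed over
# ALL rows, test-set targets leak into the feature and inflate metrics.
rng = np.random.default_rng(0)
cats = rng.integers(0, 3, size=12)                 # a categorical feature
y = rng.integers(0, 2, size=12).astype(float)      # binary target
train, test = np.arange(10), np.arange(10, 12)

# LEAKY: encoding uses every row, test targets included
leaky = {c: y[cats == c].mean() for c in np.unique(cats)}

# SAFE: encoding uses training rows only; unseen categories fall back
# to the global training mean
fallback = y[train].mean()
safe = {c: y[train][cats[train] == c].mean() for c in np.unique(cats[train])}
test_feature = np.array([safe.get(c, fallback) for c in cats[test]])
```

The same train-only discipline applies to scalers, imputers, and any other fitted transform.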
Responsible AI Controls — Explainability, Fairness, Privacy
AWS-aligned principles and controls across the ML lifecycle to reduce bias, harm, and compliance risk.
Key Insight
Responsible AI = technical controls + governance + continuous monitoring; transparency is contextual, not dump-and-publish.
Common Mistakes
- Assuming technical fixes alone solve responsibility — governance and stakeholder processes are required
- Treating transparency as publishing weights or raw training data in all cases
- Relying only on output filters for toxicity — model-level mitigation and evaluation are needed
Amazon SageMaker — End-to-end ML Platform
Managed AWS service for building, training, tuning, deploying, and monitoring ML at scale; its resources carry ongoing costs.
Key Insight
SageMaker spans the full ML lifecycle, but resources (notebooks, endpoints, storage) can incur continuous charges and need ops/config.
Common Mistakes
- Thinking idle SageMaker notebooks cost nothing — volumes, endpoints, and attached resources can still bill
- Believing SageMaker is only for deployment — it also supports training, tuning, and model monitoring
- Assuming Model Monitor auto-fixes drift — it only detects and alerts unless you implement remediation
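A quick back-of-envelope for the "idle resources still bill" point. The hourly rate below is an assumption for illustration, not a quoted AWS price:

```python
# Assumed rate for one real-time inference instance (illustrative only;
# check current SageMaker pricing for real numbers).
HOURLY_RATE_USD = 0.23

def idle_endpoint_monthly_cost(instances=1, hours=730, rate=HOURLY_RATE_USD):
    # A deployed endpoint bills per instance-hour even with zero traffic.
    return instances * hours * rate

monthly = idle_endpoint_monthly_cost()   # roughly $168/month doing nothing
```

The remediation is operational: stop notebook instances and delete unused endpoints (e.g., via the console or the `delete_endpoint` API) rather than assuming idleness is free.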
Fundamentals of GenAI (24%)
Transformer (Self‑Attention Core)
Neural architecture using multi‑head self‑attention and positional encodings to model long‑range token dependencies.
Key Insight
Self‑attention connects every token pair; positional encodings provide order and attention costs O(n²) per layer.
Common Mistakes
- Assuming attention alone encodes token order — positional encodings still required.
- Using attention weights as literal explanations of model 'reasoning'.
- Thinking transformers scale linearly with sequence length — attention is O(n²).
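The core is small enough to sketch. A minimal single-head scaled dot-product attention in NumPy: the (n, n) score matrix over every token pair is where the O(n²) cost comes from, and note that nothing here depends on token order, which is why positional encodings are required.

```python
import numpy as np

# Minimal single-head scaled dot-product attention.
def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (n, n): every token pair
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)          # softmax over each row
    return w @ V, w

rng = np.random.default_rng(0)
n, d = 4, 8
Q, K, V = rng.standard_normal((3, n, d))
out, weights = attention(Q, K, V)    # out: (4, 8); each weight row sums to 1
```

Real transformers add multiple heads, learned Q/K/V projections, and positional encodings on top of this kernel.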
RAG — Retrieval‑Augmented Generation
Combine a retrieval step that supplies relevant documents with a generator that conditions on them to produce grounded responses.
Key Insight
RAG augments the model at inference with retrieved context (local/cloud index); it reduces hallucinations but doesn't eliminate them.
Common Mistakes
- Confusing RAG with fine‑tuning — RAG adds context at inference, it doesn't retrain model weights.
- Believing RAG guarantees factual accuracy — the generator can still hallucinate or misinterpret sources.
- Assuming RAG requires live web access — retrieval can use pre-indexed local or cloud stores.
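A toy sketch of the RAG shape with an in-memory "index" (the documents and overlap scoring are stand-ins; real systems use embeddings and a vector store, but the inference-time flow is the same: retrieve, then ground the prompt in what was retrieved).

```python
# Pre-indexed local store -- no live web access needed.
docs = {
    "returns": "Items may be returned within 30 days with a receipt.",
    "shipping": "Standard shipping takes 5 to 7 business days.",
}

def retrieve(question):
    q = set(question.lower().split())
    # naive relevance: word overlap (a stand-in for vector similarity)
    return max(docs.values(), key=lambda d: len(q & set(d.lower().split())))

def build_prompt(question):
    context = retrieve(question)           # retrieval happens at inference
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("Can items be returned within 30 days?")
```

No model weights change anywhere in this flow, and the generator receiving this prompt can still misread the context, which is why RAG reduces rather than eliminates hallucination.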
Foundation Models (FMs) & Amazon Bedrock
Pre‑trained large models accessed via Bedrock's managed, serverless API for selection, customization, and agents.
Key Insight
Bedrock is serverless managed access to multiple FMs — you don't provision GPUs; agents orchestrate FMs, not replace them.
Common Mistakes
- Assuming Bedrock requires you to provision/manage EC2 GPUs — it's serverless managed.
- Treating Agents as FMs or interchangeable with Lambda — agents orchestrate FMs and use connectors.
- Believing submitted customization data trains Bedrock's public base models by default.
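A hedged sketch of calling a Bedrock-hosted FM with boto3's Converse API. The model ID below is an example; model access must be enabled per account and region, and running `ask` requires AWS credentials. No GPU provisioning appears anywhere.

```python
def build_request(prompt, model_id="anthropic.claude-3-haiku-20240307-v1:0"):
    # Converse API request shape: a list of role/content messages.
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
    }

def ask(prompt):
    import boto3                                 # requires AWS credentials
    client = boto3.client("bedrock-runtime")     # serverless managed access
    resp = client.converse(**build_request(prompt))
    return resp["output"]["message"]["content"][0]["text"]
```

Agents for Amazon Bedrock sit a layer above calls like this, orchestrating FM invocations and tool/connector use rather than replacing the model.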
Generative AI: Models, Use Cases & Risks
Models that synthesize novel content (text, images, code); choose by modality, fidelity, and control needs.
Key Insight
Generative models generalize to create new outputs — pick type (autoregressive, diffusion, encoder–decoder) by output control and modality.
Common Mistakes
- Thinking GenAI replaces discriminative models for structured prediction tasks.
- Assuming every generative model is a deep neural net — hybrids and classical approaches exist.
- Limiting diffusion models to images — they can be adapted to other modalities.
Applications of Foundation Models (28%)
Textract, Comprehend & Transcribe — AWS IDP Trio
Managed OCR, language, and speech services to extract text, structure, entities and transcriptions.
Key Insight
Textract = structured doc extraction (text/tables/forms); Comprehend = NLU (entities/sentiment); Transcribe = audio → text.
Common Mistakes
- Assuming OCR is flawless regardless of image quality or layout.
- Treating managed services as equivalent to building custom ML models.
- Expecting perfect semantic field extraction without layout parsing and validation steps.
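A hedged sketch chaining two of the trio with boto3. The operation names are real API calls, but running this needs AWS credentials and your own document bytes; output quality still depends on image quality and layout, as the bullets above note.

```python
def extract_and_analyze(image_bytes):
    import boto3                             # requires AWS credentials
    textract = boto3.client("textract")
    comprehend = boto3.client("comprehend")

    # 1) Textract: OCR the document image into LINE blocks
    ocr = textract.detect_document_text(Document={"Bytes": image_bytes})
    text = " ".join(b["Text"] for b in ocr["Blocks"] if b["BlockType"] == "LINE")

    # 2) Comprehend: pull entities (names, dates, amounts) from that text
    nlu = comprehend.detect_entities(Text=text, LanguageCode="en")
    return text, nlu["Entities"]
```

Transcribe fills the audio slot of the trio via asynchronous transcription jobs rather than a single synchronous call like these.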
Unsupervised Learning — Find Structure, Not Labels
Clustering, anomaly detection, and dimensionality reduction to discover patterns in unlabeled data.
Key Insight
No labels — use for exploration, segmentation or outlier detection; evaluation and feature prep remain essential.
Common Mistakes
- Treating unsupervised methods like supervised training without label strategies.
- Assuming dimensionality reduction preserves original feature interpretability.
- Skipping preprocessing or scaling because the data is unlabeled.
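A minimal k-means sketch makes the card concrete: cluster structure is discovered with no labels at all, yet scaling the features first still matters.

```python
import numpy as np

# Minimal k-means: alternate assign-to-nearest-center and recompute-center
# steps until the partition stabilizes.
def kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign every point to its nearest center
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # move each center to the mean of its assigned points
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

X = np.array([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]])
X = (X - X.mean(axis=0)) / X.std(axis=0)     # scale before clustering
labels, centers = kmeans(X, k=2)
```

There is no accuracy score to lean on here; evaluation needs unsupervised criteria (e.g., inertia or silhouette) plus domain judgment.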
Model Size → Compute & Cost
Parameter count drives memory, compute, latency and cost — optimize with precision, batching, and caching.
Key Insight
Params matter, but activations, precision, sequence length and batching often dominate resource use and cost.
Common Mistakes
- Assume memory = parameter bytes only (ignores activations & runtime/framework overhead).
- Treat two models with the same parameter count as identical in resource needs or accuracy.
- Believe bigger parameter count always guarantees better accuracy or ROI.
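The parameter-bytes arithmetic is worth having at your fingertips, with the caveat the first bullet makes: this covers weights only.

```python
# Back-of-envelope memory for model WEIGHTS alone. Activations, KV cache,
# and framework overhead come on top and often dominate at inference.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(params_billions, precision):
    # 1e9 params x bytes-per-param / 1e9 bytes-per-GB = billions x bytes
    return params_billions * BYTES_PER_PARAM[precision]

# A 7B-parameter model: 28 GB fp32, 14 GB fp16, 7 GB int8 -- same
# parameter count, very different footprints.
footprints = {p: weight_memory_gb(7, p) for p in BYTES_PER_PARAM}
```

Lower precision, batching, and caching are the levers the card names for shrinking the rest of the footprint.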
Zero-shot Prompting
Instruction-only prompting to get tasks done from a pretrained model without task-specific labeled data.
Key Insight
Zero-shot exploits pretrained knowledge and instruction-following — prompt clarity, constraints and format drive success.
Common Mistakes
- Assume zero-shot means the model had no prior training.
- Treat zero-shot prompting as identical to zero-shot learning about unseen classes.
- Expect zero-shot to match fine-tuned accuracy for niche or domain-specific tasks.
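What a zero-shot prompt actually looks like (the task and wording are illustrative): no labeled examples, just a clear instruction, tight constraints, and an explicit output format.

```python
def zero_shot_prompt(review):
    # Instruction + constraint + format, but zero worked examples.
    return (
        "Classify the sentiment of the customer review below.\n"
        "Respond with exactly one word: positive, negative, or neutral.\n\n"
        f"Review: {review}"
    )

prompt = zero_shot_prompt("Checkout was fast and painless.")
```

Prepending a few labeled review/label pairs would turn this into few-shot prompting; neither approach changes the model's weights.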
Guidelines for Responsible AI (14%)
Model Metrics — Classification vs Regression (MSE)
Choose metrics tied to business cost: precision/recall for imbalance, AUC for ranking, MSE/MAE for regression.
Key Insight
Match metric to business impact (missed cost, false-alarm cost); monitor drift and set data-driven alert thresholds.
Common Mistakes
- Applying a universal threshold (e.g., 90% accuracy) regardless of context.
- Using accuracy as the sole metric for imbalanced classes.
- Assuming monitoring tools will auto-fix drift without human playbooks.
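The imbalanced-class trap in ten lines: a degenerate model that always predicts "negative" looks 95% accurate yet catches zero positives.

```python
# 5% positive class vs an "always predict negative" model.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_pos / sum(y_true)   # share of real positives caught

# accuracy -> 0.95, recall -> 0.0
```

If missing a positive is costly (fraud, disease), recall and precision, not accuracy, are the metrics tied to business impact.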
AI Governance & Accountability
Assigned roles, policies, audit trails and remediation plans that ensure safe, auditable AI — ongoing and cross‑team.
Key Insight
Accountability = assigned decision authority + evidence trails + cross‑functional oversight; transparency alone doesn't suffice.
Common Mistakes
- Thinking transparency (explanations) equals accountability.
- Treating governance as a one-time project rather than continuous oversight.
- Assuming only technical teams are responsible for governance.
Security, Compliance, and Governance for AI Solutions (14%)
AWS Shared Responsibility
Who secures what: AWS handles cloud infrastructure; you secure guest OS, apps, data — responsibility grows toward IaaS.
Key Insight
Security boundary shifts: customer responsibility increases moving SaaS → PaaS → IaaS; know who patches and who configures.
Common Mistakes
- Assuming AWS patches guest OS and apps on EC2
- Believing AWS certifications automatically make your workload compliant
- Thinking SaaS means AWS manages your application permissions and keys
IAM & MFA — Access Rules
Manage identities, roles and JSON policies (Effect/Action/Resource/Condition); enable MFA and enforce least-privilege.
Key Insight
Permissions are additive; explicit Deny, permission boundaries or Org SCPs can block Allows — and the root account is special.
Common Mistakes
- Treating root like a normal IAM user — it can't be fully constrained by IAM policies
- Believing user policies always override group policies — permissions add and Deny wins
- Thinking MFA eliminates the need for strong passwords
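The additive-then-Deny-wins logic can be modeled as a toy evaluator. This is a deliberate simplification: real IAM evaluation also weighs SCPs, permission boundaries, resource policies, and Condition elements.

```python
def evaluate(statements, action):
    # Start from implicit deny; Allows accumulate; explicit Deny wins.
    decision = "ImplicitDeny"
    for s in statements:
        if action in s["Action"]:
            if s["Effect"] == "Deny":
                return "Deny"          # explicit Deny overrides any Allow
            decision = "Allow"
    return decision

policy = [
    {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"]},
    {"Effect": "Deny",  "Action": ["s3:PutObject"]},
]
```

Here `s3:PutObject` is denied despite a matching Allow, `s3:GetObject` is allowed, and anything unmentioned stays implicitly denied, which is the exam-relevant ordering.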
Model Pricing & TCO Calculator
Estimate TCO: base‑model fees, fine‑tuning (HP searches & runs), checkpoint storage, inference, and egress.
Key Insight
Fine‑tuning cost = base fees + many training runs (hyperparameter searches) + checkpoint storage + egress.
Common Mistakes
- Treating inference as free — inference has per‑token/request or per‑sec GPU charges.
- Counting only the final training run — hyperparameter searches and repeated experiments multiply compute.
- Omitting checkpoint storage and cross‑region egress from TCO estimates.
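The cost breakdown above can be sketched as simple arithmetic. Every rate here is a made-up assumption for illustration, not an AWS price:

```python
GPU_HOUR = 4.00            # assumed cost per training GPU-hour (USD)
STORAGE_GB_MONTH = 0.023   # assumed checkpoint storage rate
EGRESS_GB = 0.09           # assumed cross-region egress rate

def finetune_tco(runs, hours_per_run, ckpt_gb, months, egress_gb):
    compute = runs * hours_per_run * GPU_HOUR       # ALL runs, not just the last
    storage = ckpt_gb * STORAGE_GB_MONTH * months   # checkpoints linger
    egress = egress_gb * EGRESS_GB
    return compute + storage + egress

# 20 hyperparameter-search runs dominate the estimate:
total = finetune_tco(runs=20, hours_per_run=8, ckpt_gb=500, months=3, egress_gb=100)
```

Counting only the final run here would report roughly one twentieth of the real compute bill, which is exactly the second mistake above.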
Pay‑As‑You‑Go (Opex) Model
Pay for consumed units (compute/storage); ideal for variable AI workloads but requires active cost controls.
Key Insight
Shifts CapEx to Opex—flexible but can spike; use monitoring, budgets, and commit/volume discounts for steady loads.
Common Mistakes
- Assuming pay‑as‑you‑go is always cheaper than buying hardware or reserved instances.
- Skipping cost monitoring because 'you only pay for use' — unexpected spikes still occur.
- Ignoring commitment/volume discounts — long‑term options and savings plans exist.
Similar Cheat Sheets
- CCNA Exam v1.1 (200-301) Cheat Sheet
- AWS Certified Cloud Practitioner (CLF-C02) Cheat Sheet
- Exam AI-900: Microsoft Azure AI Fundamentals Cheat Sheet
- Google Cloud Professional Cloud Architect Cheat Sheet
- Google Cloud Security Operations Engineer Exam Cheat Sheet
- Google Cloud Professional Cloud Security Engineer Cheat Sheet