AWS Certified AI Practitioner (AIF-C01) Ultimate Cheat Sheet
Your Quick Reference Study Guide
This cheat sheet covers the core concepts, terms, and definitions you need to know for the AWS Certified AI Practitioner (AIF-C01). We've distilled the most important domains, topics, and critical details to help your exam preparation.
💡 Note: While this study guide highlights essential concepts, it's designed to complement—not replace—comprehensive learning materials. Use it for quick reviews, last-minute prep, or to identify areas that need deeper study before your exam.
About This Cheat Sheet: This study guide covers core concepts for AWS Certified AI Practitioner (AIF-C01). It highlights key terms, definitions, common mistakes, and frequently confused topics to support your exam preparation.
Use this as a quick reference alongside comprehensive study materials.
Fundamentals of AI and ML (20%)
Data Preprocessing (ETL vs ELT & Pipelines)
Clean, validate and transform raw data; pick ETL/ELT and maintain identical pipelines for training and inference.
Key Insight
Training and inference must use the same transforms; choose ETL vs ELT by scale, latency and storage strategy.
Common Mistakes
- Assume ETL must transform before load; ELT (load-first) is valid for data lakes.
- Skip preprocessing at inference — mismatched transforms break predictions.
- Believe more transformations always improve models — they can overfit or slow pipelines.
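The "identical pipelines for training and inference" point can be sketched in plain Python (a toy illustration, not an AWS API): statistics are learned from the training data once, then reused verbatim at inference.

```python
import numpy as np

# Illustrative sketch: learn preprocessing statistics from the TRAINING
# data only, then reuse those exact statistics at inference so both
# stages apply identical transforms.
def fit_preprocessor(X_train):
    median = np.nanmedian(X_train, axis=0)          # imputation value per column
    filled = np.where(np.isnan(X_train), median, X_train)
    return {"median": median, "mean": filled.mean(axis=0), "std": filled.std(axis=0)}

def transform(X, stats):
    X = np.where(np.isnan(X), stats["median"], X)   # same imputation as training
    return (X - stats["mean"]) / stats["std"]       # same scaling as training

X_train = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0]])
stats = fit_preprocessor(X_train)                      # fit once, on training data
X_infer = transform(np.array([[2.0, np.nan]]), stats)  # reuse at inference
```

Re-fitting the statistics on inference data (or skipping the transform entirely) is exactly the mismatch the bullets above warn about.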
Feature Engineering & Leakage Avoidance
Create predictive inputs: encode, scale, impute and avoid leakage so reported metrics reflect real performance.
Key Insight
Never include target-derived data in features — leakage inflates metrics; choose encodings by cardinality.
Common Mistakes
- Adding features indiscriminately hurts generalization and increases noise.
- Defaulting to one-hot for high-cardinality categories — use hashing, target encoding or embeddings.
- Assume features must be numeric — encode/bucket categorical and time features appropriately.
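Target leakage is easiest to see with target encoding. A hypothetical sketch (the split and data are made up for illustration): the category-to-mean-target map must be built from training rows only.

```python
import numpy as np

# Hypothetical target-encoding sketch: if the encoding is computed over
# ALL rows, test-set targets leak into the feature and inflate metrics.
rng = np.random.default_rng(0)
cats = rng.integers(0, 3, size=12)                 # a categorical feature
y = rng.integers(0, 2, size=12).astype(float)      # binary target
train, test = np.arange(10), np.arange(10, 12)

# LEAKY: encoding uses every row, test targets included
leaky = {c: y[cats == c].mean() for c in np.unique(cats)}

# SAFE: encoding uses training rows only; unseen categories fall back
# to the global training mean
fallback = y[train].mean()
safe = {c: y[train][cats[train] == c].mean() for c in np.unique(cats[train])}
test_feature = np.array([safe.get(c, fallback) for c in cats[test]])
```

The same train-only discipline applies to scalers, imputers, and any other fitted transform.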
Responsible AI Controls — Explainability, Fairness, Privacy
AWS-aligned principles and controls across the ML lifecycle to reduce bias, harm, and compliance risk.
Key Insight
Responsible AI = technical controls + governance + continuous monitoring; transparency is contextual, not dump-and-publish.
Common Mistakes
- Assuming technical fixes alone solve responsibility — governance and stakeholder processes are required
- Treating transparency as publishing weights or raw training data in all cases
- Relying only on output filters for toxicity — model-level mitigation and evaluation are needed
Amazon SageMaker — End-to-end ML Platform
Managed AWS service for building, training, tuning, deploying, and monitoring ML at scale; its resources carry ongoing costs.
Key Insight
SageMaker spans the full ML lifecycle, but resources (notebooks, endpoints, storage) can incur continuous charges and need ops/config.
Common Mistakes
- Thinking idle SageMaker notebooks cost nothing — volumes, endpoints, and attached resources can still bill
- Believing SageMaker is only for deployment — it also supports training, tuning, and model monitoring
- Assuming Model Monitor auto-fixes drift — it only detects and alerts unless you implement remediation
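A quick back-of-envelope for the "idle resources still bill" point. The hourly rate below is an assumption for illustration, not a quoted AWS price:

```python
# Assumed rate for one real-time inference instance (illustrative only;
# check current SageMaker pricing for real numbers).
HOURLY_RATE_USD = 0.23

def idle_endpoint_monthly_cost(instances=1, hours=730, rate=HOURLY_RATE_USD):
    # A deployed endpoint bills per instance-hour even with zero traffic.
    return instances * hours * rate

monthly = idle_endpoint_monthly_cost()   # roughly $168/month doing nothing
```

The remediation is operational: stop notebook instances and delete unused endpoints (e.g., via the console or the `delete_endpoint` API) rather than assuming idleness is free.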
Fundamentals of GenAI (24%)
Transformer (Self‑Attention Core)
Neural architecture using multi‑head self‑attention and positional encodings to model long‑range token dependencies.
Key Insight
Self‑attention connects every token pair; positional encodings provide order and attention costs O(n²) per layer.
Common Mistakes
- Assuming attention alone encodes token order — positional encodings still required.
- Using attention weights as literal explanations of model 'reasoning'.
- Thinking transformers scale linearly with sequence length — attention is O(n²).
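The core is small enough to sketch. A minimal single-head scaled dot-product attention in NumPy: the (n, n) score matrix over every token pair is where the O(n²) cost comes from, and note that nothing here depends on token order, which is why positional encodings are required.

```python
import numpy as np

# Minimal single-head scaled dot-product attention.
def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (n, n): every token pair
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)          # softmax over each row
    return w @ V, w

rng = np.random.default_rng(0)
n, d = 4, 8
Q, K, V = rng.standard_normal((3, n, d))
out, weights = attention(Q, K, V)    # out: (4, 8); each weight row sums to 1
```

Real transformers add multiple heads, learned Q/K/V projections, and positional encodings on top of this kernel.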
RAG — Retrieval‑Augmented Generation
Combine a retrieval step that supplies relevant documents with a generator that conditions on them to produce grounded responses.
Key Insight
RAG augments the model at inference with retrieved context (local/cloud index); it reduces hallucinations but doesn't eliminate them.
Common Mistakes
- Confusing RAG with fine‑tuning — RAG adds context at inference, it doesn't retrain model weights.
- Believing RAG guarantees factual accuracy — the generator can still hallucinate or misinterpret sources.
- Assuming RAG requires live web access — retrieval can use pre-indexed local or cloud stores.
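A toy sketch of the RAG shape with an in-memory "index" (the documents and overlap scoring are stand-ins; real systems use embeddings and a vector store, but the inference-time flow is the same: retrieve, then ground the prompt in what was retrieved).

```python
# Pre-indexed local store -- no live web access needed.
docs = {
    "returns": "Items may be returned within 30 days with a receipt.",
    "shipping": "Standard shipping takes 5 to 7 business days.",
}

def retrieve(question):
    q = set(question.lower().split())
    # naive relevance: word overlap (a stand-in for vector similarity)
    return max(docs.values(), key=lambda d: len(q & set(d.lower().split())))

def build_prompt(question):
    context = retrieve(question)           # retrieval happens at inference
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("Can items be returned within 30 days?")
```

No model weights change anywhere in this flow, and the generator receiving this prompt can still misread the context, which is why RAG reduces rather than eliminates hallucination.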
Foundation Models (FMs) & Amazon Bedrock
Pre‑trained large models accessed via Bedrock's managed, serverless API for selection, customization, and agents.
Key Insight
Bedrock is serverless managed access to multiple FMs — you don't provision GPUs; agents orchestrate FMs, not replace them.
Common Mistakes
- Assuming Bedrock requires you to provision/manage EC2 GPUs — it's serverless managed.
- Treating Agents as FMs or interchangeable with Lambda — agents orchestrate FMs and use connectors.
- Believing submitted customization data trains Bedrock's public base models by default.
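A hedged sketch of calling a Bedrock-hosted FM with boto3's Converse API. The model ID below is an example; model access must be enabled per account and region, and running `ask` requires AWS credentials. No GPU provisioning appears anywhere.

```python
def build_request(prompt, model_id="anthropic.claude-3-haiku-20240307-v1:0"):
    # Converse API request shape: a list of role/content messages.
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
    }

def ask(prompt):
    import boto3                                 # requires AWS credentials
    client = boto3.client("bedrock-runtime")     # serverless managed access
    resp = client.converse(**build_request(prompt))
    return resp["output"]["message"]["content"][0]["text"]
```

Agents for Amazon Bedrock sit a layer above calls like this, orchestrating FM invocations and tool/connector use rather than replacing the model.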
Generative AI: Models, Use Cases & Risks
Models that synthesize novel content (text, images, code); choose by modality, fidelity, and control needs.
Key Insight
Generative models generalize to create new outputs — pick type (autoregressive, diffusion, encoder–decoder) by output control and modality.
Common Mistakes
- Thinking GenAI replaces discriminative models for structured prediction tasks.
- Assuming every generative model is a deep neural net — hybrids and classical approaches exist.
- Limiting diffusion models to images — they can be adapted to other modalities.
Applications of Foundation Models (28%)
Textract, Comprehend & Transcribe — AWS IDP Trio
Managed OCR, language, and speech services to extract text, structure, entities and transcriptions.
Key Insight
Textract = structured doc extraction (text/tables/forms); Comprehend = NLU (entities/sentiment); Transcribe = audio → text.
Common Mistakes
- Assuming OCR is flawless regardless of image quality or layout.
- Treating managed services as equivalent to building custom ML models.
- Expecting perfect semantic field extraction without layout parsing and validation steps.
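A hedged sketch chaining two of the trio with boto3. The operation names are real API calls, but running this needs AWS credentials and your own document bytes; output quality still depends on image quality and layout, as the bullets above note.

```python
def extract_and_analyze(image_bytes):
    import boto3                             # requires AWS credentials
    textract = boto3.client("textract")
    comprehend = boto3.client("comprehend")

    # 1) Textract: OCR the document image into LINE blocks
    ocr = textract.detect_document_text(Document={"Bytes": image_bytes})
    text = " ".join(b["Text"] for b in ocr["Blocks"] if b["BlockType"] == "LINE")

    # 2) Comprehend: pull entities (names, dates, amounts) from that text
    nlu = comprehend.detect_entities(Text=text, LanguageCode="en")
    return text, nlu["Entities"]
```

Transcribe fills the audio slot of the trio via asynchronous transcription jobs rather than a single synchronous call like these.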
Unsupervised Learning — Find Structure, Not Labels
Clustering, anomaly detection, and dimensionality reduction to discover patterns in unlabeled data.
Key Insight
No labels — use for exploration, segmentation or outlier detection; evaluation and feature prep remain essential.
Common Mistakes
- Treating unsupervised methods like supervised training without label strategies.
- Assuming dimensionality reduction preserves original feature interpretability.
- Skipping preprocessing or scaling because the data is unlabeled.
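A minimal k-means sketch makes the card concrete: cluster structure is discovered with no labels at all, yet scaling the features first still matters.

```python
import numpy as np

# Minimal k-means: alternate assign-to-nearest-center and recompute-center
# steps until the partition stabilizes.
def kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign every point to its nearest center
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # move each center to the mean of its assigned points
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

X = np.array([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]])
X = (X - X.mean(axis=0)) / X.std(axis=0)     # scale before clustering
labels, centers = kmeans(X, k=2)
```

There is no accuracy score to lean on here; evaluation needs unsupervised criteria (e.g., inertia or silhouette) plus domain judgment.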
Model Size → Compute & Cost
Parameter count drives memory, compute, latency and cost — optimize with precision, batching, and caching.
Key Insight
Params matter, but activations, precision, sequence length and batching often dominate resource use and cost.
Common Mistakes
- Assume memory = parameter bytes only (ignores activations & runtime/framework overhead).
- Treat two models with the same parameter count as identical in resource needs or accuracy.
- Believe bigger parameter count always guarantees better accuracy or ROI.
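The parameter-bytes arithmetic is worth having at your fingertips, with the caveat the first bullet makes: this covers weights only.

```python
# Back-of-envelope memory for model WEIGHTS alone. Activations, KV cache,
# and framework overhead come on top and often dominate at inference.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(params_billions, precision):
    # 1e9 params x bytes-per-param / 1e9 bytes-per-GB = billions x bytes
    return params_billions * BYTES_PER_PARAM[precision]

# A 7B-parameter model: 28 GB fp32, 14 GB fp16, 7 GB int8 -- same
# parameter count, very different footprints.
footprints = {p: weight_memory_gb(7, p) for p in BYTES_PER_PARAM}
```

Lower precision, batching, and caching are the levers the card names for shrinking the rest of the footprint.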
Zero-shot Prompting
Instruction-only prompting to get tasks done from a pretrained model without task-specific labeled data.
Key Insight
Zero-shot exploits pretrained knowledge and instruction-following — prompt clarity, constraints and format drive success.
Common Mistakes
- Assume zero-shot means the model had no prior training.
- Treat zero-shot prompting as identical to zero-shot learning about unseen classes.
- Expect zero-shot to match fine-tuned accuracy for niche or domain-specific tasks.
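What a zero-shot prompt actually looks like (the task and wording are illustrative): no labeled examples, just a clear instruction, tight constraints, and an explicit output format.

```python
def zero_shot_prompt(review):
    # Instruction + constraint + format, but zero worked examples.
    return (
        "Classify the sentiment of the customer review below.\n"
        "Respond with exactly one word: positive, negative, or neutral.\n\n"
        f"Review: {review}"
    )

prompt = zero_shot_prompt("Checkout was fast and painless.")
```

Prepending a few labeled review/label pairs would turn this into few-shot prompting; neither approach changes the model's weights.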
Guidelines for Responsible AI (14%)
Model Metrics — Classification vs Regression (MSE)
Choose metrics tied to business cost: precision/recall for imbalance, AUC for ranking, MSE/MAE for regression.
Key Insight
Match metric to business impact (missed cost, false-alarm cost); monitor drift and set data-driven alert thresholds.
Common Mistakes
- Applying a universal threshold (e.g., 90% accuracy) regardless of context.
- Using accuracy as the sole metric for imbalanced classes.
- Assuming monitoring tools will auto-fix drift without human playbooks.
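The imbalanced-class trap in ten lines: a degenerate model that always predicts "negative" looks 95% accurate yet catches zero positives.

```python
# 5% positive class vs an "always predict negative" model.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_pos / sum(y_true)   # share of real positives caught

# accuracy -> 0.95, recall -> 0.0
```

If missing a positive is costly (fraud, disease), recall and precision, not accuracy, are the metrics tied to business impact.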
AI Governance & Accountability
Assigned roles, policies, audit trails and remediation plans that ensure safe, auditable AI — ongoing and cross‑team.
Key Insight
Accountability = assigned decision authority + evidence trails + cross‑functional oversight; transparency alone doesn't suffice.
Common Mistakes
- Thinking transparency (explanations) equals accountability.
- Treating governance as a one-time project rather than continuous oversight.
- Assuming only technical teams are responsible for governance.
Security, Compliance, and Governance for AI Solutions (14%)
AWS Shared Responsibility
Who secures what: AWS handles cloud infrastructure; you secure guest OS, apps, data — responsibility grows toward IaaS.
Key Insight
Security boundary shifts: customer responsibility increases moving SaaS → PaaS → IaaS; know who patches and who configures.
Common Mistakes
- Assuming AWS patches guest OS and apps on EC2
- Believing AWS certifications automatically make your workload compliant
- Thinking SaaS means AWS manages your application permissions and keys
IAM & MFA — Access Rules
Manage identities, roles and JSON policies (Effect/Action/Resource/Condition); enable MFA and enforce least-privilege.
Key Insight
Permissions are additive; explicit Deny, permission boundaries or Org SCPs can block Allows — and the root account is special.
Common Mistakes
- Treating root like a normal IAM user — it can't be fully constrained by IAM policies
- Believing user policies always override group policies — permissions add and Deny wins
- Thinking MFA eliminates the need for strong passwords
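The additive-then-Deny-wins logic can be modeled as a toy evaluator. This is a deliberate simplification: real IAM evaluation also weighs SCPs, permission boundaries, resource policies, and Condition elements.

```python
def evaluate(statements, action):
    # Start from implicit deny; Allows accumulate; explicit Deny wins.
    decision = "ImplicitDeny"
    for s in statements:
        if action in s["Action"]:
            if s["Effect"] == "Deny":
                return "Deny"          # explicit Deny overrides any Allow
            decision = "Allow"
    return decision

policy = [
    {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"]},
    {"Effect": "Deny",  "Action": ["s3:PutObject"]},
]
```

Here `s3:PutObject` is denied despite a matching Allow, `s3:GetObject` is allowed, and anything unmentioned stays implicitly denied, which is the exam-relevant ordering.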
Model Pricing & TCO Calculator
Estimate TCO: base‑model fees, fine‑tuning (HP searches & runs), checkpoint storage, inference, and egress.
Key Insight
Fine‑tuning cost = base fees + many training runs (hyperparameter searches) + checkpoint storage + egress.
Common Mistakes
- Treating inference as free — inference has per‑token/request or per‑sec GPU charges.
- Counting only the final training run — hyperparameter searches and repeated experiments multiply compute.
- Omitting checkpoint storage and cross‑region egress from TCO estimates.
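The cost breakdown above can be sketched as simple arithmetic. Every rate here is a made-up assumption for illustration, not an AWS price:

```python
GPU_HOUR = 4.00            # assumed cost per training GPU-hour (USD)
STORAGE_GB_MONTH = 0.023   # assumed checkpoint storage rate
EGRESS_GB = 0.09           # assumed cross-region egress rate

def finetune_tco(runs, hours_per_run, ckpt_gb, months, egress_gb):
    compute = runs * hours_per_run * GPU_HOUR       # ALL runs, not just the last
    storage = ckpt_gb * STORAGE_GB_MONTH * months   # checkpoints linger
    egress = egress_gb * EGRESS_GB
    return compute + storage + egress

# 20 hyperparameter-search runs dominate the estimate:
total = finetune_tco(runs=20, hours_per_run=8, ckpt_gb=500, months=3, egress_gb=100)
```

Counting only the final run here would report roughly one twentieth of the real compute bill, which is exactly the second mistake above.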
Pay‑As‑You‑Go (Opex) Model
Pay for consumed units (compute/storage); ideal for variable AI workloads but requires active cost controls.
Key Insight
Shifts CapEx to Opex—flexible but can spike; use monitoring, budgets, and commit/volume discounts for steady loads.
Common Mistakes
- Assuming pay‑as‑you‑go is always cheaper than buying hardware or reserved instances.
- Skipping cost monitoring because 'you only pay for use' — unexpected spikes still occur.
- Ignoring commitment/volume discounts — long‑term options and savings plans exist.
Similar Cheat Sheets
- CCNA Exam v1.1 (200-301) Cheat Sheet
- AWS Certified Cloud Practitioner (CLF-C02) Cheat Sheet
- Exam AI-900: Microsoft Azure AI Fundamentals Cheat Sheet
- Google Cloud Professional Cloud Architect Cheat Sheet
- Google Cloud Security Operations Engineer Exam Cheat Sheet
- Google Cloud Professional Cloud Security Engineer Cheat Sheet