Google Cloud Professional Cloud Architect Ultimate Cheat Sheet
Your Quick Reference Study Guide
This cheat sheet covers the core concepts, terms, and definitions you need to know for the Google Cloud Professional Cloud Architect exam. We've distilled the most important domains, topics, and critical details to help your exam preparation.
💡 Note: While this study guide highlights essential concepts, it's designed to complement—not replace—comprehensive learning materials. Use it for quick reviews, last-minute prep, or to identify areas that need deeper study before your exam.
About This Cheat Sheet: This study guide covers core concepts for Google Cloud Professional Cloud Architect. It highlights key terms, definitions, common mistakes, and frequently confused topics to support your exam preparation.
Use this as a quick reference alongside comprehensive study materials.
Designing and Planning a Cloud Solution Architecture (25%)
TCO — Full Lifecycle Cost
Estimate all lifecycle costs — migration, licensing, infra, ops, training, downtime — to compare architectures and make an informed choice.
Key Insight
TCO = upfront + ongoing + hidden costs (migration, training, downtime). Lower initial spend can raise lifecycle cost.
Common Mistakes
- Counting only upfront cloud bills or CapEx and ignoring migration, ops, training, and downtime.
- Assuming lift-and-shift always yields lower TCO than refactor or platform-native redesign.
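The TCO insight above is simple arithmetic; a toy comparison (every figure hypothetical) shows how a lower upfront bill can still lose over the lifecycle:

```python
# Toy 3-year TCO comparison; every figure here is hypothetical.
def tco(upfront, monthly_ops, migration, training, downtime, months=36):
    """TCO = upfront + ongoing + hidden costs over the planning horizon."""
    return upfront + monthly_ops * months + migration + training + downtime

# Lift-and-shift: cheap to start, higher run-rate (unoptimized VMs).
lift_shift = tco(upfront=10_000, monthly_ops=8_000, migration=20_000,
                 training=5_000, downtime=2_000)
# Refactor: bigger upfront engineering spend, lower managed-service run-rate.
refactor = tco(upfront=80_000, monthly_ops=4_500, migration=40_000,
               training=15_000, downtime=1_000)

print(lift_shift, refactor)  # 325000 298000: refactor wins despite 8x upfront
```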
Scalability & Performance Targets
Quantified load, concurrency, latency and throughput targets that drive autoscaling, partitioning, caching and resource selection.
Key Insight
Translate SLA numbers into autoscale thresholds, partitioning/sharding and cache tiers — trade-offs exist between latency, throughput, consistency.
Common Mistakes
- Treating autoscaling as a substitute for capacity planning and load/performance testing.
- Blaming latency solely on the network instead of app design, caching, or storage choices.
- Provisioning for peak as 'typical' instead of using elasticity and cost-aware scaling policies.
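A sketch of "translate SLA numbers into autoscale thresholds", using made-up load-test figures:

```python
import math

# Hypothetical numbers: turn an SLA into concrete autoscale settings.
peak_rps = 1_200          # load-test measured peak requests/sec
per_instance_rps = 150    # sustainable rate per instance at target p95 latency
headroom = 0.30           # spare capacity so scaling lag doesn't breach the SLO

max_instances = math.ceil(peak_rps * (1 + headroom) / per_instance_rps)
# Scale out before instances saturate: target ~77% of sustainable utilization.
target_utilization = 1 / (1 + headroom)

print(max_instances, round(target_utilization, 2))  # 11 0.77
```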
Inference Modes: Batch vs Online vs Cache
Choose online for low‑latency SLAs, batch for high throughput/cost savings, caching/hybrid for repeated or bursty loads.
Key Insight
Match SLA to traffic: provisioned online (Vertex AI Endpoints) for steady low latency; Vertex AI Batch Prediction for throughput; add caching (Memorystore) for repeated requests.
Common Mistakes
- Assuming autoscaling removes cold starts — model load and I/O still cause latency spikes.
- Writing off batch as 'too slow' — micro‑batches/streaming and precompute can meet near‑real‑time needs.
- Assuming serverless endpoints always beat VMs — cold starts and platform limits impact latency and cost.
HA & Failover: Patterns and RTO/RPO Tradeoffs
Design multi‑zone/region redundancy, global LBs, and replicated storage (Spanner/Cloud SQL HA/Cloud Storage) with tested failover runbooks.
Key Insight
HA = redundancy + orchestrated failover: active‑active for minimal RTO, active‑passive to save cost; data layer choices (sync vs async) set RPO.
Common Mistakes
- Spreading VMs across zones but keeping a single‑region DB — you still have a single point of failure.
- Treating more replicas as instant recovery — ignores replication lag and promotion orchestration.
- Relying on backups as HA — restores are manual with high RTO, not automatic failover.
VPC Design (Virtual Private Cloud)
Plan subnets/IPs, routing, peering/Shared VPC, hybrid links and firewall zones to meet connectivity and security.
Key Insight
Design IP addressing and routing first: peering isn't transitive, Shared VPC centralizes control, and VPN/Interconnect don't replace firewalls.
Common Mistakes
- Assuming VPC peering is transitive and will route via a third VPC.
- Treating cloud firewalls as stateless — GCP firewalls track sessions (return traffic allowed).
- Believing VPN/Interconnect provide app-layer segmentation so you can skip firewall policies.
VPC Network Peering (Private Backbone)
Private, high‑bandwidth internal link between VPCs using Google’s backbone; exchanges routes directly but has topology and DNS limits.
Key Insight
Peering exchanges routes but is non‑transitive, rejects overlapping CIDRs, and does not provide private DNS or internet/on‑prem transit.
Common Mistakes
- Treating peering as transitive (A→B and B→C ⇒ A→C).
- Expecting automatic private DNS name resolution across peered VPCs.
- Attempting peering with overlapping IP ranges — routes will be rejected.
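Since peering rejects overlapping ranges, a pre-flight check with Python's stdlib `ipaddress` is cheap insurance:

```python
# Peering rejects overlapping ranges; a pre-flight check with stdlib ipaddress.
import ipaddress

def overlaps(cidr_a: str, cidr_b: str) -> bool:
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))

print(overlaps("10.0.0.0/16", "10.0.128.0/17"))  # True -> peering would fail
print(overlaps("10.0.0.0/16", "10.1.0.0/16"))    # False -> safe to peer
```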
Multicloud Integration: Data Gravity & Trust
Place and operate workloads across on‑prem and clouds by balancing data gravity, identity, networking, and migration trade‑offs.
Key Insight
Data gravity dictates placement: keep compute near large datasets, federate identity, and minimize cross‑cloud egress.
Common Mistakes
- Assuming identical ML pipelines run unchanged across clouds
- Treating 'retire' as immediate deletion without retention/compliance check
- Believing one integration approach fits all workloads
Anthos — Hybrid & Multicloud Kubernetes
Anthos provides a consistent Kubernetes control plane, policy, and lifecycle across on‑prem and other clouds.
Key Insight
Anthos centralizes control and policy but does not remove node/hardware ops; use it to run workloads where data lives, not to magically move data.
Common Mistakes
- Treating Anthos as fully managed GKE with no infrastructure operations
- Assuming Anthos automatically migrates or replicates data to GCP
- Expecting Cloud Run on Anthos to behave identically to fully managed Cloud Run
Compute Platform Selection — GKE • Cloud Run • App Engine • Functions • VMs
Map workload traits to GCP compute: VMs for control, GKE for containers, Cloud Run/App Engine/Functions for managed ops.
Key Insight
Trade control vs. managed: Compute Engine = max control; GKE = orchestrated containers; Cloud Run = stateless containers; App Engine = opinionated PaaS; Cloud Functions = event-driven code.
Common Mistakes
- Assuming modernization requires a full rewrite — use Strangler Fig for incremental moves
- Treating managed services as zero‑ops; they still need integration, config, and can lock you in
- Equating lift‑and‑shift with cloud‑native; misses operational, scaling, and cost tradeoffs
Data Migration & Schema Evolution — CDC, Dual‑Write, Expand→Contract
Move and evolve data with minimal downtime: CDC, dual‑write, expand‑then‑contract, backfills, schema registries and cut‑over plans.
Key Insight
Design as expand‑then‑contract + schema versioning + automated reconciliation; assume transient inconsistency and plan rollback
Common Mistakes
- Treating dual‑writes as automatically consistent — they introduce drift and need reconciliation
- Assuming CDC guarantees cross‑system transactional consistency
- Skipping rollback or verification because staged cutover 'should' be safe
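Because dual-writes drift, a reconciliation pass is part of the design, not an afterthought. A minimal sketch, with plain dicts standing in for the old and new databases (all names illustrative):

```python
# Dual-writes drift; a minimal reconciliation pass comparing row checksums.
# Plain dicts stand in for the old and new databases here.
import hashlib

def checksum(row: dict) -> str:
    canonical = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(canonical.encode()).hexdigest()

def reconcile(source: dict, target: dict) -> dict:
    missing = [k for k in source if k not in target]
    drifted = [k for k in source
               if k in target and checksum(source[k]) != checksum(target[k])]
    return {"missing": missing, "drifted": drifted}

old_db = {1: {"email": "a@x.com"}, 2: {"email": "b@x.com"}, 3: {"email": "c@x.com"}}
new_db = {1: {"email": "a@x.com"}, 2: {"email": "B@x.com"}}  # drift + gap

report = reconcile(old_db, new_db)
print(report)  # {'missing': [3], 'drifted': [2]}
```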
Managing and Provisioning Infrastructure (18%)
Hybrid Connectivity — Interconnect, HA‑VPN & Cloud Router
Link on‑prem/multi‑cloud: Interconnect for bandwidth/SLA, HA‑VPN for encrypted failover, Cloud Router for BGP.
Key Insight
Physical links (Dedicated/Partner Interconnect) give capacity & SLA; Cloud Router only exchanges BGP routes — it doesn't add redundancy or bandwidth.
Common Mistakes
- Assuming Partner Interconnect is always cheaper or lower latency than Dedicated.
- Relying on Cloud Router for physical redundancy — it's a routing control plane only.
- Picking VPN because it's 'cheapest' while ignoring sustained throughput, latency, SLA, and the operational cost of many tunnels.
Subnet & IP Design — VPC, GKE Pods/Services
Plan CIDRs for VPCs, nodes, pods and services; reserve GKE secondary ranges and prevent overlaps across projects/VPCs.
Key Insight
Primary subnet CIDR is effectively immutable for planning; use reserved secondary ranges for GKE and avoid overlaps — overlaps break peering and complicate migrations.
Common Mistakes
- Assuming alias IPs auto‑resolve cross‑project or cross‑VPC CIDR overlaps.
- Treating ClusterIP addresses as coming from the node subnet — they come from the service CIDR.
- Thinking subnet size only affects IP count — it also affects peering, routing, firewall rules, and migrations.
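Back-of-envelope pod-range sizing, assuming a VPC-native cluster at GKE's default of 110 max pods per node (where GKE carves a /24 per node out of the pod secondary range):

```python
# Back-of-envelope GKE IP planning (VPC-native clusters).
# Assumption: default max 110 pods/node, so GKE carves a /24 per node
# out of the pod secondary range.
import ipaddress

pod_range = ipaddress.ip_network("10.4.0.0/14")   # secondary range for pods
per_node = ipaddress.ip_network("10.4.0.0/24").num_addresses  # 256 addresses

max_nodes = pod_range.num_addresses // per_node
print(max_nodes)  # 1024 nodes before the pod range is exhausted
```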
BigQuery — ML Canonical Store
Structured, analytical store for ML: partition/cluster, SQL feature engineering, and Vertex AI batch I/O.
Key Insight
Partition+cluster to cut scan costs; ideal for batch training/feature joins — not for sub‑100ms online lookups.
Common Mistakes
- Believing batch predictions can only write to Cloud Storage — BigQuery can be a direct prediction target.
- Assuming streaming inserts are zero-latency or unlimited — they incur latency and are quota‑controlled.
- Using BigQuery for low‑latency online prediction/lookup — it's analytical, not an OLTP or real‑time KV store.
GCS — Model Artifacts & Batch Storage
Durable object storage for model artifacts: choose storage class + lifecycle + versioning for cost and compliance.
Key Insight
Pick storage class for access pattern and use lifecycle to automate cost cuts; versioning/retention holds block deletions.
Common Mistakes
- Trying to delete noncurrent versions without bucket versioning — versioning must be enabled.
- Expecting POSIX-style, low-latency behavior for many small reads/writes — object storage suits large objects/sequential I/O.
- Assuming lifecycle rules run instantly or override holds — lifecycle is asynchronous and cannot bypass retention holds/locks.
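A lifecycle policy sketch in the JSON shape accepted by `gsutil lifecycle set policy.json gs://BUCKET` (bucket name and rule values illustrative):

```python
import json

# Lifecycle policy sketch; thresholds are illustrative, tune per workload.
policy = {
    "rule": [
        # Demote artifacts older than 30 days to a colder storage class.
        {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
         "condition": {"age": 30}},
        # Delete noncurrent versions 90 days after they are replaced
        # (requires versioning to be enabled on the bucket).
        {"action": {"type": "Delete"},
         "condition": {"daysSinceNoncurrentTime": 90}},
    ]
}

print(json.dumps(policy, indent=2))
```

Note the second rule silently does nothing unless bucket versioning is on, echoing the first mistake above.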
Accelerator Choice — CPU, GPU, TPU, Edge
Pick CPU/GPU/TPU/edge based on model precision, memory/IO, latency, throughput, and provisioning cost limits.
Key Insight
Match the bottleneck: FLOPS-bound → GPU/TPU; memory/IO-bound → more RAM or better interconnect; latency-sensitive → edge/quantized models.
Common Mistakes
- Assuming adding accelerators always gives linear speedup and lower cost.
- Treating TPUs as drop-in GPU replacements without code/XLA/op changes.
- Ignoring memory, network and IO limits — they can dominate latency and cost.
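The "match the bottleneck" insight as a roofline-style check; the hardware numbers below are illustrative, not vendor specs:

```python
# Roofline-style bottleneck check: compute-bound vs memory-bound.
# Both hardware numbers below are illustrative, not vendor specs.
peak_flops = 100e12        # accelerator peak throughput, FLOP/s
mem_bandwidth = 1.5e12     # memory bandwidth, bytes/s

ridge_point = peak_flops / mem_bandwidth   # FLOP/byte where the bound flips
model_intensity = 12.0     # measured FLOP per byte moved for this model

bound = "compute" if model_intensity > ridge_point else "memory"
print(round(ridge_point, 1), bound)  # below the ridge: more FLOPS won't help
```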
Cloud Load Balancers — Type, Scope, Health
Choose HTTP(S)/TCP/UDP and global vs regional LB, wire health checks to autoscalers, and weigh CDN/SSL trade-offs for HA
Key Insight
Global LB provides anycast IP and cross-region failover but NOT automatic app-state replication; scope choice must consider latency, compliance, and DR.
Common Mistakes
- Expecting perfectly even traffic distribution — weights, capacity and proximity affect routing.
- Assuming a global LB obviates cross-region data replication or DR configuration.
- Using only TCP/ICMP health checks and missing HTTP(S) endpoint or status-code validations.
Vertex AI Pipelines — Compile, Submit, Schedule
Managed service to compose, compile (YAML/JSON), submit and schedule end-to-end ML pipelines on Vertex.
Key Insight
A compiled pipeline is an artifact, not a running job — you must submit it and configure per-step resources, retries, and retention.
Common Mistakes
- Assuming a compiled pipeline auto-runs without submission to Vertex.
- Expecting automatic per-step scaling without specifying machine types/worker pools.
- Believing pipelines must run on a separate Kubeflow cluster and can't call CustomJobs.
Dataflow (Apache Beam): Batch & Streaming Prep
Cloud Dataflow is the managed Apache Beam runner for scalable batch/stream transforms with BigQuery, GCS, Pub/Sub, Bigtable and other sources/sinks.
Key Insight
Beam = SDK/model; Dataflow = managed runner — choose windowing, triggers, connectors and autoscaling settings to meet latency and consistency SLAs.
Common Mistakes
- Thinking Dataflow is streaming-only; it also handles optimized bounded (batch) jobs.
- Believing you must write raw Beam code every time — templates, SQL, and Flex options exist.
- Relying on autoscaling alone for bursts — use windowing, triggers and backpressure controls.
Vertex AI Prediction: Online vs Batch
Managed serving: low‑latency online endpoints for real‑time; batch jobs (GCS I/O) for high‑throughput offline inference.
Key Insight
Pick by SLA: online for sub‑second user requests; batch for cost‑efficient bulk inference that tolerates minutes/hours delay.
Common Mistakes
- Expecting batch jobs to meet sub‑second, user‑facing latency
- Assuming online and batch have identical cost, SLA, and performance profiles
- Ignoring endpoint cold‑starts and autoscaler warm‑up delays
Vertex AI Prebuilt APIs — Use & Tradeoffs
Managed multimodal ML APIs for fast integration — tradeoffs: cost, latency, rate limits, and domain accuracy.
Key Insight
Fastest to deploy but always model cost/latency/throughput, secure auth/data flow, and plan fallbacks for domain gaps.
Common Mistakes
- Assuming prebuilt APIs run locally or on‑prem by default
- Skipping call‑volume, payload size, or feature‑choice cost modelling
- Treating remote API latency, rate limits, and cold‑starts as negligible
Security and Compliance (18%)
IAM — Roles, Service Accounts & Least Privilege
Control which principals do what on GCP resources; use scoped roles, service accounts, and Workload Identity to enforce least privilege.
Key Insight
Grant the minimal role at the narrowest scope; prefer predefined/custom roles + Workload Identity over primitive roles or long‑lived keys.
Common Mistakes
- Assuming a custom role created in one project is automatically available org‑wide.
- Treating predefined roles as always least‑privilege; many include extra permissions.
- Using long‑lived service account keys for GKE instead of Workload Identity.
ML Threats — Data, Model & Supply‑Chain Attacks
Attacks can poison data, extract or steal models, or exploit CI/CD/runtime; defend via access control, KMS, monitoring, and provenance.
Key Insight
Threats span the entire ML lifecycle — prevention + provenance + detection are required; access controls alone don’t stop extraction or insider/supply‑chain attacks.
Common Mistakes
- Believing adversarial attacks only affect image models; text, tabular, and time‑series are vulnerable.
- Thinking encrypting model artifacts prevents poisoning; poisoning occurs in training/supply chain before encryption.
- Assuming anonymization or strong access controls alone prevent model leakage or extraction.
Cloud DLP (Data Loss Prevention) — PII Obscuring
Detect, classify and transform PII on GCP (redact, mask, tokenize, pseudonymize) chosen by re‑ID risk.
Key Insight
Pseudonymization keeps re‑ID paths; anonymization aims to break them irreversibly — pick technique by legal risk, re‑ID likelihood, and control of key material.
Common Mistakes
- Treating pseudonymization as irreversible anonymization
- Assuming enabling DLP alone prevents data exfiltration (no access, logging, perimeter controls)
- Believing aggregation/k‑anonymity guarantees zero re‑identification risk
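Why pseudonymization is not anonymization, sketched with a keyed token (key handling is simplified for illustration; a managed analogue exists in DLP's crypto-based transforms). Whoever holds the key can re-link tokens to people, which is exactly the retained re-ID path:

```python
# Pseudonymization sketch: deterministic keyed tokens for PII.
# Anyone holding the key can re-link tokens to people, so this is
# pseudonymization, NOT anonymization.
import hashlib
import hmac

SECRET_KEY = b"example-key-store-in-secret-manager"  # placeholder only

def pseudonymize(value: str) -> str:
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

t1 = pseudonymize("alice@example.com")
t2 = pseudonymize("alice@example.com")
print(t1 == t2)  # True: same input -> same token, so joins still work
```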
ML Privacy: PII/PHI/PCI Controls
Apply classification, minimization, de‑id, DP, tokenization, encryption and strict model access to stop sensitive‑data leakage.
Key Insight
Models can memorize and leak training records — combine de‑identification, differential privacy, access controls, audit logs and validation to reduce leakage.
Common Mistakes
- Assuming de‑identification permanently removes regulatory obligations
- Believing synthetic data eliminates all privacy risk
- Thinking trained models can't leak PII so no model access controls are needed
Analyzing and Optimizing Processes (15%)
CI/CD Pipelines — Gated, Immutable Releases
Automate build→test→release with immutable artifacts, environment gates, security scans, and rollback plans.
Key Insight
CI produces signed immutable artifacts; CD must promote those through gated stages (tests, canary, approvals) not rebuild per env.
Common Mistakes
- Believing that installing a CI tool equals CI/CD — process, tests, and gating must be designed too.
- Treating Continuous Delivery as auto-deploy to prod — require explicit gating/approvals for production.
- Using one pipeline template for all services, ignoring per-service tests, quotas, and rollback needs.
Model & Data Lineage — Reproducible Provenance
Capture end-to-end provenance (data, transforms, code, params, checksums) so models and datasets are reproducible and auditable.
Key Insight
Lineage = directed causal graph linking inputs, transforms, runs, and artifacts; snapshots (versioning) alone don't show causality.
Common Mistakes
- Assuming cloud auto-captures complete lineage — you must instrument pipelines and record hashes, params, and run IDs.
- Storing raw file copies only — missing schema, transform params, and code hashes breaks reproducibility.
- Believing lineage metadata alone ensures compliance — combine with IAM, retention, and signed evidence.
Autoscaling Patterns — MIGs & GKE HPA/VPA
Pick horizontal vs vertical scaling; use MIGs/HPA/VPA plus forecasting, headroom, triggers, cooldowns to meet SLOs.
Key Insight
Reserve forecasted base capacity (procure/reserve) and handle spikes with reactive HPA/VPA; always add headroom and cooldowns to protect SLOs.
Common Mistakes
- Assuming horizontal scaling always outperforms vertical resizing
- Scaling to observed peak with no headroom or variability analysis
- Believing HPA only uses CPU metrics
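The HPA scaling decision is one formula, and it applies to any metric, not just CPU:

```python
import math

# Kubernetes HPA core formula:
# desired = ceil(current_replicas * current_metric / target_metric)
def hpa_desired(current_replicas: int, current_metric: float,
                target_metric: float) -> int:
    return math.ceil(current_replicas * current_metric / target_metric)

# CPU at 90% against a 60% target: scale 4 -> 6 replicas.
print(hpa_desired(4, 90, 60))    # 6
# Works for custom metrics too, e.g. queue depth per replica.
print(hpa_desired(3, 400, 100))  # 12
```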
FinOps: Cost ↔ Capacity Trade-offs
Balance cost, capacity, availability and performance with budgeting, forecasting, labels, showback/chargeback and policy
Key Insight
Match discounts and procurement to stable patterns, use autoscaling for variability, and enforce labels + showback to tie spend to outcomes.
Common Mistakes
- Assuming autoscaling always lowers cloud costs
- Buying CUDs without verifying steady usage windows
- Treating labels and tagging as optional bookkeeping
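The CUD break-even check behind "verify steady usage windows", with illustrative prices (not Google's published rates):

```python
# Break-even check before buying a 1-year committed use discount (CUD).
# Prices and discount rate are illustrative, not Google's published rates.
on_demand_hourly = 0.10
cud_discount = 0.37                 # fraction off the on-demand rate
cud_hourly = on_demand_hourly * (1 - cud_discount)

# A commitment bills every hour of the term whether the VM runs or not,
# so it only saves money above this steady-utilization floor:
breakeven_utilization = cud_hourly / on_demand_hourly    # = 1 - cud_discount

print(round(breakeven_utilization, 2))  # 0.63 -> need >63% steady usage
```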
Managing Implementation (12%)
IaC — Versioned, Reproducible Infra
Define cloud infrastructure in versioned code to provision, audit, and reproduce environments automatically.
Key Insight
Idempotent declarations + remote state and locking = safe multi‑env rollouts, plan reviews, and drift detection.
Common Mistakes
- Assuming IaC always means declarative; some tools are imperative with different guarantees
- Checking secrets into code or state files instead of using a secrets manager
- Skipping plan/review/CI and applying changes directly to production
Blue‑Green Releases — Swap Whole Envs
Run two production-identical environments and flip traffic to deploy or rollback with near-zero downtime.
Key Insight
Blue‑green swaps entire environments — it's fast for stateless services but requires DB/compatibility strategies for stateful systems.
Common Mistakes
- Underestimating the cost of running duplicate production infrastructure
- Assuming rollback is trivial when databases or external state are involved
- Forgetting session affinity, connection draining, and health checks when switching traffic
Terraform IaC — GCP Remote State & Least-Privilege
Declarative provisioning with Terraform on GCP; manage remote state, per-workspace SAs, and secret handling for safe ops
Key Insight
State is the source-of-truth and may contain secrets—use encrypted remote state, locking, per-workspace service accounts, and Secret Manager.
Common Mistakes
- Assuming Terraform state contains no sensitive data; state can include secrets and identifiers.
- Granting Terraform a broad Owner role to avoid permission errors instead of least-privilege SAs.
- Committing secrets or plain variables to repo instead of using Secret Manager or encrypted backends.
Audit & Access Transparency Logs — The Evidence Chain
Admin, Data Access, and Access Transparency logs show who/what accessed resources; export, retain, and protect them for compliance and forensics.
Key Insight
Admin Activity is on by default; Data Access often isn’t. Access Transparency shows Google staff access. Exports + IAM + retention policies are needed to preserve evidence.
Common Mistakes
- Expecting Access Transparency to include full request/response payloads — it typically shows access events and metadata.
- Treating exported logs as tamper-proof without bucket/object locks, strict IAM, and retention settings.
- Relying on monitoring alerts as a replacement for durable, exported audit logs in post-incident forensics.
Solution and Operations Excellence (12%)
Backup & DR Postures (RTO/RPO + Runbooks)
Backups, replication, retention and runbooks tailored to meet RTO/RPO; test full failovers, not just file restores.
Key Insight
RTO/RPO drive posture choice—snapshots/log‑shipping vs pilot‑light/warm‑standby/multi‑site; runbooks/automation set real RTO.
Common Mistakes
- Relying on frequent snapshots alone, ignoring transaction logs and consistency.
- Equating single-file restore tests with full DR readiness.
- Assuming failover is automatic, forgetting DNS, certs, data lag and tested runbooks.
Error Budget (SLO-driven Risk Control)
Allowed unreliability (1−SLO) measured via SLIs/rolling windows; use burn‑rate to gate releases.
Key Insight
Error budget = complement of SLO; measure all SLIs (errors, latency, correctness), track burn rate, and apply tiered actions when thresholds hit.
Common Mistakes
- Treating error budget as the SLO instead of its complement.
- Counting only outages—ignoring latency, incorrect responses, and partial failures.
- Assuming budgets reset instantly; ignoring rolling windows and burn‑rate math.
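Burn-rate math in a few lines (traffic numbers illustrative):

```python
# Error-budget burn rate for an availability SLO; numbers illustrative.
slo = 0.999                         # 99.9% over a 30-day rolling window
error_budget = 1 - slo              # fraction of requests allowed to fail

observed_error_rate = 0.004         # 0.4% of requests failing right now
burn_rate = round(observed_error_rate / error_budget, 2)   # 4x normal spend

days_to_exhaustion = round(30 / burn_rate, 2)
print(burn_rate, days_to_exhaustion)  # 4.0 7.5 -> page, don't just ticket
```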
SLIs / SLOs / SLAs — Error Budgets
Quantitative health metrics (SLIs), targets (SLOs) and contracts (SLAs); use error budgets to drive releases and alerts.
Key Insight
SLO = allowed unreliability over a rolling window; error budget = remaining allowed failure; burn rate dictates throttling/rollback.
Common Mistakes
- Treating error budget as financial budget instead of allowed unreliability.
- Mixing SLOs with SLAs — SLOs are internal targets; SLAs are contractual with penalties.
- Assuming error budgets reset instantly at window boundaries rather than using rolling/defined windows.
Data Drift — Input & Label Shifts
Distribution shifts in inputs or labels that reduce model generalization; detect at feature-level, validate impact, then respond (retrain, recalibrate, or fix data).
Key Insight
Types matter: covariate (inputs), prior (label freq), concept (label meaning). Use feature stats, PSI, and unlabeled proxies to detect early.
Common Mistakes
- Assuming all detected drift immediately breaks model performance — always quantify impact first.
- Monitoring only outputs/ops metrics (latency/throughput) and ignoring feature-distribution checks.
- Automatically retraining on drift without diagnosing data quality, features, or label issues first.
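PSI from the insight above, computed over binned feature fractions. The thresholds in the comment are a common rule of thumb, not a standard; tune per feature:

```python
import math

# Population Stability Index between training and serving feature histograms.
# Common rule of thumb (tune per feature): <0.1 stable, 0.1-0.25 watch,
# >0.25 investigate before reacting.
def psi(expected, actual):
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))

train_dist = [0.25, 0.25, 0.25, 0.25]   # per-bin fractions at training time
serve_dist = [0.40, 0.30, 0.20, 0.10]   # per-bin fractions in production

score = psi(train_dist, serve_dist)
print(round(score, 3))  # 0.228 -> worth a look, but quantify impact first
```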
Cloud Build CI/CD (cloudbuild.yaml & triggers)
Managed CI/CD that runs cloudbuild.yaml steps, produces artifacts, and can invoke deployments or Vertex AI pipelines.
Key Insight
Build steps share one workspace; triggers are filterable/skippable; Cloud Build invokes training but won't long‑term store models or run heavy GPU/TPU jobs itself.
Common Mistakes
- Assuming triggers run on every commit — forget to filter by branch/tag or allow skip flags.
- Expecting Cloud Build to auto-version/store models long-term instead of pushing to Artifact Registry/GCS.
- Thinking Cloud Build runs long GPU/TPU training jobs directly (it invokes, doesn't replace training infra).
Gated Deployments & Pipeline Parameters
Use conditional steps, automated/manual gates, scoped params and runbooks to coordinate code, schema, and data changes.
Key Insight
Promotions must coordinate binaries + DB/state migrations + runbook steps; params need scoping and secret handling for reproducibility and safety.
Common Mistakes
- Assuming promotion only moves binaries — skipping DB/state migration coordination breaks releases.
- Treating pipeline params as non-sensitive or ephemeral — you can expose secrets or lose reproducibility.
- Believing automation removes the need for human approvals during high-risk rollouts.
Monitoring & Alerting — Noise Suppression
Choose metrics/logs/traces, set severity-based alerts and routes, and suppress noise to cut false pages.
Key Insight
Alert on customer impact (SLO breaches), not every symptom; map severity → responders → channel.
Common Mistakes
- Paging on every alert instead of using severities and escalation paths
- Relying only on logs for diagnosis; no metrics/traces for fast root cause
- Overzealous dedup/suppression that hides distinct incidents or escalations
ML Inference Latency Troubleshooting
Profile CPU/GPU, threads, queues, I/O and network; reproduce single-request and synthetic tails to isolate root cause.
Key Insight
High tail latency usually stems from single-threading, queuing, or client/network—scaling replicas often won’t fix it.
Common Mistakes
- Assuming undersized VM/instance is always the cause without profiling
- Switching to GPUs without benchmarked per-request and cold-start checks
- Adding instances to hide latency without addressing single-thread, queue, or client-side delays
Regression Testing — Stage-Gated, Fast, Targeted
Automated suites that catch reintroduced bugs; run the right scope at the right pipeline stage to keep gates fast.
Key Insight
Run narrow, fail-fast regression pre-merge; run full suites in CI/pre-prod and gate promotions by risk and test stability.
Common Mistakes
- Treating regression suite as only unit tests
- Assuming a larger suite always improves safety (ignores feedback speed)
- Running regression checks only post-release instead of pre-merge/pre-deploy
ML Metrics — AUC-ROC/PR & MAE/RMSE/RMSLE
Classification ranking metrics (AUC-ROC/AUC-PR) and regression error metrics (MAE/RMSE/RMSLE); choose by class imbalance and error characteristics.
Key Insight
Use AUC-PR when positives are rare; RMSE penalizes large errors; RMSLE measures relative error but breaks with zeros/negatives.
Common Mistakes
- Relying on AUC-ROC for highly imbalanced positive classes
- Treating AUC-PR and AUC-ROC as interchangeable
- Using RMSLE on negative or zero-valued targets (invalid)
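RMSE vs RMSLE on the same absolute errors, showing the absolute-vs-relative distinction:

```python
import math

# RMSE punishes large absolute errors equally; RMSLE measures relative error
# (log1p makes it undefined for values <= -1, hence the zero/negative caveat).
def rmse(y, p):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))

def rmsle(y, p):
    return math.sqrt(sum((math.log1p(a) - math.log1p(b)) ** 2
                         for a, b in zip(y, p)) / len(y))

y_true = [10, 1000]
y_pred = [20, 1010]   # both off by 10, very different relative errors

print(round(rmse(y_true, y_pred), 2))    # 10.0 -- treats both errors equally
print(round(rmsle(y_true, y_pred), 3))   # dominated by the 10 -> 20 miss
```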
Progressive Rollouts & Blast‑Radius Control
Use blue‑green/canary/rolling + feature flags, kill‑switches and traffic shaping; gate with SLIs/SLOs and observability.
Key Insight
Roll out to small, observable traffic slices and require thresholded SLI gates — rollback only with aggregated, contextual signals to avoid flapping.
Common Mistakes
- Treating readiness and liveness probes as interchangeable.
- Applying steady‑state SLI thresholds to deployment verification unchanged.
- Triggering immediate rollback on any single error — causes flapping.
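An anti-flapping gate sketch: require several consecutive breached evaluation windows before rolling back (threshold and window values illustrative):

```python
# Anti-flap rollback gate: roll back only after N consecutive breached
# evaluation windows, never on a single bad sample. Values illustrative.
def should_rollback(error_rates, threshold=0.05, consecutive=3):
    streak = 0
    for rate in error_rates:
        streak = streak + 1 if rate > threshold else 0
        if streak >= consecutive:
            return True
    return False

# One transient spike: hold steady.
print(should_rollback([0.01, 0.09, 0.01, 0.02]))   # False
# Sustained breach across three windows: roll back.
print(should_rollback([0.02, 0.06, 0.07, 0.08]))   # True
```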
Unified Telemetry: Metrics, Logs, Traces & Probes
Instrument metrics, structured logs, traces and synthetic probes; correlate them to detect anomalies, find root cause, and resolve incidents faster.
Key Insight
Correlate golden signals with domain metrics, traces and probes; interpret SLI deltas with load/noise/upstream context before blaming code.
Common Mistakes
- Monitoring only golden signals, ignoring domain-specific metrics and logs.
- Using metrics alone for root-cause diagnosis, skipping logs and traces.
- Treating every SLI delta as a deployment regression, ignoring noise/upstream causes.
Similar Cheat Sheets
- CCNA Exam v1.1 (200-301) Cheat Sheet
- AWS Certified Cloud Practitioner (CLF-C02) Cheat Sheet
- Google Cloud Certified Generative AI Leader Cheat Sheet
- AWS Certified AI Practitioner (AIF-C01) Cheat Sheet
- Exam AI-900: Microsoft Azure AI Fundamentals Cheat Sheet
- Google Cloud Security Operations Engineer Exam Cheat Sheet