Cloud Engineer

Ryan Mahoney

Why this role is hard · Ryan Mahoney

At this stage, engineers need to move fast without breaking the safety rules that keep our systems stable. The real test is finding someone who writes clean automation scripts and also designs access controls that actually limit damage when things go wrong. They must explain technical tradeoffs clearly and have the confidence to halt a rushed deployment that skips a security review. Most applicants lean heavily to one side: they deliver either flawless templates that ignore cloud costs or tight budget tracking that leaves network security wide open.

Core Evaluation

Critical questions for this role

The competency and attitude questions below are where the hiring decision is made. They run in the live interview rounds and are calibrated to the role's target level (Mid).

21 Competency Questions

1 of 21
  1. Discipline

    Cloud Infrastructure & Security Engineering

  2. Job requirement

    Core Infrastructure & IaC Automation

    Authors and maintains modular IaC configurations, troubleshoots provisioning drift, and implements state management best practices.

  3. Expected at Mid

    Mid-level engineers must independently author, version, and troubleshoot infrastructure code without daily supervision to meet project delivery SLAs.

Interview round: Hiring Manager Technical Deep Dive

Walk me through how you structured and delivered a complex infrastructure-as-code configuration from initial design to production deployment.

Positive indicators

  • Mentions reusable modules and state locking
  • Describes automated pipeline validation steps
  • Explains rollback procedures for failed applies
  • Tracks configuration drift systematically
  • Uses version control for all infrastructure code and keeps state in a locked remote backend

Negative indicators

  • Relies on monolithic single-file templates
  • Manually edits state files or makes changes in the console
  • Skips testing before production deployment
  • Ignores configuration drift warnings
  • Hardcodes environment-specific values

13 Attitude Questions

1 of 13

Active Listening

The disciplined practice of fully concentrating on, comprehending, and retaining verbal and non-verbal communication from stakeholders, while suspending immediate judgment or technical rebuttal, to accurately capture underlying operational needs, implicit requirements, and business impacts before formulating cloud architecture or deployment responses.

Interview round: Recruiter Initial Screen

How would you approach a design review session where the security team, data engineers, and operations staff all present different constraints for a new analytics platform?

Positive indicators

  • Outlines a clear agenda for constraint mapping
  • Uses visual aids or matrices to compare requirements
  • Seeks consensus before finalizing architecture
  • Acknowledges each team's operational realities
  • Proposes a phased approach to validate constraints

Negative indicators

  • Favors one team's constraints over others arbitrarily
  • Rushes to a technical solution without alignment
  • Ignores operational or security guardrails
  • Assumes constraints are static rather than negotiable
  • Leaves conflicting requirements unresolved

Supporting Evaluation

How candidates earn the selection conversation

The goal is to reduce effort for everyone by collecting more useful signal before adding more interviews. Lightweight application prompts and structured screens help the panel focus live time on the candidates most likely to succeed.

Stage 1 · Application

Filter at the door

Runs the moment a candidate hits Submit. Disqualifying answers end the application; everything else is captured for review.

Video-Response Questions

1 of 3

Application Screen: Video Response

You are preparing to present a complex cloud infrastructure trade-off between cost optimization and system performance to non-technical transit agency leadership. What specific steps would you take to structure your explanation so they understand the risks, constraints, and recommended path forward?

Candidate experience

  1. Record
  2. Review
  3. Submit

Response time

2 min

Format

Recorded video

Stage 2 · Resume Screening

Read the resume against fixed criteria

Reviewers score every application that clears the door against the same criteria. Stronger reviews advance to live interviews; weaker ones are archived without further screening.

Resume Review Criteria

8 criteria
Design and delivery of reusable infrastructure templates for cloud environments, incorporating security guardrails and cost tracking.
Implementation of event-driven pipelines for high-throughput telemetry or transit data feeds using Kinesis, Event Hubs, or Kafka.
Execution of tagging strategies, rightsizing, and budget alerting to align cloud spend with organizational or public-sector constraints.
Configuration of cross-environment identity synchronization and tuning of observability stacks to meet operational SLAs.

Does the resume show relevant prior work experience?

Does the cover letter or personal statement convey clear relevance and familiarity with the job?

Does the resume indicate required academic credentials, relevant certifications, or necessary training?

Is the resume complete, well-organized, and free from formatting, spelling, and grammar mistakes?

Stage 3 · During Interviews

Where the hire is decided

Interview rounds use the competency and attitude questions outlined above, then add tests, work simulations, and presentations that reveal deeper evidence about how the candidate thinks and works.

Coding Test

1 of 2

Live Interview · Coding Test

Without AI

Build a Terraform module that provisions an autoscaling group with a launch template. Enforce mandatory cost tags, attach a least-privilege IAM instance profile, and expose variables for min/max capacity.

Transit analytics workloads require predictable scaling but strict cost controls. Design a reusable Terraform module that provisions an AWS Auto Scaling Group with a Launch Template. Requirements:

  1. Accept variables for min/max/desired capacity and AMI ID.
  2. Attach an IAM role limited to S3 read-only and CloudWatch write.
  3. Enforce mandatory tags (Project, Environment, CostCenter) via module inputs.
  4. Include lifecycle rules to prevent accidental deletion of core ASG resources.

Provide the module interface and core resource blocks.

With AI

Use AI to generate the Terraform module, then audit it for privilege escalation risks, missing lifecycle protections, and cost tag enforcement gaps.

Transit analytics workloads require predictable scaling but strict cost controls. Use an AI assistant to draft a Terraform module for an Auto Scaling Group with a Launch Template. Requirements: accept capacity variables, attach a least-privilege IAM role for S3 read/CloudWatch write, enforce mandatory cost tags, and add lifecycle protections. Critically review the AI output: AI often grants overly broad IAM permissions, omits lifecycle rules, or fails to propagate tags to child resources. Submit the hardened module and document which AI defaults you corrected.

Response time

30 min

Positive indicators

  • Clear module variable interface with validation
  • Least-privilege IAM policy scoped to transit data access
  • Consistent tag propagation across nested resources
  • Lifecycle protections for critical scaling infrastructure
  • Catches AI-generated wildcard IAM policies and restricts them
  • Adds lifecycle rules to prevent ASG deletion
  • Ensures tags propagate to launch templates and instances
  • Documents AI's tendency to skip governance controls

Negative indicators

  • Admin-level IAM roles attached by default
  • Missing tag enforcement or inconsistent naming
  • No lifecycle protections, risking accidental teardown
  • Hardcoded capacity limits instead of variables
  • Accepts broad IAM permissions without review
  • Overlooks missing lifecycle protections
  • Fails to enforce consistent tagging across resources
  • Cannot explain why AI defaults violate FinOps/security standards
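For reviewers scoring the "With AI" variant, two of the indicators above (wildcard IAM permissions and mandatory cost tags) lend themselves to a mechanical check. The following is a minimal Python sketch, not part of the exercise itself. It assumes the candidate's IAM policy has been exported as a standard JSON policy document and loaded as a dict; the function names, the example bucket name, and the example policy are hypothetical. It only inspects the Action element, so resource scoping and conditions still need a human read.

```python
REQUIRED_TAGS = {"Project", "Environment", "CostCenter"}


def _as_list(value):
    """IAM allows a single string or a list for Action; normalize to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def find_wildcard_actions(policy: dict) -> list:
    """Return Allow statements that grant '*' or a whole service (e.g. 's3:*')."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear unwrapped
        statements = [statements]
    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action"))
        if any(a == "*" or a.endswith(":*") for a in actions):
            flagged.append(stmt)
    return flagged


def missing_tags(tags: dict) -> set:
    """Return required tag keys that are absent or have an empty value."""
    present = {key for key, value in tags.items() if value}
    return REQUIRED_TAGS - present


# Hypothetical policy of the kind an AI draft might produce.
EXAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadTransitData",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-transit-data",
                "arn:aws:s3:::example-transit-data/*",
            ],
        },
        {"Sid": "TooBroad", "Effect": "Allow", "Action": "s3:*", "Resource": "*"},
    ],
}

if __name__ == "__main__":
    for stmt in find_wildcard_actions(EXAMPLE_POLICY):
        print("Wildcard action in statement:", stmt["Sid"])
    print("Missing tags:", sorted(missing_tags({"Project": "transit-analytics"})))
```

A check like this is a screening aid only. A candidate who explains why a service-wide wildcard is unnecessary for S3 read-only access shows stronger evidence than one whose module merely passes the script.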

Presentation Prompt

Prepare a short deck walking us through your approach to designing secure VPC/VNet peering and transit gateway architectures for integrating a legacy automated fare collection (AFC) system with modern cloud workloads. Discuss your network segmentation strategy, routing considerations, and how you balance cloud-native efficiency with legacy compatibility constraints.

Format

Deck and walkthrough · 20 min · ~2 hr prep

Audience

Senior Cloud Engineers and Platform Engineering Lead

What to prepare

  • 3-5 slides outlining your proposed network topology and security controls
  • Bullet points on trade-offs between cloud-native routing and legacy AFC system requirements

Deliverables

  • A 15-minute structured walkthrough of your deck
  • Discussion of implementation guardrails, peer-review processes, and operational handoffs

Ground rules

  • Keep slides concise and focused on architecture and decision-making rather than exhaustive diagrams.
  • Conceptual layouts are sufficient; do not produce a fully detailed, production-ready network blueprint.
  • You may reference past work only if you are permitted to share it without violating confidentiality.

Scoring anchors

Exceeds
Delivers a nuanced, well-structured architecture that explicitly balances legacy constraints with modern security, routing, and compliance guardrails.
Meets
Presents a coherent peering design with clear security controls and acknowledges key trade-offs and review processes.
Below
Offers an oversimplified or jargon-heavy design that overlooks segmentation, routing complexity, or compliance constraints.

Response time

20 min

Positive indicators

  • Clearly articulates trade-offs between latency, security, and legacy compatibility
  • Demonstrates structured reasoning around routing tables, CIDR overlap prevention, and firewall rules
  • Surfaces assumptions about AFC system constraints and validates them with targeted questions
  • Proposes measurable security and operational guardrails for the integration and change control

Negative indicators

  • Presents a monolithic network design without addressing segmentation or compliance boundaries
  • Ignores latency, routing propagation, or failover impacts on real-time AFC transactions
  • Fails to acknowledge change control, peer-review, or operational handoff requirements
  • Relies heavily on unexplained jargon without clarifying architectural rationale
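The "CIDR overlap prevention" indicator above can be made concrete with a short check. This is a minimal Python sketch using the standard-library `ipaddress` module; the network names and address ranges are hypothetical and do not describe any real AFC environment. It shows the kind of reasoning a strong candidate might describe before proposing peering or transit gateway routes, since overlapping ranges cannot be routed between unambiguously.

```python
from ipaddress import ip_network
from itertools import combinations


def find_overlaps(cidrs: dict) -> list:
    """Return pairs of network names whose CIDR ranges overlap."""
    networks = {name: ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(networks, 2)
        if networks[a].overlaps(networks[b])
    ]


# Hypothetical address plan for illustration only.
ADDRESS_PLAN = {
    "afc-legacy": "10.10.0.0/16",
    "analytics-vpc": "10.10.128.0/20",  # falls inside the legacy range
    "shared-services": "10.20.0.0/16",
}

if __name__ == "__main__":
    for a, b in find_overlaps(ADDRESS_PLAN):
        print(f"Overlap: {a} and {b}")
```

In this plan the analytics range sits inside the legacy range, so the pair is flagged and one side would need re-addressing or address translation before the networks are connected.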

Work Simulation Scenario

Scenario. You are a Cloud Engineer tasked with designing a centralized IAM strategy to federate a transit agency's on-prem Active Directory with AWS IAM Identity Center. The initial request only states 'sync our AD to AWS for single sign-on.' You must uncover the identity landscape, security requirements, and operational constraints before designing the federation architecture.

Problem to solve. Develop a structured approach to implement secure, least-privilege identity federation by identifying directory structure, access patterns, compliance requirements, and lifecycle management needs.

Format

Discovery interview · 40 min · ~2 hr prep

Success criteria

  • Clarify AD organizational structure, existing group policies, and contractor vs. employee access needs.
  • Identify MFA, session duration, and conditional access requirements.
  • Map out a phased rollout that minimizes disruption to transit operations.

What to review beforehand

  • Review AWS IAM Identity Center and Azure AD Connect (now Microsoft Entra Connect) documentation.
  • Understand standard public-sector IAM compliance baselines (e.g., least-privilege, audit logging).

Ground rules

  • Focus on discovery and architectural framing, not live configuration.
  • Use clarifying questions to map constraints and trade-offs.
  • You will be evaluated on your ability to synthesize identity requirements into a secure, scalable plan.

Roles in scenario

Agency IT Security Lead (informed partner, played by a cross-functional interviewer)

Motivation. Needs secure, auditable access for hybrid cloud environments while minimizing helpdesk tickets and operational friction.

Constraints

  • Must maintain strict audit trails for all privileged access and session activity.
  • Cannot disable legacy on-prem systems until cloud federation is fully validated.
  • Has strict budget limits for third-party identity tools; prefers native cloud solutions.

Tensions to introduce

  • Emphasizes security but lacks clarity on how contractors access legacy transit systems.
  • Worries about SSO downtime during peak dispatch hours.
  • Pushes for rapid rollout but hasn't mapped existing AD group sprawl.

In-character guidance

  • Answer questions directly about AD structure, compliance mandates, and operational windows.
  • Acknowledge trade-offs between security rigor and user convenience when probed.
  • Provide realistic constraints around legacy system dependencies and helpdesk capacity.

Do not

  • Do not volunteer IAM best practices, conditional access policies, or federation architecture details unless asked.
  • Do not suggest a specific SSO provider or directory sync strategy.
  • Do not resolve the candidate's design questions or hand over a pre-approved security baseline.

Scoring anchors

Exceeds
Systematically uncovers identity sprawl, maps least-privilege boundaries, and designs a phased, auditable rollout that balances security with operational continuity.
Meets
Identifies key AD and compliance constraints, asks relevant questions about MFA and session management, and outlines a structured federation plan.
Below
Skips foundational discovery about directory structure or access patterns, assumes a standard sync will work without probing constraints, or proposes an unvalidated monolithic migration.

Response time

40 min

Positive indicators

  • Asks high-leverage questions about AD group structure, contractor lifecycles, and legacy system dependencies.
  • Probes for MFA requirements, session policies, and audit logging expectations.
  • Frames trade-offs between rapid deployment and operational stability during peak transit hours.
  • Proposes a phased validation approach with clear rollback and monitoring strategies.

Negative indicators

  • Assumes AD structure is clean or that all users can be migrated simultaneously.
  • Overlooks contractor/vendor access patterns or legacy system integration needs.
  • Proposes a monolithic rollout without addressing helpdesk capacity or peak-hour constraints.
  • Fails to establish clear audit and compliance boundaries before designing the federation flow.

Progression Framework

The framework below shows how each competency evolves across experience levels: Junior, Mid, Senior, and Principal.

Cloud Infrastructure & Security Engineering

5 competencies

Competency · Junior · Mid · Senior · Principal

Core Infrastructure & IaC Automation

  • Junior: Executes predefined IaC templates and scripts under supervision, learning core cloud resource types and basic deployment workflows.
  • Mid: Authors and maintains modular IaC configurations, troubleshoots provisioning drift, and implements state management best practices.
  • Senior: Designs scalable IaC architectures, establishes CI/CD pipelines for infrastructure, and enforces policy-as-code for governance.
  • Principal: Defines organization-wide infrastructure strategy, evaluates emerging provisioning paradigms, and drives cross-team standardization.

Edge Computing & Telemetry

  • Junior: Installs edge agents, monitors basic telemetry streams, and assists with data ingestion validation.
  • Mid: Configures edge processing rules, optimizes telemetry batching, and implements local caching strategies.
  • Senior: Architects geo-distributed edge topologies, implements real-time stream processing, and ensures fault-tolerant data sync.
  • Principal: Defines edge computing strategy, evaluates hardware/software convergence, and establishes global telemetry governance.

Identity & Access Management

  • Junior: Assists with user provisioning, manages basic role assignments, and monitors access logs for anomalies.
  • Mid: Configures IAM policies, sets up SSO integrations, and audits permission boundaries across environments.
  • Senior: Architects zero-trust identity frameworks, automates credential rotation, and establishes cross-account federation.
  • Principal: Leads enterprise identity strategy, integrates advanced threat detection with IAM, and defines compliance-aligned access models.

Interface & Service Integration

  • Junior: Consumes existing APIs, writes basic integration scripts, and monitors endpoint health under guidance.
  • Mid: Designs REST/gRPC APIs, implements message brokers, and configures service discovery mechanisms.
  • Senior: Architects event-driven systems, implements API gateways with rate limiting, and establishes service mesh policies.
  • Principal: Defines enterprise integration standards, evaluates protocol evolution, and drives cross-ecosystem interoperability.

Network Architecture & Security

  • Junior: Deploys standard VPC/VNet configurations, configures basic security groups, and assists with DNS setup.
  • Mid: Implements network segmentation, configures load balancers, and manages VPN/Direct Connect integrations.
  • Senior: Architects hybrid cloud networks, optimizes routing for latency/cost, and deploys advanced firewall/WAF solutions.
  • Principal: Defines global network strategy, integrates SD-WAN and edge routing, and establishes enterprise traffic governance policies.

Data Operations & Platform Management

5 competencies

Competency · Junior · Mid · Senior · Principal

Compliance & Audit Automation

  • Junior: Runs compliance scans, documents findings, and assists with remediation tracking under supervision.
  • Mid: Configures automated policy checks, generates audit reports, and integrates compliance gates into CI/CD.
  • Senior: Architects continuous compliance frameworks, maps controls to regulatory standards, and automates evidence aggregation.
  • Principal: Defines enterprise risk posture, leads regulatory strategy, and aligns compliance automation with business objectives.

Data Pipeline Orchestration

  • Junior: Runs scheduled data jobs, monitors pipeline logs, and resolves basic data formatting or connectivity errors.
  • Mid: Develops transformation scripts, implements data validation checks, and optimizes query performance.
  • Senior: Architects fault-tolerant data workflows, implements schema evolution strategies, and automates pipeline scaling.
  • Principal: Defines enterprise data architecture standards, evaluates lakehouse/warehouse paradigms, and drives data product strategy.

Engineering Mentorship & Collaboration

  • Junior: Participates in peer code reviews, documents learnings, and contributes to team knowledge bases.
  • Mid: Mentors junior engineers, facilitates technical workshops, and leads cross-team alignment sessions.
  • Senior: Designs onboarding curricula, establishes engineering guilds, and drives technical decision-making frameworks.
  • Principal: Cultivates engineering culture, aligns technical strategy with organizational goals, and champions inclusive knowledge ecosystems.

Observability & Cost Governance

  • Junior: Configures basic alerting rules, reviews metric dashboards, and tags resources for cost allocation.
  • Mid: Correlates logs and traces for incident triage, implements right-sizing recommendations, and builds custom dashboards.
  • Senior: Architects SLO/SLI frameworks, automates anomaly detection, and establishes FinOps reporting cadences.
  • Principal: Defines observability maturity roadmap, integrates predictive cost modeling, and aligns telemetry with business KPIs.

Platform Engineering & Enablement

  • Junior: Maintains shared tooling repositories, documents platform usage, and assists developers with onboarding workflows.
  • Mid: Builds CLI tools and templates, automates environment provisioning, and integrates linter/security scanners.
  • Senior: Architects golden paths and self-service portals, establishes platform SLAs, and drives adoption through developer experience metrics.
  • Principal: Defines platform strategy, evaluates build-vs-buy tradeoffs, and scales IDP capabilities across enterprise divisions.