Transit Technology Infrastructure Engineer

Ryan Mahoney

Why this role is hard · Ryan Mahoney

It's rare to find someone who keeps the buses running while also talking clearly to non-technical teams. At this level, we need an engineer who can own a subsystem without needing constant hand-holding. They have to balance fixing a broken real-time feed with explaining the delay to an operations manager. Too many candidates focus only on code and ignore how infrastructure affects people. We need people who listen well enough to understand the actual impact of a server outage on riders.

Core Evaluation

Critical questions for this role

The competency and attitude questions below are where the hiring decision is made. They run in the live interview rounds and are calibrated to the Mid level.

15 Competency Questions

  1. Discipline

    Transit Technology Infrastructure & Operations

  2. Job requirement

    Core Infrastructure Operations

    Independently manages infrastructure lifecycle tasks and troubleshoots common operational issues across physical and virtual components supporting transit systems.

  3. Expected at Mid

    At the mid level, engineers are expected to manage the infrastructure lifecycle and troubleshoot operational issues independently, without constant supervision. This reflects the level's focus on owning a subsystem and taking part in on-call rotations, and it directly reduces the risk of prolonged service outages and high mean time to recovery (MTTR) that affect riders.

Interview round: Hiring Manager Technical

Walk me through how you managed infrastructure spanning both cloud providers and on-premises hardware.

Positive indicators

  • Mentions specific hybrid architecture patterns
  • Discusses latency considerations for legacy integrations
  • Describes clear ownership boundaries with traditional IT

Negative indicators

  • Treats cloud and on-prem as identical without nuance
  • Ignores legacy system limitations
  • Unable to articulate workload placement logic

14 Attitude Questions


Active Listening

The deliberate practice of fully concentrating on, understanding, and responding to verbal and non-verbal communication, so that operational realities are accurately translated into technical requirements without compromising safety or system integrity.

Interview round: Recruiter Screen

Describe a situation where you had to understand a complex issue reported by a non-technical user.

Positive indicators

  • Translates technical symptoms to operational impact
  • Validates user experience before troubleshooting
  • Maintains calm demeanor

Negative indicators

  • Talks over the user
  • Uses technical terms to explain user errors
  • Dismisses report as user error too quickly

Supporting Evaluation

How candidates earn the selection conversation

The goal is to reduce effort for everyone by collecting more useful signal before adding more interviews. Lightweight application prompts and structured screens help the panel focus live time on the candidates most likely to succeed.

Stage 1 · Application

Filter at the door

Runs the moment a candidate hits Submit. Disqualifying answers end the application; everything else is captured for review.

Knock-out Questions

1 of 2

Application Screen: Knock-out

Do you have at least three years of hands-on production experience deploying and managing infrastructure using Terraform?

Yes: Qualifies
No: Auto-decline

Video-Response Questions

1 of 3

Application Screen: Video Response

Describe a time you had to explain a critical infrastructure failure or potential system downtime to non-technical executive leadership. What specific steps did you take to ensure they understood the business impact and agreed on a recovery plan without causing panic?

Candidate experience

1. Record · 2. Review · 3. Submit

Response time

2 min

Format

Recorded video

Stage 2 · Resume Screening

Read the resume against fixed criteria

Reviewers score every application that clears the door against the same criteria. Stronger reviews advance to live interviews; weaker ones are archived without further screening.

Resume Review Criteria

8 criteria
Designs, implements, and maintains streaming data pipelines that ingest and distribute GTFS-Realtime vehicle positions and trip updates to consumer applications.
Monitors API latency, on-prem server connectivity, and SLOs using observability platforms, and independently troubleshoots live system errors during operational rotations.
Maintains stable data exchange between modern cloud platforms and legacy operational systems using scripting, database management, and API configuration.
Implements ADA technical compliance checks and open source security reviews during release cycles, ensuring applications meet legal and security standards.

Is the resume complete, well-organized, and free from formatting, spelling, and grammar mistakes?

Does the cover letter or personal statement convey clear relevance and familiarity with the job?

Does the resume show relevant prior work experience?

Does the resume indicate required academic credentials, relevant certifications, or necessary training?

Stage 3 · During Interviews

Where the hire is decided

Interview rounds use the competency and attitude questions outlined above, then add tests, work simulations, and presentations that reveal deeper evidence about how the candidate thinks and works.

Coding Test

Live Interview · Coding Test

Without AI

Implement a validation function that checks for stale or future-dated vehicle positions. Filter out invalid records and return a clean dataset. Prioritize data accuracy and clear error reporting.

Write a function validate_vehicle_positions(raw_records: list[dict]) -> list[dict] that filters out records where the timestamp is older than 120 seconds or more than 60 seconds in the future. Log rejected records with reasons. Return only valid records.

With AI

Use AI to draft validation logic, but critically review timestamp handling and edge cases. Transit systems span timezones and experience clock skew. Ensure your solution is robust, not just syntactically correct.

Write a function validate_vehicle_positions(raw_records: list[dict]) -> list[dict] that filters out records where the timestamp is older than 120 seconds or more than 60 seconds in the future. Log rejected records with reasons. Return only valid records.
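For the With AI variant, interviewers may want a concrete picture of the timezone pitfall. In Python, ordering comparisons between a naive datetime and a timezone-aware one raise TypeError, and AI drafts often compare against a naive datetime.now(). The sketch below is illustrative only: it assumes timestamps arrive as ISO-8601 strings (GTFS-Realtime itself carries POSIX seconds), and parse_utc and age_seconds are hypothetical helper names, not part of the prompt.

```python
from datetime import datetime, timezone


def parse_utc(value: str) -> datetime:
    """Parse an ISO-8601 string into an aware UTC datetime.

    Naive strings (no UTC offset) are rejected rather than silently
    assumed to be local time or UTC.
    """
    # Python versions before 3.11 do not accept a trailing "Z" in fromisoformat.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None or parsed.utcoffset() is None:
        raise ValueError(f"naive timestamp, no UTC offset: {value!r}")
    return parsed.astimezone(timezone.utc)


def age_seconds(value: str, now: datetime) -> float:
    """Seconds between the record timestamp and an aware 'now' (negative if future-dated)."""
    if now.tzinfo is None:
        raise ValueError("'now' must be timezone-aware, e.g. datetime.now(timezone.utc)")
    return (now - parse_utc(value)).total_seconds()
```

For example, age_seconds("2024-03-10T06:59:30-05:00", datetime(2024, 3, 10, 12, 0, tzinfo=timezone.utc)) returns 30.0, because the offset is applied before subtracting. Refusing naive input is a design choice: guessing a timezone is exactly the kind of silent assumption the indicators below penalize.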

Response time

20 min

Positive indicators

  • Correctly calculates time deltas using timezone-aware or consistent UTC logic
  • Logs rejection reasons clearly for ops troubleshooting
  • Handles missing or malformed timestamp fields gracefully
  • Maintains O(n) performance suitable for streaming pipelines
  • Identifies AI's naive timezone assumptions and corrects them
  • Adds explicit handling for clock skew and leap seconds
  • Structures logs for compliance auditing rather than debugging
  • Documents why certain AI-generated edge cases were kept or modified

Negative indicators

  • Compares naive and aware datetimes incorrectly
  • Silently drops records without audit trails
  • Uses inefficient nested loops or regex for parsing
  • Fails to handle missing keys or type errors
  • Accepts AI's generic datetime comparison without transit context
  • Overcomplicates validation with unnecessary regex or external libraries
  • Fails to log rejection metadata required by transit data standards
  • Cannot articulate how AI suggestions were adapted for GTFS-RT constraints

Presentation Prompt

Talk us through how you would approach implementing and validating a new GTFS-Realtime trip update pipeline when integrating with a legacy agency dispatch system that has intermittent connectivity. Slides are optional; focus on your architectural framing, validation steps, and fallback strategy.

Format

Approach walkthrough · 20 min · ~2 hr prep

Audience

Engineering managers and senior infrastructure engineers

What to prepare

  • Optional 1-2 page architecture sketch
  • Mental map of validation checkpoints and failure modes

Deliverables

  • A 20-minute verbal walkthrough of your implementation approach, tradeoffs, and validation plan

Ground rules

  • Do not build or code a pipeline; discuss your approach only
  • Use only work you are permitted to share
  • Assume standard cloud and on-prem integration constraints apply

Scoring anchors

Exceeds
Architects a resilient pipeline with explicit validation gates, automated rollback triggers, and clear degraded-mode operations that protect rider experience during legacy outages.
Meets
Designs a functional pipeline with reasonable validation steps and acknowledges legacy constraints, though fallback mechanisms lack full automation.
Below
Assumes stable connectivity, omits validation gates, and lacks any rollback or degraded-service strategy.

Response time

20 min

Positive indicators

  • Surfaces assumptions about legacy connectivity and upstream data quality
  • Proposes clear validation gates and automated rollback triggers
  • Asks clarifying questions about throughput, latency SLAs, and peak-hour constraints
  • Balances technical elegance with operational reliability and degraded-mode planning

Negative indicators

  • Assumes perfect upstream data without a robust validation strategy
  • Jumps to implementation details before framing architectural constraints
  • Lacks a clear fallback, circuit-breaker, or degraded-mode plan
  • Overlooks data schema evolution and backward compatibility

Work Simulation Scenario

Scenario. A regional transit partner is joining your Mobility as a Service (MaaS) platform and requires a real-time trip update pipeline. Their legacy automatic vehicle location (AVL) system outputs data over a proprietary protocol with known latency spikes during peak hours. You own end-to-end delivery of this integration.

Problem to solve. Construct an approach to ingest, normalize, and publish the partner's real-time data to the GTFS-RT standard while managing latency risks, ensuring SLO compliance, and defining a testing strategy.

Format

Discovery interview · 35 min · ~2 hr prep

Success criteria

  • Identifies key constraints around legacy protocol parsing and peak-hour latency
  • Designs a resilient ingestion strategy with appropriate buffering or fallback mechanisms
  • Defines clear SLOs and testing gates before production deployment
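One buffering pattern a candidate might propose for the 3x peak-hour spikes is coalescing: because riders only need the latest state of each vehicle or trip, the buffer can keep only the newest update per key, so memory stays bounded by fleet size rather than message volume. The class below is a hypothetical Python sketch of that idea; the vehicle_id and timestamp field names are assumptions, not part of the partner's protocol.

```python
from typing import Any


class LatestPerKeyBuffer:
    """Coalescing buffer that keeps only the newest update per key.

    Memory is bounded by the number of distinct keys (e.g. vehicles),
    not by how many messages arrive during a peak-hour spike.
    """

    def __init__(self, key_field: str = "vehicle_id", ts_field: str = "timestamp"):
        self.key_field = key_field
        self.ts_field = ts_field
        self._latest: dict[Any, dict[str, Any]] = {}
        self.discarded = 0  # superseded, duplicate, or out-of-order updates

    def put(self, update: dict[str, Any]) -> None:
        key = update[self.key_field]
        current = self._latest.get(key)
        if current is not None:
            self.discarded += 1
            if current[self.ts_field] >= update[self.ts_field]:
                return  # incoming update is not newer; keep the current one
        self._latest[key] = update

    def drain(self) -> list[dict[str, Any]]:
        """Return the buffered updates for publishing and reset the buffer."""
        batch = list(self._latest.values())
        self._latest.clear()
        return batch

    def __len__(self) -> int:
        return len(self._latest)
```

A publisher would call drain() on a fixed interval, such as each time the GTFS-RT feed is regenerated. The discarded counter is worth exporting as a metric so ops can see how much of a peak is being absorbed.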

What to review beforehand

  • GTFS-Realtime specification overview
  • Common data pipeline patterns (Kafka, Protobuf)
  • SLO and error budget fundamentals
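As a refresher on the error-budget arithmetic the scenario touches, here is a short Python sketch. The 99.9% target and 30-day window are illustrative assumptions, not figures from the partner's SLA. An availability SLO leaves (1 - SLO) of the window as budget, and the burn rate compares the observed error ratio with that allowance: a burn rate of 1 spends the budget exactly over the window.

```python
def error_budget_minutes(slo: float, window_days: float = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over the window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    return (1 - slo) * window_days * 24 * 60


def burn_rate(bad_events: int, total_events: int, slo: float) -> float:
    """Observed error ratio divided by the ratio the SLO allows.

    1.0 spends the budget exactly over the SLO window; 10.0 spends it
    ten times faster.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction between 0 and 1")
    return (bad_events / total_events) / (1 - slo)
```

For example, error_budget_minutes(0.999) comes to about 43.2 minutes per 30 days, and burn_rate(2, 100, 0.999) is about 20, meaning a sustained 2% failure rate would exhaust a 30-day budget in roughly a day and a half.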

Ground rules

  • Drive a structured discovery conversation. Ask questions to understand technical constraints, data quality, and operational expectations.
  • Focus on your approach and tradeoffs rather than producing a full spec.

Roles in scenario

Partner Systems Architect (informed partner, played by the hiring manager)

Motivation. Wants a reliable, low-maintenance integration that doesn't require expensive legacy system upgrades.

Constraints

  • Legacy AVL system cannot be modified or patched by your team
  • Peak hour data volume spikes 3x normal baseline
  • Strict 24-hour (around-the-clock) availability SLA for the feed once live

Tensions to introduce

  • Initially pushes for a simple direct API pull without considering buffering
  • Reveals intermittent data gaps only when asked about historical reliability
  • Questions the necessity of strict SLO monitoring for a 'beta' integration

In-character guidance

  • Provide honest answers about legacy system limitations, data formats, and historical failure patterns when asked
  • Do not suggest the optimal architecture
  • Stay focused on partner operational realities

Do not

  • Do not volunteer the ideal pipeline architecture
  • Do not coach the candidate on GTFS-RT compliance
  • Do not withhold information that is directly asked for

Scoring anchors

Exceeds
Constructs a resilient, well-scoped pipeline strategy by systematically uncovering constraints, proposing pragmatic buffering/fallbacks, and defining clear SLOs and testing gates.
Meets
Identifies core integration challenges, proposes a standard ingestion approach, and outlines basic monitoring and testing steps.
Below
Overlooks legacy constraints, proposes fragile direct integrations, or fails to establish reliability and testing standards.

Response time

35 min

Positive indicators

  • Asks high-information questions about data format, volume, failure modes, and historical latency patterns
  • Surfaces assumptions about peak-hour load and proposes buffering/fallback strategies
  • Defines concrete SLOs, error budgets, and testing gates aligned with independent delivery scope
  • Balances technical rigor with partner constraints

Negative indicators

  • Assumes the legacy system can be easily modified or replaced
  • Proposes overly complex solutions without validating partner constraints first
  • Fails to define testing or rollback strategies for production deployment
  • Freezes when presented with conflicting latency and SLA requirements

Progression Framework

This table shows how competencies evolve across experience levels. Each cell describes what the competency looks like at that level.

Transit Technology Infrastructure & Operations

6 competencies

Competency · Junior · Mid · Senior · Principal

Core Infrastructure Operations

Junior: Performs routine maintenance and monitoring of infrastructure components under supervision, including server health checks, firmware deployments, and VM provisioning following established SOPs.

Mid: Independently manages infrastructure lifecycle tasks and troubleshoots common operational issues across physical and virtual components supporting transit systems.

Senior: Designs infrastructure upgrades and leads complex incident resolution efforts for physical and virtual infrastructure components supporting transit systems.

Principal: Defines long-term infrastructure strategy and drives innovation in transit hardware and systems.

Open Source Software Management

Junior: Uses open source tools following established guidelines and licenses, managing dependencies for internal engineering tools and ensuring version control compliance.

Mid: Manages dependencies and contributes bug fixes to upstream projects within the transit technology stack.

Senior: Selects strategic open source solutions and manages community relationships for transit technology stack components.

Principal: Drives open source adoption strategy and influences project roadmaps.

Performance & Reliability Engineering

Junior: Monitors system metrics and alerts on performance thresholds, including AVL server health monitoring and automated performance report generation.

Mid: Tunes system parameters and implements reliability improvements to optimize system performance and ensure high availability of critical transit technology services.

Senior: Leads capacity planning and designs for fault tolerance to optimize system performance and ensure high availability of critical transit technology services.

Principal: Defines reliability targets (SLOs) and drives a culture of continuous improvement.

Security, Compliance & Governance

Junior: Follows security checklists and reports vulnerabilities, including data anonymization for privacy protection and compliance procedure documentation for audits.

Mid: Implements security controls and conducts compliance audits to ensure transit technology systems adhere to security protocols and privacy regulations.

Senior: Designs security architectures and leads incident response planning, ensuring transit technology systems adhere to security protocols, privacy regulations, and governance frameworks.

Principal: Sets organizational security policy and manages regulatory relationships.

System Integration & MaaS

Junior: Supports integration testing and documents interface specifications for transit system integrations and Mobility as a Service platforms under guidance from senior staff.

Mid: Builds integration connectors and manages data flow between disparate transit systems and Mobility as a Service platforms.

Senior: Architects multi-vendor integration solutions and manages stakeholder alignment for disparate transit systems and Mobility as a Service platforms.

Principal: Defines ecosystem integration strategy and partners with external MaaS providers.

Transit Data Standards & APIs

Junior: Assists in documenting API endpoints and validating data formats against standards, including GTFS feed validation, schedule data pipeline updates, and open data publishing following established procedures.

Mid: Develops and maintains API integrations and ensures data quality compliance across transit data systems and rider-facing applications.

Senior: Architects data exchange frameworks and resolves complex interoperability challenges, ensuring data standards and API interfaces work across transit data systems.

Principal: Leads industry working groups on data standards and defines organizational data strategy.