BI Developer

Ryan Mahoney

Why this role is hard · Ryan Mahoney

Finding the right mid-level BI developer is tricky because you need someone who can build reliable data pipelines and model complex warehouses independently while still listening to what stakeholders actually need. Plenty of candidates write clean SQL, but they struggle to turn vague business requests into stable data structures. You spot this when they build a dashboard that answers a question nobody asked or spend time optimizing a query that never runs in production. The real test is whether they can sit with a transit planner, hear the actual constraint behind the request, and adjust the pipeline schedule before it breaks downstream systems.

Core Evaluation

Critical questions for this role

The competency and attitude questions below are where the hiring decision is made. They run in the live interview rounds and are calibrated to the level selected above.

18 Competency Questions

1 of 18
  1. Discipline

    Data Engineering & Architecture

  2. Job requirement

    Data Pipeline Construction & ETL

    Develops scalable pipeline workflows, implements automated error handling, and optimizes transformation logic for performance.

  3. Expected at Mid

    Mid-level developers must independently handle routine to moderately complex pipeline workflows, ensuring reliable data delivery without senior oversight.

Interview round: Hiring Manager Technical Deep Dive

Walk me through a recent data ingestion workflow you built from scratch. What were the source characteristics, and how did you structure the transformation steps?

Positive indicators

  • Describes modular, reusable pipeline components
  • Mentions automated testing or validation checkpoints
  • References specific query optimization techniques used
  • Explains how they documented pipeline logic for others
  • Details a clear rollback or recovery strategy

Negative indicators

  • Relies on manual scripts with no automated error handling
  • Cannot explain how data lineage or quality is tracked
  • Focuses only on extraction without discussing transformation
  • Lacks awareness of idempotency or duplicate handling
  • Vague about performance impacts or optimization efforts

9 Attitude Questions

1 of 9

Active Listening

The disciplined cognitive and interpersonal practice of fully concentrating on, comprehending, and retaining stakeholder communications to accurately translate operational realities, implicit business constraints, and qualitative insights into precise data requirements, analytical logic, and BI deliverables.

Interview round: Recruiter Screen & Role Alignment

You’re tasked with rebuilding a legacy farebox recovery report. The original creator is unavailable, and current users have conflicting definitions. What steps do you take to align on the metric?

Positive indicators

  • Maps conflicting definitions to specific business processes
  • Creates a comparison matrix of old vs new logic
  • Secures explicit approval before finalizing the rebuild

Negative indicators

  • Picks one definition arbitrarily without consulting stakeholders
  • Ignores legacy documentation and starts from scratch
  • Delivers the new report without explaining the definition shift

Supporting Evaluation

How candidates earn the selection conversation

The goal is to reduce effort for everyone by collecting more useful signal before adding more interviews. Lightweight application prompts and structured screens help the panel focus live time on the candidates most likely to succeed.

Stage 1 · Application

Filter at the door

Runs the moment a candidate hits Submit. Disqualifying answers end the application; everything else is captured for review.

Knock-out Questions

1 of 2

Application Screen: Knock-out

Do you have at least 3 years of professional, production-level experience developing and optimizing BI data models using dbt and Snowflake?

Yes
Qualifies
No
Auto-decline

Video-Response Questions

1 of 3

Application Screen: Video Response

Describe a time when your data pipeline failed to meet a critical dashboard refresh SLA. How did you communicate the technical root cause and revised timeline to non-technical operations leaders, and what steps did you take to rebuild their confidence in the system?

Candidate experience

REC
0:42 / 2:00
1Record
2Review
3Submit

Response time

2 min

Format

Recorded video

Stage 2 · Resume Screening

Read the resume against fixed criteria

Reviewers score every application that clears the door against the same criteria. Stronger reviews advance to live interviews; weaker ones are archived without further screening.

Resume Review Criteria

8 criteria
Designs and implements structured data models that reconcile disparate operational and financial systems into unified, auditable fact tables.
Configures automated scheduling, testing, and deployment workflows to ensure reliable, version-controlled data delivery across environments.
Refactors analytical queries and data models to meet latency targets and reduce compute costs on high-volume datasets.
Partners with non-technical stakeholders to convert ambiguous business needs into structured, auditable data models and self-service reporting tools.

Is the resume complete, well-organized, and free from formatting, spelling, and grammar mistakes?

Does the cover letter or personal statement convey clear relevance and familiarity with the job?

Does the resume show relevant prior work experience?

Does the resume indicate required academic credentials, relevant certifications, or necessary training?

Stage 3 · During Interviews

Where the hire is decided

Interview rounds use the competency and attitude questions outlined above, then add tests, work simulations, and presentations that reveal deeper evidence about how the candidate thinks and works.

Coding Test

Live Interview · Coding Test

Without AI

Write the solution independently. Focus on incremental materialization strategy, window function usage, and explicit handling of out-of-order telemetry records.

Using the starter code as a base, draft a dbt incremental SQL model that calculates latency_minutes and a rolling 7-day average latency per route. Ensure late-arriving records are merged correctly without duplicating daily aggregates.

With AI

You may use AI to generate boilerplate or suggest syntax, but you must explicitly document how you verified the output, adjusted it for edge cases (like out-of-order telemetry), and ensured it aligns with dbt incremental best practices. We are evaluating your critical review and adaptation skills.

Using the starter code as a base, draft a dbt incremental SQL model that calculates latency_minutes and a rolling 7-day average latency per route. Ensure late-arriving records are merged correctly without duplicating daily aggregates.

Response time

20 min

Positive indicators

  • Correct use of `is_incremental()` macro for dbt incremental logic
  • Window functions partitioned correctly by route and ordered by timestamp
  • Explicit `UNIQUE_KEY` or `MERGE` strategy to prevent duplicate aggregation
  • Clear handling of late-arriving telemetry via partition filters
  • Targeted AI prompts for window function syntax rather than full model generation
  • Manual validation of incremental merge logic against late-arrival scenarios
  • Clear documentation of AI-assisted vs candidate-designed components
  • Adjustment of AI output to enforce dbt-specific partitioning and unique keys

Negative indicators

  • Full table scans instead of incremental partitioning
  • Missing window frame definitions causing incorrect rolling averages
  • Ignoring late-arriving data, leading to skewed historical metrics
  • Hardcoded dates that break dbt's incremental execution model
  • Blindly accepting AI-generated dbt configs without verifying incremental behavior
  • Failing to test window functions against out-of-order timestamp edge cases
  • Missing documentation of verification steps or AI reliance
  • Overcomplicating the model with unnecessary AI-suggested CTEs

Presentation Prompt

Walk us through your approach to structuring performant dbt models for real-time vehicle arrival latency tracking across a specific transit domain. Discuss your dimensional modeling choices, how you would manage high-concurrency data refreshes, and your rationale for selecting appropriate BI visualizations for dispatch and finance teams.

Format

deck-and-walkthrough · 20 min · ~2 hr prep

Audience

Data Engineering Leads, Product Managers, and BI Stakeholders

What to prepare

  • 3-5 slides outlining your dimensional model structure, refresh strategy, and visualization choices
  • A structured narrative connecting technical decisions to stakeholder workflows

Deliverables

  • A brief slide deck and a structured verbal walkthrough of your modeling approach

Ground rules

  • Use anonymized or hypothetical transit data schemas
  • Focus on design rationale, trade-offs, and stakeholder alignment rather than producing a production-ready schema
  • Do not build actual dbt projects or Looker dashboards

Scoring anchors

Exceeds
Balances performance with maintainability, explicitly maps business metrics to model layers, anticipates scaling needs, and demonstrates strong cross-functional stakeholder empathy.
Meets
Presents a functional dimensional model, addresses basic refresh and concurrency needs, aligns visuals with user roles, and maintains a coherent technical narrative.
Below
Lacks a clear modeling strategy, overlooks performance constraints, misaligns visuals with business questions, or presents fragmented logic without stakeholder context.

Response time

20 min

Positive indicators

  • Clearly explains star or snowflake schema choices tied to query patterns
  • Addresses concurrency limits and refresh SLOs with specific mitigation tactics
  • Ties model design and visual selections directly to dispatch and finance workflows
  • Anticipates downstream reporting bottlenecks and proposes incremental refresh strategies

Negative indicators

  • Proposes overly normalized or denormalized structures without clear justification
  • Ignoes refresh latency impacts on dashboard performance
  • Selects visuals that do not match the operational or financial use case
  • Fails to address data quality checks or incremental load considerations

Work Simulation Scenario

Scenario. You own the end-to-end delivery for the farebox recovery analytics domain. Finance and dispatch teams are reporting conflicting timestamp definitions between CAD/AVL and AFC vault systems, causing dashboard latency and reconciliation disputes during peak service windows. You need to architect optimized dbt models and Looker dashboards that maintain sub-5-minute refresh SLOs while ensuring financial accuracy.

Problem to solve. Drive a structured discovery conversation to understand the root causes of timestamp mismatches, define acceptable trade-offs between latency and accuracy, and determine the data modeling approach that will satisfy both finance and operations without exceeding compute budgets.

Format

discovery-interview · 35 min · ~2 hr prep

Success criteria

  • Identifies the precise nature of timestamp discrepancies and their operational impact
  • Clarifies sub-5-minute SLO requirements and compute constraints
  • Proposes a data modeling strategy that balances freshness, accuracy, and cost
  • Establishes clear stakeholder alignment on metric definitions and refresh cadence

What to review beforehand

  • Understanding of dbt incremental models and Looker performance tuning
  • Basic knowledge of farebox reconciliation workflows and AFC vault data

Ground rules

  • Focus on scoping the problem, defining trade-offs, and aligning stakeholders
  • The partner will answer only what you ask; drive the conversation with targeted questions
  • This is a discovery and framing exercise, not a live coding session

Roles in scenario

Finance Analytics Manager (informed_partner, played by cross_functional)

Motivation. Needs accurate, timely fare recovery metrics for daily cash reconciliation and board reporting, but is frustrated by dashboard latency and metric confusion.

Constraints

  • Finance requires exact timestamp alignment for audit trails
  • Dashboards must refresh within 5 minutes during 6-9 AM and 3-6 PM peaks
  • Compute budget is capped; expensive full-refresh strategies are not viable

Tensions to introduce

  • Points out that AFC vault timestamps are batch-processed at 15-minute intervals, conflicting with real-time CAD/AVL streams
  • Emphasizes that finance prioritizes accuracy over sub-second latency, but dispatch demands near real-time visibility
  • Will only clarify batch processing windows if asked directly

In-character guidance

  • Provide honest answers about finance priorities and audit requirements
  • Explain the operational impact of latency and timestamp mismatches when queried
  • Maintain a collaborative but firm stance on audit compliance

Do not

  • Do not volunteer the exact batch processing schedule unless asked
  • Do not suggest specific Looker or dbt optimization techniques
  • Do not coach the candidate on how to balance finance vs dispatch needs

Scoring anchors

Exceeds
Systematically uncovers hidden batch processing constraints, frames a clear latency vs. accuracy trade-off strategy, and proposes a scalable, cost-aware incremental modeling approach that satisfies both audit and operational needs.
Meets
Identifies key timestamp discrepancies, asks relevant questions about SLOs and compute limits, and outlines a reasonable dbt/Looker optimization strategy with stakeholder alignment.
Below
Makes unfounded assumptions about data freshness, ignores compute budget constraints, fails to clarify audit requirements, or proposes a rigid solution that doesn't address the core reconciliation conflict.

Response time

35 min

Positive indicators

  • Asks targeted questions to uncover the root cause of timestamp mismatches and batch processing constraints
  • Explicitly defines and negotiates the trade-off between latency, accuracy, and compute cost
  • Proposes a structured approach to incremental modeling that aligns with peak-hour SLOs
  • Checks for stakeholder understanding and aligns on metric definitions before finalizing scope

Negative indicators

  • Assumes finance and dispatch share identical latency expectations without verifying
  • Proposes expensive full-refresh models without inquiring about compute constraints
  • Uses technical jargon without translating it into business impact or operational timelines
  • Fails to establish a clear reconciliation logic or fallback for batch-processed timestamps

Progression Framework

This table shows how competencies evolve across experience levels. Each cell shows competency at that level.

Data Engineering & Architecture

4 competencies

CompetencyJuniorMidSeniorPrincipal
Data Pipeline Construction & ETL

Writes and debugs basic ETL scripts under supervision, ensuring scheduled data loads complete without errors.

Develops scalable pipeline workflows, implements automated error handling, and optimizes transformation logic for performance.

Architects end-to-end ingestion frameworks, establishes data quality gates, and leads modernization of legacy batch processes to streaming.

Defines enterprise data pipeline strategy, drives adoption of advanced orchestration patterns, and mentors teams on resilient data architecture.

Dimensional Modeling & Data Warehousing

Assists in building and documenting basic fact and dimension tables under established modeling standards.

Designs normalized and denormalized schemas, manages slowly changing dimensions, and optimizes query performance.

Architects enterprise data warehouse models, implements data vault methodologies, and governs schema evolution across domains.

Sets organization-wide modeling standards, evaluates semantic layer technologies, and aligns data architecture with long-term business intelligence strategy.

Energy Telemetry & Fleet Electrification Analytics

Collects and formats raw telematics and charging data, generates basic energy consumption reports.

Builds analytical models for charge cycle efficiency, integrates CAN bus data streams, and develops fleet energy dashboards.

Architects IoT telemetry pipelines, optimizes battery degradation models, and aligns energy analytics with operational scheduling.

Defines enterprise energy data strategy, integrates predictive maintenance with charging infrastructure planning, and leads sustainability reporting frameworks.

Spatial Analytics & GTFS Processing

Cleans and validates GTFS feeds, performs basic geocoding, and supports spatial data visualization tasks.

Develops spatial transformation pipelines, integrates real-time vehicle positions, and builds GIS-enabled analytical datasets.

Architects geospatial data platforms, optimizes spatial indexing for large transit networks, and leads location intelligence initiatives.

Drives adoption of advanced spatial analytics frameworks, standardizes transit geospatial data contracts, and pioneers predictive routing models.

Operations, Compliance & Emerging Mobility

4 competencies

CompetencyJuniorMidSeniorPrincipal
Autonomous & Connected Transit Data Platforms

Processes raw autonomous vehicle logs, formats sensor data for analysis, and supports basic safety metric calculations.

Builds pipelines for V2X communication data, implements edge-to-cloud synchronization, and develops autonomous fleet performance dashboards.

Architects real-time autonomous data platforms, optimizes high-frequency sensor storage, and leads safety compliance data frameworks.

Defines enterprise AV data strategy, integrates predictive safety models with operational planning, and pioneers standardized connected transit data contracts.

Microtransit & DRT Data Integration

Ingests microtransit booking and routing datasets, performs basic data validation, and supports ad-hoc analysis.

Develops APIs for real-time DRT data exchange, builds unified demand-supply matching datasets, and optimizes integration workflows.

Architects scalable microtransit data platforms, implements dynamic pricing/routing analytics, and leads cross-system interoperability.

Defines enterprise mobility-as-a-service data strategy, pioneers predictive demand modeling for DRT networks, and establishes industry integration standards.

Regulatory Audit & Compliance Reporting

Compiles standard compliance datasets, verifies data completeness, and assists in audit documentation.

Automates regulatory report generation, implements data retention policies, and validates audit log integrity.

Architects compliance data frameworks, establishes data lineage for audit readiness, and leads regulatory impact assessments.

Defines enterprise data governance strategy, aligns compliance architectures with evolving transit regulations, and advises leadership on risk mitigation.

System Reliability & Performance Monitoring

Monitors system dashboards, responds to basic alerts, and documents performance incidents.

Configures monitoring thresholds, develops automated alerting rules, and performs root cause analysis for performance degradation.

Designs comprehensive observability architectures, establishes SLO/SLA frameworks, and leads incident response post-mortems.

Sets enterprise reliability engineering standards, drives chaos testing initiatives, and aligns system resilience with operational continuity goals.