Case Study 02 · WayBetter · Consumer Health / SaaS

BI Stack
Modernization

Rebuilt WayBetter's entire analytics layer — replacing fragmented Google Data Studio reports with a production dbt + Looker platform that eliminated manual reporting, surfaced churn drivers, and gave every team self-service access to reliable data.

Company
WayBetter
Timeline
August 2020 – June 2022
Role
Business Intelligence Analyst
Stack
dbt · Looker · Databricks · SQL · Segment · Python
01

The Problem

WayBetter ran a behavioral change platform — people bet money on achieving their health goals. The product had strong engagement, but the analytics stack was a liability. Twenty-plus live reports were built directly in Google Data Studio against raw database tables, with SQL written inline and no version control. Any upstream schema change could silently break reports that leadership checked every morning.

The manual reporting burden was severe. Ten hours per week went to pulling data, formatting spreadsheets, and emailing metrics to stakeholders who could have looked it up themselves. The analytics team was reactive — fielding ad-hoc requests all day instead of doing analysis that moved the business.

More critically: no one had a clean picture of why users churned. There was event data in Segment, but it wasn't modeled. Anonymous user sessions weren't being joined to authenticated accounts. The behavioral signals that predicted churn were sitting in raw tables, untouched.

"Ten hours per week went to manual reporting. The analytics team was fully reactive — fielding ad-hoc data requests instead of asking and answering the questions that mattered."

02

Before & After

Before → After
20+ reports in Google Data Studio with inline SQL → 20+ reports in Looker built on versioned dbt models
No testing — broken reports discovered by stakeholders → 300+ dbt tests running in CI; failures block deployment
10 hours/week of manual reporting tasks → Fully automated; zero manual exports
Anonymous Segment events not joined to users → Sessionized event pipeline with anonymous→user ID mapping
No self-service — every question required an analyst → Business users answer their own questions in Looker
No documented churn model → Churn driver analysis informed a retention campaign: −10% churn in Q4 2021
03

How We Built It

The project ran in four phases over roughly five months. The core principle was to deliver value incrementally — starting with the highest-traffic reports and the most broken data pipelines, not trying to migrate everything at once.

01
Source Audit & dbt Project Foundation
Inventoried all Segment event types, PostgreSQL source tables, and existing Google Data Studio reports. Ranked reports by usage frequency and business criticality. Set up the dbt project structure — staging, intermediate, and mart layers — with CI running tests on every pull request from day one. Fixed three broken source pipelines before building any models on top of them.
02
Event Sessionization & Identity Resolution
Built the intermediate layer's most complex piece: joining anonymous Segment events to authenticated user records. This required resolving the identity graph — finding the anonymous IDs that preceded each user's first login and backfilling pre-signup behavioral events to the correct user. The result was a complete behavioral timeline per user, which became the foundation for all retention analysis.
03
Mart Layer & dbt Model Build-Out
Built production dbt models covering user acquisition, engagement, challenge completion, and customer lifecycle. The project accumulated 300+ tests across these models — uniqueness, not-null, referential integrity, and custom business-logic validations. Implemented incremental models for high-volume event tables to keep pipeline runs under 15 minutes.
04
Looker Migration, Training & Deprecation
Rebuilt all 20+ reports in Looker on top of the new dbt models, running both versions in parallel for two weeks to validate number consistency. Ran 45-minute training sessions with every stakeholder team — not screen-share demos, but hands-on sessions where users built their own Explores. Formally deprecated the Google Data Studio reports once each team signed off.
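The incremental models mentioned in phase 03 followed the standard dbt pattern: on each run, only process events newer than what the table already holds. A minimal sketch — the model and column names here (fct_events, event_id, event_timestamp) are illustrative, not the production names:

```sql
-- models/marts/fct_events.sql (illustrative sketch)
{{
  config(
    materialized = 'incremental',
    unique_key   = 'event_id'
  )
}}

SELECT
  event_id,
  resolved_user_id,
  event_type,
  event_timestamp
FROM {{ ref('int_user_identity_graph') }}

{% if is_incremental() %}
  -- Only process events newer than the current high-water mark,
  -- which is what kept full pipeline runs under the 15-minute target
  WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM {{ this }})
{% endif %}
```

The `unique_key` config lets dbt merge late-arriving or updated rows instead of appending duplicates.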
04

The Migration Approach

The biggest mistake I see in BI migrations is trying to lift-and-shift: taking every existing report and rebuilding it exactly in the new tool. That approach preserves all the debt and misses the opportunity to fix the underlying problems.

Instead, I started with the most-used, highest-trust reports and rebuilt them properly — staging models from source, intermediate transformations, mart-level fact and dimension tables, and then Looker Explores on top. Each migration was a chance to add tests, clean up column names, and document what the data actually meant.
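The staging layer each migration started from is deliberately thin — rename, cast, nothing else — so every downstream model reads from one consistent vocabulary. A sketch of what a staging model in this pattern looks like; the raw source and column names are assumptions, not WayBetter's actual schema:

```sql
-- models/staging/stg_segment_events.sql
-- Thin rename-and-cast layer over the raw Segment landing table.
-- Source/column names are illustrative.
SELECT
  id                           AS event_id,
  anonymous_id,
  user_id,
  event                        AS event_type,
  CAST(timestamp AS TIMESTAMP) AS event_timestamp
FROM {{ source('segment', 'tracks') }}
```

Keeping all renaming here means a column rename upstream is a one-line fix in one file, not a hunt through twenty reports.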

ELT Pipeline — Segment to Looker

Segment (raw events) → Databricks (raw landing) → dbt staging (stg_events) → dbt intermediate (sessionization) → dbt mart (fct_user_behavior) → Looker (self-service)

The most technically involved piece was the event sessionization pipeline. WayBetter's product collected rich behavioral data via Segment — page views, game actions, challenge completions — but anonymous and authenticated events weren't linked. A user who browsed the app before signing up left a trail of anonymous events that were completely disconnected from their post-signup behavior.

I built an intermediate dbt model that resolved this identity graph: for each authenticated user, find the anonymous ID(s) that preceded their first login using a window function approach, then backfill the pre-signup events with the resolved user ID. This gave us a complete behavioral timeline per user — from first touch through churn or retention.

models/intermediate/int_user_identity_graph.sql
-- Resolve anonymous → authenticated user identity
WITH login_events AS (
  -- First time we see an authenticated user_id for an anonymous_id.
  -- ROW_NUMBER guards against join fan-out when one device maps to
  -- multiple accounts: keep only the first-identified user.
  SELECT
    anonymous_id,
    user_id,
    MIN(event_timestamp) AS identified_at
  FROM {{ ref('stg_segment_identify') }}
  WHERE user_id IS NOT NULL
  GROUP BY 1, 2
  QUALIFY ROW_NUMBER() OVER (
    PARTITION BY anonymous_id
    ORDER BY MIN(event_timestamp)
  ) = 1
),

resolved AS (
  -- Join all events back to their resolved user_id
  SELECT
    e.event_id,
    e.anonymous_id,
    COALESCE(e.user_id, li.user_id) AS resolved_user_id,
    e.event_timestamp,
    e.event_type,
    CASE
      WHEN e.user_id IS NULL AND li.user_id IS NOT NULL
        THEN 'pre_signup'
      WHEN e.user_id IS NOT NULL
        THEN 'authenticated'
      ELSE 'anonymous_unresolved'
    END AS identity_state
  FROM {{ ref('stg_segment_events') }} e
  LEFT JOIN login_events li
    ON e.anonymous_id = li.anonymous_id
)

SELECT * FROM resolved
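A model like this is exactly where the dbt test suite earns its keep: if the anonymous→user join ever fans out, every downstream retention number is silently inflated. A sketch of a dbt singular test that would catch it — the test file name is illustrative:

```sql
-- tests/assert_identity_resolution_no_fanout.sql (illustrative)
-- A dbt singular test fails when it returns rows. This one fails if
-- any event_id appears more than once after identity resolution,
-- i.e. the anonymous→user join duplicated events.
SELECT
  event_id,
  COUNT(*) AS row_count
FROM {{ ref('int_user_identity_graph') }}
GROUP BY event_id
HAVING COUNT(*) > 1
```

Running in CI, a failure here blocks the deploy before any report reads the model.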
05

Churn Driver Analysis

With a clean behavioral timeline, the churn analysis became tractable. I built a cohort model in dbt that tracked each user's engagement trajectory from signup through their first 30 days — the window that statistically predicted long-term retention.

The analysis surfaced a clear pattern: users who completed their first game action within 48 hours of signup had dramatically higher 90-day retention. Users who signed up but didn't complete that action within the window churned at nearly double the rate. The implication was direct: the product needed an onboarding intervention that drove users to their first game completion faster.

These findings were presented to the product and marketing teams as a Looker dashboard — not a one-time slide deck, but a live view that any stakeholder could filter and explore. Marketing used it to design a targeted re-engagement campaign focused specifically on users who signed up but hadn't yet activated. The campaign ran in Q4 2021 and reduced churn by 10%.

"The analysis was only possible because the event sessionization pipeline existed. Without clean, complete user timelines — with pre-signup behavior attached — the activation signal would have been invisible in the data."

06

The Looker Migration

Migrating 20+ reports isn't just a technical task — it's a change management project. Stakeholders have muscle memory around their existing dashboards. If the new versions look different or behave slightly differently, they'll distrust them even if the numbers are more accurate.

I handled this in three phases. Phase one: rebuild the most-used reports in Looker exactly as they existed in Data Studio, so stakeholders had a reference point. Phase two: run both versions in parallel for two weeks, addressing any discrepancies (which were always opportunities to fix data quality issues). Phase three: train every user with a 45-minute live session and a written reference guide, then sunset the old reports.

The training investment paid off immediately. Within a month of migration, stakeholders were building their own Explores — filtering by date, cohort, and product segment without waiting for analyst support. Ad-hoc data requests dropped by 40%, and the analytics team shifted from reactive to proactive.

07

Results

−40%
Ad-hoc Requests
After Looker migration — teams moved from requesting data to finding it themselves
−10 hrs
Manual Reporting
Per week, fully automated via dbt pipelines and scheduled Looker dashboards
+25%
Data Reliability
Measured by reduction in data errors and failed report loads
−10% churn
Q4 2021 Retention
Driven by campaign informed by event sessionization analysis
+5%
User Conversion
Marketing team optimized campaign spend using lifecycle dashboards
300+
dbt Tests Deployed
Running in CI — broken models no longer reach production silently
08

Lessons Learned

  • 01 Migration is change management, not just engineering. The technical work was the smaller half of this project. Stakeholder buy-in, parallel running periods, and real training sessions determined whether the new platform actually got used.
  • 02 Event data is worthless without identity resolution. Segment collected excellent behavioral signals, but they were siloed by session. The work to stitch anonymous and authenticated events together was what unlocked the churn analysis — and it required careful business rule definition before any code was written.
  • 03 Self-service requires investment in the model layer, not just the BI tool. Looker didn't create self-service on its own. The dbt models underneath had to be designed for the questions people wanted to ask — clean joins, sensible grain, documented fields. The BI tool is the interface; the data model is the product.
  • 04 Parallel running periods are non-negotiable. Every discrepancy between old and new during the two-week overlap was a data quality bug that had been silently producing wrong numbers. Finding them before sunset — not after — was what made the migration trustworthy.