Case Study 02 · WayBetter · Consumer Health / SaaS

BI Stack
Modernization

Rebuilt WayBetter's entire analytics layer — replacing fragmented Google Data Studio reports with a production dbt + Looker platform that eliminated manual reporting, surfaced churn drivers, and gave every team self-service access to reliable data.

Company
WayBetter
Timeline
August 2020 – June 2022
Role
Business Intelligence Analyst
Stack
dbt · Looker · Databricks · SQL · Segment · Python
01

The Problem

WayBetter ran a behavioral change platform — people bet money on achieving their health goals. The product had strong engagement, but the analytics stack was a liability. Twenty-plus live reports were built directly in Google Data Studio against raw database tables, with SQL written inline and no version control. Any upstream schema change could silently break reports that leadership checked every morning.

The manual reporting burden was severe. Ten hours per week went to pulling data, formatting spreadsheets, and emailing metrics to stakeholders who could have looked it up themselves. The analytics team was reactive — fielding ad-hoc requests all day instead of doing analysis that moved the business.

More critically: no one had a clean picture of why users churned. There was event data in Segment, but it wasn't modeled. Anonymous user sessions weren't being joined to authenticated accounts. The behavioral signals that predicted churn were sitting in raw tables, untouched.

"Ten hours per week went to manual reporting. The analytics team was fully reactive — fielding ad-hoc data requests instead of asking and answering the questions that mattered."

02

Before & After

Before → After
20+ reports in Google Data Studio with inline SQL → 20+ reports in Looker built on versioned dbt models
No testing — broken reports discovered by stakeholders → 300+ dbt tests running in CI; failures block deployment
10 hours/week of manual reporting tasks → Fully automated; zero manual exports
Anonymous Segment events not joined to users → Sessionized event pipeline with anonymous→user ID mapping
No self-service — every question required an analyst → Business users answer their own questions in Looker
No documented churn model → Churn driver analysis informed a retention campaign: −10% churn in Q4 2021
03

How We Built It

The project ran in four phases over roughly five months. The core principle was to deliver value incrementally — starting with the highest-traffic reports and the most broken data pipelines, not trying to migrate everything at once.

01
Source Audit & dbt Project Foundation
Inventoried all Segment event types, PostgreSQL source tables, and existing Google Data Studio reports. Ranked reports by usage frequency and business criticality. Set up the dbt project structure — staging, intermediate, and mart layers — with CI running tests on every pull request from day one. Fixed three broken source pipelines before building any models on top of them.
02
Event Sessionization & Identity Resolution
Built the intermediate layer's most complex piece: joining anonymous Segment events to authenticated user records. This required resolving the identity graph — finding the anonymous IDs that preceded each user's first login and backfilling pre-signup behavioral events to the correct user. The result was a complete behavioral timeline per user, which became the foundation for all retention analysis.
03
Mart Layer & dbt Model Build-Out
Built production dbt models covering user acquisition, engagement, challenge completion, and customer lifecycle. The project accumulated 300+ tests across these models — uniqueness, not-null, referential integrity, and custom business-logic validations. Implemented incremental models for high-volume event tables to keep pipeline runs under 15 minutes.
04
Looker Migration, Training & Deprecation
Rebuilt all 20+ reports in Looker on top of the new dbt models, running both versions in parallel for two weeks to validate number consistency. Ran 45-minute training sessions with every stakeholder team — not screen-share demos, but hands-on sessions where users built their own Explores. Formally deprecated the Google Data Studio reports once each team signed off.
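The incremental models mentioned in phase 03 followed the standard dbt pattern: on each run, only process events newer than what the table already holds. A minimal sketch — the model and column names here (fct_events, event_id, event_timestamp) are illustrative, not the production names:

```sql
-- models/marts/fct_events.sql (illustrative sketch)
{{
  config(
    materialized = 'incremental',
    unique_key   = 'event_id'
  )
}}

SELECT
  event_id,
  resolved_user_id,
  event_type,
  event_timestamp
FROM {{ ref('int_user_identity_graph') }}

{% if is_incremental() %}
  -- Only process events newer than the current high-water mark,
  -- which is what kept full pipeline runs under the 15-minute target
  WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM {{ this }})
{% endif %}
```

The `unique_key` config lets dbt merge late-arriving or updated rows instead of appending duplicates.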
04

The Migration Approach

The biggest mistake I see in BI migrations is trying to lift-and-shift: taking every existing report and rebuilding it exactly in the new tool. That approach preserves all the debt and misses the opportunity to fix the underlying problems.

Instead, I started with the most-used, highest-trust reports and rebuilt them properly — staging models from source, intermediate transformations, mart-level fact and dimension tables, and then Looker Explores on top. Each migration was a chance to add tests, clean up column names, and document what the data actually meant.
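The staging layer each migration started from is deliberately thin — rename, cast, nothing else — so every downstream model reads from one consistent vocabulary. A sketch of what a staging model in this pattern looks like; the raw source and column names are assumptions, not WayBetter's actual schema:

```sql
-- models/staging/stg_segment_events.sql
-- Thin rename-and-cast layer over the raw Segment landing table.
-- Source/column names are illustrative.
SELECT
  id                           AS event_id,
  anonymous_id,
  user_id,
  event                        AS event_type,
  CAST(timestamp AS TIMESTAMP) AS event_timestamp
FROM {{ source('segment', 'tracks') }}
```

Keeping all renaming here means a column rename upstream is a one-line fix in one file, not a hunt through twenty reports.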

ELT Pipeline — Segment to Looker

Segment (raw events) → Databricks (raw landing) → dbt staging (stg_events) → dbt intermediate (sessionization) → dbt mart (fct_user_behavior) → Looker (self-service)

The most technically involved piece was the event sessionization pipeline. WayBetter's product collected rich behavioral data via Segment — page views, game actions, challenge completions — but anonymous and authenticated events weren't linked. A user who browsed the app before signing up left a trail of anonymous events that were completely disconnected from their post-signup behavior.

I built an intermediate dbt model that resolved this identity graph: for each authenticated user, find the anonymous ID(s) that preceded their first login using a window function approach, then backfill the pre-signup events with the resolved user ID. This gave us a complete behavioral timeline per user — from first touch through churn or retention.

models/intermediate/int_user_identity_graph.sql
-- Resolve anonymous → authenticated user identity
WITH login_events AS (
  -- First time we see an authenticated user_id for an anonymous_id.
  -- ROW_NUMBER guards against join fan-out when one device maps to
  -- multiple accounts: keep only the first-identified user.
  SELECT
    anonymous_id,
    user_id,
    MIN(event_timestamp) AS identified_at
  FROM {{ ref('stg_segment_identify') }}
  WHERE user_id IS NOT NULL
  GROUP BY 1, 2
  QUALIFY ROW_NUMBER() OVER (
    PARTITION BY anonymous_id
    ORDER BY MIN(event_timestamp)
  ) = 1
),

resolved AS (
  -- Join all events back to their resolved user_id
  SELECT
    e.event_id,
    e.anonymous_id,
    COALESCE(e.user_id, li.user_id) AS resolved_user_id,
    e.event_timestamp,
    e.event_type,
    CASE
      WHEN e.user_id IS NULL AND li.user_id IS NOT NULL
        THEN 'pre_signup'
      WHEN e.user_id IS NOT NULL
        THEN 'authenticated'
      ELSE 'anonymous_unresolved'
    END AS identity_state
  FROM {{ ref('stg_segment_events') }} e
  LEFT JOIN login_events li
    ON e.anonymous_id = li.anonymous_id
)

SELECT * FROM resolved
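A model like this is exactly where the dbt test suite earns its keep: if the anonymous→user join ever fans out, every downstream retention number is silently inflated. A sketch of a dbt singular test that would catch it — the test file name is illustrative:

```sql
-- tests/assert_identity_resolution_no_fanout.sql (illustrative)
-- A dbt singular test fails when it returns rows. This one fails if
-- any event_id appears more than once after identity resolution,
-- i.e. the anonymous→user join duplicated events.
SELECT
  event_id,
  COUNT(*) AS row_count
FROM {{ ref('int_user_identity_graph') }}
GROUP BY event_id
HAVING COUNT(*) > 1
```

Running in CI, a failure here blocks the deploy before any report reads the model.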
05

Churn Driver Analysis

With a clean behavioral timeline, the churn analysis became tractable. I built a cohort model in dbt that tracked each user's engagement trajectory from signup through their first 30 days — the window that statistically predicted long-term retention.

The analysis surfaced a clear pattern: users who completed their first game action within 48 hours of signup had dramatically higher 90-day retention. Users who signed up but didn't complete that action within the window churned at nearly double the rate. The implication was direct: the product needed an onboarding intervention that drove users to their first game completion faster.

These findings were presented to the product and marketing teams as a Looker dashboard — not a one-time slide deck, but a live view that any stakeholder could filter and explore. Marketing used it to design a targeted re-engagement campaign focused specifically on users who signed up but hadn't yet activated. The campaign ran in Q4 2021 and reduced churn by 10%.

"The analysis was only possible because the event sessionization pipeline existed. Without clean, complete user timelines — with pre-signup behavior attached — the activation signal would have been invisible in the data."

06

The Looker Migration

Migrating 20+ reports isn't just a technical task — it's a change management project. Stakeholders have muscle memory around their existing dashboards. If the new versions look different or behave slightly differently, they'll distrust them even if the numbers are more accurate.

I handled this in three phases. Phase one: rebuild the most-used reports in Looker exactly as they existed in Data Studio, so stakeholders had a reference point. Phase two: run both versions in parallel for two weeks, addressing any discrepancies (which were always opportunities to fix data quality issues). Phase three: train every user with a 45-minute live session and a written reference guide, then sunset the old reports.

The training investment paid off immediately. Within a month of migration, stakeholders were building their own Explores — filtering by date, cohort, and product segment without waiting for analyst support. Ad-hoc data requests dropped by 40%, and the analytics team shifted from reactive to proactive.

07

Results

−40%
Ad-hoc Requests
After Looker migration — teams moved from requesting data to finding it themselves
−10 hrs
Manual Reporting
Per week, fully automated via dbt pipelines and scheduled Looker dashboards
+25%
Data Reliability
Measured by reduction in data errors and failed report loads
−10% churn
Q4 2021 Retention
Driven by campaign informed by event sessionization analysis
+5%
User Conversion
Marketing team optimized campaign spend using lifecycle dashboards
300+
dbt Tests Deployed
Running in CI — broken models no longer reach production silently
08

Lessons Learned

  • 01 Migration is change management, not just engineering. The technical work was the smaller half of this project. Stakeholder buy-in, parallel running periods, and real training sessions determined whether the new platform actually got used.
  • 02 Event data is worthless without identity resolution. Segment collected excellent behavioral signals, but they were siloed by session. The work to stitch anonymous and authenticated events together was what unlocked the churn analysis — and it required careful business rule definition before any code was written.
  • 03 Self-service requires investment in the model layer, not just the BI tool. Looker didn't create self-service on its own. The dbt models underneath had to be designed for the questions people wanted to ask — clean joins, sensible grain, documented fields. The BI tool is the interface; the data model is the product.
  • 04 Parallel running periods are non-negotiable. Every discrepancy between old and new during the two-week overlap was a data quality bug that had been silently producing wrong numbers. Finding them before sunset — not after — was what made the migration trustworthy.