Case Study 03 · TheGuarantors · Forecasting & Planning

State-Level Deal Volume Forecast

Built a 50-state deal volume forecasting system — training ARIMA, ETS, and weighted-recent ARIMA models for each state, selecting the best performer via walk-forward backtesting, and incorporating state-level real estate macro indicators as exogenous variables. Output: a 40% YoY growth target adopted by leadership for investor planning and headcount decisions.

TheGuarantors
Time Series Forecasting · State-Level Modeling · Annual Planning
Senior Analytics Engineer — sole analyst
Python · statsmodels · pdfplumber · SQL · dbt · Databricks · Power BI

The Problem

Every year, TheGuarantors' leadership needed a forward-looking deal volume forecast to drive headcount planning, underwriting capacity decisions, and sales targets — broken down by state to reflect the geographic variation in our rental housing business. Before this project, those numbers were produced by hand: a senior leader would apply a gut-feel growth percentage to prior year actuals and call it a plan.

The problem wasn't just accuracy. It was credibility and defensibility. When actuals diverged from plan — as they always did — there was no model to interrogate, no assumptions to revisit, and no structured way to update the forecast mid-year. Finance would ask why the number was wrong and there was no rigorous answer.

The added complexity: deal volume at TheGuarantors doesn't move in isolation. We operate in the rental housing market, and that market varies significantly by state. Rental vacancy rates, the balance of space demand versus new deliveries, and local rent growth are the real drivers of our addressable market — and they behave very differently in Texas versus New York versus Arizona. A single national model would flatten the signals that matter most.

"The goal wasn't just a number — it was a defensible, updatable, state-aware model that leadership could present to investors and use to drive real resource decisions."

Forecasting Approach

The project ran in four phases: data preparation, model training across three families, walk-forward model selection per state, and macro adjustment via state-level indicators.

01

Historical Data Preparation

Pulled 3+ years of monthly deal submission data by state from Databricks using dbt-modeled mart tables. Handled missing months, corrected for a known data migration gap in 2021, and decomposed each state series to confirm trend, seasonality, and residual components before touching any model code.

02

Model Family Selection

Three model families were trained for each of the 50 states: ARIMA (baseline autoregressive), ETS (exponential smoothing), and a weighted-recent ARIMA that down-weights older observations to prioritize post-pandemic housing market behavior. Where state-level macro indicators showed sufficient predictive correlation, SARIMAX configurations with exogenous variables were also tested.

03

Walk-Forward Backtesting & Per-State Selection

Walk-forward backtesting across the last 12 months of history for every state: train on everything before month N, forecast month N, advance by one month, repeat. Evaluated each candidate on MAE and RMSE. Selected the model with the lowest combined error per state — resulting in different winning model families across states based on their individual data characteristics.

04

State-Level Macro Adjustment

Applied structured overlays using three state-level real estate indicators — rental vacancy rate, space demand relative to deliveries, and rent growth YoY. Each indicator was normalized per state and weighted by its historical predictive correlation with that state's deal volume. Combined into a single adjustment multiplier capped at ±7.5% to preserve model stability.

05

Scenario Planning & Delivery

Produced three scenarios — base, downside, upside — at both state and national level. Delivered as a Python-generated report fed into a Power BI dashboard, with monthly actuals tracking against forecast automatically as new deal data landed in dbt.

Model Families & Per-State Selection

Rather than forcing every state into the same model structure, each state competed across three model families. The winning model was selected independently per state based on walk-forward MAE and RMSE.

Family 1

ARIMA / SARIMA

Baseline autoregressive model with optional seasonal component (period=12). Favored by states with strong, stable annual lease renewal cycles — markets where summer peaks and January troughs dominate the signal.

Family 2

ETS (Exponential Smoothing)

Adaptive smoothing that weights recent observations more heavily. Favored by smaller or more volatile state markets where limited history makes ARIMA parameter estimation unstable.

Family 3

Weighted-Recent ARIMA

ARIMA variant that down-weights pre-2021 observations to prioritize post-pandemic housing market behavior. Favored by states where COVID-era dynamics structurally shifted deal patterns.

The table below shows representative backtesting results. Results varied by state — this reflects a typical competitive mid-size market.

Model | Family | MAE | RMSE | Notes | Selected
ARIMA(1,1,1) | ARIMA | 312 | 418 | No seasonal term |
ARIMA(2,1,1) | ARIMA | 287 | 391 | Better AR fit |
SARIMA(2,1,1)(1,1,0)[12] | SARIMA | 241 | 334 | Annual lease cycle captured | ✓ Selected
SARIMA(1,1,1)(1,1,0)[12] | SARIMA | 259 | 352 | Slightly weaker AR fit |
ETS(A,A,A) | ETS | 268 | 361 | Preferred for smaller markets |
Weighted ARIMA (α=0.7) | Weighted | 271 | 368 | Post-pandemic weight shift |

SARIMA(2,1,1)(1,1,0)[12] won in stable, high-volume states — the seasonal component (period=12) aligned with the annual lease renewal rhythm, and the two-lag AR term reflected a deal pipeline that spans 6–8 weeks. In smaller or more volatile state markets, ETS or Weighted ARIMA frequently outperformed — underscoring why per-state selection mattered.

Implementation

The training and selection pipeline ran in Python against data pulled via the dbt mart layer. Each state series was processed independently through the same evaluation loop.

forecast/state_model_selection.py — per-state walk-forward backtesting
import pandas as pd
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Model families to evaluate per state
CANDIDATES = [
    {"name": "ARIMA(2,1,1)",     "type": "arima",    "order": (2,1,1), "seasonal": (0,0,0,0)},
    {"name": "SARIMA(2,1,1)x12", "type": "arima",    "order": (2,1,1), "seasonal": (1,1,0,12)},
    {"name": "SARIMA(1,1,1)x12", "type": "arima",    "order": (1,1,1), "seasonal": (1,1,0,12)},
    {"name": "ETS(A,A,A)",       "type": "ets",      "trend": "add", "seasonal": "add", "periods": 12},
    {"name": "Weighted ARIMA",   "type": "weighted", "order": (2,1,1), "seasonal": (1,1,0,12), "alpha": 0.7},
]

def walk_forward_backtest(series, candidate, n_test=12):
    """Walk-forward evaluation: train on everything before month N, predict month N, advance, repeat."""
    actuals, predictions = [], []
    train_end = len(series) - n_test

    for i in range(n_test):
        train = series.iloc[:train_end + i]

        if candidate["type"] == "ets":
            fit = ExponentialSmoothing(
                train, trend=candidate["trend"],
                seasonal=candidate["seasonal"],
                seasonal_periods=candidate["periods"]
            ).fit()
            pred = fit.forecast(1).iloc[0]

        elif candidate["type"] == "weighted":
            # SARIMAX has no sample-weight argument, so approximate the
            # exponential down-weighting of older observations by dropping
            # history whose weight decays below a floor, while always
            # keeping at least two full seasonal cycles.
            n = len(train)
            weights = candidate["alpha"] ** np.arange(n - 1, -1, -1)
            min_keep = 2 * candidate["seasonal"][3]
            start = min(int(np.argmax(weights >= 0.05)), max(n - min_keep, 0))
            fit = SARIMAX(train.iloc[start:], order=candidate["order"],
                          seasonal_order=candidate["seasonal"],
                          enforce_stationarity=False).fit(disp=False)
            pred = fit.forecast(1).iloc[0]

        else:
            fit = SARIMAX(train, order=candidate["order"],
                          seasonal_order=candidate["seasonal"],
                          enforce_stationarity=False).fit(disp=False)
            pred = fit.forecast(1).iloc[0]

        actuals.append(series.iloc[train_end + i])
        predictions.append(pred)

    mae  = mean_absolute_error(actuals, predictions)
    rmse = np.sqrt(mean_squared_error(actuals, predictions))  # version-safe RMSE (squared= removed in newer sklearn)
    return {"name": candidate["name"], "mae": round(mae), "rmse": round(rmse)}

def select_best_model_per_state(state_series_dict):
    """Train all candidates for each state, return best model per state."""
    best_models = {}
    for state, series in state_series_dict.items():
        results = [walk_forward_backtest(series, c) for c in CANDIDATES]
        best = min(results, key=lambda r: r["mae"] + r["rmse"])  # lowest combined error
        best_models[state] = best
    return best_models

State-Level Macro Adjustment

The time series models capture the internal rhythm of each state's deal flow — trend, momentum, and seasonality. But TheGuarantors operates in the rental housing market, and that market is shaped by forces the historical series can't fully anticipate. Three state-level real estate indicators were incorporated as the macro adjustment layer, all sourced from Cushman & Wakefield's U.S. Multifamily MarketBeat — a quarterly institutional report covering vacancy, absorption, deliveries, and rent growth at the market level across the U.S.

The MarketBeat PDFs are published quarterly and structured consistently across reports, making them well-suited to automated extraction. A Python scraper pulls the latest quarterly report, parses the state-level data tables, and loads the indicators into a dbt staging model — so the macro layer refreshes automatically each quarter without manual intervention. As of Q4 2025, national vacancy sat at 9.3%, full-year net absorption reached 355,000 units against 400,000 delivered, and rent growth registered just 1.1% YoY — signals that vary substantially by state and directly shape which markets are tightening versus softening.

🏢

Rental Vacancy Rate (by State)

Rising vacancy suppresses guaranty demand — fewer competitive rental markets mean landlords need less risk mitigation. Sourced from C&W MarketBeat quarterly reports via automated scraping. Applied as a leading indicator with ~1 quarter lag, inverted in the multiplier (high vacancy = negative demand signal).

📦

Space Demand vs. Deliveries (by State)

Net absorption relative to new supply — sourced from C&W MarketBeat's quarterly absorption and delivery tables by market. Positive net demand signals a tightening market and expanding addressable pool; oversupply compresses it. Used as a 3-month leading indicator.

📈

Rent Growth YoY (by State)

Rapid rent increases push marginal applicants toward guaranty products as landlords become more selective. Sourced from C&W MarketBeat rent growth tables at the market/state level and normalized to a 0–1 adjustment weight applied to each state's forecast output.
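The per-indicator weights above came from each indicator's historical predictive correlation with the state's deal volume. A sketch of how such a weight could be derived, with the 3-month lag, the 12-month minimum overlap, and all variable names as illustrative assumptions:

```python
import numpy as np
import pandas as pd

def indicator_weight(indicator: pd.Series, deals: pd.Series, lag: int = 3) -> float:
    """Score an indicator by the absolute correlation between its lagged
    values and deal volume, so stronger leading signals get larger weights."""
    aligned = pd.DataFrame({"x": indicator.shift(lag), "y": deals}).dropna()
    if len(aligned) < 12:  # too little overlap to trust the estimate
        return 0.0
    corr = aligned["x"].corr(aligned["y"])
    return float(abs(corr)) if pd.notna(corr) else 0.0

# Synthetic example: rent growth leading deal volume by three months.
idx = pd.date_range("2022-01-01", periods=36, freq="MS")
rng = np.random.default_rng(11)
rent_growth = pd.Series(rng.normal(3, 1, 36), index=idx)
deals = 50 + 8 * rent_growth.shift(3) + rng.normal(0, 2, 36)

w = indicator_weight(rent_growth, deals)
```

Normalizing these scores across the three indicators per state yields the relative weights fed into the adjustment multiplier.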

scrapers/cushman_marketbeat.py — quarterly C&W MarketBeat scraper
import requests
import pdfplumber
import pandas as pd
from io import BytesIO

# C&W publishes quarterly MarketBeat PDFs at a consistent URL pattern
CW_MULTIFAMILY_URL = (
    "https://assets.cushmanwakefield.com/-/media/cw/marketbeat-pdfs/"
    "{year}/q{quarter}/us-reports/national/"
    "q{quarter}{year}usmultifamilymarketbeat.pdf"
)

def fetch_latest_marketbeat(year: int, quarter: int) -> pd.DataFrame:
    """
    Download the C&W Multifamily MarketBeat PDF for a given quarter
    and extract state-level vacancy, absorption, deliveries, and rent growth.
    """
    url = CW_MULTIFAMILY_URL.format(year=year, quarter=quarter)
    response = requests.get(url, timeout=30)
    response.raise_for_status()

    records = []
    with pdfplumber.open(BytesIO(response.content)) as pdf:
        for page in pdf.pages:
            tables = page.extract_tables()
            for table in tables:
                if not table or len(table) < 2:
                    continue
                headers = [h.lower().strip() if h else "" for h in table[0]]
                # Identify tables containing market-level multifamily data
                if any("vacancy" in h or "absorption" in h for h in headers):
                    for row in table[1:]:
                        if row and row[0]:
                            records.append(dict(zip(headers, row)))

    df = pd.DataFrame(records)
    df["year"]    = year
    df["quarter"] = quarter
    return df

def load_to_staging(df: pd.DataFrame, conn) -> None:
    """Write parsed MarketBeat data to stg_macro_indicators_by_state."""
    df.to_sql(
        "stg_macro_indicators_by_state",
        conn, if_exists="append", index=False
    )
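Since the report lands on a fixed quarterly cadence, a small helper like this (hypothetical, not part of the original pipeline) can decide which report the scheduled job should request:

```python
from datetime import date

def latest_completed_quarter(today: date) -> tuple[int, int]:
    """Return (year, quarter) of the most recently completed calendar quarter."""
    q = (today.month - 1) // 3  # 0 means we're in Q1, so use prior-year Q4
    return (today.year - 1, 4) if q == 0 else (today.year, q)

# e.g. in the scheduled job:
# df = fetch_latest_marketbeat(*latest_completed_quarter(date.today()))
```
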

models/marts/mart_forecast_macro_adjusted_state.sql
WITH base_forecast AS (
  SELECT * FROM {{ ref('stg_arima_forecast_by_state') }}
  -- output of Python model, loaded via dbt seed, one row per state per month
),

macro AS (
  SELECT
    state,
    forecast_month,
    -- Normalize each indicator to 0-1 range over trailing 24 months per state
    (vacancy_rate - MIN(vacancy_rate) OVER w)
      / NULLIF(MAX(vacancy_rate) OVER w - MIN(vacancy_rate) OVER w, 0)
      AS vacancy_score,

    (net_absorption_vs_deliveries - MIN(net_absorption_vs_deliveries) OVER w)
      / NULLIF(MAX(net_absorption_vs_deliveries) OVER w - MIN(net_absorption_vs_deliveries) OVER w, 0)
      AS demand_score,

    (rent_growth_yoy - MIN(rent_growth_yoy) OVER w)
      / NULLIF(MAX(rent_growth_yoy) OVER w - MIN(rent_growth_yoy) OVER w, 0)
      AS rent_score

  FROM {{ ref('stg_macro_indicators_by_state') }}
  WINDOW w AS (
    PARTITION BY state
    ORDER BY forecast_month
    ROWS BETWEEN 23 PRECEDING AND CURRENT ROW
  )
),

adjusted AS (
  SELECT
    f.state,
    f.forecast_month,
    f.predicted_deals,
    -- Weighted macro multiplier: vacancy 40%, demand/supply 35%, rent 25%.
    -- High vacancy REDUCES demand, so the vacancy score is inverted.
    -- COALESCE defaults each score to 0.5 (neutral) when a state/month has
    -- no macro row, so the LEFT JOIN never nulls out the forecast.
    1.0 + (
      (1.0 - COALESCE(m.vacancy_score, 0.5)) * 0.40
    + COALESCE(m.demand_score, 0.5)          * 0.35
    + COALESCE(m.rent_score, 0.5)            * 0.25
    - 0.50                               -- center around zero adjustment
    ) * 0.15                             -- cap total macro adjustment at ±7.5%
    AS macro_multiplier,

    ROUND(
      f.predicted_deals * (
        1.0 + (
          (1.0 - COALESCE(m.vacancy_score, 0.5)) * 0.40
        + COALESCE(m.demand_score, 0.5)          * 0.35
        + COALESCE(m.rent_score, 0.5)            * 0.25
        - 0.50
        ) * 0.15
      )
    ) AS adjusted_deals

  FROM base_forecast f
  LEFT JOIN macro m USING (state, forecast_month)
)

SELECT * FROM adjusted

The multiplier is capped at ±7.5% per state. The time series model does the heavy lifting; the macro layer corrects for known external conditions the historical series can't fully anticipate. Uncapped macro adjustments tend to overfit to short-term indicator movements — particularly in smaller state markets where single large deals can distort the vacancy and absorption signals.

Note that vacancy rate is inverted in the multiplier — higher vacancy signals weaker rental market demand, which reduces TheGuarantors' addressable pool. This is the key structural difference from the prior national model, which used mortgage rates as a positive demand signal. State-level vacancy is a more direct measure of where we actually compete.
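A quick arithmetic sanity check on the multiplier bounds, mirroring the SQL expression above in plain Python:

```python
def macro_multiplier(vacancy: float, demand: float, rent: float) -> float:
    """Mirror of the SQL multiplier; all scores are 0-1 normalized per state."""
    return 1.0 + ((1.0 - vacancy) * 0.40 + demand * 0.35 + rent * 0.25 - 0.50) * 0.15

# Extremes: every signal maximally positive vs. maximally negative.
best  = macro_multiplier(vacancy=0.0, demand=1.0, rent=1.0)   # 1.075
worst = macro_multiplier(vacancy=1.0, demand=0.0, rent=0.0)   # 0.925
```

Because each score lives in [0, 1], the weighted sum spans [0, 1], centering subtracts 0.5, and scaling by 0.15 pins the adjustment to exactly the ±7.5% band.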

Forecast Output

The final output covered 12 months of forward projection at both state and national level, with base, upside, and downside scenarios. The chart below is a stylized representation of the national aggregate — historical actuals versus the base forecast, with the shaded range showing downside to upside bounds.

National Deal Volume — Historical vs. Forecast (Base Case, Aggregated from State Models)
[Chart: monthly national deal volume; legend: Historical Actuals, Base Case Forecast, Downside / Upside Range]

Results & Impact

50
US States modeled independently
3
Model families: ARIMA, ETS, Weighted-Recent ARIMA
40%
YoY growth target adopted for investor planning
75%
Directional accuracy in walk-forward backtest
3
Planning scenarios: base, upside, downside
Live
Actuals auto-tracked vs. forecast via dbt + Power BI

The forecast established a credible, methodology-backed annual plan that leadership could present to investors and use for internal resource decisions. For the first time, the company had a number with a model behind it — and a state-level breakdown that let Sales and Underwriting prioritize markets based on forward-looking signals, not just prior year performance.

Lessons Learned

01

Per-state model selection revealed meaningful heterogeneity. Forcing all 50 states into the same SARIMA structure would have masked the fact that smaller or post-pandemic-disrupted markets behave fundamentally differently. The ETS and weighted ARIMA families won in states where seasonal stability couldn't be assumed.

02

State-level macro indicators are more predictive than national ones. Replacing national mortgage rates with state-level vacancy and net absorption significantly improved signal quality — especially for high-volume states like Texas and Florida where local supply dynamics diverge sharply from national trends.

03

Invert vacancy in the multiplier — it took iteration to get right. Early versions treated vacancy like rent growth: higher = more demand. The opposite is true. High vacancy suppresses guaranty demand because landlords compete for tenants rather than screening them out. Getting the direction right made the macro layer meaningful rather than noise.

04

Walk-forward backtesting is the only honest evaluation. In-sample fit metrics are almost always misleading for time series — a model can fit history perfectly and forecast terribly. The walk-forward approach simulates real deployment across all 50 states, which is why it should be the default, not an afterthought.

05

The delivery format matters as much as the model. Embedding the state-level forecast into dbt — so actuals tracked against plan automatically every month — transformed it from a one-time deliverable into a living planning instrument. Leadership and Sales checked it monthly with no analyst involvement after the initial build.