LongDong is a fully transparent, multi-signal model that
scores every batter-pitcher matchup on the daily MLB slate and ranks the matchups by home run
probability. Nothing like this exists elsewhere.
Traditional HR prediction is either a single stat (HR/FB matchup) or a black-box algorithm.
LongDong combines six regression-weighted base factors, real-time Statcast rolling windows,
pitch-level arsenal analysis, weather integration, bullpen vulnerability, and a conviction
layer that lets you override the model — all in a single ranked slate where every
number is explained.
1 The Base Score
Every matchup starts with a base score computed from a weighted combination of features.
Weights are calibrated by logistic regression on historical matchup outcomes,
and are refit weekly by the closed-loop optimizer.
Weights live: v3.5 · updated 2026-05-14 · sorted by impact (descending)
ISO: Isolated power, meaning extra-base hit ability stripped of singles and walks.
Barrel%: How often the batter hits at ideal exit velocity + launch angle. Strongest batted-ball HR predictor.
Park Factor: Stadium HR-friendliness via 21-day rolling blend. Coors (1.32) is nearly 60% more HR-friendly than Oracle (0.83).
FIP: Fielding-independent pitching. Captures pitcher vulnerability beyond HR/FB.
Recent HR Density (14d): HRs over the prior 14 days, a continuous hot-streak signal. Captures momentum the categorical form badge can't.
Season HR Rate: Cumulative season HR rate. Captures established power × playing time. Standalone AUC 0.624 in 2025 backtest.
HR/FB Matchup: Geometric mean of batter HR/FB, pitcher HR/FB, and park factor.
Exit Velocity: Raw power signal. Higher EV = harder contact = more distance on fly balls.
Pull%: Pull tendency. Pulled fly balls travel farther, especially in short-porch stadiums.
FIP–xFIP Gap: When FIP exceeds xFIP, the pitcher is allowing more HRs than expected, which is a regression signal.
Each factor is normalized to 0–1 and multiplied by its weight. The weighted sum scales to 0–100.
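As a rough illustration of the normalize, weight, and rescale steps, here is a minimal Python sketch. The weights, the normalization ranges, the min-max clamping, and the function names are all assumptions made up for this example. They are not the live v3.5 values, and the real model may normalize differently. The helper for the HR/FB matchup simply takes the cube root of the product of its three inputs, which is what a geometric mean of three values is.

```python
from math import prod

# Hypothetical weights and normalization ranges, for illustration only.
# These are NOT the live v3.5 values.
WEIGHTS = {"iso": 0.40, "barrel_pct": 0.35, "park_factor": 0.25}
RANGES = {
    "iso": (0.080, 0.320),
    "barrel_pct": (2.0, 20.0),
    "park_factor": (0.80, 1.35),
}


def normalize(value, low, high):
    """Min-max scale a raw stat to 0-1, clamping values outside the range."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def hr_fb_matchup(batter_hr_fb, pitcher_hr_fb, park_factor):
    """Geometric mean of three inputs: the cube root of their product."""
    return prod((batter_hr_fb, pitcher_hr_fb, park_factor)) ** (1.0 / 3.0)


def base_score(raw_stats):
    """Weighted sum of normalized factors, rescaled to 0-100.

    Dividing by the total weight is an assumption about how the
    weighted sum is scaled; it keeps the result in 0-100 even if
    the weights do not sum to exactly 1.
    """
    total_weight = sum(WEIGHTS.values())
    weighted = sum(
        WEIGHTS[name] * normalize(raw_stats[name], *RANGES[name])
        for name in WEIGHTS
    )
    return 100.0 * weighted / total_weight
```

With these made-up ranges, a batter sitting exactly at the midpoint of every range lands at a base score of about 50, and one at or above every upper bound lands at 100.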
New in v3.5 (May 2026):
Added Season HR Rate and Recent HR Density (14d) to the model after a backtest against
~47K historical matchups showed they add predictive lift on top of the existing features
(+0.0091 AUC, +15% relative improvement on top-5 hit rate).
These features capture two intuitions the rate-stat model was missing:
established power × playing time (high-volume regulars beat part-time platoon bats
with the same per-AB rate), and hot-streak intensity (a continuous version
of the form badge). The weekly optimizer will tune their exact weights going forward.
Why are some features at zero?
The optimizer recently zeroed out Pull% and FIP–xFIP Gap because
their predictive signal was already captured by stronger features (a sign of correlated inputs,
not that the underlying stat doesn't matter). The weights are refit weekly; if the data changes, so will the weights.
2 Contextual Patches
Transparent, auditable adjustments on top of the base score. Every patch that fires shows up
in the matchup overlay's score trace.
A 14-day rolling Statcast window classifies form as HOT (+6), WARM (+3), NEUTRAL (0),
COOL (-3), or COLD (-6). Adds a Surprise Barrel bonus (+4)
if the recent barrel rate reaches at least 1.5× the season rate.
Strongest individual signal (r=0.139). When a batter crushes a pitch type that a
pitcher is getting shelled on at high usage: +3 (score ≥ 8), +5 (score ≥ 15 with 2+ exploitable pitch types).
Wind out ≥ 7 mph + batter's pull/center tendency aligned. Pull/center ≥ 85% → +6,
≥ 75% → +4, ≥ 65% → +2.
2+ barrels in last 3 games + HR in last 4 + pull/center ≥ 75% + wind ≥ 4 mph.
Recent form converging with conditions.
ISO ≥ .230 + pitcher vulnerable same-side (≥ 2 HRs in 40 PA) + recent barrel quality
in last 7 days.
Starter < 5.2 IP expected + bullpen HR/9 ≥ 1.3 → +2. Bullpen HR/9 ≥ 1.6
(hemorrhaging) → +3.
Model score 72-79 + sportsbook odds ≥ +700 + strong barrel profile + wind support.
Flags potential market inefficiencies.
Low-ISO batters (< .155) with 4+ HRs in last 100 PA are capped at 68.
Exception: lifts to 73 with recent barrel quality (2+ barrels, 2.0+ HH/FB in 7 days).
Oracle Park, Comerica Park, and Nationals Park with calm wind (< 4 mph). Exception: strong
spray alignment (pull/center ≥ 65%) and high fly-ball rate (≥ 0.7/game) can override.
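Several of the patches above are simple threshold rules, and a short Python sketch may make them easier to audit. The thresholds and point values come from the descriptions above. The function names, argument names, and the choice to stack the Surprise Barrel bonus on top of the form bonus are assumptions for illustration only, not the production code.

```python
# Form classes and point values as listed in the patch description.
FORM_BONUS = {"HOT": 6, "WARM": 3, "NEUTRAL": 0, "COOL": -3, "COLD": -6}


def contact_quality_patch(form, recent_barrel_rate, season_barrel_rate):
    """Form bonus, plus +4 when the recent barrel rate is >= 1.5x season.

    Assumption: the Surprise Barrel bonus stacks with the form bonus.
    """
    bonus = FORM_BONUS[form]
    if season_barrel_rate > 0 and recent_barrel_rate >= 1.5 * season_barrel_rate:
        bonus += 4
    return bonus


def arsenal_edge_patch(arsenal_score, exploitable_pitch_types):
    """+5 at score >= 15 with 2+ exploitable pitch types, else +3 at score >= 8."""
    if arsenal_score >= 15 and exploitable_pitch_types >= 2:
        return 5
    if arsenal_score >= 8:
        return 3
    return 0


def wind_patch(wind_out_mph, pull_center_pct):
    """Wind-out bonus (needs >= 7 mph), tiered by pull/center share in percent."""
    if wind_out_mph < 7:
        return 0
    for threshold, bonus in ((85, 6), (75, 4), (65, 2)):
        if pull_center_pct >= threshold:
            return bonus
    return 0


def power_mirage_cap(score, iso, hrs_last_100_pa, has_recent_barrel_quality):
    """Cap low-ISO batters on a HR run at 68, or 73 with recent barrel quality."""
    if iso < 0.155 and hrs_last_100_pa >= 4:
        cap = 73 if has_recent_barrel_quality else 68
        return min(score, cap)
    return score
```

Keeping each patch as its own small function mirrors the idea that every adjustment is auditable on its own: a patch either returns a point value (or a capped score) or leaves the score untouched.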
3 How the Score Comes Together
An example trace for a matchup — exactly what you see in the overlay.
Step                        Score   Change
Base Score                  71      —
Contact Quality (🔥 HOT)    77      +6
Arsenal Edge                80      +3
Dead Park Penalty           77      -3
Power Mirage Cap            —       not triggered
Final Score                 77      Tier B
Every one of these steps is visible when you click any row on the slate. Nothing is hidden.
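The running-score arithmetic in a trace like this can be sketched in a few lines of Python. The function name, the tuple layout, and the clamping to 0–100 are assumptions for illustration; the overlay's actual data structure is not documented here.

```python
def build_trace(base, patches):
    """Apply (name, delta) patches in order and record the running score.

    A delta of None means the patch did not trigger. Clamping the
    running score to 0-100 is an assumption, not documented behavior.
    """
    rows = [("Base Score", base, None)]
    score = base
    for name, delta in patches:
        if delta is None:
            rows.append((name, None, "not triggered"))
            continue
        score = max(0, min(100, score + delta))
        rows.append((name, score, f"{delta:+d}"))
    rows.append(("Final Score", score, None))
    return rows


# The example matchup from the trace: 71 + 6 + 3 - 3.
example = build_trace(
    71,
    [
        ("Contact Quality (HOT)", 6),
        ("Arsenal Edge", 3),
        ("Dead Park Penalty", -3),
        ("Power Mirage Cap", None),
    ],
)
```

Each row carries the score after that step, so the final row always matches the last triggered adjustment.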
4 Tiers
The final score (0–100) determines the matchup tier.
S · 95–100
Rare air: exceptional alignment across power, context, and conditions. Typically 0–3 per slate.
A+ · 90–94
Near-elite alignment. One factor shy of perfect, still among the strongest plays of the day.
A · 80–89
Strong matchup with multiple supporting factors. The meat of a well-built lineup.
B · 70–79
Favorable matchup with clear upside. Solid core plays that don't need much extra conviction.
C+ · 60–69
Decent structural matchup worth considering. One more positive factor can push these into play.
C · 50–59
Weaker alignment with limited upside. Needs strong outside conviction to justify.
D · < 50
Structurally unfavorable. The numbers say stay away unless you know something the model doesn't.
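The tier bands translate directly into a lookup. The sketch below is one way to express it in Python. The function and constant names are made up for this example, and treating fractional scores by their lower bound (94.5 falls in A+) is an assumption.

```python
# Lower bound of each tier, highest first, taken from the bands above.
TIER_FLOORS = ((95, "S"), (90, "A+"), (80, "A"), (70, "B"), (60, "C+"), (50, "C"))


def tier_for(score):
    """Map a final score (0-100) to its tier letter."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for floor, tier in TIER_FLOORS:
        if score >= floor:
            return tier
    return "D"
```

The example matchup's final score of 77 falls in the 70–79 band, so `tier_for(77)` returns `"B"`.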
5 Reports
The reports surface the same data that feeds the scoring engine — raw evidence behind the numbers.
6 Player Profiles
Every batter and pitcher in the system has a dedicated profile page.
⚾ Batter Profiles
Season power stats, HR calendar heatmap, rolling form signals, handedness splits, and
every scored matchup — past and present. Links to today's matchup overlay when active.
🎯 Pitcher Profiles
Vulnerability metrics (HR/FB, HR/9, FIP, xFIP), season HR log, and a
Today's Matchups card: every batter facing this pitcher, ranked with scores, tiers, and signal flags.
7 The Conviction System
Your layer on top of the model.
The model gives you a score. But you might know something it doesn't — maybe a pitcher just
came off the IL with diminished velocity, or a batter changed his swing mechanics. The
conviction system in the matchup overlay lets you adjust individual stats
up or down (Bullish / Bearish) and watch the score move in real time. Tag batters in the
My Batters watchlist with report-driven signals (Arsenal Edge, Unlucky Barrels,
Splits Edge) to track which edges you're following.
8 Data Pipeline
The model runs daily, pulling from five data sources before first pitch.
📊 FanGraphs
Season stats for batters and pitchers: ISO, Barrel%, FIP, xFIP, HR/FB. Also bullpen stats by team.
🛰️ Baseball Savant
Statcast pitch-level data: exit velocity, barrel rates, form signals, and the full arsenal analysis system.
⚾ Sportradar
Daily schedules, probable pitchers, confirmed lineups, roster status, and season HR totals.
🌤️ OpenWeatherMap
Real-time wind speed, direction, and temperature, converted to HR modifiers by wind bearing relative to center field.
💰 DraftKings
HR prop odds for value detection (High-Odds patch) and display on the slate.
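To give a feel for what "wind bearing relative to center field" can mean, here is a minimal Python sketch that projects the wind onto the line from home plate to center field. The cosine projection, the function name, and the parameter names are assumptions for illustration; how the model actually turns wind into an HR modifier is not specified here. The sketch relies on the meteorological convention, in which wind direction is the bearing the wind blows from.

```python
import math


def wind_out_component(speed_mph, wind_from_deg, center_field_bearing_deg):
    """Signed wind speed along the home-plate-to-center-field line.

    Positive means the wind is blowing out toward center field,
    negative means it is blowing in. Assumes wind_from_deg follows
    the meteorological convention (the direction the wind comes from).
    """
    # Convert "blowing from" into the bearing the wind travels toward.
    wind_to_deg = (wind_from_deg + 180.0) % 360.0
    angle = math.radians(wind_to_deg - center_field_bearing_deg)
    return speed_mph * math.cos(angle)
```

For a park whose center field lies due north of home plate (bearing 0), a 10 mph wind from the south gives a component of about +10 (straight out), a wind from the north gives about -10 (straight in), and a pure crosswind gives roughly 0.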
9 Track Record
Every matchup scored, tracked, and graded against actual HR outcomes.
The Track Record page
shows daily and rolling hit rates, broken down by tier — so you can see whether S-tier
matchups actually produce HRs more often than lower tiers. A healthy model shows a clear
gradient from S down to D.
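A tier gradient check like the one described can be sketched as follows. The input format (one `(tier, hit)` pair per graded matchup) and the function names are assumptions for illustration, and the sample rows in the usage note are made up, not real results.

```python
from collections import defaultdict

TIER_ORDER = ("S", "A+", "A", "B", "C+", "C", "D")


def hit_rate_by_tier(results):
    """Return {tier: HR hit rate} for every tier that has graded matchups.

    `results` is an iterable of (tier, hit) pairs, where hit is True
    when the batter homered in that matchup.
    """
    counts = defaultdict(lambda: [0, 0])  # tier -> [hits, total]
    for tier, hit in results:
        counts[tier][0] += int(hit)
        counts[tier][1] += 1
    return {t: counts[t][0] / counts[t][1] for t in TIER_ORDER if t in counts}


def has_clear_gradient(rates):
    """True when the hit rate never increases moving from S down to D."""
    ordered = [rates[t] for t in TIER_ORDER if t in rates]
    return all(a >= b for a, b in zip(ordered, ordered[1:]))
```

Fed a toy sample such as `[("S", True), ("A", True), ("A", False), ("D", False)]`, the first function yields rates of 1.0, 0.5, and 0.0 for S, A, and D, and the gradient check passes. With real data, small per-tier samples (S-tier in particular) will make daily rates noisy, which is one reason to look at rolling windows as well.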
The terms used in public performance reporting are now defined separately in the
methodology note:
pick-hit rate, capture rate, official morning snapshots, live research metrics,
replay estimates, and prime-card hit rate.
The weight optimizer uses this same results data to continuously improve via logistic
regression — the model literally learns from its own outcomes. When the optimizer runs,
it rebalances how much each factor contributes based on what's actually been predicting
HRs this season.
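For readers unfamiliar with how a logistic regression refit produces weights, here is a deliberately small Python sketch using plain batch gradient descent on log loss. It is a stand-in, not the optimizer itself: the real feature set, any regularization, the training window, and the way coefficients are converted into the published weights are not described here, and the names and hyperparameters below are assumptions.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on log loss.

    rows:   list of feature lists (assumed already normalized to 0-1)
    labels: list of 0/1 outcomes (1 = the batter homered)
    Returns (weights, bias). No regularization is applied.
    """
    n = len(rows)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = sigmoid(z) - y  # gradient of log loss w.r.t. z
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy data (made up): feature 0 separates the outcomes, feature 1 does not.
toy_rows = [[0.9, 0.2], [0.8, 0.8], [0.7, 0.5], [0.3, 0.5], [0.2, 0.8], [0.1, 0.2]]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_weights, toy_bias = fit_logistic(toy_rows, toy_labels)
```

On this toy data the fit puts a large positive weight on the informative feature and almost none on the uninformative one. The same mechanism explains how a refit can shrink a factor's weight toward zero when it adds nothing beyond what stronger features already carry.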
Try It Yourself
This whole system — multi-signal scoring, transparent patches, pitch-level arsenal analysis,
the conviction layer, self-correcting optimizer — was built from the ground up. Every number
is explained, every adjustment is visible, and the track record is public. That's the point.