Baseball Predictor

Frequently Asked Questions

Predictions, edge, confidence, P&L, and the difference between the Claude LLM model and the gradient-boosted statistical model — every term, every chip, and every number you see on the site, defined and explained. The site does not take wagers; it is an analytics tool. Pick a category below to expand it.

Jump to: Getting Started · Terminology · Models · Track Record · Disclaimers

Getting Started

What is this site?

An analytics tool that generates daily MLB game predictions (moneyline, run line, and over/under) using two independent engines: a Claude-based LLM that reasons over assembled game context, and a gradient-boosted statistical model trained on historical MLB data. Both produce a pick, a confidence score, and a list of supporting factors.

How it’s calculated: Picks are generated daily, graded after each game finishes, and tracked on the Track Record page so you can see how each model performs over time.

Is this real-money betting?

No. The site does not accept wagers, hold money, partner with a sportsbook, or facilitate placing bets. It is purely an analytics tool. If you choose to bet on the predictions, you do so at a sportsbook of your own choosing, in a jurisdiction where it is legal, and at your own risk.

How it’s calculated: See the Disclaimers section for responsible-gambling resources.

How do I read a prediction card?

Every card shows: the matchup (away @ home, with team abbreviations), the model's pick (e.g., "NYY moneyline" or "Over 8.5"), a numeric confidence chip with a color ramp (see Confidence), an edge chip (omitted when odds were unavailable), the model name with an LLM or ML source-type chip, and a one-line key factor summarizing the strongest driver.

How it’s calculated: Tap the info icon next to Confidence or Edge for a quick definition; each tooltip also links to the matching FAQ entry.

Where does the data come from?

Schedules and live game state come from a sports data API. Odds (moneyline, run line, over/under) come from an odds aggregation API. Team and pitcher statistics come from the MLB Stats API. Beat-reporter news comes from RSS feeds. Historical games used to train the statistical model are backfilled from the MLB Stats API.

How it’s calculated: All data is refreshed automatically on a schedule throughout the day.

How often are predictions updated?

The pipeline runs on a schedule throughout the day. Each run for a given game produces a new prediction version (see Latest version); the Track Record only counts the most recent version, but the game detail page shows the full timeline of versions for transparency.

How it’s calculated: Grading runs once daily after games finish.

What does the LLM / ML chip on a card mean?

It tells you which engine produced the pick. LLM means a Claude language model (see LLM prediction); ML means the statistical machine-learning model (see ML prediction). When both engines clear the qualification bar on the same game, you'll see two cards — one per engine.

How it’s calculated: The chip color matches the existing source styling used on the game detail page.

Terminology & Numbers

Edge

The difference between the model's estimated win probability and the market-implied probability derived from the odds. Positive edge means the model thinks the price is favorable to the bettor; negative edge means unfavorable. A NULL edge on a card means odds were unavailable when the prediction was generated.

How it’s calculated: edge = model_probability − implied_probability_from_odds. Worked example: at -150, implied probability is 150 / (150 + 100) = 60%; if the model gives the favorite a 65% win chance, edge = 0.65 − 0.60 = +5%.
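A minimal sketch of the formulas above (function names are illustrative, not the site's actual code):

```python
def implied_probability(american_odds: int) -> float:
    """Convert American odds to the market-implied win probability."""
    if american_odds < 0:  # favorite, e.g. -150
        return abs(american_odds) / (abs(american_odds) + 100)
    return 100 / (american_odds + 100)  # underdog, e.g. +130


def edge(model_probability: float, american_odds: int) -> float:
    """Edge: model probability minus market-implied probability."""
    return model_probability - implied_probability(american_odds)
```

Reproducing the worked example: implied_probability(-150) gives 0.60, and edge(0.65, -150) gives about +0.05.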

Confidence

The model's reported strength of conviction in a pick, on a 1–100% scale. Confidence is not a guaranteed win rate — it's the model's signal strength. Higher-confidence picks should win more often than lower-confidence ones over time, but any individual pick can lose.

Color ramp on every card: gray under 50, amber 50–59, lime 60–69, green ≥ 70.

How it’s calculated: Each model produces a raw confidence; a learning loop then calibrates it per band, per bet type, and per source using historical accuracy. Calibration factors are recomputed after graded predictions accumulate.
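The color ramp above reduces to a simple threshold mapping; a sketch (names illustrative):

```python
def confidence_color(confidence: float) -> str:
    """Map a 1-100 confidence value to the card chip color ramp."""
    if confidence >= 70:
        return "green"
    if confidence >= 60:
        return "lime"
    if confidence >= 50:
        return "amber"
    return "gray"
```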

P&L (Profit & Loss)

Cumulative profit or loss across graded predictions, expressed in units (see Units). Surfaced on the Track Record overall summary and the by-band, by-bet-type, and by-source breakdowns.

How it’s calculated: Assumes 1u flat staking at -110 default odds when actual odds are unavailable: a winning -110 bet returns +0.909u, a losing bet returns -1.000u, a push returns 0u, a void is excluded from the sum. When actual odds are recorded for a graded prediction, those odds are used in place of -110.

Win rate

Percentage of graded predictions that won. Surfaced alongside P&L on the Track Record page.

How it’s calculated: win_rate = wins / (wins + losses) × 100. Pushes and voids are excluded from both numerator and denominator.

ROI (Return on Investment)

P&L expressed as a percentage of total units wagered. Lets you compare performance across filters with different sample sizes (e.g., "green-band picks have +12% ROI vs. +3% for amber").

How it’s calculated: ROI = (P&L_units / total_units_wagered) × 100. Example: +12.5u P&L on 100u wagered = 12.5% ROI.
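Combining the win-rate, P&L, and ROI rules into one illustrative sketch (assuming 1u flat staking at the -110 default, and assuming pushes count toward units wagered, a common convention the site may or may not share):

```python
def summarize(results: list[str]) -> dict[str, float]:
    """Win rate, P&L, and ROI over graded results ('win'/'loss'/'push')."""
    wins = results.count("win")
    losses = results.count("loss")
    pushes = results.count("push")
    win_rate = wins / (wins + losses) * 100 if wins + losses else 0.0
    pnl = wins * (100 / 110) - losses  # pushes contribute 0u
    wagered = wins + losses + pushes   # 1u per graded, non-void prediction
    roi = pnl / wagered * 100 if wagered else 0.0
    return {"win_rate": win_rate, "pnl": pnl, "roi": roi}
```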

Units / 1u / 0.5u

A standardized stake size used to measure performance independent of dollar amount. Bettors typically size 1u as 1–2% of bankroll, but the site does not recommend bet sizing — units are a measurement convention only.

How it’s calculated: Every prediction on the Track Record is staked at 1u for P&L purposes. Fractional unit sizes (e.g., 0.5u, 1.5u) are not used by the site's grading; they are only meaningful if you choose to size your own real bets that way.

Moneyline (ML)

A bet on which team wins outright; no point spread. Displayed in American odds: a negative number is the favorite, a positive number is the underdog.

How it’s calculated: -150: bet 150 to win 100. +130: bet 100 to win 130. Implied probability for favorites: |odds| / (|odds| + 100). For underdogs: 100 / (odds + 100).

Run Line (RL)

Baseball's version of a point spread, fixed at -1.5 / +1.5. The favorite must win by two or more runs to cover; the underdog covers by winning outright or losing by exactly one run.

How it’s calculated: Outcome rule: favorite at -1.5 wins the bet iff final_favorite − final_underdog ≥ 2. Underdog at +1.5 wins iff final_favorite − final_underdog ≤ 1.
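The outcome rule as a sketch (names illustrative):

```python
def run_line_result(favorite_runs: int, underdog_runs: int, pick: str) -> str:
    """Grade a -1.5 / +1.5 run line pick; `pick` is 'favorite' or 'underdog'."""
    favorite_covers = favorite_runs - underdog_runs >= 2
    if pick == "favorite":
        return "win" if favorite_covers else "loss"
    return "loss" if favorite_covers else "win"
```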

Over / Under (O/U)

A bet on the total combined runs scored by both teams versus a half-run line (e.g., 8.5). The half-run prevents pushes — the total is always strictly over or under.

How it’s calculated: Over 8.5 wins iff final_away + final_home ≥ 9. Under 8.5 wins iff total ≤ 8.
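The same rule as a sketch (names illustrative); with a half-run line the total can never land exactly on the line, so there is no push branch:

```python
def total_result(away_runs: int, home_runs: int, line: float, pick: str) -> str:
    """Grade an over/under pick against a half-run total line."""
    over_wins = away_runs + home_runs > line
    if pick == "over":
        return "win" if over_wins else "loss"
    return "loss" if over_wins else "win"
```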

Pick

A single model's predicted side on a single bet type for a single game (e.g., "NYY moneyline" or "Over 8.5"). Each pick has its own confidence and edge.

How it’s calculated: No formula; this is a definition only. A given game has up to one pick per (bet_type, model) — three bet types times the number of active models.

Top Picks of the Day

Home-page section showing the strongest individual model picks for today's slate. Multiple models on the same game produce separate cards — there is no consensus or agreement aggregation on this surface (V10).

How it’s calculated: Eligibility: every published, latest prediction with confidence > 60 (strict, not ≥) on a game whose status is scheduled or pregame. Sorted by edge DESC NULLS LAST, then confidence DESC, then game_time ASC NULLS LAST; LIMIT 5.
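The eligibility and sort rules above can be sketched in Python; the field names mirror the description and are assumptions, not the site's actual schema:

```python
def top_picks(predictions: list[dict], limit: int = 5) -> list[dict]:
    """Rank qualifying picks: edge DESC (NULLs last), then confidence DESC,
    then game_time ASC (NULLs last); limited to `limit` rows."""
    eligible = [
        p for p in predictions
        if p["confidence"] > 60  # strict, not >=
        and p["status"] in ("scheduled", "pregame")
    ]

    def sort_key(p):
        return (
            p["edge"] is None,       # NULL edges sort last
            -(p["edge"] or 0.0),     # edge DESC
            -p["confidence"],        # confidence DESC
            p["game_time"] is None,  # NULL times sort last
            p["game_time"] or "",    # game_time ASC
        )

    return sorted(eligible, key=sort_key)[:limit]
```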

Best Bets (Track Record)

A ranked list of historical picks on the Track Record page using the V9 score formula. Shows which picks combined high confidence with high edge — useful for spotting patterns in past performance.

How it’s calculated: score = (confidence / 100) × edge, sorted DESC. Default floors apply: min_confidence ≥ 50 and min_edge ≥ 0.05.

Score (V9)

A combined confidence-and-edge ranking metric used by Best Bets on the Track Record page. Higher score means the model is both confident and disagrees with the market. The score chip was removed from Top Picks of the Day in V10; it lives on Track Record only.

How it’s calculated: score = (confidence / 100) × edge. Example: confidence 70%, edge +0.08 → score = 0.70 × 0.08 = 0.056, displayed to 3 decimal places.
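A minimal sketch of the formula (the 3-decimal rounding here reflects the display precision described above):

```python
def best_bets_score(confidence: float, edge: float) -> float:
    """V9 Best Bets score: (confidence / 100) * edge, shown to 3 decimals."""
    return round(confidence / 100 * edge, 3)
```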

Range filter (confidence range)

The numeric Min / Max confidence inputs on the Track Record page (V9). Replaces the older High / Medium / Low tier filter with a more granular 0–100 range.

How it’s calculated: Server clamps both inputs to the 0–100 range. If the user types min > max, the server silently swaps them and renders an inversion notice on the page; clamping out-of-range values renders a similar clamp notice.
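The clamp-and-swap behavior as a sketch (the function name and notice strings are illustrative, not the site's actual code):

```python
def normalize_confidence_range(min_val, max_val):
    """Clamp both inputs to 0-100, then swap if inverted; return any notices."""
    notices = []
    lo = max(0, min(100, min_val))
    hi = max(0, min(100, max_val))
    if (lo, hi) != (min_val, max_val):
        notices.append("values clamped to 0-100")
    if lo > hi:
        lo, hi = hi, lo
        notices.append("min/max swapped")
    return lo, hi, notices
```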

Min edge

A minimum-edge floor on the Track Record page. Predictions whose edge is below the floor are excluded from the rendered results. The default for Best Bets is 0.05 (5%).

How it’s calculated: Filter rule: edge >= min_edge. Top Picks of the Day (V10) has no min-edge floor — it surfaces every qualifying pick regardless of edge.

Period filter

Time-window selector on the Track Record page: Last 7 / Last 30 / Season / Custom. Custom opens a date-from / date-to range pair.

How it’s calculated: Filters graded predictions by their associated game date within the chosen window. Last 7 = the previous 7 calendar days; Season = the current MLB regular season.

Bet Type filter

Restricts the Track Record to a single bet type: Moneyline, Run Line, or Over-Under (or All to clear the filter).

How it’s calculated: Direct equality match on the prediction's bet_type column.

Source filter

Filters predictions by engine: All / AI Analysis (LLM) / Statistical (ML). Useful for comparing each engine's stand-alone Track Record.

How it’s calculated: Direct equality match on the prediction's source column (llm or statistical).

Prediction Model

A named prediction model with a source type (LLM or ML) and its own calibration. Visible as a name + LLM/ML chip on each prediction card and as a filter dropdown on the Track Record and Top Picks pages.

How it’s calculated: Multiple LLM and ML configurations can be active at the same time — every active model produces its own prediction for every game on every bet type.

Confidence Band (V9)

Five-point buckets used on the Track Record By Confidence Band breakdown, with wider catch-all bands at each end: 0–44, 45–49, 50–54, 55–59, 60–64, 65–69, 70–74, 75–100. Replaced V8's High / Medium / Low tier breakdown with a uniform numeric view.

How it’s calculated: Each graded prediction is bucketed by its calibrated confidence value at grading time; per-band win rate, P&L, and ROI are aggregated.
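The bucketing rule as a sketch (names illustrative):

```python
def confidence_band(confidence: float) -> str:
    """Bucket a calibrated confidence value into the V9 band labels."""
    if confidence < 45:
        return "0-44"
    if confidence >= 75:
        return "75-100"
    lower = int(confidence // 5) * 5  # uniform 5-point bands from 45 to 74
    return f"{lower}-{lower + 4}"
```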

Push

Result exactly matches the line. Stake is returned; the bet counts as neither a win nor a loss.

How it’s calculated: No calculation; this is a definition only. Pushes are excluded from win-rate denominators and contribute 0u to P&L.

Void

Bet cancelled due to external circumstances (game cancelled, suspended past the cut-off, or sportsbook pitcher-change rules). Stake is returned.

How it’s calculated: No calculation; this is a definition only. Voids are excluded entirely from Track Record statistics.

Implied Probability

The probability of an outcome implied by the odds. The Edge calculation compares the model's probability against this number to decide whether the price offers value.

How it’s calculated: Favorites (negative odds): |odds| / (|odds| + 100). Underdogs (positive odds): 100 / (odds + 100). Worked examples: -150 → 60%; +130 → 43.5%.

Graded

A prediction whose game has finished and whose outcome (win / loss / push / void) has been recorded. Only graded predictions count toward Track Record statistics. The home page Top Picks list shows ungraded picks (scheduled and pregame games only).

How it’s calculated: Grading runs once daily after games finish. Voids are excluded from Track Record entirely; pushes count toward the row count but contribute 0 to P&L.

Latest version / Version history

A single game can have multiple prediction versions if the pipeline re-ran during the day (V6). Track Record uses only the latest version (is_latest = TRUE) for official P&L and accuracy. The game detail page shows the full version timeline so you can see how a pick or its confidence shifted as new context arrived.

How it’s calculated: Each pipeline run for a given (game, bet_type, model) creates a new row with a fresh version number; the new row is flagged is_latest = TRUE and the prior row's flag is flipped to FALSE.

How the Predictions Work

What is the LLM (Claude) prediction and how does it work?

A Claude-based language model (default Sonnet; configurable per model entry — Haiku, Sonnet, or Opus) that ingests the assembled game context (schedule, odds, team and pitcher stats, recent form, news, series record) and produces a pick, a confidence score, a key factor, and 3–5 bullet points of narrative reasoning.

LLM predictions run in batch mode by default via the Anthropic Messages Batches API for ~50% cost savings on the scheduled runs. Real-time mode is available for manual triggers.

How it’s calculated: The model is prompted with a structured context document. The output schema is enforced by the prediction service so every LLM pick has the same shape as every ML pick.

What is the Statistical (ML) prediction and how does it work?

A gradient-boosted tree model (XGBoost / scikit-learn) trained on historical MLB game data backfilled from MLB Stats API. The model ingests a numeric feature vector (team and pitcher stats, recent form, market odds, contextual features) and outputs the same shape as the LLM prediction: pick, confidence, key factor, and 3–5 feature-driven bullets.

It is faster and cheaper than the LLM and not subject to narrative bias, but it cannot incorporate qualitative factors (news, situational context).

How it’s calculated: Training is supervised on labeled historical games. Inference calls the trained artifact via the same provider interface as the LLM.

What inputs do the models use?

Both engines see: today's schedule and odds, team and starting-pitcher season-to-date stats, recent form (last 10 games), the current series record between the two teams, and venue. The LLM additionally ingests beat-reporter news articles and any qualitative context available. The ML model receives only numeric features (no text).

How it’s calculated: The same feature engineering pipeline is used for ML training and ML inference, eliminating training-serving skew.

Why might the two models disagree?

The LLM weights qualitative factors (news, situational narrative, lineup nuance) the ML model cannot see. The ML model weights base-rate patterns the LLM may underweight. Disagreement is normal and useful — neither source is "right" by default. Over time the Track Record source filter lets you compare which engine performs better in different situations.

How it’s calculated: On the game detail page, each model gets its own card so you can compare picks, confidence, edge, and reasoning side-by-side.

How does Top Picks of the Day use the models?

V10 surfaces every individual model prediction with confidence > 60 ranked by edge. There is no consensus / agreement aggregation on this surface — if two models both clear 60% on the same game and bet, both appear as separate cards. Agreement information is still visible on the game detail page.

How it’s calculated: See Top Picks of the Day for the exact ranking formula and tiebreakers.

What is calibration and why does it matter?

Raw confidence scores can be over- or under-confident compared to actual accuracy. Calibration is the process of adjusting them so that, e.g., picks reported as 70% confidence really do win about 70% of the time. The site runs a calibration loop that learns per-model, per-band, per-bet-type adjustments from past graded predictions and applies them to new picks.

How it’s calculated: Calibration factors are scoped to (model_id, bet_type, confidence_band) and updated on the daily schedule after grading.
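The exact calibration method is not specified here. As one hypothetical sketch, assuming a learned multiplicative factor per (model_id, bet_type, confidence_band) scope (the multiplicative form, the names, and the 5-point banding below are all assumptions):

```python
def calibrated_confidence(raw, factors, model_id, bet_type):
    """Apply a learned per-scope factor to a raw confidence score.

    `factors` maps (model_id, bet_type, band_floor) to a multiplier;
    1.0 (no adjustment) is used until a factor has been learned.
    """
    band_floor = min(int(raw // 5) * 5, 95)  # assumed 5-point banding
    factor = factors.get((model_id, bet_type, band_floor), 1.0)
    return max(0.0, min(100.0, raw * factor))
```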

What does multi-model mean, and why are there multiple LLM/ML configurations?

The site can run multiple model configurations side-by-side (V7). For example, a Haiku-based LLM model and a Sonnet-based LLM model can both be active and produce parallel picks for every game. Different ML configurations (different feature subsets or hyperparameters) can also be active at the same time. Each model entry has its own lifecycle, calibration, and Track Record.

How it’s calculated: Use the Prediction Model filter on Track Record or Top Picks to see only one model's picks; remove the filter to see all active models interleaved.

Track Record & History

What does the /history page show me?

Every graded prediction the site has ever produced, filterable by period, bet type, source, confidence range, min edge, team, and prediction model. Includes summary tiles (overall win rate, P&L, ROI), a By Confidence Band breakdown, a By Bet Type breakdown, a By Source breakdown, and a Best Bets ranked list.

How it’s calculated: Only the latest version of each prediction counts; pushes and voids are handled per the rules described under P&L and Win rate.

What is Best Bets?

A ranked list on the Track Record page showing the picks that combined the highest confidence with the highest edge in the selected period. Useful for spotting patterns ("green-band over/under picks have been profitable") rather than evaluating individual bets in isolation.

How it’s calculated: score = (confidence / 100) × edge, default floors min_confidence ≥ 50 and min_edge ≥ 0.05. Sorted by score DESC. See Score for details.

How is P&L calculated?

See P&L for the full formula. Short version: 1u flat staking, default -110 odds when actual odds are unavailable, pushes contribute 0u, voids are excluded.

How it’s calculated: P&L is recomputed live from the filtered set of graded predictions every time you change a filter and click Apply.

How are win rate and ROI calculated?

Win rate is wins / (wins + losses) × 100, with pushes and voids excluded from the denominator. ROI is P&L expressed as a percentage of total units wagered.

How it’s calculated: See Win rate and ROI for worked examples.

What do the filters do?

Each filter narrows the set of graded predictions used to compute the page's summary tiles, breakdowns, and Best Bets list.

As of V11, every filter lives inside one form with one Apply button — edit any combination, click Apply once, and every change is submitted together.

How it’s calculated: Filters compose with AND. The URL after Apply reflects every non-default filter; the URL is shareable and bookmarkable.

What is the Reset filters link?

A single link near the Apply button that returns to the bare /history URL — every filter goes back to its default. The link is only visible when at least one filter is non-default. (V11 replaces the prior pair of per-section Clear links with this single Reset.)

How it’s calculated: Reset is just an <a href="/history"> navigation; no JavaScript required.

What does "graded" mean and when does a prediction become graded?

See Graded. A prediction is graded once its game has finished and the outcome has been recorded against the line. Pushes and voids are also graded states with their own rules.

How it’s calculated: Grading runs once daily after games finish; the time between final and graded is typically under one hour.

What is By Confidence Band?

The Track Record's per-band breakdown using the V9 buckets (0–44, 45–49, 50–54, 55–59, 60–64, 65–69, 70–74, 75–100). For each band you see row count, win rate, P&L, and ROI.

How it’s calculated: See Confidence Band.

Why do I see multiple versions of a prediction for the same game?

The pipeline can run multiple times for the same game during the day (e.g., when new odds arrive or a starting pitcher is announced). Each run produces a new version. The game detail page shows the full timeline so you can see how a pick or its confidence shifted; the Track Record uses only the latest version for official statistics.

How it’s calculated: An "Updated" badge on a game card indicates that more than one prediction version exists for that (game, bet_type, model).

How do I share a filtered view?

Just copy the URL after clicking Apply. The query string contains every filter you have set, so a recipient opening the URL sees the exact same filtered Track Record. Bookmarking works the same way.

How it’s calculated: URLs are backward-compatible across V9–V11; older bookmarks remain functional.

Disclaimers & Responsible Use

Is this real-money betting?

This site is an analytics tool. It does NOT take wagers, hold money, or facilitate bets.

Are predictions advice?

Predictions are model output, not advice. They reflect what the model inferred from the inputs available at prediction time and should not be treated as a recommendation to bet.

What does past performance mean here?

Past performance does not guarantee future results. Confidence is a model signal, not a probability of winning. Edge depends on the odds available when the prediction was generated; live odds at a sportsbook may differ.

What about responsible gambling resources?

If you choose to bet, do so legally in your jurisdiction and within your means. If gambling is becoming a problem for you or someone you know, the National Council on Problem Gambling offers a 24/7 confidential helpline and other resources (opens in a new tab).

Where can I report a bug or ask a question?

This is an analytics tool maintained for educational and research purposes. There is no public support channel at this time.

Last updated: 2026-05-09