How we grade financial influencers.
GuruScope is an accountability engine, not a brokerage ledger. Every score on this site follows the rules below — published in full so you, journalists, and compliance teams can audit them. Methodology updates are dated in the change log at the bottom of this page.
Data collection
We ingest content from public-facing channels using each platform's standard public API or syndicated RSS feed. As of v1.0 we cover:
- YouTube — channel uploads via RSS + yt-dlp resolution. Transcripts via the official transcript API with a Webshare proxy fallback.
- Twitter / X — Apify `apidojo~tweet-scraper` when the operator has a paid Apify plan with residential proxies. Otherwise stubbed.
- Blog / RSS — feed parsing via `feedparser`.
- TikTok & podcasts — adapters exist but are not enabled in v1 because audio transcription cost is not yet justified by the data quality.
Each guru's content is re-scanned weekly via a cron job (Sundays 02:00 UTC). New content is appended; existing predictions are re-verified against fresh market data.
Prediction extraction
Transcripts and posts are passed to a Gemini 2.5 Flash extractor (or Claude Sonnet 4.6 when configured). The model returns a strict JSON array of structured claims with:
- `claim`, `asset`
- `exact_quote` — verbatim, used for citation
- `recommendation_type` — buy / sell / hold / sector_rotation / avoid / general_macro
- `time_horizon` — intraday / weeks / months / 1-2_years / 5+_years / unspecified
- `confidence_level` — high / medium / low (drives calibration scoring)
- `disclosures[]` — array of `{type, quote}` tagging holds_position, paid_promo, disclaimer, conflict, or no_disclosure
We require exact_quote on every prediction so any reader can verify our paraphrase against the source. Predictions without a verifiable quote are rejected.
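For illustration, a single extracted claim might look like the sketch below. The field names are the ones listed above; the quote, asset, and disclosure values are invented for the example.

```python
# Illustrative only -- one extracted claim in the schema above; values are invented.
claim = {
    "claim": "Bitcoin will reach $100k within a year",
    "asset": "BTC-USD",
    "exact_quote": "I think Bitcoin hits one hundred thousand by next summer.",
    "recommendation_type": "buy",
    "time_horizon": "1-2_years",
    "confidence_level": "high",
    "disclosures": [{"type": "holds_position", "quote": "Full disclosure, I own Bitcoin."}],
}

# Predictions without a verifiable quote are rejected.
assert claim.get("exact_quote"), "rejected: no verifiable exact_quote"
```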
Entry price rule
Entry price for any traded asset is the closing price on the publish date of the source content. This is the same convention used by academic studies of newsletter performance and by competing trackers (TrueAlpha, CXO Advisory, etc.). No look-ahead.
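A minimal sketch of the entry-price convention, assuming yfinance as the price source (not necessarily the provider used in production) and falling back to the most recent prior close when the publish date isn't a trading day:

```python
import pandas as pd
import yfinance as yf

def entry_price(ticker: str, publish_date: str) -> float:
    """Closing price on the publish date of the source content (no look-ahead)."""
    pub = pd.Timestamp(publish_date)
    # yfinance treats `end` as exclusive, so extend by one day to include the
    # publish date; start a week earlier so weekends/holidays still resolve.
    hist = yf.Ticker(ticker).history(start=pub - pd.Timedelta(days=7),
                                     end=pub + pd.Timedelta(days=1))
    # Last close on or before the publish date (prior trading day if markets were
    # shut -- an assumption here; the rule above only names the publish date).
    return float(hist["Close"].iloc[-1])
```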
Evaluation horizons
Every prediction is scored at four horizons in parallel:
- Stated — the creator's own time horizon (default for credibility scoring; charitable)
- 90 / 180 / 365 days from publish — fixed windows that prevent the "my 10-year thesis hasn't played out yet" dodge
Leaderboard alpha defaults to the stated horizon; users can switch via the horizon dropdown to compare. Horizons that haven't elapsed (e.g. 365d for a prediction made 30 days ago) return null rather than partial values.
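A sketch of the null-for-unelapsed rule, assuming the stated horizon has already been normalised to a day count (the helper name is ours, not the production scheduler):

```python
from datetime import date, timedelta
from typing import Optional

FIXED_HORIZONS = {"90d": 90, "180d": 180, "365d": 365}

def horizon_end_dates(publish_date: date, stated_days: Optional[int],
                      today: date) -> dict:
    """End date per horizon; None when the window hasn't elapsed yet."""
    windows = dict(FIXED_HORIZONS)
    if stated_days is not None:             # creator's own horizon, when stated
        windows["stated"] = stated_days
    ends = {}
    for name, days in windows.items():
        end = publish_date + timedelta(days=days)
        ends[name] = end if end <= today else None   # null rather than partial
    return ends
```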
Benchmark selection
Default benchmark is the S&P 500 (^GSPC). Sector-specific ETFs are used when the underlying asset matches a known sector taxonomy:
- Energy → `XLE`, Mining/Gold → `GDX`, Tech → `XLK`, Financials → `XLF`, Healthcare → `XLV`, Consumer → `XLY`, Real estate → `XLRE`
- Crypto → `BTC-USD` (S&P 500 used as secondary cross-check)
- Commodities → `GSG` or futures contract directly
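The taxonomy above boils down to a lookup table. A hypothetical sketch — the key names are ours, the tickers come from the list:

```python
SECTOR_BENCHMARKS = {
    "energy": "XLE",
    "mining_gold": "GDX",
    "tech": "XLK",
    "financials": "XLF",
    "healthcare": "XLV",
    "consumer": "XLY",
    "real_estate": "XLRE",
    "crypto": "BTC-USD",    # S&P 500 kept as a secondary cross-check
    "commodities": "GSG",   # or the relevant futures contract directly
}

def benchmark_for(sector: str | None) -> str:
    """Sector-matched ETF when the asset maps to a known sector, else S&P 500."""
    return SECTOR_BENCHMARKS.get(sector, "^GSPC")
```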
Alpha calculation
Alpha is the simple difference between asset return and benchmark return over the same window:
alpha_pct = ((asset_close_T - asset_close_0) / asset_close_0
- (bench_close_T - bench_close_0) / bench_close_0) * 100
Example: BTC bought at $42,000 on 2025-01-01, $48,000 on 2025-04-01 = +14.3%
S&P 500 over same period = +6.1%
alpha = +14.3% - 6.1% = +8.2%

We do not annualise. Reported alpha is the realised excess return at each horizon.
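The same calculation as a runnable function. The S&P 500 closes below are invented so the benchmark return matches the quoted +6.1%:

```python
def alpha_pct(asset_close_0: float, asset_close_T: float,
              bench_close_0: float, bench_close_T: float) -> float:
    """Realised excess return over the window, in percentage points (not annualised)."""
    asset_ret = (asset_close_T - asset_close_0) / asset_close_0
    bench_ret = (bench_close_T - bench_close_0) / bench_close_0
    return (asset_ret - bench_ret) * 100

# Reproduces the worked example: BTC +14.3% vs benchmark +6.1% -> alpha of roughly +8.2
print(round(alpha_pct(42_000, 48_000, 5_000, 5_305), 1))
```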
Directional accuracy
Predictions resolve to one of four outcomes:
- Correct — directional call (buy/sell) confirmed by >5% move in the predicted direction within the horizon
- Wrong — directional call contradicted by >5% move in the opposite direction
- Partial — small move (within ±5%), or right direction but missed magnitude target. Counts as 0.5 in accuracy
- Pending — horizon hasn't elapsed yet
Qualitative claims (e.g. "the Fed will pause") that aren't directly tied to a tradeable asset are verified via Tavily web search rather than market data — same correct/wrong/pending buckets.
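A simplified sketch of the directional buckets for buy/sell calls. Magnitude targets and qualitative claims are out of scope here, and the function name and weighting map are ours:

```python
ACCURACY_WEIGHT = {"correct": 1.0, "partial": 0.5, "wrong": 0.0}

def resolve_direction(recommendation_type: str, asset_return_pct):
    """Bucket a buy/sell call by its realised return over the horizon."""
    if asset_return_pct is None:
        return "pending"                          # horizon hasn't elapsed yet
    predicted_up = recommendation_type == "buy"   # "sell" predicts the downside
    signed_move = asset_return_pct if predicted_up else -asset_return_pct
    if signed_move > 5:
        return "correct"
    if signed_move < -5:
        return "wrong"
    return "partial"                              # within ±5%, counted as 0.5
```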
Risk adjustment
Three risk metrics per prediction, aggregated per guru:
- Max drawdown — worst peak-to-trough decline of the recommended asset during the holding period
- Volatility — standard deviation of returns across the guru's predictions
- Approximate Sharpe — `avg_alpha / volatility` (zero risk-free rate assumed for simplicity)
The `risk_adjusted_score` shown on profiles is the guru's risk_adjusted_alpha (`avg_alpha / |max_drawdown|`) scaled to 0-100. A guru who picks volatile penny stocks scores lower than one who picks steady compounders, even at the same alpha.
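A sketch of the drawdown and ratio pieces; the exact 0-100 scaling applied on profiles isn't reproduced here:

```python
import numpy as np

def max_drawdown(prices: np.ndarray) -> float:
    """Worst peak-to-trough decline over the holding period, as a negative fraction."""
    running_peak = np.maximum.accumulate(prices)
    return float((prices / running_peak - 1.0).min())

def risk_adjusted_alpha(avg_alpha_pct: float, worst_drawdown: float) -> float:
    """avg_alpha / |max_drawdown|; the 0-100 profile scaling is applied downstream."""
    return avg_alpha_pct / abs(worst_drawdown) if worst_drawdown else float("inf")
```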
Statistical significance
A guru with 3 predictions at 100% accuracy is statistically indistinguishable from a coin flip. Without a sample-size gate, the leaderboard would be dominated by lucky outliers. We run two tests:
Primary — one-sample t-test on alpha
from scipy import stats

t_stat, p_value = stats.ttest_1samp(alpha_values, popmean=0.0)
significant = p_value < 0.05
Secondary — bootstrap 95% CI on mean alpha (handles fat-tailed return distributions where the t-test can mislabel during tail events)
import numpy as np

# 5000 resamples with replacement
bootstrap_means = [np.random.choice(alpha_values, len(alpha_values), replace=True).mean() for _ in range(5000)]
ci_lower = np.percentile(bootstrap_means, 2.5)
ci_upper = np.percentile(bootstrap_means, 97.5)
positive_alpha_confirmed = ci_lower > 0
Sample-size tiers
- N/A — fewer than 5 verified predictions. Shown on profile pages only.
- PRELIM — 5-19 verified. Eligible for leaderboard but tagged as preliminary.
- SIG — passes both t-test (p<0.05) AND bootstrap CI lower bound > 0.
The default leaderboard view is exploratory and shows all gurus. The "Sig only" toggle and "N≥20 (evaluable)" filter let users gate by methodology rigour.
Confidence calibration
Confidence calibration measures whether a guru's stated confidence (high/medium/low) tracks their actual hit rate. We use a Brier-style decomposition: bucket predictions by stated confidence, compute realised accuracy per bucket, then penalise the gap between expected accuracy (e.g. 80% for "high confidence") and observed.
A guru who's right 80% of the time on "high confidence" calls and 50% on "low" scores 100. A guru who uses "high confidence" on every call regardless of outcome scores 0.
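A simplified sketch of the bucketing step. Only the 80% anchor for "high" is stated above; the medium/low anchors and the penalty curve are assumptions, so this won't reproduce the production extremes exactly:

```python
EXPECTED = {"high": 0.80, "medium": 0.65, "low": 0.50}   # medium/low are assumed anchors

def calibration_score(outcomes: dict) -> float:
    """outcomes maps confidence bucket -> list of per-prediction accuracies (1, 0.5, 0)."""
    gaps, weights = [], []
    for bucket, results in outcomes.items():
        if not results:
            continue
        observed = sum(results) / len(results)
        gaps.append(abs(EXPECTED[bucket] - observed))
        weights.append(len(results))
    if not gaps:
        return 0.0
    mean_gap = sum(g * w for g, w in zip(gaps, weights)) / sum(weights)
    # Map the weighted mean gap onto 0-100; a gap of 0.5 or more bottoms out at zero.
    return max(0.0, 1.0 - mean_gap / 0.5) * 100
```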
Disclosure detection
For every recommendation, the extractor scans the source content for explicit disclosures:
- `holds_position` — creator says they own the asset
- `paid_promo` — sponsored content, affiliate link, brand deal
- `disclaimer` — generic "not financial advice" or similar
- `conflict` — financial relationship with the asset issuer
- `no_disclosure` — recommendation made with no disclosure of any kind
The aggregate `disclosure_quality` score is the percentage of recommendations accompanied by at least one disclosure of any type. We treat undisclosed buy/sell calls as a structural red flag, not just stylistic preference — this aligns with FINRA's 2024 finfluencer enforcement actions (M1 Finance $850K, TradeZero $250K, Moomoo $750K).
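As a formula, the aggregate is just a share of recommendations. A minimal sketch, assuming a no_disclosure tag does not itself count as a disclosure:

```python
def disclosure_quality(predictions: list) -> float:
    """Percentage of recommendations carrying at least one real disclosure."""
    if not predictions:
        return 0.0
    disclosed = sum(
        1 for p in predictions
        if any(d["type"] != "no_disclosure" for d in p.get("disclosures", []))
    )
    return disclosed / len(predictions) * 100
```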
Deletion policy
Each weekly re-scan compares live content IDs against our cache. Missing items are marked deleted_at = NOW() in our DB but their cached extracted predictions are preserved. Profile pages show deleted predictions with a strikethrough + accent "deleted" tag, citing the original URL even though it returns 404.
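The comparison itself is a set difference over content IDs; a minimal sketch, with a hypothetical persistence call noted in comments:

```python
def detect_deletions(live_ids: set, cached_ids: set) -> set:
    """Content IDs present in our cache but missing from the live weekly scan."""
    return cached_ids - live_ids

# For each missing ID we mark deleted_at = NOW() but keep the cached predictions.
# A hypothetical persistence call might look like:
#   db.execute("UPDATE content SET deleted_at = NOW() WHERE content_id = %s", (cid,))
```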
We do not host the original video, transcript, or screenshot. Only our structured prediction record persists. This is consistent with fair-use research/criticism.
Known limitations
- Transcript imperfection — auto-captioned YouTube transcripts can mishear numbers and ticker symbols. We mitigate by requiring the model to quote `exact_quote`; visibly garbled quotes are flagged for review.
- Context loss — extraction can miss qualifiers ("if the Fed pauses, then BTC to $100k"). We're improving by passing larger windows of surrounding text.
- Survivorship in the source — if a creator deletes wrong calls before our first scan, we never see them. Deletion tracking only works on content we've previously cached.
- Benchmark mismatch — sector ETF assignment is heuristic. A creator recommending a small-cap miner gets benchmarked against `GDX`, which may understate or overstate their stock-picking edge.
- We are not a brokerage ledger — predictions are statements made publicly, not actual trades. We measure what they said, not what they did. A creator could outperform privately while looking bad here, and vice versa.
Change log
Initial public methodology. v3 scoring (9 dimensions) + v4 additions (statistical significance via t-test + bootstrap, dual-horizon scoring at 90/180/365 + stated, sample-size tiers, leaderboard sig gates).
Cross-platform deduplication via TF-IDF cosine similarity. Conviction signal surfaces predictions repeated across multiple platforms or videos.
Soft-anchored qualitative prompts (5-point scale for risk_awareness and survivorship_honesty). Removed unfair zero-anchoring penalty.
Questions and correction requests: methodology@guruscope.com. Disputes are reviewed manually; we do not auto-approve creator edits.