Shoothill AI Trust Index

Live · 10 models tracked

Know which AI you can actually trust.

AI models change quietly. The one you trusted last month may already be making things up, or quietly ignoring your instructions. Shoothill AI Trust Index keeps watch, every hour, and tells you when something slips. Free for life.

Talk to our team

Hourly updates · Sample methodology published

Claude Opus 4.7 97.0 ▲ 2.1 · GPT-5.4 94.0 ▲ 1.4 · Gemini 2.5 Pro 93.3 ▲ 0.8 · GPT-5.4 mini 90.5 ▼ 0.4 · Gemini 2.5 Flash 87.8 ▲ 0.3 · Claude Haiku 4.5 87.5 ▼ 1.2 · Claude Sonnet 4.5 81.5 ▼ 3.8

https://trust.shoothill.ai

Portfolio Trust Index

Weighted across your watched models

Live

91.0 / 100 · +1.3 · 30d

Composite · 5 categories

TRUTH · REASON · DISCIP · STABIL · BUSI

Leader

Claude Opus 4.7

At risk

Claude Sonnet 4.5

Live leaderboard · last run

Claude Opus 4.7 · 97.0 ▲ 2.1

GPT-5.4 · 94.0 ▲ 1.4

Gemini 2.5 Pro · 93.3 ▲ 0.8

GPT-5.4 mini · 90.5 ▼ 0.4

Gemini 2.5 Flash · 87.8 ▲ 0.3

Example data. Trust Index runs continuously throughout the day; see the live index for today's benchmarks.

01 · The problem

AI models don't stay the same.

The companies behind them push updates without warning. A model that was rock-solid last week can start hallucinating, ignoring instructions, or quietly getting worse. You'll only find out when a customer notices. Trust Index is the early-warning system.

Truthfulness

Spot answers it made up

We measure how often each model invents a fact, especially in medical, legal, and finance questions. So you know the actual rate, not just the vibe.

Reasoning

Watch it on hard problems

Multi-step maths, logic, and planning: the kind of thinking your team actually relies on it for. We update the test set as the bar moves.

Discipline

See when it stops following instructions

Catches the silent slips: ignoring formatting rules, breaking persona, drifting off-brief. The kind of regression that quietly breaks AI features in production.

Stability

Catch a model getting worse

Compares each new score to the model's recent history. Email lands the moment something shifts past a threshold you set.
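To make the idea concrete, here is a minimal sketch of a threshold check against recent history. It is an illustration only, not a description of how Trust Index calculates stability: the function name, the use of a simple mean as the baseline, and the example scores are all assumptions.

```python
from statistics import mean


def has_slipped(recent_scores, new_score, threshold):
    """Return True if new_score sits more than `threshold` points below
    the mean of recent_scores.

    Hypothetical helper: the baseline (a plain mean) and the parameter
    names are assumptions made for this example.
    """
    if not recent_scores:
        return False  # no history yet, so nothing to compare against
    baseline = mean(recent_scores)
    return (baseline - new_score) > threshold


# Made-up history for one model, followed by a new, lower score.
history = [85.0, 85.5, 85.3]
if has_slipped(history, 81.5, threshold=3.0):
    print("Alert: score dropped past your threshold")
```

A stricter threshold (a smaller number) means more alerts; a looser one means you only hear about large drops.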

Readiness

Test it on real work

Tasks pulled from real enterprise jobs: drafting emails, extracting data, classifying documents, summarising. Demos look easy; we test the messy ones.

Governance

Show your working

Every score is timestamped and exportable. Pass risk reviews and audits with a paper trail, not just an opinion.

02 · How it works

How we do it.

No black-box scoring. Sample tests and the full grading methodology are published; the rest of the test set is kept private so model providers can't train against the exact prompts. Same questions every run, so scores stay comparable as the world moves on.

01 · TEST

We test the models

Every hour, we put each tracked model through the same fixed library of test cases: bespoke enterprise scenarios, with the same questions every run. Apart from the published samples, the library is kept private so providers can't train against the exact prompts.

02 · GRADE

We grade the answers

Each answer is checked against the right answer, by rules that don't change between runs. So scores today and last week are directly comparable.
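As a rough sketch of what grading by fixed rules can look like, the example below checks each answer against an expected answer with one deterministic rule and reports a pass rate. The rule (an exact match that ignores case and extra whitespace), the function names, and the sample data are assumptions for illustration; the published grading methodology is the authority on the real rules.

```python
def grade_exact(answer: str, expected: str) -> bool:
    """Deterministic rule (assumed for this example): exact match,
    ignoring case and extra whitespace."""
    def normalise(text: str) -> str:
        return " ".join(text.split()).casefold()

    return normalise(answer) == normalise(expected)


def pass_rate(results) -> float:
    """Percentage of graded answers that passed; 0.0 for an empty run."""
    results = list(results)
    return 100.0 * sum(results) / len(results) if results else 0.0


# Made-up (model answer, expected answer) pairs for one run.
run = [("Paris", "paris"), (" 42 ", "42"), ("Berlin", "Madrid")]
score = pass_rate(grade_exact(answer, expected) for answer, expected in run)
print(f"{score:.1f}% of answers passed")
```

Because the rule has no randomness and does not change between runs, the same answers always produce the same score, which is what makes two runs comparable.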

03 · COMBINE

We roll it up

Five categories combine into one Trust Index per model: truthfulness, reasoning, instruction adherence, stability, and business readiness.
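One simple way to picture the roll-up is a weighted average of the five category scores. The sketch below assumes equal weights by default; the actual weighting is not stated here, so treat the weights, the function name, and the example scores as hypothetical.

```python
CATEGORIES = (
    "truthfulness",
    "reasoning",
    "instruction_adherence",
    "stability",
    "business_readiness",
)


def trust_index(scores, weights=None):
    """Combine per-category scores (0-100) into one number.

    Equal weights are an assumption for this example, not the
    published weighting.
    """
    if weights is None:
        weights = {category: 1.0 for category in CATEGORIES}
    total_weight = sum(weights[category] for category in CATEGORIES)
    weighted_sum = sum(scores[category] * weights[category] for category in CATEGORIES)
    return weighted_sum / total_weight


# Made-up category scores for a single model.
example = {
    "truthfulness": 90.0,
    "reasoning": 80.0,
    "instruction_adherence": 85.0,
    "stability": 95.0,
    "business_readiness": 100.0,
}
print(trust_index(example))
```

Passing a different `weights` dictionary shifts the result towards the categories that matter most for a given use case.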

04 · ALERT

You get the news

Set the limits you care about. We email you the moment a model you watch crosses one.

03 · Who this is for

Built for the people who answer the question "is the AI working?"

Trust Index isn't built for ML researchers. It's for the people responsible for whether AI in their organisation can be trusted with real work. Compliance leads, IT, product owners, and operations teams who'd rather know about a regression before a customer does.

Compliance / risk

Prove you checked.

You picked a model for client-facing work. Six months later, your auditor asks how you know it still meets policy. Trust Index gives you a dated, exportable record of every score since the day you started watching.

IT / engineering

Catch regressions before customers do.

Your team has GPT-5.5 in a production feature. The provider quietly updates the model and it starts ignoring your formatting rules. You see it on your dashboard the next morning, not in a customer support ticket.

AI product owner

Pick the right model on data, not vibes.

Choosing between Claude Opus and Gemini 2.5 Pro for a contract summariser? See months of side-by-side performance on the categories that actually matter for the task.

Operations / CX

Spot when the bot starts making things up.

A drafted reply that's 95% right and 5% invented is the worst kind of mistake. Trust Index tracks hallucination rate per model so you know when to retrain or switch.

Senior leadership

Walk into the board meeting with answers.

The C-suite asks if the firm's AI is working. Trust Index lets you answer with months of independent, dated evidence instead of a vendor's marketing slide.

Procurement / first-time buyer

Choose your first model with eyes open.

Bringing AI into the business for the first time? Trust Index gives you a no-vendor, no-spin view of how every major model has performed, week after week.

04 · About Shoothill

A full-service digital technology provider.

Shoothill helps businesses work smarter and become more efficient. Since 2006 we've delivered 400+ projects: bespoke software, IT infrastructure, creative and marketing services, managed cyber security. Trust Index is the free benchmark we built along the way.

Consult

Plan the tech that fits.

Copilot, modern workplace, digital transformation. Invest in the right places first.

Create

Websites, design, marketing.

Sharp creative, smart SEO, print and digital campaigns that actually move the needle.

Develop

Bespoke software, built right.

Custom web apps, mobile apps, and AI tailored to your team's real problems.

Support

Keep it running.

Managed IT, cyber security, connectivity. The hard part of keeping things live, handled.

Thinking about AI in your business? Let's talk →

Stop guessing which AI is good this week.

Trust Index is free, forever. Pick the models your team uses, set the alerts you want, and go back to your day. We'll let you know when something changes.

Talk to our team

Hourly Trust Index

New scores every hour. Always free.

Watchlists & alerts

Pick the models you use. We email when one slips.

Compare side-by-side

Run your own prompts against any model. (Paid plan.)

Audit trail built-in

Every score timestamped, exportable. Bring receipts.

Contact

Tell us about your AI project.

Shoothill helps businesses pick, build, and operate AI that's safe, useful, and commercially viable. Fill this in and we'll get back to you within one working day.