Bespoke-MiniCheck

Combat hallucination with our SOTA grounded factuality model and API

Bespoke-MiniCheck is a SOTA grounded factuality model, which ranks at the top of the LLM-AggreFact leaderboard.

Figure: Leaderboard showing the performance of different models on the LLM-AggreFact benchmark for grounded factuality and hallucinations, with the Bespoke-MiniCheck-7B model ranking first.

Grounded Factuality

Grounded factuality is a way of measuring hallucination: the source of truth is given as a context document, and the factuality of a claim is then checked with respect to that context. This problem is also called textual entailment in the NLP and linguistics literature. Please read our blog article for more information.

Grounded factuality is extremely important for RAG, where a context naturally exists and LLMs generate claims (answers). If a claim is not factually grounded in the context, the model has hallucinated unsupported information. For example, a Stanford study found that in the legal setting, RAG-based AI research tools hallucinate 17% to 33% of the time, contrary to claims that RAG systems are “hallucination-free”.

Bespoke-MiniCheck-7B

Using our proprietary curation platform, we trained a 7B model that is remarkably good at grounded factuality. Given a context and a claim, the model outputs a probability score indicating how well the claim is supported by the context. We have made this model available on HuggingFace for non-commercial use.
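As a quick illustration, here is a minimal sketch of scoring claims against a context locally. It assumes the open-source minicheck package referenced on the model card; the class name and return values below follow that package's README, so double-check the HuggingFace page if the interface has changed.

```python
# Install the minicheck package per the HuggingFace model card before running.
from minicheck.minicheck import MiniCheck

context = "A group of students gather in the school library to study for their upcoming final exam."
claim_supported = "The students are preparing for an exam."
claim_unsupported = "The students are on vacation."

# Downloads bespokelabs/Bespoke-MiniCheck-7B from HuggingFace on first use.
scorer = MiniCheck(model_name="Bespoke-MiniCheck-7B", cache_dir="./ckpts")

# score() pairs each document with its claim and returns binary labels
# (1 = supported, 0 = unsupported) plus raw support probabilities.
pred_labels, raw_probs, _, _ = scorer.score(
    docs=[context, context],
    claims=[claim_supported, claim_unsupported],
)

print(pred_labels)  # expected: [1, 0]
print(raw_probs)    # support probability for each claim
```

Each claim is scored independently against its paired document, so many (context, claim) pairs can be batched in a single call.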

As mentioned above, this model tops the LLM-AggreFact leaderboard with 77.4% on the benchmark. Vectara’s HHEM 2.1, a model with similar capability, gets only 71.8%.

Small but Mighty

The model we trained is relatively small yet beats the performance of much larger models, such as Claude 3.5 Sonnet, on this task. As a result, it can return results in about 200 milliseconds on modern GPUs, making it useful as a guardrail, and it can run on consumer-grade hardware such as MacBooks. Please contact us if you are interested in 100ms response times.

Bespoke-MiniCheck API

We are excited to announce that this model's capability is now available via a self-serve API platform. You can sign up for free at our Bespoke Console and use our client library for easy access. Please check the documentation at Bespoke Docs. You can drastically improve your RAG game in just a few lines of code.
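To make that concrete, here is a sketch of a fact-check call with the Python client. The package name, method path, and response field shown here are illustrative assumptions; refer to Bespoke Docs for the authoritative interface.

```python
# pip install bespokelabs  (package name assumed; see Bespoke Docs)
import os

from bespokelabs import BespokeLabs

# API key from the Bespoke Console; environment variable name is an assumption.
client = BespokeLabs(auth_token=os.environ["BESPOKE_API_KEY"])

# Check whether the claim is supported by the context.
response = client.minicheck.factcheck.create(
    claim="The patient was prescribed 50mg of the drug.",
    context="The doctor recommended a 25mg dose, taken twice daily.",
)

# support_prob (field name assumed): probability that the claim is grounded in the context.
print(response.support_prob)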

Try Before You Buy

Not convinced that the benchmark numbers tell the full story? We get it. Experience the product’s snappy performance at our Bespoke Playground.

“Bespoke-MiniCheck blows everything else we tested out of the water”
— ML Engineer at GuardrailsAI

Integrations

Ollama has added support for our model as a first-class citizen. It's available here. Through Ollama, the model currently produces a yes/no answer; logit support will land soon.
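As a sketch, the model can be queried through Ollama's local REST API once it has been pulled. The model tag and the Document/Claim prompt format below are assumptions; check the Ollama model page for the exact usage.

```python
# Assumes Ollama is running locally and the model has been pulled, e.g. `ollama pull bespoke-minicheck`.
import requests

context = "The company reported revenue of $10M in Q2, down 5% from Q1."
claim = "Revenue grew in Q2."

resp = requests.post(
    "http://localhost:11434/api/generate",  # Ollama's local generate endpoint
    json={
        "model": "bespoke-minicheck",  # model tag assumed; see the Ollama library page
        "prompt": f"Document: {context}\nClaim: {claim}",  # prompt format assumed
        "stream": False,
    },
)

print(resp.json()["response"])  # currently "Yes" or "No"
```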

The model is also available via GuardrailsAI’s hub. See more information here.

Contact

For questions or comments about the product or model, please contact us or schedule a meeting with one of the founders.