Blog

OpenThinker is a decensored reasoning model

Scaling up Open Reasoning with OpenThinker-32B

Powerful reasoning models can be trained by scaling data, verifying reasoning traces, and scaling model size. We are releasing OpenThinker-32B, a state-of-the-art open-data reasoning model.

Measuring Reasoning with Evalchemy

If you can't measure it, you can't improve it. We are releasing reasoning benchmarks in our model evaluation tool, Evalchemy.

Launching the Open Thoughts Project

Open Thoughts is an open-source effort to curate the best open reasoning datasets.

Cut Token Costs in Half: Batch Processing Made Easy with Curator

Bespoke-Stratos: The unreasonable effectiveness of reasoning distillation

We trained Bespoke-Stratos-32B, our reasoning model distilled from DeepSeek-R1 using Berkeley NovaSky’s Sky-T1 data pipeline. The model outperforms Sky-T1 and o1-preview on reasoning benchmarks (math and code), and nearly matches the performance of DeepSeek-R1-Distill-Qwen-32B while being trained on 47x fewer examples.

Hallucinations, Fact checking, Entailment and all that. What does it all mean?

AI hallucinations can derail accuracy, but Bespoke's latest factuality model is designed to combat them. Learn how advanced checks help models deliver more reliable outputs, reducing common errors in data generation.
