Reasoning datasets competition

TL;DR: Bespoke Labs, Hugging Face, and Together.ai are launching a competition to find the most innovative reasoning datasets. Create a great proof-of-concept reasoning dataset and win prizes to help you scale your work!

The DeepSeek moment for datasets

Since the launch of DeepSeek-R1 in January 2025, we've witnessed remarkable growth in reasoning-focused datasets on the Hugging Face Hub. These datasets, such as OpenThoughts-114k, OpenCodeReasoning and codeforces-cot, have primarily centered on mathematics, coding, and science – domains with clearly verifiable answers.

However, we're now seeing reasoning approaches expand into new territories, including financial analysis, medical reasoning, and multi-domain reasoning.

A strong open dataset can have a massive impact on the open source community, enabling a new generation of models to be trained and evaluated. For example, OpenThoughts-114k has been used to train more than 230 models. We believe the next breakthroughs in model performance won't come from architecture alone; they'll come from better data. That's why now is the perfect moment to rally the open source community around curating reasoning datasets that reflect the real world's complexity, uncertainty, and richness.

To accelerate progress on reasoning, we're launching a competition for reasoning datasets.

How the competition works

The goal is simple: create impactful proof-of-concept reasoning datasets and share them with the community. The best submissions will win prizes designed to help scale these datasets and train models using this data.

Competition Timeline 🗓️

  • Launch Date: April 9, 2025
  • Submission Deadline: May 1, 2025, at 11:59 pm Pacific Time
  • Winners Announced: May 5, 2025

How to Submit Your Dataset

We're hosting all submissions on the Hugging Face Hub:

  • Create your dataset with at least 100 examples
  • Make it publicly available on the Hub
  • Tag it with reasoning-datasets-competition
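
As a rough sketch of these steps (assuming the `datasets` library; the repo name, field names, and example row are placeholders, not a prescribed schema):

```python
# A minimal sketch of creating and publishing a dataset with the `datasets`
# library; the repo name and example row below are placeholders.
from datasets import Dataset

rows = [
    {
        "prompt": "A question that requires multi-step reasoning...",
        "reasoning": "Step 1: ... Step 2: ... Step 3: ...",
        "answer": "The final answer.",
    },
    # ...at least 100 examples in total
]

ds = Dataset.from_list(rows)

# Publish publicly on the Hub (requires authentication, e.g. `huggingface-cli login`).
# The reasoning-datasets-competition tag is set in the dataset card metadata;
# see the Submission Requirements section below.
ds.push_to_hub("your-username/my-reasoning-dataset")
```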

For judging purposes, we'll evaluate a sample of 100 rows from each submission (or the entire dataset if it contains exactly 100 rows). More on this in the Evaluation Criteria section below.

Submission Requirements

To be considered for prizes, your dataset must meet the following criteria:

  • Size: Include at least 100 examples/rows to demonstrate your dataset's concept and quality
  • Documentation: Provide a comprehensive dataset card that includes:
    • Clear description of the dataset's purpose and scope
    • Detailed explanation of how the dataset was created
    • Examples of how the dataset can be used for model training or evaluation
    • Any limitations or biases to be aware of
  • Accessibility: Ensure your dataset has a valid dataset viewer preview for easy browsing
  • Discoverability: Tag your dataset with reasoning-datasets-competition to be officially considered
  • Licensing: Include clear licensing information that allows for research use
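
One way to satisfy the Discoverability and Licensing requirements is through the dataset card's YAML metadata. Here is a sketch using `huggingface_hub`; the repo name, license choice, and card text are placeholders:

```python
# A sketch of a minimal dataset card; the repo name, license, and section
# text are placeholders to adapt to your dataset.
from huggingface_hub import DatasetCard

card = DatasetCard("""\
---
license: apache-2.0
tags:
- reasoning-datasets-competition
---

# My Reasoning Dataset

## Purpose and scope
...

## How the dataset was created
...

## Example uses for training or evaluation
...

## Limitations and biases
...
""")

# DatasetCard.push_to_hub writes the card as the dataset repo's README.md.
card.push_to_hub("your-username/my-reasoning-dataset")
```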

While these are the minimum requirements, we encourage you to go beyond them! Think of your dataset card as your pitch. It's your chance to showcase what makes your dataset stand out and to help the judges see why it deserves a high score across our evaluation criteria: Approach, Domain, and Quality.

We've provided templates and examples to help you get started quickly below.  

What We Are Looking For 🔍

We welcome all innovative approaches, but here are some areas we're particularly excited about:

Reasoning Datasets for New Domains

We're keen to see datasets that expand reasoning beyond traditional STEM fields. Consider domains like:

  • Legal reasoning: Cases requiring application of statutes and precedents to reach judgments
  • Financial analysis: Scenarios requiring evaluation of complex investment opportunities
  • Literary interpretation: Texts requiring evidence-based analysis of themes and symbolism
  • Ethics and philosophy: Problems requiring structured moral or philosophical reasoning

Reasoning Datasets for Novel Tasks

While most reasoning datasets focus on improving benchmarks for mathematics or coding, there are other tasks where reasoning models could significantly improve performance:

  • Structured data extraction: Datasets teaching models to extract and organize information from unstructured text (example approach; see the sketch after this list)
  • Zero-shot classification: Datasets focused on training smaller models to be more effective zero-shot classifiers through reasoning
  • Search improvement: Reasoning datasets designed to enhance search relevance and accuracy
  • Diagrammatic reasoning: Datasets that train models to interpret, analyze, and reason about visual representations like flowcharts, system diagrams, or decision trees
  • Constraint satisfaction problems: Collections teaching models to reason through complex scheduling, resource allocation, or optimization scenarios with multiple interdependent constraints
  • Evidence evaluation: Datasets demonstrating how to assess source credibility and weigh conflicting information
  • Counterfactual reasoning: Collections developing "what if" thinking by systematically altering variables and exploring potential outcomes
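
Here is the sketch referenced above: a hypothetical example row for a structured data extraction reasoning dataset. The field names and content are illustrative, not a prescribed schema:

```python
# A hypothetical example row for a structured data extraction reasoning
# dataset; the schema and content are illustrative only.
example = {
    "text": "Acme Corp reported revenue of $12M in Q3 2024, up 20% year over year.",
    "schema": {"company": "str", "metric": "str", "value": "str", "period": "str"},
    "reasoning": (
        "The company named is Acme Corp. The metric mentioned is revenue, "
        "with a value of $12M. The period is Q3 2024. The 20% figure is a "
        "growth rate, not the requested metric, so it is excluded."
    ),
    "answer": {
        "company": "Acme Corp",
        "metric": "revenue",
        "value": "$12M",
        "period": "Q3 2024",
    },
}
```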

Reasoning Distillation Datasets

One key insight from the DeepSeek paper was that distillation can effectively transfer reasoning capabilities from larger to smaller models. We're interested in datasets specifically designed for this purpose.
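
As a rough sketch of one distillation pipeline (assuming the `together` Python SDK with `TOGETHER_API_KEY` set in the environment; the model ID and prompt are illustrative), you might capture a larger model's reasoning traces like this:

```python
# A sketch of collecting reasoning traces from a "teacher" model for
# distillation; assumes the `together` SDK (pip install together) and
# TOGETHER_API_KEY in the environment. Model ID and prompts are illustrative.
from together import Together

client = Together()

prompts = [
    "If a train travels 60 km in 45 minutes, what is its average speed in km/h?",
]

dataset = []
for prompt in prompts:
    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1",  # an assumed large reasoning model
        messages=[{"role": "user", "content": prompt}],
    )
    # Store the teacher's full output (reasoning trace plus answer) as a
    # training target for a smaller "student" model.
    dataset.append(
        {"prompt": prompt, "completion": response.choices[0].message.content}
    )
```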

Datasets to support a reasoning "ecosystem"

Beyond direct reasoning datasets, we're interested in collections that help build a robust reasoning ecosystem. This could include:

  • Reasoning classification: Datasets for training models to classify or annotate different types of reasoning
  • Error detection: Datasets for training models to identify flaws in reasoning processes (see the sketch below)
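
As a concrete illustration of the error detection idea, a single example might pair a flawed reasoning trace with annotations locating the flaw (a hypothetical schema, for illustration only):

```python
# A hypothetical error detection example; the schema is illustrative.
example = {
    "question": "What is 15% of 80?",
    "reasoning_trace": [
        "Step 1: Convert 15% to a decimal: 0.15.",
        "Step 2: Multiply 0.15 by 80 to get 1.2.",  # arithmetic slip
        "Step 3: Therefore 15% of 80 is 1.2.",
    ],
    "has_error": True,
    "error_step": 2,
    "error_type": "arithmetic",
    "explanation": "0.15 * 80 = 12, not 1.2; the decimal point was misplaced.",
}
```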

This area is one where you can potentially make a big impact without needing a lot of resources to get started.

Evaluation Criteria

Submissions will be judged on three dimensions: Approach, Domain, and Quality. Within these, we consider factors like novelty, scalability, and utility.

| Dimension | What It Covers | What We Value |
| --- | --- | --- |
| Approach | Dataset creation method: tools, prompts, pipelines | Novelty: unique generation strategies. Scalability: can it scale or be reused? |
| Domain | Targeted reasoning domain or skill | Novelty: covers non-standard fields. Utility: real-world or model-relevant |
| Quality | Clarity, depth, diversity, and completeness of examples | Well-structured prompts, reasoning-rich outputs, no hallucinations |

Prizes 🏆

First Place 🥇:

  • $1,500 USD in API credits from Together.ai
  • $1,500 USD gift card from Amazon (or country-specific equivalent)
  • Hugging Face Pro subscription (alongside compute credits to scale up the dataset)

1st and 2nd Runner-up (each):

  • $500 USD gift card from Amazon (or country-specific equivalent)
  • Hugging Face Pro subscription (alongside compute credits to scale up the dataset)

Spotlight Awards 🌟:

  • The top 4 innovative uses of Curator each win a $250 USD gift card from Amazon (or country-specific equivalent)

All Participants:

  • Every eligible participant receives $50 USD in API credits from Together.ai (details in the FAQ below)

How to Sign Up 📝

Step 1: Register here to receive Together.ai credits and updates on the competition

Step 2: Join the competition discussion thread on Hugging Face

Step 3: Join the #reasoning-dataset-competition channel on Discord

Resources for creating reasoning datasets 🧰

If you want to get started quickly on creating a reasoning dataset, you can check out tools like Curator, along with the templates and examples we've provided for the competition.

Frequently Asked Questions

Q: How can I ask questions?

A: You can ask questions on this discussion thread.

Q: Can I submit multiple datasets?

A: Yes, you can submit as many datasets as you want. 

Q: How do I claim Together AI credits?

A: Fill out this questionnaire on Together's website, entering the hackathon name (question 6) as 'Reasoning datasets competition'.

Q: Can I collaborate with others?

A: Absolutely! Team submissions are welcome.

Q: Do I have to use Curator to generate my dataset?

A: No, you can use whatever tools you prefer to create the dataset.

Q: Do I have to use LLMs/synthetic data to generate my dataset?

A: No, you can take whatever approach you think is best.

Got more questions? Head over to the community discussion threads on Hugging Face and Discord.
