Combine the benefits of Retrieval-Augmented Generation and Fine-Tuning for better domain adaptation with Curator


Retrieval-augmented generation (RAG) is a powerful approach for grounding large language models (LLMs) in external knowledge. But sometimes it is essential to incorporate that knowledge into the LLM itself, especially when the model is small or when a large body of proprietary knowledge is involved.

To achieve this, we are adding support for generating data with the Retrieval-Augmented Fine-Tuning (RAFT) method in Curator.

What is RAFT?

RAFT (Retrieval-Augmented Fine-Tuning) is a technique the Berkeley Gorilla team developed in collaboration with Microsoft Research. RAFT enables the creation of synthetic question-answer (QA) datasets from text documents, which can be used to fine-tune LLMs for better performance in RAG pipelines. RAFT takes chunks of a document and generates relevant question-answer pairs based on the content. The following two key ideas are used when creating the training data:

  1. Along with the question, the document from which the question was generated (called the oracle document) is added to the context, together with other, distracting documents.
  2. With some probability, the oracle document is omitted from the context.

The former trains the model to pay close attention to the relevant context despite distractors, and the latter encourages the model to memorize some of the knowledge directly.
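The two ideas above can be sketched in a few lines of plain Python. This is only an illustrative sketch of the data-construction logic, not Curator's implementation; the function name and parameters are hypothetical:

```python
import random

def build_raft_context(oracle_doc, distractor_docs, num_distractors=3,
                       p_oracle=0.8, rng=None):
    """Assemble the context for one RAFT training example.

    With probability `p_oracle` the oracle document is kept alongside the
    distractors; otherwise the context contains distractors only, which
    pushes the model to rely on memorized knowledge.
    """
    rng = rng or random.Random()
    # Draw distractor documents that did not give rise to the question.
    context = rng.sample(distractor_docs, k=num_distractors)
    if rng.random() < p_oracle:
        context.append(oracle_doc)
    # Shuffle so the oracle's position is not predictable.
    rng.shuffle(context)
    return context
```

Each training example then pairs the generated question with this mixed context, so the fine-tuned model learns both to cite the oracle when it is present and to answer from memory when it is not.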

In Curator, we have made it easy to create this data, which can then be fed to existing fine-tuning libraries.

RAFT learns to ignore distracting documents (image credit)

RAFT + Curator

We’ve integrated RAFT natively into Curator through curator.blocks.raft. That means you can generate high-quality synthetic QA datasets with just a few lines of code, and it fits right into your existing LLM workflow with curator.LLM objects.

Here’s a working example you can try out: 👉 RAFT Example on GitHub

In this walkthrough, we demonstrate:

  • How to generate RAFT QA datasets with Curator
  • How to fine-tune a model on this data
  • How to use the fine-tuned model in a RAG inference script
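As a rough illustration of the fine-tuning step, each generated RAFT row (question, retrieved context, answer) can be converted into a chat-style training example. The field names below are assumptions for illustration, not Curator's actual output schema:

```python
def raft_row_to_chat(row):
    """Convert one RAFT QA row into an OpenAI-style chat fine-tuning example.

    `row` is assumed to hold a question, a list of context documents
    (oracle plus distractors, as produced by RAFT), and a gold answer.
    """
    # Number the documents so answers can refer back to them.
    context = "\n\n".join(
        f"[Doc {i + 1}] {doc}" for i, doc in enumerate(row["context"])
    )
    user_msg = (
        "Answer the question using the documents below.\n\n"
        f"{context}\n\nQuestion: {row['question']}"
    )
    return {
        "messages": [
            {"role": "user", "content": user_msg},
            {"role": "assistant", "content": row["answer"]},
        ]
    }
```

A dataset of such examples can be written out as JSONL and passed to most chat fine-tuning APIs.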

RAFT with Curator

✅ Native hosted viewer for live dataset previews

✅ Full support for batch-mode generation, at up to 50% lower cost

✅ Smart caching to avoid redoing work

✅ Faster QA dataset generation

✅ Clean and simple API: curator.blocks.raft

If you're building production-grade RAG systems and care about quality, speed, and developer experience, Curator’s RAFT support is designed for you.

As always, please give us feedback and show us support by starring Curator on GitHub!
