Effortless Gemini Batch Processing with Curator


Generating synthetic data at scale can be expensive. To offset this, several LLM API providers, including Google, offer 50%-70% discounts through batch mode, which processes large volumes of requests asynchronously. However, the Gemini batch API is notoriously tricky to use because of the many steps involved and the scattered documentation.

The challenge with the Gemini batch API

Let’s go over the steps required for simple Gemini batch processing when not using Curator (a rough sketch of these steps in code follows the list):

  1. Create request files in JSONL format (they must follow Gemini’s request structure!).
  2. Upload each file to a GCP bucket and get its Cloud Storage URL (and keep track of it).
  3. Create a batch prediction job on Vertex AI pointing at that Cloud Storage URL.
  4. Split workloads exceeding 150k requests into multiple batches, repeating steps 1 and 2 for each one.
  5. Manually poll job status from Vertex AI using batch IDs (this gets complicated when multiple batch files are uploaded).
  6. Manually persist responses to get even basic caching.
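
For illustration, here is a rough sketch of what steps 1-3 and 5 can look like by hand, using the google-cloud-storage and vertexai SDKs. The bucket and project names are placeholders, and the exact request schema and SDK surface may vary across versions:

import json
import time

import vertexai
from google.cloud import storage
from vertexai.batch_prediction import BatchPredictionJob

# Step 1: write requests in Gemini's JSONL request format.
requests = [
    {"request": {"contents": [{"role": "user", "parts": [{"text": "What is the capital of Montana?"}]}]}},
]
with open("requests.jsonl", "w") as f:
    for request in requests:
        f.write(json.dumps(request) + "\n")

# Step 2: upload the file to a GCP bucket and keep track of its gs:// URL.
bucket = storage.Client().bucket("<bucket-name>")
bucket.blob("batches/requests.jsonl").upload_from_filename("requests.jsonl")
input_uri = "gs://<bucket-name>/batches/requests.jsonl"

# Step 3: create a batch prediction job on Vertex AI from that URL.
vertexai.init(project="<project-id>", location="us-central1")
job = BatchPredictionJob.submit(
    source_model="gemini-1.5-flash-001",
    input_dataset=input_uri,
    output_uri_prefix="gs://<bucket-name>/batches/output/",
)

# Step 5: poll job status until the batch finishes, then fetch
# the results from the output bucket yourself.
while not job.has_ended:
    time.sleep(60)
    job.refresh()
print(job.state)

And this still leaves splitting oversized workloads, tracking multiple jobs, and caching responses entirely up to you.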

These steps add a lot of friction, causing many users to stick to online processing and miss out on significant cost savings. Curator solves this by making Gemini’s batch APIs easy to use!

Curator Gemini batch mode: 50% cheaper and infinitely easier

No manual polling, no file management, just cost-efficient batch processing in a few lines of code.

import os

from bespokelabs import curator

os.environ["HOSTED_CURATOR_VIEWER"] = "1"  # stream responses to the hosted Curator viewer
os.environ["GOOGLE_CLOUD_PROJECT"] = "<project-id>"
os.environ["GEMINI_BUCKET_NAME"] = "<bucket-name>"  # bucket used to stage batch request files
os.environ["GOOGLE_CLOUD_REGION"] = "us-central1"  # us-central1 is the default

# batch=True routes requests through Gemini's batch mode at half the online price.
llm = curator.LLM(model_name="gemini-1.5-flash-001", backend="gemini", batch=True)
questions = [
    {"prompt": "What is the capital of Montana?"},
    {"prompt": "Who wrote the novel 'Pride and Prejudice'?"},
    {"prompt": "What is the largest planet in our solar system?"},
    {"prompt": "In what year did World War II end?"},
    {"prompt": "What is the chemical symbol for gold?"},
]
ds = llm(questions)
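
Curator handles the file creation, uploads, job submission, polling, and caching behind the scenes, and returns the results as a Hugging Face Dataset. Assuming Curator's default output column name, you can inspect the responses like this:

# Inspect the generated responses (the "response" column name assumes Curator's defaults).
print(ds)
print(ds[0]["response"])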

Happy data generation!

Read more about the batch processing Curator offers for other APIs, including OpenAI and Anthropic, here.
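
For example, switching the same pipeline to another provider's batch API is just a matter of changing the model name and backend (the model name below is illustrative; see the docs for supported backends):

# Same workflow, different provider: OpenAI's batch API, also at a 50% discount.
llm = curator.LLM(model_name="gpt-4o-mini", backend="openai", batch=True)
ds = llm(questions)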

As always, please give us feedback and show your support by spreading the word and starring us on GitHub!
