Launching Code Executor

With Bespoke Curator, you can generate code using an LLM and also execute it.

In today's agentic world, the ability to process and transform large datasets of LLM-generated code is increasingly valuable. Whether you're creating synthetic code datasets, verifying outputs, generating automated test cases, or producing complex outputs that depend on code execution, you often need to execute code snippets automatically, securely, and at scale. This is where Curator's CodeExecutor comes in.

Curator provides a robust solution for these needs with CodeExecutor, a flexible framework that handles the complexities of code execution while giving you full control over the process.

Hello world example

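Creating a custom code executor with Curator is straightforward. You extend the curator.CodeExecutor class and implement two methods: code, which returns the source to run for a given row, and code_output, which writes the execution result back onto the row: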

from bespokelabs import curator
from datasets import Dataset


class HelloExecutor(curator.CodeExecutor):
    def code(self, row):
        # Build the source code to execute for this row.
        location = row['location']
        return f"""print("Hello {location}")"""

    def code_output(self, row, execution_output):
        # Store whatever the executed snippet printed to stdout.
        row['output'] = execution_output.stdout
        return row


locations = Dataset.from_list([
    {'location': 'New York'},
    {'location': 'Tokyo'}
])

hello_executor = HelloExecutor()

# Run the executor over every row and show the results as a table.
print(hello_executor(locations).to_pandas())
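To make the row-in, row-out contract concrete, here is a rough standard-library sketch of what executing one snippet per row involves. It is not Curator's implementation: `run_snippet`, `build_code`, and `execute_rows` are hypothetical names chosen for illustration, and the sketch runs each snippet in a local subprocess rather than through any Curator backend.

```python
import subprocess
import sys


def run_snippet(code: str, timeout_s: float = 10.0) -> dict:
    """Run a Python snippet in a fresh interpreter and capture its output.

    Hypothetical helper for illustration; not part of Curator.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"stdout": "", "stderr": "timed out", "returncode": None}
    return {
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "returncode": proc.returncode,
    }


def build_code(row: dict) -> str:
    # Plays the role of HelloExecutor.code: turn a row into a source string.
    return f'print("Hello {row["location"]}")'


def execute_rows(rows: list) -> list:
    # Plays the role of HelloExecutor.code_output: attach stdout to each row.
    results = []
    for row in rows:
        out = run_snippet(build_code(row))
        results.append({**row, "output": out["stdout"]})
    return results


rows = [{"location": "New York"}, {"location": "Tokyo"}]
results = execute_rows(rows)
for result in results:
    print(result)
```

A local subprocess gives no isolation beyond a separate process, so it suits trusted snippets only; untrusted LLM-generated code calls for a sandboxed backend.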

Synthetically generating 3Blue1Brown-like videos

While the code executor is powerful on its own, the real value comes from coupling it with curator.LLM. In this example, we generate a fully synthetic dataset of math animation videos using curator.LLM and curator.CodeExecutor. The data generation recipe is as follows:

Data generation recipe for Math Animated Videos

Our process uses a chained Curator pipeline to generate math animation video scripts. We begin by selecting a diverse range of subjects of varying difficulty, then identify various topics within each subject, and finally craft multiple questions for each topic. In total, we produce 1,000 unique questions, each paired with its own tailored script for a math animation video. Then, for each script, we generate the corresponding Manim code using a single Curator block with Claude 3.7 Sonnet, with thinking and self-consistency enabled to improve the quality of the generated code. Finally, we pass the 1,000 examples through Curator's code executor with the Docker backend and the Manim Docker image. We run this on a CPU machine on GCP, and the full run takes about 20 minutes. Here is an example output:

We find that Claude 3.7 Sonnet can generate simple mathematical animations in one shot. It sometimes struggles with more complicated animations that involve camera movement. In our case, around 25% of the Manim snippets generate valid videos. The failures are mostly due to incorrect code being generated for the given script, and sometimes due to hallucinated function names and calls.
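If you want to check a breakdown like this on your own run, one rough approach is to bucket failed executions by the exception type found in stderr. The sketch below is only a heuristic and rests on assumptions: that each execution yields a stderr string and a return code, and that name-lookup errors such as AttributeError or NameError are a reasonable proxy for hallucinated names. The sample records are made up for illustration and are not results from our run.

```python
import re
from collections import Counter

# Exception types that often point to a made-up or misspelled API name.
# This set is an assumption for illustration, not an exhaustive rule.
HALLUCINATION_HINTS = {
    "AttributeError",
    "NameError",
    "ImportError",
    "ModuleNotFoundError",
}

# Matches an unindented line that starts with an exception class name,
# which is how Python prints the final line of a traceback.
_EXC_LINE = re.compile(r"^([A-Za-z_][\w.]*(?:Error|Exception))\b", re.MULTILINE)


def classify(stderr: str, returncode: int) -> str:
    """Assign a coarse label to a single execution result."""
    if returncode == 0:
        return "ok"
    matches = _EXC_LINE.findall(stderr)
    if not matches:
        return "other_failure"
    # Use the last exception reported and drop any module prefix.
    exc = matches[-1].rsplit(".", 1)[-1]
    if exc in HALLUCINATION_HINTS:
        return "possible_hallucinated_name"
    return "other_error"


# Made-up sample records standing in for real execution results.
runs = [
    {"stderr": "", "returncode": 0},
    {
        "stderr": (
            "Traceback (most recent call last):\n"
            '  File "scene.py", line 5, in construct\n'
            "AttributeError: 'Circle' object has no attribute 'glow'\n"
        ),
        "returncode": 1,
    },
    {
        "stderr": (
            "Traceback (most recent call last):\n"
            '  File "scene.py", line 9, in construct\n'
            "ValueError: bad value\n"
        ),
        "returncode": 1,
    },
    {"stderr": "", "returncode": 137},
]

counts = Counter(classify(r["stderr"], r["returncode"]) for r in runs)
success_rate = counts["ok"] / len(runs)
print(counts, success_rate)
```

A zero return code only shows that the script ran; whether the rendered video is actually valid still needs a separate check.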

We think there is clear room for improvement in the code generation pipeline, for example by adding Manim's documentation to the prompt and/or by using models fine-tuned for Manim code generation. Also, using CodeExecutor, one can create a pipeline with a feedback loop that regenerates code based on the error message. This is quite simple to do with Curator and the code executor, and we'd love for you to try it out!
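The feedback loop mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a Curator recipe: `generate_with_feedback` is a hypothetical helper, `fake_llm` stands in for a real model call, and code runs in a local subprocess instead of a Curator backend.

```python
import subprocess
import sys
from typing import Callable, Optional


def generate_with_feedback(
    generate: Callable[[str, Optional[str]], str],
    task: str,
    max_attempts: int = 3,
    timeout_s: float = 10.0,
) -> dict:
    """Generate code, run it, and retry with the error message on failure."""
    error: Optional[str] = None
    code = ""
    for attempt in range(1, max_attempts + 1):
        # The generator sees the previous error (None on the first attempt).
        code = generate(task, error)
        try:
            proc = subprocess.run(
                [sys.executable, "-c", code],
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            error = f"Execution timed out after {timeout_s} seconds"
            continue
        if proc.returncode == 0:
            return {"ok": True, "code": code, "stdout": proc.stdout, "attempts": attempt}
        error = proc.stderr
    return {"ok": False, "code": code, "error": error, "attempts": max_attempts}


def fake_llm(task: str, error: Optional[str]) -> str:
    # Stand-in for a model call: the first draft has a bug, the retry fixes it.
    if error is None:
        return "print(undefined_name)"
    return 'print("fixed")'


result = generate_with_feedback(fake_llm, task="print a greeting")
print(result)
```

In a real pipeline, the generator would place the error text in the prompt so the model can correct its earlier attempt; capping the number of attempts keeps cost bounded when a snippet cannot be repaired.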

Get started

As you can see, Curator makes it very easy to define complex workflows involving several stages of LLM calls followed by code execution.

To get started, please check the docs and the following examples:

  1. Math video generation

  2. Colab example

For feature requests and bugs, please use the Curator repository on GitHub. Join our Discord community, TokenTown.

Star us on GitHub to support!

[ Environment research ] & infrastructure for the agent era.

©2026 BespokeLabs.AI, Inc.
