In today's agentic world, the ability to process and transform large datasets containing LLM-generated code is becoming increasingly valuable. Whether you're creating synthetic code datasets, verifying outputs, generating automated test cases, or producing complex outputs that require code execution, you often need to execute code snippets automatically, securely, and at scale. This is where Curator's CodeExecutor comes in.
Curator provides a robust solution for these needs with CodeExecutor, a flexible framework that handles the complexities of code execution while giving you full control over the process.
Creating a custom code executor with Curator is straightforward. You simply extend the curator.CodeExecutor class and implement two key methods, code() and code_output():
```python
from bespokelabs import curator
from datasets import Dataset


class HelloExecutor(curator.CodeExecutor):
    def code(self, row):
        # Return the source code to execute for this row
        location = row['location']
        return f"""print("Hello {location}")"""

    def code_output(self, row, execution_output):
        # Store the captured stdout from the execution back on the row
        row['output'] = execution_output.stdout
        return row


locations = Dataset.from_list([
    {'location': 'New York'},
    {'location': 'Tokyo'},
])

hello_executor = HelloExecutor()
print(hello_executor(locations).to_pandas())
```
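To make the per-row contract concrete, here is a minimal local stand-in for what happens to each row: code(row) produces a source string, that string runs in a separate process, and code_output(row, output) folds the captured stdout back into the row. This is a sketch built on Python's subprocess module, not Curator's implementation. The ExecutionOutput dataclass and the execute and run_executor helpers are hypothetical names chosen for illustration, and unlike a containerized backend this offers no sandboxing.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ExecutionOutput:
    # Hypothetical stand-in for the object passed to code_output()
    stdout: str
    stderr: str
    returncode: int


def execute(source: str, timeout: float = 10.0) -> ExecutionOutput:
    # Run the snippet in a fresh Python interpreter and capture its output.
    # No sandboxing: only run code you trust with this helper.
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return ExecutionOutput(proc.stdout, proc.stderr, proc.returncode)


def run_executor(executor, rows):
    # Mirror the per-row flow: code(row) -> execute -> code_output(row, output)
    results = []
    for row in rows:
        source = executor.code(row)
        output = execute(source)
        results.append(executor.code_output(dict(row), output))
    return results


class LocalHelloExecutor:
    # Same two methods as HelloExecutor, without the Curator base class
    def code(self, row):
        return f'print("Hello {row["location"]}")'

    def code_output(self, row, execution_output):
        row["output"] = execution_output.stdout
        return row


rows = [{"location": "New York"}, {"location": "Tokyo"}]
for result in run_executor(LocalHelloExecutor(), rows):
    print(result)
```

Keeping code generation and output handling in two separate methods means the execution step in the middle can be swapped (a local process here, a container in a real deployment) without touching either method.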
While the code executor is powerful on its own, the real value comes from coupling it with curator.LLM(). In this example, we generate a fully synthetic dataset of math animation videos using curator.LLM() and curator.CodeExecutor(). The data generation recipe is as follows:
Our process uses a chained Curator pipeline to generate math animation video scripts. We begin by selecting a diverse range of subjects of varying difficulty, then identify topics within each subject, and finally write multiple questions for each topic. In total, we produce 1,000 unique questions, each paired with its own tailored script for a math animation video. Next, for each script, we generate corresponding Manim code using a single Curator block with Claude 3.7 Sonnet, with extended thinking and self-consistency enabled to improve the quality of the generated code. Finally, we pass the 1,000 examples through Curator's code executor using the Docker backend and the Manim Docker image. We run this on a CPU machine on GCP, and the full run takes about 20 minutes. Here is an example output:
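The chained recipe is essentially a fan-out: each stage maps one record to several records for the next stage. The sketch below shows only that data flow in plain Python. The stage functions (subjects, topics, questions, script) are hypothetical stubs standing in for separate curator.LLM blocks, and the fan-out counts are placeholders, not the numbers used for the 1,000-question run.

```python
from itertools import islice


def subjects():
    # Stub for an LLM call that proposes subjects of varying difficulty
    return ["Linear Algebra", "Calculus"]


def topics(subject):
    # Stub for an LLM call that lists topics within a subject
    return [f"{subject} topic {i}" for i in range(3)]


def questions(topic):
    # Stub for an LLM call that writes questions for a topic
    return [f"{topic} question {i}" for i in range(2)]


def script(question):
    # Stub for an LLM call that writes an animation script for a question
    return f"Script animating: {question}"


def build_rows(limit):
    # Chain the stages: subject -> topics -> questions -> one script each,
    # stopping once `limit` rows have been produced
    rows = (
        {"subject": s, "topic": t, "question": q, "script": script(q)}
        for s in subjects()
        for t in topics(s)
        for q in questions(t)
    )
    return list(islice(rows, limit))


rows = build_rows(limit=1000)
print(len(rows))  # 2 subjects x 3 topics x 2 questions = 12 with these stubs
print(rows[0]["script"])
```

In a Curator pipeline, each stage would instead be its own LLM block whose output dataset feeds the next one; generating the Manim code and running it through CodeExecutor are the final two links in the chain.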
We find that Claude 3.7 Sonnet can generate simple mathematical animations in one shot, but it sometimes struggles with more complicated animations that involve camera movement. In our case, around 25% of the Manim snippets produce valid videos. Most failures come from incorrect code generated for the given script, sometimes caused by hallucinated function names and calls.
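At a 25% success rate, 1,000 scripts yield roughly 250 usable videos, so it helps to know why the rest fail. One possible way to bucket failures is sketched below. It assumes your code_output stored the process's exit status and stderr on each row under the hypothetical field names returncode and stderr, and it treats NameError, AttributeError, and import errors as a rough signal of hallucinated function names. A zero exit status is only a proxy for a valid video; a stricter check would also confirm that the rendered file exists.

```python
import re
from collections import Counter

# Exception types that usually mean the code referenced a function, method,
# or module that does not exist: a common symptom of hallucinated API calls.
HALLUCINATION_ERRORS = {"NameError", "AttributeError", "ImportError", "ModuleNotFoundError"}

# Matches the exception line of a Python traceback, e.g. "NameError: name ..."
EXCEPTION_LINE = re.compile(r"^([\w.]+(?:Error|Exception)):", re.MULTILINE)


def classify(row):
    # Assumes code_output stored hypothetical "returncode" and "stderr" fields
    if row["returncode"] == 0:
        return "ok"
    names = EXCEPTION_LINE.findall(row["stderr"])
    if not names:
        return "other_error"
    # The last match is the exception that actually terminated the run
    exc = names[-1].rsplit(".", 1)[-1]
    return "hallucinated_name" if exc in HALLUCINATION_ERRORS else "other_error"


def yield_report(rows):
    # Fraction of rows in each outcome bucket
    counts = Counter(classify(row) for row in rows)
    return {bucket: n / len(rows) for bucket, n in counts.items()}


sample = [
    {"returncode": 0, "stderr": ""},
    {"returncode": 1, "stderr": "Traceback (most recent call last):\n"
     "  File \"scene.py\", line 5, in construct\n"
     "AttributeError: 'Circle' object has no attribute 'morph_into'\n"},
    {"returncode": 1, "stderr": "Traceback (most recent call last):\n"
     "NameError: name 'SpiralIn3D' is not defined\n"},
    {"returncode": 1, "stderr": "Traceback (most recent call last):\n"
     "ValueError: bad value\n"},
]
print(yield_report(sample))
```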
We think there is clear room for improvement in the code generation pipeline, for example by adding Manim's documentation to the prompt and/or by using models fine-tuned for Manim code generation. With CodeExecutor, one can also build a pipeline with a feedback loop that regenerates code based on the error message. This is quite simple to do with Curator and its code executor, and we'd love for you to try it out!
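A feedback loop of this kind can be sketched as: run the candidate code, and if it fails, hand the error text back to the generator and try again, up to a fixed attempt budget. In this sketch, generate_code is a hypothetical stand-in for an LLM call (in Curator it could be an LLM block whose prompt includes the previous error), and execution uses a local subprocess rather than the Docker backend.

```python
import subprocess
import sys


def run(source, timeout=10.0):
    # Execute a Python snippet locally; returns (succeeded, stderr)
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode == 0, proc.stderr


def generate_code(script, previous_error=None):
    # Hypothetical stand-in for an LLM call. The first attempt calls an
    # undefined (hallucinated) function; given an error, it returns a fix.
    if previous_error is None:
        return "animate_spiral()"
    return f"print({script!r})"


def generate_with_feedback(script, max_attempts=3):
    # Regenerate code from the last error message until it runs or we give up
    error = None
    code = None
    for attempt in range(1, max_attempts + 1):
        code = generate_code(script, error)
        ok, stderr = run(code)
        if ok:
            return {"code": code, "attempts": attempt, "error": None}
        # Keep only the final traceback line as feedback for the next attempt
        lines = stderr.strip().splitlines()
        error = lines[-1] if lines else "unknown error"
    return {"code": code, "attempts": max_attempts, "error": error}


result = generate_with_feedback("Rotate a square by 90 degrees")
print(result["attempts"], result["code"])
```

Passing back only the last traceback line keeps the retry prompt short; a real pipeline might include the full traceback or the offending source lines if the model needs more context.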
As you can see, Curator makes it very easy to define complex workflows involving several stages of LLM calls followed by code execution.
To get started, please check the docs and the following examples:
For feature requests and bugs, please use the Curator repository on GitHub. Join our Discord community, TokenTown.
Star us on GitHub to support!