Curation that puts your data to work

Empowering engineers to build high-quality, tailored datasets for precise model fine-tuning.
Mahesh
About us
Mahesh, a former staff engineer at Google DeepMind, has teamed up with Alex of UT Austin to launch Bespoke Labs. While compute and models are now abundant, they focus on improving access to the high-quality data that is critically needed to push the field forward.
Alex
Research is in our DNA
Research is kinda what we do. It’s part of our DNA and keeps us ahead of the game when it comes to new tools and insights. We also just love a good paper.
DATACOMP: In search of the next generation of multimodal datasets
Abstract
Multimodal datasets are a critical component in recent breakthroughs such as CLIP, Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms. To address this shortcoming in the machine learning ecosystem, we introduce DATACOMP, a testbed for dataset experiments centered around a new candidate pool of 12.8 billion image-text pairs from Common Crawl. Participants in our benchmark design new filtering techniques or curate new data sources and then evaluate their new dataset by running our standardized CLIP training code and testing the resulting model on 38 downstream test sets. Our benchmark consists of multiple compute scales spanning four orders of magnitude, which enables the study of scaling trends and makes the benchmark accessible to researchers with varying resources. Our baseline experiments show that the DATACOMP workflow leads to better traini
Published 9/23