Datasets!
Here are some datasets that I played a major role in curating.
mmmCAD: Multi-modal Modification of CAD
Currently under construction. Reach out to me and share your thoughts on how we can make the dataset better!
LARC: Language-complete Abstract Reasoning Corpus
From Communicating Natural Programs to Humans and Machines. [~350 man-hours, ] One participant (the Describer) uses natural language to describe an abstract grid transformation from an ARC task to another participant (the Builder). The Builder then applies the described transformation to a new input grid to produce the output grid. Access the dataset here.
DARC: A Recursive Decomposition Dataset of ARC Tasks
From ANPL: Towards Natural Programming with Interactive Decomposition. [~440 man-hours] A corpus of 227 ARC tasks, recursively decomposed and grounded as Python code. Access the dataset here.
DiffVL100
From DiffVL: Scaling Up Soft Body Manipulation using Vision-Language Driven Differentiable Physics. [~50 man-hours] 100 soft-body manipulation tasks inspired by real-life scenarios from online videos. Access the dataset here.