About the dataset

The original Talk2Car dataset pairs each image with a textual command that specifies an action and refers to an object in the image; the object of interest is annotated with a bounding box. However, in autonomous driving, referring directly to objects is not amenable to downstream tasks like planning. Hence, we augmented the original dataset with segmentation masks corresponding to navigable regions. The newly created Talk2Car-RegSeg dataset has 8349 training and 1163 validation image-command pairs, similar to those used in the original dataset. We observed that the commands in Talk2Car's validation set are verbose, which makes them complex. To evaluate performance in a controlled setting, we also curated a novel test split (Test-RegSeg). Test-RegSeg contains 500 randomly selected images from the validation set with newly created commands. The commands in the Test-RegSeg split are simplified and straightforward.
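As an illustration of how a fixed-size test split can be drawn reproducibly from a validation set, here is a minimal sketch. It is not the procedure used to build Test-RegSeg: the function name `sample_test_split`, the seed, and the placeholder image ids are all hypothetical, and the number of distinct validation images is assumed only for the sake of the example.

```python
import random


def sample_test_split(validation_ids, k=500, seed=0):
    """Pick k distinct ids uniformly at random, reproducibly for a given seed."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    return sorted(rng.sample(list(validation_ids), k))


# Hypothetical placeholder ids; the real dataset's identifiers and image count may differ.
validation_ids = [f"val_{i:04d}" for i in range(1163)]
test_ids = sample_test_split(validation_ids)
```

Fixing the seed keeps the selection stable across runs, so the same images can be given new commands and reused for evaluation.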

Publication

We have benchmarked the dataset using a novel transformer-based model and a set of baseline approaches. We have also performed ablation and analysis studies on this dataset (e.g., by action type and command length) to assess its applicability in realistic scenarios. More detailed results can be found in our paper.

Gallery