Project Overview


Improved efficiency and consistency in labeling and annotating data.

Agencies such as the Alcohol and Tobacco Tax and Trade Bureau (TTB) want to use emerging technology to better identify certain prohibited icons on alcohol labels submitted through Certificate of Label Approval/Exemption (COLA) applications. Currently, agencies must review each submission manually, which is time-consuming and expensive.

The lack of training data makes highly accurate modeling a challenge. There are labels, and there are icons, but there are no labels with icons.

AutoDataGen quickly generates thousands of randomized example images and returns them with the appropriate classification labels and bounding boxes. It first uses a web scraper to download the desired background images (i.e., alcohol labels) and object images (icons) based on search terms. The tool then randomly combines the backgrounds and objects to create a data set that is ready for training machine learning models.
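
The random combination step can be sketched roughly as follows. This is a minimal illustration and not AutoDataGen's actual code: the names paste_random and generate_dataset are hypothetical, images are modeled as plain lists of RGBA tuples so the example runs without any imaging library, and the bounding-box convention (x_min, y_min, x_max, y_max) is an assumption.

```python
import random


def paste_random(background, obj, label, rng):
    """Paste obj onto a copy of background at a random position.

    Images are lists of rows; each pixel is an (r, g, b, a) tuple.
    Returns the new image and one annotation dict (hypothetical format).
    """
    bg_h, bg_w = len(background), len(background[0])
    obj_h, obj_w = len(obj), len(obj[0])
    if obj_h > bg_h or obj_w > bg_w:
        raise ValueError("object must fit inside the background")

    # Pick the top-left corner so the object stays fully inside the image.
    x0 = rng.randint(0, bg_w - obj_w)
    y0 = rng.randint(0, bg_h - obj_h)

    out = [list(row) for row in background]  # leave the input untouched
    for dy, row in enumerate(obj):
        for dx, pixel in enumerate(row):
            if pixel[3] > 0:  # skip fully transparent object pixels
                out[y0 + dy][x0 + dx] = pixel

    # Assumed convention: (x_min, y_min, x_max, y_max), max edges exclusive.
    annotation = {"label": label, "bbox": (x0, y0, x0 + obj_w, y0 + obj_h)}
    return out, annotation


def generate_dataset(backgrounds, objects, n, seed=0):
    """Build n (image, annotation) pairs from random background/object picks.

    objects is a list of (label, image) pairs.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        background = rng.choice(backgrounds)
        label, obj = rng.choice(objects)
        samples.append(paste_random(background, obj, label, rng))
    return samples
```

Because the paste position is known at generation time, the bounding box and class label come for free with each image, which is what removes the manual annotation step. A production version would presumably also vary scale, rotation, and the number of objects per image.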

This tool goes beyond cutting and pasting images. It removes the backgrounds from object images, producing composites that closely resemble real data. Foreground detection and background removal are automated, which reduces the time and effort these steps would otherwise require.
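
To illustrate the idea of automated background removal, here is a deliberately simple sketch. It is not the method AutoDataGen uses, which the overview does not specify. It assumes the object sits on a roughly uniform backdrop and makes every pixel close to the top-left corner color transparent. The function names and the tolerance parameter are hypothetical, and images are again plain lists of RGBA tuples.

```python
def remove_background(image, tolerance=30):
    """Make pixels near the top-left corner color transparent.

    Assumes a roughly uniform backdrop whose color is sampled from the
    top-left pixel. Returns a new image; the input is not modified.
    """
    ref = image[0][0][:3]
    result = []
    for row in image:
        new_row = []
        for r, g, b, a in row:
            # Largest per-channel difference from the reference color.
            distance = max(abs(r - ref[0]), abs(g - ref[1]), abs(b - ref[2]))
            if distance <= tolerance:
                new_row.append((r, g, b, 0))  # treat as background
            else:
                new_row.append((r, g, b, a))  # keep as foreground
        result.append(new_row)
    return result


def foreground_bbox(image):
    """Return a tight (x_min, y_min, x_max, y_max) box around visible pixels.

    Max edges are exclusive. Returns None if every pixel is transparent.
    """
    xs = [x for row in image for x, p in enumerate(row) if p[3] > 0]
    ys = [y for y, row in enumerate(image) if any(p[3] > 0 for p in row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)
```

Cropping the object to its foreground box before pasting keeps the resulting bounding box tight around the visible icon rather than around its original backdrop. Real icons with gradients, shadows, or busy backdrops would need a more robust segmentation technique than this color threshold.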

Team Name: AutoDataGen

Team Members:
  • Andrea Mycroft (Team Lead)
  • Vinay Katari

Targeted Industry

Federal Civilian (Fed Civ)







a Demo

Instead of Show and Tell, we’ll Listen and Show. We’ll listen to what challenges your agency is facing. Then we’ll show you our cutting-edge prototypes and collaborate to decide which provides the best solution and the greatest value.

