Bifrost’s Generative AI approach to building performant Computer Vision Models

May 19, 2023
Challenges with Synthetic Data

While synthetic data has immense potential, it’s important to remember that not all synthetic data is the same. A synthetic dataset is a product of the process and theory used to generate it, so it can fall prey to the same challenges often faced by real data:

  1. Biased Data (Generation): Bias is one of the most common challenges any dataset faces. People tend to build training data from what they are familiar with, which leads to poor performance when the model is confronted with different classes, groups, environments or surroundings.
  2. Overfitting: A model may perform exceptionally well on validation data that resembles its training set but poorly on new, unseen images, because it has effectively 'memorized' the training data rather than learning to identify the key features that define the target class.
  3. Inadequate Diversity: Without a broad range of examples, the model will not learn the full complexity of the problem, resulting in suboptimal performance when presented with new, different inputs.
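A simple audit of dataset metadata can surface the bias and diversity problems above before any training happens. The sketch below is a minimal illustration, assuming each image is tagged with a class label and a background label; the function name `audit_diversity`, the label values and both thresholds are hypothetical choices made for this example, not recommended settings.

```python
from collections import Counter, defaultdict


def audit_diversity(samples, min_backgrounds=3, max_share=0.8):
    """Flag classes whose images are concentrated on too few backgrounds.

    `samples` is an iterable of (class_label, background_label) pairs.
    Both thresholds are illustrative assumptions, not tuned values.
    """
    per_class = defaultdict(Counter)
    for label, background in samples:
        per_class[label][background] += 1

    warnings = {}
    for label, counts in per_class.items():
        total = sum(counts.values())
        top_background, top_count = counts.most_common(1)[0]
        issues = []
        if len(counts) < min_backgrounds:
            issues.append(f"only {len(counts)} distinct background(s)")
        if top_count / total > max_share:
            issues.append(
                f"{top_background!r} covers {top_count / total:.0%} of images"
            )
        if issues:
            warnings[label] = issues
    return warnings


# Hypothetical metadata: one class lives almost entirely on tarmac.
samples = (
    [("c130", "tarmac")] * 90
    + [("c130", "grass")] * 10
    + [("f16", "tarmac")] * 30
    + [("f16", "forest")] * 30
    + [("f16", "desert")] * 40
)
report = audit_diversity(samples)
for label, issues in report.items():
    print(label, "->", "; ".join(issues))
```

A check like this only catches imbalance in attributes you have already labeled; it says nothing about sources of bias that are missing from the metadata.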

Example of Poor Performance

Lockheed Martin released a paper a few years ago citing poor performance from a model trained on synthetic data and validated on real images. However, when you drill down into the details, you notice:

  • They only used 3 different models of the aircraft.
  • They placed all the airplanes on a tarmac background.

As a result, the classifier became biased towards that narrow set of variables and began making decisions based on the image background (tarmac) instead of the image foreground (C-130 aircraft).

Figure: Heatmap showing where the neural net looks to make its prediction (red: area of focus). The neural net focused primarily on the background in 2 out of 4 images.
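One way to turn a heatmap like this into a number is to measure how much of the attribution falls inside the object's bounding box. The sketch below assumes the heatmap is already available as a 2D grid of non-negative values, however it was produced, and that a bounding box for the object is known; the function name `foreground_focus`, the toy grid and the 0.5 cutoff are all assumptions for illustration.

```python
def foreground_focus(heatmap, box):
    """Return the fraction of total heatmap mass that falls inside `box`.

    `heatmap` is a 2D list of non-negative attribution values.
    `box` is (top, left, bottom, right), with bottom and right exclusive.
    """
    top, left, bottom, right = box
    total = sum(sum(row) for row in heatmap)
    if total == 0:
        return 0.0  # no attribution anywhere, so nothing is in focus
    inside = sum(sum(row[left:right]) for row in heatmap[top:bottom])
    return inside / total


# Toy 4x4 heatmap: most of the mass sits on the object in the centre,
# with some leaking into the bottom corners (background).
heatmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 3.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
]
object_box = (1, 1, 3, 3)

ratio = foreground_focus(heatmap, object_box)
# The 0.5 cutoff is an arbitrary illustrative threshold.
relies_on_background = ratio < 0.5
print(f"foreground focus: {ratio:.2f}, relies on background: {relies_on_background}")
```

Averaged over a validation set, a low ratio suggests the classifier is leaning on background cues in the way the heatmap above illustrates.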

Unsurprisingly, the white paper concludes that synthetic data doesn’t provide measurable benefits. However, the paper epitomizes a common challenge in training a computer vision (CV) model: people assume CV models learn the way humans do. They assume that if you have collected 1,000 photos of an object in the real world, the model will be robust enough to perform reliably in production.

Chances are the CV model will perform well under typical scenarios. Eventually, however, life’s noise and complexity appear: cameras jitter, sensors are affected by weather, skies turn orange from wildfires, objects show up in the least expected places, vehicles you only expect to see in the movies come barreling down the road, and the list goes on.

The Bifrost Approach to Synthetic Data

Bifrost works to overcome the challenges of building CV models, whether using real or synthetic data. The goal is to ensure the right blend of similarity to how things will be perceived in the production deployment and diversity of conditions. We achieve that through the following key areas:

1. Parametrically Infinite Variation of Assets

a. Textures, colors, orientations, sizes, weapon configurations, etc. Bifrost can generate 1,000s of different variations of any asset.

b. A similarly diverse range of backgrounds (tarmac, forest, grass, etc.)

c. Using a diverse set of backgrounds, we ensure the model focuses on the features of the asset itself rather than the background.

2. Domain Adaptation

a. A common issue with synthetic data is a distinctive difference between computer-generated and real imagery. AI models can pick up on this difference. With unoptimized synthetic imagery, performance can drop the longer you train on it. However, our data has been specifically tuned to emulate specific sensors on a pixel and feature level. This results in more stable training over time and higher accuracy overall.

3. Sensor-Specific Data Generation

a. Bifrost-generated data is built to match specific sensor attributes, optimizing performance for that particular sensor. Such specificity greatly improves performance. Our post-processing techniques emulate real sensor properties and artifacts.
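The ideas in the areas above can be sketched in a few lines: draw every scene parameter independently so the background never predicts the asset, then post-process the rendered pixels to resemble a target sensor. This is a minimal sketch and not Bifrost's pipeline. The parameter lists, value ranges, and the names `SceneParams`, `sample_scene` and `emulate_sensor` are hypothetical, and Gaussian noise plus bit-depth quantization merely stand in for the much richer set of properties and artifacts a real sensor model would cover.

```python
import random
from dataclasses import dataclass

# All values below are illustrative assumptions, not Bifrost's parameters.
TEXTURES = ["matte_grey", "camouflage", "weathered"]
BACKGROUNDS = ["tarmac", "forest", "grass", "desert"]


@dataclass(frozen=True)
class SceneParams:
    texture: str
    background: str
    yaw_degrees: float
    scale: float


def sample_scene(rng):
    """Draw each parameter independently of the others."""
    return SceneParams(
        texture=rng.choice(TEXTURES),
        background=rng.choice(BACKGROUNDS),
        yaw_degrees=rng.uniform(0.0, 360.0),
        scale=rng.uniform(0.5, 2.0),
    )


def emulate_sensor(image, rng, noise_sigma=2.0, bit_depth=8):
    """Add Gaussian noise, then clamp and quantize to the given bit depth.

    `image` is a 2D list of pixel intensities in sensor counts.
    """
    max_value = 2 ** bit_depth - 1
    return [
        [
            min(max_value, max(0, round(pixel + rng.gauss(0.0, noise_sigma))))
            for pixel in row
        ]
        for row in image
    ]


rng = random.Random(0)  # fixed seed so the sampled dataset is reproducible

# 1,000 scene descriptions; a renderer (not shown) would consume these.
scenes = [sample_scene(rng) for _ in range(1000)]
backgrounds_seen = {scene.background for scene in scenes}

# Stand-in for a rendered image; 300 exceeds the 8-bit range on purpose.
clean = [[0, 128, 255], [64, 200, 300]]
noisy = emulate_sensor(clean, rng)
```

Sampling the background independently of the asset is the design choice that matters here: it removes the asset-to-background correlation that a classifier could otherwise exploit as a shortcut.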

As our F-16 Bench shows, when synthetic data is implemented correctly, it pushes the neural net to base its decision on the object of interest (e.g. the F-16) rather than taking shortcuts through background information.

Figure: Focus heatmap for a classifier trained on Bifrost synthetic data. Notice how the focus is primarily on the main body of the aircraft.

The result is a trained model that performs better across a wider range of environments, scenarios and objects. In other words, more performance out of the box!

Want to learn more about Bifrost and how we are enabling companies to build better computer vision models, faster? Reach out at hello@bifrost.ai or here!


Want to play around with Bifrost’s F-16 dataset? Access it here!
