How to Generate Synthetic 3D Data with Bifrost

May 23, 2023
Machine learning engineers and managers working with visual data, start here👇👇

Collecting data costs considerable time and money!

One of my first major contributions to the world of AI and computer vision was DeepPlastic, a computer vision model and dataset used to detect marine plastic debris in the ocean.

To create this dataset, my team and I had to overcome incredible challenges that included renting boats, scuba diving up to 40 meters, and working with organizations with strict security clearance requirements.

💡Collecting high-quality data to train our computer vision model cost more in labor and time than any other step in the model development and hosting process!

Most traditional dataset distributions use a train/valid/test split in the range of ~80%/10%/10%. That smaller portion of real-world validation and testing data plays a critical role in building a successful ML algorithm, but bootstrapping your training data with synthetic data can greatly accelerate the training process while drastically lowering time and cost!
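As a rough illustration of the arithmetic, the sketch below estimates how many synthetic training images would be needed to round out an ~80/10/10 split when the validation and test sets are real-world images. The function name and the 0.8 default are assumptions made for this example, not part of any Bifrost tooling.

```python
def synthetic_train_size(n_real_valid, n_real_test, train_fraction=0.8):
    """Estimate the training-set size that pairs with a real held-out set.

    Hypothetical helper: assumes validation and test images are real-world
    data and the training portion is filled with synthetic images.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    held_out = n_real_valid + n_real_test
    # held_out makes up (1 - train_fraction) of the full dataset
    return round(held_out * train_fraction / (1.0 - train_fraction))


if __name__ == "__main__":
    # 100 real validation images and 100 real test images
    print(synthetic_train_size(100, 100))
```

In other words, every real image you hold out for validation or testing is matched by roughly four training images under this split.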

Components of a fully synthetic dataset

🤔A fully synthetic dataset generated by AI is often referred to as a “synthset”

A synthset consists of the following components:

  • Objects and classes: the 3D objects a computer vision model will be detecting and the class labels used to describe each object’s name and properties
  • Environment and weather: the type of environment objects will be rendered in and the weather conditions for that environment
  • Sensors: the type of virtual camera that will be used to capture images of objects within the environment
    • Example: a virtual RGB digital camera sensor similar to that of a generic mobile phone camera
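One way to keep these three components straight is to write them down as a small data structure before opening the web app. The sketch below is purely illustrative: the class names, field names, and default values are assumptions for this example and do not reflect Bifrost's internal schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sensor:
    # Hypothetical sensor description; values are placeholders
    kind: str = "rgb"
    fov_degrees: float = 60.0
    elevation_m: float = 10.0


@dataclass
class SynthsetSpec:
    classes: Dict[str, List[str]]   # class label -> names of 3D assets
    environment: str                # e.g. "maritime"
    weather: Dict[str, float]       # weather condition -> occurrence weight
    sensor: Sensor = field(default_factory=Sensor)

    def validate(self) -> None:
        """Raise ValueError if the spec is obviously incomplete."""
        if not self.classes:
            raise ValueError("at least one class is required")
        for label, assets in self.classes.items():
            if not assets:
                raise ValueError(f"class {label!r} has no assets")
        if any(w < 0 for w in self.weather.values()):
            raise ValueError("weather weights must be non-negative")
        if sum(self.weather.values()) <= 0:
            raise ValueError("at least one weather weight must be positive")


if __name__ == "__main__":
    spec = SynthsetSpec(
        classes={"aircraft_carrier": ["carrier_a"], "battleship": ["battleship_a"]},
        environment="maritime",
        weather={"clear": 0.7, "fog": 0.3},
    )
    spec.validate()
    print(spec)
```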

How to generate a synthetic dataset of 3D ships in a maritime environment

Let’s generate a synthetic dataset of 3D ships in a variety of coastal maritime conditions. This dataset could then be used to train a computer vision model capable of detecting and/or classifying the various ships.

Step 0: Sign up

Head over to the production Bifrost.ai web application (Alchemy) to get started! Click Sign up here! to create your account.

Step 0
Fig 0 - sign up landing page

Steps 1 and 2: Add a project and start creating a synthset

💡Projects and synthsets can be used as a form of version control! Use separate projects and individual synthsets as a method of iteration.

The project will store each individual synthset as a separate version within it.

Step 1 & 2
Fig 1 - creating a project to generate and store synthetic data within

Step 3: Select the “maritime” channel

Step 3
Fig 2 - selecting the maritime channel

Channels dictate what type of environment will be rendered and how the 3D assets will populate within it.

Step 4: Start generating a synthset

Step 4
Fig 3 - generating the new synthset

🚨Define object classes intentionally so that they are distinct from other similar-looking objects! A strong classification ontology will greatly improve model performance and simplify model improvement.

Step 4
Fig 4 - picking a strong class ontology

Step 5: Select assets for rendering

Choose a large collection of assets, or pick individual assets out of a collection.

Step 5
Fig 5 - search for and select the assets to generate within the maritime channel’s environment

Step 6: Select class distributions

💡Use class distribution weights to give your datasets the balance of class representation you need. This can be used to bootstrap a model with good balance at the start of development, or to rebalance an existing dataset that is short on specific classes!

Step 6
Fig 6 - specifying a 50-50 balance of aircraft carriers and battleships to be present in the synthset
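To make the idea of distribution weights concrete, here is a small sketch of how weights might translate into per-class object counts, plus how many extra samples each class would need to rebalance an existing dataset. Both helpers and the class names are hypothetical and written for this post; the web app handles this for you.

```python
def counts_from_weights(weights, total):
    """Turn relative class weights into integer counts that sum to total."""
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    raw = {name: total * w / weight_sum for name, w in weights.items()}
    counts = {name: int(value) for name, value in raw.items()}
    leftover = total - sum(counts.values())
    # Give the remaining items to the classes with the largest remainders
    by_remainder = sorted(raw, key=lambda n: raw[n] - counts[n], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


def samples_needed_to_balance(existing_counts):
    """How many extra samples each class needs to match the largest class."""
    target = max(existing_counts.values())
    return {name: target - n for name, n in existing_counts.items()}


if __name__ == "__main__":
    print(counts_from_weights({"aircraft_carrier": 50, "battleship": 50}, 1000))
    print(samples_needed_to_balance({"aircraft_carrier": 900, "battleship": 150}))
```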

Step 7: Adjust object variance

💡Use object positioning to capture underrepresented angles of your objects. For example, you can specify which direction a boat faces to capture more of the stern if you need samples of the boat facing away from the camera!

Easily adjust the size and location of objects at render time by specifying exact details. See the scale and distance variation charts below for an idea of what the numbers correspond to.

Step 7
Fig 7 - adjusting the size and position of the ships within the environment

Step 7_1
Fig 8 - examples of scale and material variation

Step 7_2
Fig 9 - general distance from camera by spawn zone category

Step 7_3
Fig 10 - rendering examples of spawn zone categories
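If it helps to think about orientation numerically, the sketch below samples object headings from a band around a chosen direction. The angle convention (0° means the bow points at the camera, 180° means the stern faces the camera) and the function name are assumptions for this example only; the app's own controls may define direction differently.

```python
import random


def sample_headings(center_deg, spread_deg, n, seed=0):
    """Sample n headings uniformly within center_deg +/- spread_deg.

    Assumed convention: 0 = bow toward the camera, 180 = stern toward it.
    Results are wrapped into the range [0, 360).
    """
    rng = random.Random(seed)
    return [
        (center_deg + rng.uniform(-spread_deg, spread_deg)) % 360.0
        for _ in range(n)
    ]


if __name__ == "__main__":
    # Favor stern views: headings within 45 degrees of facing away
    for heading in sample_headings(180.0, 45.0, 5):
        print(round(heading, 1))
```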

Step 8: Adjust the environment’s weather patterns

💡Don’t wait for the perfect weather conditions to capture your data! Synthetically generate random or rare environments to capture challenging data scenarios without waiting for them to occur in the wild.

As with objects, the environment can be tuned to specific weather patterns, with the occurrence rate of each adjusted accordingly.

Step 8
Fig 11 - selecting weather conditions and their occurrence rates

Step 8_1
Fig 12 - rendered examples of the maritime weather conditions
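Occurrence rates behave like sampling weights: each rendered image draws one weather condition in proportion to its rate. The sketch below mimics that idea with Python's standard library. The condition names and rates are made up for illustration and are not the list of conditions offered in the maritime channel.

```python
import random
from collections import Counter


def sample_weather(rates, n_images, seed=0):
    """Draw one weather condition per image, weighted by occurrence rate."""
    conditions = list(rates)
    weights = [rates[c] for c in conditions]
    rng = random.Random(seed)
    return rng.choices(conditions, weights=weights, k=n_images)


if __name__ == "__main__":
    # Hypothetical rates: rare conditions get a deliberate boost
    rates = {"clear": 0.4, "fog": 0.3, "rain": 0.2, "storm": 0.1}
    draws = sample_weather(rates, 1000)
    for condition, count in Counter(draws).most_common():
        print(condition, count)
```

Raising the rate of a rare condition such as a storm is what lets you collect hundreds of examples that might take months to photograph in the wild.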

Step 9: Choose a virtual sensor to take images with

💡Virtual sensors can replicate expensive cameras, eliminating the need to buy expensive hardware upfront. Get a sense of what the data will look like and how effective a sensor is before you commit to purchasing it!

Virtual sensors act as the cameras that take the images in the rendered 3D environment. Select how the sensor should behave and the data type to store images in.

Step 9
Fig 13 - selecting a virtual sensor’s type and parameters

Step 9_1
Fig 14 - examples of field of view (FOV) in the rendered environment

Step 9_2
Fig 15 - examples of sensor elevation being applied at different heights within the rendered environment
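To build intuition for how field of view trades off against object size in the frame, simple pinhole-camera geometry is enough. The sketch below is a back-of-the-envelope estimate with function names invented for this post. It ignores lens distortion and assumes the object is centered in the frame and broadside to the camera.

```python
import math


def visible_width_m(fov_degrees, distance_m):
    """Width of the scene covered by the horizontal FOV at a given distance."""
    return 2.0 * distance_m * math.tan(math.radians(fov_degrees) / 2.0)


def object_width_px(object_length_m, distance_m, fov_degrees, image_width_px):
    """Approximate on-screen width of an object under a pinhole model."""
    return image_width_px * object_length_m / visible_width_m(fov_degrees, distance_m)


if __name__ == "__main__":
    # A 100 m ship seen from 1 km away on a 1920 px wide image
    for fov in (30, 60, 90):
        px = object_width_px(100.0, 1000.0, fov, 1920)
        print(f"FOV {fov} deg: about {px:.0f} px wide")
```

A narrower FOV makes the same ship fill more of the image, which is the same effect a telephoto lens would have on a physical camera.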

Last step: Specify number of objects and model annotation format

💡The density of objects within an image can create unique situations that are especially difficult to collect in the wild. Ensure you cover all edge cases by including the appropriate amount of sparsity in your images!

Set the range for the number of objects to render within each image, and choose the resulting annotation file format.

Last Step
Fig 16 - selecting the number of objects to render per image and a .coco annotation file format
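For readers who have not worked with COCO annotations before, the sketch below builds a tiny annotation dictionary in the standard COCO object detection layout (`images`, `annotations`, `categories`, with boxes as `[x, y, width, height]`). It assumes the exported file follows that layout; the exact fields in a generated synthset may differ, and the helper name and sample values are invented for this example.

```python
import json


def make_coco(images, boxes, class_names):
    """Build a minimal COCO-style detection dictionary.

    images: list of (file_name, width, height)
    boxes:  list of (image_index, class_name, x, y, w, h) in pixels
    """
    categories = [{"id": i, "name": name} for i, name in enumerate(class_names, start=1)]
    name_to_id = {c["name"]: c["id"] for c in categories}
    coco_images = [
        {"id": i, "file_name": file_name, "width": width, "height": height}
        for i, (file_name, width, height) in enumerate(images, start=1)
    ]
    annotations = []
    for ann_id, (image_index, class_name, x, y, w, h) in enumerate(boxes, start=1):
        annotations.append({
            "id": ann_id,
            "image_id": image_index + 1,   # image ids start at 1 here
            "category_id": name_to_id[class_name],
            "bbox": [x, y, w, h],          # top-left corner plus size
            "area": w * h,
            "iscrowd": 0,
        })
    return {"images": coco_images, "annotations": annotations, "categories": categories}


if __name__ == "__main__":
    coco = make_coco(
        images=[("0001.png", 1920, 1080)],
        boxes=[(0, "battleship", 400, 500, 300, 80)],
        class_names=["aircraft_carrier", "battleship"],
    )
    print(json.dumps(coco, indent=2))
```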

Review and adjust the quality of the synthetic data

💡Data looking good but needs some adjustments? Simply adjust the settings in the previous steps and generate a new dataset with the ideal properties!

Sample some of the generated images in the resulting synthset’s preview window to check how the images turned out.

Each generated dataset includes a dataset_summary.html file with information about the number of images, annotations, and class details, giving you a dashboard view of the synthset.

Review
Fig 17 - examples of the methods one can use to review the quality of their synthetic data results
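If you would rather check the numbers yourself, a few lines of Python can produce a similar summary from the annotation data. This sketch assumes the annotations follow the standard COCO detection layout and have already been loaded into a dictionary (for example with `json.load`). It is not how dataset_summary.html is produced, and the function name is hypothetical.

```python
from collections import Counter


def summarize_coco(coco):
    """Count images, annotations, objects per class, and objects per image."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    per_class = Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
    per_image = Counter(a["image_id"] for a in coco["annotations"])
    # Histogram of object density; images without annotations count as 0
    density = Counter(per_image.get(img["id"], 0) for img in coco["images"])
    return {
        "num_images": len(coco["images"]),
        "num_annotations": len(coco["annotations"]),
        "objects_per_class": dict(per_class),
        "images_by_object_count": dict(density),
    }


if __name__ == "__main__":
    example = {
        "images": [{"id": 1}, {"id": 2}, {"id": 3}],
        "categories": [{"id": 1, "name": "aircraft_carrier"}, {"id": 2, "name": "battleship"}],
        "annotations": [
            {"id": 1, "image_id": 1, "category_id": 1},
            {"id": 2, "image_id": 1, "category_id": 2},
            {"id": 3, "image_id": 2, "category_id": 2},
        ],
    }
    print(summarize_coco(example))
```

The `images_by_object_count` histogram is a quick way to confirm that sparse and crowded images are both represented the way you intended.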

Generate a synthset now!

Data forms the lifeblood of innovation. Embrace the power of synthetic data and forever change the way you approach data-driven challenges.

Create a Bifrost.ai account and start generating synthetic data today! Contact us at sales@bifrost.ai to enable access to the generative channels you need.

https://www.bifrost.ai/request-demo

❓Stuck and need some help? Reach out to sales@bifrost.ai for timely support!
