
The amount of video data Nvidia scraped will blow your mind

AI video generators are now on the rise, just as chatbots were last year. Companies need to obtain and ingest a ton of video data to train their AI models, and that’s where some legal issues come in. According to a new report, Nvidia scraped a massive amount of video data to train AI.

First, we were learning about how companies stole art to train their image models. Then, we learned about how companies gathered text data to train their chatbots. Now, the same thing is happening with AI video. We’re learning about how several companies downloaded and even pirated a boatload of videos from different sites to train their AI video models.

For example, a report states that Runway downloaded a ton of YouTube videos to train its AI model, and it possibly even pirated videos. Reports also pointed to OpenAI scraping YouTube videos to train its AI model. So, the content that you post to YouTube (you know, that content that you posted for the sake of creation, not to be stolen to train an AI model) isn’t safe.

Nvidia scraped a ton of video data to train AI

This information comes from leaked Slack conversations and documents within Nvidia. The company has some major plans in the works that involve AI. It’s training a model called “Cosmos”, and it plans to use it for applications like a 3D world generator called Omniverse. Also, the company is working on self-driving car systems.

Well, applications like these require huge amounts of video data to work properly, so the company instructed workers to go scoop it up. Since YouTube is the largest video-sharing platform in the world, it was first on the company’s list. Employees needed to use the YouTube video downloader tool called yt-dlp.

It wasn’t that simple, as they also needed to use virtual machines on Amazon Web Services. This method allowed them to refresh their IP addresses, which helped them avoid being detected by YouTube.

YouTube wasn’t the only target, as the company was also able to scrape videos from Netflix. Scraping videos from YouTube can be a legal gray area, but scraping videos from Netflix is a blatant violation of some serious laws.

All in all, according to the documents, Nvidia was able to scrape about 80 years of video data each day! Apparently, enough is never enough for Nvidia. The company also targeted academic material not meant for the public, but for research.

Executives didn’t have any issues

Obviously, there are some ethical and legal issues surrounding this practice. While YouTube videos are technically publicly available, and the vast majority of them aren’t copyrighted, YouTube claims that scraping videos goes against its terms of service.

When it comes to Netflix, most of the videos taken are the sole legal property of very big and very litigious companies, so that’s just asking for some major lawsuits.

However, according to some of the Slack messages, executives at Nvidia were confident that the company was doing no wrong. They said that they are “in full compliance with the letter and the spirit of copyright law.”

But are they, though? Now that this news is out, it’s only a matter of time before we see just how Netflix and YouTube will take it. Depending on how much data was taken, Nvidia might have to deal with some major Hollywood companies along with Netflix itself.