AI training data being stolen is now boringly predictable: The Week in Review
"It's like poetry... It rhymes"
Last week we kicked off our weekly roundup with another article from the ongoing “move fast and take things” saga exposing the theft of training data from YouTube. This week we’re kicking off our weekly roundup with another article exposing the theft of training data from YouTube. This is becoming boringly predictable, and even worse, it’s repetitive.
The latest installment is a report from 404 Media on how Runway, yet another AI video platform with yet another hefty valuation, has seemingly been following the recent trend of sourcing its training data by scraping YouTube videos without permission. What’s particularly interesting is that the report provides a link to the most damning of material evidence: a spreadsheet. Per the article:
Part of its training data is popular content from the YouTube channels of thousands of media and entertainment companies, including The New Yorker, VICE News, Pixar, Disney, Netflix, Sony, and many others.
We’ll also add that Nintendo, a notoriously litigious company when it comes to anyone fucking with their IP, had at least 10 of its official channels scraped (rows 3491-3500). The House of Mario is NOT going to like that. And it’s not just the big players who have had their data stolen, as the article notes:
A spreadsheet titled ‘Cinematic Masterpieces’ contains 206 links to individual channels and videos of especially high-quality, including animated shorts and student films.
As we pointed out last week, we’re going to call this for what it is: stolen data scraped in violation of YouTube’s terms and conditions. YouTube itself has said as much.
What makes this latest example of flagrant theft particularly interesting is the context provided by two other articles published this week. On Thursday, The Information published a new report on OpenAI in which it claims the company is on course to lose upwards of $5 billion this year on the costs of providing the infrastructure required to run ChatGPT (and no doubt its new search tool that launched yesterday). As the report suggests, this will most likely mean that the company will need to raise more money this year.
If that’s the case, OpenAI (and similar companies, for that matter) may find raising that money increasingly difficult. The AI arms race necessitates gathering more and more data, which is becoming more expensive as the sources of training data increasingly dry up. At the same time, according to 404 Media, there is unsurprisingly a growing backlash against AI companies scraping data at all, with an increasing number of websites restricting scraper bots and more well-resourced media companies seeking legal consequences. There is an undeniable and growing feeling among individual creators and media companies that they’re no longer going to let AI companies pillage their work without compensation and will seek legal remedies if these companies continue to flagrantly steal their content.
This “worm turning” situation of creators just saying no, combined with AI’s growing infrastructure costs and lack of any significant revenue in the foreseeable future, leads us to believe that eventually, after this epic night on the town spending investors’ money like drunken sailors, AI companies are going to be faced with a raging hangover and an exorbitant bar bill to pay. The numbers don’t add up—which is just as well, as math hasn’t been AI’s strong suit.
In other words, maybe we need to worry less about the moral dilemma of AI stealing our work and simply wait for reality to bite. Eventually this could all come down to boring economics when, to paraphrase the words from Top Gun, Sam Altman et al. are held accountable by the markets for letting their “ego write checks their companies can't cash.”
—James & Ross
What We’ve Read
The moral bankruptcy of Marc Andreessen and Ben Horowitz (The Verge) “This isn’t a movement. It’s a clique.” We here at MBH4H are huge Liz Lopatto stans. Our former colleague is an expert at calling out bullshit and eviscerating hypocrisy with an incredibly well-written turn of phrase, which makes her delicious writing about Marc Andreessen and Ben Horowitz’s self-serving and contemptible support of the Trump-Vance ticket an absolute joy to read. —James
Meta's New Llama 3.1 AI Model Is Free, Powerful, and Risky (WIRED) By all accounts, Meta's new 405 billion-parameter Llama 3.1—free to use and available now—is designed to go toe-to-toe with the highest tier of paid products offered by OpenAI, Anthropic, and the like. From a business perspective, it's a cunning attempt to undercut every competitor who can't afford to subsidize their costs indefinitely, and at the same time, the openness with which Llama is being developed is attractive to some of the top AI talent. But with that openness come concerns about the ramifications of such unfettered power in the wild. Will Knight has a great rundown of what this could mean. Meanwhile, Mark Zuckerberg continues to paradoxically humanize himself more and more in his promotion of Llama using (what else?) artificially generated images of himself and his gold chains. —Ross
AI Can Write Poetry, but It Struggles With Math (The New York Times). Chalk this up to something I never expected when I dreamt of the future: a highly advanced computer system that struggles with arithmetic. Large Language Models learn through pattern recognition, to grossly oversimplify, which means multi-step math problems can pose problems when logic isn’t built into the design. A generative model may confidently bullshit a “proof” of 1+1=2, but that doesn’t make it so. —Ross
What We’ve Watched
Sunny. This Apple TV+ show continues to be delightful, a kind of Agatha Christie-meets-Tokyo Vice-meets-WALL-E. Apart from the great script and a superb cast—particularly Rashida Jones—it’s the overall production design of the show that stands out; a near-future Kyoto full of soft pastel tech is such a refreshing take on our always-on, screen-focused world of today. It should also go without saying that I look forward to the day my own Homeassist robot arrives. I need a Sunny in my life. —James
Presumed Innocent. The finale of this new Apple TV+ series based upon the Harrison Ford movie was well made, well acted and, well… fine. Not wanting to spoil the reveal, I’ll just say that it was a little flat and the series as a whole is not one of the streamer’s best. —James
I Asked Photoshop AI to Zoom Out Infinitely. Here’s What Happened. I love seeing how someone takes any tool and pushes it to a creative extreme. Earlier this week, the YouTube algorithm decided I needed to know that Joe Scott tried to take Adobe Photoshop's AI tools and see what would happen if he kept using generative fill to "expand" a photo farther and farther past its original frame. Mild spoiler, I was really hoping Adobe would get weirder, but such is the limit of letting an AI robot direct without more human input. —Ross
Little Britain first aired as a BBC radio series from 2000 to 2002 and then on BBC television from 2003 to 2006. In many ways this show hasn’t aged well, as it is often deeply problematic, something Matt Lucas and David Walliams have acknowledged and apologized for. But despite the lack of sensitivity (and this is in no way a defense of the indefensible), it's difficult to overstate how incredibly funny and often excruciatingly uncomfortable this series was to watch. Many of Little Britain’s catchphrases, such as "computer says no" (as featured in Liz Lopatto’s aforementioned banger of a Verge article on Andreessen Horowitz) and "the only gay in the village," became commonly used in everyday conversations, particularly in the pub after a couple of pints. —James
What We’re Going To Watch
Deadpool and Wolverine. I made it abundantly clear last week that I find Ryan Reynolds to be a marketing genius, and that the Deadpool marketing campaigns are worthy of study. (Love ‘em or hate ‘em, you’re still talking about ‘em, etc. etc.) But for me, a onetime MCU fanatic whose interest has waned since the original saga culminated with Avengers: Endgame, there’s something exciting about seeing how much they let Reynolds and the whole creative team play around within the remaining vestiges of a post-acquisition X-Men universe. If nothing else, it’s one helluva lead-in to Marvel’s big San Diego Comic-Con return to Hall H, where Kevin Feige and co. will make their case for a reinvigorated cinematic universe. It’s already making a splash with a Galactus-sized drone show—and as you know, we love drones. —Ross
Inside Out 2. Pixar’s back, baby! I haven’t watched the new film yet (I am waiting for it to come out on iTunes or Disney+, whichever comes first), but as I consider the first Inside Out to be one of Pixar’s top five movies (yes, Cars is still #1—not up for discussion), I’m thrilled that the sequel has been so well received and has already become a monster hit. —James
What We’ve Played
Sekiro: Back to ting, ting, ting. My playthrough of Elden Ring: Shadow of the Erdtree is done. I have beaten all the major bosses and minibosses. I have futzed with my Faith build and tried my hand at some of the new weapons. I have even taken down one of the furnace golems. But it’s time to move on, or to be more precise, move back.
This week I decided on a whim to start playthrough #3 of Sekiro: Shadows Die Twice, and within minutes I was hooked once again. It is difficult to convey just how incredibly good the feel of this game is. The precision is exquisite, and even after months of playing Elden Ring and the DLC, Sekiro feels as fresh as the day I first began playing it in the summer of 2019.
Of course, it goes without saying that Sekiro is still stupidly hard. I would argue it is way harder than any boss in Elden Ring or the DLC because you have no Mimic, no summons; it’s just you and your sword. But, even after getting bogged down fighting the incredibly annoying Flaming Bull miniboss, I am already back facing Genichiro Ashina after about an hour and a half of play. For context, it took me five months to get to this point the first time I played this game; this is as close to a speedrun as I am ever likely to get. —James
Sparedevil. There’s a genre of independent games where I often wonder what came first, the concept or the clever name. That isn’t a knock on Sparedevil but an acknowledgement of just how good the title is. It’s a Doom clone—not a first-person shooter in the modern sense, but something more akin to the original aesthetic and minimal control quirks of Doom—set in a bizarre bowling nightmare. You are beset on all sides by lanes that converge on your position, where demonic pins materialize in 10-pin armies and march toward you. Armed with an unlimited supply of charged-up bowling balls, you are rewarded for your actual bowling performance here, both in score and in more powerful anti-pin weaponry. The longer you survive, the more demonic the pinemies. That’s all there is to it, really, but it’s more than enough for a quick diversion in between rounds of Hades 2. Big thanks to Digital Trends’ Giovanni Colantonio for first bringing my attention to it. —Ross
Animal Well. I made a start on Animal Well shortly after it launched but was distracted by Shadow of the Erdtree. But, inspired by this bonkers speedrun, I am ready to dive back into the game—in between Sekiro bosses. Think of it as a sorbet between FromSoft courses, if you like. —James
Fallout London. A truly impressive fan project five years in the making. A “total conversion mod” (i.e., the team used Fallout 4 as a canvas and built an entirely new game on top of it), this “DLC-sized” project tells its own story within Fallout’s universe but set across the pond, which would be a first for the series if it were canonical.
The voice cast is astounding, featuring two former Time Lords and a former British Speaker of the House, among many others. Admittedly I wasn’t able to finish setting up the game in time for this issue — while the mod is free, the effort to get this working on my Steam Deck is nontrivial, to put it mildly — but it’s my weekend crusade to get this started. (Those who pick up a copy of Fallout 4 and the mod via GOG will have a much easier time, as there are some clever tools to help streamline setup.)
I’ve been following this project so long, and it’s a bit wild to see it’s finally available, especially after this year’s Fallout Amazon series reinvigorated my love of the franchise. Much like James and Elden Ring, I’m all but certain I’ll have more to say again later. —Ross