Summary

Court records in an ongoing lawsuit reveal that Meta staff allegedly downloaded 81.7 TB of pirated books from shadow libraries such as Z-Library and LibGen to train its AI models.

Internal messages show employees raising ethical concerns, with one saying, “Torrenting from a corporate laptop doesn’t feel right.”

Meta reportedly took steps to hide the activity.

The case is part of a broader debate on AI data sourcing, with similar lawsuits against OpenAI and Nvidia.

  • NotMyOldRedditName@lemmy.world

    So… if we say every ebook is 10 MB (that's well into the high end; only a few are that big)

    That’s roughly 8.17 million 10 MB books.

    AI says the average public library in the USA has 116,481 items (though that includes all media formats), but if we go with that, 81.7 TB works out to roughly 70 average-sized libraries with no repeating content.
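    For anyone who wants to check the numbers, here's a rough Python sketch (assuming decimal units, a flat 10 MB per ebook, and that 116,481-item average; tweak the constants as you like):

    ```python
    # Back-of-envelope estimate: how many 10 MB ebooks fit in 81.7 TB,
    # and how many "average" US public libraries that equals.
    # Assumptions: decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes),
    # a flat 10 MB per ebook, and 116,481 items per library.

    TOTAL_BYTES = 81.7e12        # 81.7 TB of downloaded data
    BYTES_PER_BOOK = 10e6        # 10 MB per ebook (high-end guess)
    ITEMS_PER_LIBRARY = 116_481  # average US public library collection

    books = TOTAL_BYTES / BYTES_PER_BOOK
    libraries = books / ITEMS_PER_LIBRARY

    print(f"~{books:,.0f} books")                       # ~8,170,000
    print(f"~{libraries:.1f} average-sized libraries")  # ~70.1
    ```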

    • aramova@infosec.pub

      NYPL has around 10 million books and an additional 10 million manuscripts in its collection. Over 54 million items in total.

      Not the largest by far, but still mind-boggling in size.

      To torrent and ingest something of that size is crazy.

      • NotMyOldRedditName@lemmy.world

        Damn, that’s huge.

        Never seen a library that big before. The university here has about 1.5 million items, and that’s a big library.