Summary

Court records in an ongoing lawsuit reveal that Meta staff allegedly downloaded 81.7 TB of pirated books from shadow libraries such as Z-Library and LibGen to train its AI models.

Internal messages show employees raising ethical concerns, with one saying, “Torrenting from a corporate laptop doesn’t feel right.”

Meta reportedly took steps to hide the activity.

The case is part of a broader debate on AI data sourcing, with similar lawsuits against OpenAI and Nvidia.

  • NotMyOldRedditName@lemmy.world

    So… if we say every ebook is 10 MB (that's well into the high end; only a few are that big)

    That’s roughly 8.17 million 10 MB books.

    AI says the average public library in the USA has 116,481 items (though that includes all media formats), but if we go with that, 81.7 TB works out to roughly 70 average-sized libraries with no repeating content.
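    For anyone who wants to check the numbers, here's a rough Python sketch (assuming decimal units, a flat 10 MB per ebook, and that 116,481-item average; tweak the constants as you like):

    ```python
    # Back-of-envelope estimate: how many 10 MB ebooks fit in 81.7 TB,
    # and how many "average" US public libraries that equals.
    # Assumptions: decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes),
    # a flat 10 MB per ebook, and 116,481 items per library.

    TOTAL_BYTES = 81.7e12        # 81.7 TB of downloaded data
    BYTES_PER_BOOK = 10e6        # 10 MB per ebook (high-end guess)
    ITEMS_PER_LIBRARY = 116_481  # average US public library collection

    books = TOTAL_BYTES / BYTES_PER_BOOK
    libraries = books / ITEMS_PER_LIBRARY

    print(f"~{books:,.0f} books")                       # ~8,170,000
    print(f"~{libraries:.1f} average-sized libraries")  # ~70.1
    ```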

    • aramova@infosec.pub

      NYPL has around 10 million books and an additional 10 million manuscripts in its collection. Over 54 million items in total.

      Not the largest by far, but still mind-boggling in size.

      To torrent and ingest something of that size is crazy.

      • NotMyOldRedditName@lemmy.world

        Damn, that’s huge.

        Never seen a library that big before. The university here has about 1.5 million items, and that’s a big library.