• FauxLiving@lemmy.world
    1 day ago

    There are thousands of different diffusion models, and not all of them are trained on copyright-protected work.

    In addition, substantially transformative works are allowed to use otherwise copyright-protected content under the fair use doctrine.

    It’s hard to argue that a model, a file containing the trained weight matrices, is in any way substantially similar to any existing copyrighted work. TL;DR: There are no pictures of Mickey Mouse in a GGUF file.
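    To make that concrete, here's a toy sketch in plain Python (purely illustrative, not an actual GGUF parser; the names and layout are made up): a saved model is essentially named tensors serialized as raw numbers, so nothing resembling a training image survives in the file.

    ```python
    import struct

    # A "model" here is just named weight matrices: lists of floats.
    # Real checkpoint formats (GGUF, safetensors, etc.) are the same idea
    # at scale: tensor names, shapes, and raw numeric data.
    weights = {
        "attn.w": [0.12, -0.98, 0.33, 0.07],
        "mlp.w":  [1.41, 0.00, -0.25, 0.66],
    }

    def serialize(weights: dict) -> bytes:
        """Pack each tensor as: name length, name bytes, value count, raw floats."""
        out = bytearray()
        for name, values in weights.items():
            encoded = name.encode("utf-8")
            out += struct.pack("<I", len(encoded)) + encoded
            out += struct.pack("<I", len(values))
            out += struct.pack(f"<{len(values)}f", *values)
        return bytes(out)

    blob = serialize(weights)
    # The resulting file is nothing but tensor names and floating-point
    # numbers: no pixels, no embedded copies of training images.
    print(len(blob))  # → 59
    ```

    A real diffusion checkpoint is billions of these floats rather than eight, but the structure is the same, which is why the file can't be "substantially similar" to any particular image in the training set.
    
    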

    Fair use has already been upheld in the courts concerning machine learning models trained using books.

    For instance, in Authors Guild v. HathiTrust, a precedent later reaffirmed in Authors Guild v. Google, the US Court of Appeals for the Second Circuit held that mass digitization of a large volume of in-copyright books in order to distill and reveal new information about them was fair use.

    And, perhaps more pragmatically, the genie is already out of the bottle. The software and weights are already available, and you can train and fine-tune your own models on consumer graphics cards. No court ruling or regulation will bind every country on the globe, and countries everywhere are rapidly researching and producing generative models.

    The battle is already over; the ship has sailed.

    • MHS
      edited 11 hours ago

      Exactly!!
      Thank God, you get it.

      This video (which was trending a while ago) explained it pretty well:
      https://www.youtube.com/watch?v=pt7GtDMTd3k

      And to add to what you said, people have some huge misunderstandings about how generative AI works. They think it somehow just copy-pastes portions of the art it was trained on, and that’s it. That’s not the case AT ALL; it’s not even close to that.

      AI models should be allowed to be trained on copyrighted data. If they shouldn’t be allowed to do that, then humans shouldn’t be allowed to either. Why else do we advise upcoming writers, musicians, and artists to consume the kind of content they want to create in the future? To read the kind of books they want to write? To listen to the kind of music they want to compose? To study the kind of art they want to make? Should humans ALSO be limited to only public domain content?? I really don’t think so.

      Again, generative AI models don’t just copy-paste stuff from their training data. They learn the patterns that make up that data, just like a human does.

      Thankfully, reasoning models like DeepSeek-R1 have started to show the average person how an AI actually reasons about things: it doesn’t just spew stuff out of nowhere, slapping pieces of its training data together into something barely comprehensible and hoping it makes some kind of sense. The “think” tags in such models really helped clear up some huge misunderstandings. Still, many, many people are left with a really messed-up view of how AI works, and they somehow speak with such confidence about these topics with no knowledge of the technical details. It drives me nuts.