• redcalcium@lemmy.institute
    11 months ago

    We love this example because it illustrates just how difficult it will be to fully understand LLMs. The five-member Redwood team published a 25-page paper explaining how they identified and validated these attention heads. Yet even after they did all that work, we are still far from having a comprehensive explanation for why GPT-2 decided to predict Mary as the next word.
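    The "attention heads" the article refers to each compute scaled dot-product attention. A toy NumPy sketch of a single head (made-up 4-dimensional weights, not GPT-2's real ones) shows the mechanism by which a head can copy one token's information forward:

```python
import numpy as np

# Toy sketch of what one attention head computes: scaled dot-product
# attention. All matrices here are illustrative, not GPT-2's weights.

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_head(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how strongly each token attends to each other token
    weights = softmax(scores)        # each row sums to 1
    return weights @ V, weights      # output is a weighted mix of value vectors

# Three "tokens": the last token's query matches the first token's key,
# so the head copies the first token's value vector forward.
Q = np.array([[0., 0, 0, 0], [0, 0, 0, 0], [5, 0, 0, 0]])
K = np.array([[5., 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
V = np.array([[1., 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])

out, w = attention_head(Q, K, V)
print(w.round(2))  # last row attends almost entirely to the first token
```

    Reverse-engineering GPT-2 means explaining why, across thousands of such heads with learned weights, a particular head ends up doing a particular job like this copying behavior.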

    The current approach to ML model development has the same vibe: people write a block of code that somehow works, then leave a comment like "no idea why, but it works, modify at your own risk".

    • Jumper775@lemmy.world
      11 months ago

      Perhaps we could see even greater improvements if we stopped and looked at how this works. Eventually we will need to, as there is a limit to how much real text exists.