• schizo@forum.uncomfortable.business
    link
    fedilink
    English
    arrow-up
    7
    arrow-down
    1
    ·
    5 days ago

    It’s raid rebuild times.

    The bigger the drive, the longer the time.

    The longer the time, the more likely the rebuild will fail.

    That said, modern raid is much more robust against this kind of fault, but still: if you have one parity drive, one dead drive, and a raid rebuild, if you lose another drive you’re fucked.

    • notfromhere@lemmy.ml
      link
      fedilink
      English
      arrow-up
      1
      ·
      5 days ago

      Just rebuilt onto Ceph and it’s a game changer. Drive fails? Who cares, replace it with a bigger drive and go about your day. If total drive count is large enough, and depends if using EC or replication, it could mean pulling data from tons of drives instead of a handful.

      • GamingChairModel@lemmy.world
        link
        fedilink
        English
        arrow-up
        2
        ·
        5 days ago

        It’s still the same issue, RAID or Ceph. If a physical drive can only write 100 MB/s, a 36TB drive will take 360,000 seconds (6000 minutes or 100 hours) to write. During the 100-hour window, you’ll be down a drive, and be vulnerable to a second failure. Both RAID and Ceph can be configured for more redundancy at the cost of less storage capacity, but even Ceph fails (down to read only mode, or data loss) if too many physical drives fail.

        • notfromhere@lemmy.ml
          link
          fedilink
          English
          arrow-up
          1
          ·
          5 days ago

          While true, it can fill the drive replacement with data spread from way more number of drives than raid can, so the point I was trying to make is that a second failure due to resilvering cam be greatly mitigated by using a Ceph setup.