How to archive a website in a future-proof way (involves PDF hybrid)

evenwicht@lemmy.sdf.org · edit-2 2 days ago

m-p{3}@lemmy.ca · 3 days ago

I often don’t care much about a webpage’s format compared to its content, so my strategy has been to convert webpages to Markdown and archive that. At least I get to keep most of the text formatting, tables, etc. in an easily readable format. The only annoyance is that images have to be stored in a file tree.
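For anyone curious what this kind of conversion involves, here is a rough sketch using only Python's standard library. It is not necessarily the tooling used here: the `MarkdownSketch` class and `html_to_markdown` function are hypothetical names made up for illustration, it only handles a handful of tags (no tables or lists), and it merely collects image URLs rather than downloading them into a file tree.

```python
import re
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Tiny HTML-to-Markdown converter covering only a few common tags."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    INLINE = {"em": "*", "i": "*", "strong": "**", "b": "**"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []      # Markdown fragments, joined at the end
        self.images = []   # image URLs that would go into the file tree
        self.href = None   # target of the link currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.HEADINGS:
            self.out.append("\n\n" + self.HEADINGS[tag])
        elif tag == "p":
            self.out.append("\n\n")
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a":
            self.href = attrs.get("href")
            self.out.append("[")
        elif tag == "img":
            src = attrs.get("src") or ""
            alt = attrs.get("alt") or ""
            self.images.append(src)
            self.out.append(f"![{alt}]({src})")

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a":
            self.out.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        # Collapse whitespace runs to one space, roughly as a browser would.
        self.out.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html):
    """Return (markdown_text, image_urls) for an HTML string."""
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    text = re.sub(r"\n{3,}", "\n\n", "".join(parser.out)).strip()
    return text, parser.images


page = (
    '<h1>Title</h1>'
    '<p>Hello <b>world</b>, see <a href="https://example.org">this</a>.</p>'
    '<p><img src="img/a.png" alt="A"></p>'
)
markdown, images = html_to_markdown(page)
print(markdown)
print(images)
```

The list of image URLs is where the file-tree annoyance comes from: each one has to be fetched and saved next to the Markdown file, and the links rewritten to point at the local copies.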

evenwicht@lemmy.sdf.org · edit-2 3 days ago

The other thing is, what about JavaScript? JS changes the presentation.

Markdown is probably ideal when saving an article, like a news story. It might even be quite useful to get it into a Gemini-compatible language. But what if you are saving the receipt for a purchase? A tax auditor handed a Markdown rendition instead of the original would suspect shenanigans. So the idea with archival is generally to preserve the document as faithfully as possible.

m-p{3}@lemmy.ca · edit-2 3 days ago

Yeah, in that case it would be better to preserve the original as closely as possible.

In my case, most of what I archive is articles, tutorials, documentation, and other stuff that doesn’t change often, so Markdown fits the bill relatively well. It can also be read as plain text quite easily, which is great for future-proofing readability.

MAFF (a shit-show, unsustained)

MHTML (shit-show due to non-portable browser-dependency)

PDF (lossy)

PDF+MHTML hybrid

We need to evolve

(update) The goals