Replies: 6 comments
-
Thanks for the questions; they give me a chance to briefly document how the format works. The files generated by SingleFileZ are ZIP files. The ZIP specification allows arbitrary data to be inserted before and after the payload. SingleFileZ uses this feature to disguise the ZIP file as an HTML file. The resulting HTML page is invalid because it contains binary data, but the HTML specification allows for this case. The file also contains a script that unzips the ZIP payload when the page is displayed in a browser. Paths like ... Since the saved page is a ZIP file, I think it is quite future-proof for the coming decades. Backward compatibility of HTML/JS also means that the script in the saved page should keep working for a long time.
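To illustrate why data placed around the ZIP payload does not stop an unzip tool from reading it, here is a minimal sketch. It is not SingleFileZ's code: the names findEndOfCentralDirectory, htmlPrefix and htmlSuffix are hypothetical, the archive is an empty one built by hand, and storing the trailing HTML in the ZIP comment field is an assumption made for the example. It relies on the fact that ZIP readers locate the "end of central directory" record by scanning from the end of the file.

```javascript
// "PK\x05\x06": signature of the ZIP end of central directory (EOCD) record.
const EOCD_SIGNATURE = [0x50, 0x4b, 0x05, 0x06];
// An EOCD record without a comment is 22 bytes long.
const EOCD_MIN_LENGTH = 22;

// Scan backwards for the EOCD record, ignoring whatever precedes the archive.
function findEndOfCentralDirectory(bytes) {
  for (let offset = bytes.length - EOCD_MIN_LENGTH; offset >= 0; offset--) {
    if (EOCD_SIGNATURE.every((byte, index) => bytes[offset + index] === byte)) {
      return offset;
    }
  }
  return -1;
}

const encoder = new TextEncoder();
// Hypothetical HTML wrapper: the archive sits inside an HTML comment.
const htmlPrefix = encoder.encode("<!DOCTYPE html><html><body><!--");
const htmlSuffix = encoder.encode("--></body></html>");

// Smallest valid archive: an EOCD record describing zero entries.
const emptyZip = new Uint8Array(EOCD_MIN_LENGTH);
emptyZip.set(EOCD_SIGNATURE, 0);
// Bytes 20-21 hold the comment length (little-endian), so the trailing
// HTML is declared as the ZIP comment (assumption for this sketch).
emptyZip[20] = htmlSuffix.length;

const polyglot = new Uint8Array(htmlPrefix.length + emptyZip.length + htmlSuffix.length);
polyglot.set(htmlPrefix, 0);
polyglot.set(emptyZip, htmlPrefix.length);
polyglot.set(htmlSuffix, htmlPrefix.length + emptyZip.length);

// The record is found right after the HTML prefix.
const eocdOffset = findEndOfCentralDirectory(polyglot);
console.log(eocdOffset === htmlPrefix.length);
```

A real reader would go on to parse the central directory from that record; the sketch stops at locating it, which is the part that makes the leading HTML harmless.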
-
Thank you for the explanation.
-
Thanks for the write-up. I also found myself digging through recent commits like "merge sfz code" to see how you managed the universal format. Really impressive workarounds! I'll have to keep reading about ISO-8859-1 encoding and why some characters like 0 and CR+LF are lost. Glad it shipped. PS. This code made me smile :)

    async function base64ToUint32Array(data) {
      return new Uint32Array(await (await fetch("data:application/octet-stream;base64," + data)).arrayBuffer());
    }
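For comparison, here is a more conventional synchronous way to get the same result as the snippet above, using atob instead of fetching a data: URL. This is only a sketch: the name base64ToUint32ArraySync is hypothetical, and like the original it assumes the decoded byte length is a multiple of 4.

```javascript
// Decode a base64 string into a Uint32Array without fetch().
function base64ToUint32ArraySync(data) {
  // atob() returns a "binary string": one character per decoded byte.
  const binary = atob(data);
  const bytes = Uint8Array.from(binary, (char) => char.charCodeAt(0));
  // Reinterpret the bytes as 32-bit integers (platform byte order).
  return new Uint32Array(bytes.buffer);
}

// Round trip: encode a Uint32Array to base64, then decode it again.
const source = new Uint32Array([1, 2, 0xffffffff]);
const encoded = btoa(String.fromCharCode(...new Uint8Array(source.buffer)));
const decoded = base64ToUint32ArraySync(encoded);
console.log(Array.from(decoded));
```

The fetch-based version is shorter and lets the browser do the decoding, which is presumably part of why it raises a smile.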
-
@heyheyhello You're welcome. I'm glad my code made you smile :) For the record, here is the test page I used to find the best encoding: https://jsfiddle.net/7qv3y20z/. It shows how the binary content is altered when read from the DOM, depending on the encoding of the page. Here is the code of the test.

HTML:

    <!DOCTYPE html>
    <html>
      <head>
        <title>Test binary content</title>
        <style>
          body {
            font-family: monospace;
          }
        </style>
      </head>
      <body>
        <iframe hidden></iframe>
      </body>
    </html>

JavaScript:

    // Encoding labels to test, one generated page per label.
    const ENCODINGS = [
      "utf-8",
      "ibm866",
      "iso-8859-2",
      "iso-8859-3",
      "iso-8859-4",
      "iso-8859-5",
      "iso-8859-6",
      "iso-8859-7",
      "iso-8859-8",
      "iso-8859-8i",
      "iso-8859-10",
      "iso-8859-13",
      "iso-8859-14",
      "iso-8859-15",
      "iso-8859-16",
      "koi8-r",
      "koi8-u",
      "macintosh",
      "windows-874",
      "windows-1250",
      "windows-1251",
      "windows-1252",
      "windows-1253",
      "windows-1254",
      "windows-1255",
      "windows-1256",
      "windows-1257",
      "windows-1258",
      "x-mac-cyrillic",
      "gbk",
      "gb18030",
      "big5",
      "euc-jp",
      "iso-2022-jp",
      "shift-jis",
      "euc-kr",
      "utf-16be",
      "utf-16le",
      "x-user-defined"
    ];
    let encodingIndex = 0;

    // Receives the character codes read by the page in the iframe and
    // reports every position where the value differs from the byte written.
    onmessage = ({ data }) => {
      document.querySelector("iframe").src = "about:blank";
      const difference = [];
      data.forEach((value, index) => {
        if (value != index) {
          difference.push({
            expected: index,
            read: value
          });
        }
      });
      document.body.innerHTML +=
        "<details><summary>" + ENCODINGS[encodingIndex] + " (" +
        difference.length + " differences)</summary>" +
        JSON.stringify(difference) + "<br><br></details>";
      encodingIndex++;
      if (encodingIndex < ENCODINGS.length) {
        runNext();
      }
    };
    runNext();

    // Builds a page in the current encoding whose HTML comment contains the
    // bytes 0..255, then loads it in the iframe.
    function runNext() {
      const blob = new Blob([
        "<!DOCTYPE html> <html><head><meta charset=\"",
        ENCODINGS[encodingIndex],
        "\"></head><body><!--",
        new Uint8Array((new Array(256).fill(0).map((value, index) => index))),
        "--><script>(",
        // This function is serialized to text and runs inside the iframe.
        () => {
          const commentData = Array.from(document.body.firstChild.textContent).map((value) => value.charCodeAt(0));
          parent.postMessage(commentData, "*");
        },
        ")()<\/script></body></html>"
      ], {
        type: "text/html"
      });
      document.querySelector("iframe").src = URL.createObjectURL(blob);
    }
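The decoding step of the test above can also be tried outside the DOM. The sketch below is not part of the original test: findDifferences is a hypothetical helper that decodes the bytes 0..255 with TextDecoder and lists the positions where the value read differs from the byte written. It only covers the character decoding; it does not reproduce what the HTML parser does on top of that, and which encoding labels are accepted depends on the runtime.

```javascript
// Decode the bytes 0..255 with the given encoding label and list the
// positions where the decoded value differs from the original byte.
function findDifferences(encoding) {
  const bytes = Uint8Array.from({ length: 256 }, (value, index) => index);
  const text = new TextDecoder(encoding).decode(bytes);
  const difference = [];
  Array.from(text).forEach((char, index) => {
    const read = char.codePointAt(0);
    if (read !== index) {
      difference.push({ expected: index, read });
    }
  });
  return difference;
}

// In UTF-8, bytes 0x80..0xFF are not valid on their own, so each one
// is replaced with U+FFFD when decoded.
const utf8Difference = findDifferences("utf-8");
console.log("utf-8: " + utf8Difference.length + " differences");
```

As in the original test, the comparison is positional, so for multi-byte encodings the differences reflect both altered values and shifted positions.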
-
For the record, I've updated the FAQ to explain how the format works, see https://github.com/gildas-lormeau/SingleFile/blob/master/faq.md#how-does-the-self-extracting-zip-format-work
-
Mainly because I can, I've started to play with support for PNG files. Here's an example of a page that is also a ZIP file and a PNG file: https://gildas-lormeau.github.io/
-
I couldn't find any documentation about the logic of the recently added feature "Self-Extracting ZIP Files Added to SingleFile (version 1.22)". It also feels a little intimidating to switch to the new SingleFileZ format, considering that Mozilla dropped MAFF. So I'd like to know what exactly this new format is, how it works, and what its future is.
Binary Data in HTML?
I didn't know that you could actually embed binary data in HTML before SingleFileZ. It's really impressive. How did you do this?! Why use <xmp>, isn't it deprecated? What is that base64 data in <sfz-extra-data>? And why use ISO-8859-1 encoding?
Redirect HTTP Request??
Here are some pieces of saved HTML (extracted using 7z):
<link rel=stylesheet href="stylesheet_0.css">
<img src=images/1.svg>
How does it redirect these URLs? I never knew that these URLs could be intercepted by JavaScript. I tried to read the source code, but I'm too stupid to understand it. Could you please explain the method? Is it future-proof (for example, opening a saved file decades later with future systems and software)?