brhfl.com

Binaries and hex editors

Talking about certain files as ‘binaries’ is a funny thing. All files are ultimately binary, after all, it’s just a matter of whether or not a file is encoded as text. Even in the world of text, an editor or viewer needs to know how the text is encoded, what bytes map to what characters. Is a file ASCII, UTF-8, PostScript? Once we know something is text or not text, it’s still likely to be made to the standards of a specific format, lest it be nothing but plain text. Markdown, HTML, even PDF1 are human-readable text to an extent, with rules about how their content is interpreted. A human as well as a web browser knows that a <p> starts a paragraph, and this paragraph continues until a matching </p> is found.

If we open a binary in a text editor, we’ll see a lot of familiar characters, where data happens to coincide with printable ASCII. We’ll also see a lot of gibberish, and in fact some of the characters may cause a terminal to behave erratically. Opening a binary in a hex editor makes a little more sense of it, but still leaves a lot to be answered. In one column, we’ll see a lot of hexadecimal values; in another we’ll see the same sort of gibberish we would have seen in our text editor. In some sort of status display, we’ll also generally see a few more bits of information – what byte we’re on, its hex value, its decimal value, etc. Why would we ever want to do this? Well, among other things, binary file formats have rules as well, and if we know these rules, we can inspect and navigate them much like an HTML file. Take this piece of a PNG file, as it would appear in bvi (my hex editor of choice).

00000000  89 50 4E 47 0D 0A 1A 0A 00 00 00 0D 49 48 44 52 .PNG........IHDR
00000010  00 00 02 44 00 00 01 04 08 06 00 00 00 C9 50 2B ...D..........P+
00000020  AB 00 00 00 04 73 42 49 54 08 08 08 08 7C 08 64 .....sBIT....|.d
00000030  88 00 00 00 09 70 48 59 73 00 00 0B 12 00 00 0B .....pHYs.......
00000040  12 01 D2 DD 7E FC 00 00 00 1C 74 45 58 74 53 6F ....~.....tEXtSo
"ban_ln_560_NLW.png" 14498451 bytes    00000000 10001001 \211 0x89 137 NUL