
Separating cd and pushd

While much of this post applies to bash, I am a zsh user and this was written from that standpoint.

One piece of advice that I’ve seen a lot in discussions on really tricking out one’s UNIX (&c.) shell is to alias cd to pushd, or to turn on a shell option that accomplishes the same thing1. Sometimes the plan includes other aliases or functions to move around on the directory stack, and the general sell is that you now have something akin to back/forward buttons in a web browser. This all seems to be based on the false premise that pushd is better than cd, when the reality is that they simply serve different purposes. I think that taking cd out of the picture and throwing everything onto the directory stack greatly reduces the stack’s usefulness. So this strategy simultaneously restricts the user to one paradigm and makes that paradigm worse.

It’s worth starting from the beginning here. cd changes directories and that’s about it. You start here, tell it to go there, now you’re there. pushd does the same thing, but instead of just launching the previous directory into the ether, it pushes it onto a last in, first out directory stack. pushd is helped by two other commands – popd to pop a directory from the stack, and dirs to view the stack.

% mkdir foo bar baz
% for i in ./*; pushd $i && pushd
% dirs -v
0       ~/test
1       ~/test/foo
2       ~/test/baz
3       ~/test/bar

dirs is a useful enough command, but its -v option makes its output considerably better. The leading number is where a given entry sits on the stack; prefix it with a tilde (~1, ~2, and so on) and the shell expands it to that directory. ~0 is always the present working directory ($PWD). You’ll see in my little snippet above that in addition to pushding my way into the three directories, I also call pushd on its own, with no arguments. This basically just instructs pushd to flip between ~0 and ~1:

% pushd; dirs -v
0       ~/test/foo
1       ~/test
2       ~/test/baz
3       ~/test/bar

This is very handy when working between two directories, and one reason why I think having a deliberate and curated directory stack is far more useful than every directory you’ve ever cded into. The other big reason is the tilde glob:

% touch ~3/xyzzy
% find .. -name xyzzy
../bar/xyzzy

So the directory stack allows you to do two important things: easily jump between predetermined directories, and easily access predetermined directories. This feels much more like a bookmark situation than a history situation. And while zsh (and presumably bash) has other tricks up its sleeves that let users make easy use of predetermined directories, the directory stack does this very well in a temporary, ad hoc fashion. cd actually gives us one level of history as well, via the variable $OLDPWD, which is set whenever $PWD changes. One can do cd - to jump back to $OLDPWD.
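For illustration, with stand-in absolute paths (wherever ~/test actually lives on your machine):

% cd ~/test/foo
% cd ~/test/bar
% echo $OLDPWD
/home/me/test/foo
% cd -
/home/me/test/foo

(cd - helpfully echoes the directory it changes to.)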

zsh has one more trick up its sleeve when it comes to the directory stack. Using the tilde notation, we can easily change into directories from our stack. But since this is basically just a glob, the shell just evaluates it and says ‘okay, we’re going here now’:

% pushd ~1; dirs -v
0       ~/test
1       ~/test/foo
2       ~/test
3       ~/test/baz
4       ~/test/bar

Doing this can create a lot of redundant entries on the stack, and then we start to get back to the cluttered-stack problem that started this whole thing. But the cd and pushd builtins in zsh know another special sort of notation, plus and minus. Plus counts up from zero (and therefore lines up with the numbers used in tilde notation and displayed by dirs -v), whereas minus counts backward from the bottom of the stack. This notation is a feature of these builtins and not a true glob, and the two handle it differently: cd pulls the selected entry off of the stack and changes to it, replacing the current top, while pushd rotates the stack until the selected entry is on top. Both behaviors show up below:

% cd +3; dirs -v
0       ~/test/baz
1       ~/test/foo
2       ~/test
3       ~/test/bar
% pushd -0; dirs -v
0       ~/test/bar
1       ~/test/baz
2       ~/test/foo
3       ~/test

…and this pretty much brings the stack concept full circle, and hopefully hits home why it makes far more sense to curate this stack versus automatically populating it whenever you change directories.
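For the record, if you do want cd to push every directory onto the stack (the AUTO_PUSHD option from the footnote), enabling it in a .zshrc looks like the following; PUSHD_IGNORE_DUPS and PUSHD_SILENT are related options that at least keep duplicates off the stack and quiet pushd’s output:

setopt AUTO_PUSHD PUSHD_IGNORE_DUPS PUSHD_SILENT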


  1. In zsh, the option is AUTO_PUSHD. ↩︎

Tagging in Acrobat from the keyboard

Since much of my work revolves around §508 compliance, I spend a lot of time restructuring tags in Acrobat. Unfortunately you can’t just write these tags out by hand à la HTML; you have to physically manipulate a tree structure. The Tags panel is very conducive to mouse use and, because Adobe is Adobe, not very conducive to keyboard use. Many important tasks are missing readily available keyboard shortcuts, and it has taken me a while to learn how to largely ditch the mouse1 and instead use the keyboard to very quickly restructure the tags on very long, very poorly tagged documents.

A couple of notes – this assumes a Windows machine, and one with a Menu key2. While I generally prefer working on macOS, I’m stuck with Windows at work, so these are my efficiencies. Windows may actually have the leg up here, since Acrobat’s keyboard support is so poor and macOS has no Menu key equivalent. Additionally, this applies to Acrobat XI; it may or may not apply to current DC versions. Finally, all of this information is discoverable, but I haven’t really seen a primer laid out on it. If nothing else, perhaps it will help a future version of myself who forgets all of this.


  1. While I do use a mouse for some things, if my pointing device is supplementing a keyboard-driven workflow and not the other way around, I prefer the trackpoint by far. I wish they were more commonplace (particularly on desktop keyboards) – it really is much more efficient when your fingers never have to leave the keyboard. ↩︎
  2. Laptop keyboards increasingly seem to be omitting the Menu key. Generally speaking, I’ll remap right Ctrl to be Menu if this is the case. ↩︎

Extracting JPEGs from PDFs

I’m not really making a series of ‘things your hex editor is good for’, I swear, but one more use case that comes up frequently enough in my day-to-day life is extracting JPEGs from PDF files. This can be scripted simply enough, but I find doing these things manually from time to time to be a valuable learning experience.

PDF is a heck of a file format, but we really only need to know a few things right now. PDFs are made up of objects, and some of these objects (JPEGs included) are stream objects. Stream objects always have some relevant data stored in a thing called a dictionary, and this includes two bits of data we need to get our JPEG: the Filter tells the viewer how to interpret the stream, and the Length tells us how long, in bytes, the data is. The filter for JPEGs is ‘DCTDecode’, so we can open up a PDF in a hex editor (I’ll be using bvi again) and search for this string to find a JPEG. Before we do, one final thing we should know is that stream data begins immediately after an End Of Line (EOL) marker following the keyword ‘stream’. Per the spec, this EOL is either the two bytes 0D 0A (CR LF) or a lone LF; in this file it’s the two-byte version.

/DCTDecode<Enter>

00002E80  6C 74 65 72 2F 44 43 54 44 65 63 6F 64 65 2F 48 lter/DCTDecode/H
00002E90  65 69 67 68 74 20 31 31 39 2F 4C 65 6E 67 74 68 eight 119/Length
00002EA0  20 35 35 33 33 2F 4E 61 6D 65 2F 58 2F 53 75 62  5533/Name/X/Sub
00002EB0  74 79 70 65 2F 49 6D 61 67 65 2F 54 79 70 65 2F type/Image/Type/
00002EC0  58 4F 62 6A 65 63 74 2F 57 69 64 74 68 20 31 32 XObject/Width 12
00002ED0  31 3E 3E 73 74 72 65 61 6D 0D 0A FF D8 FF EE 00 1>>stream.......
/DCTDecode                                     00002E85  \104 0x44  68 'D'

This finds the next ‘DCTDecode’ stream object and puts us on that leading ’D’, byte offset 2E85 (decimal 11909) in this instance. Glancing ahead a bit, we can see that the Length is 5533 bytes. If we then search for ‘stream’ (/stream<Enter>), we’ll be placed at byte offset 2ED3 (decimal 11987). The word ‘stream’ is 6 bytes, and we need to add an additional 2 bytes for the EOL. This means our JPEG data starts at byte offset 11995 and is 5533 bytes long.
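If we don’t trust our mental math, the shell’s arithmetic expansion handles both the hex-to-decimal conversion and the addition:

% echo $((0x2ED3))
11987
% echo $((0x2ED3 + 6 + 2))
11995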

How, then, to extract this data? It may not be everyone’s favorite tool, but dd fits the bill perfectly. It lets us take an input file, skip to a byte offset, read a set number of bytes, and write them out – just what we want. Assuming our file is ‘test.pdf’, we can output ‘test.jpg’ like…

dd bs=1 skip=11995 count=5533 if=test.pdf of=test.jpg

bs=1 sets our block size to 1 byte, which is important: dd is largely used for volume-level operations where blocks are much larger. skip skips ahead however many bytes – essentially the initial offset. count tells it how many bytes to read. if and of are input and output files respectively. dd doesn’t follow normal Unix flag conventions (there are no prefixing dashes, and those equal signs are quite atypical), and it is quite powerful, so it’s always worth reading the manpage.
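As an aside, bs=1 can be slow on large files. The same extraction works with tail and head – keeping in mind that tail -c +N is 1-indexed, so we start at byte 11996 rather than offset 11995:

% tail -c +11996 test.pdf | head -c 5533 > test.jpg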


Fireworks, and its bloated PNGs

The motive behind my last post on binary editors was a rather peculiar PNG I was asked to post as part of my job. It was a banner, 580x260px, and it was 14MB. Now this should have set off alarms from higher up the web chain: even with the unnecessary alpha channel, 580px × 260px × 4 bytes (8 bits each of R, G, B, and A) is only 600KB or so uncompressed. A very basic knowledge of how information is stored is always helpful – when a file’s size is wildly out of line with that sort of math, compression or encryption is usually the explanation, and neither should be inflating a simple banner PNG.
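Doing that back-of-the-envelope math in the shell:

% echo $((580 * 260 * 4))
603200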

So what happened? Adobe Fireworks, which is completely unsurprising. Fireworks was a Macromedia project, and while Macromedia obviously shaped a large chunk of the web in their heyday and also into the Adobe years, Macromedia projects were shit. The very definition of hack. I’m certain Adobe learned all of their terrible nonstandard UI habits from their Macromedia acquisition. I never thought Fireworks was terrible, but neither did I find it impressive. It was often used for wireframing websites, which feels wrong to me in every single way. But, to get ahead of myself, it had one other miserable trick: saving layers and other extended data in PNG files. Theoretically, this is great: layer support in an easily-read compressed lossless free image format. Awesome! But in Adobe’s reality, it’s terrible: not even Adobe’s own current software can recover these layers.

As mentioned in my previous post, PNGs are pretty easy to parse: data comes in chunks. The first 4 bytes of a chunk state the length of its data, then come 4 bytes of (ASCII) chunk type descriptor, then the chunk data, then a 4-byte CRC checksum. Some chunks are necessary: IHDR is the header that states the file’s pixel dimensions, bit depth, color type, interlacing, etc.; IDATs contain the actual image data. Other chunks are described by the format but not necessary. Finally, there are private chunks that anyone can define, and that this or that reader can theoretically read. The chunk type is 4 ASCII bytes, and is both a (potentially) clever descriptor of the chunk and 4 bits worth of information – each character’s case means something.
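To see how mechanical this parsing is, here’s a rough sketch that walks a PNG’s chunks using nothing but dd, od, and shell arithmetic – no validation, a placeholder file name, and a well-formed PNG assumed:

f=test.png
offset=8                                # skip the 8-byte PNG signature
size=$(wc -c < "$f")
while [ "$offset" -lt "$size" ]; do
    # each chunk: 4-byte big-endian data length, 4-byte type, data, 4-byte CRC
    len=$(dd if="$f" bs=1 skip="$offset" count=4 2>/dev/null | od -An -tu1 |
        awk '{ print $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 }')
    type=$(dd if="$f" bs=1 skip=$((offset + 4)) count=4 2>/dev/null)
    printf '%s\t%d bytes of data\n' "$type" "$len"
    offset=$((offset + 12 + len))       # 4 length + 4 type + data + 4 CRC
done

On a well-behaved PNG this prints IHDR, any ancillary chunks, the IDATs, and IEND; on my offending file, it would scroll through several hundred mkBT lines.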

So my image should have had a few things: the PNG magic number, 25 bytes worth of IHDR chunk explaining itself, at most ~600KB worth of IDAT chunks, and then an IEND chunk to seal the deal. Those were definitely present in my terrible file. Additionally, there were a handful of proprietary chunks, including several hundred mkBT chunks. I don’t know much about these chunks aside from the fact that they start with a FACECAFE magic number and then 72 bytes of… something… And I also know there are a lot of them. Some cursory googling shows that nobody else really knows what to make of them either, so I’m not sure I’m going to put more effort into it. Suffice it to say: Fireworks, by default, saves layers in PNG files, and this made a file that should have been 600KB at most into a 14MB one.

So why do the files even work? Well, remember I mentioned that case in a chunk descriptor is important – it provides 4 bits of information. Note the difference between the utterly important IDAT and the utterly bullshit mkBT. From left to right, lower vs. uppercase means: ancillary/critical; private/public; (reserved for future use, these should all be uppercase for now); safe/unsafe to copy. The important thing to glean here is that mkBT is ancillary — not critical. We do not need it to render an image.
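That case bit is just bit 5 (0x20) of each letter’s ASCII value, which the shell can poke at directly; a 1 below means lowercase:

% for c in m k B T; printf '%s %d\n' $c $(( ($(printf '%d' "'$c") >> 5) & 1 ))
m 1
k 1
B 0
T 0

Reading those off against the positions above: ancillary, private, reserved bit clear (as it should be), and unsafe to copy.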

So, when we load our 14MB PNG in a web browser, the browser sees the IHDR and the IDATs, and renders an image. It ignores all the garbage it can’t understand. This is perfectly valid PNG: all of those extra chunks are ancillary, so the browser is free to ignore them. PNG requires a valid IDAT, so Fireworks must put the flattened image there. It works, but we’re still stuck with a humongous PNG. Most image editors will discard all of this stuff because it’s self-described as unsafe to copy (meaning any editing of the file may render the chunk useless). But for reference, pngcrush will eliminate these ancillary chunks by default, and optipng will with the -strip all flag.
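For the record, the invocations look like this (file names here are just examples):

% optipng -strip all bloated.png
% pngcrush bloated.png slim.png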

Takeaways? Know enough about raw data to see that your files are unreasonably large, I suppose. Or simply assume that a 14MB file on your homepage is unreasonably large, regardless. Maybe the takeaway is just ‘perform a cursory glance at your filesizes’. Maybe it’s ‘flatten your images in Fireworks before exporting them to PNG’. Maybe, instead of performing lazy exports, web folks should put in the time to optimize the crap out of any assets that are being widely downloaded. Maybe I’m off track now, so my final thought is just this: if it looks wrong, save your audience some frustration and attempt to figure out why.


Binaries and hex editors

Talking about certain files as ‘binaries’ is a funny thing. All files are ultimately binary, after all; it’s just a matter of whether or not a file is encoded as text. Even in the world of text, an editor or viewer needs to know how the text is encoded – what bytes map to what characters. Is a file ASCII? UTF-8? Once we know something is text or not text, it’s still likely to be made to the standards of a specific format, lest it be nothing but plain text. Markdown, HTML, PostScript, even PDF1 are human-readable text to an extent, with rules about how their content is interpreted. A human as well as a web browser knows that a <p> starts a paragraph, and that this paragraph continues until a matching </p> is found.

If we open a binary in a text editor, we’ll see a lot of familiar characters where the data happens to coincide with printable ASCII. We’ll also see a lot of gibberish, and in fact some of the characters may cause a terminal to behave erratically. Opening a binary in a hex editor makes a little more sense of it, but still leaves a lot unanswered. In one column, we’ll see a lot of hexadecimal values; in another, we’ll see the same sort of gibberish we would have seen in our text editor. In some sort of status display, we’ll also generally see a few more bits of information – what byte we’re on, its hex value, its decimal value, etc. Why would we ever want to do this? Well, among other things, binary file formats have rules as well, and if we know these rules, we can inspect and navigate them much like an HTML file. Take this piece of a PNG file, as it would appear in bvi (my hex editor of choice).

00000000  89 50 4E 47 0D 0A 1A 0A 00 00 00 0D 49 48 44 52 .PNG........IHDR
00000010  00 00 02 44 00 00 01 04 08 06 00 00 00 C9 50 2B ...D..........P+
00000020  AB 00 00 00 04 73 42 49 54 08 08 08 08 7C 08 64 .....sBIT....|.d
00000030  88 00 00 00 09 70 48 59 73 00 00 0B 12 00 00 0B .....pHYs.......
00000040  12 01 D2 DD 7E FC 00 00 00 1C 74 45 58 74 53 6F ....~.....tEXtSo
"ban_ln_560_NLW.png" 14498451 bytes    00000000 10001001 \211 0x89 137 NUL

  1. PDF is an extreme example, but technically a person could very well sit down and write a PDF file from scratch in a text editor, and while the format’s standard is thousands of pages long and nearly impossible to grok, a human could theoretically read it as well. The (or at least, one) issue is that much of the underlying content will be compressed and therefore once again binary. ↩︎