image processing · brhfl.com

JPEG Comments

bri hefele, 2018-10-08, graphics, image processing, intellectual property

A while back, floppy disk enthusiast/archivist @foone posted about a floppy find, the Alice JPEG Image Compression Software. I suggest reading the relevant posts about the floppy, but the gist is that @foone archived and examined the disk and was left with a bunch of mysterious .CMP files which appeared to have JPEG streams but did not actually function as JPEGs. Rather, they would load but only displayed an odd little placeholder¹, identical for each file. I know a bit about JPEGs, and decided to try my hand at cracking this nut. The images that resulted were not particularly interesting – this was JPEG compression software from the early ’90s, clearly targeted at industries that would be storing a lot of images² and not home users. The trick to the files, however, was a fun discovery.

SVGs

bri hefele, 2017-10-20, code, data visualization, graphics, html, image processing, svg

For someone rooted in graphic design and illustration, I typically hate running across visuals on the internet. Aside from being numbed by ads, the fact of the matter is that a large percentage of the graphical presentation on the web is just bandwidth-stealing window dressing with little impact on the surrounding content. Part of my plan with this blog was to avoid graphics almost entirely, and yet over the past month or so, I have littered this space with a handful of SVGs. I think, for the most part, they have added meaningful visual aids to the surrounding content, but I still don’t want to make too much of a habit of it.

I’m far more comfortable with SVGs (or, vector graphics in general) because I find it easier to have them settle onto the page naturally without becoming jarring. I could obviously restrict the palette of a raster image to the palette of my site, and render a high resolution PNG with manageable file size, but scaling will still come into play, type may be mismatched… aside from being accessibility issues, these things have subtle effects on visual flow. I’m thankful that SVG has been adopted as well as it has, and that it’s relatively simple to write or manipulate by hand. Following is the process I go through to make my graphics as seamless as possible.

Generally speaking, the first step is going to be to get my graphic into Illustrator. Inside Illustrator, I have a palette corresponding to my site’s colors. Making CSS classes for primary, secondary, tertiary colors is on my to-do list, but I need to ensure nothing will break with a class defining both color and fill. Groups and layers (mostly) carry over when Illustrator renders out an SVG, so I make a point of going through the layer tree to organize content. Appearances applied to groups cascade down in the output process, so (as far as SVG output is concerned) there’s no point in, say, applying a fill to a group – each individual item will get that fill in the end anyway. I use Gentium for all of the type, as that is ideally how it will be rendered in the end, though it’s worth quickly checking how it all looks in Times New Roman as well.

Once I get things colored and grouped as I need them, I crop the artboard to the artwork boundaries. This directly affects the SVG viewBox, and unless I need extra whitespace for the sake of visually centering a graphic, I can rely instead on padding or the like for spacing.

Once in the SVG Save dialog, I ensure that ‘Type’ is set to ‘SVG’. I don’t want anything converted to an outline, because I want the type to visually fall back with the rest of my page. I never actually save an SVG file from Illustrator; I just go to ‘SVG Code…’ from the Save dialog, and copypaste it elsewhere for further massaging. This involves:

Setting width="100%" and height="15em"¹, or somewhere near that as far as height is concerned. This keeps the images centered and prevents unsightly scaling issues on mobile.
Removing all references to font-family, which ensures that my page’s font cascades down. Generally this means that it’ll render in Gentium as when I was designing it, but if the page falls back for whatever reason, the graphic will too.
Ensuring unique IDs are used. If I didn’t name layers, etc. in Illustrator, I could wind up with several objects on a page with id="Layer1", which would obviously violate HTML spec.
Getting rid of any empty groups. Sometimes Illustrator just throws in a bunch of <g></g> at the end for no discernible reason.
Grouping similar objects, as best as possible, and applying shared attributes to the group instead of individual objects.
Deleting empty <rect>s, which Illustrator creates around text boxes. Presumably this is because, in Illustrator, text boxes themselves can have appearances applied to them like any other object. It would be nice if it was smart enough to only carry this over if there was some sort of appearance, though.
Adding a <title> if a description is necessary, or aria-hidden="true" to the SVG if the image can be ignored by assistive technology.
Likewise, if the graphic is being ‘read’, adding aria-hidden="true" to (likely) all of the text elements within. In my diagrams, assistive tech users certainly don’t need to hear a bunch of random numbers without context, especially when I’ve provided a <title>.
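A few of the steps above are mechanical enough to script. What follows is only a rough sketch in Python, not the process I actually use: the function name `tidy_svg` and its parameters are made up for illustration, and it assumes that font-family shows up as a plain attribute (not inside a style attribute or a <style> block) and that an ‘empty’ group is one with no children and no text. Unique IDs, empty <rect>s, and regrouping still need a human eye.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Register the SVG namespace as the default so output has no ns0: prefixes.
ET.register_namespace("", SVG_NS)


def tidy_svg(svg_text, height="15em", decorative=False):
    """Apply a handful of the manual cleanup steps to an SVG string.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    root = ET.fromstring(svg_text)

    # Let the page decide the width; pin the height in ems.
    root.set("width", "100%")
    root.set("height", height)

    # Drop font-family attributes so the page's font cascades into the SVG.
    for element in root.iter():
        element.attrib.pop("font-family", None)

    # Remove empty <g> elements. Repeat until nothing changes, since
    # removing an inner empty group can leave its parent empty too.
    group_tag = "{%s}g" % SVG_NS
    changed = True
    while changed:
        changed = False
        for parent in list(root.iter()):
            for child in list(parent):
                is_empty = len(child) == 0 and not (child.text or "").strip()
                if child.tag == group_tag and is_empty:
                    parent.remove(child)
                    changed = True

    # Purely decorative graphics can be hidden from assistive technology.
    if decorative:
        root.set("aria-hidden", "true")

    return ET.tostring(root, encoding="unicode")
```

Calling `tidy_svg(code_from_illustrator)` returns the cleaned markup as a string, ready to paste into a page and finish by hand.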

Illustrator seemingly outputs SVG with the intent of structural accuracy if the file is read back in for editing. That is often counterproductive for web use, which would prioritize small filesize without a sacrifice in selection ordering or visual accuracy. To be fair, I just installed 2018 and haven’t tested its SVG waters yet, so we’ll see how Adobe managed to ~~mess that up~~ handle that.

Finally, it’s worth mentioning SVGO (and the web-based SVGOMG). Very customizable optimization, definitely more useful once one starts dealing with more intricate, complicated SVGs. I’m happy to optimize mine down by hand, and stop there – but I’m keeping them to a handful of kilobytes anyway.

Fireworks, and its bloated PNGs

bri hefele, 2017-04-11, image processing

The motive behind my last post on binary editors was a rather peculiar PNG I was asked to post as part of my job. It was a banner, 580x260px, and it was 14MB. Now this should have set off alarms from higher up the web chain: even with the unnecessary alpha channel and no compression at all, 580(px)×260(px)×(8(bits)×4(R,G,B,A)) is only about 600KB (roughly 450KB without the alpha). A very basic knowledge of how information is stored is always helpful – file sizes only get hard to estimate when compression or encryption is involved, and PNG’s lossless compression should only ever push the number down from that ceiling.
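The back-of-the-envelope ceiling is easy enough to check in code. This is just the arithmetic as a tiny Python function (the name `raw_image_bytes` is mine, purely for illustration): width × height × channels × bits per channel, divided by 8 to get bytes.

```python
def raw_image_bytes(width, height, channels=4, bits_per_channel=8):
    """Size in bytes of an uncompressed bitmap with the given layout."""
    return width * height * channels * bits_per_channel // 8


rgba_size = raw_image_bytes(580, 260)              # 8-bit R, G, B, A
rgb_size = raw_image_bytes(580, 260, channels=3)   # same, minus the alpha

print(rgba_size)  # 603200
print(rgb_size)   # 452400
```

That is roughly 600KB with the alpha channel and 450KB without it. Either way, it is nowhere near 14MB.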

So what happened? Adobe Fireworks, which is completely unsurprising. Fireworks was a Macromedia project, and while Macromedia obviously shaped a large chunk of the web in their heyday and also into the Adobe years, Macromedia projects were shit. The very definition of hack. I’m certain Adobe learned all of their terrible nonstandard UI habits from their Macromedia acquisition. I never thought Fireworks was terrible, but nor did I find it impressive. It was often used for wireframing websites, which feels wrong to me in every single way. But, to get ahead of myself, it had one other miserable trick: saving layers and other extended data in PNG files. Theoretically, this is great: layer support in an easily-read compressed lossless free image format. Awesome! But in Adobe’s reality, it’s terrible: not even any current Adobe software can recover these layers.

As mentioned in my previous post, PNGs are pretty easy to parse. Data comes in chunks: the first 4 bytes state the length of the chunk data, then come 4 bytes of (ASCII) chunk type descriptor, then the chunk data, then a 4 byte CRC checksum. Some chunks are necessary: IHDR is the header that states the file’s pixel dimensions, bit depth, color type, interlacing, etc; IDATs contain the actual image data. Other chunks are described by the format but not necessary. Finally, there are private chunks that anyone can define, and that this or that reader can theoretically read. The chunk type is 4 ASCII bytes, and is both a (potentially) clever descriptor of the chunk, and 4 bits worth of information – each character’s case means something.
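To make that layout concrete, here is a minimal chunk walker in Python. It is a sketch rather than a robust parser: the function names (`iter_chunks`, `bytes_per_type`, `build_chunk`) are my own, it assumes the whole file fits in memory, and it stops quietly at a truncated chunk. The CRC covers the chunk type plus the chunk data, which is what `zlib.crc32` is fed here.

```python
import struct
import zlib
from collections import Counter

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG magic number


def build_chunk(chunk_type, body=b""):
    """Assemble one chunk: length, type, data, CRC (handy for experiments)."""
    crc = zlib.crc32(chunk_type + body)
    return struct.pack(">I", len(body)) + chunk_type + body + struct.pack(">I", crc)


def iter_chunks(data):
    """Yield (type, body, crc_ok) for each chunk in a PNG held in memory."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("missing PNG signature")
    pos = 8
    while pos + 12 <= len(data):
        # 4 bytes of big-endian length, then 4 bytes of ASCII chunk type.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body_end = pos + 8 + length
        if body_end + 4 > len(data):
            break  # truncated chunk; give up rather than guess
        body = data[pos + 8:body_end]
        (stored_crc,) = struct.unpack(">I", data[body_end:body_end + 4])
        crc_ok = zlib.crc32(chunk_type + body) == stored_crc
        yield chunk_type.decode("ascii"), body, crc_ok
        pos = body_end + 4


def bytes_per_type(data):
    """Total chunk-data bytes per chunk type: a quick 'where did 14MB go'."""
    totals = Counter()
    for chunk_type, body, _ in iter_chunks(data):
        totals[chunk_type] += len(body)
    return dict(totals)
```

Running `bytes_per_type(open(path, "rb").read())` on a suspiciously large PNG (with `path` being whatever file is under suspicion) shows at a glance which chunk types are eating the space.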

So my image should have had a few things: the PNG magic number, 25 bytes worth of IHDR chunk explaining itself, a few hundred KB at most worth of IDAT chunk, and then an IEND chunk to seal the deal. Those were definitely present in my terrible file. Additionally, there were a handful of proprietary chunks including several hundred mkBT chunks. I don’t know much about these chunks aside from the fact that they start with a FACECAFE magic number and then 72 bytes of… something… And I also know there are a lot of them. Some cursory googling shows that nobody else really knows what to make of them either, so I’m not sure I’m going to put more effort into it. Suffice it to say: Fireworks, by default, saves layers in PNG files, and this turned a file that should have been a few hundred KB at most into 14MB.

So why do the files even work? Well, remember I mentioned that case in a chunk descriptor is important – it provides 4 bits of information. Note the difference between the utterly important IDAT and the utterly bullshit mkBT. From left to right, lower vs. uppercase means: ancillary/critical; private/public; (reserved for future use, these should all be uppercase for now); safe/unsafe to copy. So IDAT is critical and public, while mkBT is ancillary, private, and unsafe to copy. The important thing to glean here is that mkBT is ancillary, not critical. We do not need it to render an image.
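Those four bits are simple to read off in code. A small Python sketch, with a function name (`chunk_properties`) and dictionary keys that are my own invention:

```python
def chunk_properties(chunk_type):
    """Decode the four case bits of a PNG chunk type such as 'IDAT'."""
    if len(chunk_type) != 4 or not (chunk_type.isascii() and chunk_type.isalpha()):
        raise ValueError("chunk types are exactly four ASCII letters")
    first, second, third, fourth = chunk_type
    return {
        "ancillary": first.islower(),      # uppercase would mean critical
        "private": second.islower(),       # uppercase would mean public
        "reserved_ok": third.isupper(),    # must be uppercase for now
        "safe_to_copy": fourth.islower(),  # uppercase would mean unsafe
    }


print(chunk_properties("IDAT"))
print(chunk_properties("mkBT"))
```

IDAT comes back critical, public, and unsafe to copy; mkBT comes back ancillary, private, and unsafe to copy.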

So, when we load our 14MB PNG in a web browser, the browser sees the IHDR, the IDATs, and renders an image. It ignores all the garbage it can’t understand. This is perfectly valid PNG: because all of those extra chunks are ancillary, the browser can ignore them. PNG requires a valid IDAT, so Fireworks must put the flat image there. So, it works, but we’re still stuck with a humongous PNG. Most image editors will discard all of this stuff because it’s self-described as unsafe-to-copy (meaning any editing of the file may render this chunk useless). But for reference, pngcrush will eliminate these ancillary chunks by default, and optipng will with the -strip all flag.
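For the curious, dropping the junk by hand is not much code either. The Python sketch below is deliberately narrower than what the tools above do: it removes only chunks that are both ancillary and private (first two letters lowercase, like mkBT) and leaves standard ancillary chunks such as transparency or gamma alone. The function name `strip_private_ancillary` is hypothetical, CRCs are copied through untouched rather than verified, and this is no substitute for a real optimizer.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG magic number
LOWERCASE_BIT = 0x20  # bit 5 of an ASCII letter is set when it is lowercase


def strip_private_ancillary(data):
    """Return a copy of a PNG without its private ancillary chunks."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("missing PNG signature")
    out = bytearray(PNG_SIGNATURE)
    pos = 8
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        end = pos + 12 + length  # length + type + data + CRC
        ancillary = bool(chunk_type[0] & LOWERCASE_BIT)
        private = bool(chunk_type[1] & LOWERCASE_BIT)
        if not (ancillary and private):
            out += data[pos:end]  # keep the chunk byte for byte
        pos = end
    return bytes(out)
```

Since every kept chunk is copied verbatim, the output is still a valid PNG as long as the input was, just without the proprietary baggage.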

Takeaways? Know enough about raw data to see that your files are unreasonably large, I suppose. Or automatically assume that a 14MB file on your homepage is unreasonably large, regardless. Maybe that takeaway is just ‘perform a cursory glance at your filesizes’. Maybe it’s flatten your images in Fireworks before exporting them to PNG. Maybe instead of just performing lazy exports, web folks should be putting the time in to optimizing the crap out of any assets that are being widely downloaded. Maybe I’m off track now, so my final thought is just — if it looks wrong, save your audience some frustration and attempt to figure out why.