JPEG Comments

A while back, floppy disk enthusiast/archivist @foone posted about a floppy find, the Alice JPEG Image Compression Software. I suggest reading the relevant posts about the floppy, but the gist is that @foone archived and examined the disk and was left with a bunch of mysterious .CMP files which appeared to contain JPEG streams but did not actually function as JPEGs. Rather, they would load but display only an odd little placeholder[1], identical for each file. I know a bit about JPEGs, and decided to try my hand at cracking this nut. The images that resulted were not particularly interesting – this was JPEG compression software from the early ’90s, clearly targeted at industries that would be storing a lot of images[2] and not at home users. The trick to the files, however, was a fun discovery.

The title of this post gives it away, I realize – the real images were effectively ‘commented out’. Here’s a hex dump of the relevant chunk of one of the photos:

(offset)   (hex)                                              (ascii)
00000270   F4 F5 F6 F7 F8 F9 FA FF C0 00 11 08 00 3C 00 50    .............<.P
00000280   03 01 21 00 02 11 01 03 11 01 FF DA 00 0C 03 01    ..!.............
00000290   00 02 11 03 11 00 3F 00 FF FE 00 0E 49 4D 41 47    ......?.....IMAG
000002A0   45 20 44 41 54 41 3D 3E C6 B1 3F E8 51 7F BB 52    E DATA=>..?.Q..R
000002B0   13 5E 64 BE 26 7D 65 1F E1 C7 D1 08 EA DB 73 B4    .^d.&}e.......s.

Something sure looks suspicious in that ASCII column, doesn’t it? Let’s talk briefly about JPEG files. JPEGs contain a number of different sorts of data: EXIF and other metadata, the Huffman and quantization tables used to compress the image, details of the image itself (bit depth, dimensions), and the image data, to name a few. All of this information is split up into chunks prefixed with a two-byte code: FF followed by another byte that says what kind of data follows. At offset 277, we see FF C0. This is the start of frame, and the next seventeen bytes tell us (among other things) that it’s an 8-bit/channel color image, 80x60 pixels[3]. At offset 28A, we run into FF DA, which is the start of the image itself. This only runs for 12 bytes, the scan header announced by its 00 0C length field, until we hit FF FE at offset 298. That short stub is all there is of the odd little placeholder image from above, and FF FE is, as you can probably guess, a comment.
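To make that walk through the dump concrete, here is a minimal Python sketch that steps over the three markers. The bytes are retyped by hand from the hex dump (offset 277 onward), and the names EXCERPT, read_segments, and parse_sof are made up for illustration. It assumes each marker is followed by a two-byte big-endian length that counts itself, the way the 00 11 after FF C0 does.

```python
import struct

# Bytes from offset 0x277 through 0x2A7 of the dump above, retyped by hand.
EXCERPT = bytes.fromhex(
    "FFC0 0011 08 003C 0050 03 012100 021101 031101"  # start of frame
    "FFDA 000C 03 0100 0211 0311 003F00"              # start of scan
    "FFFE 000E 494D41474520444154413D3E"              # comment
)

def read_segments(data, base=0):
    """Yield (offset, marker, payload) for back-to-back FF xx + length segments."""
    pos = 0
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected FF at offset {base + pos:#x}")
        marker = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])  # counts itself
        yield base + pos, marker, data[pos + 4:pos + 2 + length]
        pos += 2 + length

def parse_sof(payload):
    """Return (bits_per_sample, height, width, components) from a frame header."""
    return struct.unpack(">BHHB", payload[:6])

segments = list(read_segments(EXCERPT, base=0x277))
for offset, marker, payload in segments:
    print(f"{offset:#05x}  FF {marker:02X}  {len(payload)} payload bytes")
print(parse_sof(segments[0][2]))
```

The offsets it reports line up with the ones read off the dump by eye, and the frame header unpacks to 8 bits per sample, 60 rows, 80 columns, and 3 components.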

Comments aren’t that prevalent in JPEGs. JFIF, EXIF, and XMP data are all stored in application-specific data chunks (much like the layer information in Adobe Fireworks PNGs). Comments are typically used to mark what encoder produced the JPEG, and that’s about it. But, much like using comments to soft-delete code, an entire image can be stuffed in there, waiting for a specific decoder (or hex editor user) to erase the placeholder image and the comment prefix. Presumably this is just what the Alice software did: it would find FF DA, and ignore everything until after FF FE 00 0E IMAGE DATA=>. Other decoders would simply ignore the real image, because that’s what the JPEG spec tells them to do.
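As a rough illustration of the "hex editor" route, the Python sketch below deletes the 16-byte comment prefix so that the compressed data sits directly behind the FF DA header. This is a guess at the minimal fix, not a description of what the Alice software actually did: it assumes the scan header in front of the comment also applies to the hidden data, and the names SENTINEL and uncomment are hypothetical. A plain byte search for FF DA could also, in principle, match inside an earlier table.

```python
# The comment marker, its length field, and the text seen in the dump.
SENTINEL = b"\xff\xfe\x00\x0eIMAGE DATA=>"

def uncomment(cmp_bytes: bytes) -> bytes:
    """Return a copy of cmp_bytes with the first comment prefix after FF DA removed."""
    sos = cmp_bytes.find(b"\xff\xda")
    if sos < 0:
        raise ValueError("no FF DA (start of scan) marker found")
    start = cmp_bytes.find(SENTINEL, sos)
    if start < 0:
        raise ValueError("no 'IMAGE DATA=>' comment found after FF DA")
    return cmp_bytes[:start] + cmp_bytes[start + len(SENTINEL):]

# Usage (the file names are placeholders):
#   from pathlib import Path
#   Path("sample.jpg").write_bytes(uncomment(Path("SAMPLE.CMP").read_bytes()))
```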

Not seen above, but necessary for the process to work and interesting to consider, is the sequence FF 00. Marker segments such as comments announce their own length (00 0E here, which covers only the length field and the text IMAGE DATA=>), but compressed image data has no length field: a decoder reading or skipping over it stops only when it meets an FF byte that starts a new marker. Once you get to the compressed image data, you’re likely going to need FF bytes that aren’t instructions to the decoder. These are essentially escaped by the two-byte sequence FF 00. The decoder knows that this is not the start of a new chunk, but rather a literal FF. This works across the board – which means that our commented-out image can (and does) contain several FF 00 sequences without the decoder mistaking any of them for the start of a new marker.
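The escaping rule is small enough to sketch in Python. The two helpers below, find_next_marker and unstuff, are illustrative names rather than part of any library, and the sample bytes are invented for the example. Note that restart markers (FF D0 through FF D7), which can legitimately appear inside compressed data, are reported as markers by this sketch like any other.

```python
def find_next_marker(data: bytes, start: int = 0) -> int:
    """Return the offset of the next real FF xx marker at or after start, or -1."""
    pos = start
    while True:
        pos = data.find(b"\xff", pos)
        if pos < 0 or pos + 1 >= len(data):
            return -1
        following = data[pos + 1]
        if following == 0x00:
            pos += 2      # FF 00: an escaped, literal FF byte
        elif following == 0xFF:
            pos += 1      # fill byte; a marker may begin at the next FF
        else:
            return pos    # FF followed by a marker code

def unstuff(scan: bytes) -> bytes:
    """Turn every escaped FF 00 in marker-free scan data back into a bare FF."""
    return scan.replace(b"\xff\x00", b"\xff")

# Invented sample: two escaped FF bytes, then the end-of-image marker.
sample = bytes.fromhex("12 FF00 34 FF00 56 FFD9")
end = find_next_marker(sample)
print(end, unstuff(sample[:end]).hex())
```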

Finally, it’s worth noting that JPEG images end with FF D9, which are (expectedly) the last two bytes in any of the given .CMP files. The placeholder image doesn’t need its own FF D9, since the one at the end of the file is the next marker that’s encountered after the comment regardless. In fact, giving the placeholder its own FF D9 likely would have required additional logic in the Alice placeholder-removal scheme, as you would now have to ignore the end-of-image marker under (exactly) one specific condition on top of everything else.
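A quick way to check that claim against a file is to look for every FF D9 pair and confirm that the only one sits at the very end. The Python sketch below does this with a naive byte search; the function names are made up for illustration. It would also count an FF D9 that happens to occur inside a table or metadata chunk, so treat a failure as a prompt to look closer rather than as proof of a second end-of-image marker.

```python
EOI = b"\xff\xd9"  # end-of-image marker

def eoi_offsets(data: bytes) -> list:
    """Offsets of every FF D9 byte pair in data (naive byte search)."""
    found = []
    pos = data.find(EOI)
    while pos >= 0:
        found.append(pos)
        pos = data.find(EOI, pos + 1)
    return found

def has_single_trailing_eoi(data: bytes) -> bool:
    """True if FF D9 occurs exactly once, as the last two bytes."""
    return eoi_offsets(data) == [len(data) - 2]
```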

This is obviously not a robust form of copyright protection, and seemingly lends itself to an inefficient set of Huffman and quantization tables as well. These inefficiencies could likely be handled better by modern encoders designed around needing tables for two images, and it is interesting to think of potential use cases. One could, theoretically, comment out encrypted image data, while leaving a placeholder image that tells a user as much[4]. Practical? Likely not, but as much as we code nerds take our ability to comment out code for granted, it’s rather fascinating to see the same techniques played out in the binary sphere.

  1. I have a policy against including raster images in my posts, but this seems like a perfectly valid time to make an exception: [Image: Two 8 x 8 grids of greyscale pixels that start highly contrasted in the upper left and fade to the lower right.]
  2. Among other things, the images seemed to be of forensic evidence, automobile damage, a patient’s teeth, and a museum artifact.
  3. The sample image chosen for this demonstration was a thumbnail; 80x60 are actually the correct dimensions and not part of the placeholder.
  4. Some data would likely still get through – image dimensions, for one. Presumably these are not sensitive, but what about the Huffman tables? How much information can be gleaned from the bits and bobs that define the image compression? Could these data chunks be encrypted and commented out as well?