An accidental PDF bomb

Recently I was tasked with ensuring a two-page document was §508 compliant, something that I do every day. I didn’t really expect any hang-ups; even the most complicated two-page PDF is still only two pages. I got through the first page with ease. Navigating via the Tags panel, as I do, I landed on a table in the second page. Acrobat immediately stopped responding. Frustrating, but Acrobat is not the stablest of software, so I didn’t think much of it. Back into the fray, Acrobat hangs up at the exact same spot. Third time around, I turn off highlighting, and am able to start unfolding the table…


…and so on, until I gave up. I folded it back up and ctrl-clicked the disclosure triangle to automatically unfold the entire nest. Acrobat immediately stopped responding. I had a suspicion, and upon opening the file up yet again and changing the topmost <TR> to <Not a TR>, I unfolded…

  <Not a TR>
      <Not a TR>
          <Not a TR>
              <Not a TR>

A self-referential nightmare

This was not a table with a lot of nested elements. It was, in fact, an infinite table, and essentially a form of resource bomb1. Whenever Acrobat has to do anything that requires traversing this nest of tags, it will never stop. Fortunately, with highlighting off and a bit of luck, I was able to delete the tag from within Acrobat. What if this hadn’t been the case?

I’ve mentioned before that PDFs are, ultimately, plaintext files that are at least somewhat human-readable. Generally speaking, of course, this is not the case – exporters typically Flate- or LZW-compress every otherwise uncompressed stream in a given PDF. Fortunately, Acrobat can undo this, spitting out a thoroughly uncompressed PDF for us to examine. There are going to be some odd header bytes, and there’s likely other binary data (JPEGs, for instance), so I recommend doing this in a hex editor.

Let’s talk about objects

Before we actually look at the troublesome bit of the file in question, we need to be a little bit comfortable with the way objects are formatted. We’re dealing with indirect objects for the purpose of this post, enclosed within the delimiters x y obj␍ and ␍endobj, where x is a unique ID and y is a generation number (likely 0). Within this, we care about a few things: x y R is a reference to the object with aforementioned identifiers x and y, K or K[] is a single object or array of objects that are children of the current object, and P is the parent object. With that in mind, let’s look at our little troublemaker (highlighted for clarity):

189 0 obj␍<</A 190 0 R/K[76 0 R 191 0 R 192 0 R 193 0 R 194 0 R 195 0 R 196 0 R 197 0 R 198 0 R 199 0 R 200 0 R 201 0 R 202 0 R 203 0 R 204 0 R 205 0 R 206 0 R 207 0 R 208 0 R 209 0 R 210 0 R 211 0 R 212 0 R 213 0 R 214 0 R 215 0 R 216 0 R 217 0 R 218 0 R 219 0 R]/P 191 0 R/S/Table>>␍endobj

191 0 obj␍<</K[189 0 R 189 0 R]/P 189 0 R/S/TR>>␍endobj
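As a rough illustration of that layout, here is a minimal Python sketch that pulls the K and P references out of objects like the two above. It assumes the PDF has already been fully uncompressed, that K holds only plain x y R references, and that streams and nested dictionaries can be ignored; `parse_struct_objects` and the regexes are hypothetical helpers for illustration, and the K array in the sample is shortened.

```python
import re

# Simplified patterns: assume an uncompressed PDF and plain "x y R" references.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")
KIDS_RE = re.compile(rb"/K\s*(\[.*?\]|\d+\s+\d+\s+R\b)", re.DOTALL)
PARENT_RE = re.compile(rb"/P\s+(\d+)\s+(\d+)\s+R\b")


def parse_struct_objects(data: bytes) -> dict:
    """Map (id, generation) -> {'kids': [...], 'parent': ...} for each object."""
    objects = {}
    for match in OBJ_RE.finditer(data):
        obj_id = (int(match.group(1)), int(match.group(2)))
        body = match.group(3)

        kids = []
        kids_match = KIDS_RE.search(body)
        if kids_match:
            kids = [(int(a), int(b)) for a, b in REF_RE.findall(kids_match.group(1))]

        parent = None
        parent_match = PARENT_RE.search(body)
        if parent_match:
            parent = (int(parent_match.group(1)), int(parent_match.group(2)))

        objects[obj_id] = {"kids": kids, "parent": parent}
    return objects


# The two objects from the post, with the Table's K array shortened.
sample = (
    b"189 0 obj\r<</A 190 0 R/K[76 0 R 191 0 R 192 0 R]/P 191 0 R/S/Table>>\rendobj\r"
    b"191 0 obj\r<</K[189 0 R 189 0 R]/P 189 0 R/S/TR>>\rendobj\r"
)
objects = parse_struct_objects(sample)
```

With this sample, `objects[(191, 0)]` lists object 189 twice as a child and once as the parent, which is exactly the oddity discussed here.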

In my tree diagrams at the beginning of this post, I failed to note that each <TR> actually had two copies of the corresponding <Table> in it. Also, the <TR> was, in effect, the problem – all of the other references seen in the table above were valid, necessary table rows. Following the highlighting, you can see that Object 189, the <Table>, has Object 191, the <TR> as both a child (K) and a parent (P). Likewise, Object 191 has Object 189 as both a child and a parent. In my experience, the child is the more pressing matter of the two; it creates a direct path downward into the neverending pit of tables and rows.
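To make the loop concrete, here is a small Python sketch of the kind of check a tool could run before walking the tag tree. It is only an illustration: the `kids` dictionary is hand-written from the object numbers in this post (with 188 as the Table's proper parent), and `find_cycle` is a hypothetical helper, not something Acrobat is known to do.

```python
def find_cycle(kids: dict, root: int):
    """Return the chain of object IDs that loops back on itself, or None."""
    path = []        # ancestors of the node being visited, root first
    on_path = set()  # same IDs as `path`, for fast membership tests
    done = set()     # nodes whose subtrees are known to be loop-free

    def visit(node):
        if node in on_path:
            # Found a child that is also one of its own ancestors.
            return path[path.index(node):] + [node]
        if node in done:
            return None
        path.append(node)
        on_path.add(node)
        for child in kids.get(node, []):
            loop = visit(child)
            if loop:
                return loop
        path.pop()
        on_path.discard(node)
        done.add(node)
        return None

    return visit(root)


# K relationships from the post (Table 189's array shortened).
broken = {188: [189], 189: [76, 191, 192], 191: [189, 189]}
repaired = {188: [189], 189: [76, 191, 192], 191: []}

loop = find_cycle(broken, 188)
no_loop = find_cycle(repaired, 188)
```

A traversal that keeps track of its own ancestors like this can stop at the first repeated object instead of descending forever.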

Safely fixing the problem

My original question here was how I could have fixed this in my hex editor of choice if I was unable to do so in Acrobat. The simplest way is to leave Object 191 intact, but remove the entire K[] section2, yielding 191 0 obj␍<</P 189 0 R/S/TR>>␍endobj. This will leave an empty <TR> in place of the troublesome one. This can then be deleted from the Tags panel, but before that happens, it’s still possible to get Object 189 (the <Table>) to cause a hang – changing its P value accordingly (in this instance it was Object 188) will safeguard against this. Finally, the tag should be easy to find, but if one wants to make it foolproof, we can add a Title to make it readily identifiable. The keyword is T and strings are delimited in parentheses, so 191 0 obj␍<</P 189 0 R/S/TR/T(BAD!BAD!)>>␍endobj will entitle it ‘BAD!BAD!’.
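The same edit can be sketched in Python rather than done by hand in a hex editor. This is a rough sketch under some loud assumptions: the file is fully uncompressed, the object numbers (191, 189, 188) are the ones from this particular file, /K is the first key in Object 191 as shown above, and `defuse_row` is a hypothetical helper name.

```python
import re

# Object 191: drop its /K[...] array and append a title before the closing >>.
ROW_RE = re.compile(
    rb"(?<!\d)(191 0 obj\s*<<)/K\[[^\]]*\]((?:(?!endobj).)*?)(>>\s*endobj)",
    re.DOTALL,
)
# Object 189: repoint its /P from the bad row (191) to its real parent (188).
TABLE_PARENT_RE = re.compile(
    rb"(?<!\d)(189 0 obj\b(?:(?!endobj).)*?/P )191( 0 R)",
    re.DOTALL,
)


def defuse_row(data: bytes) -> bytes:
    """Break the Table/TR loop while leaving an empty, titled <TR> behind."""
    data, rows = ROW_RE.subn(rb"\1\2/T(BAD!BAD!)\3", data, count=1)
    data, tables = TABLE_PARENT_RE.subn(rb"\g<1>188\2", data, count=1)
    if rows != 1 or tables != 1:
        raise ValueError("could not find the expected row or table object")
    return data


broken = (
    b"189 0 obj\r<</A 190 0 R/K[76 0 R 191 0 R 192 0 R]/P 191 0 R/S/Table>>\rendobj\r"
    b"191 0 obj\r<</K[189 0 R 189 0 R]/P 189 0 R/S/TR>>\rendobj\r"
)
fixed = defuse_row(broken)
```

Raising an error when either pattern fails to match is deliberate; silently writing out an unchanged file would be easy to mistake for a fix.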

Can we delete the tag entirely from our hex editor? Absolutely, we just need to take a few extra precautions. We’ll delete the entirety of 191 0 obj␍<</K[189 0 R 189 0 R]/P 189 0 R/S/TR>>␍endobj. We need to have sorted out the appropriate parent object of Object 189 this time, and must set P x y R accordingly (given ID/generation as x/y). Finally, Acrobat will be a bit befuddled if we leave the nonexistent 191 0 R in Object 189’s K[] array, so we’ll delete this, yielding 189 0 obj␍<</A 190 0 R/K[76 0 R 192 0 R ….
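For completeness, the full deletion might look something like the following in Python. Again, this is a sketch and not a general-purpose tool: it assumes an uncompressed file, hard-codes the object numbers from this post (191 as the bad row, 189 as the Table, 188 as the Table's real parent), and `delete_row` is a hypothetical name.

```python
import re

ROW_OBJ_RE = re.compile(
    rb"(?<!\d)191 0 obj\b(?:(?!endobj).)*endobj\s*", re.DOTALL
)
TABLE_OBJ_RE = re.compile(
    rb"(?<!\d)189 0 obj\b(?:(?!endobj).)*endobj", re.DOTALL
)


def delete_row(data: bytes) -> bytes:
    """Remove Object 191 entirely and scrub every reference to it from 189."""
    data, removed = ROW_OBJ_RE.subn(b"", data, count=1)
    if removed != 1:
        raise ValueError("object 191 not found")

    match = TABLE_OBJ_RE.search(data)
    if match is None:
        raise ValueError("object 189 not found")

    table = match.group(0)
    # Repoint the parent first, so that any '191 0 R' left over is a child.
    table = table.replace(b"/P 191 0 R", b"/P 188 0 R")
    table = re.sub(rb"(?<!\d)191 0 R\s*", b"", table)
    return data[:match.start()] + table + data[match.end():]


broken = (
    b"189 0 obj\r<</A 190 0 R/K[76 0 R 191 0 R 192 0 R]/P 191 0 R/S/Table>>\rendobj\r"
    b"191 0 obj\r<</K[189 0 R 189 0 R]/P 189 0 R/S/TR>>\rendobj\r"
)
cleaned = delete_row(broken)
```

Note that edits like this change the length of the file, which shifts the byte offsets recorded in a classic cross-reference table; that is a plausible reason for a viewer to offer a repair afterward.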

Either of the above strategies will likely lead to Acrobat thinking it needs to fix the PDF3, but that’s just housekeeping and subsequent saves should patch it right up.

Final questions and thoughts

While the point of this post was largely to illustrate how one can manually edit the raw bits and bobs of a PDF to fix unique problems, two questions remain: where did this infinite loop come from, and why does it matter? The former has an unsatisfactory answer: I don’t know. I exported the PDF myself from a PowerPoint (.pptx) file that I was given. From inside PowerPoint itself, nothing seemed irregular about the table in question. Examining Office XML files is never a clean process, but my cursory glance didn’t see any odd recursion. Regardless, it’s distressing that the Acrobat export plugin for Office would render out such a structure.

As to why it matters, well, for many users it won’t. Tags aren’t really a necessary consideration for a straightforward visual presentation of a PDF4. Acrobat (and most viewers I’ve dealt with) don’t bother parsing the tags unless/until they have to. So, a sighted user opening this file up in Acrobat Reader, scrolling through it, and closing it when done would be entirely oblivious. Nothing about that scenario would trigger crash conditions. However, opening the document with a screen reader causes an immediate crash, as do most attempts to export the file to another format.

A document that crashes when presented with a screen reader is obviously a problem for accessibility. Unfortunately, it’s one that an amateur pass through the document would likely miss. Acrobat’s accessibility checker doesn’t dive deep enough to crash, nor does it manage to find the fatal flaw. I have long believed that accessibility checkers in software provide a false sense of accessibility and generally do more harm than good for this reason. While this particular issue is (hopefully) quite rare, it does reinforce my stance.

  1. I previously wrote about making bomb-style SVGs. ↩︎
  2. We can also break the loop by making Object 191’s two children something, anything else. Referencing a nonexistent object is a bit weird, but it works enough to buy us time. ↩︎
  3. I’m not entirely sure how it determines this. The only references to checksumming I’ve seen in the spec are specific to embedded objects, and I haven’t found any MD5 hashes in files I’ve worked on that Acrobat has felt the need to ‘fix’. It may very well be that there’s just a missing reference somewhere now. Regardless, I’ve never had this manifest as an actual problem. ↩︎
  4. I was curious about how Firefox would handle the file, as it sort of recreates a PDF in HTML. It doesn’t, however, do this very accurately it seems. Accordingly, it had no problem rendering the table (which it didn’t even render as a table). ↩︎

Solo: Islands of the Heart

Solo: Islands of the Heart is, in the words of the developers, “A contemplative puzzler set on a gorgeous and surreal archipelago” wherein the player “Reflect[s] on love’s place in [their] life with a personal and introspective branching narrative.” This sounds like peak me: I love puzzles, surreal landscapes, love, and introspection! To top it off, the game offers some flexibility regarding gender representation; you’re not automatically forced into a binary heteronormative default. I snatched it up pretty quickly after learning about it (and confirming that it at least attempted to be queer-friendly) and completed a run after a few days of casual pick-up-and-put-down play. While I’m not sure that it was quite what I’d hoped it would be, it made enough of an impression on me that I feel the need to write about it. Be warned, there may be some things that resemble spoilers ahead, but the game is very much dependent upon what you put into it, so I’m not even sure spoiling is… a thing.

The basics…

The basic gist of the game is that you hop around from island to island trying to activate totems. There are two pieces to each; you activate a small one, which shines a light at a large one which you can then talk to. Talking to the large totems asks you a question related to love, after which a new island opens up. There are some other minor puzzles along the way, like helping smitten dogs reach one another or watering gardens; these are all optional and don’t move anything forward in the game. Puzzles involve moving five different types of boxes around, generally so you can move upward to a place you can’t reach, or float via parachute to a far away bit of land. They are, for the most part, pretty simple and somewhat flexible in terms of solving. They can be frustrating in terms of guiding just how high up or far out you need to be to land on that island – suddenly you’re in the water again swimming back to your pile of boxes.

In my experience, there was a considerable disconnect between the ‘do a box puzzle’ and the ‘talk about your love life’ elements. I suspect that part of the idea here is to allow the introspective side of your brain some time to relax by running the lateral thinking bits instead. And, as a whole, I didn’t really mind that disconnect – but it stacked up with other things. I mentioned that it was fairly easy to misjudge just how high or far out you’d need to coax the boxes, lest you plunge into the sea. This happened to me quite a lot, often multiple times on the same puzzle in later stages. Swimming is slow, and faster swimming is achieved by hitting a certain rhythm with the swim button. This decision, too, I can easily justify as an exercise in mindfulness instead of impatiently button-mashing. But these things compound – things start feeling like busy work keeping you at bay while the totems think of something to ask.

Regarding the questions…

The questions the totems ask are not trivial; they run a fairly wide gamut and certainly lend themselves to introspection. Early on, one basically asked if I was polyamorous which… is honestly a very important sort of acknowledgement in a game like this. You’re asked how important things like sex and shared values are; you’re asked if you would abandon your family for a lover. You’re also asked questions that relate more directly to the path you choose at the beginning – that is, are you in love, have you loved and lost, or have you never loved at all. It’s easy, when answering this at the beginning of the game, to fall into the trap of your character being you. And, to be fair, I think that it would be a waste of energy to not align your choices in the game with your personal life and feelings. But, it’s important to keep a bit of distance, as the game will occasionally contradict your answers or dive into things that quite possibly aren’t at all applicable to your situation.

For example, having chosen in earnest the ‘in love once, but not now’ option, I was asked a lot of questions as to why I thought the relationship failed. One was about time: did I think time played a role? After answering ‘no’, the next question basically opened with ‘okay, but time basically had to play into it’, directly contradicting my honest response. This was the first moment where I got annoyed and began to realize I needed to distance myself from the little tiny on-screen version of me that I was shaping. Some of the responses were, to me, absurd to the point of throwing me right out of the game’s depth, such as “You can’t fully hate what you don’t fully love”. But again, the key was to answer honestly while consciously separating myself from my avatar.

About those gender options…

I’d be remiss to not touch on the matter of gender. You can independently choose one of three body styles for your character, and one of three ‘genders’. While the game refers to it as gender and gives you the option of male, female, and non-binary, what it actually means is pronouns. To be clear, I’m glad that they put an effort into making this game inclusive, I’m glad that you can use they/them pronouns. But that’s not gender, and there’s no reason not to call it what it is. Both you and your partner1 get the three options; you can change yours at any time. It’s a root-level option in the pause menu, right with ‘Back to the main menu’ and ‘Settings’. This is absolutely the right way to handle a thing, and should be seen as an example for all developers to follow. Your partner is static upon initial choosing, which… is honestly a little weird, given the player’s flexibility. I would like to see this reconsidered.

In closing…

I’m glad that I played this game. I’d have to be very cautious in recommending it, however: it’s very short, it’s not great as a puzzle game, and the disconnects mentioned (between puzzle and introspection, between player and avatar) are a little tricky to reckon with. I doubt there’s much in the way of replay value – even writing this, I’d like to go through the beginning again to pull some direct quotations but at the same time… I really don’t want to. I might play through a different path if I find myself in love again, but even that feels like a toss-up. Still, there aren’t a lot of games doing this sort of emotional introspective adventure, and I think there’s a lot of value in it. And even though the matter of gender may be a bit flawed, enough of an attempt was made such that the game feels fairly inclusive (or, at least, not intentionally exclusive).

  1. I say partner even in the example of ‘no longer in love’ because you go through the game with a ghostly avatar of the other person in question. Even if they aren’t your partner in life, there is a ‘partner’ character with you through the introspection. ↩︎

Font changes, hopefully no major issues

Short meta-post. Until 2019-08-20, I was using Font Library as a CDN for the two webfonts1 that I use on this site: Hack for code blocks and other monospaced needs, and Gentium for everything else. Font Library was, for at least a week, down, leading to upsettingly long load times. I temporarily just removed the appropriate <link>s, allowing the site to render in the user’s default monospace and serif fonts, respectively. Font Library is back up, now, but the downtime made me think about alternative solutions. I sure as hell was not going to subject my audience to Google as the CDN. And I realized that I don’t really have any need for a CDN; why make the additional external requests? Why worry at all about a third party’s uptime? So, I am currently hosting the copies of Gentium and Hack that the site uses. I’m not entirely sure it’s the same version of Gentium2, so I may need to poke around, say, math posts and see if any glyphs are missing. Otherwise, I think this is the best solution and should be relatively problem-free.

I think it’s worth briefly mentioning why Font Library was down. Microsoft, citing trade restrictions, started banning Iranian, Syrian, and Crimean hosting on GitHub. The Bassel Khartabil Fellowship was one such banned project, based in Syria. My understanding is that Font Library was not directly affected by this, but, having been built partially on Khartabil’s work, they removed their site from GitHub in solidarity and in opposition to the policy. I mention this because it’s important. It was a bold move for Font Library to have that much downtime out of principle, and I applaud them for it. I would not be at all dissuaded from continuing to use their service, except… again the whole thing made me realize there’s really no practical reason for me to use any CDN for font hosting.

One final meta note: I have updated all posts of the category ‘lgbt’ to ‘lgbtqia’ instead. I think it’s just a habit of being of a certain age; I generally find myself defaulting to the four-letter initialism. But, there’s no reason not to try to be better and more inclusive, and this is such a simple update to make, it’s rather ridiculous not to.

  1. Okay, there’s a third, but it just holds the flower that I use for a bullet point & for horizontal rules and the flower-arrow that I use for external links (•, x) ↩︎
  2. I can’t use Gentium Plus because there’s no bold weight. Releasing a bold weight is a priority for SIL, apparently, and if that comes out I may move to it. However, because of its extensive unicode support, each file (even with WOFF compression) is ~600K. I would need to subset these fonts (which, by license, would require me to rename them as well) if I did move to them, because I’m not making anyone download ~2M of fonts. That’s absurd. ↩︎

The poetics of TTRPGs

I have often expressed, in a pseudo-jest of oversimplification, that I prefer novellas to novels, short stories to novellas, and poems to short stories. I have always been more drawn to the meditative experience of an impossibly-concise framework than the contemplative experience that length and breadth brings. That isn’t to argue that either experience is objectively better, more difficult to create, nor more serious or worthy of being canonized as art – I, myself, personally just find something extremely satisfying in art that I can hold in a single breath. That oxygenates my blood and travels throughout me.

At Gen Con this year1, I had the opportunity to play Alex Roberts’ For the Queen, a short, card-based, no-dice-no-masters TTRPG. The basic gist is that all of the players are on a journey in wartime with their queen, and characters and narratives unfold as players answer questions prompted by the deck of cards. You don’t really need a table, you don’t need to write anything. It’s an incredibly distilled essence of roleplaying. The experience soaked into me, stuck in my mind. A week later, I was trying to figure out why, and how, and it occurred to me that the game is a poem.

My mind repeatedly wandered to another game that I love, that similarly demands I pore over its delicacy: The Quiet Year by Avery Alder. The Quiet Year is also free of masters, and also deck-based2. Cards lay out events that happen during a given season, and players use these events to draw a map that tells the story of a community. Two common themes between these games are cards and lack of a master, but I don’t think either of those elements specifically makes a game a poem. Cards prompting events are randomized, but it’s not the chaotic, make-or-break randomness of chucking a D20 at your GM. An egalitarian system free of masters adds an odd aura of intimacy within the group. They’re poetic elements, certainly, but that’s kind of like saying that in literature, everything that rhymes or
looks like
is apoem(period)

And certainly, there are a bunch of formalized rules that we can scrutinize and calculate and determine that aha! A given piece of written or spoken word simply must be a poem! But that’s clearly not what I mean, and I don’t know that it’s productive to try to break down countless elements and rule sets to establish an encyclopedic guide as to whether or not a given TTRPG will give me this lingering satiety. To me, it’s simply about feeling, much of which I believe comes from crossing boundaries, challenging expectations, and doing it all with the crash of shocking brevity.

Let’s talk about a game I haven’t actually played, Orc Stabr by Liam Ginty and Gabriel Komisar3. Fitting on a single sheet, it is a simple game (though a game made more traditional by way of both dice and masters). I suspect it is a fairly quick game, but again… I have not played it. Aside from the game itself, however, there was an additional experience layered onto it, a bit of a metagame if you will. It was launched on Kickstarter, and all of the materials for it were written from the perspective of its orc designer, Limm Ghomizar. Backers could get a full sheet of paper, or a hand-torn half-sheet of paper, encouraging them to find other backers to form a full, playable sheet with. Every sheet had something custom done to it – crayon doodles, recipes, custom rules, handprints, all manner of weird things that simply served to make each copy human, personal, and unique. Seeing folks post about their copies when they received them and just knowing that everyone was getting some different bit of weird was an act of art in itself. And that had that lingering feeling of something once seemingly rigid being shattered in the medium.

Clever means of introducing interactivity to narratives have always existed outside what we understand and refer to as gaming. Things like Fluxus’ event scores, the Theatre of the Oppressed, Choose Your Own Adventure novels. Community storytelling has always been a thing, and presumably ‘interactive storytelling, but with rules’ is not a particularly novel concept either. It’s almost certainly unfair, then, to presume that there’s really anything new about what feels like a Gygaxian mold being broken. But I do feel like I’m seeing more and more of this sort of thing being done very intentionally in a space dominated by long-campaign, dice-laden, hack’n’slash systems. There’s a vibrancy to the sense of art and emotion that is being put into games, and that I think seethes through the players of these games.

And that, to me, is poetry.

  1. I wrote a post about Gen Con that I’m unlikely to publish, it was more for the sake of diarism. Suffice it to say that my anxiety toward travel and planning made it difficult, but that it was a fun, wonderfully queer time, and I had some very positive new experiences. ↩︎
  2. The Quiet Year technically has dice, but they’re essentially used as counters. The game also runs several hours. It’s heavier than For the Queen, for sure, which I think gets to my point well… Trying to nail down some exacting set of qualifications is futile. ↩︎
  3. I am friends with the folks behind Orc Stabr, and scribbled on quite a few copies of the game myself. I may have more invested in the experience I describe because of that, but I’m not being compensated to speak kindly of it. Just a lil disclaimer. ↩︎

WTPDF: Role Mapping

PDF 1.7 supports a limited number of standard tags, limited enough that I can freely list them here: Document, Part, Article, Section, Division, Block quotation, Caption, Table of Contents (TOC), TOC item, Index, Paragraph, Heading, six hierarchical Heading levels, List, List item, List item body, Label, Table, Table row, Table header cell, Table data cell, Table header row group, Table body row group, Table footer row group, Span, Quotation, Note, Reference, Bibliography entry, Code, Link, Annotation, Figure, Formula, and Form. There are also Nonstructural1 elements and Private elements, though these are a bit more niche in use, as well as Ruby (with three additional child tags) and Warichu (with two) for Japanese and Chinese language documents. Forty-nine standard tags. Open up a file from, say, Excel, however, and you’ll find tags such as Workbook and Worksheet, in a perfectly valid document.

Enter role-mapping, a rather straightforward mechanism that allows you to create any tag and have it take on the role of any other tag. So, one can map Turkey to Heading Level 2 (<H2>), and place content inside of Turkey tags which will then be read as though they were regular Heading Level 2 tags. Some tools which export to more flexible formats like XML may leave these intact, but in general it is safe to assume that the new tag is, for all intents and purposes, the tag it’s been mapped to.

This becomes important for folks doing accessibility (and other tagged PDF) work for a few reasons. First of all, on large/complex documents, one can easily create their own tags to help with organization, all while mapping them to standard tags that make sense to the outside world. I hinted at the second reason in the first paragraph of this post: various PDF exporters use custom tags, so it’s important to (at least) know that they exist. If you’re doing any work with tags in PDF, it is vital to know what these nonstandard tags ‘really’ are. Often I disagree with the role an exporter has assigned to a tag, and instead of redoing every tag, I can simply reassign the role. Finally, only in an absolute pinch, if a document has been authored in some screwy sort of way (say, every paragraph was tagged <H6> because the person who created it liked the aesthetic of that style in Word), standard tags can also be assigned different roles. You can make Heading Level 6 behave like a paragraph instead. You absolutely should not unless you have some kind of time crunch and no better option, but that band-aid is there.
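Conceptually, a role map is just a lookup table from one tag name to another. The Python sketch below models that idea with a plain dictionary; it is not how any particular PDF reader implements it. The `resolve_role` helper is hypothetical, as is the `Gobble` tag, and following a chain of mappings (one custom tag mapped to another custom tag) is an assumption about consumer behavior that may vary between tools.

```python
def resolve_role(tag: str, role_map: dict) -> str:
    """Follow role-map entries until reaching a tag with no further mapping."""
    seen = {tag}
    while tag in role_map:
        tag = role_map[tag]
        if tag in seen:
            # A map that points back at itself would otherwise loop forever.
            raise ValueError(f"circular role map involving {tag!r}")
        seen.add(tag)
    return tag


role_map = {
    "Turkey": "H2",           # custom tag read as Heading Level 2
    "Chapter1": "NonStruct",  # organizational tag with no semantic meaning
    "H6": "P",                # the 'absolute pinch' remap of a standard tag
    "Gobble": "Turkey",       # hypothetical chained mapping
}

turkey_role = resolve_role("Turkey", role_map)
chained_role = resolve_role("Gobble", role_map)
h6_role = resolve_role("H6", role_map)
unmapped_role = resolve_role("Table", role_map)
```

Tags with no entry in the map, such as Table here, simply resolve to themselves.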

In Acrobat, the Role Map can be viewed and edited from ‘Edit Role Map…’ in the contextual menu of the Tags panel.

  1. Nonstructural elements (<NonStruct>) still contain tagged content and will still be picked up by assistive tech, etc. The nonstructural bit simply means that the tag has no semantic meaning. This, combined with role maps, can be useful for organizing tags without changing the structure – e.g. mapping <Chapter1>, <Chapter2>, etc. to <NonStruct>. ↩︎