PDF 1.7 supports a limited number of standard tags, few enough that I can list them all here: Document, Part, Article, Section, Division, Block quotation, Caption, Table of Contents (TOC), TOC item, Index, Paragraph, Heading, six hierarchical Heading levels, List, List item, List item body, Label, Table, Table row, Table header cell, Table data cell, Table header row group, Table body row group, Table footer row group, Span, Quotation, Note, Reference, Bibliography entry, Code, Link, Annotation, Figure, Formula, and Form. There are also Nonstructural[1] elements and Private elements, though these are a bit more niche in use, as well as Ruby (with three additional child tags) and Warichu (with two) for Japanese- and Chinese-language documents. Forty-nine standard tags. Open up a file exported from, say, Excel, however, and you’ll find tags such as Workbook and Worksheet in a perfectly valid document.
Enter role-mapping, a rather straightforward mechanism that allows you to create any tag and have it take on the role of any other tag. So, one can map Turkey to Heading Level 2 (<H2>) and place content inside Turkey tags, which will then be read as though they were regular Heading Level 2 tags. Some tools that export to more flexible formats like XML may leave the custom tag names intact, but in general it is safe to assume that the new tag is, for all intents and purposes, the tag it’s been mapped to.
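Conceptually, a role map is just a lookup table from tag name to tag name (in the file itself it lives in a RoleMap dictionary on the structure tree root). Here is a minimal Python sketch of how a reader might resolve a tag to the role it stands for. The function name `resolve_role`, the chained `Gobble` entry, and the choice to follow chains of mappings with a guard against loops are my assumptions for illustration, not a description of how any particular tool does it.

```python
def resolve_role(tag: str, role_map: dict[str, str]) -> str:
    """Return the tag that `tag` ultimately stands for under `role_map`.

    Tags with no entry in the role map are returned unchanged.
    """
    seen = set()
    while tag in role_map:
        if tag in seen:
            # Assumption: a mapping that loops back on itself is an error.
            raise ValueError(f"circular role mapping at {tag!r}")
        seen.add(tag)
        tag = role_map[tag]
    return tag


# Hypothetical role map: the Turkey example, plus a chained entry.
role_map = {"Turkey": "H2", "Gobble": "Turkey"}

print(resolve_role("Turkey", role_map))  # H2
print(resolve_role("Gobble", role_map))  # H2
print(resolve_role("P", role_map))       # P
```

Anything that consumes the tag tree can then treat content inside Turkey exactly as it would treat content inside <H2>.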
This becomes important for folks doing accessibility (and other tagged PDF) work for a few reasons. First of all, on large or complex documents, you can create your own tags to help with organization, all while mapping them to standard tags that make sense to the outside world. The second reason, which I mentioned in the first paragraph of this post, is that various PDF exporters use them. If you’re doing any work with tags in PDF, it is vital to know what these nonstandard tags ‘really’ are. I often disagree with the role an exporter has assigned to a tag, and instead of retagging every instance, I can simply reassign the role. Finally, and only in an absolute pinch, standard tags can also be assigned different roles. Say a document has been authored in some screwy sort of way, with every paragraph tagged <H6> because the person who created it liked the aesthetic of that style in Word: you can make Heading Level 6 behave like a paragraph instead. You absolutely should not unless you have some kind of time crunch and no better option, but that band-aid is there.
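To make the ‘what are these tags really?’ question concrete, here is a rough Python sketch that sorts role-map entries into custom tags and overridden standard tags. The set holds the forty-nine standard tags under the names they carry in the file. The function `audit_role_map` is hypothetical, and the targets shown for Workbook and Worksheet are assumptions for the sake of the example; check the actual role map of your own file.

```python
# The forty-nine PDF 1.7 standard structure types, by tag name.
STANDARD_TAGS = {
    "Document", "Part", "Art", "Sect", "Div", "BlockQuote", "Caption",
    "TOC", "TOCI", "Index", "P", "H", "H1", "H2", "H3", "H4", "H5", "H6",
    "L", "LI", "LBody", "Lbl", "Table", "TR", "TH", "TD", "THead", "TBody",
    "TFoot", "Span", "Quote", "Note", "Reference", "BibEntry", "Code",
    "Link", "Annot", "Figure", "Formula", "Form", "NonStruct", "Private",
    "Ruby", "RB", "RT", "RP", "Warichu", "WT", "WP",
}


def audit_role_map(role_map: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map each role-map key to (target, description of the entry)."""
    report = {}
    for tag, target in role_map.items():
        if tag in STANDARD_TAGS:
            kind = "standard tag overridden"
        elif target in STANDARD_TAGS:
            kind = "custom tag"
        else:
            kind = "custom tag, nonstandard target"
        report[tag] = (target, kind)
    return report


# Hypothetical role map from an exported document.
role_map = {
    "Workbook": "Document",  # assumed target, not taken from a real file
    "Worksheet": "Part",     # assumed target, not taken from a real file
    "H6": "P",               # the band-aid: Heading Level 6 read as Paragraph
}

for tag, (target, kind) in audit_role_map(role_map).items():
    print(f"{tag} -> {target} ({kind})")
```

Flagging overridden standard tags separately is deliberate: those are the entries most likely to surprise the next person who opens the document.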
In Acrobat, the Role Map can be viewed and edited from ‘Edit Role Map…’ in the contextual menu of the Tags panel.
[1] Nonstructural elements (<NonStruct>) still contain tagged content and will still be picked up by assistive tech, etc. The nonstructural bit simply means that the tag has no semantic meaning. This, combined with role maps, can be useful for organizing tags without changing the structure, e.g. mapping <Chapter1>, <Chapter2>, etc. to <NonStruct>.