I’ve been increasingly interested in QR codes as of late, for reasons that Glenn Fleishman articulated far better than I could. I also just find them rather fascinating as a format. There’s a lot of redundancy to account for errors and damage (wonderfully demonstrated here), and a handful of possible masks that overlay all of the data means that the exact same data will have myriad possible representations.

I’ve also been curious about the most efficient ways to store and present the data (GIFs and 1-bit/pixel PNGs drawn at the pixel level and then scaled up seem pretty good), and got to wondering whether the Unicode Block Elements (U+2580–U+259F) would work. As above, they seem to, albeit with the entire block scaled to 60% vertical height and the line-height condensed. Hack (the monospace font I use on this site) renders U+2588, FULL BLOCK, as a quarter-sized block centered in the space it ought to fill entirely, so I substituted U+2593, DARK SHADE, which works. Interestingly, the squareness and contiguousness of the thing seems to be crucial for recognition, far more so than the integrity of the actual data within.
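For anyone who wants to poke at this themselves, here’s a minimal sketch of the approach, assuming the third-party Python qrcode package (the URL is just placeholder data):

```python
# Minimal sketch: render a QR code as Unicode characters in a terminal.
# Assumes the third-party "qrcode" package (pip install qrcode).
import qrcode

qr = qrcode.QRCode(border=1)
qr.add_data("https://example.com")  # placeholder data
qr.make(fit=True)

# get_matrix() yields rows of booleans (True = dark module), border included.
for row in qr.get_matrix():
    # U+2593 DARK SHADE instead of U+2588 FULL BLOCK, since some monospace
    # fonts (Hack among them) don't draw FULL BLOCK edge to edge; doubling
    # each character keeps the modules roughly square.
    print("".join("▓▓" if module else "  " for module in row))
```

Doubling each character is what keeps the modules roughly square in a terminal; swap in FULL BLOCK if your font of choice renders it properly.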
This was a truly pointless exercise, but hey, it’s a thing that can be done.
How we’re presented with data skews how we interpret that data. Anyone who has read a lick of Tufte (or simply tried to make sense of a chart in USA Today) knows this. Many of the most commonly encountered misrepresentations relate to scaling: volatility in a trend line minimized by a reduction in scale, perspective distortion in a 3D pie chart, a pictograph scaled in two dimensions such that its visual weight becomes disproportionate. Technically, the graphic may accurately line up with the values to be conveyed, but visually the message is lost.
While taking mandatory domestic violence training at work recently, I was thrown completely off by a statistic and its accompanying graphic. The graphic (albeit using masculine and feminine silhouette images) resembled:
…with the corresponding statistic that one in every four women and one in every seven men have experienced domestic violence. While the numbers made sense to me, it took me some time to grasp them, because the graphic was simply showing more men than women. It’s something akin to a scale problem: our brains are going to see the obvious numbers – four and seven – instead of readily converting the graphics into their respective percentages. How to fix this, then?
If we simply repeat our graphics such that the total number in either set is a common multiple, it becomes much simpler to process the information we’re supposed to process. We might not immediately recognize that seven in twenty-eight is the same as one in four, nor that four in twenty-eight is the same as one in seven, but we do know that seven is greater than four (the very comparison that caused the problem in the first place), and now we’re not dealing with mentally constructing percentages from a simple visual.
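A quick sketch of the arithmetic, with placeholder labels and symbols: repeat each pictograph out to the least common multiple of the denominators, and the filled counts become directly comparable.

```python
# Minimal sketch: redraw "1 in N" pictographs over a common denominator
# so the filled counts are directly comparable. Labels/symbols are placeholders.
from math import lcm  # Python 3.9+

def pictograph(ratios):
    common = lcm(*(n for _, n in ratios))  # 4 and 7 -> 28
    for label, n in ratios:
        filled = common // n
        row = "■" * filled + "□" * (common - filled)
        print(f"{label:>6}: {row}  ({filled} in {common})")

pictograph([("women", 4), ("men", 7)])
```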
My line of work involves a lot of mushing around the requirements of customers (who don’t know the Agency’s style requirements, web best practices, or CMS limitations) into something that comfortably adheres to those rules while still pleasing (or at least appeasing) said customer. Often, folks inexplicably want a handful of tiny, near-zero-content pages, when typically we try to present the content together, broken up by headers with IDs. I have to go through quite a few layers to communicate how their requirements will actually manifest, and the technical knowledge is reduced at each step along the way.
The term ‘anchors’ comes up in these instances, a lot. I understand that the term makes sense as far as relating something concrete to a structural abstraction – the ‘anchor’ is cast somewhere on the page, and you can just follow the corresponding line (link) to it. I also understand how this came to be from a historical tech perspective – as I understand it, Tim Berners-Lee envisioned a far more bidirectional system of linking, so our <a> (anchor) element would represent something more nodelike, an exit point and an entrance point.
But links don’t work that way on the internet as we know it. The <a> element is probably the least logically named element in modern HTML. But for a while, even though <a> elements still didn’t work that way, we had kind of a hack in place that accomplished the same goal. The HTML 3.2 specification tells us that “[Anchors] are used to define hypertext links and also to define named locations for use as targets for hypertext links”, with the distinction coming from whether an <a> element has an href or a name attribute. It wasn’t until HTML 4.0 that we even had an id attribute to use.
The two uses of the anchor element, while compatible in conjunction with one another (<a href="www.example.com" name="here">) in line with Berners-Lee’s vision, are still semantically very different functions. HTML 4.0 still encouraged the dual usage, but at least acknowledged that these were fundamentally different things: “The A element may define an anchor, a link, or both.” It no longer actually calls <a> an anchor, and instead states that the element has two distinct usages. Obviously this is not great as far as semantics are concerned, but more trouble comes when one starts to introduce styling.
HTML 4.0 brought with it the big push for stylesheets, the separation of presentation from structure. Of course, if you’re styling <a> to look like a link, all of your <a> elements being used as anchors now just appear to be nonfunctional links. The solution wrecked any sense of structural connection between the anchor and the text it represents – simply use an empty <a name="headline"> element in front of your text. This is clearly awful, and with the id attribute now present and sharing a namespace with name, entirely unnecessary.
HTML5 still supports this behavior, though it recommends against it. Anyone who cares at all about semantics or accessibility should recommend against it, too. The CMS I use at work has finally done away with it. And I think that as we slowly come to our senses about this, we should probably just do away with the term ‘anchor’ as well. The attribute is id; the hash in the URL denotes a ‘fragment identifier’. They’re a bit more jargonistic, but these are the terms I always try to use. There’s still a legacy connection between the word ‘anchor’ and the <a> element. And when dealing with folks who occasionally wind up changing things that they don’t really have the background to be changing, legacy language can lead to legacy behavior, as well as making it more difficult to search for the help they may need.
When I was in elementary school, I learned much of my foundation in computing on the Commodore 64. It was a great system to learn on, with lots of tools available and easy ways to get ‘down to the wire’, so to speak. Though it was hard to see just how limited the machines were compared with what the future held, some programs really stood out for how completely impossible they seemed. One such program was S.A.M. – the Software Automated Mouth, my first experience with synthesized speech.
Speech synthesis has come a long way since. It’s built into current operating systems, it can be had in IC form for under $9, and it’s becoming increasingly present in day-to-day life. I routinely use Windows’ built-in speech synthesizer along with NVDA as part of my accessibility-checking regimen. But I’m also increasingly dismayed by the egregious use of speech synthesis when natural human speech would not only suffice but be better in every regard. Synthesis has the advantage of being able to (theoretically) say anything without paying a person to do the job. I’m seeing more and more instances where this doesn’t pan out, and where the robot is truly bad at its job to boot.
Three examples, all train-related (I suppose I spend a lot of time on trains): the new 7000 series DC Metro cars, the new MARC IV series coach cars, and the announcements at DC’s Union Station. None of these need to be synthesized. They’re all essentially announcing destinations – they have very limited vocabularies and don’t make use of the theoretical ability to say anything. Union Station’s robot occasionally announces delays and the like, but often announcements beyond the norm revert to a human. Metro and MARC trains only announce stops and have demonstrated no capacity for supplemental speech. Where old and new cars are paired, conductors/operators still need to make their own station stop announcements.
So these synthesizers don’t seem to have a compelling reason to exist. It could be argued that human labor is now potentially freed up, but given the robots’ limited vocabularies and grammars, the same thing could be accomplished with human voice recordings. I can’t imagine that hiring a voice actor, plus software to patch the speech together into meaningful grammar, would be appreciably more expensive than the robot. In fact, before the 7000 series Metro cars, WMATA used recordings to announce door openings and closings; they replaced these recordings in 2006, and the voice actor was rewarded with a $10 fare card.
Aside from simply not being necessary, the robots aren’t good at their job. This is, of course, bad programming – human error. But it feels like the people in charge of the voices are so far detached from the final product that they don’t realize how much they’re failing. The MARC IV coaches are acceptable, but their grammar is bizarre. When the train is coming to a station stop, an acceptable thing to announce might be ‘arriving at Dickerson’, which is in fact what the conductors tend to say. The train, instead, says ‘this train stops at Dickerson’, which at face value says nothing beyond the fact that the train will stop there at some point. It’s bad information, communicated poorly. Union Station’s robot has acceptable grammar, but she pronounces the names of stations completely wrong.

Speech synthesizers generally have two components: the synthesizer proper, which knows how to make phonemes (the sounds that make up our speech), and a layer that translates the words of a given language into those phonemes. My old buddy S.A.M. had the S.A.M. speech core, plus Reciter, which looked up word parts in a table to convert them to phonemes. This all had to fit into considerably less than 64K, so it wasn’t perfect, and (if memory serves) one could override Reciter with direct phonemes for mispronounced words. Apple’s say command (well, their Speech Synthesis API) allows on-the-fly switching between text and phoneme input using [[inpt TEXT]] and [[inpt PHON]] within a speech string. So again, given just how limited the robot’s vocabulary is (none of these trains are adding station stops with any regularity), someone should have been able to review what the robot says and suggest overrides. Half the time, this robot gets so confused that she sounds like GLaDOS in her death throes.
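As a concrete (and purely hypothetical) illustration of what such an override might look like, here’s a sketch using the macOS say command. The [[inpt PHON]] and [[inpt TEXT]] embedded commands are Apple’s, but the phoneme spelling of ‘Dickerson’ below is illustrative rather than a verified transcription, and the lookup table is invented:

```python
# Hypothetical sketch: hand-tuned pronunciation overrides for a fixed list of
# station names, spoken via the macOS `say` command. The phoneme string is
# illustrative, not a verified Apple phoneme transcription.
import subprocess

PHONEME_OVERRIDES = {
    # Names the synthesizer mangles, mapped to explicit phoneme input.
    "Dickerson": "[[inpt PHON]] dIHkAXrsAXn [[inpt TEXT]]",
}

def announce(station: str) -> None:
    spoken = PHONEME_OVERRIDES.get(station, station)
    subprocess.run(["say", f"Arriving at {spoken}"], check=True)

announce("Dickerson")
```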
Which brings me to my final point – the robots simply aren’t human. Even when they are pronouncing things well, they can be hard to understand. On the flipside, the DC Metro robot sounds realistic enough that she creeps me the hell out, which I can only assume is the auditory equivalent of the uncanny valley. I suppose a synthesized voice could have neutrality as an advantage – a grumpy human is probably more off-putting than a lifeless machine. But again, this is solvable with human recordings. I cannot imagine any robot being more comforting than a reasonably calm human.
Generally speaking, we’re reducing the workforce more and more, replacing it with automation and machinery. It’s a necessary progression, though I’m not sure we’re prepared to deal with the unemployment consequences. It’s easy to imagine speech synthesis as a readily available extension of this concept – is talking a necessary job? But human speech is seemingly being replaced in instances where the speaking does not actually replace a human’s job and/or a human recording would easily suffice. In some instances, the speech being replaced is merely one component of a larger job being replaced – take self-checkout machines (which tend to use human recordings despite the fact that grocery store inventories are far more volatile than train routes, hence ‘place your… object… in the bag’). But I feel like I’m seeing more and more instances of speech synthesis that is demonstrably worse than a human voice and serves no apparent purpose (presumably beyond lining someone’s pockets).
By now we all know that Twitter has killed off Vine, or is slowly killing off Vine, or has killed off part of Vine and will kill off the rest of it in the future. My initial reaction to this was pure joy, for I have long hated Vine. That enthusiasm was tempered by promptly hearing from source after source how Vine was a huge creative outlet for oft-ignored black youth. That my experiences never crossed paths with this version of Vine is purely a failing on my part, plain and simple. A wake-up call to attempt to be less complacent and lazy in my media consumption.
If I were left to my own personal experiences with Vine, however, I would still be delighted with the news. This is because, put simply, I have never watched a Vine and felt like I got anything out of the video that I did not get out of the screencap. This is not a problem unique to Vine by any means; it seems that, increasingly, we live in a world where video is considered the most captivating medium, and thus all content should be video. Rather than letting a creative work dictate its own medium and leaving the excitement factor as the responsibility of the creator, video checks off that box from the get-go. I guess if audiences are largely eating it up, then that’s true enough and fair enough. But I wonder just how many people clicked Vine after Vine and felt that they weren’t getting an appropriate return on their time investment.