The voice of a wizard hacking away

My pals at Sandy Pug Games have opened up preorders for WIZARDPUNK, a zine of various wizard stories and whatnot. It’s full of brilliant work, and I highly recommend checking it out! I have a little epistolary slice-of-life piece in it, which I’m honestly pretty proud of. In addition to this, I was asked if something rather curious was possible, if there was any way some audio-producing computer code could be squished down to a reasonable size such that someone could theoretically type it in.

VTL-2: golfing and preservation

I’ve been playing with Gary Shannon and Frank McCoy’s minimalist programming language from the ‘70s, VTL-2 PDF as of late. Written for the Altair 8800 and 680 machines, it was designed around being very small – the interpreter takes 768 bytes in ROM. It has quite a few tricks for staying lean: it assumes the programmer knows what they’re doing and therefore offers no errors, it uses system variables in lieu of many standard commands, and it requires that the results of expressions be assigned to variables. It is in some ways terse, and its quirks evoke a lot of the fun of the constrained languages of yore. So, I’ve been playing with it to come up with some solutions for challenges on Programming Puzzles and Code Golf Stack Exchange (PPCG)1.

I mentioned that it is in some ways terse, and this is a sticking point for code golf. VTL-2 is a line-numbered language, and lines are rather expensive in terms of byte count. Without factoring in any commands, any given line requires 4 bytes: (always) two for the line number, a mandatory space after the line number, and a mandatory CR at the end of the line. So, at a minimum, a line takes 5 bytes. This became obvious when golfing a recent challenge:

3 B=('/A)*0+%+1

saved a byte over

3 B='/A
4 B=%+1

These almost certainly look nonsensical, and that is largely because of two of the things I mentioned above: the result of an expression always needs to be assigned to a variable, and a lot of things are handled via system variables instead of commands. For example, ' in the code above is a system variable containing a random number. There is no modulo nor remainder command, rather % is a system variable containing the remainder of the last division operation. Thus originally, I thought I had to do a division and then grab that variable on the next line. As long as the division is performed, however, I can just destroy the result (*0) and add the mod variable, making it a single shot. It’s a waste of our poor Altair’s CPU cycles, but I’m simulating that on modern x64 hardware anyway. And despite the extra characters, it still saves a byte2.

Other notable system variables include ? for input and output:

1 A=?
2 ?="Hello, world! You typed "
3 ?=A

Line 1 takes input – it assigns variable A to the I/O variable, ?. Line 2 prints “Hello, world! You typed &rquo;, and then line 3 prints the contents of variable A. Lines 2 and 3 assign values to the I/O variable. The system variable # handles line numbers. When assigned to another variable (I=#), it simply returns the current line number. When given an assignment (#=20), it’s akin to a GOTO. The former behavior seems like it could come in handy for golf: if you need to assign an initial value to a variable anyway, you’re going to be spending 4 bytes on the line for it. Therefore, it may come in handy to, say, initialize a counter by using its line number: 32 I=#.

Evaluation happens left-to-right, with functional parentheses. Conditionals always evaluate to a 1 for true and a 0 for false. Assigning the line number variable to a 0 in this way is ignored. With that in mind, we can say IF A==25 GOTO 100 with the assignment #=A=25*100. A=25 is evaluated to a 1 or a 0 first, and this is multiplied by 100 and # is assigned accordingly. ! contains the last line that executed a #= plus 1, and therefore #=! is effectively a RETURN.

There’s obviously more to the language, which I may get into in a future post3. Outside of the syntactical quirks which make it interesting for hobbyist coding, the matter of running the thing makes it less than ideal for programming challenges. Generally speaking, challenges on PPCG only require that a valid interpreter exists, not that one exists in an online interpreter environment such as Try It Online (TIO). In order to futz around in VTL-2, I’m running a MITS Altair 8800 emulator and loading the VTL-2 ROM. TIO, notably, doesn’t include emulation of a machine from the ‘70s with a bundle of obscure programming language ROMs on the side.

This brings me to my final point: how much effort is being put into preserving the lesser-known programming languages of yore, and how much should be? I personally think there’s a lot of value in it. I’ve become so smitten with VTL-2 because it is a beautiful piece of art and a brilliant piece of engineering. Many languages of that era were, by the necessity of balancing ROM/RAM limitations with functionality and ease of use. Yet, there’s no practical reason to run VTL-2 today. It’s hard to even justify the practicality of programming in dc, despite its continued existence and its inclusion a requirement for POSIX-compliance. New esoteric languages pop up all the time, often for golfing or for sheer novelty, yet little to no effort seems to be out there to preserve the experimental languages of yesteryear. We’re seeing discussions on how streaming gaming platforms will affect preservation, we have archive.org hosting a ton of web-based emulators and ROMs, we have hardware like Applesauce allowing for absolutely precise copies to be made of Apple II diskettes. Yet we’re simply letting retro languages… languish.

To be clear, I don’t think that this sort of preservation is akin to protecting dying human languages. But I do think these forgotten relics are worth saving. Is “Hello, world!” enough? An archive of documentation? An interpreter that runs on an emulator of a machine from 1975? I don’t know. I don’t feel like I have the authority to make that call. But I do think we’re losing a lot of history, and we don’t even know it.


Interpreting 69lang (a ;# dialect) in dc

PPCG user caird coinheringaahing came up with a language, ;#, and a code golf challenge to implement an interpreter thereof. The language has two commands: ; adds one to the accumulator, # mods the accumulator 127, outputs its ASCII counterpart, and resets the accumulator. It is, quite clearly, a language that only exists for the sake of this challenge. I mostly do challenges that I can do in dc, which this challenge is not (dc simply cannot do anything with strings).

What I can do is establish a dialect of ;# wherein the commands are transliterated to digits1. I’ve opted for 6 and 9, because I am a mature adult. So, assuming the input is a gigantic number, the challenge is fairly trivial in dc: 0sa[10~rdZ1<P]dsPx[6=Ala127%P0sa]sR[la1+saq]sA[lRxz0<M]dsMx

0sa initializes our accumulator, a, to zero. Our first macro, [10~rdZ1<P]dsPx breaks a (presumably very large) number into a ton of single-digit entries on the stack. ~ yields mod and remainder, which makes the task quite simple – we keep doing ~10, reversing the top-of-stack, and checking the length of our number. Once it’s down to a single digit, our stack is populated with commands.

The main macro, [lRxz0<M]dsMx runs macro R, makes sure there are still commands left on the stack, and loops until that is no longer true. R, that is, [6=Ala127%P0sa]sR tests if our command is a 6 (nee ;), and runs A if so. A has a q command in it that exits a calling macro, which means everything else in R is essentially an else statement. So, if the command is a 9 (or, frankly, anything but a 6), it does the mod 127, Print ASCII, and reset a to zero stuff. All we have left is [la1+saq]sA, which is macro A, doing nothing but incrementing a.

66666666666666666666666666666666666666666666666666666666666666666666666696666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666696666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666669666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666966666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666696666666666666666666666666666666666666666666696666666666666666666666666666666696666666666666666666666666666666666666666666666666666666666666666666666666666666666666669666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666966666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666696666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666669666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666696666666666666666666666666666666669
0sa[10~rdZ1<P]dsPx[6=Ala127%P0sa]sR[la1+saq]sA[lRxz0<M]dsMx

Hello, World!

Antiquine

A quine is a program that does nothing but output its own source code. In various programming challenges (notably, on PPCG) the word is often generalized to refer to those which are somehow related to this behavior – print the source code backward, input n and print n copies of the source, etc. An interesting challenge from 2013 floated to the top of PPCG this week a few weeks ago (I’ve been sitting on this post for a while), Print every character your program doesn’t have. While I don’t particularly feel the need to dive into everything I attempt on PPCG, this was a very interesting challenge in how seemingly trivial decisions that appear to shrink the code could have very well ended up growing it.

The premise was, for the printable ASCII set, print every character not present in your source code. In dc, the sort of baseline solution to beat is one which simply contains and comments out every single printable ASCII character, # !"$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ – 95 bytes. So that’s the thing to beat. Even if I couldn’t beat it, I’m not sure I would have submitted that – it’s just so boring1. My strategy was to:

The program, then, starts by pushing a list of ASCII code points that we need to exclude onto the stack, which I’ll get to after the fact. The meat of the program is [l.dP]s;34s.[dl.!=;s,l.1+ds.127>/]ds/x. This defines two macros, ; and / and runs / (as main essentially). Originally I had thought I would name all of my registers (macros included) with characters otherwise already used in the code, so as to not add bytes to the exclusion list. But because I (probably) needed every digit between 0 and 9, my list was already going to be partially incrementally generated, and since I needed > and < which are just a few code points away from 9, it was actually beneficial to introduce the ‘missing’ characters into the code to factor more into the incremental generation process. Theoretically, I could comment out some of these characters, but then I’d have to add the comment character, # to the list as well.

The macro / is rather straightforward. It duplicates top of stack, compares it to the counter ., runs the printing macro ; if they don’t match, ‘drops’2 the top of stack (which has an extra value on it if ; ran), increments the counter ., and continues running as long as we’re still below 127. The macro ; does nothing save for printing the ASCII value of . and leaving some padding on the stack – since / always deletes an item from the stack, we want it to be unimportant unless it’s removing an item from the list of exclusions.

Our list of exclusions is 120d115 112 108 100 93 91 80 62[d1-d48<:]ds:x45 43. We’re printing from low to high, so we make sure that from top to bottom, the stack is also arranged low to high. Since we compare our current ASCII value to this list of exclusions every time, we need to make sure there’s always something on the stack to compare to, so our largest/last value is duplicated. There’s a little decrementing macro in there to automatically push every value from 62 to 48 – the digits 0-9 as well as the aforementioned nearby characters that I used for macro names, etc. I tried generating the list in a handful of different bases – either to reduce bytes with fewer digits or to reduce exclusions with fewer possible numerals. In every case, the code to switch bases plus the overhead of adding that command to the exclusions list made it longer than simply using decimal.

This was an interesting challenge, one that I didn’t readily think I could beat the straightforward solution to. Many of the things that worked in my favor were counterintuitive – like introducing more characters that I had to exclude.


Golfing in Eukleides

Eukleides is decidedly not a golfing language, but when a geometry-related question came up on PPCG, I had to give it a shot. Eukleides can be quite rigid; for starters it is very strongly typed. Functions are declared by what they intend to return, so while set would be the shortest way to declare a function, it can’t really be exploited (unless returning a set is, in fact, desired). Speaking of sets, one thing that is potentially golfable is ‘casting’ a point to a set. Given a point, p, attempting a setwise operation on p will fail because a point does not automatically cast to a set (strict typing). p=set(p) will overwrite point p with a single-item set containing the point that was p. If, however, it is okay to have two copies of the point in the set, p=p.p is three bytes shorter.

If user input is required, the command number("prompt") reads a number from STDIN. The string for the prompt is required, though it can be empty (""). Thus, if more than four such inputs (or empty strings for other purposes) are required, it saves bytes to assign a variable with a one-letter name to an empty string.

Whitespace is generally unavoidable, but I did come to realize that boolean operators do not need to be preceded by whitespace if preceded by a number. So, if a==7or a==6 is perfectly valid. aor a, 7ora, 7 or6 are all invalid, however. This may be an interpreter bug, but for the time being it is an exploitable byte.

Finally, loci. Loci are akin to for-loops that automagically make sets. Unfortunately, they don’t seem to care much for referencing themselves mid-loop, which meant that I couldn’t exploit how short of a construction they are compared to manually creating a set in a for-loop.

This was a fun exercise, and just goes to show that if you poke around at any language enough, you’ll find various quirks that may prove useful given some ridiculous situation or another.


Nearest Fibonacci number in dc

I hadn’t really exercised my code golf skills in a while, but saw a fun challenge today that seemed like an easy solve, well-suited for dc. The challenge was, given a positive integer, return the nearest Fibonacci number. The Fibonacci sequence is fun on its own in a stack language; in dc one step can be performed with something like dsf+lfr. Given that the stack is prepared with two integers, this will duplicate the top, store it in an intermediate register f, add the two previous values, push f back onto the stack, and then swap them such that they’re in the correct order. It doesn’t pollute the stack, either, these two values are all that remain.

For this challenge, those two values are all I ever need – my input (which I will refer to as i) is always going to fall between two consecutive numbers in the Fibonacci sequence (or on one of them, in which case I would only need that value to test against). Keeping as much work on the stack as possible is ideal when golfing in dc because the byte cost of working with registers adds up quickly. So my strategy is to seed the Fibonacci generator with two 1s, and run it until the larger of the two Fibonacci numbers is greater than i. One of those two Fibonacci numbers will be the right one, and if i happened to be a Fibonacci number, I’ve just generated an extra one for no reason. I convert both of the Fibonacci numbers to their respective differences from i. Since I know for a fact that the top of the stack is greater than i, and the second value on the stack is either less than or equal to i, I don’t have to worry about dc’s lack of an absolute value mechanism; I simply subtract i from the big one and subtract the small one from i. Since I know which difference is which, I have no need to retain the Fibonacci numbers. I simply compare the differences, and then reconstruct the Fibonacci number by adding or subtracting the difference to i depending on which difference ‘won’. The code:

?si1d[dsf+lfrdli>F]dsFxli-rlir-sd[lild-pq]sDdld<Dli+p

…and the explanation:

?si                  Store user input in register i
1d                   Push 1 to stack, duplicate it to seed Fibonacci sequence
[dsf+lfrdli>F]dsFX   Macro F: Fibonacci generator as described above, with the
                       addition of loading register i and continuing to run F
                       until the top-of-stack Fibonacci number is larger than i
li-                  Turn our high Fibonacci number into its difference
rlir-                Swap the stack, and turn our low Fibonacci number into its
                       difference. The stack stays swapped, but it doesn't 
                       matter
sd                   Store our low Fibonacci difference in register d
[lild-pq]sD          Macro D: reconstruct the low Fibonacci number by 
                       subtracting d (the difference) from i; print the result
                       and quit. Macro is stored for later use.
dld<D                Duplicate the high Fibonacci difference and push the low 
                       Fibonacci difference onto the stack. Compare them and run
                       D if the low difference is less than the high difference
li+p                 Else, add i to the high difference and print the result

wo: A truth machine in wo3

In wo3, we get two extra instructions for a total of four. OISCs can be Turing complete, so four instructions should be enough to give us some brunt. At this point, some amount of math is probably a good idea. Instruction three, then, is ADD, which does what you think it does. Since we’re capable of entering negatives, we can pretty much get to any number we want at this point, albeit terribly inefficiently.

Branching would be very convenient at this point as well, so a simple SKIPZERO instruction gets the no. 4 position; it pops a value from the stack and if that value is zero, it skips until past the next instruction. All odd (input) words are skipped until an even (instruction) word is encountered, at which point that is skipped and the next word is interpreted. Skip if zero is a standard simple conditional branch, but I’ll likely change this to skip if (skip register), so a user can theoretically set the branch condition via a register (with a command certain to be available in wo4).


wo: Numbers

A draft of this post began:

A few thoughts on numbers in wo. Numbered input is pushed to the stack via LSB==1. LSB==0 signifies an instruction, so in either case, effective word size is 1 bit less than total word size. The stack itself won’t care what sort of numbers it’s holding, input is the tricky part. wo3 would allow you to enter the integers 0-3, or 2-1 depending on unsigned vs. signed input. wo4 0-7 or 4-3. These aren’t big numbers to work with, so getting instructions in early to allow for more complicated input would be welcome. The easiest solution here is simply getting the WORDSIZE instruction out of the way as early as possible, ideally in two bits. With signed input, this would only get a user up to a 4-bit word size with 6 bits of input. Unsigned, a user could get up to 6-bit word size in the same 6-bit input, which raises the question of whether allowing a single word input of 1 is more important than more immediate access to slightly higher integers. Signed input would reduce the need for a sign-changing instruction at the low end of the set, and there are little hacks that could be put in place – for example, if 1s’ complement was used instead of 2s’, 0 could act as a stand-in for the next highest byte size or the like. Thus 6 bits of input could immediately kick word size up to a full byte.

As outlined in this post killing off WORDSIZE, I have changed my thoughts on that matter. I do think input will (by default) be 1s’ complement, with 0 receiving special treatment, what I am internally referring to as the Magic Number or MNum. MNum would allow some instructions to serve multiple purposes, which could theoretically put us on the path to larger integers at lower word sizes. Additionally, MNum can act as a stand-in for a predetermined value for other instructions, again opening up some integer growth options.


wo: Word size

A few things have occurred to me regarding the WORDSIZE instruction in wo. Namely, that changing word size in the middle of a program could be entirely unpredictable – my first thought would be to break every word out into an array before interpreting (aside from the WORDSIZE instruction itself), but that would be impossible – the stack would necessarily have to be computed before WORDSIZE could execute. Even before this, I was wondering if I should really prioritize it so low in the instruction set, or if it just turns into an abuse of a reduced instruction set claim.


wo: Introduction

I’ve been thinking a lot about languages designed for code golf and programming puzzles/challenges. They run no risk of extinction — seems a new utterly inexplicable dialect is always freshly hatching. And here I am, about to fan these flames with my own bonkers idea. I call it wo1. Here’s the thing – there will be several posts in this series, as I figure out how I would ideally like this thing to work. There will be a lot of theoretical code. But it is very possible that I will never actually bother writing a reference implementation for it. I’d like to think that I will, but I’m really not a great programmer and the fun for me is largely in the thought experiment, the puzzle.

There are a few key points that I think will make this language unique. First and foremost, it’s based on binary instructions with variable word sizes. The interpreter should set the initial word size based on invocation: wo4 would start with 4 bit words, wo7 with 7 bit words, etc. Smaller word sizes mean more instructions per byte, but fewer instructions are available (and smaller numbers can be pushed to the stack). It is postfix, stack-based, influenced by RPN, RPL, Forth, and dc. The main stack is initialized at zero, and can never be empty. I am very seriously considering a stack that grows but does not shrink (aside from clears) – drop operations would duplicate the bottom of the stack like RPN. Word size would be resettable via a relatively low (preferably 3-bit) instruction. A separate stack would handle certain scratch activities like string manipulation. Least Significant Bit (LSB) would determine stack input vs. instruction execution. Nearly every instruction would be rewritable (don’t need GOTO? plop something else in there before dropping your word size and taking care of business). That pretty much sums up the core philosophy/principals, the big remaining question is implementation — primarily, instruction set.


Collatz sequences in dc

Inspired by this recent post and code snippet by John Gruber, I decided to have a go at outputting Collatz conjecture sequences in dc. Running

    dc -e '?[r3*1+]so[d2~1=opd1<x]dsxx'

from the command line will take a single value as input, and run the sequence. ? takes input, puts it on the stack. [r3*1+]so is our macro for n3+1, what we do when we encounter an odd value. [d2~1=opd1<x]dsxx is our main macro, as well as our macro for handling even numbers (n/2). First it duplicates the value on the stack (last iteration) and divides by 2 to return the quotient and the remainder. 1=o uses the remainder to test for oddness and run o if true. Here I do something rather lazy for the sake of concise code: that initial duplication was solely for the sake of oddness. o swaps the top of the stack to get that value back, then multiplies by three and adds one, leaving this value on the stack. Evenness will never run this, and since the test for evenness leaves the outcome for evenness on the stack (what luck!), and the initial duplication is below it. Either way, the bottom of the stack is going to fill up with a lot of garbage, which should be inconsequential unless our sequences become absurdly long. At this point, the top of our stack is the result of the current step, so we print it, duplicate it, and run x if it’s greater than 0. Finally, we duplicate our main macro, save it as x, and then execute the copy we left on the stack.


dc as a code golf language

Code golf is a quirky little game – complete the challenge at hand in your preferred programming language in as few bytes as possible. Whole languages exist just for this purpose, with single-character commands and little regard for whitespace. dc always seemed like a plausible language for these exercises, and I recently attempted a few tasks which I would not ordinarily use dc for, notably 99 Bottles of Beer.