The Grumpy Generalist

What does the perfect programming language look like? Certainly every programmer has a language they favor, but can we come up with an objective measure of quality and apply it directly to language syntax? It's likely that no single language can please everyone, and there are certainly many aspects of language success and practicality beyond syntax alone. But I believe that we can still endeavor to objectively describe at least fragments of the perfect programming language's syntax.

When writing programs, the programmer should have a short list of goals. Correctness is always the highest priority, followed by maintainability and runtime efficiency in some order. Of these three goals, correctness and maintainability can be highly influenced by programming language syntax. Thus the goal of the perfect language's syntax should be to allow the programmer to specify the behavior of the program while minimizing logical errors and maximizing readability.

I want to spend these next few posts examining some specific constructs that are commonly present in programming languages and evaluate how they measure up against this goal. To kick off the series, we'll examine a rather fundamental language concept: variables.

Declaration vs Initialization

Many languages separate the concepts of variable declaration and variable initialization. Most often the declaration step appears to be necessary to establish a variable's valid scope, while the initialization step is necessary to ensure that the variable contains useful data. Unfortunately this strategy in many languages can lead to the existence of uninitialized variables, which create a source of error that can become quite painful if not dealt with appropriately.

These languages are doing a poor job of using their syntax to minimize logical errors. A more interesting idiom used by many dynamically typed and functional languages is to combine the declaration and initialization steps. There can be no variable without an initialization, and therefore the uninitialized variable has been completely removed as a source of error at the syntax level.

But a number of these languages also take it a step further. When variable initialization and declaration are combined, the type of the variable can always be inferred to be the type of the expression used to initialize it. The explicit type declaration becomes redundant, forcing the programmer to tell the compiler or interpreter something it already knows. When choosing between a machine that always knows the correct answer and a human, the desire to minimize logical errors will lead one to choose the machine every time.

Multiple Initialization

Many programmers find that occasionally they need to write a function that returns two values rather than one. It seems that many early language designs didn't account for this, but workarounds quickly popped up. How many times have you seen this idiom in C or C++ code?

int a;
int b = myFunction (&a);

The programmer decided that myFunction needed to return two separate values to the caller, yet the C and C++ languages can make this difficult to do cleanly. The programmer is left to choose between creating an uninitialized variable or initializing with a temporary value that will immediately be overwritten.

Furthermore, there's no guarantee that the function will actually write a value into our a variable. It seems that our ideal language needs some sort of explicit syntactical support for initializing multiple variables with a single expression.

Python, among other languages, handles this with a tuple type that can be expanded at initialization time. To create our two variables a and b in Python, one might use this code:

(a, b) = myFunction()

This syntax works well, and some similar functionality will be a requirement of our ideal language. We can support not only one or two, but potentially infinite return values from a function. We can guarantee syntactically that the function returns the expected number of values and no less. We can determine the proper type of each variable as part of the assignment, and we still have no way to create an uninitialized variable.

Type Conversions and Undefined Behavior

We have a lot for which to thank dynamic languages, but if our goal is to use our language's syntax to minimize logical errors then we're left with no reasonable choice but strong static typing. The perfect language can't afford to defer error checking til runtime as is often the case in dynamic languages.

Much like uninitialized variables, implicit type conversions are a source of error that can be entirely eliminated by careful language design. A language offering implicit conversions is presumably optimizing for authoring efficiency. While the ability to create programs quickly is important, I believe it is nowhere near as important as program correctness.

Similarly, undefined or unspecified behavior is not a sensible inclusion in our language. Why add behavior that's syntactically valid but logically invalid? At that point one has to wonder if the language designer is trying to maximize rather than minimize sources of error. If behavior must be undefined, it should be rejected outright by the compiler or interpreter.

Mutability vs Immutability

Functional languages appear to highly value the concept of immutable state. Immutable variables are those which cannot have their value changed after assignment. To fans of imperative languages this often seems like nonsense, but it's not so far-fetched. If you're writing in C or C++, there's a good chance that your compiler is translating your code into SSA form anyway, essentially turning all of your variables into immutable ones.

I believe that the perfect language should strongly encourage the programmer to use immutable data as often as possible. Mutable variables simply have more operations that can be performed on them, implying that they provide the programmer with more opportunity to perform incorrect operations.

Can we ever completely remove the concept of mutability from a programming language? There's a lot of research around this, and almost cult-like fanaticism around some of the proposed solutions. I'm not convinced, but at a bare minimum the ideal language should provide clear and unambiguous support for immutable variables and clearly distinguish them from mutable ones.

Conclusion

There's certainly a lot more to a programming language than just variables, and we haven't even touched on readability yet. But I hope that even this short trip through language design has been interesting. How does your favorite language stack up? What about the language you work in the most? None of mine fare particularly well so far, but I'm anxious to continue the analysis in future posts. There's a lot of ground to cover and I hope you'll stay with me.

Your assignment: Write a ZMachine capable of playing the original Zork.

The game Zork was published in 1980 and remains one of the most loved interactive fiction games of all time. The game was originally written in a variant of MIT's MDL language, and run in a virtual machine known as the ZMachine.

While I was familiar with Zork, my first real exposure to the ZMachine itself came when we decided to include a working version of Zork as an easter egg in the game Call of Duty: Black Ops. We couldn't find an existing ZMachine with licensing that met our requirements, so I sat down one weekend and started to write one from scratch. Another engineer also got involved, and in a very short time our easter egg was working perfectly. We introduced a new generation to a wonderful game, and as a happy side-effect our implementation probably ended up becoming the most commercially successful one of all time.

Since then, writing a ZMachine has become my default project for messing around in new programming languages or just getting myself out of a creative funk. There are a number of reasons that I think this project is great for programmers of all experience levels.

It's fun!

I'm not sure why, but I always end up having a blast with this project. Maybe it's the VM itself, which has a very nice overall design. Maybe it's the fact that you're seeing a piece of gaming history come to life before your eyes. Maybe it's just the fact that you're playing a classic of the interactive fiction genre. But whatever the reason, it's a ton of fun.

There's nothing quite like finally implementing the last instruction and seeing the classic Zork intro pop up on screen in an interpreter that you just finished writing. I'm on my third ZMachine implementation and it's still incredibly rewarding every time I see that first prompt.

It's challenging, but not difficult.

The ZMachine itself is a relatively straightforward piece of software, but it includes some complex interactions that present an interesting design challenge. Looking at how you solve these problems and comparing with others can give you some interesting insight into how you write software and how you can improve the ways you design code.

I feel like system and code design is one of the areas in which novice and beginning programmers are weakest. For more experienced engineers, maybe we spend more time fixing bugs or maintaining existing code than we do actually thinking out how a new system will work. Writing a ZMachine exercises those muscles. I find it valuable, especially when working in new languages where idiomatic code may look different than what I'm used to writing.

It's not time-consuming... unless you want it to be.

A good ZMachine can be written in a weekend, or more likely spread out across a few of them if you don't want to rush it. But it can become complicated if that's what you want.

Did you enjoy writing an interpreter? How about writing a debugger? What about writing a compiler for the original Zork MDL source? What about writing your own interactive fiction game, toolkit, or VM?

Want to ease into a new UI toolkit? Hook your ZMachine up to it. Want to play around with an IRC bot or web service? Have it run the ZMachine. There are thousands of ways to branch this project into a series of others.

Interested?

Here are some resources to help you get started.

Z-Machine Standards Document - This is your bread and butter. The 1.1 standard is hard to read and there are a few non-obvious things about how some of the instructions work, but this is the best place to start. If you are just interested in running Zork, you can ignore everything that's higher than Version 3.
ZTools - The disassembler provided by this toolkit can prove quite helpful in checking things out.

You'll also need a copy of zork.z3 or another Version 3 binary, but I can't distribute any of them here.