Tuesday, August 19, 2014

Chasing the Perfect Programming Language

What does the perfect programming language look like?  Certainly every programmer has a language they favor, but can we come up with an objective measure of quality and apply it directly to language syntax?  It's likely that no single language can please everyone, and there are certainly many aspects of language success and practicality beyond syntax alone.  But I believe that we can still endeavor to objectively describe at least fragments of the perfect programming language's syntax.

When writing programs, the programmer should have a short list of goals.  Correctness is always the highest priority, followed by maintainability and runtime efficiency in some order.  Of these three goals, correctness and maintainability can be highly influenced by programming language syntax.  Thus the goal of the perfect language's syntax should be to allow the programmer to specify the behavior of the program while minimizing logical errors and maximizing readability.

I want to spend the next few posts examining some specific constructs that are common in programming languages and evaluating how they measure up against this goal.  To kick off the series, we'll examine a rather fundamental language concept: variables.

Declaration vs Initialization


Many languages separate the concepts of variable declaration and variable initialization.  Most often the declaration step is needed to establish the variable's scope, while the initialization step ensures that the variable contains useful data.  Unfortunately, in many of these languages the separation makes uninitialized variables possible, and reading one before it has been assigned is a source of error that can become quite painful if not dealt with appropriately.

These languages are doing a poor job of using their syntax to minimize logical errors.  A more interesting idiom used by many dynamically typed and functional languages is to combine the declaration and initialization steps.  There can be no variable without an initialization, and therefore the uninitialized variable has been completely removed as a source of error at the syntax level.
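As a rough illustration in Python, one of the dynamically typed languages that merges the two steps, a variable comes into existence only when it is assigned a value.  The name never_assigned below is made up for the example and is deliberately left undefined.

```python
# Python binds a name and gives it a value in a single statement;
# there is no syntax for declaring "total" without also initializing it.
total = 0
for price in (3, 4, 5):
    total += price  # rebinds a name that already holds a value

# Referring to a name that was never bound is rejected with NameError,
# rather than yielding an indeterminate value as an uninitialized C local can.
try:
    print(never_assigned)  # deliberately undefined, for illustration
except NameError:
    unbound_name_rejected = True
```

Python's guarantee is weaker than a compile-time one: a local assigned on only one branch of an if statement can still be unbound when it is read, which raises UnboundLocalError at runtime.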

But a number of these languages also take it a step further.  When variable initialization and declaration are combined, the type of the variable can always be inferred to be the type of the expression used to initialize it.  The explicit type declaration becomes redundant, forcing the programmer to tell the compiler or interpreter something it already knows.  When choosing between a machine that always knows the correct answer and a human, the desire to minimize logical errors will lead one to choose the machine every time.
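A small Python sketch of the same idea.  Python determines types at runtime rather than at compile time, but static checkers such as mypy infer the same types from these initializers, so the annotation on the last line tells the tool nothing it could not already work out.

```python
# The type of each variable follows from the expression that initializes it.
count = 0                 # int
ratio = 3 / 4             # float: true division always produces a float
names = ["ada", "grace"]  # list of str

# An explicit annotation only repeats what the initializer already says.
count_annotated: int = 0
```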

Multiple Initialization


Many programmers find that occasionally they need to write a function that returns two values rather than one.  It seems that many early language designs didn't account for this, but workarounds quickly popped up.  How many times have you seen this idiom in C or C++ code?

int a;
int b = myFunction (&a);

The programmer decided that myFunction needed to return two separate values to the caller, yet the C and C++ languages can make this difficult to do cleanly.  The programmer is left to choose between creating an uninitialized variable or initializing with a temporary value that will immediately be overwritten.

Furthermore, there's no guarantee that the function will actually write a value into our variable a.  It seems that our ideal language needs some sort of explicit syntactic support for initializing multiple variables with a single expression.

Python, among other languages, handles this with a tuple type that can be expanded at initialization time.  To create our two variables a and b in Python, one might use this code:

(a, b) = myFunction()

This syntax works well, and some similar functionality will be a requirement of our ideal language.  We can support not only one or two, but any number of return values from a function.  We can guarantee syntactically that the function returns exactly the expected number of values, no more and no fewer.  We can determine the proper type of each variable as part of the assignment, and we still have no way to create an uninitialized variable.
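A slightly fuller Python sketch.  min_max is a hypothetical stand-in for myFunction; divmod is a real built-in that returns a (quotient, remainder) tuple.  Python checks the number of values when the line runs rather than at compile time, so this only approximates the syntactic guarantee described above.

```python
def min_max(values):
    """Hypothetical helper: return the smallest and largest item in one call."""
    return min(values), max(values)

# Both names are declared and initialized by a single expression.
low, high = min_max([4, 1, 9, 7])

# The built-in divmod also returns two values packed in a tuple.
quotient, remainder = divmod(17, 5)

# Unpacking into the wrong number of names raises ValueError
# instead of leaving one of the names unset.
try:
    a, b, c = min_max([4, 1, 9, 7])
except ValueError as err:
    mismatch = str(err)
```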

Type Conversions and Undefined Behavior


We have a lot for which to thank dynamic languages, but if our goal is to use our language's syntax to minimize logical errors then we're left with no reasonable choice but strong static typing.  The perfect language can't afford to defer error checking until runtime as is often the case in dynamic languages.

Much like uninitialized variables, implicit type conversions are a source of error that can be entirely eliminated by careful language design.  A language offering implicit conversions is presumably optimizing for authoring efficiency.  While the ability to create programs quickly is important, I believe it is nowhere near as important as program correctness.
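For contrast, a short Python sketch: adding a string and an integer raises TypeError instead of silently converting one operand, so the programmer has to write the conversion out.  Python still performs some implicit numeric promotion, so it only partly meets the standard argued for here.

```python
count = 3
label = "items: "

# Python refuses to guess how a str and an int should be combined.
try:
    message = label + count
except TypeError:
    # The conversion must be stated explicitly.
    message = label + str(count)

# Python is not fully strict, though: an int is still promoted
# to float implicitly in mixed arithmetic.
mixed = 1 + 0.5
```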

Similarly, undefined or unspecified behavior is not a sensible inclusion in our language.  Why accept code that's syntactically valid but logically invalid?  At that point one has to wonder if the language designer is trying to maximize rather than minimize sources of error.  If a construct's behavior would be undefined, the compiler or interpreter should reject it outright.
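Two well-known cases of undefined behavior in C are reading past the end of an array and overflowing a signed integer.  The Python sketch below shows a language that defines both outcomes instead, although it reports the first one at runtime rather than rejecting it before the program runs.

```python
values = [10, 20, 30]

# In C, reading past the end of an array is undefined behavior.
# Python defines the outcome: an IndexError is raised.
try:
    values[5]
except IndexError:
    out_of_bounds_rejected = True

# Signed integer overflow is also undefined in C; Python ints have
# arbitrary precision, so this sum is simply exact.
big = 2**63 - 1
bigger = big + 1
```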


Mutability vs Immutability


Functional languages appear to highly value the concept of immutable state.  Immutable variables are those which cannot have their value changed after assignment.  To fans of imperative languages this often seems like nonsense, but it's not so far-fetched.  If you're writing in C or C++, there's a good chance that your compiler is translating your code into static single assignment (SSA) form anyway, essentially turning all of your variables into immutable ones.
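To make the SSA idea concrete, here is a hand-written Python sketch of the renaming a compiler performs: each assignment to y becomes a new name that is assigned exactly once.  Real compilers do this on their intermediate representation rather than on source code, and code with branches also needs phi functions to merge values from different paths, which this straight-line example leaves out.

```python
def mutable_version(x):
    # One name, reassigned several times.
    y = x + 1
    y = y * 2
    y = y - 3
    return y


def ssa_version(x):
    # SSA-style rewrite: every assignment introduces a fresh name
    # that is never reassigned, so each value is effectively immutable.
    y1 = x + 1
    y2 = y1 * 2
    y3 = y2 - 3
    return y3
```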

I believe that the perfect language should strongly encourage the programmer to use immutable data as often as possible.  Mutable variables simply support more operations, which gives the programmer more opportunities to perform incorrect ones.

Can we ever completely remove the concept of mutability from a programming language?  There's a lot of research around this, and almost cult-like fanaticism around some of the proposed solutions.  I'm not convinced, but at a bare minimum the ideal language should provide clear and unambiguous support for immutable variables and clearly distinguish them from mutable ones.
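One way Python draws this line is sketched below.  Point is a hypothetical type; passing frozen=True to the dataclass decorator makes assignment to its fields raise FrozenInstanceError, and tuples reject item assignment where lists allow it.  Python has no keyword that makes a plain local variable immutable (typing.Final is enforced only by static checkers such as mypy), so this shows partial support rather than the ideal.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Point:
    """Hypothetical immutable value type."""
    x: int
    y: int


origin = Point(0, 0)
try:
    origin.x = 5  # fields of a frozen dataclass cannot be reassigned
except FrozenInstanceError:
    point_is_immutable = True

# "Changing" an immutable value means building a new one.
moved = Point(origin.x + 5, origin.y)

coords = (1, 2)  # tuple: immutable sequence
try:
    coords[0] = 9
except TypeError:
    tuple_is_immutable = True

history = [1, 2]  # list: mutable sequence
history.append(3)
```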

Conclusion


There's certainly a lot more to a programming language than just variables, and we haven't even touched on readability yet.  But I hope that even this short trip through language design has been interesting.  How does your favorite language stack up?  What about the language you work in the most?  None of mine fare particularly well so far, but I'm eager to continue the analysis in future posts.  There's a lot of ground to cover and I hope you'll stay with me.
