Are VHLLs Really High-Level?

by Greg Wilson
12/01/1999

Contents
  1. Introduction
  2. The Current State of Play
  3. Low-Level In, Low-Level Out
  4. You're Not in the 1970s Anymore, Dorothy
  5. Some Modest Proposals
  6. Conclusion
  7. Bibliography

Introduction

In the last twenty years, we have lived through the desktop revolution, the advent of graphical user interfaces, and the rise of the Internet. Our profession has changed the working lives of hundreds of millions of people, to the point where a secretary thinks nothing of sketching a new office floor plan with a WYSIWYG drawing tool, then embedding that diagram in a letter written with an editor whose native storage format is richly structured.

Now contrast that with the programmer in the next cubicle. She is probably sitting in front of a glass TTY like emacs or the Visual Studio editor, transcribing the hexadecimal addresses presented by a debugger to create a hand-drawn box-and-arrow picture of an improperly-linked list. There is no simple way for her to embed that picture in her source code. She will therefore probably have to re-draw it two weeks from now, when she realizes that the recursive makefile she was using wasn't setting debugging flags properly in the sub-sub-directory that contained the source for the linked list class.

Once you open your eyes, it's hard not to believe that programmers want to change everything except the way they themselves work. One sign of this is the silent veto most programmers have given to better working practices, even though there is overwhelming empirical evidence that these practices would make them more productive [McConnell 1996].

Another sign is the subject of this article. Over the past few years, I have done several medium-sized projects using both Perl and Python. At first, I was very excited by what these Very-High Level Languages (VHLLs) let me do, and how quickly. The more I played with them, however, the less satisfied I was. In particular, I no longer believe that they deserve the 'V' in their name. This article explores why, and suggests some ways in which they could evolve.

The Current State of Play

So, what is a "Very High-Level Language" anyway? The term first appeared in the mid-1990s to identify a group of languages typically used for rapid prototyping, one-time scripting, and similar tasks [USENIX 1994]. Among their common features are:

  • interactive interpretation (often, but not necessarily, preceded by silent compilation to bytecode);

  • dynamic typing: values, rather than variables, have types, and some languages perform many conversions silently and automatically;

  • much less boilerplate than mainstream languages (e.g. no variable declarations or #include directives); and

  • some form of automatic memory management (usually reference counting).
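
The dynamic-typing item above can be illustrated with a short Python sketch. The names `describe`, `kinds`, and `converted_silently` are invented for this illustration. It shows only that the type travels with the value rather than with the variable, and that Python is one of the languages that does not silently convert a string to a number (Perl, by contrast, would).

```python
def describe(value):
    """Return the name of the type carried by the value itself."""
    return type(value).__name__

kinds = []
x = 42                 # no declaration: x simply names an int value
kinds.append(describe(x))
x = "forty-two"        # the same name now refers to a str value
kinds.append(describe(x))
x = [4, 2]             # ... and now to a list
kinds.append(describe(x))

# Python refuses to mix a string and a number without an explicit conversion.
try:
    "1" + 1
    converted_silently = True
except TypeError:
    converted_silently = False

print(kinds, converted_silently)
```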

In addition, many VHLL libraries are written by wrapping code written in a lower-level language (such as C) with a higher-level interface.

Much of today's thinking about VHLLs can be traced to the Unix tool AWK [Aho 1988]. Its authors stated explicitly that AWK was designed to support the filter-style programming of most common Unix tools. Line-oriented input and output was handled automatically, as was pattern-matching on those lines. Memory was allocated and deallocated as needed, variables didn't need to be declared, and both dynamic arrays and dictionaries were built into the language. In short, AWK handled all the common problems involved in building a generic Unix filter, and left the programmer free to concentrate on issues that were specific to a particular filter.

Twenty years later, AWK has been superseded by Perl and Python, Tcl and Visual Basic have evolved to do for GUIs what AWK did for streams of text, and Scheme (which predates AWK) has retroactively been added to the same family of languages in many people's minds. Like AWK, all of these languages handle many everyday issues automatically. Because they are so often used to drive other software (using shell calls, wrapper libraries, or COM), these languages are often referred to as "scripting languages". Many tasks that were once done using shell scripts are now done using Perl or Python, as are tasks such as CGI scripting that would have been impractically difficult with /bin/sh and its offspring.

However, the rest of the programming language world has not stood still in those twenty years either. C has largely been replaced today by C++ and Java, both of which let programmers work at a much higher level of abstraction. As a result, the gap between "production" and "scripting" languages has become much narrower--so much narrower that it is no longer clear exactly what the added value of the latter category is. Built-in dictionaries? Java has them, and so does C++'s Standard Template Library (STL). Regular expressions? You can get at them through a library interface as easily from C++ or Java as you can from Python. Automatic memory management? Many C++ libraries use some combination of overloaded assignment operators and reference counting to take care of things just as well (or rather, just as poorly) as Perl and Python. Java, on the other hand, has real garbage collection, so that programmers can build graphs, or pass callbacks around, without worrying about the possibility that they are creating circular references.

In fact, the only two advantages that so-called VHLLs still have over their "merely HLL" counterparts are dynamic typing and interactive interpretation. Many experienced developers question whether the first is really a good thing: for everyone who believes that strong typing is a crutch for people with weak memories, there is someone else who argues that strong typing helps catch, or prevent, many errors, and thereby reduces program development time.

Similarly, a modern development environment like Microsoft Visual C++ is effectively as interactive as the Python command line. If I make a three-line change to a C++ program, then press F5 to re-compile, re-link, and run the executable under the debugger, I'm back into my debugging session faster than if I make a similar change to my Python script and re-run it. And yes, it is handy to be able to interrupt a program, change the implementation of a method, and then have the program continue, but I simply don't believe this makes much difference to overall development time once programs grow above a certain (relatively small) size.

I therefore think that the future development of VHLLs should be guided by the answers to two deeper questions:

  1. Can VHLLs be more than just wrappers around other software?

  2. Can VHLLs be used to drag programmers out of the 1970s?

Low-Level In, Low-Level Out

As mentioned earlier, many VHLL modules are built as wrappers around pre-existing libraries written in languages such as C. This is a quick way to add functionality into the VHLL, just as building a C-to-Fortran call bridge is a quick way to get access to high-performance numerical libraries.

However, modules built in this way are almost guaranteed to be as low-level as their starting points. As a result, while modules written on a clean sheet of paper tend to be noticeably higher-level than modules based on legacy libraries, the latter tend to colonize their respective ecological niches first, and thereby prevent the former from ever evolving.

For example, compare Perl's widely-used HTML generation and scripting module CGI.pm with its relational database interface modules. The former encourages programmers to manipulate an abstract tree representing hypertext entities, and makes it easy to convert between that representation and others (such as arrays of list elements). The database interfaces, on the other hand, require programmers to do most of their work by constructing strings that (hopefully) consist of legal SQL. Such "programming with sprintf" is the VHLL equivalent of goto statements, but seems unremarkable to many programmers because "we've always done it that way".
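
To make the "programming with sprintf" complaint concrete, here is a minimal Python sketch using the standard `sqlite3` module and an in-memory database. The table, the data, and the two helper functions are invented for illustration. It does not show the higher-level interface the paragraph above wishes for; it only shows how easily a string-built query stops being legal SQL, and that passing values separately from the query text avoids that particular failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, language TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ada", "Python"), ("O'Brien", "AWK")],
)

def find_by_name_sprintf(name):
    # "Programming with sprintf": the query is assembled as text, so a
    # name containing a quote character yields something that is not SQL.
    query = "SELECT language FROM people WHERE name = '%s'" % name
    return conn.execute(query).fetchall()

def find_by_name_params(name):
    # Placeholder style: the value travels separately from the SQL text.
    return conn.execute(
        "SELECT language FROM people WHERE name = ?", (name,)
    ).fetchall()

print(find_by_name_params("O'Brien"))

sprintf_error = None
try:
    find_by_name_sprintf("O'Brien")
except sqlite3.OperationalError as err:
    sprintf_error = err
    print("string-built query failed:", err)
```

Even the placeholder version still asks the programmer to write SQL as a string, which is the deeper complaint here.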

As another example, compare make and Cons. While make is probably the most widely used auxiliary programming tool in the world, its mish-mash of declarative and imperative syntax makes even Perl look readable, and it is very poorly suited to large projects. Cons [Sidebotham 1996], on the other hand, is a Perl module. Customizing the build process for a particular project is a matter of passing some file names as constructor arguments, or overriding some methods, rather than learning a complex, arbitrary, and specialized syntax. What's more, integration with other build activities (such as running regression tests) is much easier.

You're Not in the 1970s Anymore, Dorothy

I think that the relative stasis of scripting languages during the last decade is a reflection of a surprising conservatism. When you write a Python (or Java, or C++) program, you are essentially travelling back in time to the TTY world of the late 1970s. Ever since MacWord appeared in the mid-1980s, non-programmers have been able to include line drawings of floor plans in documents. Why can I still not include a line drawing of my data structures as a comment in my program? More seriously, why should the fact that Pascal was the dominant teaching language of the 1970s and early 1980s make the developers of today's VHLLs so reluctant to adopt features from other language families?

"Oh, well," you say. "Many programmers still use emacs or vi, or some other ASCII editor. If you allowed WYSIWYG source, they wouldn't be able to read each other's programs. And most programmers don't know enough about programming languages to make heads or tails of lazy evaluation, type inference, or exotic concurrency mechanisms." The first point is true, but the fact that a few people still use lynx as a browser doesn't stop the rest of us from putting image maps in web pages. As for the second, the speed with which the C++ community has adopted generic programming (in the form of templates), and the degree to which multi-threaded programming is now taken for granted (in Java), lead me to believe that the average programmer is actually pretty smart.

Reviewers of early versions of this article tried to explain the conservatism of VHLLs in several ways. First, some reviewers felt that the VHLLs we have are good enough, and that greater improvements in productivity would come from better libraries and supporting tools, rather than yet more syntax. I agree that the real need is higher-level libraries, but think that languages such as Linda, Haskell, Icon, and J have proved that the incorporation of certain language features makes it much easier to build libraries in some domains.

A refinement of this argument was that while new language features might be useful on their own, their overall effect would be small, or even negative, because they would complicate the language. The addition of references to Perl, for example, made an already hard-to-read syntax even worse. One reviewer pointed out the way in which "simpler" scripting languages, like VBScript and PerlScript, keep appearing, then growing, until there is room underneath them for yet another "simpler" language to appear.

I think the solution here is that language designers need the courage to throw things away. Every Python tutorial I have read, for example, devotes a few paragraphs to justifying the existence of tuples. Their functionality is a strict subset of the functionality of Python lists--why not bite the bullet and get rid of them? Similarly, Python's three-level memory hierarchy, and its requirement that the body of a lambda can only be an expression, make it needlessly difficult to write applicative programs. I believe that Scheme and other languages have proved that applicative programming is as powerful, as general, and as comprehensible an abstraction as object-oriented programming. Why not upgrade Python to let programmers take full advantage of it? Similarly, it must surely be time to take picture-format output out of Perl.

A subtler argument is that even if significant improvements in today's tools are possible, programmers are too overwhelmed by other changes in their environment to take advantage of them. One reviewer said that even if templates were added to Java, he'd be too busy learning Swing and Enterprise JavaBeans to figure them out.

The only long-term answer here is education. I believe that one reason for Java catching on so quickly as an educational language (it is now used in 80% of first-year college computing courses) is that its very conservative design contains little to frighten a generation of professors raised on Pascal. Students who aren't exposed to other programming paradigms as undergraduates will often exercise a silent veto later on by not adopting them, even when they are the best solution to the problem at hand.

VHLLs can help our profession get out of this trap by highlighting which concepts are worth learning. While some university professors have told me that they don't believe there is any industrial demand for Perl programmers (no, I'm not making that up), most feel growing pressure to make the content of courses more relevant to the real world. At the same time, the incorporation of type inference or programming by contract into a language like Python would do a lot to make it more academically respectable.

Some Modest Proposals

So what would I like to see? First and foremost, I want real garbage collection in so-called VHLLs. There is no significant performance penalty compared to reference counting [Jones 1997], and there are demonstrable productivity gains to be had from implementing directed graphs, callbacks, and the like without worrying about whether or not they are introducing circular data references. Until Perl and Python adopt this, I think their advocates should concede that Java is actually a higher-level language.
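
The circular-reference problem can be seen in a few lines. The sketch below is an illustration only. It assumes a CPython interpreter that pairs reference counting with a separate cycle collector exposed through a `gc` module, and it switches that collector off so that only reference counting is in effect. The `Node` class and all variable names are invented for this example.

```python
import gc
import weakref

class Node:
    """A graph node that holds references to its neighbours."""
    def __init__(self, name):
        self.name = name
        self.neighbours = []

def make_cycle():
    a, b = Node("a"), Node("b")
    a.neighbours.append(b)
    b.neighbours.append(a)     # a -> b -> a: a circular reference
    return weakref.ref(a)      # observe a without keeping it alive

gc.disable()                   # leave only reference counting in effect

# Without a cycle, the object is reclaimed as soon as its last name goes away.
solo = Node("solo")
solo_probe = weakref.ref(solo)
del solo
freed_immediately = solo_probe() is None

# With a cycle, each node keeps the other's count above zero.
probe = make_cycle()
alive_under_refcounting = probe() is not None

gc.collect()                   # explicitly run the cycle collector once
alive_after_collection = probe() is not None
gc.enable()

print(freed_immediately, alive_under_refcounting, alive_after_collection)
```

Under pure reference counting the two nodes would stay allocated until the program exited or the programmer broke the cycle by hand.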

Second, I want to see a VHLL defined by an XML DTD. Doing this will allow me to put as much information into my program source as my niece can put into the email messages she composes using Netscape. It will also allow programmers to take direct advantage of the coming wave of XML manipulation tools to create class browsers, design recovery aids, and other source manipulation tools. Finally, if a program's source is defined using <method>, <parameter>, and <block> tags, then individual programmers can choose whatever superficial appearance they want. Three different programmers, for example, could view nesting using indentation (Python), curly braces (Perl), or parenthesized prefix notation (Scheme). I believe this would be as big an innovation in practical programming as applets were, and probably more useful.
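
As a rough sketch of the idea, the Python below stores one tiny program as XML and renders it in two surface styles. The `<method>`, `<parameter>`, and `<block>` tags come from the proposal above. The `<stmt>` and `<loop>` tags, the `name` and `header` attributes, and the `render` function are hypothetical, invented here to keep the example small. A real DTD would describe statements and expressions in far more detail.

```python
import xml.etree.ElementTree as ET

SOURCE = """
<method name="total">
  <parameter name="values"/>
  <block>
    <stmt>result = 0</stmt>
    <loop header="for v in values">
      <block>
        <stmt>result = result + v</stmt>
      </block>
    </loop>
    <stmt>return result</stmt>
  </block>
</method>
"""

def render(method, style):
    """Render a <method> element using 'indent' or 'braces' for nesting."""
    params = ", ".join(p.get("name") for p in method.findall("parameter"))
    head = "%s(%s)" % (method.get("name"), params)
    lines = []

    def open_block(text, depth):
        suffix = " {" if style == "braces" else ":"
        lines.append("  " * depth + text + suffix)

    def close_block(depth):
        if style == "braces":
            lines.append("  " * depth + "}")

    def walk(block, depth):
        for child in block:
            if child.tag == "stmt":
                lines.append("  " * depth + child.text.strip())
            elif child.tag == "loop":
                open_block(child.get("header"), depth)
                walk(child.find("block"), depth + 1)
                close_block(depth)

    open_block(head, 0)
    walk(method.find("block"), 1)
    close_block(0)
    return "\n".join(lines)

method = ET.fromstring(SOURCE.strip())
print(render(method, "indent"))
print(render(method, "braces"))
```

The stored program is the same in both cases; only the view differs, which is the point of separating structure from appearance.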

Third, I would like VHLLs to start incorporating ideas that have emerged, and proved their worth, in the post-AWK era. Icon, Linda, Erlang, Haskell, Eiffel, and data-parallel languages like J and Fortran-90 (yes, Fortran) can all be plundered--err, used as sources of inspiration. Some specific suggestions include:

  • True multi-dimensional data structures. A matrix is not a vector of vectors; the latter is just one possible implementation of matrices, and a weak one at that. Languages such as APL and MATLAB, and the numerical extensions to Python, have demonstrated the power of being able to subscript one array with another, or to slice a doubly-keyed hash using a list of terms. For reasons that made good sense at the time, C's authors chose not to include direct support for multi-dimensional structures. For less good reasons, the authors of C++ and Java followed suit, but that's no reason for VHLLs to pass them by.

  • Tuple spaces. Linda (the original tuple-based programming system) was invented at about the same time as Perl. At its core is the notion of a tuple space, an associative memory represented as a bag of tuples. These ideas are now central to Sun's JavaSpaces [Freeman 1999], a high-level interface to Jini. Direct support for tuple spaces might well prove to be ubiquitous computing's killer app, in the same way that CGI scripting was for web-centric computing.

  • Type inference. ML and other languages have proved that strong typing doesn't necessarily require type declarations. These languages have also shown that the combination of a strong type system and an applicative programming style is at least as powerful (and leads to at least as large an improvement in programmer productivity) as the encapsulation, inheritance, and polymorphism of the object-oriented model.

  • Lazy evaluation. Just as the Boolean operators "and" and "or" only evaluate as many arguments as they need to in C-like languages, so too can user-defined functions be written so that they only evaluate as much of their input as they must in order to produce an answer. Lazy evaluation simplifies many programming tasks (such as stream processing), and ways of implementing it efficiently are well-known.

  • Programming by contract. The greatest weakness of the object-oriented model is that the semantics of classes and methods can change both through derivation and over time. One of the most useful innovations in Eiffel is the use of pre-conditions and post-conditions to specify a contract that a class and its derivatives guarantee to maintain. As [Szyperski 1999] argues, true component-based programming isn't possible without guarantees of this kind.
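
Of the suggestions above, lazy evaluation is perhaps the easiest to sketch briefly. The Python below uses generator functions as one possible way to get the effect for stream processing. This is explicit, opt-in laziness, not the language-wide lazy evaluation of a language such as Haskell, and all names are invented for the illustration. The `evaluated` list exists only to record how much of the unbounded input was actually computed.

```python
def naturals():
    """An unbounded stream: each value is produced only when asked for."""
    n = 0
    while True:
        yield n
        n += 1

evaluated = []   # records which inputs were actually computed

def squares(stream):
    """Lazily square each element of the incoming stream."""
    for n in stream:
        evaluated.append(n)
        yield n * n

def first_over(stream, limit):
    """Consume the stream only until one element exceeds the limit."""
    for value in stream:
        if value > limit:
            return value

answer = first_over(squares(naturals()), 50)
print(answer, evaluated)
```

Although `naturals()` describes an infinite sequence, the program terminates because nothing past the first square above 50 is ever requested.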

Conclusion

I like using VHLLs to explore new problem spaces, or to build production versions of solutions to problems that I think I understand. However, I believe that VHLL design has been stuck in a rut for at least a decade, and that Perl, Python, and other languages have some work to do if they are to earn the 'V' in that acronym. I think we can do it, and I think programming in general would be better off if we did.


Bibliography

[Aho 1988] Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger: The AWK Programming Language. Addison-Wesley, 1988, 020107981X.

[Freeman 1999] Eric Freeman, Susan Hupfer, and Ken Arnold: JavaSpaces Principles, Patterns, and Practice. Addison-Wesley, 1999, 0201309556.

[Jones 1997] Richard Jones and Rafael D. Lins: Garbage Collection. John Wiley & Sons, 1996, 0471941484, http://www.ercb.com/ddj/1997/ddj.9709.html.

[USENIX 1994] Tom Christiansen et al (eds.): USENIX 1994 Very High Level Languages Symposium Proceedings. October 26-28, 1994, Santa Fe, New Mexico.

[McConnell 1996] Steve McConnell: Rapid Development. Microsoft Press, 1996, 1556159005, http://www.ercb.com/feature/feature.0004.html.

[Sidebotham 1996] Bob Sidebotham: Cons: A Software Construction System. FORE Systems, 1996, http://www.dsmit.com/cons/.

[Szyperski 1999] Clemens Szyperski: Component Software. Addison-Wesley, 1998, 0201178885, http://www.ercb.com/ddj/1999/ddj.9905.html.