Wednesday, February 20, 2008

Literate Programming

The primary goal of literate programming is to create well-documented code. Accordingly, a literate programmer authors the documentation and the code simultaneously. Macros enable both abstraction and a more intuitive ordering of code. Abstraction is performed primarily to improve understanding and secondarily to foster reuse. Compilation of the documentation (called “weaving”) produces a well-formatted TeX document that also contains the embedded code, ordered in an intuitive fashion. Compilation of the code (called “tangling”) is a single-level process whereby documentation is eliminated, macros are expanded, and the desired code is thereby generated. These macros have capabilities and drawbacks similar to those of C macros. CWEB, in fact, employs C macros to do much of its work. The primary advantage of literate programming is nicely formatted, readily approachable documentation of the code base.
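To make the tangling step concrete, here is a minimal sketch in Python. It is not CWEB's implementation and does not use CWEB's syntax. The chunk markers (a line `<<name>>=` opens a code chunk, a line `@` closes it, and `<<name>>` refers to a chunk) are an assumption, loosely modeled on noweb-style markers, and the function names are hypothetical.

```python
import re

# Assumed chunk syntax (illustrative only, not CWEB's):
#   "<<name>>="  opens a named code chunk
#   "@"          closes it; everything outside a chunk is documentation
#   "<<name>>"   on a line of its own refers to another chunk
CHUNK_DEF = re.compile(r"^<<(.+)>>=\s*$")
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>\s*$")


def parse_chunks(source):
    """Collect code chunks by name, discarding the documentation between them."""
    chunks = {}
    current = None
    for line in source.splitlines():
        definition = CHUNK_DEF.match(line)
        if definition:
            current = definition.group(1)
            chunks.setdefault(current, [])
        elif line.strip() == "@":
            current = None
        elif current is not None:
            chunks[current].append(line)
    return chunks


def expand(chunks, name, indent=""):
    """Expand one chunk, replacing each reference with the referenced lines."""
    lines = []
    for line in chunks[name]:
        reference = CHUNK_REF.match(line)
        if reference:
            nested_indent = indent + reference.group(1)
            lines.extend(expand(chunks, reference.group(2), nested_indent))
        else:
            lines.append(indent + line)
    return lines


def tangle(source, root="*"):
    """Drop the documentation and expand the root chunk into plain code."""
    return "\n".join(expand(parse_chunks(source), root))


DOC = """\
The program prints a greeting. The greeting is explained first.
<<greeting>>=
print("hello")
@
The root chunk appears last in the prose, in whatever order reads best.
<<*>>=
def main():
    <<greeting>>
main()
@
"""

print(tangle(DOC))
# def main():
#     print("hello")
# main()
```

Note that the expansion is purely textual and single-level in the sense described above: there is no intermediate model between the chunks and the generated code.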

Literate programming is missing the following components of a Template Oriented Programming (TOP) system:

  • Model syntax
  • Multi-level model to model translation
  • Custom errors to identify model misuse
  • Model queries
  • Translation debugging

Most of the CWEB manual focuses on formatting of documentation and code, whereas most of a TOP system focuses on model translation. Only the lowest level textual output of a TOP translation focuses on formatting. In a TOP system, the model serves as the documentation. A model pretty-printer can create a well-formatted, hyperlinked, read-only view of the model. Syntax highlighting within a text editor can be employed to create a reasonably formatted writable version of the model.

Literate programming can be a useful way to organize code. It is hoped that a TOP system will provide many of the benefits of literate programming, but also yield a considerably higher return on investment with respect to abstraction. For example, a literate programmer must always be aware of the underlying code being produced. In contrast, a TOP programmer should be able to think at the level of the model being created, for the most part ignoring the translation of that model to lower level model/code.

Monday, February 18, 2008

Template Oriented Programming

The phrase “Template Oriented Programming” is new. If you don’t believe me, type that quoted phrase into Google and you’ll find only about 5 unique hits! What is template oriented programming, you ask? This blog post seeks to answer that, and will hopefully, in part, make its way to Wikipedia, and, with any luck at all, to CS201 at your favorite educational institution.

Template oriented programming (TOP), as its name hints, is a programming paradigm that focuses on templates to accomplish a programmer’s goals. Note that when I use the word template here, I am not talking about a specific programming language feature (such as C++ templates or C# generics). The goal of template oriented programming is not to directly create machine code (or byte codes) to be executed on a (virtual) machine. Instead, the output of TOP is information to be processed by other code (or by humans). This could include familiar template output such as HTML/XML, or more complex code like C/C++/Java/C#, shell scripts, VHDL/Verilog, CAD scripts, etc. TOP allows one to generate one or more tedious artifacts from a simple, user-defined input model.

Similar to the paradigm shift that occurred when moving from structured programming to object oriented programming, or from unmanaged code to managed, garbage-collected code, programming in the TOP paradigm requires a different way of looking at the information. The programmer starts with what he desires to have as the output of a given set of code. This output is then abstracted as necessary to allow customization based on the input. The abstractions include simple field substitution (similar to word processing form letters), extracting parts of the template into sub-templates that can be reused, looping over part of a template, making part of a template conditional, etc. The paradigm shift requires thinking about what should be produced instead of thinking about what the computer should do step by step.
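The four abstractions just listed can be sketched with nothing more than Python's standard `string.Template`. This is only an illustration of the idea on plain text, not our system; the names `ROW`, `PAGE`, and `render_list` are made up for the example.

```python
from string import Template

# Sub-template: extracted once, reused for every item.
ROW = Template("  <li>$name: $price</li>")

# Top-level template: the desired output, with fields left open.
PAGE = Template("<h1>$title</h1>\n$body")


def render_list(title, items):
    """Start from the desired output and abstract only what varies."""
    # Looping: the ROW sub-template is applied once per item.
    rows = "\n".join(ROW.substitute(name=name, price=price) for name, price in items)
    # Conditional: the list is only produced when there is something to list.
    body = f"<ul>\n{rows}\n</ul>" if items else "<p>No items.</p>"
    # Field substitution: fill the remaining fields of the top-level template.
    return PAGE.substitute(title=title, body=body)


print(render_list("Menu", [("tea", 2), ("cake", 4)]))
# <h1>Menu</h1>
# <ul>
#   <li>tea: 2</li>
#   <li>cake: 4</li>
# </ul>
```

The function reads as a description of the output with holes in it, rather than as a sequence of steps for building a string.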

This discussion leads to a short rabbit trail. Imperative programming languages (Fortran, BASIC, and C, for example) evolved from machine language and focus on telling the computer what to do. Declarative programming languages (Lisp, Prolog, XML), on the other hand, describe relationships between information. The drawback of pure declarative languages is that they cannot do anything, since they have no imperative aspect. Thus, most declarative languages have some way to start or otherwise control the execution of the program.

TOP applies the technique of declarative programming to code generation. This allows things like referring to information that does “not yet” exist. Such references, when first evaluated, will be null. However, when the information is “later” created, the reference will resolve correctly. Notice that the time-related words are in quotes. That is because, in TOP, one does not think about time. The processing order is handled by the system. Relieving the programmer of this burden eliminates many common bugs and frees the programmer’s mind to focus on the more valuable, customer-driven requirements.

A simple example is in order at this point. Let’s say a program is to evaluate a parenthesized mathematical expression. Assume the expression is mapped to a tree, with the leaves being the innermost expressions. A traditional imperative evaluator must find the leaves, evaluate these first, and then work up. A declarative solution, on the other hand, can simply state the desired result for each operator. All sub-expressions are evaluated independently, and parent expression results will eventually be updated whenever a sub-expression result changes. The system is responsible for making sure the result at the top is correct based on the last changes to any sub-expressions.
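A rough Python sketch of this example follows. It is a simplification under stated assumptions: the `Expressions` class and its methods are hypothetical, and instead of pushing updates up to parents it simply clears a cache and recomputes on demand, which yields the same observable result at the top. What it does show is that each operator gets one rule, that nodes may be declared in any order (including references to nodes that do not exist yet), and that the caller never specifies the evaluation order.

```python
import operator

# One declarative rule per operator: what the result is, not when to compute it.
RULES = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


class Expressions:
    """Hypothetical container for named expression nodes."""

    def __init__(self):
        self.nodes = {}  # name -> number, or (operator, left_name, right_name)
        self.cache = {}

    def define(self, name, node):
        """Declare or redefine a node; declaration order does not matter."""
        self.nodes[name] = node
        self.cache.clear()  # crude invalidation: recompute on the next request

    def value(self, name):
        """Return the current result; the system works out the order."""
        if name not in self.cache:
            node = self.nodes[name]
            if isinstance(node, tuple):
                op, left, right = node
                self.cache[name] = RULES[op](self.value(left), self.value(right))
            else:
                self.cache[name] = node
        return self.cache[name]


# (a + b) * (c - d), declared top-down: "top" refers to nodes not yet defined.
model = Expressions()
model.define("top", ("*", "sum", "diff"))
model.define("sum", ("+", "a", "b"))
model.define("diff", ("-", "c", "d"))
for name, number in [("a", 1), ("b", 2), ("c", 7), ("d", 3)]:
    model.define(name, number)

print(model.value("top"))  # 12

model.define("d", 5)       # change one leaf
print(model.value("top"))  # 6
```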

Declarative programming solutions have predominantly been used only in academia. They have many problems that make them inadequate for most commercial purposes. These issues, however, are mostly related to the runtime use of a declarative system: difficulty handling updates to program state, performance issues associated with updating information, etc. The TOP paradigm, instead, uses declarative techniques at compile time, avoiding many of these drawbacks.

This field is similar to that of template metaprogramming. Template metaprogramming systems are generally applied to a single, existing language. C++ templates, for example, allow adding templates to the C++ language. The focus is on how to add compile-time abstraction to the C++ language. TOP, on the other hand, focuses on the templates, with the output language (or languages) being a secondary concern. In addition, the target language is defined by the programmer.

Template metaprogramming and template processing are not new. What is Precision Software doing that will breathe life into these currently niche paradigms? Template solutions today focus on getting the characters in the right order to produce a computer program. Our solution, however, focuses on a model of the information that is not character based. In this respect, it is similar to Lisp. The system supports translating between input and output, where neither is text; both are model. The closest analogy is to imagine operating on the abstract syntax tree within a compiler. This frees the programmer from having to continually parse the input and unparse (convert to text) the output. This approach allows information to be represented in a normalized fashion, translated very concisely in one or more stages within the tree format, and then output to one or more target artifacts.
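As a rough illustration of translating model to model and only producing text at the very end, consider the Python sketch below. It is not our system's representation or API: the `Node` class, the node kinds ("record", "class", "field", "getter"), and the Java-like output are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model node: a kind, some attributes, and child nodes."""
    kind: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def lower_record(record):
    """Model-to-model stage: a 'record' becomes a 'class' with fields and getters."""
    members = []
    for child in record.children:
        members.append(Node("field", dict(child.attrs)))
        members.append(Node("getter", dict(child.attrs)))
    return Node("class", {"name": record.attrs["name"]}, members)


def emit(node):
    """Model-to-text stage: the only place where character order matters."""
    if node.kind == "class":
        body = "\n".join("    " + emit(child) for child in node.children)
        return f"class {node.attrs['name']} {{\n{body}\n}}"
    if node.kind == "field":
        return f"private {node.attrs['type']} {node.attrs['name']};"
    if node.kind == "getter":
        name, type_ = node.attrs["name"], node.attrs["type"]
        return f"public {type_} get{name.capitalize()}() {{ return {name}; }}"
    raise ValueError(f"no emitter for node kind {node.kind!r}")


# The input is already a model, so nothing has to be parsed.
point = Node("record", {"name": "Point"}, [Node("field", {"name": "x", "type": "int"})])

print(emit(lower_record(point)))
# class Point {
#     private int x;
#     public int getX() { return x; }
# }
```

Because `lower_record` consumes and produces nodes, further stages could be chained before `emit` without any of them touching text.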

Friday, February 8, 2008

Intentional Programming

On the surface, our work would seem very similar to Intentional Programming (IP). However, there are some important differences. IP allows for the creation of arbitrary constructs with unconstrained semantics and unconstrained visualization. At the time I worked in IP, both reduction (R3) and visualization were constructed with imperative code. Having this level of power definitely falls in the category of creating radically new “languages”, in the same sense that imperative textual compilers are capable of creating new languages (e.g. C++, C#, D, Java, Smalltalk, etc).

What we provide is a shared fabric for abstraction. Our semantics and visualization are constrained. All information is represented using a common model, and that model is visualized uniformly with only occasional deviations for brevity. Once information is represented as model, templates may be used to abstract that model in a controlled, type-checked fashion. Templates are used both to abstract above a model and to translate higher level model semantics into lower level model semantics.

Due to constrained visualization, our system will not work well as a structured GPL editor. This is one of IP's goals, but is not one of our goals. Instead, we believe that GPL code is stored and edited quite well as text. Whereas an IP code base may consist of quite a lot of structured GPL (this is, in fact, exactly what the IP code base itself does consist of), a code base in our system will consist of layers with a handful of high-level abstractions at the top and textual GPL code at the lowest level leaves. Terse visualization of model is therefore less important, and a uniform way to view and abstract every level is far more important.

One lesson we've learned is that abstraction breaks fancy visualization. Since our whole world is about abstraction, visualization must be quite general and not depend on a fully realized model structure because the model structure within any given template may only be partially realized. On the other hand, the editor is completely context aware, so that even within a highly abstracted model, it can still intelligently prompt for valid relations and termini. Also, templates at all levels must conform to the syntax within which they are operating, yielding a "type-checked" expansion ("reduction" in IP terminology) from top to bottom.