Language Design Issues

Terrence W. Pratt and Marvin V. Zelkowitz. January, 2001.

Contents:

1. Why Study Programming Languages?
2. A Short History of Programming Languages
3. Role of Programming Languages
4. Programming Environments

1. WHY STUDY PROGRAMMING LANGUAGES?

Hundreds of different programming languages have been designed and implemented. Even in 1969, Sammet [SAMMET 1969] listed 120 that were fairly widely used, and many others have been developed since then. Most programmers, however, never venture to use more than a few languages, and many confine their programming entirely to one or two. In fact, practicing programmers often work at computer installations where use of a particular language such as Java, C, or Ada is required. What is to be gained, then, by study of a variety of different languages that one is unlikely ever to use?

There are excellent reasons for such a study, provided that you go beneath the superficial consideration of the "features" of languages and delve into the underlying design concepts and their effect on language implementation. Six primary reasons come immediately to mind:

1. To improve your ability to develop effective algorithms. Many languages provide features that, when used properly, benefit the programmer but, when used improperly, may waste large amounts of computer time or lead the programmer into time-consuming logical errors. Even a programmer who has used a language for years may not understand all of its features. A typical example is recursion, a handy programming feature that, when properly used, allows the direct implementation of elegant and efficient algorithms. But used improperly, it may cause an astronomical increase in execution time. The programmer who knows nothing of the design questions and implementation difficulties that recursion implies is likely to shy away from this somewhat mysterious construct. However, a basic knowledge of its principles and implementation techniques allows the programmer to understand the relative cost of recursion in a particular language and from this understanding to determine whether its use is warranted in a particular programming situation. New programming methods are constantly being introduced in the literature. The best use of concepts like object-oriented programming, logic programming, or concurrent programming, for example, requires an understanding of languages that implement these concepts. New technologies, such as the Internet and the World Wide Web, change the nature of programming. How best to develop techniques applicable in these new environments depends on an understanding of languages.
2. To improve your use of your existing programming language. By understanding how features in your language are implemented, you greatly increase your ability to write efficient programs. For example, understanding how data such as arrays, strings, lists, or records are created and manipulated by your language, knowing the implementation details of recursion, or understanding how object classes are built allows you to build more efficient programs consisting of such components.
3. To increase your vocabulary of useful programming constructs. Language serves both as an aid and a constraint to thinking. People use language to express thoughts, but language serves also to structure how one thinks, to the extent that it is difficult to think in ways that allow no direct expression in words. Familiarity with a single programming language tends to have a similar constraining effect. In searching for data and program structures suitable to the solution of a problem, one tends to think only of structures that are immediately expressible in the languages with which one is familiar. By studying the constructs provided by a wide range of languages, and the manner in which these constructs are implemented, a programmer increases his programming "vocabulary." The understanding of implementation techniques is particularly important, because in order to use a construct while programming in a language that does not provide it directly, the programmer must provide his own implementation of the construct in terms of the primitive elements actually provided by the language. For example, the subprogram control structure known as coroutines is useful in many programs, but few languages provide a coroutine feature directly. A C or a FORTRAN programmer who is familiar with the coroutine concept and its implementation, however, may readily design a program to use a coroutine structure and then implement it in C or FORTRAN.
4. To allow a better choice of programming language. When the situation arises, a knowledge of a variety of languages may allow choice of just the right language for a particular project, thereby reducing the required coding effort. Applications requiring numerical calculations can be easily designed in languages like C, FORTRAN, or Ada. Applications useful in decision making, such as artificial intelligence applications, are more easily written in LISP, ML, or Prolog. Internet applications are more readily designed using Perl and Java. Knowing the basic strengths and weaknesses of each language gives the programmer a broader choice of alternatives.
5. To make it easier to learn a new language. A linguist, through a deep understanding of the underlying structure of natural languages, often can learn a new foreign language more quickly and easily than the struggling novice who understands little of the structure even of his native tongue. Similarly, a thorough knowledge of a variety of programming language constructs and implementation techniques allows the programmer to learn a new programming language more easily when the need arises.
6. To make it easier to design a new language. Few programmers ever think of themselves as language designers, yet any program has a user interface that in fact is a form of programming language. A designer of a user interface for a large program such as a text editor, an operating system, or a graphics package must be concerned with many of the same issues that are present in the design of a general-purpose programming language. Many new languages are based on C or Pascal as implementation models. This aspect of program design is often simplified if the programmer is familiar with a variety of constructs and implementation methods from ordinary programming languages.
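The cost of careless recursion (reason 1 above) is easy to demonstrate. The following Python sketch, whose function names are hypothetical and chosen only for illustration, contrasts a doubly recursive Fibonacci function, which recomputes the same subproblems over and over, with a memoized recursive version and a simple loop; `count_calls` reports how many calls the naive version makes.

```python
from functools import lru_cache


def fib_naive(n):
    """Doubly recursive: the same subproblems are recomputed again and again."""
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)


def count_calls(n):
    """How many calls fib_naive(n) makes, including the outermost one."""
    if n < 2:
        return 1
    return 1 + count_calls(n - 1) + count_calls(n - 2)


@lru_cache(maxsize=None)
def fib_memo(n):
    """Same recursion, but each value is computed once and then reused."""
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)


def fib_loop(n):
    """Iterative version: n additions, constant space."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


print(fib_naive(20), fib_memo(20), fib_loop(20))  # 6765 6765 6765
print(count_calls(20))                            # 21891
```

All three agree on the answer, but for n = 20 the naive version makes 21,891 calls while the loop performs 20 iterations, and the naive call count grows by a factor of roughly 1.6 for each increase of n by one. The recursion itself is not the problem; the memoized version is still recursive and makes only about n distinct calls.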
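Reason 3 notes that a programmer can build coroutines in a language that lacks them. A common technique, which in C would typically be a struct with a state field and a `switch`, is to turn one side of the coroutine pair into a state machine whose saved state records where it should "resume." The Python sketch below deliberately avoids Python's built-in generators so that the saved state is explicit. It is a simplification under stated assumptions: the class and function names are hypothetical, and only the consumer keeps a resume point, while the producer is an ordinary loop that transfers control to it once per character.

```python
class WordCollector:
    """Consumer coroutine built by hand, as one might in C or FORTRAN.

    Instead of language support for suspending and resuming, the resume
    point is kept in an explicit state field, and any locals that must
    survive between resumptions are kept as attributes.
    """

    OUTSIDE_WORD, INSIDE_WORD = 0, 1

    def __init__(self):
        self.state = self.OUTSIDE_WORD
        self.current = []     # characters of the word being collected
        self.words = []

    def resume(self, ch):
        """Accept one character from the producer, then return control."""
        if self.state == self.OUTSIDE_WORD:
            if not ch.isspace():
                self.current.append(ch)
                self.state = self.INSIDE_WORD
        elif ch.isspace():             # a word just ended
            self.words.append("".join(self.current))
            self.current = []
            self.state = self.OUTSIDE_WORD
        else:
            self.current.append(ch)

    def finish(self):
        """Producer has no more input: flush a word still in progress."""
        if self.state == self.INSIDE_WORD:
            self.words.append("".join(self.current))
            self.current = []
            self.state = self.OUTSIDE_WORD
        return self.words


def split_words(text):
    consumer = WordCollector()
    for ch in text:          # producer: hands over one character at a time
        consumer.resume(ch)
    return consumer.finish()


print(split_words("the quick  brown fox"))  # ['the', 'quick', 'brown', 'fox']
```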

There is much more to the study of programming languages than simply a cursory look at their features. In fact, many similarities in features are deceiving. The same feature in two different languages may be implemented in two very different ways, and thus the two versions may differ greatly in the cost of use. For example, almost every language provides an addition operation as a primitive, but the cost of performing an addition in C or COBOL or Smalltalk may vary by an order of magnitude. The study of programming languages must necessarily include the study of implementation techniques, particularly techniques for the run-time representation of different constructs.


2. A SHORT HISTORY OF PROGRAMMING LANGUAGES

Programming language designs and implementation methods have evolved continuously since the earliest high-level languages appeared in the 1950s. The first versions of FORTRAN and LISP were designed during the 1950s; Ada, C, Pascal, Prolog, and Smalltalk date from the 1970s; C++, ML, Perl, and Postscript date from the 1980s; and Java dates from the 1990s. In the 1960s and 1970s, new languages were often developed as part of major software development projects. When the U.S. Department of Defense did a survey as part of its background efforts in developing Ada in the 1970s, it found that over 500 languages were being used on various defense projects.

2.1 Development of Early Languages

We briefly summarize language development during the early days of computing, generally from the mid-1950s to the early 1970s.

Numerically based languages. Early computer technology dates from the era just before World War II in the late 1930s to the early 1940s. These early machines were designed to solve numerical problems and were thought of as electronic calculators. It is not surprising, then, that numerical calculations were the dominant form of application for these early machines.

In the early 1950s, symbolic notations started to appear. Grace Hopper led a group at Univac to develop the A-0 language, and John Backus developed Speedcoding for the IBM 701. Both were designed to compile simple arithmetic expressions into executable machine language.

The real breakthrough occurred from 1955 through 1957 when Backus led a team to develop FORTRAN, or FORmula TRANslator. As with the earlier efforts, FORTRAN data were oriented around numerical calculations, but the goal was a full-fledged programming language including control structures, conditionals, and input and output statements. Since few believed that the resulting language could compete with hand-coded assembly language, every effort was put into efficient execution, and various statements were designed specifically for the IBM 704. Concepts like the three-way arithmetic branch of FORTRAN came directly from the hardware of the 704, and statements like READ INPUT TAPE seem quaint today. It wasn't very elegant, but in those days little was known about "elegant" programming, and the language was fast for the given hardware.
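The three-way arithmetic branch, written IF (expr) n1, n2, n3, transfers control to statement label n1, n2, or n3 according to whether the expression is negative, zero, or positive. The Python sketch below emulates it with a label variable and a dispatch loop standing in for GO TO. The subtraction-based GCD program and its labels 10 through 40 are a hypothetical example for illustration, not code from the original FORTRAN literature.

```python
def arithmetic_if(value, neg_label, zero_label, pos_label):
    """Emulate FORTRAN's IF (expr) n1, n2, n3: choose a label by the sign."""
    if value < 0:
        return neg_label
    if value == 0:
        return zero_label
    return pos_label


def gcd_fortran_style(a, b):
    """Subtraction-based GCD written as labeled statements (assumes a, b > 0).

    Hypothetical FORTRAN original:
       10 IF (A - B) 20, 40, 30
       20 B = B - A
          GO TO 10
       30 A = A - B
          GO TO 10
       40 ...           (A now holds the GCD)
    """
    label = 10
    while True:
        if label == 10:
            label = arithmetic_if(a - b, 20, 40, 30)
        elif label == 20:
            b = b - a
            label = 10                  # GO TO 10
        elif label == 30:
            a = a - b
            label = 10                  # GO TO 10
        else:                           # label 40
            return a


print(gcd_fortran_style(12, 18))  # 6
```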

FORTRAN was extremely successful, so successful that it changed programming forever and probably set the stage for its own eventual replacement by other languages. FORTRAN was revised as FORTRAN II in 1958 and as FORTRAN IV a few years later. Almost every manufacturer implemented a version of the language, and chaos reigned. Finally, in 1966, FORTRAN IV became a standard under the name FORTRAN 66, and it has since been upgraded twice, to FORTRAN 77 and FORTRAN 90. However, the extremely large number of programs written in these early dialects has forced succeeding generations of translators to remain mostly backward compatible with these old programs, which inhibits the use of modern programming features.

Because of the success of FORTRAN, there was fear, especially in Europe, of IBM's domination of the industry. GAMM (the German society of applied mathematics) organized a committee to design a universal language. In the United States, the Association for Computing Machinery (ACM) organized a similar committee. Although the Europeans initially feared being dominated by the Americans, the committees merged. Under the leadership of Peter Naur, the committee developed the International Algorithmic Language (IAL). Although the name ALGOrithmic Language (ALGOL) was proposed, it was not approved. However, common usage forced the official name change, and the language became known as ALGOL 58. A revision occurred in 1960, and ALGOL 60 (with a minor revision in 1962) became the standard "academic" computing language from the 1960s to the early 1970s.

Whereas FORTRAN was designed for efficient execution on an IBM 704, ALGOL had very different goals:

1. ALGOL notation should be close to standard mathematics.
2. ALGOL should be useful for the description of algorithms.
3. Programs in ALGOL should be compilable into machine language.
4. ALGOL should not be bound to a single computer architecture.

These turned out to be very ambitious goals for 1957. To allow for machine independence, no input or output was included in the language; special procedures could be written for these operations. While that certainly made programs independent of particular hardware, it also meant that each implementation would necessarily be incompatible with the others. In order to keep close to "pure" mathematics, subprograms were viewed as macro substitutions, which led to the concept of call by name parameter passing, which is extremely hard to implement well.
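Call by name is usually implemented by passing a "thunk": a small parameterless procedure that re-evaluates the argument expression in the caller's environment every time the parameter is used. The classic illustration is Jensen's device, in which a summation procedure receives both an index variable and an expression that mentions it. The Python sketch below emulates this with closures; the Ref class, the sum_by_name function, and the ALGOL call shown in the comment are hypothetical names chosen for illustration.

```python
class Ref:
    """A mutable cell standing in for an ALGOL variable passed by name."""

    def __init__(self, value=0):
        self.value = value


def sum_by_name(k, lo, hi, term):
    """Jensen's device, emulated with thunks.

    k    -- the index variable, shared with the caller (a Ref)
    term -- a thunk: calling it re-evaluates the caller's argument
            expression, so it sees whatever value k holds at that moment
    """
    total = 0
    k.value = lo
    while k.value <= hi:
        total += term()   # each use of the by-name parameter is a call
        k.value += 1
    return total


# Caller side: a hypothetical ALGOL call sum(i, 0, 3, a[i] * a[i]) becomes:
i = Ref()
a = [1, 2, 3, 4]
print(sum_by_name(i, 0, len(a) - 1, lambda: a[i.value] * a[i.value]))  # 30
```

Every reference to the parameter turns into a procedure call that must reach back into the caller's environment, which is one reason call by name is costly to implement efficiently.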

ALGOL never achieved commercial success in the United States, although it did achieve some success in Europe. However, it did have an impact beyond its use. As one example, Jules Schwartz of SDC developed a version of IAL (Jules' Own Version IAL, or JOVIAL), which became a standard for U.S. Air Force applications.

Backus was editor of the ALGOL report defining the language [BACKUS 1960]. He used a syntactic notation comparable to the context-free language concept developed by Chomsky [CHOMSKY 1959]. This was the introduction of formal grammar theory to the programming language world. Because of his and Naur's role in developing ALGOL, the notation is now called Backus-Naur Form (BNF).
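To make the notation concrete, here is a small BNF grammar for integer arithmetic expressions, written as comments since it is an illustration rather than part of the ALGOL report, together with a recursive-descent evaluator in Python that has one method per nonterminal. The class and function names are hypothetical. The left-recursive rules are implemented as loops, a standard transformation, because a naive recursive-descent parser would recurse forever on them.

```python
import re

# A small BNF grammar (illustrative, not from the ALGOL report):
#   <expr>   ::= <term> | <expr> + <term> | <expr> - <term>
#   <term>   ::= <factor> | <term> * <factor>
#   <factor> ::= <number> | ( <expr> )


class Parser:
    """Recursive-descent evaluator with one method per nonterminal."""

    def __init__(self, text):
        self.tokens = re.findall(r"\d+|\S", text)   # numbers or single symbols
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # Left recursion in <expr> is rewritten as a loop, which keeps
        # + and - left-associative without infinite recursion.
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        value = self.factor()
        while self.peek() == "*":
            self.advance()
            value *= self.factor()
        return value

    def factor(self):
        token = self.advance()
        if token is not None and token.isdecimal():
            return int(token)
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("(2 + 3) * 4 - 10 - 4"))  # 6
```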

As another example of ALGOL's influence, Burroughs, a computer vendor that has since merged with Sperry Univac to form Unisys, discovered the works of a Polish mathematician named Lukasiewicz. Lukasiewicz had developed an interesting technique that enabled arithmetic expressions to be written without parentheses, with an efficient stack-based evaluation process. Although not a major mathematical result, this technique has had a profound effect on compiler theory. Using methods based on Lukasiewicz's technique, Burroughs developed the B5500 computer hardware based upon a stack architecture and soon had an ALGOL compiler much faster than any existing FORTRAN compiler.
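Lukasiewicz's original notation places operators before their operands (prefix, or Polish, notation). Its mirror image, reverse Polish or postfix notation, places them after, and it is the postfix form that maps most directly onto a stack. The Python sketch below, with hypothetical names, evaluates a postfix token list: operands are pushed, and each operator pops its two operands and pushes the result. This is, in essence, the push-and-pop sequence a stack machine carries out, though the sketch makes no claim about the B5500's actual instruction set.

```python
import operator

OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_postfix(tokens):
    """Evaluate a postfix (reverse Polish) expression with an explicit stack."""
    stack = []
    for token in tokens:
        if token in OPERATORS:
            right = stack.pop()          # top of stack is the right operand
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            stack.append(int(token))     # operands are simply pushed
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


# (2 + 3) * 4 needs parentheses in infix but none in postfix:
print(eval_postfix("2 3 + 4 *".split()))   # 20
print(eval_postfix("2 3 4 * +".split()))   # 14, i.e. 2 + 3 * 4
```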

At this point the story starts to diverge. The concept of user-defined types developed in the 1960s, and neither FORTRAN nor ALGOL had such features. Simula-67, developed by Nygaard and Dahl of Norway, introduced the concept of classes to ALGOL. This gave Stroustrup the idea for his C++ classes as an extension to C later in the 1980s. Wirth developed ALGOL-W in the mid-1960s as an extension to ALGOL. This design met with only minor success; however, between 1968 and 1970 he developed Pascal, which became the computer science language of the 1970s. Another committee tried to duplicate ALGOL 60's success with ALGOL 68, but the language was radically different and much too complex for most to understand or implement effectively.

With the introduction of its new 360 line of computers in 1963, IBM developed NPL (New Programming Language) at its Hursley Laboratory in England. After some complaints by the English National Physical Laboratory, the name was changed to MPPL (Multi-Purpose Programming Language), which was then shortened to just PL/I. PL/I merged the numerical attributes of FORTRAN with the business programming features of COBOL. PL/I achieved modest success in the 1970s, but its use today is dwindling, being replaced by C, C++ and Ada. The educational subset PL/C achieved modest success in the 1970s as a student PL/I compiler. BASIC was developed to satisfy the numerical calculation needs of the nonscientist but has been extended far beyond its original goal.

Business languages. Business data processing was an early application domain to develop after numerical calculations. Grace Hopper led a group at Univac to develop FLOWMATIC in 1955. The goal was to develop business applications using a form of English-like text. In 1959 the U.S. Department of Defense sponsored a meeting to develop Common Business Language (CBL), which would be a business-oriented language that used English as much as possible for its notation. Because of divergent activities from many companies, a Short Range Committee was formed to quickly develop this language. Although they thought they were designing an interim language, the specifications, published in 1960, were the designs for COBOL (COmmon Business Oriented Language). COBOL was revised in 1961 and 1962, standardized in 1968, and revised again in 1974 and 1984.

Artificial intelligence languages. Interest in artificial intelligence languages began in the 1950s with IPL (Information Processing Language) by the Rand Corporation. IPL-V was fairly widely known, but its use was limited by its low-level design. The major breakthrough occurred when John McCarthy of MIT designed LISP (LISt Processing) for the IBM 704. LISP 1.5 became the "standard" LISP implementation for many years. More recently, Scheme and Common LISP have continued that evolution.

LISP was designed as a list-processing functional language. The usual problem domain for LISP involved searching. Game playing was a natural test bed for LISP, since the usual LISP program would develop a tree of possible moves (as a linked list) and then walk over the tree searching for the optimum strategy. An alternative paradigm was string processing where the usual solution involved the transformation of text from one format to another. Automatic machine translation, where strings of symbols could be replaced by other strings, was the natural application domain. COMIT, by Yngve of MIT, was an early language in this domain. Each program statement was very similar to a context-free production and represented the set of replacements that could be made if that string were found in the data. Since Yngve kept his code proprietary, a group at AT&T Bell Labs decided to develop their own language, which resulted in SNOBOL.

Whereas LISP was designed for general-purpose list-processing applications, Prolog was a special-purpose language whose basic control structure and implementation strategy were based on concepts from mathematical logic.

Systems languages. Because of the need for efficiency, assembly language held on for years in the systems area, long after other application domains had started to use higher-level languages. Many systems programming languages, such as CPL and BCPL, were designed but never widely used. C changed all that. The development during the early 1970s of UNIX, a competitive operating system written mostly in C, showed that high-level languages can be effective in this environment as well as in others.

2.2 Evolution of Software Architectures

Development of a programming language does not proceed in a vacuum. The hardware that supports a language has a great impact on language design. Language, as a means to solve a problem, is part of the overall technology that is employed. The external environment supporting the execution of a program is termed its operating or target environment. The environment in which a program is designed, coded, tested, and debugged, or host environment, may be different from the operating environment in which the program ultimately is used. The computing industry has now entered its third major era in the development of computer programs. Each era has had a profound effect on the set of languages that were used for applications in each time period.

Mainframe Era

From the earliest computers in the 1940s through the 1970s, the large mainframe dominated computing. A single expensive computer filled a room and was attended by a horde of technicians.

Batch environments. The earliest and simplest operating environment consists only of external files of data. A program takes a certain set of data files as input, processes the data, and produces a set of output data files. For example, a payroll program processes two input files containing master payroll records and weekly pay-period times and produces two output files containing updated master records and paychecks. This operating environment is termed batch-processing because the input data are collected in "batches" on files and are processed in batches by the program. The 80-column punched card or Hollerith card, named after Herman Hollerith who developed the card for use in the 1890 U.S. census, was the ubiquitous sign of computing in the 1960s.
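The payroll example can be sketched as a single batch run over fixed-length records, in the spirit of card images. In this Python sketch, lists of strings stand in for the input and output files, and the column layouts, employee ids, and amounts are hypothetical values chosen for illustration; pay is kept in integer cents to avoid rounding issues.

```python
# Hypothetical fixed-length layouts (0-based slices of each record):
#   master record: [0:6] employee id, [6:12] hourly rate in cents,
#                  [12:22] year-to-date pay in cents
#   time record:   [0:6] employee id, [6:9] hours worked this week


def parse_master(line):
    return {"id": line[0:6], "rate": int(line[6:12]), "ytd": int(line[12:22])}


def format_master(rec):
    return f"{rec['id']:<6}{rec['rate']:06d}{rec['ytd']:010d}"


def run_payroll(master_file, time_file):
    """One batch run: two input 'files' in, two output 'files' out."""
    hours = {line[0:6]: int(line[6:9]) for line in time_file}
    new_master, paychecks = [], []
    for line in master_file:
        rec = parse_master(line)
        pay = rec["rate"] * hours.get(rec["id"], 0)   # no time record: no pay
        rec["ytd"] += pay
        new_master.append(format_master(rec))
        paychecks.append(f"{rec['id']:<6}{pay:010d}")
    return new_master, paychecks


master = ["E00001" "001500" "0000000000", "E00002" "002000" "0000100000"]
times = ["E00001040", "E00002010"]
print(run_payroll(master, times))
```

Nothing in the run asks the user for anything: all input is present before the program starts, and all results go to output files, which is what makes it a batch job.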

Languages such as FORTRAN, COBOL, and Pascal were initially designed for batch-processing environments, although they may be used now in an interactive or in an embedded-system environment.

Interactive environments. Towards the end of the mainframe era, in the early 1970s, interactive programming made its appearance. Rather than developing programs on decks of cards, programmers worked at cathode ray tube terminals connected directly to the computer. Building on research in the 1960s at MIT's Project MAC and on Multics, the computer was able to time-share by giving each user a small slice of the processor's time. Thus, if 20 users were connected to a computer and each user had a time slice of 25 milliseconds, one full round of slices would take 500 milliseconds, so each user would receive two slices, or 50 milliseconds of computer time, each second. Because many users spent much of their time at a terminal thinking, the few who were actually executing programs would often get more than their quota of two slices per second.
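The time-sharing arithmetic can be checked with a toy round-robin scheduler. This Python sketch uses hypothetical names and simplifying assumptions: it ignores context-switch overhead, and it treats every listed user as wanting the processor for the whole second, so idle users are modeled by leaving them out of the list.

```python
from collections import deque


def round_robin(ready_users, slice_ms=25, horizon_ms=1000):
    """Count the full time slices each ready user receives in horizon_ms.

    Every user in ready_users is assumed to want the processor the whole
    time; idle users simply are not in the list. Switching overhead is ignored.
    """
    waiting = deque(ready_users)
    received = {user: 0 for user in ready_users}
    clock = 0
    while waiting and clock + slice_ms <= horizon_ms:
        user = waiting.popleft()      # next user in line gets the processor
        received[user] += 1
        clock += slice_ms
        waiting.append(user)          # then goes to the back of the line
    return received


busy = round_robin([f"user{n}" for n in range(20)])
print(set(busy.values()))           # {2}: two 25 ms slices = 50 ms each
few = round_robin(["a", "b", "c", "d", "e"])
print(set(few.values()))            # {8}: 200 ms each
```

With all 20 users ready, each receives 2 slices per second; if only 5 are actually running programs, each receives 8 slices, which is why users thinking at their terminals leave more time for the rest.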

In an interactive environment, a program interacts directly with a user at a display console during its execution, by alternately sending output to the display and receiving input from the keyboard or mouse. Examples include word-processing systems, spreadsheets, video games, database management systems, and computer-assisted instruction systems. These are all tools with which you are probably familiar.

Effects on language design. In a language designed for batch processing, files are usually the basis for most of the input-output structure. Although a file may be used for interactive input-output to a terminal, the special needs of interactive I/O are not addressed in these languages. For example, files are usually stored as fixed-length records, yet at a terminal the program would need to read each character as it is entered on the keyboard. The input-output structure also typically does not address the requirement for access to special I/O devices found in embedded systems.

In a batch-processing environment, an error that terminates execution of the program is acceptable but costly, because often the entire run must be repeated after the error is corrected. In this environment, too, no external help from the user in immediately handling or correcting the error is possible. Thus the error- and exception-handling facilities of the language emphasize error/exception handling within the program so that the program may recover from most errors and continue processing without terminating.

A third distinguishing characteristic of a batch-processing environment is the lack of timing constraints on a program. The language usually provides no facilities for monitoring or directly affecting the speed at which the program executes. 

The characteristics of interactive input-output are sufficiently different from ordinary file operations that most languages designed for a batch-processing environment experience some difficulty in adapting to an interactive environment. C, as an example, includes functions for accessing lines of text from a file and other functions that directly input each character as typed by the user at a terminal. The direct input of text from a terminal in Pascal, however, is often very cumbersome. For this reason, C (and its derivative C++) has greatly grown in popularity as a language for writing interactive programs.

Error handling in an interactive environment is given different treatment. If bad input data are entered from a keyboard, the program may display an error message and ask for a correction from the user. Language features for handling the error within the program (e.g., by ignoring it and attempting to continue) are of lesser importance. However, termination of the program in response to an error is usually not acceptable (unlike batch processing).

Interactive programs must often utilize some notion of timing constraints. For example, in a video game, if the player fails to respond to a displayed scene within a fixed time interval, the program must invoke some response of its own. An interactive program that operates so slowly that it cannot respond to an input command in a reasonable period is often considered unusable.

Personal Computer Era

In hindsight, the mainframe time-sharing era of computing was very short-lived, lasting perhaps from the early 1970s to the mid-1980s. The Personal Computer (PC) changed that.

Personal computers. The 1970s could be called the era of the minicomputer. These were progressively smaller and cheaper machines than the standard mainframe of that era. Hardware technology was making great strides forward, and the microprocessor, which contained the entire machine processor on a single silicon chip in a 1- to 2-inch-square plastic package, was becoming faster and cheaper each year. The standard mainframe of the 1970s shrank from a room full of cabinets and tape drives to a decorative office machine perhaps 3 to 5 feet long and 3 to 4 feet high.

In 1977, Apple released the Apple II computer, the first true commercial PC. It was a small desktop machine that ran BASIC. This machine had a major impact on the educational market; however, business was skeptical of the small Apple company and its small computer.

In 1981, all of this changed. IBM released its PC, and Lotus developed 1-2-3, based on the VisiCalc spreadsheet program. This program became the first killer application (killer app), a program that industry had to run. The PC became an overnight success.

The modern PC era can be traced to January 1984, during the U.S. football Super Bowl game. During a television commercial, Apple announced the Macintosh computer. The Macintosh contained a windows-based graphical user interface with a mouse for point-and-click data entry. Although this technology had been developed earlier at the Xerox Palo Alto Research Center (PARC), the Macintosh was its first commercial application. Quickly mimicked by Microsoft for its Windows operating system, this interface design has become the mainstay of the PC.

Since that time, the machines have become cheaper and faster. A contemporary PC is about 200 to 400 times faster, has 200 times the main memory and 3,000 times the disk space, and costs only one third of the $5,000 cost of the original PC 20 years earlier. It is more powerful than the mainframe computers that it replaced.

Embedded-system environments. An offshoot of the PC is the embedded computer. A computer system that is used to control part of a larger system such as an industrial plant, an aircraft, a machine tool, an automobile, or even your toaster is termed an embedded computer system. The computer system has become an integral part of the larger system, and the failure of the computer system usually means failure of the larger system as well. Unlike in the PC environment, where failure of the program often is simply an inconvenience and the program has to be rerun, failure of an embedded application can often be life-threatening, from failure of an automobile computer causing a car to stall at high speeds on a highway, to failure of an on-board computer causing an aircraft engine to shut down during takeoff, to failure of a computer causing a nuclear plant to overheat, to failure of a hospital computer causing patient monitoring to cease, down to failure of your digital watch causing you to be late for a meeting. Reliability and correctness are primary attributes for programs used in these domains. Ada, C, and C++ are used extensively to meet some of the special requirements of embedded-system environments.

Effects on language design. The PC has again changed the role of languages. Performance is now less of a concern in many application domains. With the advent of user interfaces such as windows, each machine executes under control of a single user. With prices so low, the need to time-share is not present. Developing languages with good interactive graphics becomes of primary importance.

Today windows-based systems are the primary user interface. PC users are quite familiar with the tools of the windows interface. They are familiar with windows, icons, scroll bars, menus, and the assorted other aspects of interacting with the computer. However, programming such packages can be complex. Vendors of such windowing systems have created libraries of these packages. Accessing these libraries to enable easy development of windows-based programs is a primary concern of application developers.

Object-oriented programming is a natural model for this environment. The use of languages like Java and C++, with their class hierarchies, allows for easy incorporation of packages written by others.

Programs written for embedded systems often operate without an underlying operating system and without the usual environment of files and I/O devices. Instead, the program must interact directly with nonstandard I/O devices through special procedures that take account of the peculiarities of each device. For this reason, languages for embedded systems often place much less emphasis on files and file-oriented input-output operations. Access to special devices is often provided through language features that give access to particular hardware registers, memory locations, interrupt handlers, or subprograms written in assembly or other low-level languages.

Error handling in embedded systems is of particular importance. Ordinarily each program must be prepared to handle all errors internally, taking appropriate actions to recover and continue. Termination, except in the case of a catastrophic system failure, is often not an acceptable alternative, and usually there is no user in the environment to provide interactive error correction.

Embedded systems almost always operate in real time; that is, the operation of the larger system within which the computer system is embedded requires that the computer system be able to respond to inputs and to produce outputs within tightly constrained time intervals. For example, a computer controlling the flight of an aircraft must respond rapidly to changes in its altitude or speed. Real-time operation of these programs requires language features for monitoring time intervals, responding to delays of more than a certain length of time (which may indicate failure of a component of the system), and starting up and terminating actions at certain designated points in time.
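Python is not a real-time language, but the basic pattern of a bounded wait followed by a recovery action can still be sketched. Here a queue.Queue stands in for a sensor channel and the readings are hypothetical values, both assumptions made for illustration; a reading that fails to arrive within the timeout is treated as a possible component failure instead of blocking forever. Ada expresses the same idea directly with a select statement that has a delay alternative.

```python
import queue


def monitor(readings, timeout_s, max_readings):
    """Collect up to max_readings sensor values from a queue.Queue.

    If no reading arrives within timeout_s seconds, record a timeout and
    stop waiting, standing in for a switch to a fail-safe mode.
    """
    events = []
    for _ in range(max_readings):
        try:
            value = readings.get(timeout=timeout_s)   # bounded wait, never forever
        except queue.Empty:
            events.append(("timeout", None))          # suspected component failure
            break
        events.append(("reading", value))
    return events


q = queue.Queue()
for v in (101.5, 101.7):        # hypothetical altitude readings
    q.put(v)
print(monitor(q, timeout_s=0.05, max_readings=3))
# [('reading', 101.5), ('reading', 101.7), ('timeout', None)]
```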

Finally, an embedded computer system is often a distributed system, composed of more than one computer. The program running on such a distributed system is usually composed of a set of tasks that operate concurrently, each controlling or monitoring one part of the system. The main program, if there is one, exists only to initiate execution of the tasks. Once initiated, these tasks usually run concurrently and indefinitely, since they need to terminate only when the entire system fails or is shut down for some reason.

Networking Era

Distributed computing. As machines became faster, smaller, and cheaper during the 1980s, they started to populate the business environment. Companies would have central machines for handling corporate data, such as a payroll, and each department would have local machines for providing support to that department, order processing, report writing, and so on. For an organization to run smoothly, information on one machine had to be transferred and processed on another. For example, the sales office had to send purchase order information to the production department's computer and the financial department needed the information for billing and accounting. Local area networks (LANs) using telecommunication lines between the machines were developed within large organizations using a client-server model of computing. The server would be a program that provided information and multiple client programs would communicate with the server to obtain that information.

An airline reservation system is one well-known example of a client-server application. The database of airline flight schedules would be on a large mainframe. Each agent would run a client program that conveyed information about flights to the agent (or traveler). If a new flight was desired, the client program would send a request to the server program and then receive, or download, information about the new flights from the server. In this way, a single server application could serve many client programs.
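The request-and-download pattern can be sketched in a few lines of Python using the standard socketserver module. Everything specific here is an assumption for illustration: the one-line text protocol, the route names, and the flight numbers in FLIGHTS are hypothetical, and the server runs on a free port on the local machine rather than on a mainframe.

```python
import json
import socket
import socketserver
import threading

# Hypothetical flight table held only by the server.
FLIGHTS = {"NYC-LON": ["XY101", "XY205"], "LAX-TYO": ["XY330"]}


class FlightHandler(socketserver.StreamRequestHandler):
    """Server side: read one route per connection, reply with a JSON list."""

    def handle(self):
        route = self.rfile.readline().decode().strip()
        reply = json.dumps(FLIGHTS.get(route, []))
        self.wfile.write((reply + "\n").encode())


def start_server():
    """Start the server on a free localhost port in a background thread."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), FlightHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query(address, route):
    """Client side: send a route, download the matching flights."""
    with socket.create_connection(address) as conn:
        conn.sendall((route + "\n").encode())
        with conn.makefile("r") as reply:
            return json.loads(reply.readline())


if __name__ == "__main__":
    server = start_server()
    print(query(server.server_address, "NYC-LON"))   # ['XY101', 'XY205']
    server.shutdown()
    server.server_close()
```

Any number of clients can call query against the same server address, which mirrors the single-server, many-clients arrangement described above.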

Internet. The mid-1990s saw the distributed LAN grow into an international global network, the Internet. In 1970, the Defense Advanced Research Projects Agency (DARPA) started a research project to link mainframe computers together into a large, reliable, and secure network. The goal was to provide redundancy in case of war so that military planners could access computers across the nation. Fortunately, the ARPANET was never put to that use, and by the mid-1980s the military ARPANET had evolved into the research-oriented Internet. Over time, additional computers were added to the network, and today any user worldwide can have a machine added to it. Millions of machines are now connected in a complex, dynamically changing network of server machines.

Accessing the Internet in its early days required two classes of computers. A user would sit at a client personal computer and, to obtain information, would connect to an appropriate server machine. The protocols for doing so were telnet and the File Transfer Protocol (FTP). Telnet made it appear as if the user were working directly on the distant server, whereas FTP simply allowed the client machine to send files to or receive files from the server machine. In both cases, the user had to know which machine contained the desired information.

At the same time a third protocol was being developed: the Simple Mail Transfer Protocol (SMTP), the basis for today's e-mail. Each user has a local login name on the client machine, and each machine has a unique machine name. Sending a message to an individual was then a simple matter of using a program that adhered to SMTP and addressing the mail to a user at a specific machine. What is important here is that the sender needs only those two names; there was no need to know where the recipient's machine was or what its actual address on the Internet was.

A goal in the late 1980s was to make the retrieval of information as easy as sending e-mail. The breakthrough came in 1989 at CERN, the European nuclear research facility in Geneva, Switzerland, where Tim Berners-Lee developed the concept of the HyperText Markup Language (HTML) as a way to navigate around the Internet. With the development of the Mosaic web browser in 1993 and the addition of the HyperText Transfer Protocol (HTTP) to Internet technology, the general population discovered the Internet. By the end of the 20th century, web surfing had become commonplace, and the entire structure of knowledge acquisition and search worldwide had changed.

Effects on language design. The World Wide Web (WWW) has again changed the role of the programming language. Computing is again becoming centralized, but in a way much different from the earlier mainframe era. Large information repository servers are being created worldwide. Users access these servers via the Web to obtain information and use their local client machines for local processing, such as turning that information into a report with a word processor. Rather than distributing millions of copies of a new software product, a vendor can simply put the software on the Web and have users download copies for local use. This requires languages that support interaction between the client and server computers, for example letting the user download the software and letting the vendor charge the user for the privilege. The rise of electronic commerce (E-commerce) depends on these features.

The initial Web pages were static: they could display text, pictures, or graphics, and users could click on a link, identified by a Uniform Resource Locator (URL), to access a new Web page. For E-commerce to flourish, however, information had to flow both ways between client and server, and Web pages needed to be more active. Languages like Perl and Java provide such features.

The Web poses programming language issues that were not apparent in the previous two eras. Security is one. A user visiting a web site wants to be certain that the owner of that site is not malicious and will not damage the client machine, for example by erasing the user's disk files. Although this was a concern on time-sharing systems, it did not arise on single-user PCs. Access to local user files from the server web site has to be restricted.

Performance is another critical problem. Although PCs have become extremely fast, the communication lines connecting a user to the Internet are often limited in speed. In addition, however fast the machines are, if many users access the same server, its processing power may be taxed. One way around this is to process the information at the client site rather than on the server, which requires the server to send small executable programs to the client to offload work. The problem is that the server does not know what kind of computer the client is, so it is not clear what form the executable program should take. Java was developed specifically to handle this problem.

2.3 Application Domains

The appropriate language often depends on the application domain of the problem to be solved, and the languages considered appropriate for each domain have evolved over the past 30 years. Some important languages are summarized in Table 1.

Era     Application     Major languages                      Other languages
1960s   Business        COBOL                                Assembler
        Scientific      FORTRAN                              ALGOL, BASIC, APL
        System          Assembler                            JOVIAL, Forth
        AI              LISP                                 SNOBOL
Today   Business        COBOL, C++, Java, spreadsheet        C, PL/I, 4GLs
        Scientific      FORTRAN, C, C++, Java                BASIC
        System          C, C++, Java                         Ada, BASIC, Modula
        AI              LISP, Prolog
        Publishing      TEX, Postscript, word processing
        Process         UNIX shell, TCL, Perl, JavaScript    AWK, Marvel, SED
        New paradigms   ML, Smalltalk                        Eiffel

Table 1 Languages for various application domains.

Applications of the 1960s

During the 1960s, most programming could be divided into four basic programming models: business processing, scientific calculations, systems programming, and artificial intelligence applications.

Business processing. Most of these applications were large data processing applications designed to run on "big iron" mainframes. These included order entry programs, inventory control, personnel management, and payroll. They were characterized by reading in large amounts of historical data on multiple tape drives, reading in a smaller set of recent transactions, and writing out a new set of historical data. For a view of what this looked like, watch any 1960s science fiction movie; such films liked to show lots of spinning tapes to indicate "modern computing."

COBOL was developed for these applications. The COBOL designers took great pains to ensure that such data processing records would be processed correctly. Business applications also include business planning, risk analysis, and "what if" scenarios. In the 1960s, it often required several months for a COBOL programmer to put together a typical "what if" application.

Scientific. These applications are characterized by the solution of various mathematical equations. They include numerical analysis problems, solving differential or integral equations, and generating statistics. It is in this realm that the computer was first developed, for use during World War II to generate ballistics tables. FORTRAN has always been the dominant language in this domain. Its syntax has always been close to mathematics, and scientists find it easy to use.

System. For building operating systems and for implementing compilers, no effective language existed. Such applications must be able to access the full functionality and resources of the underlying hardware. Assembly language was often the choice in order to gain efficiency. JOVIAL, a variation on ALGOL, was used on some U.S. Department of Defense projects, and toward the end of the 1960s languages like PL/I were used for this application.

A related application domain is process control, the controlling of machinery. Because of the expense and size of computers during this era, most process control applications were large, such as controlling a power station or automatic assembly line. Languages like Forth were developed to address this application domain, although assembly language was often used.

AI. Artificial intelligence was a relatively new research area, and LISP was the dominant language for AI applications. These programs are characterized by algorithms that search through large data spaces. For example, to play chess, the computer generates many potential moves and then searches for the best move within the time it has to decide what to do next.
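The search described above is commonly done with minimax, a standard game-tree technique used here for illustration (the text does not name a specific algorithm, and the sketch is in C rather than LISP). It assumes a hypothetical complete binary tree with two moves per position, stored level by level: within each level, the children of node `i` are nodes `2*i` and `2*i + 1` of the level below, and `leaves` holds the scores of the final positions from the computer's point of view.

```c
#include <assert.h>
#include <stddef.h>

/* Minimax over a hypothetical complete binary game tree.
   `leaves` holds the 2^depth scores of the final positions, seen from the
   maximizing player (the computer). `maximizing` is 1 when the computer moves. */
int minimax(const int *leaves, int index, int depth, int maximizing) {
    assert(leaves != NULL && depth >= 0);
    if (depth == 0)
        return leaves[index];                 /* a final position: just score it */
    int left  = minimax(leaves, 2 * index,     depth - 1, !maximizing);
    int right = minimax(leaves, 2 * index + 1, depth - 1, !maximizing);
    if (maximizing)
        return left > right ? left : right;   /* computer picks its best move */
    return left < right ? left : right;       /* opponent picks the worst for us */
}

/* Choose the move (0 or 1) at the root that leads to the better minimax score.
   After the computer's move it is the opponent's turn, hence maximizing = 0. */
int best_move(const int *leaves, int depth) {
    assert(depth >= 1);
    int score0 = minimax(leaves, 0, depth - 1, 0);
    int score1 = minimax(leaves, 1, depth - 1, 0);
    return score0 >= score1 ? 0 : 1;
}
```

A real chess program cannot search to the end of the game, so it stops at a fixed depth and scores positions with a heuristic; the time available for a move is what bounds that depth.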

Applications of the 21st Century

Although Ada was developed to eliminate much of the duplication among competing languages, the situation today is probably more complex than it was during the 1960s. There are more application domains for which particular languages are especially well adapted, and often several languages compete within each domain.

Business processing. COBOL is still the dominant language in this domain for data processing applications, although C and C++ are sometimes used. However, the "what if" scenario has totally changed: the spreadsheet on the PC has reshaped this part of the domain. Whereas a typical business planning program once took a programmer several months, today an analyst can "cook up" many spreadsheets in just a few hours.

Fourth-generation languages (4GLs) have also taken over some of this market. 4GLs are languages adapted for specific business application domains and typically provide a window-based programmer interface, easy access to database records, and special features for generating "fill-in-the-blank" input forms and elegant output reports. Sometimes these 4GL "compilers" generate COBOL programs as output.

E-commerce, a term referring to business activity conducted over the WWW, has greatly changed the nature of business programming. Tools that allow interaction between the user (i.e., purchaser) and the company (i.e., vendor), with the Web as the intermediary, have given rise to new roles for languages. Java was designed with security features intended to protect the user, and process languages such as Perl and JavaScript allow vendors to obtain from the user the critical data needed to conduct a transaction.

Scientific. FORTRAN is still hanging on here, too, although FORTRAN 90 is being challenged by languages like C++ and Java.

System. C, developed toward the end of the 1960s, and its newer variant C++, dominate this application domain. C provides very efficient execution and allows the programmer full access to the operating system and underlying hardware. Other languages like Modula and modern variations of BASIC are also used. Although intended for this area, Ada has never achieved its goal of becoming a major language in this domain. Assembly language programming has become an anachronism.

With the advent of inexpensive microprocessors running cars, microwave ovens, video games, and digital watches, the need for real-time languages has increased. C, Ada, and C++ are often used for such real-time processing.

AI. LISP is still used, although modern versions like Scheme and Common LISP have replaced the MIT LISP 1.5 of the early 1960s. Prolog has also developed a following. Both languages are well suited to search-based applications.

Publishing. Publishing represents a relatively new application domain for languages. Word processing systems have their own syntax for input commands and output files. Some books are composed using the TEX text processing system, and for lack of a better term, chapters are "compiled" in order to put in figure and table references, place figures, and compose paragraphs.

The output of TEX is typically translated into a program in the Postscript page description language. Although Postscript is usually the output of a processor, it does have a syntax and semantics and can be executed by an appropriate processor, often the laser printer that is used to print the document. We know of individuals who insist on programming directly in Postscript, but this seems about as foolish today as programming in assembly language was in the 1960s.

Process. During the 1960s the programmer was the active agent in using a computer. To accomplish a task the programmer would write an appropriate command that the computer would then execute. However, today we often use one program to control another, e.g., to back up files every midnight; to synchronize time once an hour; to send an automatic reply to incoming electronic mail when on vacation; to automatically test a program whenever it compiles successfully, etc. We call such activities processes, and there is considerable interest in developing languages where such processes can be specified and then translated to execute automatically.

Within UNIX, the user command language is called the shell and programs are called shell scripts. These scripts can be invoked whenever certain enabling conditions occur. Various other scripting languages have appeared; both TCL and Perl are used for similar purposes.

New paradigms. New application models are always under study. ML has been used in programming language research to investigate type theory. While not a major language in industry, its popularity is growing. Smalltalk is another important language. Although commercial Smalltalk use is not very great, it has had a profound effect on language design. Many of the object-oriented features in C++ and Ada had their origins in Smalltalk.

Languages for various application domains are a continuing source of new research and development. As our knowledge of compiling techniques improves, and as our knowledge of how to build complex systems evolves, we are constantly finding new application domains and require languages that meet the needs of those domains.


3. ROLE OF PROGRAMMING LANGUAGES

Initially, languages were designed to execute programs efficiently. Computers, costing in the millions of dollars, were the critical resource, whereas programmers, earning perhaps $10,000 annually, were a minor cost. Any high-level language had to be competitive with the execution behavior of hand-coded assembly language. John Backus, chief designer of FORTRAN for IBM in the late 1950s, stated a decade later [IBM 1966]:

Frankly, we didn't have the vaguest idea how the thing [FORTRAN language and compiler] would work out in detail. ... We struck out simply to optimize the object program, the running time, because most people at that time believed you really couldn't do that kind of thing. They believed that machine-coded programs would be so terribly inefficient that it would be impractical for very many applications.

One result we didn't have in mind was this business of having a system that was designed to be utterly independent of the machine that the program was ultimately to run on. It turned out to be a very valuable capability but we sure didn't have it in mind.

There was nothing organized about our activities. Each part of the program was written by one or two people who were complete masters of what they did with very minor exceptions — and the thing just grew like Topsy... [When FORTRAN was distributed] we had the problem of facing the fact that these 25,000 instructions weren't all going to be correct, and that there were going to be difficulties that would show up only after a lot of use.

By the middle of the 1960s, when the above quote was made and after the advent of FORTRAN, COBOL, LISP, and ALGOL, Backus already realized that programming was changing. Machines were becoming less expensive, programming costs were rising, there was a growing need for moving programs from one system to another, and maintenance of the resulting product was taking a larger share of computing resources. Rather than compiling programs to work efficiently on a large, expensive computer, the task of a high-level language was to make it easier to develop correct programs to solve problems for some given application area.

Compiler technology matured in the 1960s and 1970s and language technology centered on solving domain-specific problems. Scientific computing generally used FORTRAN, business applications were typically written in COBOL, military applications were written in JOVIAL, artificial intelligence applications were written in LISP, and embedded military applications were to be written in Ada.

Just like natural languages, programming languages evolve or pass out of use. ALGOL from 1960 is no longer used, COBOL use is dropping for business applications, and APL, PL/I, and SNOBOL4, all from the 1960s, have all but disappeared. Pascal, from the early 1970s, is well past its prime, although many of its constructs continue in Ada.

The older languages still in use have undergone periodic revisions to reflect changing influences from other areas of computing. FORTRAN has undergone several standardized revisions, as has COBOL. Ada has a new 1995 standard. LISP has been updated with Scheme and later with Common LISP. Newer languages like C++ and ML reflect a composite of experience gained in the design and use of these and the hundreds of other older languages. Some of these influences include:

1. Computer capabilities. Computers have evolved from the small, slow, and costly vacuum-tube machines of the 1950s to the supercomputers and microcomputers of today. At the same time, layers of operating system software have been inserted between the programming language and the underlying computer hardware. 
2. Applications. Computer use has spread rapidly from the original concentration on military, scientific, business, and industrial applications in the 1950s, where the cost could be justified, to the computer games, personal computers, and applications in every area of human activity seen today. The requirements of these new application areas affect the designs of new languages and the revisions and extensions of older ones.
3. Programming methods. Language designs have evolved to reflect our changing understanding of good methods for writing large and complex programs and to reflect the changing environment in which programming is done.
4. Implementation methods. The development of better implementation methods has affected the choice of features to include in new designs.
5. Theoretical studies. Research into the conceptual foundations for language design and implementation, using formal mathematical methods, has deepened our understanding of the strengths and weaknesses of language features and has thus influenced the inclusion of these features in new language designs.
6. Standardization. The need for "standard" languages that can be implemented easily on a variety of computer systems and that allow programs to be transported from one computer to another has provided a strong conservative influence on the evolution of language designs.
To illustrate, Table 2 briefly lists some of the languages and technology influences that were important during the latter half of the 20th century. Of course, missing from this table are the hundreds of languages and influences that have played a lesser but still important part in this history.
 
Years       Influences and new technology
1951-1955   Hardware: Vacuum-tube computers; mercury delay line memories
            Methods: Assembly languages; foundation concepts: subprograms, data structures
            Languages: Experimental use of expression compilers
1956-1960   Hardware: Magnetic tape storage; core memories; transistor circuits
            Methods: Early compiler technology; BNF grammars; code optimization; interpreters; dynamic storage methods and list processing
            Languages: FORTRAN, ALGOL 58, ALGOL 60, COBOL, LISP
1961-1965   Hardware: Families of compatible architectures; magnetic disk storage
            Methods: Multiprogramming operating systems; syntax-directed compilers
            Languages: COBOL-61, ALGOL 60 (revised), SNOBOL, JOVIAL, APL notation
1966-1970   Hardware: Increasing size and speed and decreasing cost; minicomputers; microprogramming; integrated circuits
            Methods: Time-sharing and interactive systems; optimizing compilers; translator writing systems
            Languages: APL, FORTRAN 66, COBOL 65, ALGOL 68, SNOBOL4, BASIC, PL/I, SIMULA 67, ALGOL-W
1971-1975   Hardware: Microcomputers; age of minicomputers; small mass storage systems; decline of core memories and rise of semiconductor memories
            Methods: Program verification; structured programming; early growth of software engineering as a discipline of study
            Languages: Pascal, COBOL 74, PL/I (standard), C, Scheme, Prolog
1976-1980   Hardware: Commercial-quality microcomputers; large mass storage systems; distributed computing
            Methods: Data abstraction; formal semantics; concurrent, embedded, and real-time programming techniques
            Languages: Smalltalk, Ada, FORTRAN 77, ML
1981-1985   Hardware: Personal computers; first workstations; video games; local-area networks; ARPANET
            Methods: Object-oriented programming; interactive environments; syntax-directed editors
            Languages: Turbo Pascal, Smalltalk-80, growth of Prolog, Ada 83, Postscript
1986-1990   Hardware: Age of the microcomputer; rise of the engineering workstation; RISC architectures; global networking; Internet
            Methods: Client/server computing
            Languages: FORTRAN 90, C++, SML (Standard ML)
1991-1995   Hardware: Very fast inexpensive workstations and microcomputers; massively parallel architectures; voice, video, fax, multimedia
            Methods: Open systems; environment frameworks
            Languages: Ada 95, scripting languages (TCL, Perl), HTML
1996-2000   Hardware: Computers as inexpensive appliances; personal digital assistants; World Wide Web; cable-based home networking; gigabyte disk storage
            Methods: E-commerce
            Languages: Java, JavaScript, XML

Table 2 Some major influences on programming languages.

3.1 What Makes a Good Language?

Mechanisms to design high-level languages must still be perfected. Some reasons for the success or failure of a language may be external to the language itself. For example, use of COBOL or Ada in the United States was enforced in certain areas of programming by government mandate. Likewise, part of the reason for the success of FORTRAN may be attributed to the strong support of various computer manufacturers that have expended large efforts in providing sophisticated implementations and extensive documentation for the language. Part of the success of SNOBOL4 during the 1970s can be attributed to an excellent text describing the language [GRISWOLD 1975]. Pascal and LISP have benefited from their use as objects of theoretical study by students of language design as well as from actual practical use.

Attributes of a Good Language

Despite the major importance of some of these external influences, it is the programmer who ultimately, if sometimes indirectly, determines which languages live and die. Many reasons might be suggested to explain why programmers prefer one language over another. Let us consider some of these.
1. Clarity, simplicity, and unity. A programming language provides both a conceptual framework for thinking about algorithms and a means of expressing those algorithms. The language should be an aid to the programmer long before the actual coding stage. It should provide a clear, simple, and unified set of concepts that can be used as primitives in developing algorithms. To this end it is desirable to have a minimum number of different concepts, with the rules for their combination being as simple and regular as possible. We call this attribute conceptual integrity. 

The syntax of a language affects the ease with which a program may be written, tested, and later understood and modified. The readability of programs in a language is a central issue here. A syntax that is particularly terse or cryptic often makes a program easy to write (for the experienced programmer) but difficult to read when the program must be modified later. APL programs are often so cryptic that their own designers cannot easily decipher them a few months after they are completed. Many languages contain syntactic constructs that encourage misreading by making two almost identical statements actually mean radically different things. For example, the presence of a blank character, which is an operator, in a SNOBOL4 statement may entirely alter its meaning. A language should have the property that constructs that mean different things look different; i.e., semantic differences should be mirrored in the language syntax.

2. Orthogonality. The term orthogonality refers to the attribute of being able to combine various features of a language in all possible combinations, with every combination being meaningful. For example, suppose a language provides for an expression that can produce a value, and it also provides for a conditional statement that evaluates an expression to get a true or false value. These two features of the language, expression and conditional statement, are orthogonal if any expression can be used (and evaluated) within the conditional statement.

When the features of a language are orthogonal, the language is easier to learn and programs are easier to write because there are fewer exceptions and special cases to remember. The negative aspect of orthogonality is that a program will often compile without errors even though it contains a combination of features that is logically incoherent or extremely inefficient to execute. Because of these opposing qualities, orthogonality as an attribute of language design is still controversial.

3. Naturalness for the application. A language needs a syntax that, when properly used, allows the program structure to reflect the underlying logical structure of the algorithm. Ideally it should be possible to translate such a program design directly into appropriate program statements that reflect the structure of the algorithm. Sequential algorithms, concurrent algorithms, logic algorithms, etc., all have differing natural structures, and each is most naturally represented in a language designed around that structure.

The language should provide appropriate data structures, operations, control structures, and a natural syntax for the problem to be solved. One of the major reasons for the proliferation of languages is just this need for naturalness. A language particularly suited to a certain class of applications may greatly simplify the creation of individual programs in that area. Prolog, with its bias toward deduction, and C++, for object-oriented design, are two languages with an obvious slant toward particular classes of applications.

4. Support for abstraction. Even with the most natural programming language for an application, there is always a substantial gap remaining between the abstract data structures and operations that characterize the solution to a problem and the particular primitive data structures and operations built into a language. For example, C may be an appropriate language for constructing a program to do class scheduling for a university, but the abstract data structures of "student," "class section," "instructor," "lecture room," and the abstract operations of "assign a student to a class section," "schedule a class section in a lecture room," etc., that are natural to the application are not provided directly by C.

A substantial part of the programmer's task is to design the appropriate abstractions for the problem solution and then to implement these abstractions using the more primitive features provided by the actual programming language. Ideally the language should allow data structures, data types, and operations to be defined and maintained as self-contained abstractions. The programmer may use them in other parts of the program knowing only their abstract properties, without concern for the details of their implementation. Both Ada and C++ were developed because of just these shortcomings in the earlier languages of Pascal and C, respectively.

5. Ease of program verification. The reliability of programs written in a language is always a central concern. There are many techniques for verifying that a program correctly performs its required function. A program may be proven correct by a formal verification method, it may be informally proven correct by desk checking (reading and visually checking the program text), it may be tested by executing it with test input data and checking the output results against the specifications, etc. For large programs some combination of all these methods is often used. A language that makes program verification difficult may be far more troublesome to use than one that supports and simplifies verification, even though the former may provide many more features that superficially appear to make programming easier. Simplicity of semantic and syntactic structure is a primary aspect that tends to simplify program verification.
6. Programming environment. The technical structure of a programming language is only one aspect affecting its utility. The presence of an appropriate programming environment may make a technically weak language easier to work with than a stronger language that has little external support. A long list of factors might be included as part of the programming environment. The availability of a reliable, efficient, and well-documented implementation of the language must head the list. Special editors and testing packages tailored to the language may greatly speed the creation and testing of programs. Facilities for maintaining and modifying multiple versions of a program may make working with large programs much simpler. Smalltalk is a language that was specifically designed around a programming environment consisting of windows, menus, mouse input, and a set of tools to operate on programs written in Smalltalk.
7. Portability of programs. One important criterion for many programming projects is that of the transportability of the resulting programs from the computer on which they are developed to other computer systems. A language that is widely available and whose definition is independent of the features of a particular machine forms a useful base for the production of transportable programs. Ada, FORTRAN, C, and Pascal all have standardized definitions allowing for portable applications to be implemented. Others, like ML, come from a single source implementation allowing the language designer some control over portable features of the language.
8. Cost of use. The tricky criterion of cost has been left for last. Cost is certainly a major element in the evaluation of any programming language, but different cost measures are feasible:

(a) Cost of program execution. In the earlier years of computing, questions of cost were concerned almost exclusively with program execution. Research on the design of optimizing compilers, efficient register allocation, and the design of efficient run-time support mechanisms was important. Cost of program execution, although always of some importance in language design, is of primary importance for large production programs that will be executed repeatedly. Today, however, for many applications, speed of execution is not of highest concern. With desktop machines running at hundreds of millions of instructions per second and sitting idle much of the time, a 10% or 20% increase in execution time can be tolerated if it means better diagnostics or easier user control over development and maintenance of the program.

(b) Cost of program translation. When a language like FORTRAN or C is used in teaching, the question of efficient translation (compilation) rather than efficient execution may be paramount. Typically, student programs are compiled many times while being debugged but are executed only a few times. In such a case it is important to have a fast and efficient compiler rather than a compiler that produces optimized executable code.

(c) Cost of program creation, testing, and use. Yet a third aspect of cost in a programming language is exemplified by the language Smalltalk. For a certain class of problems a solution may be designed, coded, tested, modified, and used with a minimum investment of programmer time and energy. Smalltalk is cost effective in that the overall time and effort expended in solving a problem on the computer is minimized. Concern with this sort of overall cost in use of a language has become as important in many cases as the more traditional concern with efficient program execution and compilation.

(d) Cost of program maintenance. Many studies have shown that, for a program used over a period of years, the largest cost is not the initial design, coding, and testing but the maintenance of the program while it is in production use. What matters is the total life cycle cost, which includes both development and maintenance. Maintenance includes the repair of errors discovered after the program is put into use, changes required as the underlying hardware or operating system is updated, and extensions and enhancements needed to meet new requirements. A language that makes it easy for a program to be repeatedly modified, repaired, and extended by different programmers over many years may, in the long run, be much less expensive to use than any other.

3.2 Syntax and Semantics

The syntax of a programming language is what the program looks like. To give the rules of the syntax for a programming language means to tell how statements, declarations, and other language constructs are written. The semantics of a programming language is the meaning given to the various syntactic constructs. For example, in C, to declare a 10-element vector, V, of integers, you would give a declaration, such as
int V[10];

In contrast, in Pascal, it would be specified as

V: array [0..9] of integer;

Although both declarations create similar data objects at run time, their syntax is very different. To understand the meaning of the declaration, you need to know that such a declaration placed at the beginning of a subprogram means to create the vector on each entry to that subprogram and destroy the vector on exit. The vector can be referenced by the name V during execution of the subprogram. In both examples, the elements of V are V[0], ..., V[9].

If V is instead created as a list in LISP, you need to know that the size of the object is arbitrary and is determined when the object is created, that the object can be created at arbitrary times during program execution, and that its first member is referenced as (car V) or (head V).

In programming language manuals and other language descriptions, it is customary to organize the language description around the various syntactic constructs in the language. Typically the syntax is given for a language construct such as a particular type of statement or declaration, then the semantics for that construct is also given, describing the intended meaning. BNF and EBNF notations are usually used to describe programming language syntax.

3.3 Language Standardization

What describes a programming language? Consider the following C code:
int i;
i = (1 && 2) + 3;
Is this valid C and what is the value of i? How would you answer these questions? Three approaches are most often used:
1. Read the definition in the language reference manual to decide what the statement means.
2. Write a program on your local computer system to see what happens.
3. Read the definition in the language standard.

Option 2 is probably the most common: simply sit down and write a two- or three-line program that tests this condition. As a result, the concept of a programming language is closely tied to the particular implementation on your local computer system. For the more "scholarly," a language reference manual, typically published by the vendor of your local C compiler, can also be checked. Since few programmers have access to the language standard, option 3 is rarely employed.

Options 1 and 2 mean that a concept of a programming language is tied to a particular implementation. But is that implementation correct? What if you want to move your 50,000-line C program to another computer that has a compiler from a different vendor? Will the program still compile correctly and produce the same results when executed? If not, why not? Language definitions often involve intricate details, and one vendor may interpret them differently from another, yielding slightly different execution behavior.

On the other hand, a vendor may decide that adding a new feature will make the language more useful. Is this "legal"? For example, if you extend C with a new dynamic array declaration, can you still call the language C? If so, programs that use this new feature on the local compiler will fail to compile when moved to another system.

To address these concerns, most languages have standard definitions. All implementations should adhere to this "standard." Standards generally come in two flavors:

1. Proprietary standards. These are definitions by the company that developed and owns the language. For the most part, proprietary standards do not work for languages that have become popular and widely used. Variations in implementations soon appear with many enhancements and incompatibilities.
2. Consensus standards. These are documents produced by organizations based upon an agreement by the relevant participants. Consensus standards, or simply standards, are the major method to ensure uniformity among several implementations of a language.
Each country typically has an organization charged with developing standards. In the United States, that is the American National Standards Institute, or ANSI, with the role of programming language standards assigned to committee X3 of the Computer Business Equipment Manufacturers Association, or CBEMA. The Institute of Electrical and Electronics Engineers, or IEEE, may also develop such standards. In the United Kingdom, the standards role is assumed by the British Standards Institution, or BSI. International standards are produced by the International Organization for Standardization (ISO), with headquarters in Geneva, Switzerland.

Standards development follows a similar process in all of these organizations. At some point, a group decides that a language needs a standard definition. The standards body charters a working group of volunteers to develop that standard. When the working group agrees on its standard, the draft is voted upon by a larger body of interested individuals. Disagreements are worked out, and the language standard is produced.

While this sounds good in theory, standards making is partially technical and partially political. For example, compiler vendors have a strong financial stake in the standards process: they want the standard to resemble their current compiler so that they avoid having to change their own implementation. Not only are such changes costly, but customers who relied on the changed features are left with programs that no longer conform to the standard. This makes for unhappy customers.

As stated above, standards making is a consensus process. Not everyone gets their way, but one hopes that the resulting language is acceptable to everyone. Consider the following simple example. During the deliberations for the 1977 FORTRAN standard, it was generally agreed that strings and substrings were desirable features, since most FORTRAN implementations already had them. But there were several feasible notations for substrings. If M = "abcdefg", then the substring "bcde" could be written as the string from the second to the fifth character of M (M[2:5]), as the string starting at position 2 and extending for four characters (M[2:4]), or as M[3:6] by counting characters from the right. Since no consensus could be reached, the "consensus" was simply to leave this out of the standard. Although it did not fulfill most of the goals for a language expressed in this chapter, it was the expedient solution that was adopted. Standards are therefore useful documents, but the language definition can be colored by the politics of the day. In order to use standards effectively, we need to address three issues:

1. Timeliness. When do we standardize a language?
2. Conformance. What does it mean for a program to adhere to a standard and for a compiler to conform to a standard?
3. Obsolescence. When does a standard age, and how does it get modified?
We consider each question below.

Timeliness. One important issue is when to standardize a language. FORTRAN was initially standardized in 1966, when there were many incompatible versions of "FORTRAN." This led to many problems, since each implementation was different from the others. At the other extreme, Ada was initially standardized in 1983 before there were any implementations; therefore, when the standard was produced it was not clear whether the language would even work. The first effective Ada compilers did not appear until 1987 or 1988, and these early implementations exposed several idiosyncrasies. One would like to standardize a language late enough that there is sufficient experience in using it, yet early enough to prevent many incompatible implementations from appearing.

FORTRAN was standardized fairly late when there were many incompatible variations, Ada was standardized very early before any implementations existed, and C and Pascal were standardized while use was growing and before there were too many incompatible versions.

ML generally exists as a single implementation (Standard ML, or SML) that everyone uses, and most versions of Smalltalk, C++, and Prolog are quite similar, although there are variants of C that add objects in ways similar to, but different from, C++. LISP has probably suffered the most from being a widely used language with no standard reference. Dialects of LISP (Scheme, Common LISP, IBCL) exist and are similar, but incompatible.

Conformance. If a standard exists for a language, we often talk about conformance to that standard. A program is conformant if it uses only features defined in the standard. A conforming compiler is one that, when given a conformant program, produces an executable program that computes the results the standard specifies.

Note that this definition says nothing about extensions to the standard. If a compiler provides additional features, then any program that uses those features is nonconformant, and the standard says nothing about what the results of the computation should be. Standards generally address only conformant programs. Because of this, most compilers have features that the standard does not address, so you must be careful about using your local implementation as the final authority on the meaning of a given feature in a language.

Obsolescence. As our knowledge of and experience with programming evolve, and as new computer architectures appear, languages need new features. A language that has been standardized seems "quaint" a few years later. The original FORTRAN 66 standard is quite out of date, lacking user-defined types, nested control structures, encapsulation, block structure, and the numerous other features of more modern languages.

The standardization process already takes some of this into account. Standards must be reviewed every five years and either renewed or dropped. The five-year cycle often gets stretched out somewhat, but the process is mostly effective. FORTRAN was first standardized in 1966, revised in 1978 (the revision is called FORTRAN 77, although the planned 1977 completion date was missed by a few months), and revised again in 1990. Ada was standardized in 1983 and again in 1995.

One problem with updating a standard is what to do with the existing collection of programs written for the older standard. Companies have significant resources invested in their software, and to rewrite all of this code for a new version of a language is quite costly. Because of this, most standards require backward compatibility; the new standard must include older versions of the language.

There are problems with this. For one, the language can get unwieldy with numerous obsolete constructs. More damaging, some of these constructs may be detrimental to good program design. The FORTRAN EQUIVALENCE statement is one such feature. If A is a real number and I is an integer, then

EQUIVALENCE (A, I)
A=A+1
I=I+1
assigns A and I to the same storage location. The assignment to A accesses this location as a real number and adds 1 to it. The assignment to I accesses the same location as an integer and adds 1 to it. Since most computers represent integers and reals differently, the result is unpredictable. Leaving this feature in the language is not a good idea. Recently, the concepts of obsolescent and deprecated features have developed.

A feature is obsolescent if it is a candidate to be dropped in the next version of the standard. This warns users that the feature is still available but will be dropped within the next 5 to 10 years, giving fair warning to rewrite code that uses it. A deprecated feature may become obsolescent with the next standard and hence may be dropped after two revisions, giving a longer 10- to 20-year warning. New programs should use neither class of features.

Since standard conforming compilers are allowed to have extensions to the language, as long as they compile standard conforming programs correctly, most compilers do have additions that the vendor thinks are useful and will increase market share for that product. This allows innovation to continue and the language to evolve. Of course, within the academic community, most faculty do not care about such standards and will develop their own products that extend and modify languages as they see fit. This provides a fertile atmosphere where new language ideas get tried and some of the better ones do make it into commercial languages and compilers.

3.4 Internationalization

With the globalization of commerce and the emergence of the WWW, programming is increasingly a global activity, and it is important for languages to be readily usable in multiple countries. There is an increasing need for computers to "speak" many different languages. For example, using an 8-bit byte, which can hold at most 256 distinct values, to represent a character is often insufficient. This issue has generally gone under the name internationalization.

Local conventions often affect the way data are stored and processed. Such issues as character codes, collating sequences, formats for date and time, and other local standards affect input and output data. Some of the relevant issues are as follows:

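Collating sequences. In what order should the characters be sorted? For example, the Swedish alphabet places å, ä, and ö after z, whereas German dictionaries traditionally sort ä together with a.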

Country-specific date formats. 11/26/02 in the United States is 26/11/02 in England, 26.11.02 in France, 26-XI-02 in Italy, and so on.

Country-specific time formats. 5:40 p.m. in the United States is 17:40 in Japan, 17.40 in Germany, 17h40 in France, and so on.

Time zones. Although the general rule is 1 hour of change for each 15 degrees of longitude, it is more a guideline than a reality. Time zones are generally an integer number of hours apart, but some differ by 15 or 30 minutes. Time changes (e.g., daylight saving time in the United States and summer time in Europe) do not occur uniformly around the world, and in the southern hemisphere summer time is observed during the opposite months from the northern hemisphere. Translating local time into a worldwide standard time is therefore nontrivial.

Ideographic systems. Some written languages are not based on a small number of characters forming an alphabet, but instead use large numbers of ideographs (e.g., Japanese, Chinese, and Korean). Typically 16 bits or more are needed to represent each character of text in those languages.

Currency. Representation of currency (e.g., $, £, ¥) varies by country.


4. PROGRAMMING ENVIRONMENTS

A programming environment is the environment familiar to most readers of this article. It is the environment in which programs are created and tested, and it tends to have less influence on language design than the operating environment in which programs are expected to be executed. A programming environment consists primarily of a set of support tools and a command language for invoking them. Each support tool is another program that may be used by the programmer as an aid during one or more of the stages of creation of a program. Typical tools in a programming environment include editors, debuggers, verifiers, test data generators, and pretty printers.

4.1 Effects on Language Design

Programming environments have affected language design primarily in two major areas: features aiding separate compilation and assembly of a program from components, and features aiding program testing and debugging.

Separate compilation. In the construction of any large program it is ordinarily desirable to have different programmers or programming groups design, code, and test parts of the program before a final assembly of all the components into a complete program. This requires the language to be structured so that individual subprograms or other parts can be separately compiled and executed, without the other parts, and then later merged without change into the final program.

Separate compilation is made difficult by the fact that in compiling one subprogram, the compiler may need information about other subprograms or shared data objects, such as:

1. The specification of the number, order, and type of parameters expected by any subprogram called allows the compiler to check whether a call of the external subprogram is valid. The language in which the other subprogram is coded may also need to be known so that the compiler may set up the appropriate "calling sequence" of instructions to transfer data and control information to the external subprogram during execution in the form expected by that subprogram.
2. The declaration of data type for any variable referenced is needed to allow the compiler to determine the storage representation of the external variable so that the reference may be compiled using the appropriate accessing formula for the variable (e.g., the correct offset within the common environment block).
3. The definition of a data type that is defined externally but is used to declare any local variable within the subprogram is needed to allow the compiler to allocate storage and compute accessing formulas for local data.
To provide this information about separately compiled subprograms, shared data objects, and type definitions either (1) the language may require that the information be redeclared within the subprogram (in FORTRAN), (2) it may prescribe a particular order of compilation to require compilation of each subprogram to be preceded by compilation of the specification of all called subprograms and shared data (in Ada and to some extent in Pascal), or (3) it may require the presence of a library containing the relevant specifications during compilation so that the compiler may retrieve them as needed (in Ada and C++).

The term independent compilation is usually used for option (1). Each subprogram may be independently compiled without any external information; the subprogram is entirely self-contained. Independent compilation has the disadvantage that ordinarily there is no way to check the consistency of the information about external subprograms and data that are redeclared in the subprogram. If the declarations within the subprogram do not match the actual structure of the external data or subprogram then a subtle error appears in the final assembly stage that will not have been detected during testing of the independently compiled program parts.

Options (2) and (3) require a means for specifications of subprograms, type definitions, and common environments to be given or placed in a library prior to the compilation of a subprogram. Usually it is desirable to allow the body (local variables and statements) of a subprogram to be omitted, with only the specification given. The body may be compiled separately later. In Ada, for example, every subprogram, task, or package is split into two parts, a specification and a body, which may be separately compiled or placed in a library as required in order to allow compilation of other subprograms. A subprogram call made to a subprogram that has not yet been compiled is termed a stub. A subprogram containing stubs may be executed, and when a stub is reached, the call causes a system diagnostic message to be printed (or other action taken) rather than an actual call on the subprogram. Thus, a separately compiled subprogram may be executed for testing purposes even though code for some of the routines it calls is not yet available.

Another aspect of separate compilation that affects language design is in the use of shared names. If several groups are writing portions of a large program, it is often difficult to ensure that the names used by each group for subprograms, common environments, and shared type definitions are distinct. A common problem is to find, during assembly of the final complete program, that several subprograms or other program units have the same names. Often this means a tedious and time-consuming revision of already tested code. Languages employ three methods to avoid this problem:

1. Each shared name, such as in an extern statement in C, must be unique, and it is the obligation of the programmer to ensure that this is so. Naming conventions must be adopted at the outset so that each group has a distinct set of names they may use for subprograms (e.g., "all names used by your group must begin with QQ"). For example, names used within the standard C #include files are usually prefixed with _, so programmers should avoid variable names beginning with the underscore.
2. Languages often use scoping rules to hide names. If one subprogram is contained within another subprogram, only the names in the outermost subprogram are known to other separately compiled subprograms. Languages like Pascal, C, and Ada use this mechanism. 
3. Names may be made known by explicitly adding their definitions from an external library. This is the basic mechanism of inheritance in object-oriented languages. By including an externally defined class definition into a subprogram, other objects defined by that class become known, as in Ada and C++. In Ada, names may also be overloaded so that several objects may have the same name. As long as the compiler can resolve which object is actually referenced, no change is needed in the calling program.

Testing and debugging. Most languages contain some features to aid program testing and debugging. A few typical examples are:
 

1. Execution trace features. Prolog, LISP, and many other interactive languages provide features that allow particular statements and variables to be tagged for "tracing" during execution. Whenever a tagged statement is executed or a tagged variable is assigned a new value, execution of the program is interrupted and a designated trace subprogram is called (which typically prints appropriate debugging information).
2. Breakpoints. In an interactive programming environment, languages often provide a feature where the programmer can specify points in the program as breakpoints. When a breakpoint is reached during execution, execution of the program is interrupted and control is given to the programmer at a terminal. The programmer may inspect and modify values of variables and then restart the program from the point of interruption.
3. Assertions. An assertion is a conditional expression inserted as a separate statement in a program, e.g.,
assert((X > 0 and A = 1) or (X = 0 and A > B+10))
The assertion states the relationships that must hold among the values of the variables at that point in the program. When the assertion is "enabled," the compiler inserts code into the compiled program to test the conditions stated. During execution, if the conditions fail to hold, then execution is interrupted and an exception handler is invoked to print a message or take other action. After the program is debugged, the assertions may be "disabled" so that the compiler generates no code for their checking. They then become useful comments that aid in documenting the program. This is a simple concept that exists in several languages, including C++.

4.2 Environment Frameworks

A support environment consists of infrastructure services, called the environment framework, that manage the development of a program. This framework supplies services such as a data repository, graphical user interface, security, and communication services. Programmers use these infrastructure services as part of their program design, so languages are sometimes designed to allow easy access to them.

For example, programs written in the 1960s all included specific input-output routines to handle communication with the user. With the growth of interactive systems, windows displayed on a screen have become the standard output format. Today, an environment framework would contain an underlying windowing toolkit, such as Motif, which is built on the X Window System, and the program would only need to call specific Motif functions to display windows, menus, and scroll bars and to perform most of the common windowing actions. The X Window interfaces are part of the environment framework, which gives all programs that use it a common behavior pattern for the user at a terminal. Systems like Visual Basic and Microsoft's Visual Studio provide libraries of services that C++, Java, and BASIC programs use to build window-based applications.

4.3 Job Control and Scripting Languages

Related to environment frameworks is the concept of job control. Today, if you want to execute a program such as the C compiler or a word processor, you move the mouse to the appropriate picture or icon on the screen and click the mouse. The appropriate program starts to execute. Before the age of windows-based systems, you would type in the name of the program to run (e.g., as you would do with the MS-DOS window on a PC today). Earlier still, during the punched-card era of computing, you would place a punched card naming the program to run ahead of the cards containing your data.

All of these allow the user to be in direct control of determining what steps to perform. If the compilation fails, the user could invoke an editor to correct the program; if the compilation succeeds, the user could invoke a loader and execute the program.

As we briefly described before, the 1960s saw the rise of processes, as the computer itself took over the management of executing other programs. Rather than have an operator wait for each step to finish successfully, each program would produce a return code, and the command for the next step could test that return code and decide whether to execute. Thus, a sequence of steps (compile, load, execute program 1, execute program 2, and so on) could be preloaded into the computer, and the operating system would sequence through these steps in a predefined manner. IBM used this approach in the Job Control Language (JCL) for its System/360, beginning in the mid-1960s.

The developers of UNIX extended this concept. Rather than simply checking a return code on a previous job step, the control language could be a more complex structure with data, operations, and statements. In this case, the data objects would be the programs and files of the existing computer system. Thus, users could write programs that linked together various operations from other programs. One could then program the operation of a sequence of steps based on the contents of data files and the results of previous operations. This led to the UNIX shell including variations such as the Bourne shell, C shell, Korn shell, and so on. The *.bat file on the PC is a simple form of shell program.

From the concept of a shell, many related languages developed. These all fall under the general category of process or scripting languages. They are generally interpreted and have the property that they view programs and files as the primitive data to manipulate. Languages such as AWK, Perl, and TCL have been used for years to develop such scripts and, for most of the past 20 years, have been viewed as part of the arcane province of the systems programmer. However, their popularity has increased with the advent of the Web. Such scripting languages are important for conveying information back and forth between a user at a web browser and a web server.

Today we are in an environment where every organization is trying to provide products "faster, better, and cheaper." The term Internet time has been coined to describe the idea that software development needs to proceed at the speed of the Internet, at megabits per second. The use of interpreted languages, such as these job control and scripting languages, permits developers to rapidly prototype applications. A language like Perl allows developers to quickly build simple programs that invoke other previously written software through shell scripts.
