Programming Languages
Computer programming is the process of writing, testing and maintaining computer programs that carry out tasks on behalf of a human user (or sometimes, on behalf of other programs or hardware devices). The program's source code is written using a specific programming language, and the person who writes the program is called a programmer.
The programmer must have specialist knowledge of the programming language being used and the computing environment in which it operates, as well as at least some domain knowledge relating to the activities of the business or organisation for which the program is being written. The aim of the programmer is to produce effective, efficient and maintainable software.
Throughout recorded history, various methods have been devised for making mechanical devices carry out a series of actions in the required order. What most of these methods had in common was that the required sequence of actions was stored in some way. During the nineteenth century, for example, cards with holes punched into them in a predetermined pattern were used to control the operation of various mechanical devices used in the textile industry.
One of the best-known examples is the Jacquard loom, a mechanical loom invented by Joseph Marie Jacquard in 1801 that simplified the manufacture of textiles with complex patterns. Another pioneer in the history of programming was Herman Hollerith, whose tabulating machines were used by the United States Census Office to tabulate the results of the 1890 census.
Although the version of Hollerith's machine used in 1890 was not programmable, later versions were equipped with a control panel that allowed the machine to be set up to carry out a range of different tasks. Hollerith's Tabulating Machine Company (which he founded in 1896) later merged with three other companies to become the Computing-Tabulating-Recording Company. The company was renamed the International Business Machines Corporation (IBM) in 1924.
The first programmable computer to be designed (although it was never actually completed) is thought to be the Analytical Engine conceived in 1835 by Charles Babbage and intended to be used for solving general computational problems. In this endeavour, Babbage was assisted by Lady Ada Lovelace, who is today widely regarded as the first computer programmer. She is believed to have introduced many programming concepts still in use to this day, including the use of loops to handle repetitive tasks, and the use of subroutines that can be called from anywhere within a program.
The first programmable electro-mechanical computer was built in Germany in 1941 by Konrad Zuse, although its significance seems to have been overlooked by the German military.
During the 1940s, a number of other computer scientists began to develop the idea of stored-program computers, in which both program instructions and data could be stored in the computer's memory. A 1936 paper by the mathematician Alan Turing had described such a machine. Turing went on to publish detailed design specifications for a stored-program computer in 1946, but a paper published by John von Neumann in 1945 was more widely circulated and consequently received far more attention.
Existing computer systems such as the Colossus, used by the British during the war to decipher German military codes, and the ENIAC, used after the war by the Americans in the development of the hydrogen bomb, had been programmed by setting switches and hard-wiring data and control signal paths.
Von Neumann's paper described a computer system in which the processing unit retrieved program instructions and data from a common storage area (memory), into which different programs and data could be loaded depending on what computational task the computer was to carry out. The computer architecture outlined in von Neumann's work became known as the "von Neumann architecture", and is illustrated below.
Figure: The von Neumann architecture
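The stored-program principle can be illustrated with a toy example. In the following Python sketch, a hypothetical three-instruction machine fetches, decodes and executes instructions from the same memory that holds its data; the instruction set (LOAD, ADD, PRINT, HALT) is invented for this illustration and does not correspond to any real CPU.

    # A toy illustration of the stored-program idea: instructions and data
    # occupy the same memory, and the processor fetches, decodes and
    # executes instructions one at a time.
    memory = [
        ("LOAD", 5),     # program: load the value at address 5 into the accumulator
        ("ADD", 6),      # add the value at address 6 to the accumulator
        ("PRINT", None), # display the accumulator
        ("HALT", None),  # stop
        None,            # unused cell
        2,               # data: address 5
        3,               # data: address 6
    ]

    accumulator = 0
    program_counter = 0
    while True:
        opcode, operand = memory[program_counter]   # fetch and decode
        program_counter += 1
        if opcode == "LOAD":
            accumulator = memory[operand]
        elif opcode == "ADD":
            accumulator += memory[operand]
        elif opcode == "PRINT":
            print(accumulator)                      # prints 5
        elif opcode == "HALT":
            break

Because the program itself sits in ordinary memory, a different program can be loaded into the same cells to make the machine carry out a different task, which is precisely the flexibility von Neumann's design provided.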
Assembly language
Because the central processing unit (CPU) in a digital computer essentially consists of a number of switches that can be in either the on or off state, the only form of instruction that a computer can "understand" is a collection of binary digits (ones and zeroes) that can be used to either set or retrieve the state of the processor's switches (i.e. on or off). The earliest computer programs thus consisted of machine code instructions written as strings of binary digits ("bits").
This lowest-level language, known as machine code, was very difficult to work with due to the absence of any symbolic information. To make it easier for programmers to write programs, a new type of programming language was developed that used mnemonics to represent individual machine code instructions. This kind of programming language was known as an assembly language, and consisted of the set of mnemonics needed to represent the complete instruction set of a particular CPU.
The assembly language program still had to be converted into machine code in order for the computer to execute the program instructions, and this was carried out by a program called an assembler. Assembly language is used today mainly for embedded applications where there is a requirement for compact and efficient code, and where the amount of memory available may be limited.
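As a rough sketch of what an assembler does, the following Python fragment translates a sequence of mnemonics into binary opcodes by simple table lookup. The mnemonics and opcodes here are hypothetical, invented for the illustration; a real assembler must also handle operands, labels and addressing modes.

    # A toy "assembler" for a hypothetical CPU: each mnemonic is simply
    # looked up in a table and replaced by its binary opcode.
    OPCODES = {"LOAD": 0b0001, "ADD": 0b0010, "STORE": 0b0011, "HALT": 0b1111}

    def assemble(mnemonics):
        return [OPCODES[m] for m in mnemonics]

    program = ["LOAD", "ADD", "STORE", "HALT"]
    machine_code = assemble(program)
    print([format(op, "04b") for op in machine_code])
    # ['0001', '0010', '0011', '1111']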
High-level languages
Although assembly language enabled the programmer to write programs in a text format using symbolic names instead of machine code instructions, it was still a laborious, time-consuming and error-prone method of writing programs. More human-friendly programming languages began to emerge that allowed programs to be written using English-like commands. Such languages are often referred to as high-level languages, and are much easier to use.
Like assembly language programs, programs written in high-level languages must be converted into machine code so that the computer can understand them. Two approaches have been used to achieve this. The first is to use a program called a compiler to convert the high-level code into machine code before the program is loaded into the computer's working memory. One advantage of high-level programming languages is that the source code does not need to be written for a specific hardware platform, provided a compiler exists for the target CPU.
The second approach is to use a program called an interpreter to convert each high-level program instruction into machine code at run time (i.e. as the program executes). This approach does have some specific advantages, although program execution tends to be slower.
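The distinction can be made visible in miniature with Python, which itself compiles source code to an intermediate bytecode (not native machine code) that its interpreter then executes. The built-in compile() and exec() functions expose the two stages:

    # compile() translates source text into a bytecode object (the
    # translation stage); exec() then interprets that bytecode (the
    # execution stage).
    source = "print(2 + 3)"
    code_object = compile(source, "<string>", "exec")  # translation
    exec(code_object)                                  # execution: prints 5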
One of the first high-level programming languages to be developed was Fortran (a contraction of Formula Translation) which was originally developed by IBM in 1954. Fortran is very good at carrying out large numbers of complex calculations, and is designed for use in scientific and engineering applications. Fortran is still in widespread use and has continued to evolve. The most recent version is Fortran 2003, with development of a new version currently being undertaken.
COBOL (COmmon Business-Oriented Language) is another very early high-level programming language, dating from 1959. As the name suggests, COBOL is designed specifically for use in business, financial and administrative applications. Like Fortran, COBOL continues both to find widespread use and to evolve, with the latest stable release being COBOL 2002.
Development and standardisation of the language is now primarily managed by the International Organization for Standardization (ISO), and a revised COBOL standard is expected to emerge within the next few years. The significance of COBOL should not be underestimated, since it has been estimated that approximately 75% of the world's business is transacted on systems written in COBOL.
Many other high-level languages have emerged over the last few decades, including both specialised languages such as LISP and Prolog (used for applications involving artificial intelligence) and more general-purpose languages such as C, C++, Perl, Java and Visual Basic.
The popularity of a particular language is hard to measure, but may be inferred from a number of metrics, including the number of jobs advertised for programmers in a particular language, the number of books available on a given language (although this may be misleading, since it tends to exaggerate the importance of languages currently being taught in colleges and universities), and estimates of the number of existing lines of code written in a particular language.
Since the emergence of the Internet and the World Wide Web, scripting languages like JavaScript and PHP have found widespread use, while examples of documents written using markup languages such as HTML and XML can be counted in their tens of billions. For database applications, the use of Structured Query Language (SQL) has become almost universal.
Programming paradigms
Each programming language tends to support a particular style (or paradigm) of programming. Although a number of factors may affect the choice of programming language for a particular task, including personal preference, corporate policy, or simply the availability of sufficient in-house knowledge and experience in a particular language, the language selected should ideally be the one best suited to the task in hand. Whatever language is chosen, however, the following features are common to most programming languages (a short sketch in Python follows the list):
- input - get data from an input device such as a disk drive or the keyboard
- output - send data to an output device such as a disk drive, visual display unit, printer or network adapter
- arithmetic and logic - perform arithmetic operations such as addition and multiplication, and logical operations such as comparing the values of two variables
- conditional execution - execute a different set of program instructions depending on whether a specified condition is true or false
- repetition - execute a set of program instructions repeatedly until some exit condition evaluates to true (a conditional loop) or for a specified number of iterations (a counting loop)
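As a minimal illustration of these features, the short Python sketch below writes some numbers to a disk file, reads them back, loops over them, performs some arithmetic, makes a decision, and sends its result to the display. The file name scores.txt and its contents are invented for the example.

    # The file is created first so that the sketch is self-contained.
    with open("scores.txt", "w") as f:          # output: send data to a disk file
        f.write("7\n9\n5\n")

    with open("scores.txt") as f:               # input: read data back from disk
        scores = [int(line) for line in f]

    total = 0
    for score in scores:                        # repetition: a counting loop
        total += score                          # arithmetic: addition

    if scores:                                  # conditional execution
        print("average:", total / len(scores))  # output: send data to the display
    else:
        print("no scores found")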
Procedural programming
Procedural programming is a structured programming paradigm based on the concept of procedure calls. The procedures in question may be subroutines, methods or functions, each of which consists of a collection of program statements that are executed sequentially in order to carry out a specific task. A procedure can be called from any point in a program, and may call other procedures.
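A minimal sketch of this idea in Python is shown below; the functions apply_tax and net_price are invented for the example. net_price carries out one task, calls another procedure to do part of its work, and can itself be called from any point in the program.

    # A procedure (here, a Python function) groups statements that carry
    # out one specific task, and may call other procedures.
    def apply_tax(amount, rate=0.2):
        return amount * (1 + rate)

    def net_price(unit_price, quantity):
        subtotal = unit_price * quantity
        return apply_tax(subtotal)     # one procedure calling another

    # The same procedure can be called from any point in the program.
    print(net_price(9.99, 3))
    print(net_price(4.50, 10))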
One obvious benefit of procedural languages is that they allow the same code to be re-used repeatedly, rather than forcing the programmer to write the same code over and over again. They also avoid the need for the GOTO or JUMP statements found in unstructured languages, which in large programs frequently result in so-called "spaghetti code". Procedural languages are also well suited to modularisation.
Modular programs consist of a number of distinct code modules, each of which deals with a particular aspect of an application. For example, the programmer may decide to collect together all of the procedural code for handling data processing functionality in one module, and the code for generating the application's graphical user interface in another. This approach helps to reduce the overall complexity of a large application by organising the application's code into related functional units.
Object-oriented programming
Object-oriented programming (OOP) is a programming paradigm that uses objects to model data entities. It thus extends the idea of modularisation, but places the emphasis on the data rather than on the functions carried out by the program. A real-world entity such as a bank account or a customer can therefore be modelled as an object. The attributes that describe the real-world entity, such as an account number or customer ID, are embodied within the object as member variables.
The object also has a number of methods (functions or procedures) that can be invoked in order to retrieve information about, or modify, the object's state. Objects interact with the application's users and with each other by sending and receiving messages. The messages to which an object will respond are defined by the object's interface. Applications are built by creating objects to represent the data entities that will be manipulated by the application.
Both the object's data and the precise details of how its methods are implemented are hidden from external entities. Generally speaking, an object's member variables are declared as private, and can only be accessed or modified by the object's own methods. The methods themselves are usually declared to be public, but the manner in which they may be invoked is strictly defined by the object's interface. The ability to hide both the data and the code implementation of an object behind the object's interface is called encapsulation.
Because objects such as customers and bank accounts tend to exist in large numbers in real-world situations, one can assume that an application that models a banking system or customer service department will need to provide many objects of the same type. In object-oriented programming languages, each object is created from a template called a class. The class defines the member variables, methods, and interface of each object created from it.
An object that is created from a particular class is thus said to be a unique instance of that class. The idea of classes can be extended to allow variations on the original class to be defined as subclasses. For example, we could have a main class called bank_account from which we could derive two subclasses called current_account and savings_account.
The two subclasses would inherit the member variables, methods and interface of bank_account (the parent class), but each could be modified to reflect their more specialised role. The ability of a subclass to inherit the characteristics of its parent class in this way is called inheritance.
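The following Python sketch puts these ideas together using the bank_account and savings_account classes described above (the snake_case class names follow the text; idiomatic Python would normally capitalise them). The account number and interest rate are invented for the example.

    # Member variables are kept private (name-mangled with a leading
    # double underscore) and reached only through public methods:
    # this is encapsulation.
    class bank_account:
        def __init__(self, account_number, balance=0.0):
            self.__account_number = account_number   # private member variable
            self.__balance = balance

        def deposit(self, amount):                   # public method
            self.__balance += amount

        def get_balance(self):
            return self.__balance

    # savings_account inherits the member variables, methods and
    # interface of its parent class, and specialises it: inheritance.
    class savings_account(bank_account):
        def __init__(self, account_number, balance=0.0, interest_rate=0.03):
            super().__init__(account_number, balance)
            self.__interest_rate = interest_rate

        def add_interest(self):
            self.deposit(self.get_balance() * self.__interest_rate)

    acct = savings_account("SA-001", 100.0)   # a unique instance of the class
    acct.add_interest()
    print(acct.get_balance())                 # 103.0

Note that the calling code never touches the balance directly; it can only send the object messages (deposit, get_balance, add_interest) defined by its interface.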
Object-oriented programming languages have existed since the 1960s. Simula is generally thought to be the first programming language to display object-oriented characteristics, while the first truly object-oriented language is held by many to be Smalltalk. It was not until the 1990s, however, that object-oriented programming emerged as the predominant programming paradigm, probably due to the rise in popularity of the graphical user interface (GUI), for which an object-oriented approach is ideally suited. While most programming languages have now adopted object-oriented characteristics to some degree, the languages that are currently to the fore include Visual Basic .NET, C# and Java.
Visual programming environments
A visual programming environment is one that allows the programmer to create a graphical user interface for an application by creating and manipulating forms and controls in a graphical environment. Such environments often provide a complete integrated development environment (IDE) that allows the programmer to create the interface, write the program code, and use a range of debugging tools to track down bugs and verify the correct operation of the program.
Microsoft Visual Studio is currently perhaps the most widely used and best-known example of such an environment, and includes Visual Basic, Visual C++, Visual C#, and Visual Web Developer.
The chief advantage of using a visual programming environment is that it enables the software developer to produce a working application very quickly. As such, it fits in very well with development methodologies such as Rapid Application Development (RAD) that have evolved in response to today's rapidly changing and dynamic business environment. The main disadvantage is that the code produced is perhaps not as efficient as it could be, and portability across platforms (the ability to deploy the software on computers using different hardware or operating systems) is often an issue.