Peter Naur: Concise Survey of Computer Methods, 397 p.
Studentlitteratur, Lund, Sweden, ISBN 91-44-07881-1, 1974
ISBN/Petrocelli 0-88405-314-8, 1975
Part 1 - Basic Concepts, Tools and Methods
1. Data and their Applications (Reprinted as section 1.2 of Peter Naur: Computing, a Human Activity, 1992, ACM Press, ISBN 0-201-58069-1)
1.1. Data and What They Represent; 1.2. Data Processes and Models; 1.3. Data Recognition and Context; 1.4. Data Representations; 1.5. Numbers and Numerals; 1.6. Data Conversions; 1.7. Data Processing; 1.8. A Basic Principle of Data Science; 1.9. Limitations of Data Processing; 1.10. Summary of the Chapter; 1.11. Exercises.
Summary of the Chapter
The starting point is the concept of data, as defined in I. H. Gould (ed.): IFIP guide to concepts and terms in data processing, North-Holland Publ. Co., Amsterdam, 1971: DATA: A representation of facts or ideas in a formalised manner capable of being communicated or manipulated by some process. Data science is the science of dealing with data, once they have been established, while the relation of data to what they represent is delegated to other fields and sciences.
The usefulness of data and data processes derives from their application in building and handling models of reality.
Data representation may be chosen freely, and data used in practice differ along several dimensions, being static or dynamic, digital or analog, and using one out of a number of different media.
Numbers and their representation illustrate the concept of data, besides being of central importance in formulating data processes of any kind.
Data conversions are the simplest kind of data processes, but may illustrate wide ranges of data representations, particularly the interplay of static and dynamic representations.
In general data processing, data representing some meaning is processed in accordance with some intent to form new, so far unknown, data. In good data processing these latter may be used directly by humans to guide their actions.
A basic principle of data science is this: The data representation must be chosen with due regard to the transformation to be achieved and the data processing tools available. This stresses the importance of concern for the characteristics of the data processing tools.
Limits on what may be achieved by data processing may arise both from the difficulty of establishing data that represent a field of interest in a relevant manner, and from the difficulty of formulating the data processing that is needed. Some of the difficulty of understanding these limits is caused by the ease with which certain data processing tasks are performed by humans.
2. Computers and Programming Languages
2.1. Sequential Data Processes; 2.2. Computers, Programs, and Input Data; 2.3. Programming Languages, Interpretation and Translation; 2.4. Statements, Levels of Description and Primitive Operations; 2.5. Primitive Operations as State Transitions; 2.6. Data Stores, Variables and Values; 2.7. Machine Language Instructions; 2.8. Flow of Control and Flow Charts; 2.9. Operators and Types of Values; 2.10. Input and Output Conversions and Statements; 2.11. Limitations of Programming Languages Systems; 2.12. Execution Times; 2.13. Optimization in Programming Language Systems; 2.14. Source Program Error Treatment; 2.15. Theoretical Limitations of Programming; 2.16. Summary of the Chapter; 2.17. Exercises.
Summary of the Chapter
The chapter is a review of the common, general ideas and characteristics of electronic digital computers and their associated programming languages. Starting from the notion of sequential data processes, the ideas of computers, programs and their input data are introduced. This leads to a description of the use of higher level programming languages, with the additional stages of program interpretation or translation.
The notion of levels of process description is introduced as a means for humans to retain the mental grasp of large programs. Each level is characterized by a set of primitive operations belonging to it. The meaning, or semantics, of the primitive operations is explained as the relation between the states of the process before and after the execution of the operation.
A data store is explained as a means for holding a set of values. Values are data viewed in a context where several or many different values would be possible. A variable is an identifiable ability to hold a value. Variables may be simple or subscripted.
The essentials of machine language instructions are reviewed, with special attention to the address calculation algorithm.
The flow of control of a program indicates how the position of the instruction or statement being executed moves about the program text. Flow charts are program descriptions in which the flow of control is displayed graphically by means of lines and arrows.
Operators are the denotation within programming languages for operations. Monadic operators have one, dyadic two, operands. An operand may be a literal, the identification of a variable, or the denotation of a more complicated process, involving further operators. Values are classified into types. The effect of the operators will generally depend on the types of their operands.
The special problems raised by input and output conversions are discussed, with regard to the fact that the present text includes a systematic development of the principles of such conversions.
The chapter ends with a discussion of the limitations of programming language systems and of programming as such. The limitations are partly quantitative, caused by the finite capacity and finite speed of computers, partly qualitative, caused by mistakes in the computer and the associated software. An example of a problem that in principle cannot be solved by programming is finally given.
3. Construction, Documentation, Proof, and Testing of Programs
3.1. Programming, Communication and Documentation; 3.2. The Stylistic Details of Programs and Documentation; 3.3. General Snapshots and Invariants of the Program State; 3.4. The Programming of Loops; 3.5. Data Representations and Action Clusters; 3.6. Structured Programming and Problem Solving; 3.7. Avoiding Go To Statements; 3.8. Proving Programs; 3.9. Program Testing; 3.10. Testing of Subprograms; 3.11. Using Random Data for Testing; 3.12. Documentation Check List; 3.13. Summary of the Chapter; 3.14. Exercises.
Summary of the Chapter
The chapter is concerned with the methodology of constructing correct programs. The importance of documentation of the construction process and of its resulting program is first stressed. The documentation should describe not only the final solution, but also the reasoning leading up to it. It should be worked out carefully, using a clear handwriting style or a typewriter, making deliberate use of a notation and a typographical arrangement that is likely to aid the reader's understanding.
An important part of the documentation of programs consists of precise assertions about the values of the variables. A general snapshot is a set of assertions that holds true whenever the flow of control passes through a particular point in the program. An invariant assertion holds throughout a section of the program, or the complete program. General snapshots may be used as an aid during program development. In particular, loops are conveniently constructed by starting with a general snapshot for a point within them.
The data representation used within a program is best described by a set of invariants. To make this possible the changes to the values of the variables should only be made within the context of action clusters, worked out to ensure the continued validity of the invariants.
The overall structure of the program is made to match the structure of the problem it solves by starting the development at a high level of description and only gradually filling in details at lower levels (structured programming). Normally this approach will have to be balanced by a study of alternative solution possibilities, using general methods of problem solving. In order to achieve a transparent program structure, go to statements should best be avoided.
Programs may be proved formally to be correct, on the basis of general snapshots and the formal rules of semantics of the programming language being used. Such proofs are usually impractical, however.
Program testing is used during program development and whenever a change has been made to a program. Diagnostic tests are used to locate mistakes, acceptance tests to verify that the program has no known errors. Test cases should be worked out during the program development, to make sure that all parts of the program text are tried and that the full range of external specifications is covered.
Random data are adequate for testing only to a limited extent, and mostly as an aid to obtaining information about the execution time efficiency of programs.
Part 2 - Processes with Single Data Elements
4. Digital Data Representations
4.1. Digital Data for Humans and for Computers; 4.2. The ISO 7-Bit Character Set; 4.3. Data Holding Abilities; 4.4. Quantitative Measurement of Digital Data; 4.5. Redundancy; 4.6. Redundancy Checking; 4.7. The Design of Redundancy Checking; 4.8. Digital Representations Used Inside Computers; 4.9. Variable-Length Representations; 4.10. Internal Representation of Properties and States; 4.11. Building the Action Table; 4.12. Primary Conversion of Character Strings; 4.13. Levels of Analysis of Character Strings; 4.14. Data Transmission; 4.15. Summary of the Chapter; 4.16. Exercises.
Summary of the Chapter
The chapter is concerned with the most detailed properties of the digital representations used by people and in computers. As a concrete starting point the ISO 7-bit character set is first described, as an example of a character set that includes the features that are required when it is intended to be used both by people and in computers.
The quantitative measurement of digital data is discussed on the basis of the concepts of data holding abilities and their degrees of freedom and the variability of the phenomena represented by data. Quantities of data are expressed in terms of equivalent binary digits.
This is followed by a discussion of redundancy and its use for increasing process efficiency, for improving the match to computer storage structures, and for checking.
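As an illustration of redundancy used for checking, the following minimal sketch adds one parity bit to a 7-bit character code. The choice of even parity and the function names are assumptions made for this example; the summary does not say which checking schemes the chapter treats.

```python
def add_even_parity(code: int) -> int:
    """Return an 8-bit value: the 7-bit code plus an even-parity bit in bit 7."""
    if not 0 <= code < 128:
        raise ValueError("expected a 7-bit code")
    parity = bin(code).count("1") % 2   # 1 if the code has an odd number of 1-bits
    return code | (parity << 7)


def parity_ok(byte: int) -> bool:
    """A received byte passes the check if its number of 1-bits is even."""
    return bin(byte).count("1") % 2 == 0


sent = add_even_parity(ord("C"))   # 'C' is 1000011, three 1-bits, so the parity bit is set
damaged = sent ^ 0b0000100         # a single flipped bit
print(parity_ok(sent), parity_ok(damaged))
```

One redundant bit detects any single-bit error, but it cannot locate the error, and it misses errors that flip two bits.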
The representations used with single computer words are described, including field packing and arithmetical packing. Variable-length representations are discussed, and a comparison of fixed-length, variable-length-by-delimiter, and variable-length-by-size representations is made.
A systematic procedure for developing representations of properties and states, for use in controlling the flow of processes, is described and illustrated in an application to the primary conversion of character strings. The analysis of character strings is further discussed in terms of the levels of analysis that are involved.
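The two packing techniques can be sketched as follows, under one plausible reading of the terms: field packing gives each item a fixed group of bits, while arithmetical packing combines the items by multiplication with the size of each item's range. The date example, the field widths, and all names are hypothetical.

```python
def pack_fields(day, month, year2):
    """Field packing: 5 bits for the day, 4 for the month, the rest for a 2-digit year."""
    return (year2 << 9) | (month << 5) | day


def unpack_fields(word):
    return word & 0x1F, (word >> 5) & 0xF, word >> 9


def pack_arith(day, month, year2):
    """Arithmetical packing: a mixed-radix number with radices 31 and 12."""
    return (year2 * 12 + (month - 1)) * 31 + (day - 1)


def unpack_arith(word):
    word, day0 = divmod(word, 31)
    year2, month0 = divmod(word, 12)
    return day0 + 1, month0 + 1, year2


print(pack_fields(17, 5, 74), pack_arith(17, 5, 74))
```

The arithmetical form wastes no code values and so yields smaller numbers, at the price of a division when unpacking; the field form unpacks with shifts and masks only.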
The chapter ends with a review of the basic concepts involved in data transmission over long distances.
5. Processes of Action Choice
5.1. Classification of Single Data Items; 5.2. Multi-Item Choices; 5.3. Decision Tables; 5.4. The Use of Quantitative Measures of Uncertainty; 5.5. Composite Binary Conditions; 5.6. Testing for Rare Agreement; 5.7. Summary of the Chapter; 5.8. Exercises.
Summary of the Chapter
The chapter treats the problem of the most effective way to formulate the choice of alternative flow paths in programs. As the first case, the selection of one out of several paths on the basis of a single data item is discussed. The use of multi-way switches, rather than two-way, is advocated.
The second case is flow selection on the basis of two or more data items. The discussion brings out the importance of taking the relative frequency of outcomes into account, and of looking for the bottleneck of the execution.
Next, the description of choice processes by means of decision tables is discussed.
As an approach to the design of truly optimal choice processes, quantitative measures of the uncertainty of choice situations are discussed. The discussion includes the entropy of information theory and the weighted average length of the optimal binary coding of the outcomes.
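The two measures named above can be compared in a short sketch. It assumes that the optimal binary coding is obtained by Huffman's construction, which the summary does not state; the function names and the probabilities are illustrative only.

```python
import heapq
from math import log2


def entropy(probs):
    """Entropy in bits of a choice with the given outcome probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)


def huffman_average_length(probs):
    """Weighted average code length of a Huffman code for the outcomes.

    Each merge of the two least probable groups adds one binary test to
    every outcome in the merged group, so the merged weights sum to the
    average length.
    """
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total


skewed = [0.7, 0.2, 0.1]
print(entropy(skewed), huffman_average_length(skewed))
```

The entropy is a lower bound on the average number of two-way tests; the two measures coincide when every probability is a power of one half.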
Next, choices having binary outcome determined from a number of independent binary tests are considered. They may be designed for optimum performance on the basis of a single rule.
Finally, choices made on the basis of a complicated rule and having a binary outcome, with one result very rare, are discussed. The use of specially designed intermediate choice classes is described, using the spelling error problem as illustration.
6. Numbers and Arithmetic
6.1. Three Classes of Numbers; 6.2. Computer Integers; 6.3. Computer Arithmetic with Integers; 6.4. Programmed Multiple-Length Integers; 6.5. Floating-Point Representations; 6.6. Floating-Point Arithmetic; 6.7. Avoiding the Pitfalls of Floating-Point Arithmetic; 6.8. Summary of the Chapter; 6.9. Exercises.
Summary of the Chapter
The representation of numbers in computers is discussed on the basis of a distinction between mathematical numbers, application numbers, and computer numbers.
Computer integers are defined in terms of the concepts of positional representations. This gives the basis for discussing the arithmetic of computer integers and the use of programmed multiple-length integers.
Floating-point representations are described as the most important means for handling continuous variables in computers. The central concepts as far as the user is concerned are the precision of the representation and the accuracy of particular number representations.
The chapter ends with a discussion of the characteristics and shortcomings of floating-point arithmetics, including illustrations of the loss of significant digits.
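A loss of significant digits of this kind is easy to reproduce. The sketch below assumes IEEE 754 double-precision arithmetic, as used by Python floats; it is not taken from the book, whose examples predate that standard.

```python
import math

big = 1e17
lost = (big + 1.0) - big   # 1.0 is smaller than the spacing of doubles near 1e17
kept = (big - big) + 1.0   # the same terms in another order keep the 1.0

naive = sum([0.1] * 10)        # rounding errors accumulate in the running sum
exact = math.fsum([0.1] * 10)  # fsum tracks the lost low-order parts

print(lost, kept, naive, exact)
```

The order of operations matters: mathematically equal expressions may give different floating-point results.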
Part 3 - Intermediate Amounts of Data
7. Searching, Ordering and Sorting
7.1. Search; 7.2. Simple Search; 7.3. Scatter Storage Search; 7.4. Binary Search; 7.5. Programming Binary Search; 7.6. Internal Ordering or Sorting; 7.7. The Shell Sort Method; 7.8. Programming the Shell Sort Method; 7.9. Testing the Performance of SHLSRT (Shell Sort); 7.10. Other Sorting Methods; 7.11. Summary of the Chapter; 7.12. Exercises.
Summary of the Chapter
The chapter is concerned with data organized as files consisting of a number of records of similar structure. The first half deals with search, that is the process of locating, within a file, a record having a specified key. Three methods of search are described: simple search, scatter storage search, and binary search. They differ as to the way the file must be organized and in their execution time efficiency.
As an illustration of the methods of systematic program development discussed in chapter 3, the details of the design steps leading to a program for binary search are given.
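Scatter storage search, nowadays usually called hashing, may be sketched as follows. The use of linear probing to resolve collisions, the table size, and the function names are assumptions of this example and need not match the variant described in the chapter.

```python
def make_table(size):
    """An empty scatter store with the given number of slots."""
    return [None] * size


def store(table, key, value):
    """Place the record in the slot computed from the key, probing onward if occupied."""
    i = hash(key) % len(table)
    for _ in range(len(table)):
        if table[i] is None or table[i][0] == key:
            table[i] = (key, value)
            return
        i = (i + 1) % len(table)
    raise RuntimeError("table full")


def lookup(table, key):
    """Follow the same probe sequence; an empty slot means the key is absent."""
    i = hash(key) % len(table)
    for _ in range(len(table)):
        if table[i] is None:
            return None
        if table[i][0] == key:
            return table[i][1]
        i = (i + 1) % len(table)
    return None


table = make_table(8)
store(table, 3, "three")
store(table, 11, "eleven")   # 11 and 3 compute the same slot in a table of 8
print(lookup(table, 11), lookup(table, 19))
```

Unlike binary search, this method does not require the file to be ordered, but it needs spare capacity in the table to keep the probe sequences short.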
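A minimal sketch of binary search, with the general snapshot of the loop written as a comment, might look as follows. It is not a reproduction of the program developed in the book; the names are chosen for this example.

```python
def binary_search(keys, wanted):
    """Return the index of wanted in the ordered list keys, or -1 if absent."""
    low, high = 0, len(keys) - 1
    while low <= high:
        # General snapshot: if wanted occurs in keys at all,
        # its index lies in the range low..high.
        mid = (low + high) // 2
        if keys[mid] < wanted:
            low = mid + 1      # wanted cannot be at mid or below
        elif keys[mid] > wanted:
            high = mid - 1     # wanted cannot be at mid or above
        else:
            return mid
    return -1                  # the range is empty, so wanted is absent


print(binary_search([2, 3, 5, 7, 11], 7))
```

Each branch of the loop body shrinks the range while keeping the snapshot true, which is the essence of the correctness argument.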
The second half of the chapter deals with methods for ordering or sorting of files. Only one sorting method is described in detail, the so-called Shell sort method. In addition to an explanation of the logic of the method in general terms, the design steps leading to a program for the method, and the details of a test of this program are given.
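The logic of the Shell sort method can be sketched as below. The sequence of gaps, obtained here by repeated halving, is an assumption of this example; the book's SHLSRT program may use a different sequence.

```python
def shell_sort(items):
    """Return a sorted copy of items using Shell's method."""
    a = list(items)
    gap = len(a) // 2
    while gap > 0:
        # Insertion sort on each of the interleaved subsequences a[k], a[k+gap], ...
        for i in range(gap, len(a)):
            value = a[i]
            j = i
            while j >= gap and a[j - gap] > value:
                a[j] = a[j - gap]
                j -= gap
            a[j] = value
        gap //= 2              # the final pass, with gap 1, is a plain insertion sort
    return a


print(shell_sort([23, 4, 42, 8, 15, 16, 4]))
```

The early passes with large gaps move records long distances cheaply, so that the final pass finds the file nearly ordered.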
8. Structure and Analysis of Linear Texts
8.1. Generative Syntax Descriptions; 8.2. Finite State Analysis of Linear Texts; 8.3. The Testing of Finite State Control Tables; 8.4. The Equivalence of Control Tables and Syntax Descriptions; 8.5. Summary of the Chapter; 8.6. Exercises.
Summary of the Chapter
The chapter begins with a description of several ways of expressing the rules for forming texts, generative syntactic rules. Starting from the use of examples and forms, the main stress is on metalanguages. The notation introduced by Backus, BNF, is described in detail. In addition, several extensions of this notation are described.
The second part of the chapter deals with the analysis of texts through a sequential treatment of the characters of which they are composed. The use of a finite state algorithm, based on a control table, is demonstrated. Following the construction of the control table in a special example, the construction of a complete set of internal test cases is described.
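A finite state algorithm driven by a control table can be sketched as follows. The text structure analyzed here, a decimal numeral with optional sign and optional fraction, and the state names are hypothetical; they are not the special example used in the chapter.

```python
def char_class(ch):
    """Classify one character; the table is indexed by class, not by character."""
    if ch in "0123456789":
        return "digit"
    if ch in "+-":
        return "sign"
    if ch == ".":
        return "point"
    return "other"


# Control table: (current state, character class) -> next state.
# A missing entry means that the text is rejected.
CONTROL = {
    ("start", "sign"): "signed",
    ("start", "digit"): "integer",
    ("signed", "digit"): "integer",
    ("integer", "digit"): "integer",
    ("integer", "point"): "point",
    ("point", "digit"): "fraction",
    ("fraction", "digit"): "fraction",
}
ACCEPTING = {"integer", "fraction"}


def accepts(text):
    state = "start"
    for ch in text:
        state = CONTROL.get((state, char_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING


print(accepts("-12.5"), accepts("12."))
```

A set of test cases that exercises every entry of the table, and at least one missing entry per state, gives a systematic internal test of the kind the chapter describes.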
The chapter ends with a discussion of some of the relations between generative syntax descriptions and finite state analysis algorithms. In particular it is shown that any text structure that is defined by means of a control table may also be described in BNF.
9. Evaluative Expressions
9.1. Nested Structures; 9.2. Extending the State with a Stack; 9.3. Sequential Evaluation; 9.4. Adequacy of the Postfix Form of Expressions; 9.5. Conversion from Infix to Postfix Form; 9.6. Left-to-Right Evaluation and Pseudo-Evaluation; 9.7. Summary of the Chapter; 9.8. Exercises.
Summary of the Chapter
The chapter starts with a description of nested structures, and with a demonstration that they cannot be adequately checked by means of a finite state algorithm. This leads to a description of the use of a stack for recording left parentheses during the left-to-right check of nested structures.
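The left-to-right check with a stack may be sketched like this. Allowing two kinds of brackets is an assumption of the example, made to show why the left brackets themselves, and not merely a count, are recorded.

```python
PAIRS = {")": "(", "]": "["}


def well_nested(text):
    """Check that every right bracket closes the most recent open left bracket."""
    stack = []
    for ch in text:
        if ch in PAIRS.values():
            stack.append(ch)                  # record the left bracket
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False                  # nothing to close, or the wrong kind
    return not stack                          # no left bracket may remain open


print(well_nested("a[(b+c)*d]"), well_nested("a[(b+c]*d)"))
```

The stack may grow without bound with the depth of nesting, which is why no algorithm with a fixed number of states can perform this check.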
The problem of a sequential evaluation of ordinary expressions in infix form is next discussed. It is suggested that for convenience of evaluation the infix form should be replaced by the postfix, or Reverse Polish, form, in which the operator is placed after both of its operands. To prepare for the study of this form a review of the rules of evaluation of expressions in infix form is given. These rules are concerned with the treatment of expressions within parentheses, with operator priorities, and with additional ordering of evaluation, such as left-to-right evaluation.
A syntax of expressions in postfix form is now given. It is shown how expressions in this form may be analyzed into their constituent parts and evaluated. The evaluation is first shown as it may be done on the basis of the syntactic analysis of the expression and then as it may be performed by means of a stack of operand values. A proof of the correctness of this latter form of evaluation is given.
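Evaluation by means of a stack of operand values can be sketched as follows. The restriction to four dyadic operators and to numeric operands given as text tokens is an assumption of this example.

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_postfix(tokens):
    """Evaluate a postfix expression given as a sequence of tokens."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()        # the operand pushed last is the right one
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(float(tok))   # an operand: push its value
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


print(eval_postfix("2 3 4 * +".split()))
```

No parentheses and no priority rules are needed: the position of each operator alone determines its operands.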
The adequacy of the postfix form is now demonstrated, in that it is shown that any expression in infix form, including parentheses, and with arbitrary evaluation rules attached, may be rendered by an expression in postfix form. Conversely, it is shown that any expression in postfix form can be rewritten as an expression in infix form.
A sequential method for converting expressions in infix form to postfix form is now described. This uses a stack to hold left parentheses and operators. This is finally combined with the evaluation algorithm for expressions in postfix form given earlier, to produce an algorithm for left-to-right evaluation of expressions in infix form. Several uses of this algorithm for pseudo-evaluation during program translation are described briefly.
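The conversion may be sketched as below. The priority table, the left-to-right rule for operators of equal priority, and the assumption that the input is a well-formed token sequence are choices made for this example.

```python
PRIORITY = {"+": 1, "-": 1, "*": 2, "/": 2}


def infix_to_postfix(tokens):
    """Convert a well-formed infix token sequence to postfix form."""
    output, stack = [], []
    for tok in tokens:
        if tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":          # unstack down to the matching "("
                output.append(stack.pop())
            stack.pop()                      # discard the "(" itself
        elif tok in PRIORITY:
            # Operators of higher or equal priority already on the stack
            # must be applied first (left-to-right evaluation).
            while stack and stack[-1] != "(" and PRIORITY[stack[-1]] >= PRIORITY[tok]:
                output.append(stack.pop())
            stack.append(tok)
        else:
            output.append(tok)               # operands pass straight through
    while stack:
        output.append(stack.pop())
    return output


print(" ".join(infix_to_postfix("( a + b ) * c - d".split())))
```

The operands appear in the same order in both forms; only the operators are delayed, which is why a single stack suffices.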
10. Lists and Pointers
10.1. Explicit Data Association by Pointers; 10.2. List Processes; 10.3. General List Structures; 10.4. Storage Allocation; 10.5. Deletion and Garbage Collection; 10.6. List Processing Systems and Languages; 10.7. Binary Tree Search; 10.8. Summary of the Chapter; 10.9. Exercises.
Summary of the Chapter
Explicit association of data items by means of pointers is explained by means of examples of string representations. The notion of lists is introduced, and several simple list processes are shown.
More general data structures based on pointers are explained in terms of an example, a structure that allows representations of family relations of a group of people.
The discussion continues with the problems of storage allocation of general data structures. The use of storage boxes of a fixed size and of a list of free boxes is explained.
The problem of regaining the storage capacity of items that have been deleted from a data structure is described. Alternative solutions are described, including the use of a garbage collection process that collects the unused boxes of storage whenever the list of free boxes is empty.
A few general remarks concerning list processing languages are made. Although these languages may be very useful, the ideas of lists and pointers may be exploited successfully even without access to any such language.
As an example of the use of pointers purely for facilitating internal processing, binary tree structures and the processes of search, insertion, and deletion, of records in them are developed in detail.
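Search and insertion in a binary tree can be sketched as follows; deletion, which the chapter also develops, is omitted here. The class and function names are hypothetical, and object references play the role of pointers.

```python
class Node:
    """One record of the tree: a key and pointers to the two subtrees."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into the tree and return the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root                       # a key already present is left as it is


def search(root, key):
    """Follow one pointer per comparison; return the node found, or None."""
    node = root
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node


def in_order(root):
    """List the keys in ascending order."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)


root = None
for k in [50, 30, 70, 20, 40, 60]:
    root = insert(root, k)
print(in_order(root), search(root, 40) is not None, search(root, 45))
```

Insertion changes only one pointer, in contrast to insertion in an ordered table, where the following records must all be moved.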
Part 4 - Data Interchanges between Man and Computer
11. Input Data from Human Sources
11.1. Humans as Sources and Receivers of Data; 11.2. Input Devices; 11.3. Principles of Design of Input from Humans; 11.4. Input Mistake Analysis; 11.5. Issues of Psychology; 11.6. The Input Format as a Language; 11.7. Summary of the Chapter; 11.8. Exercises.
Summary of the Chapter
The chapter is concerned with input data to programs, particularly the problems related to humans as sources of data. The essential characteristics of humans in this role and of the devices used for transfer of data from humans to computers are first briefly reviewed.
The chief content of the chapter is a description of a number of concrete guidelines for the designers of input data formats. Special attention is given to mistakes and their consequences.
As a way to overcome some problems of human psychology it is suggested that the people who will actually act as sources of data are consulted during the design of the input data format, and that their motivation for the work is taken into account.
12. Computer Output for Human Use
12.1. Output Devices; 12.2. Output Format Design Considerations; 12.3. Experiments on the Format of Output; 12.4. Digital Curves and Pictures; 12.5. Large Tables; 12.6. Summary of the Chapter; 12.7. Exercises.
Summary of the Chapter
The chapter gives a number of rules for helping to make the printed output from a computer convenient and useful to human readers. The rules center around the need to adjust the volume and form of the output to the human capacity and mode of comprehension. The value of output in analogue or pictorial form is emphasized, and some guidance in the use of line printers for producing output in these forms is given. The chapter ends with a few notes on the production of large tables, including a warning against such productions.
13. Conversations between Man and Computer
13.1. Conversational Devices; 13.2. Potentials of Conversational Techniques; 13.3. Experimental Development of Man-Machine Conversations; 13.4. Summary of the Chapter; 13.5. Exercises.
Summary of the Chapter
Conversational interchanges between persons and computers offer increased convenience and speed for the human side of the interchange, by relieving the person of the burden of mastering all details of the language of interchange, by allowing rapid feedback of error messages, and by making it possible to limit the output to precisely that which is desired. In addition conversational techniques open the possibility of a more intimate relationship between man and computer. In attacking complicated problems involving creative problem solving, such as the development of large programs, this close relationship may help to achieve superior solutions.
The development of systems for conversational interchanges must to a considerable extent be based on experimental development, involving the actual users.
Part 5 - Processes with Large Amounts of Data
14. Computer Storage of Large Amounts of Data
14.1. Auxiliary Stores; 14.2. Working Store, Channels and Concurrent Operation; 14.3. Time Sharing, Multiprogramming and Operating Systems; 14.4. Program Environments; 14.5. The Importance of Designing for Low Latency; 14.6. Summary of the Chapter; 14.7. Exercises.
Summary of the Chapter
The functional characteristics of auxiliary stores, such as tapes, drums, and discs, are reviewed. Transfers of data between auxiliary stores and the working stores of computers are made a block at a time. The access time of a block is composed of a latency and a transfer time. In order to achieve good time-effectiveness when using auxiliary stores it is necessary that the relative contribution of the latency to the execution time is kept low. For this reason the block length should, as far as possible, be chosen to be large.
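The effect of the block length can be made concrete with a small calculation. The figures used, a latency of 10 ms and a transfer rate of 100,000 words per second, are hypothetical and do not describe any particular device.

```python
def latency_share(block_words, latency_s, words_per_s):
    """Fraction of the access time of one block that is spent on latency."""
    transfer_s = block_words / words_per_s
    return latency_s / (latency_s + transfer_s)


for block_words in (100, 1_000, 10_000):
    share = latency_share(block_words, latency_s=0.010, words_per_s=100_000)
    print(block_words, round(share, 3))
```

With these figures the latency accounts for about nine tenths of the access time for blocks of 100 words, but for less than one tenth for blocks of 10,000 words.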
Auxiliary stores are connected to computers via channels. The channels make it possible that data transfers with auxiliary stores may proceed in parallel with other, unrelated, process activity in the central processor of the computer. One way of exploiting this possibility is to arrange the storage block of the working store that is used for the transfers as a double buffer.
In order to achieve an intensive utilization of the various parts of a computer system, with a high degree of concurrency of operation, the system may be organized to allow concurrent execution of several independent programs with the aid of multiprogramming. The complete operation must then be under the control of a special program, the operating system.
The user environment offered by various operating systems varies greatly from one system to the other. As the first, crude distinction one may speak of uniprogramming and multiprogramming environments. Under uniprogramming the part of the computer being used remains under the exclusive control of one user for as long as he wishes; under multiprogramming the part may be expected to be given to another user whenever it is left unused for even the shortest moment.
Irrespective of the environment there is a gain to be had by designing data processes for low latency. This may be done, first, by arranging the processing in such a manner that the data being transferred to or from auxiliary stores have been used for significant processing, and second, by ordering accesses to auxiliary stores in accordance with the physical arrangement of the store.
15. Maintenance, Searching and Sorting of Large Files
15.1. Records and Blocks; 15.2. The Maintenance of Large Files; 15.3. Large File Design; 15.4. Searching in Large Files; 15.5. Queuing and Ordering of Transaction; 15.6. Tape File Merging and Sorting; 15.7. Large File Splitting and File Directories; 15.8. Summary of the Chapter; 15.9. Exercises.
Summary of the Chapter
The chapter is concerned with the basic concepts related to files that are so large that they have to be stored in an auxiliary store. In addition to searching and sorting, the operation of maintenance is prominent in dealing with such large files. The maintenance slip is the time interval that elapses from the moment a change has taken place in the part of the real world that is represented by the data of the file, until the corresponding change has also been made in the file.
The design of a large file requires that several, often conflicting, issues are taken into account: capacity, requests, maintenance, reliability, and cost.
Searching in large files has to be arranged to avoid unnecessary block transfers. For this reason, when the search processes have to be completed one by one, a suitable adaptation of the scatter storage search method is generally to be recommended. However, often a more effective solution may be had by collecting the transactions in a queue and sorting them in accordance with the most convenient order of access to the auxiliary store. The important principle of this approach may be applied at several levels: computer hardware, machine language, magnetic tape processing, and manual handling.
Effective use of magnetic tapes is always based on the process of tape merging, which is perfectly adapted to the characteristics of this medium. Tape merging may be used for sorting of magnetic tape files in a number of different ways. As an example the polyphase merge sort method is described. For use as the first phase of the method the formation of long suites of records by means of the replacement-selection method is discussed.
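The replacement-selection method and the subsequent merging can be sketched as follows. Lists stand in for tapes, a heap stands in for the records held in the working store, and a single many-way merge replaces the polyphase scheme, which is not reproduced here. All names are hypothetical.

```python
import heapq
from itertools import islice

_END = object()   # marks the end of the input file


def replacement_selection(records, capacity):
    """Split the input into ordered sequences, holding at most capacity records."""
    source = iter(records)
    heap = list(islice(source, capacity))
    heapq.heapify(heap)
    runs, current, held = [], [], []
    while heap:
        smallest = heapq.heappop(heap)
        current.append(smallest)
        incoming = next(source, _END)
        if incoming is not _END:
            if incoming >= smallest:
                heapq.heappush(heap, incoming)   # still fits the current sequence
            else:
                held.append(incoming)            # must wait for the next sequence
        if not heap:                             # the current sequence is complete
            runs.append(current)
            current, heap, held = [], held, []
            heapq.heapify(heap)
    return runs


runs = replacement_selection([5, 3, 8, 1, 9, 2, 7], capacity=3)
print(runs)
print(list(heapq.merge(*runs)))
```

The sequences produced are often longer than the working store capacity, here four records with room for only three, which reduces the number of merge passes needed afterwards.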
The chapter ends with a few notes on the splitting of one file into several and on the use of file directories.
16. Miscellaneous Storage Methods
16.1. Tape Libraries; 16.2. Multipass Processing; 16.3. Paging and Virtual Stores; 16.4. Summary of the Chapter; 16.5. Exercises.
Summary of the Chapter
The chapter describes certain methods of handling large amounts of data that cannot in a natural way be regarded as files of records. The methods have important applications to the handling of large program texts, but are not restricted in their use to such data.
The first method deals with libraries of files held on tape. It gives a rule for ordering the files in such a way that the average latency incurred in transferring the files is minimized.
The second topic discussed is the processing of a large amount of data by means of a large program using multipass processing. The variants of this approach that are described include alternation of the processing in the forward and backward directions and the organization of the auxiliary storage capacity used for the large amount of data as a ring.
The third part of the chapter describes paging and virtual stores, which allow the user of a computer system that is equipped with a small working store and a large auxiliary store in the form of a drum or a disc to ignore the difference between the two stores and to refer directly to the locations of the auxiliary store.
Part 6 - Large Data Systems
17. Large Data Systems in Human Society
17.1. Large Data System Characteristics; 17.2. Political and Ethical Issues; 17.3. People Involved in Large Data Systems; 17.4. Large Data System Adequacy and Convenience; 17.5. Large Data System Reliability and Stability; 17.6. Large Data System Security and Supervisability; 17.7. The Political Weight of Large Data Systems; 17.8. Summary of the Chapter; 17.9. Exercises.
Summary of the Chapter
The chapter starts with a characterization of a class of large data systems in terms of the number of people involved, the cost of development and maintenance, the lifespan, the amounts of data included, the coupling of the actions of the computer and people involved, and the need for a whole family of programs to control the computer work. The concept is illustrated by examples of large data systems used in land surveying, business administration, and public administration.
The problems raised by a large data system may be not only technical, but also political and ethical. To describe these problems, a characterization is given of several groups of people who are in various ways related to the data system: the owner, the operators, the programmers, the managers, the clients, the adversaries, the society.
Large data systems have problems of their adequacy and convenience. These are of concern to the owner, the managers, the operators, and the clients. Further they raise problems of their reliability and stability. The problems concern the manager and the operators. Finally these systems have problems of security and supervisability. These problems are created by the possible presence of adversaries. They further concern the operators, the programmers, and the managers.
Large data systems may influence the power balance of various groups of the society. So far the most conspicuous effect of this kind is the tendency for a weakening of the position of the individual relative to that of large organizations, or a problem of privacy for the individual.
18. Design and Development of Large Data Systems (Reprinted as section 5.5 of Peter Naur: Computing, a Human Activity, 1992, ACM Press, ISBN 0-201-58069-1)
18.1. Controversial Issues of Large Data Systems; 18.2. The Management View of the Large Data System Project; 18.3. Overall Design and the Experimental Attitude; 18.4. The Human Feedback; 18.5. Techniques of Problem Solving and Large Data System Design; 18.6. Documentation of Large Data Systems; 18.7. Documentation of Computer Operation; 18.8. Design Decisions of Large Data Systems; 18.9. Experiments as Part of the Design Process; 18.10. Design Check Points; 18.11. Summary of the Chapter; 18.12. Exercises.
Summary of the Chapter
The development of large data systems has given rise to difficulties owing to: lack of the technical understanding of the problems; lag in the development of the appropriate educations; poor mutual understanding between project manager, system designers and programmers, and computer scientists; inadequate management methods, lack of standards of methods and performance; tendency to overpromise the achievement to be expected of projected systems.
As one approach to managing large data system projects, the development may be regarded as taking place by 5 stages: analysis of requirements; design; implementation; installation; and maintenance. The analogy with projects of development of physical items, which is the basis of this view of data system projects, is not so close as to prevent difficulties from this approach, however.
As an alternative approach, the development problem may be regarded as primarily one of overall design. Recognizing that the overall design can only be regarded as a hypothesis until all its parts have been worked out in detail, one arrives at a view of the project as a succession of experimental stages.
Large data systems contain important human elements, both during their design and operation. It is suggested that the insight into the psychological and sociological factors which will be of importance to the system design be gained through a dialog involving the designers and the people who will be involved in the operation.
The actual work on the design of large data systems may profit from an insight into general problem solving methods. Some of the essential issues are that the designer should view the problem from many different sides before making any design decisions, and that he should keep an open eye for alternative solutions.
With large systems a major design problem is to find an order of treatment of the usually conflicting design requirements. A sketch of a systematic method for finding such an order is given.
As a technique for arriving at the initial design goals it is suggested that a meeting between the people that will be involved in the system be held. During the meeting a list of design goals is set up, with a grading of each goal along both a usefulness and a cost scale.
As a help towards solving the difficult problem of documentation of large data systems 4 rules are given: First, let the documentation be produced while the actual development is in progress. Second, maintain a plan and a table of contents of the intended final documentation at all times during the development of the system. Third, help the reader to find his way in the documentation. Fourth, choose and hold a suitable terminology.
As part of the development of the design of a large data system it is helpful to look at the proposed design from the angle of several design areas: the interfaces of the processing done by people and that done by computers; the processing cycles of the system; the computer programs; the processing to be done by people; what will not be done and why.
Certain areas of a large data system can only be designed on the basis of experiments. Typical areas of this kind are: input and output formats; statistical properties of input data; the difficult processes.
A proposed design of a large data system should be checked against several general points: coverage of goals; simplicity; performance; reliability and stability; security and supervisability; modifiability.