Data Flow Diagrams
Data flow diagrams (DFDs) are a way of representing a system's business processes, the flow of data into and out of those processes, and the flow of data between the system and the external agencies with which it interacts. The resulting graphical views of the system, its processes, and its data flows can be used as a basis for discussion between system developers and users of the system. A hierarchy of DFDs is produced, starting with an overview that provides a very abstract view of the system and ending with a number of diagrams representing the lowest-level sub-processes. The highest level DFD is the context diagram, which simply shows the system of interest, the external entities with which it interacts, and the data flows between the system and the external entities. A typical context diagram is shown below.
A context diagram for an order management system
You will notice several things about this diagram. An external entity (in this case "Customer") is shown as an ellipse, appropriately labelled. An external entity is either a source of data entering the system, or a destination for data leaving the system, or (more often than not) both. The system itself is simply shown as a rectangular box with a suitable name. A data flow is represented by a line, suitably labelled, with an arrow at one end (or in some cases both ends) showing in which direction the data is flowing. The Customer external entity is duplicated on the diagram for the sake of clarity, to avoid too many data flows close together. The diagonal line in the top left corner of the Customer external entity symbol is to indicate that more than one instance of this entity appears on the diagram. The last thing of note is the representation of a resource flow ("Goods"), which is shown as a line with a double-headed arrow. The context diagram serves to define the system boundary. Any entity with which the system interacts, but which is not a part of the system itself, is an external entity.
Before doing too much work on a set of data flow diagrams, it is worth drawing up a list of the external entities that provide inputs to or receive outputs from the system, and identifying those inputs and outputs. It is also useful to identify all of the high-level business activities that fall within the system boundary, and to relate these activities to specific inputs and outputs. The questions to ask include "Who does what, when, where and how?" and "What data does each of these people need to carry out their tasks?".
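One way to keep such a list is as a simple table built from the flows themselves. The following is a minimal sketch in Python, not part of any DFD notation. The "Customer" entity and the "Goods" flow come from the context diagram; the other flow labels ("Enquiry", "Quotation", "Order") and the function name `inventory` are assumptions made purely for illustration.

```python
from collections import defaultdict

SYSTEM = "Order management system"

# Hypothetical flows, written as (source, item, destination).
# Only "Customer" and "Goods" are taken from the context diagram;
# the remaining labels are assumed for the sake of the example.
flows = [
    ("Customer", "Enquiry", SYSTEM),
    (SYSTEM, "Quotation", "Customer"),
    ("Customer", "Order", SYSTEM),
    (SYSTEM, "Goods", "Customer"),
]


def inventory(flows, system):
    """Group flows by external entity into system inputs and outputs."""
    table = defaultdict(lambda: {"inputs": [], "outputs": []})
    for source, item, destination in flows:
        if destination == system:
            # The entity is a source of data entering the system.
            table[source]["inputs"].append(item)
        elif source == system:
            # The entity is a destination for data leaving the system.
            table[destination]["outputs"].append(item)
    return dict(table)


for entity, io in inventory(flows, SYSTEM).items():
    print(entity, io)
```

Each row of the resulting table can then be checked against the high-level business activities to confirm that every input is consumed and every output is produced by some activity.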
We can represent the high-level processes within the system of interest by creating a level 1 data flow diagram. An example of a level 1 DFD for our order management system is shown below.
A level 1 data flow diagram for an order management system
We now have three processes, "Manage enquiry", "Manage order", and "Manage sales ledger". Each process is represented by a rectangle, subdivided into three smaller rectangles. Each has a descriptive name that provides a clue as to the type of activity taking place within it, and an ID number in the top left corner. Note that the ordering of these ID numbers is purely arbitrary, and there is no priority implied by it. The space to the right of the ID number can be used, if required, to identify the person or department responsible for the process, or the location at which it occurs. A process is some activity that receives data, transforms it in some way, and (usually) outputs it again in a modified format.
We also have three data stores, "Sales orders", "Quotations" and "Invoices". The data store symbol is also a rectangle, subdivided into two smaller rectangles and open at one end. The boxed-in area at the left side of the data store symbol contains an ID number, prefixed with a capital "D" (for data, presumably!). Again, no priority is implied by the numbering of data stores. To the right of the ID number is the name of the data store, which usually gives a clue as to the kind of information held. The data store is a generic representation of some physical or electronic data storage medium, such as index cards or a database file. Like external entities, data stores can be duplicated on the same diagram for the sake of clarity.
Some rules for data flow diagrams:
- all processes must have at least one data flow in, and one data flow out
- each process should represent only one activity at a particular level
- each data store must have both inputs and outputs, and relate to at least one data flow
- each external entity must relate to at least one data flow
- each data flow must be attached to at least one process
- a data flow from an external entity must flow into a process
- a data flow to an external entity must flow from a process
- a data flow to a data store can only come from a process
- a data flow from a data store can only go to a process
- as a general rule, the number of processes shown in a DFD should not exceed twelve
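Most of these rules can be checked mechanically once a diagram is written down as lists of elements and flows. The following Python sketch is one possible way of doing so, under stated assumptions: the `Flow` class and `check_dfd` function are hypothetical names, and the flow labels in the example are invented for illustration (only the process, data store and entity names come from the level 1 DFD). The rule that each process should represent only one activity is a matter of judgement and is not checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str
    target: str
    label: str


def check_dfd(processes, stores, entities, flows, max_processes=12):
    """Return a list of rule violations for one diagram (empty if none found)."""
    problems = []
    known = set(processes) | set(stores) | set(entities)
    for f in flows:
        for end in (f.source, f.target):
            if end not in known:
                problems.append(f"flow '{f.label}' refers to unknown element '{end}'")
        # If one end is always a process, entities and data stores can
        # never be connected directly to each other or to themselves.
        if f.source not in processes and f.target not in processes:
            problems.append(f"flow '{f.label}' is not attached to a process")
    for p in processes:
        if not any(f.target == p for f in flows):
            problems.append(f"process '{p}' has no data flow in")
        if not any(f.source == p for f in flows):
            problems.append(f"process '{p}' has no data flow out")
    for s in stores:
        if not any(f.target == s for f in flows):
            problems.append(f"data store '{s}' has no data flow in")
        if not any(f.source == s for f in flows):
            problems.append(f"data store '{s}' has no data flow out")
    for e in entities:
        if not any(e in (f.source, f.target) for f in flows):
            problems.append(f"external entity '{e}' has no data flow")
    if len(processes) > max_processes:
        problems.append(f"more than {max_processes} processes on one diagram")
    return problems


# Element names are from the level 1 DFD; the flows are assumed.
processes = ["Manage enquiry", "Manage order", "Manage sales ledger"]
stores = ["Sales orders", "Quotations", "Invoices"]
entities = ["Customer"]
flows = [
    Flow("Customer", "Manage enquiry", "Enquiry"),
    Flow("Manage enquiry", "Quotations", "Quotation details"),
    Flow("Manage enquiry", "Customer", "Quotation"),
    Flow("Customer", "Manage order", "Order"),
    Flow("Quotations", "Manage order", "Quotation details"),
    Flow("Manage order", "Sales orders", "Order details"),
    Flow("Sales orders", "Manage sales ledger", "Order details"),
    Flow("Manage sales ledger", "Invoices", "Invoice details"),
    Flow("Manage sales ledger", "Customer", "Invoice"),
]

problems = check_dfd(processes, stores, entities, flows)
print(problems)
```

With these assumed flows, the check reports that the "Invoices" data store has no data flow out, which is the kind of omission that prompts a diagram to be re-drawn.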
The context diagram and level 1 DFD may well be re-drawn a number of times before a consensus is reached between developers and users that the diagrams accurately represent all of the high-level processes, data stores, and data flows. Looking at the above DFD, for example, it will become apparent that we do not currently have a data store for customer information. Once these diagrams are considered to be substantially correct, however, each of the high-level processes included in the level 1 DFD will probably require further analysis to break it down into its constituent sub-processes, resulting in a level 2 DFD being produced for each process shown on the level 1 DFD. A level 2 DFD for the "Manage enquiry" process is shown below.
A level 2 data flow diagram for the "Manage enquiry" process
Each (parent) process in the level 1 DFD will be decomposed into lower-level (child) sub-processes. The lower-level processes may be further decomposed if necessary, although it is unusual to have to do this beyond level 3, and often level 2 is sufficient. Note that the data flows into and out of a parent process must also appear in its child DFD. Note also that the ID number allocated to the parent process is carried down to each child process. In the example above, the "Manage enquiry" process has an ID number of 2, and the three sub-processes have the ID numbers 2.1, 2.2 and 2.3. Once a low-level process is considered to be a discrete task that is sufficiently atomic in nature, no further decomposition is necessary and an elementary process description (EPD) can be produced for each low-level process (see example below).
Elementary process descriptions for "Manage enquiry"
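The two conventions that link a parent process to its child DFD, carrying the parent ID down and keeping the data flows consistent between levels, can be sketched in a few lines of Python. The function names `child_ids` and `unbalanced_flows` are hypothetical, and the flow labels used in the example are assumed for illustration rather than taken from the diagrams.

```python
def child_ids(parent_id, count):
    """Number child processes by carrying the parent ID down."""
    return [f"{parent_id}.{n}" for n in range(1, count + 1)]


def unbalanced_flows(parent_flows, child_boundary_flows):
    """Compare flows on the parent process with flows crossing the child DFD boundary.

    Each flow is a (direction, label) pair, where direction is "in" or "out".
    """
    parent = set(parent_flows)
    child = set(child_boundary_flows)
    return {
        "missing_in_child": sorted(parent - child),
        "extra_in_child": sorted(child - parent),
    }


# "Manage enquiry" has ID 2 and three sub-processes.
print(child_ids("2", 3))  # prints ['2.1', '2.2', '2.3']

# Assumed flow labels, for illustration only.
parent_flows = [("in", "Enquiry"), ("out", "Quotation")]
child_flows = [("in", "Enquiry")]
print(unbalanced_flows(parent_flows, child_flows))
```

Here the child DFD is missing the outgoing "Quotation" flow, so the two levels are not yet consistent and one of the diagrams needs to be corrected.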
Data flow diagrams can be used to represent the system, not only at different levels of detail, but from different perspectives. The four main types of data flow diagram are described below:
- Current logical DFD - describes what the system does, but not necessarily how it does it. This is useful for discussing the functionality of the system without getting bogged down in too much detail.
- Current physical DFD - describes what the system does and how the functionality is currently implemented. This type of diagram is useful for highlighting redundant processes and data stores, and for giving the analyst an insight into how the system operates in its present form.
- Required logical DFD - describes what the new system must be able to do, but not necessarily how it should do it. This is useful for achieving consensus between developers and users on a requirements specification.
- Required physical DFD - describes what the new system will do and how the functionality will be implemented. This type of diagram is produced during the design stage, and is useful for conveying to users how the system will be implemented.