Text of Chapter 6, Client-Server dBASE Programming, Martin Rinehart, Addison-Wesley, 1994,

Chapter 6

Database Design

I use a semantic, object-oriented approach to database design. By "semantic" I mean that I concentrate on the meaning of the data. By "object-oriented," I mean tha I concentrate on the real-world things that are modelled in a database.

In this chapter I'll show you my approach to database design. I'll also discuss the classic "normalization" theory that has been recommended for database design. Critics of relational databases are unanimous in condemning the normalization steps that have been recommended. I agree with the critics, but will give you the normalization steps anyway so you can judge for yourself.

We have agreed to use a relational database since that is the model used by the vast majority of today's available software, including dBASE.

A relational database is one where all data is stored in tables. Whether our back-end is .DBF-based or built around an RDBMS server, our data will be in tables. But how do we organize the data into the right tables? What is the right number of tables? How does a semantic, object-oriented approach yield a realtional database design?

To answer these questions, let's start with considering our goals.

Design Goals

Other than observing that the number of tables depends on the application, there is no way to make a rule about the right number of tables. But we can identify the characteristics of a good design.

No Redundant Data

First, we do not want redundant data. Every fact we need should have one and only one place in the database. For example, we won't record a customer's name and address on every invoice. That would give us the chance to misspell the name or get the address wrong in many places. It makes it just about impossible to make the needed updates if the customer moves.

We will hava a customer table, and store the customer's name and address just once in this table. If we make a mistake, it is then simple enough to correct it. If the customer moves, one change and our whole system is brought up to date.

A warning: There are other design goals that may contradict this one. You can sometimes improve processing speed immensely by storing a small amount of redundant data. Later on, we'll violate this rule, but only very infrequently and very carefully.

No Accidental Deletes

In every active table you will be adding and deleting data. But you don't want this to happen by accident. If your hospital system stores recommendeed treatments along with the patient's name and bed assignment, you don't want to delete the patient when the recommended treatment is changed.

Every time you store data about two or more independent things in a single record, you risk an unintended deletion. If your hospital billing system stored the insurance company's name and address in the patient record, deleting the last patient insured by a particular insurance company would remove that insurance company from your files.

Update Anomalies

The generic name for problems such as accidental deletes is "update anomalies." At their most subtle, update anomalies can be difficult to predict, and can present serious problems. An inadvertent relational operation could create bogus records (see the "lossless join" in the discussion of higher normal forms later in this chapter).

Update anomalies almost always stem from one design flaw: a single table is used to store data on two or more things. As we get into our design method, we will concentrate on separating our real-world items into individual classes so this will not happen.

Objects and Events

Over the years, I have evolved a design method that I call ORE&D (pronounced ore'n'dee, like R&D). I first started experimenting with ORE&D when I found that it wasn't possible to teach the classic normalization methodology to my clients in any reasonable length of time.

Why do I need to teach my clients? Only the people closest to the database problem have the knowledge you need to design a database, so it is critical that they understand the database design process. I created ORE&D as a method that could be explained simply to the people who really understood the problems the database system is designed to solve.

In recent years, non-relational database proponents (semantic and object-oriented) have commented extensively on the deficiencies of the normalization process. There is now considerable academic support for abandoning the old approach.

While ORE&D is much simpler than the classic normalization steps, it is not a simplified approach. If this seems like a contradiction, consider two carpenters: one patiently drives nails with a hammer, while the other uses a pneumatic nail gun. In a single "pop" the pneumatic gun drives a nail perfectly.

Objiously, the pneumatic gun achieves the same result as the manual process, but with far less work. Similarly, ORE&D achieves a first-rate, thoroughly normalized design, but with much less work. The last part of this chapter compares ORE&D to classic normalization.

ORE&D depends on a taxonomy of table types. Dividing tables into different classes may seem to contradict one of the most fundamental principles of relational databases: All tables are the same. In fact, there is no contradiction. ORE&D actually classifies real-world entities and places them in tables. The tables are the same from the point of view of the relational model, but are differentiated as to the real-world entities they model.

In the following sections, we'll discuss each table type, starting with the most common, Objects, and ending with Relationships, the least common. (The logical order for this taxonomy is OED&R, but I think you'll agree that ORE&D leads to a better pronunciation.)

Objects

Step one in ORE&D design is listing the real-world Objects that concern your system. Here I am not using "Object" as in "object-oriented." I mean real, tangible, hurt your toe if you kick one, Objects. Buildings, companies, products, customers, people, and employees are Objects for ORE&D work.

If relevant, less kickable Objects can also be listed. You may list musical compositions or mathematical theorems if they are part of the world your database must model.

Each Object type that you list is a table. Give it a table name and list the facts you need to know about each Object. These Object characteristics, subject to the discussion below, will become the columns of your table.

Events

After listing the Objects, list the Events that are important to your database. Events are items such as making a sale, admitting a patient, or making a shipment.

If you think of Objects as the nouns in your database, Events are the verbs. Objects are more or less permanent. Events are things that happen at a specific point in time.

Each type of Event that you list is another table. Again, give it a name and list the facts you need to know about the event. There are two things that you will note as you list the characteristics of your Events.

First, you will need a time stamp for each Event. When did it happen? This time stamp must be detailed enough to separate two otherwise identical events. In many cases, recording the date the event occurs will be adequate. In other cases, you will want to know the hour and minute, as well as the day.

A fast-food establishment will want to record the time of each transaction. (How many burgers should we have on the grill at noon? How many at 3:30 P.M.?) A software company that ships products once a month to distributors and dealers will find a date more than adequate for identifying a shipment.

Second, you will notice tha Events involve Objects. You ship products to customers, assign patients to beds, purchase components from suppliers. An Event is a transaction among Objects at a point in time.

Some of your Objects may be pieces of paper used to record Events. An invoice, for example, is a paper generated at the time of a sale. You should attempt to classify the real-world Objects and Events and attach data abouty these pieces of paper to the Events they record. For example, a sale Event may have an invoice number as one of its characteristics. Bear in mind that these pieces of paper are just another model of the real-world Objects and Events you're tracking.

As you list the Objects and Events relevant to your system, you may note that this taxonomy is not exhaustive. Extend it to meet your requirements. You may need to record data about items that are not permanent, like Objects, but that are not happenings at a single point in time, like Events. For example, your employer may build houses.

Biologists call fur-bearing animals that have babies and suckle their young "mammals." Animals that have scales and lay eggs are reptiles. The furry platypus suckles its young, after its eggs hatch. The platypus is neither reptile nor mammal, but this doesn't mean that the distinction between mammals and reptiles is not commonly useful. Similarly, our classification of Objects and Events is also commonly useful, even though it is not exhaustive.

Before you leave Objects and Events, make sure that you have listed an abstract, primary key as one of the characteristics of each Object and Event. Then scrutinize each characteristic carefully. Ask, "Is this really a fact about this Object (or Event)?" Create new tables for any characteristics that are not completely part of the subject Object or Event.

Details

Using the relational model, we cannot accommodate data that does not fit in a table. Some of our Objects and Events will require multiple tables to maintain a complete set of data. For instance, an invoice will commonly consist of two tables: one holding items such as the invoice date and the customer, and another listing the products sold, quantity and price.

Repeating Groups

The invoice is an example of an Event with a repeating group. A sale typically transfers one or more products to a customer. At this point, you should look through your Object and Event characteristics and note all those that involve repeating groups.

Some typical examples are: A patient receives one or more treatments; a shipment is held in one or more cartons; a student enrolls in one or more courses; an employee has one or more dependents. In each case, the repeating group is separated from the parent table into its own Detail table.

The first field in the Detail table is its own primary key. The second field is the foreign key, naming the parent record. For instance, if your SALES table has a related SALDTL (sales detail) table listing the products sold, the SALDTL table would include these fields:


     SALDTL
         Field   Type    Length
         ------- ------- --------
         SALDTL_K  N       8
         SALES_K   N       8
         (etc.)

If you were listing an employee's dependents, your detail table might start like this:

     DEPENS
         Field   Type    Length
         ------- ------- --------
         DEPENS_K  N       8
         EMPLOY_K  N       8
         (etc.)

In the Detail table, you list the characteristics of the repeating item (or move these characteristics out of the Object or Event into the Detail table). The sales detail table would include the product key and quantity. The dependent table would include names, birthdates, and so on.

Occasionally, a Detail table may have repeating characteristics of its own. These are broken out of the Detail table just as other Details were broken out of Objects and Events. Suppose you needed to record the sports in which your dependents participate. (Possibly your medical insurance is more expensive for race-car drivers or cheaper for joggers.) Since each dependent could participate in zero or more sports, another Detail table is required. This is an example:


     DEPENS
         Field   Type    Length
         ------- ------- --------
         DEPENS_K  N       8
         EMPLOY_K  N       8
         (etc.)

     SPORTS
         Field   Type    Length
         ------- ------- --------
         SPORTS_K  N       8
         DEPENS_K  N       8
         (etc.)

Histories

Another type of Detail table is the history table. At this point, go through each characteristic of every Object, Event, and Detail. Ask if historical data will be needed.

Prices are one example of a characteristic for which historical data is normally needed. As with other Details, move the characteristic out of the parent table into its own history table. Here is a price history Detail table, related to a parent product Object table.


     PRODCT
         Field   Type    Length
         ------- ------- --------
         PRODCT_K  N       8
         NAME      C      24
         (etc.)

     P_HIST
         Field   Type    Length
         ------- ------- --------
         P_HIST_K  N       8
         PRODCT_K  N       8
         P_DATE    D       8
         PRICE     N       6

Unlike other Detail tables, history tables always require at least one date. More precisely, you must have a start time and an end time for each history item. Frequently, the start date is adequate. (In this case, each start date implies that the previous entry ended the previous day.) However, if your organization changes prices in the middle of a day, for example, you may need a more exact time stamp.

You should be aware that almost every characteristic can change over time. This does not imply that every characteristic should have a history Detail table. For example, people change their names. (Jane marries Bill and adopts Bill's family name.) We can simply edit the record, replacing the old data with new data. Our databases maintain histories via regular backups and audit trails. Before you decide that separate History detail tables are required, ask if the frequency of use of the data justifies the extra effort required to maintain and use history tables.

Relationships

The "R" in ORE&D stands for Relationships. In smoe cases, fields in your Objecf, Event, and Detail tables will record relationships. In other cases, additional tables will be required.

1:1

Each student at your university is permitted one major field of study. The major field is simply a characteristic field. If you have a fixed list of major subjects, your software might show the user a picklist and record a two-byte code in the database. This would be structured as follows:

     STUDNT
         Field   Type    Length
         ------- ------- --------
         STUDNT_K  N       8
         NAME      C      40
         MAJOR     C       2
         (etc.)

Alternatively, your university could require that major fields of study correspond to one of the university's academic departments. If academic departments are recorded in an Object table in your database, the STUDNT table could look like this:

     DEPMNT
         Field   Type    Length
         ------- ------- --------
         DEPMNT_K  N       8
         (etc.)

     STUDNT
         Field   Type    Length
         ------- ------- --------
         STUDNT_K  N       8
         NAME      C      40
         DEPMNT_K  N       8

1:M and M:1

Regardless of the quantity (M, M+ [one or more], M* [zero or more], or other) 1:M and M:1 relationships are recorded the same way. In the table of the Many side of the relationship, the key of the one-side is a characteristic.

Here are many invoices, each related to one customer:

     CUSTMR
         Field   Type    Length
         ------- ------- --------
         CUSTMR_K  N       8
         (etc.)


     INVOIC
         Field   Type    Length
         ------- ------- --------
         INVOIC_K  N       8
         DATE      D       8
         CUSTMR_K  N       8
         (etc.)

In the next example, I'll show mny cartons as part of a single shipment.

     SHPMNT
         Field   Type    Length
         ------- ------- --------
         SHPMNT_K  N       8
         (etc.)

     CARTON
         Field   Type    Length
         ------- ------- --------
         CARTON_K  N       8
         SHPMNT_K  N       8

Both types of Detail tables, repeating groups and histories, are examples of M:1 relationships: many details to one parent Object, Event or Detail.

M:M

Many-to-many relationships almost always require a separate Relationship table, recording the relationship and its characteristics. Here is a table showing which of a chain's stores sell which products.
     STORE
         Field   Type    Length
         ------- ------- --------
         STORE_K   N       8
         (etc.)

     PRODCT
         Field   Type    Length
         ------- ------- --------
         PRODCT_K  N       8
         (etc.)

     PD_STR
         Field   Type    Length
         ------- ------- --------
         PD_STR_K  N       8
         PRODCT_K  N       8
         STORE_K   N       8
         (etc.)

The data "(etc.)" about the PD_STR relationship could include, for example, the number of units on hand if you maintain a chain-wide inventory. If each store maintains its own inventory, the "(etc.)" here might be null. It is not uncommon to have relationship tables that simply record the fact that the relationship exists. If the relationship is temporary, or subject to change, it might include dates (and possibly times) like history tables.

There is one exception: the all-to-all relationship. Suppost that each of a chain's stores sold every product. In that case, a table of all the stores and another table of all the products is adequate. The contents of the relationship table are implied by the fact that each store sells every product.

However, the fact that a relationship is all-to-all does not eliminate the need for the M:M table if other data about the relationship is required. For example, of you maintain a chain-wide inventory where you record the number of each product available at each store, you would still need the M:M table.

ORE&D Summarized

In ORE&D, you first list the Objects and Events relevant for your database system. Each Object and Event is a separate table. You start each table with an abstract, primary key, then add fields for all the other characteristics relevant to your system.

Repeating characteristics are separated into Detail files, as are any characteristics about which you need to record change histories. You repeat this step on the characteristics of the Detail files if they are repeating or require histories.

Relationships are handled by listing a characteristic or foreign key in a 1:1 relationship; including a foreign key on the M side of a 1:M or M:1 relationship, or by including a separate table for an M:M relationship.

Normalization

ORE&D is a simple, teachable method. Your analysis concentrates on the needs of the system, not on abstract database concepts. This is in contrast to the traditional normalization process. In the traditional process, you start with a list of everything you need to know.

First Normal Form

Given a master list, you put your data into tables by removing repeating items. Once all your data is in tables, you have achieved first normal form. Note that at this point you have a master parent table that might include your customers names, your products' names and your employees' names.

You then begin to assign concrete keys chosen from the data itself, not abstractly constructed.

Second Normal Form

To achieve second normal form, you separate the data in your tables into individual tables until all the characteristics of each table are "fully functionally dependent" on your keys.

Functional dependence means that a key specifies a value. For example, your personnel file might have an employee number and a social security number. Both are called candidate keys. Both candidate keys have only a single name associated with them, so the name is functionally dependent on both candidate keys.

Candidate keys may be built from multiple fields. For instance, the combination of social security number and name is a candidate key. Each SSNUM-NAME combination specifies a single employee number (EMPNUM), so the EMPNUM is functionally dependent on the SSNUM-NAME key.

Fully functionally dependent means that a characteristic (called "attribute" in normalization) depends on each part of a a candidate key. The EMPNUM above is not fully functionally dependent on SSNUM-NAME as NAME could be removed.

If your chain were keeping inventory for all stores, the table that contained STORENUM-PRODUCTNUM-QUANTITY would have STORENUM-PRODUCTNUM as a candidate key. In this table, QUANTITY is fully functionally dependent on this key — you need both parts of the key to select a particular quantity.

Second normal form is achieved when you have a first normal form database with all your attributes fully functionally dependent on each key.

Third Normal Form

Functional dependencies may be transitive: A is functionally dependent on B and B is functionally dependent on C; therefore A is functionally dependent on C. For example, a shipping company has a shipment table that includes a shipment number, SHIPNUM; an ORIGIN and a DESTINATION; and a DISTANCE.

ORIGIN and DESTINATION are functionally dependent on SHIPNUM. (Since SHIPNUM is not a compound key, functional dependence is automatically full-functional dependence.) DISTANCE is also functionally dependent on SHIPNUM, but the relationship is transitive, since DISTANCE is functionally dependent on ORIGIN and DESTINATION.

A database is in third normal form when it is in second normal form and has no transitive functional dependencies. So SHIPNUM-ORIGIN-DESTINATION-DISTANCE is second noremal form, but not third normal.

The third normal form would have two relations: SHIPNUM-ORIGIN-DESTINATION and ORIGIN-DESTINATION-DISTANCE. Third normal form avoids update anomalies present in second normal form. For example, under second normal form structure, if you deleted the only shipment from Boston to London, you would lose the distance data between these points.

An equivalent ORE&D design is:


     ROUTE
         Field   Type    Length
         ------- ------- --------
         ROUTE_K   N       8
         ORIGIN    C      32
         DESTNATN  C      32
         DISTANCE  N       4

     SHPMNT
         Field   Type    Length
         ------- ------- --------
         SHPMNT_K  N       8
         ROUTE_K   N       8

The complexity of the normalization process stems as much from its opaque jargon as from the concepts themselves. Most opponents of normalization argue that starting with a more meaningful grouping of data items results in a better design.

For example, The ORE&D design above replaces multiple instance of origin/destination pairs with a simple route key.

In much of the literature, third normal form is considered a "good" database design.

Boyce-Codd Normal Form

Update anomalies are still possible in third normal form. Boyce-Codd normal form is of particular interest to us, as it is trivial to prove that any ORE&D design is also in Boyce-Codd normal form.

A database is in Boyce-Codd normal form if it is in first normal form and every functional dependency is an attributes' dependence on a key.

Boyce-Codd normal form does not depend on second or third normal form. It is provable that a relation in Boyce-Codd normal form is also in third normal form, but the reverse is not necessarily true — a relation can be in third normal form but not in Boyce-Codd normal form.

This formulation skips some details. First, keys are fully functionally dependent on themselves. This trivial dependence is ignored. The same applies to a part of a key that depends on a compound key. Advanced normalization also introduces the concept of "superkeys" which are combinations of keys and other attributes. A superkey has at least those attributes needed to form a key, and may have additional attributes.

So far I have not discussed certain accidental dependencies that may occur, whatever database design method is used. For example, many addresses include a city and postal code. In many jurisdictions, the postal code uniquely identifies the city, so city is functionally dependent on postal code.

Most database systems simply ignore these implied dependencies. Of course, a large mail-order concern would hava a postal code table, from which the city would be looked up based on the postal code. In designing the database, you shuold consider any possible consequences of the update anomalies that may result if you ignore the dependency. In many cases, the consequences of entering an impossible combination of city and postal code simply means that the address nees correcting, so this dependency is ignored.

In the U. S., all taxpayers are assigned unique tax identification numbers, whose use as identifiers is legally restricted due to privacy concerns. All employee data is fully functionally dependent on this tax identifieer, as well as on whatever employee number system we adopt. But because it is legally restricted, the database design ignores its potential use as a key.

So more correctly, a database is in Boyce-Codd normal form if it is in first normal form and every functional dependency you do not choose to ignore is a dependency of an attribute on a key.

ORE&D designs are obviously in first normal form, since the data is all in tables. Since each table has a unique, abstract key, and since we use no other attributes as keys, every meaningfull dependency is a dependency of attributes on a key. So an ORE&D database is, by its definition, in Boyce-Codd normal form.

Fourth Normal Form

Boyce-Codd normal form does not eliminate all update anomalies. Certain bad designs can be in Boyce-Codd normal form, but still have serious update problems. One source of these problems is "multivalued dependencies."

Consider the example we looked at earlier, where we recorded the sport of our employee's dependants.


     DEPENS
         Field   Type    Length
         ------- ------- --------
         DEPENS_K  N       8
         EMPLOY_K  N       8
         (etc.)

     SPORTS
         Field   Type    Length
         ------- ------- --------
         SPORTS_K  N       8
         DEPENS_K  N       8
         (etc.)

Now assume that we also need to record any diseases our emplyees have. Then do not use ORE&D; just place the data in a table that includes DEPENS_K-SPORT-DISEASE. Here is an instance of this table:

    DEPENS_K        SPORT           DISEASE
    ------------    -------         -------------------
    10000001        Jogging         Asthma
    10000001        Skiing          Asthma
    10000001        Jogging         Near-sightedness
    10000001        Skiing          Near-sightedness
    (etc.)

A multivalued dependency exists since every value in SPORT implies every value in DISEASE. If our dependent takes up another sport, or develops another disease, we need to add multiple entries to keep this table complete. Similary, dropping a sport or recovering from a disease requires deleting multiple records.

A database is in fourth normal form if it is in first normal form and every multivalued dependency is a dependency of attributes on a superkey. (Note that the single-valued dependencies we have discussed thus far are the minimal case of a multivalued dependency. If A implies B, then A multivalued implies B. So fourth normal form is a higher form of Boyce-Codd normal form.)

In an ORE&D design, the multivalued dependencies are not created. For example, when you see dependents having repeating SPORT and DISEASE attributes, you would create separate detail tables.


     DEPENS
         Field   Type    Length
         ------- ------- --------
         DEPENS_K  N       8
         EMPLOY_K  N       8
         (etc.)

     SPORTS
         Field   Type    Length
         ------- ------- --------
         SPORTS_K  N       8
         DEPENS_K  N       8
         (etc.)

     DISEAS
         Field   Type    Length
         ------- ------- --------
         DISEAS_K  N       8
         DEPENS_K  N       8
         (etc.)

(Note that the six-character table name limitation is not a feature of ORE&D design. It is an unfortunate necessity stemming from DOS filename limitations. Using any design method, you should take advantage of longer table names if they are available.)

Given that each repeating group is separated into its own Detail table, it is also true that an ORE&D design is in fourth normal form.

Higher Normal Forms

If you have followed the discussion of normal forms into fourth normal, you have advanced past the point most normalization achieves. However, additional anomalies are possible in fourth normal form designs. These require consideration of the effects of joins on tables.

As if the normalization jargon were not already sufficiently opaque, we will now consider lossless joins. Lossless joins are not at all lossless. On the contrary, some joins can create tables with invalid, extra records. A lossless join is one which does not create these invalid, extra records.

Let's begin by joining a part of a sales detail table with a small product table:


     SALDTL:
         SALDTL_K, INVOIC_K, PRODCT_K, QUANTITY
         12345678  22222222  10000001      3
         12345679  22222222  10000015      1
         12345680  22222222  10000023      2

     PRDUCT:
         PRDUCT_K, NAME,            COLOR, PRICE
         10000001  Widget           Black   1.95
         . . .
         10000015  Small Widget     Red     4.95
         . . .
         10000023  Gold Widget      Gold   99.95

Joining these tables on the product key yields a table that includes these columns:

     SALDTL_K, QUANTITY, NAME,         PRICE
     12345678      3     Widget         1.95
     12345679      1     Small Widget   4.95
     12345680      2     Gold Widget   99.95

(The other columns, not shown here, are INVOIC_K, PRODCT_K, and COLOR.) This is a example of a lossless join, since we did not create any invalid records.

A database is in "projection-join" normal form if the dependencies guarantee that all joins will be lossless joins. An ORE&D design is provably in projection-join normal form. I leave it to you to work out the proof.

Additional normal forms include domain-key normal form and fifth normal form.

In relational theory, the domain of an attribute is the set of possible values. The domain of a NAME column might be any combination of alphabetic characters up to the specified length. The domanin of a SALARY column might be integer numbers between specified minimum and maximum values. Domain-key normal form uses the concept of domains to eliminate some update anomalies associated with fourth normal form designs.

Fifth normal form considers even more advanced dependencies. Some authors have claimed that fifth normal form is the "final" normal form, although this has not been universally agreed upon.

I have not proved that an ORE&D design is in fifth normal form, though I suspect this to be the case. At any rate, I have never seen an example of an ORE&D design that shows any update anomalies.

Perhaps one of you readers with a mathematical inquisitiveness will answer this question: Are ORE&D designs in fifth normal form?

If you are only interested in designing practical, working systems, be assured that an ORE&D design is guaranteed to be in projection-join normal form, which is substantially better than most practitioners of normalization achieve.

Summary

In this chapter, I proposed a semantic, object-oriented design methodology that yields a relational database design. First, I explained my methodology, and then I compared it with the classic normalization steps.

Our first consideration was the database design goals. We want to have each fact stored in one and only one place in the database. We want to be able to delete a single item, without fear that other data will inadvertently disppear. We considered other "update anomalies" that we want to avoid.

I proposed a way of achieving these goals that is free from the jargon of traditional relational database methodology. My methodology is called "ORE&D" to help you remember the four basic types of things we will put in tables.

The "O" in ORE&D stands for "Objects." By Object, I mean any durable thing, such as a customer, a product, a machine, building or hospital bed. You create one table for each separate type of Object that you need to track. The fields in that tabel are the characteristics of the Object that you need to track: customer's names and addresses, product's weights and prices, and so on.

The "E" in ORE&D stands for "Events." An Event is a transaction among our Objects that occurs at a point in time. A sale is a typical Event, transferring some of our products to a particular customer at the time of the sale. Each class of Event also gets a table, and the fields in those tables are also the characteristics of the Event, such as its time, the Objects involved (specified by foreign keys), and so on.

The "D" in ORE&D stands for "Details." We looked at two types of Detail data that you break out of Object and Event tables into separate Detail tables. First, some items are inherently repeating. A sale Event transfers one or more product Objects to a Customer. When there are "one or more" (or "zero or more," or some other number) a Detail table is created. This has a foreign key back to the Object or Event about which it is recording details, and the characteristics of the detailed item, such as the number of products sold.

Another Detail table is created whenever you need to track the history of a characteristic. A price history Detail table might be required to record product prices as they change over time. Each history Detail table records the key o the object about which it records history, the time or time span for which its value applies, and the value.

Relationships are the "R" in ORE&D designs. While Event tables show relationships that happened at a single point in time, separate Relationship tables are used for longer-duration relationships (excepting 1:1 relationships). If each employee works for a single department, you can simply place a foreign key to the correct department as a field in the employee's record. However, if some employees are on assignments to multiple departments, you create a Relationship table.

Each Relationship table includes foreign keys to the Objects that are related, the time span over which the relationship applies, and any other characteristics of the relationship.

The process of building an ORE&D design is the process of identifying Objects, and Events and listing their characteristics. Detail tables are separated out for repeating items and for histories. Relationship tables are created if you need to know about the relationships between or among your other tables. As each table is created, you reapply the process of separating out Detail tables and creating Relationship tables until you have no additional tables to add. The result is a relational database design.

The ORE&D method appears to be far simpler than the traditional normalization method, so you might reasonably wonder if it results in a design of the same quality. To explore this, I compared it to the classic normalization steps.

Most relational analysts are trained in first, second and third normal form. First normal form means that all data is in tables, which is true of ORE&D. Second normal form specifies that all data in each table is fully functionally dependent on the primary key, which is also true of ORE&D designs. Third normal form eliminates transitive dependencies, which are also eliminated in the ORE&D design process.

There are several higher normal forms, including Boyce-Codd and fourth normal form. Each higher normal form eliminates certain update anomalies, which is one of our design goals. We see that it is trivial to prove that an ORE&D design is in Boyce-Codd normal form and quite simple to extend that proof to show that an ORE&D design is also in fourth normal form.

Higher normal forms come into play when we consider such complexities as trinary and higher-degree relationships. I showed that ORE&D designs are in projection-join normal form. Proving that an ORE&D design is in domain-key normal form or fifth normal form is left up to you. These advanced normal forms are well past those that most relational analysts consider in practical database design work.

This completes our database design work. Now that you can design solid, projection-join normal form databases (without even needing to know any of the normalization jargon) you are ready to start programming. We'll begin with support software, before we get to the client-side software.