BLOGS BY TOPIC
BLOGS BY AUTHOR
BLOGS BY YEAR
Designing relational databases is easy once you learn the few basic rules, and the procedure is identical whether you're using SQL Server or Access. This blog illuminates the subject!
- Designing Databases - an Introduction
- Designing our Example Database (this blog)
- Modelling Many-to-many Relationships
Posted by Andy Brown on 23 April 2012
You need a minimum screen resolution of about 700 pixels width to see our blogs. This is because they contain diagrams and tables which would not be viewable easily on a mobile phone or small laptop. Please use a larger tablet, notebook or desktop computer, or change your screen resolution settings.
Designing our Example Database
Remember that we wanted to create a relational database from the following raw data:
The first few clients (out of presumably many thousands)
Step One - Identifying the Pegs
If you look at the above example, some of the information is repeated. For example, every time someone works for Shell we get repeated:
- Shell; and
- Oil and gas supplier
This is a bad idea for 3 reasons:
- It wastes space in the database (not a big issue if you only have a few thousand records, but significant if you have a few million).
- It wastes typing time (someone is going to have to type in the repeated information).
- Above all it makes data errors significantly more likely (if you have 100 people working for Shell, the chance of mistyping the company name is high). As a consequence, any future attempt to show all the people who work for Shell may omit some data - which is really, really bad news.
To avoid repeated information as above, follow the main rule of database design (as shown in the hint below).
If you find yourself typing in the same information twice, you should extract it to another table.
For the above example, it is I hope by now reasonably obvious that there are two "pegs" to hang data on:
|Peg||What it contains|
|Companies||A list of all the companies|
|People||A list of all the people who work for these companies|
By storing the company name and description in a separate table, we'll avoid having to type things in twice. Given this insight, we should now be able to assign the bits of information in our spreadsheet to the two tables.
Some people call these pegs "entities", corresponding to physical things. For example, in a movies database like IMDB entities would include the fllms themselves, actors, directors, studios, certificates, genres, ratings and languages, each of which would have a separate table.
Step Two - Identifying the Fields/Columns
We can assign the fields/columns shown in our spreadsheet to tables as follows:
|Date of birth||21-Feb-89||People|
|Description||Oil and gas supplier||Companies|
Thus database design is simply a matter of good housekeeping: putting fields/columns into the tables to which they belong. This gives us the following two tables:
The two tables have no links between them as yet (you can't tell which people work for which companies).
However, the tables are not yet linked in any way - but before we can consider this, we must first create primary keys for each table.
Step Three - Adding Primary Keys
For every table you create in a database, you should create a primary key: a field/column whose value is different for each row/record.
For the tblPerson table, for example, you can't choose any of the existing fields/columns - otherwise you will only be able to have one person called Alan or Arbib or born on 21st February 1989 in your table. The easiest solution is usually to create a new field/column to number the records:
Our revised tables: each now has an additional field which numbers the rows or records.
Thus the PersonId field/column, for example, will hold a different number for each person:
Here Fiona McBride is selected, who has unique id number 3.
I think creating an id field/column to automatically number the records in a table is always the best and simplest solution, even when another candidate field/column exists. So if every product that your company sells has a unique code, I'd still create a ProductId field/column and use this as the primary key instead.
Step Four - Adding a Link Field
The only problem now is that we have two tables, with no way of linking them:
|The table of people||The table of companies|
To tie the two tables in together, we need to add one more field/column to the tblPerson table, identifying for each person to which company they belong:
The new CompanyId field/column ties each person into a company.
Thus (for example) Alan Arbib works for company number 1, which is Shell. The field called CompanyId in the tblPerson table is sometimes called a foreign key.
Hence the beauty of relational databases: instead of storing all of Shell's details against Alan Arbib, we're now just storing a 4-byte integer, and using this to look up the other details. In the front-end menu of our application, we can make sure that our user doesn't need to know the CompanyId for each company by providing a drop-list instead.
Step Five - Creating the Relationship
Potentially we can now tie each person to the company they work for. To make this formal, we create a relationship:
The link between the two tables, as it appears in Access (SQL Server is similar).
Every relationship is one-to-many: in the example above, for each company we can have potentially an infinite number of people working for it.
Referential Integrity and Cascade Deletion
The last buzzwords of this page, promise! For every relationship that you create, you should normally enforce referential integrity and cascade deletion. Here's what they mean:
|Term||What it means|
|Referential integrity||It's impossible to have orphans in the system (for our example, a person can't have a CompanyId if no such corresponding row/record exists in the table of companies). This is vital to preserve data integrity.|
|Cascade deletion||If you delete a parent, you should also delete all of the parent's children (for our example, if you delete Shell from the companies table, you should also delete all of the people who work for Shell; the obvious solution is not to delete Shell in the first place!).|
That is almost all of the theory involved in relational databases. However, there's one more thorny issue to discuss, which is how to cope with the many-to-many relationships that exist everywhere in life.