Designing databases for SQL Server and Access
Part two of a three-part series of blogs

Designing relational databases is easy once you learn the few basic rules, and the procedure is identical whether you're using SQL Server or Access. This blog illuminates the subject!

  1. Designing Databases - an Introduction
  2. Designing our Example Database (this blog)
  3. Modelling Many-to-many Relationships

This blog is part of our tutorial on writing SQL.  You can also learn SQL on one our introduction or advanced SQL courses.

Posted by Andy Brown on 23 April 2012

You need a minimum screen resolution of about 700 pixels width to see our blogs. This is because they contain diagrams and tables which would not be viewable easily on a mobile phone or small laptop. Please use a larger tablet, notebook or desktop computer, or change your screen resolution settings.

Designing our Example Database

Remember that we wanted to create a relational database from the following raw data:

Our example spreadsheet of clients

The first few clients (out of presumably many thousands)

Step One - Identifying the Pegs

If you look at the above example, some of the information is repeated.  For example, every time someone works for Shell we get repeated:

  • Shell; and
  • Oil and gas supplier

This is a bad idea for 3 reasons:

  1. It wastes space in the database (not a big issue if you only have a few thousand records, but significant if you have a few million).
  2. It wastes typing time (someone is going to have to type in the repeated information).
  3. Above all it makes data errors significantly more likely (if you have 100 people working for Shell, the chance of mistyping the company name is high).  As a consequence, any future attempt to show all the people who work for Shell may omit some data - which is really, really bad news.

To avoid repeated information as above, follow the main rule of database design (as shown in the hint below).

If you find yourself typing in the same information twice, you should extract it to another table.

For the above example, it is I hope by now reasonably obvious that there are two "pegs" to hang data on:

Peg What it contains
Companies A list of all the companies
People A list of all the people who work for these companies

By storing the company name and description in a separate table, we'll avoid having to type things in twice.  Given this insight, we should now be able to assign the bits of information in our spreadsheet to the two tables.

Some people call these pegs "entities", corresponding to physical things.  For example, in a movies database like IMDB entities would include the fllms themselves, actors, directors, studios, certificates, genres, ratings and languages, each of which would have a separate table.

Step Two - Identifying the Fields/Columns

We can assign the fields/columns shown in our spreadsheet to tables as follows:

Field/Column Example Table
First name Alan People
Last name Arbib People
Date of birth 21-Feb-89 People
Company Shell Companies
Description Oil and gas supplier Companies

Thus database design is simply a matter of good housekeeping: putting fields/columns into the tables to which they belong.  This gives us the following two tables:

Two tables created

The two tables have no links between them as yet (you can't tell which people work for which companies).

 

However, the tables are not yet linked in any way - but before we can consider this, we must first create primary keys for each table.

Step Three - Adding Primary Keys

For every table you create in a database, you should create a primary key: a field/column whose value is different for each row/record. 

For the tblPerson table, for example, you can't choose any of the existing fields/columns - otherwise you will only be able to have one person called Alan or Arbib or born on 21st February 1989 in your table.  The easiest solution is usually to create a new field/column to number the records:

Tables with primary keys

Our revised tables: each now has an additional field which numbers the rows or records.

 

Thus the PersonId field/column, for example, will hold a different number for each person:

The table of people with primary key

Here Fiona McBride is selected, who has unique id number 3.

 

I think creating an id field/column to automatically number the records in a table is always the best and simplest solution, even when another candidate field/column exists.  So if every product that your company sells has a unique code, I'd still create a ProductId field/column and use this as the primary key instead.

Step Four - Adding a Link Field

The only problem now is that we have two tables, with no way of linking them:

tblPerosn table tblCompany table
The table of people The table of companies

To tie the two tables in together, we need to add one more field/column to the tblPerson table, identifying for each person to which company they belong:

CompanyId field for each person

The new CompanyId field/column ties each person into a company.

Thus (for example) Alan Arbib works for company number 1, which is Shell.  The field called CompanyId in the tblPerson table is sometimes called a foreign key.

Hence the beauty of relational databases: instead of storing all of Shell's details against Alan Arbib, we're now just storing a 4-byte integer, and using this to look up the other details.  In the front-end menu of our application, we can make sure that our user doesn't need to know the CompanyId for each company by providing a drop-list instead.

Step Five - Creating the Relationship

Potentially we can now tie each person to the company they work for.  To make this formal, we create a relationship:

The relationship created

The link between the two tables, as it appears in Access (SQL Server is similar).

 

Every relationship is one-to-many: in the example above, for each company we can have potentially an infinite number of people working for it. 

Referential Integrity and Cascade Deletion

The last buzzwords of this page, promise!  For every relationship that you create, you should normally enforce referential integrity and cascade deletion.  Here's what they mean:

Term What it means
Referential integrity It's impossible to have orphans in the system (for our example, a person can't have a CompanyId if no such corresponding row/record exists in the table of companies).  This is vital to preserve data integrity.
Cascade deletion If you delete a parent, you should also delete all of the parent's children (for our example, if you delete Shell from the companies table, you should also delete all of the people who work for Shell; the obvious solution is not to delete Shell in the first place!).

 

That is almost all of the theory involved in relational databases.  However, there's one more thorny issue to discuss, which is how to cope with the many-to-many relationships that exist everywhere in life.

 

This blog has 0 threads Add post