efficient in terms of storage and easy to maintain. ▪ The first point, of storage, means redundant data should be avoided and the second point, of maintenance, means that a good design will logically separate data into tables. ▪ Normalisation is a design method that can be used to achieve this. ▪ What is normalisation? ▪ “a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems” (wikipedia.com)
developed that allows a database to be designed with different levels of efficiency. ▪ These sets of rules are called the 'Normal Forms' and are numbered from 0 to 5. So there is a first, second, third, fourth and fifth normal form. ▪ The short hand for these are 0NF, 1NF, 2NF, 3NF, 4NF, 5NF ▪ You do not need to be concerned with the fourth and fifth normal forms as these are not required on this course. We will only discuss first, second and third normal forms. ▪ You should be aware that normal forms are only guidelines for the database designer - they do not have to follow them! ▪ The designer will choose the form needed depending on what kind of database is to be developed. They may decide not to normalise the database at all, in which case it is called an 'unnormalised database' or 0NF.
be in first normal form (1NF), the following rules have to be met for each table in the database – There are no columns with repeated or similar data – Each data item cannot be broken down any further – Each row is unique i.e. it has a primary key – Each field has an unique name
a data item that cannot be broken down any further. ▪ Examples of atomic data – An identity number – National Insurance number NY344599 – ISBN book reference e.g.1-931841-62-4 – Stock code PN10B – A first name, ’Dylan' – A surname, ’Roberts' – A telephone number – A school name: 'Abington Hall Comprehensive School' – A complete description 'A fountain pen is a writing instrument'. ▪ The point of the two last examples indicates that 'atomic' does not mean 'single word'. It simply means that it would make no sense to reduce the data item any further as it would lose its meaning.
name: John Hunt when a first name and surname is present in the database – A full address: 6 Picton Road, London, WR1 4PG – Data that are part of a larger dataset e.g Oxford Book club and Coventry Health club ▪ In the last example, the database in question is storing club details and there are many different kind of clubs (Health, Book, Chess etc) being stored. So it would make sense to split this data into {Location, Club type}. So the context of the database needs to be understood when deciding if data is atomic or not.
to store contact telephone numbers for each person. The designer has created three fields to hold telephone numbers. This is what is meant by 'repeating data'. The telephone numbers are the same kind of data.
There are no columns with repeated or similar data – Each data item cannot be broken down any further. – Each row is unique i.e. it has a primary key – Each field has an unique name ▪ Which of these tables are NOT in first normal form? 1 2 3 4
is no primary key defined and so this record cannot be guaranteed to be unique. Full name is redundant - data is not atomic - as it is simply a combination of Firstname and Surname.
has a primary key, so it passes that test Data is atomic The colour the shoe can come in is being repeated - and furthermore the fields have the same name - so not in 1NF
meets all the rules for the first normal form. ▪ Question to ask yourself to spot 1NF – Does it have a primary key? – Are each field name unique? – Is the data atomic? – Are there repeating / redundant fields?
compile a database for a fan club web site. Fans visit the web site to find like-minded friends. ▪ The entities to be stored are ▪ This indicates that each band has many fans. ▪ Each person is a fan of only one group. Band Fan
time this person has designed a database and is not really sure of how to go about it. He designs the FAN table and loads it with the following records ▪ He has correctly set up a primary key. He also used a foreign key to refer to the band. ▪ However this is not in 1NF because Mary has two email addresses loaded into the email field. The data is not atomic. Loading data in this way is also going to make it very difficult to extract email addresses. Also the data length of the email field now has to be long enough to handle many email addresses, this is very inefficient and would be a problem if exceeded.
is not a good idea. So then he decides to create two email fields ▪ This is also a poor approach - note that email2 is not being used in Tom's record and so is causing wasted storage, so not 1NF which seeks to avoid wasted / redundant data. ▪ Another problem is what if a fan has many more emails? Adding more and more email fields will make the wasted storage even worse. ▪ Another problem is that the query to extract email addresses is far more complex than it needs to be as it has to examine each email field.
he comes up with a good solution - create another entity called 'email' and use a foreign key in that table to link the fan and email tables together. The ER diagram is as follows: Band Fan Email
can have many emails, but an email can only belong to one fan. ▪ The FAN and EMAIL table now look like this ▪ FAN ▪ EMAIL ▪ Mary (FanID = 2) has two entries in the email table. There is no problem adding even more emails for her. Extracting emails is now simple as there is only one email column. There is no wasted storage. ▪ The tables are now in first normal form (1NF) as they obey the following rules – Each table has a primary key – Each field name is unique – Data is atomic – No repeating / redundant fields
have a single-attribute (i.e. simple) primary key. Like this: CUSTOMER ▪ But sometimes a table has a primary key made up of more than one attribute i.e. it has a compound primary key. CONCERT ▪ The table above is using both the venue and artist as the compound primary key.
situation that the extra rule for second normal form comes in handy. The rule states: – Non-key attributes must depend on every part of the primary key – The table must already be in first normal form ▪ So inherently, any table that is already in 1NF and has a simple primary key is automatically in second normal form as well. ▪ Consider the Concert example again - this is NOT in second normal form. Notice the attribute called Style. This is describing the style of artist - it has nothing to do with where the concert took place! And so its value does not depend on EVERY part of the primary key, so the rule for second normal form is not being met. ▪ The reason for this rule is to ensure there is no redundant data being stored.
another Girls Aloud concert to the table: ▪ Notice that the 'girl band' value is being repeated and so is causing the database to be bigger than it needs to be.
table is formed that has Artist as the simple primary key CONCERT STYLE ▪ Now the rule for 2NF is being met by both tables - every non-key attribute is depending on the complete primary key. There is no redundant data.
be in third normal form, the following rules have to be met – It is already in 2NF – There are no non-key attributes that depend on another non-key attribute ▪ What this is trying to do is to spot yet another source of redundant data. ▪ If the value of an attribute can be obtained by simply making use of another attribute in the table, then it does not need to be there. Loading that attribute into another table and linking to it will make the database smaller. ▪ To clarify, consider the table below: CONCERT ▪ Notice that the country could be obtained by referring to the City - if the concert was in London then you know it is also in the UK - no need to look at the primary key!
database into third normal form, split the table as follows CONCERT COUNTRIES ▪ The new table called COUNTRIES has City as the primary key and country as an attribute. The Concert table has City as a foreign key. So now you can obtain the country in which any particular concert took place and there is no redundant data.
already in 2NF – There are no non-key attributes that depend on another non-key attribute CUSTOMER ▪ This is not in strict 3NF as the City could be obtained from the Post code attribute. If you created a table containing postcodes then city could be derived.
means could be obtained from the certificate attribute - it does not need to refer to the primary key VideoID. So split it out and use the primary key / secondary key approach.
database is almost in 3NF - for some reason the Cinema Address is being repeated in the Client table, even though it can be obtained from the Cinemas table. So simply remove the column from the client table
redundant data, it is smaller in size so less money needs to be spent on storage ▪ 2. Because there is less data to search through, it is much faster to run a query on the data ▪ 3. Because there is no data duplication there is better data integrity and less risk of mistakes. ▪ 4. Because there is no data duplication there is less chance of storing two or more different copies of the data ▪ 5. One change can be made which can instantly be cascaded across any related records.
with trying to make data atomic. Just because you can split some types of data further, it isn't always necessarily the correct thing to do. For example, telephone number might contain the code followed by the number 01234 567890. It wouldn't be sensible to separate out these two items. ▪ 2. You can end up with more tables than an un-normalised database ▪ 3. The more tables and the more complex the database, the slower queries can be to run ▪ 4. It is necessary to assign more relationships to interact with larger numbers of tables ▪ 5. With more tables, setting up queries can become more complex