Tuesday, December 12, 2006

Normalization - Where to Stop?

Why don't most designers go beyond 3NF?

The first thing most of us learn in any DBMS course is normalization. Whether you are a data modeler, a DBA, or a SQL developer, it is a topic we all pick up early, either at work or during a formal IT degree.

But take a look at most production databases. At best, you will find a database implemented in Third Normal Form (3NF). Very few databases reflect the higher normal forms, such as Boyce-Codd Normal Form (BCNF), Fourth Normal Form (4NF), and Fifth Normal Form (5NF). So why don't most database designers go beyond 3NF?
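To see what stopping at 3NF leaves on the table, here is a minimal sketch (the table and column names are hypothetical, not from any real system). Assume each tutor teaches exactly one course, and each student has one tutor per course:

    -- Candidate keys: (student_id, course) and (student_id, tutor).
    CREATE TABLE student_tutor (
        student_id INT         NOT NULL,
        course     VARCHAR(40) NOT NULL,
        tutor      VARCHAR(40) NOT NULL,
        PRIMARY KEY (student_id, course)
    );
    -- The dependency tutor -> course holds, yet tutor is not a key.
    -- Because course is a prime attribute (part of a candidate key),
    -- the table is still in 3NF. But it violates BCNF, so the course
    -- each tutor teaches is repeated on every one of that tutor's
    -- rows and can be updated inconsistently.

This is exactly the kind of redundancy a designer who stops at 3NF never notices.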

If you want a refresher on what these normal forms are, see my previous post.

How far should you go with normalization?
To be sure, each progressive step can affect overall performance. I have also seen normalization taken to absurd lengths. In one recent discussion, someone proposed modelling every financial document type separately, as though the world of accounting were about to change. I had to remind him that accounting is just the recording of historical events :)

A while ago I read an article in which the author touched on normalization. Over time, I have learnt to ask a few questions before deciding how much normalization is required:
1. What is the nature of the system? Is it an OLTP or an OLAP system?
2. What is the nature of the database queries? Are they mostly inserts or retrievals?
3. For the parts of the database where inserts dominate, it is better to keep the data in Third Normal Form.
4. For the parts where retrievals outnumber inserts, Second Normal Form works best (see the sketch after this list).
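To make points 3 and 4 concrete, here is a minimal sketch with hypothetical customers/orders tables (the names are mine, for illustration only):

    -- 3NF: customer data lives in one place, so an insert-heavy
    -- OLTP workload never writes the same customer name twice.
    CREATE TABLE customers (
        customer_id   INT PRIMARY KEY,
        customer_name VARCHAR(80) NOT NULL,
        city          VARCHAR(40) NOT NULL
    );
    CREATE TABLE orders (
        order_id    INT  PRIMARY KEY,
        customer_id INT  NOT NULL REFERENCES customers (customer_id),
        order_date  DATE NOT NULL
    );

    -- A retrieval-heavy reporting table might deliberately fold the
    -- customer columns back in, trading redundancy for one less join.
    CREATE TABLE orders_report (
        order_id      INT PRIMARY KEY,
        customer_id   INT NOT NULL,
        customer_name VARCHAR(80) NOT NULL,  -- duplicated per order
        city          VARCHAR(40) NOT NULL,  -- duplicated per order
        order_date    DATE NOT NULL
    );

The catch, of course, is that the denormalized copy has to be refreshed from the normalized source, or its duplicated columns will drift out of date.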

Where you draw the line in the sand is ultimately up to you, but you will be better equipped to draw it with a sound understanding of the various normal forms and the risks of not going far enough.
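As one example of the risk of not going far enough, consider a hypothetical table that is in BCNF yet violates 4NF, because an employee's skills and languages are independent, multi-valued facts:

    -- All-key table: every skill/language combination must be stored.
    CREATE TABLE employee_skill_language (
        employee_id INT         NOT NULL,
        skill       VARCHAR(40) NOT NULL,
        language    VARCHAR(40) NOT NULL,
        PRIMARY KEY (employee_id, skill, language)
    );
    -- An employee with 3 skills and 2 languages needs 6 rows;
    -- dropping or forgetting one combination silently loses a fact.
    -- The 4NF fix is two tables, one per independent fact:
    CREATE TABLE employee_skills (
        employee_id INT         NOT NULL,
        skill       VARCHAR(40) NOT NULL,
        PRIMARY KEY (employee_id, skill)
    );
    CREATE TABLE employee_languages (
        employee_id INT         NOT NULL,
        language    VARCHAR(40) NOT NULL,
        PRIMARY KEY (employee_id, language)
    );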

Until Next Time... :)

1 comment:

Steve said...

The implications of NOT normalising beyond 3NF are exactly the same as those of not normalising up to 3NF: namely, data anomalies. There is no benefit to be gained at all by avoiding normalisation beyond 3NF. On the contrary, failing to normalise to 4NF or 5NF when such an issue exists is just the same as failing to normalise to 2NF.

Quite simply, the entire concept of normalisation relies upon an understanding of functional dependencies, that is, attributes which determine other attributes at an atomic, single-set level.

To dismiss this with some spurious argument about "absurd lengths" is to miss the point of normalisation per se.

You cannot take normalisation to absurd lengths: you either normalise out data redundancies fully, or you don't. If you don't, then you run the risk of inconsistent data.