Wednesday, August 20, 2014

Data Modelling and DB design


In a Sql Server Pro article from 2005
written by Michelle A. Poolet I encountered some great advice and a great example on how to normalize an existing table that is in first normal form (all the data in one table).

Not to pick on the author here, nor criticize, but I will go further with the normalization process for this database and take it from second normal form (where the author ended up) and go closer to third normal form by creating lookup tables for wood type, finish, size, and a whole other root for shipping. As the author states, we don't know enough about the business process to do so in the real world. I will do this to further illustrate the potential of doing so (after collecting proper usage details of course).

The example is for a custom furniture shop, which means that they use wood and other supplies to build furniture. They probably have some standard wood types that they use and possibly keep stock of some common types. If the database were expanded to include wood type, a lookup table would be really useful to link supply to demand.

I know, I know...build for current use cases, not future use cases. I hear you loud and clear. But let's just assume that they have some standard wood types for sake of this exercise. And let's assume that we've done our due dilligence and we are ready to take it to the next level.

If wood type alone becomes a lookup table, the root table will lose a varchar(30) [2B + nB, n=len(x)] worth of data and gain a smallint [2B] worth. Assuming that the type name is on avg 15 chars, that means a savings of 17B/record or 72 - 17 = 55. Now we can fit 25% more rows per page, about 137 rows per page. If we offloaded ALL of the varchar cols to other tables, we would have even better numbers here.

When we pull off those values into other tables, we will have to perform joins to them, but as long as the record count in those is small (I'm thinking most of it is standard anyways so they might be lookup tables that can be added to for new vals) they should be loaded into the cache and well reused. If anything, they may not be used every time users access the data. Again I don't have the use cases, but I'm making reasonable suggestions about them.

I look at any kind of notes or comments fields as something that should always be it's own table. Refer to the ProcessOrderNotes column in the ProcessOrderStatus table in Figure2 of the article. Each comment can be related in a number of ways to other tables either directly via an FK to the note, or to multiple entities via linking tables. In this way, the comments or notes become easier to query across if someone wants to search for something specific within a note. Eventually, this will be a use case so why not structure the data model to be more adaptable to begin with?

Think open-closed principle here where the model is open to extension, closed to modification. Of you have a new table that requires notes, you can just add an FK or a new linking table depending on what is needed. Now the app code can go on about calling it's addNote function, get the PK and write it to the table as an FK as needed. No reason to reinvent that. Well, suppose that all notes now need a timestamp and username. Guess what? Just add a table related to the Notes table, update the code to save to that each time the note is updated. Or store a new note record each time for auditing purposes and show them all - use cases?

Now I'm getting carried away, so I best stop here.

No comments:

Post a Comment