Showing posts with label PK. Show all posts
Showing posts with label PK. Show all posts

Friday, August 15, 2014

RMDB Surrogate v Natural Key

An integral part of transactional relational databases is the Primary Key (PK). A PK is a value that defines the data row and that related tables use to relate via a Foreign Key (FK).

Four standard criterion are commonly used to measure a PK.

1. Is it unique?
2. Is it immutable?
3. Is it minimal?
4. Is it never null?

A key can consist of one or more columns in a table as long as the combination meets the criteria.

My recommendation is that you stick to a Surrogate Key if you don't know. Here's why-

Natural keys made under an assumption that some business rule will not change can cause significant problems later when it does.

Most arguments for natural keys place emphasis on ease of use. If you find yourself joining multiple columns because your PK is a combo key then you lose that value. Create views to ease your suffering from joining a few more tables in your queries.


The exception I would make is for SOME lookups. Esp when a well defined code is used as the natural key. Bit even then, shit changes all the time. Suppose you assumed that the US state codes were immutable. Or telephone area codes. You would have to write scripts to update all those FKs that use 'Calif' or use some hacked in solution to fix your data and update all the apps that consume the data to support the hack. Contain volatility in your data!

Saturday, July 5, 2014

Primary Key: ID or Combo Key?

Debating a coworker about a table schema in SQL server. The debate is about a parent child relationship. Should the child have its own identity?

Here is the situation (in a generic way).

PeopleTable (a proxy of a table in another db)
- PersonID PK

PlanYearTable
-PlanYear PK

SpecificPlanPersonYearDetailsTable
-PersonID FK PK
-PlanYear FK PK
-other details columns

I am of the mind that the third table should have its own ID rather than the combination PK that is also an FK from two other tables. Even though there will only be one instance per person, per year.

Here's my rationale -
the specific plan is its own entity, even if it can only relate to the specific person and year. SQL Server needs to work harder to order by two columns also to join on two columns (as so users running queries). And if you ever want to create a rest API, what does the URL look like?

The other side of the argument is rooted in difficulties encountered with a database that created its own local ID for referencing entities from another db.

DB Main

BucketTable
-bucketId PK

DropletTable
-dropletId PK
-bucketId FK PK

DB Two

BucketsTable
-bucketId PK

DropletsTable
-LocalDropletId PK
-RemoteDropletId
-bucketId FK

this was done because the droplets in main are partitioned per bucket
Bucket 1 - Droplet 1,2,3.
Bucket 2 - Droplet 1,2,3.
so that there is no way to refer to a droplet without referring to the bucket as well. So this schema was born of another schema and was an attempt to simplify referring to a single droplet entity in the db.

there are other structures that can make our database more open to future change. EAV for the plans would be a great option! Plans change, new ones are added. EAV would handle this beautifully.

PlanEntities
-ID
-Type

PlanAttributeTypes
-ID

ValueTypes
ID

PlanAttributeValues
-ID
-PlanEntityID
-AttributeID
-ValueTypeID
-Value

Of course in this case the plans will not change that often or by that much, so EAV is un-necessary since it adds complexity.

In any case, my preference is to have an ID for each table. The PersonID and Year are still necessary and can be used for identifying the relationship. But in this way the plan has its own identity. Other entities relates to the plan itself can refer to the plan directly.
Guess we will see how it goes.