
Tuesday, January 31, 2017

Database Development The Iterative Way

As I've recently become more involved in developing a data layer (including the database), I've been learning a new style of database migration. Previously, I wrote plain SQL migration scripts for each deployment and handed them off to a DBA to run in each environment.


These days, I've entered the realm of Liquibase. Liquibase is my first experience with this sort of tool, which allows for source-controlled database versioning. Changes are written in a declarative style and translated into the underlying SQL. It's cross-platform and supports multiple declarative formats (JSON, XML).


Here's how it has changed my development process. In the old days, I would write the SQL and run each file, or put it all in one file and run it as a batch. Then I'd write some queries to validate the changes, interrogate the schema, etc. I did mostly up-front design of the database components, and most of the queries and commands were generated by EF or written in code. Other factors have changed that as well.


Nowadays, I'm writing a lot more stored procedures, with none of the SQL in code or generated by code. I'm writing tests first, then writing stored procs, then creating/modifying tables declaratively via Liquibase. Sometimes I don't care about the table or the column until it's really needed. Sometimes the name of a column morphs during an iteration (a short cycle of failing test, code, passing test on the order of minutes). Nowadays, no problem! It's relatively simple to change a column name, and the rename ships as just another change set (see the sketch below).
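
To make that concrete, here's a minimal sketch of what a couple of change sets might look like. I'm using Liquibase's formatted-SQL changelog flavor to keep the example compact (this post talks about the XML/JSON style); the table, column, and author names are invented for illustration.

--liquibase formatted sql

--changeset dev:001-create-plan-details
CREATE TABLE PlanDetails (
    PlanDetailId INT IDENTITY(1,1) PRIMARY KEY,
    PersonId     INT NOT NULL,
    PlanYear     INT NOT NULL
);

--changeset dev:002-rename-planyear
-- The column name "morphed" during an iteration; the rename is its own change set.
EXEC sp_rename 'dbo.PlanDetails.PlanYear', 'EffectiveYear', 'COLUMN';

Each change set is applied exactly once and in order, so renames and additions accumulate as small, reviewable steps instead of one big hand-edited script.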


The big trick/advantage is that everyone else can pull in those source-controlled changes and have their local databases updated by running a batch file or using a git hook. Liquibase only applies changes that are new on the target database, so if they haven't updated in a while it will apply all of the outstanding changes until they are up to date.
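
That works because Liquibase records what it has run in a DATABASECHANGELOG table in the target database and skips anything already listed there. If you want to check where a local copy stands, a quick query (column list trimmed to the interesting bits) does the trick:

-- Which change sets have already been applied to this database, and in what order?
SELECT ID, AUTHOR, FILENAME, DATEEXECUTED, ORDEREXECUTED
FROM DATABASECHANGELOG
ORDER BY ORDEREXECUTED;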


It's all good! I never really had a chance to dig into EF code-first migrations to this degree. If it's the same or similar in efficiency, I would recommend it for sure! Or any tool that enables rapid changes to the backing store with acceptable trade-offs (here's looking at you, NoSQL).

Friday, March 25, 2016

If You Write The Database First You Are Doing It Wrong!

In an epiphany, I figured out why developers want to build the database first. In root-cause-analysis fashion, let's play the 5-whys game and find out.


1. Why start with the DB? You need data in order to start developing the rest of the app.


2. Why do you need data first? Because you aren't practicing TDD.


3. Why aren't you practicing TDD? Because it takes you longer.


4. Why do you think it takes longer? Because it means you have to write more code.


5. Why does TDD necessarily mean that you have to write more code? Because you've written Unit Tests before and it's difficult to write them when your business logic depends on the data, so the Unit Tests become lengthy, take a long time to run, and are costly to maintain.


So it comes down to a myth born of having done testing wrong in the first place. Perhaps there's an assumption that TDD is about testing, when it is really about design. Dr. Dobb's sums up the concept in this article, basically pointing out that the tests make you think about the code. There are endless sources on the web, in books, and in magazines that can take you through the details. I will stay focused on how TDD helps avoid the costs of developing the data layer first.


If your development efforts start with a test, the first thing you may notice is that you need to provide some kind of data to the business logic. However, rather than writing a database right then and there, you use one or more of several patterns for substituting for an actual DB in your code - repository pattern, resource, fixture, stub, mock, etc. This lets you focus on what the app DOES instead of the low-level details of a datastore. You control the data that drives the logic in a scientific way, providing the combinations of values expected for each scenario. The art is in knowing how to write the tests, which takes practice.


Imagine instead that you built the DB first, operating under the assumption that certain bits of data, and certain methods of accessing that data, would be needed. Now when it turns out they are not needed, or that your assumptions were incorrect, you've actually just done a lot of unnecessary work - you wrote more code, which took longer and wasn't needed.


Eventually, maybe within the first few iterations of test-then-code, you will begin to write some models that can be represented by a data model. As your application takes shape, you should be safely refactoring so that the entire application becomes more cohesive over time. You have the tests to back you in the refactoring process. This is one of the reasons to start TDD at the system interfaces.


Additionally, as you add features during the project, your manual testing or even your Unit Tests will take longer to execute. You will end up creating and deleting data and maintaining setup scripts to stage a bunch of data, which is more code to maintain. In the end you will do more work than if you'd written the tests while designing - iterating test, then code to make it pass, then test... bit by bit.


When you eventually get to the point where you need to integrate, you will understand what you really need to persist to a database - but not until it is needed. If you start in this way, the way of TDD, then you will know that you do NOT need the database to write the application, and you will see it as the outer layer of the onion that it is.


One final nugget - the most important reason for a repository pattern is NOT so that you can swap the underlying data store technology, though it is a compelling myth. More details about this and how to start TDD in future posts.




Wednesday, July 9, 2014

SOLID DB principles

In my last post, I rattled off a little bit about creating a db model that conforms to SOLID development principles. I presented the idea to some colleagues and it seemed to go over well. Today I will attempt to expound upon the idea and show how a good database design can conform to each piece of the SOLID acronym.

S - single responsibility - each table should have a single responsibility. This means no god tables with nearly infinite nullable columns that can capture more than one type (sub-type actually) of thing.

O - open-closed - open to extension, closed to modification. Create linking tables or type tables to extend a base table, don't change the base table. Add views to create reusable components. Always create list/type tables so that new types of data can be linked to metadata. Reuse data when possible (no is no). If you really, really need to change the attributes frequently, consider EAV - but with caution.

L - Liskov Substitution - Comment is a base table, PostComment is a sub-table. They map to types in object-land. Learn TPT, TPH, etc. (there's a sketch of this after the list).

I - interface segregation - there aren't really interfaces. Just make sure again that your tables, and perhaps views and other objects such as stored procedures/functions (especially these) aren't doing more than one thing.

D - dependency inversion (DI) - this applies more to the DAL than anything. Using DI allows for switching the database technology later, or even spreading your data model across technologies. Either way, all other code/modules/layers should be agnostic of the data store - abstract that baby away! One could go so far as to write adapters that adapt the db access technology to a custom interface, should one want to switch from something like Entity Framework to NHibernate. EmberJS does something like that in the ember-data library: consumers make all calls through ember-data objects, but can map the calls to whatever their storage technology requires. I truly can't think of any way to apply DI in SQL itself; perhaps noSQL implementations will have varying abilities to do so with map/reduce functions. I would say that the best way to practice good DI with SQL is to avoid any sort of business logic in stored procedures. I have seen some pretty tricky DI being simulated in stored procedure chains, perhaps some EXEC calls on dynamic SQL and things like that. I avoid those sorts of things like the plague these days unless it's generated SQL via Entity Framework.
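
To make the O and L items a little more concrete, here is a rough sketch using the Comment/PostComment example (all names and types are invented for illustration): a type table so new comment types are added as rows rather than schema changes, and a TPT-style sub-table that shares the base table's key.

-- Type/list table: new comment types become rows, not new columns (open-closed).
CREATE TABLE CommentType (
    CommentTypeId INT IDENTITY(1,1) PRIMARY KEY,
    Name          NVARCHAR(50) NOT NULL
);

-- Base table: one responsibility, one kind of thing (single responsibility).
CREATE TABLE Comment (
    CommentId     INT IDENTITY(1,1) PRIMARY KEY,
    CommentTypeId INT NOT NULL REFERENCES CommentType (CommentTypeId),
    Body          NVARCHAR(MAX) NOT NULL,
    CreatedUtc    DATETIME2 NOT NULL
);

-- Sub-table (TPT style): a PostComment "is a" Comment and shares its key (Liskov).
CREATE TABLE PostComment (
    CommentId INT PRIMARY KEY REFERENCES Comment (CommentId),
    PostId    INT NOT NULL
);

Adding a new kind of comment means inserting a CommentType row and, if it needs extra columns, adding another sub-table; the Comment base table itself stays closed to modification.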

Anyways, that's my attempt to SOLIDify database design. I feel kind of floppy about the whole DI rambling; if anyone has some good ideas about that one, feel free to chip in!

Happy Coding!

Saturday, July 5, 2014

Primary Key: ID or Combo Key?

I've been debating with a coworker about a table schema in SQL Server. The debate is about a parent-child relationship: should the child have its own identity?

Here is the situation (in a generic way).

PeopleTable (a proxy of a table in another db)
- PersonID PK

PlanYearTable
- PlanYear PK

SpecificPlanPersonYearDetailsTable
- PersonID FK PK
- PlanYear FK PK
- other details columns

I am of the mind that the third table should have its own ID rather than the combination PK that is also an FK from two other tables, even though there will only be one instance per person, per year.

Here's my rationale: the specific plan is its own entity, even if it can only relate to the specific person and year. SQL Server has to work harder to order by two columns and to join on two columns (as do users writing queries). And if you ever want to create a REST API, what does the URL look like?
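
Here's roughly what the two options look like as DDL, using the generic table names from above (illustrative only - the real tables have more columns). Note that the surrogate-key version still enforces one row per person per year via a unique constraint:

-- Option A: composite primary key borrowed from the two parent tables.
CREATE TABLE SpecificPlanPersonYearDetails_Composite (
    PersonID INT NOT NULL REFERENCES PeopleTable (PersonID),
    PlanYear INT NOT NULL REFERENCES PlanYearTable (PlanYear),
    -- ...other detail columns...
    CONSTRAINT PK_SpecificPlanDetails_Composite PRIMARY KEY (PersonID, PlanYear)
);

-- Option B (my preference): the row gets its own identity, and the
-- person/year pair becomes a unique constraint instead of the key.
CREATE TABLE SpecificPlanPersonYearDetails (
    SpecificPlanPersonYearDetailsID INT IDENTITY(1,1) PRIMARY KEY,
    PersonID INT NOT NULL REFERENCES PeopleTable (PersonID),
    PlanYear INT NOT NULL REFERENCES PlanYearTable (PlanYear),
    -- ...other detail columns...
    CONSTRAINT UQ_SpecificPlanDetails_PersonYear UNIQUE (PersonID, PlanYear)
);

With option B, other tables (and a REST URL like /plan-details/{id}) can refer to a plan row by a single value, while the FK columns still capture the relationship.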

The other side of the argument is rooted in difficulties encountered with a database that created its own local ID for referencing entities from another db.

DB Main

BucketTable
- bucketId PK

DropletTable
- dropletId PK
- bucketId FK PK

DB Two

BucketsTable
- bucketId PK

DropletsTable
- LocalDropletId PK
- RemoteDropletId
- bucketId FK

This was done because the droplets in Main are partitioned per bucket:
Bucket 1 - Droplets 1, 2, 3.
Bucket 2 - Droplets 1, 2, 3.
There is no way to refer to a droplet without referring to the bucket as well. So this schema was born of another schema, as an attempt to simplify referring to a single droplet entity in the db.

There are other structures that can make our database more open to future change. EAV (entity-attribute-value) for the plans would be a great option! Plans change, new ones are added. EAV would handle this beautifully (a rough DDL sketch follows the outline below).

PlanEntities
- ID
- Type

PlanAttributeTypes
- ID

ValueTypes
- ID

PlanAttributeValues
- ID
- PlanEntityID
- AttributeID
- ValueTypeID
- Value
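
As DDL, that outline might look something like this (I've added Name columns to the lookup tables and guessed at types and constraints; it's only an illustration):

CREATE TABLE PlanEntities (
    ID   INT IDENTITY(1,1) PRIMARY KEY,
    Type NVARCHAR(50) NOT NULL
);

CREATE TABLE PlanAttributeTypes (
    ID   INT IDENTITY(1,1) PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL
);

CREATE TABLE ValueTypes (
    ID   INT IDENTITY(1,1) PRIMARY KEY,
    Name NVARCHAR(50) NOT NULL  -- e.g. 'int', 'decimal', 'string'
);

CREATE TABLE PlanAttributeValues (
    ID           INT IDENTITY(1,1) PRIMARY KEY,
    PlanEntityID INT NOT NULL REFERENCES PlanEntities (ID),
    AttributeID  INT NOT NULL REFERENCES PlanAttributeTypes (ID),
    ValueTypeID  INT NOT NULL REFERENCES ValueTypes (ID),
    Value        NVARCHAR(MAX) NOT NULL  -- everything stored as text: the price of EAV flexibility
);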

Of course, in this case the plans will not change that often or by that much, so EAV is unnecessary since it adds complexity.

In any case, my preference is to have an ID for each table. The PersonID and Year are still necessary and can be used for identifying the relationship, but this way the plan has its own identity, and other entities related to the plan can refer to it directly.
Guess we will see how it goes.