Wednesday, July 9, 2014

SOLID DB principles

In my last post, I rattled off a little bit about creating a db model that conforms to SOLID development principles. I presented the idea to some colleagues and it seemed to go over well. Today I will attempt to expound upon the principle and how a good database design can conform to each piece of the SOLID acronym.

S - single responsibility - each table should have a single responsibility. This means no god tables with nearly infinite nullable columns that can capture more than one type (sub-type actually) of thing.

O - open-closed - open to extension, closed to modification. Create linking tables or type tables to extend a base table, don't change the base table. Add views to create reusable components. Always create list/type tables so that new types of data can be linked to metadata. Reuse data when possible (no is no). If you really, really need to change the attributes frequently, consider EAV - but with caution.

L- Liskov Substitution - Comment is a base table, PostComment is a sub table. They map to types in object-land. Learn TPT, TPH etc.

I - interface segregation - there aren't really interfaces. Just make sure again that your tables, and perhaps views and other objects such as stored procedures/functions (especially these) aren't doing more than one thing.

D - dependency inversion (DI) - this applies more to the DAL than anything. Using DI allows for switching the database technology later, or even spread your data model across technologies. Either way all other code/modules/layers should be agnostic of the data store - abstract that baby away! One could go so far as to write adapters that adapt the db access technology to a custom interface should one want to switch from something like Entity Framework to NHibernate. EmberJS does something like that in the ember-data library. Consumers make all calls through ember-data objects, but can map the calls to whatever their storage technology requires. I truly can't think of any way to apply DI in SQL, perhaps noSQL implementations will have varying abilities to do so with map/reduce functions. I would say that the best way to practice good DI with SQL is to avoid any sort of business logic in stored procedures. I have seen some pretty tricky DI being simulated in stored procedure chains, perhaps some EXEC calls on dynamic SQL and things like that. I avoid those sorts of things like the plague these days unless it's generated SQL via EntityFramework.

Anyways, that's my attempt to SOLIDify database design. I feel kind of floppy about the whole DI rambling, if anyone has some good ideas about that one feel free to chip in!

Happy Coding!

Monday, July 7, 2014

SOLID data relationships

One approach I like to take when designing a database schema is to focus on the relationships before adding columns that are not involved in the relationships (in a relational database that is).

The structure and integrity are the most important component of a good database design. One approach is to model the data structures first. If you are working in an object oriented language you could build your objects first with minimal properties. By doing this and thinking about SOLID development principles, we can keep our database design in alignment with our code development practices.

in C#:
public class Person
{
public int PersonID {get;} {set;}
public virtual ICollection<Task> Tasks {get;} {set;}
}
or even using "class/table diagrams":

Person
-PersonID int PK
-Tasks Task[]

Task
-TaskID int PK
-Assignees Person[]

in this model we can already see that a person can have many tasks and a task can be assigned to many people. We can also see that the task class will not map directly to relational tables. There is only one task, this task can be assigned to multiple people. Therefore we will need a linking table to represent the many-to-many relationship between people and tasks. Likely we would want to collect more data about that task assignment, so probably the intermediary table should have it's own model.

TaskAssignment
-TaskAssignmentID int PK
-TaskID int FK
-PersonID int FK
-details...

Person
-PersonID int PK
-TaskAssignments TaskAssignment[]

Tasks
-TaskID int PK
-TaskAssignments TaskAssignment[]

now we can track each assignment and later add props like CompletedDate, hook up another link table to comments so you can save multiple comments with metadata related to each one.

While many-to-many relationships with a single comments table would work, there is a slight issue with data-integrity. A single comment should relate to a single TaskAssignment, and there should be zero-to-many Comments for a TaskAssignment. Or anything we'd like to tie comments to. Note that we might be tempted create a combo PK on TaskAssignments using TaskID and PersonID and do away with TaskAssignmentID, but what if it's a long running task and a person can be assigned to it again?

One way is to add a comments table each time we need comments to relate to another entity. Our query, should we want to view all comments related to some specific root would be really complicated. Another approach would be to have a single Comments entity with a CommentType attribute and whenever we need to add comments to some other table we can add an entity that relates to a specific comment and the target table.

Comments
-CommentID int PK
-CommentType CommentType FK
-details...

PersonComments
-CommentID int PK, FK
-PersonID int FK

TaskComments
-CommentID int PK, FK
-TaskID int FK

TaskAssignmentComments
-CommentID...etc

now we don't have to violate Open-Closed principle to add comments, our comments can be typed in code using TPT inheritance (Comments is the abstract base), and we maintain data-integrity.

The CommentType is so that we can view the Comments table in a query and know what type of entity each one relates to without joining all ___Comments tables in the query. We will have to if we want to know which entity each comment relates to. In .NET, EntityFramework helps significantly with this.

Hope anyone who gets this far will have some neurons firing with regards to structuring a database to support SOLID development practices.

Saturday, July 5, 2014

Primary Key: ID or Combo Key?

Debating a coworker about a table schema in SQL server. The debate is about a parent child relationship. Should the child have its own identity?

Here is the situation (in a generic way).

PeopleTable (a proxy of a table in another db)
- PersonID PK

PlanYearTable
-PlanYear PK

SpecificPlanPersonYearDetailsTable
-PersonID FK PK
-PlanYear FK PK
-other details columns

I am of the mind that the third table should have its own ID rather than the combination PK that is also an FK from two other tables. Even though there will only be one instance per person, per year.

Here's my rationale -
the specific plan is its own entity, even if it can only relate to the specific person and year. SQL Server needs to work harder to order by two columns also to join on two columns (as so users running queries). And if you ever want to create a rest API, what does the URL look like?

The other side of the argument is rooted in difficulties encountered with a database that created its own local ID for referencing entities from another db.

DB Main

BucketTable
-bucketId PK

DropletTable
-dropletId PK
-bucketId FK PK

DB Two

BucketsTable
-bucketId PK

DropletsTable
-LocalDropletId PK
-RemoteDropletId
-bucketId FK

this was done because the droplets in main are partitioned per bucket
Bucket 1 - Droplet 1,2,3.
Bucket 2 - Droplet 1,2,3.
so that there is no way to refer to a droplet without referring to the bucket as well. So this schema was born of another schema and was an attempt to simplify referring to a single droplet entity in the db.

there are other structures that can make our database more open to future change. EAV for the plans would be a great option! Plans change, new ones are added. EAV would handle this beautifully.

PlanEntities
-ID
-Type

PlanAttributeTypes
-ID

ValueTypes
ID

PlanAttributeValues
-ID
-PlanEntityID
-AttributeID
-ValueTypeID
-Value

Of course in this case the plans will not change that often or by that much, so EAV is un-necessary since it adds complexity.

In any case, my preference is to have an ID for each table. The PersonID and Year are still necessary and can be used for identifying the relationship. But in this way the plan has its own identity. Other entities relates to the plan itself can refer to the plan directly.
Guess we will see how it goes.

Thursday, July 3, 2014

Time is Fleeting

When you think something will take only 2 hours, and didn't really look at what needs to be done you may end up spinning your wheels and burn up a whole day before you know it.

I learned this lesson (once again today) and on reflection should have put a bit more thought into the whole amount of work that needs to be done rather than just the development task in a bubble.

One thing I definitely should have done is to consider the technology at hand and how foolish it is to expect it to behave as I would've wanted it to. K2 SmartForms (WebForms via custom designer interface) doesn't always play nice with customizations. That's one input to the time equation.

The next is the custom control that plugs in an autocomplete. I don't know it's limitations either, until now.

Then there is deployment, set-up time, configuration, troubleshooting, testing, etc.

Don't forget to think of all the work and any technical debt or knowledge debt involved when giving an estimate. Put it on a list, or in your work tracking system along with the coding tasks so you and your managers can plan accordingly.

Wednesday, July 2, 2014

NuGet HintPath Fix/Workaround

So we're using nuget and everything is awesome! We have a solution with projects that are maybe custom extensions or helpers that we will use in all our projects and life will be great! Now we bring those projects into another solution for part of this big project. We keep our concerns separated in separate solutions of course and share some projects that will help keep things cool and aligned. Then we build the solution and WHAT HO! What is this crazy nuget error? So we spend a bunch of time trying to resolve this (how many dev hours worldwide?) We find that the issue lay in the way nuget installs a ref to the packages folder in our projects. The path used is relative to the project. If we don't have them in the other solution's packages folder they don't exist and the build fails. You can resolve this in three ways:

Option 1: Edit the proj files to use $(SolutionDir)/packages/... well this doesn't work when the proj is built from msbuild directly without a solution. But it's pretty good.

Option 2: Build the dependent solution first.

Option 3: Add this to the proj file:
  
	
 <choose>
 <when condition="$(SolutionDir) == '' Or $(SolutionDir) == '*Undefined*'">
  <propertygroup>
   <nugetrepositorypath>..\</nugetrepositorypath>
  </propertygroup> 
 </when>
 <when condition="$(SolutionDir) != '' And $(SolutionDir) != '*Undefined*'">
  <propertygroup>
   <nugetrepositorypath>$(SolutionDir)</nugetrepositorypath>
  </propertygroup> 
 </when>
  </choose>
.
.
.
replace '../' in nuget HintPaths:
<hintpath>$(NuGetRepositoryPath)/packages/...</hintpath>

Visual Studio templates vstemplates part 2

Yesterdays lesson was a catchup to today's lesson learned. Actually there were several which will end up in separate posts since the topic vary. This lesson is regarding creating new solutions from template collections. After a custom template is made available and you create a new solution from that template, all of the projects are created in their respective folders under a folder the same name as the solution.
-db
  -dbAccess
  -dbModel
db.sln

The good: the solution is well organized.

The bad: if you have NuGet packages installed as part of the template, the references will be mangled since they want to point to ../packages but you need them to point to ../packages.

The ugly: there are several ways to resolve this issue until the NuGet gets rid of this ref issue for good (another lesson learned today).

Option 1: close the solution, move the project folders up one folder, open the sln file in Notpad++ or some other text editor, change the project refs to reflect the new location.

-db.Access
-db.Model
db.sln

Option 2: edit the proj files in the template collection folders to ref nuget package one level up.

Option 3: edit the proj files in your new solution to point to the correct place.

Option 4: uninstall and reinstall all the nugets in the solution. Better take notes of versions and use the nuget commands because the manager won't give you the same version.

Option 5: edit the proj templates to reflect the fix in my next post.

there may be other options, so far I went with the first one. Found that you have to close the solution before you edit or it will resave from the cache.

Another thing I learned about vstemplates is that they get cached by VS after you create a couple. If you edit the templates, you would need to clear them from the template cache. Once you do that, you also need to restart VS. The template cache is located in appdata/roaming/microsoft/visual studio/{{version}}/project templates or something like that.

Tuesday, July 1, 2014

Visual Studio 2013 project templates

.vstemplate is the extension.

While in the process of creating some boilerplate templates for solutions I learned quite a bit. The reason for creating these templates is that we have identified our best practices for using WCF services to wrap dbs and 3rd party APIs in services that we control. Other services related to business logic, workflow and logging have their own solution patterns that are pretty standard. I was searching for a way to automate new services via solution templates. I found that Microsoft surfaces the ability to create custom project templates for Visual Studio that can be used to do just that. There are a few caveats and I can't get everything I need to fully standardize the process without some manual steps or writing custom code, but it's a good start.

There are two basic styles of project templates to choose from - single project and multi project. There is no concept of a solution template. A multi-project template is what I was able to use to create solutions that contain a collection of projects. Here's how it works and some lessons learned that they don't tell you in the limited documentation.

A template collection is designed in a vstemplate file. This file references a vstemplate file for each project you want to include in the solution. The individual project templates point to the items to include in the project (proj file, classes, folders etc). These are single project files.

The documentation says to create a project and export it via VS File menu. This does some work to the proj file and creates the vstemplate. The collection of files is zipped up and put in the My Templates folder of the VS/Projects folder in Documents. If you selected the import option (checked by default) it also puts a copy in the ../Projects/templates folder that makes it available to VS when you create a new project.

The way I created the "solution template" was to create a subfolder in the Visual C# folder of ../templates for that solution type. I moved each proj zip into that. I unzipped the folder contents to their own folder. Then I deleted the zip folders. The multi-project vstemplate is at the solution template root under ../templates/Visual C#. The vstemplate refs each project template.

The good: We can now create projects with all the necessary projects included.

The bad: Haven't found a way to include solution level items in the resultant solution.

The Ugly: We use post-build scripts to set up the service locally in IIS when the configuration is set to Debug. The AppPool name is ProductServiceVx where Product should be only the product name. I thought I could use $safeprojectname$ which should be the user entered value in the new project dialog. This is not so in our situation.

So there are some manual steps for us. Here they are in a nutshell and how we are in the situation.

Create a new project in VS.

Pick a single name that represents the product - for EmployeeDataService, call it EmployeeData. All of the projects will be named using EmployeeData.xxx as the namespace and assembly name! Even the existing classes. In fact you can add $safeprojectnane$ in any files as long as you define replace as true on the element in the vstemplate.

But Wait! I want the proj names to be PeopleData.Interfaces, PeopleData.DTOs etc... They are, but then the classnanes become PeopleData.DTOs and my post build has PeopleData.DTOsVx as the AppPool name. Not what I want. Try as I might, I cannot find documentation on how to affect this declaratively. Hence the manual steps to crack open the projects and make adjustments. This opens up some risk of errors. Its palpable though since it takes so much work to set up a solution with 10 projects including dependencies.

Speaking of dependencies, enable NuGet package restore and bring in external dlls and projects that these depend on. But again, this is minimal compared to the overall time it takes to set up the solutions.

Speaking of time consuming, it did take a while to create the initial solution, then export each project and define the master vstemplate, but I hope it proves to be worthwhile in the end.