Wednesday, March 30, 2016

What Is a Repository Pattern For?

There is a myth about the reason for using a Repository pattern that goes a little something like this: "You use a Repository pattern so that you can swap out the database technology if you want to switch to something else." In this post, I will bust this myth.

First off, I'm not saying that the swapping rationale is not valid. I've encountered one instance where we actually wanted to switch the data-store technology but the cost and risks of doing so would be well beyond what we were willing to invest since the data access code bled into all the other code! In this case, had the original developer used some kind of abstraction or at least a clean data layer, we would have been able to reduce certain risks at a reasonable cost.

Chances are that you will not change the data-store technology. However, what you will likely want to do is some automated testing. And you should do that, since it's a fast, cheap, and reliable way to increase quality and reduce bugs. Hell, if you are doing any sort of testing at all besides end-to-end testing, this applies to you too. Testing is the main reason for using the Repository pattern or a similar abstraction of the data.

You could run tests that are dependent on the database, or you could choose not to - especially early in development. If you choose not to depend on the db, then you will need to supply some stand-in values for the core business logic to function. In order to supply those stand-in values, you may want to read them from a different source like a flat file, or use one of the many test doubles to suit your needs.

For any reasonably complex system - let's say one with multiple data sources - you may not have full control over the data source. You may not be able to change the values. Or maybe others are changing the values for some other effort and adversely impacting yours. When you are doing your development work and suddenly your program no longer works as expected, or you cannot verify your work because someone else's work has changed one of the program's dependencies, your progress will grind to a halt while you sort it out.

So what do you do? You could copy the db while in a known good state; or you can write up your own db and use that for a source. You could write a bunch of insert statements to set up the database with whatever values you need. You could even write new values to and read from the database for each test case. You could even add some special logic just for the tests that write to the database, even if your program does not require you to do so. However, using an abstraction can lead to a cleaner approach when it comes to testing your business functions.

With an abstraction of the data layer, you can wrap all of the nasty hobbitses of sql statements, ORMs, or whatever you have cleanly behind something that looks like the code you are writing in the layer you are working on. You can supply values to the business logic by mocking, stubbing, faking, or otherwise substituting the implementation of the abstraction to suit your needs. You can use a scientific approach to testing your code as you are implementing it by changing certain variables during different test scenarios.
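As a rough sketch of what that can look like in C#: the names below (Customer, ICustomerRepository, InMemoryCustomerRepository, ActiveCustomerReport) are all made up for illustration and don't come from any real system. The business class only knows about the interface, so a test can hand it a fake that holds whatever values the scenario calls for.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical domain type, used only for illustration.
public record Customer(int Id, string Name, bool IsActive);

// The abstraction the business layer depends on.
// No SQL or ORM types leak through it.
public interface ICustomerRepository
{
    IReadOnlyList<Customer> GetAll();
}

// A fake for tests: it returns exactly the values the test supplies.
public sealed class InMemoryCustomerRepository : ICustomerRepository
{
    private readonly List<Customer> _customers;

    public InMemoryCustomerRepository(IEnumerable<Customer> customers) =>
        _customers = customers.ToList();

    public IReadOnlyList<Customer> GetAll() => _customers;
}

// The business logic under test. It never sees a connection string.
public sealed class ActiveCustomerReport
{
    private readonly ICustomerRepository _repository;

    public ActiveCustomerReport(ICustomerRepository repository) =>
        _repository = repository;

    // Names of active customers, sorted with an ordinal comparison.
    public IReadOnlyList<string> ActiveNames() =>
        _repository.GetAll()
            .Where(c => c.IsActive)
            .Select(c => c.Name)
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToList();
}
```

A production implementation of ICustomerRepository would wrap the real database access; the report class would not change at all.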

For an example, let's consider a case where the system needs to send an email. Let's say the recipient list comes from some service via an API and the email template comes from an internal database. And let's say we want to use SMTP to send the email for now. All three of those things fall outside of the boundaries of the program. In the Hexagonal Architecture sense, the business logic depends on those but not on the specifics of the implementations of those. So go ahead and abstract them from your business logic.

Your business logic should: fetch the list of recipients, merge with the email template, send the emails. It should not care how you send the email or where the recipients or template come from. I would focus on testing the template mashing code and that the email is sent under the correct conditions with the correct values. I would try running different permutations of values through the tests - there are some techniques that I've found to work very well for testing logic this way while reducing the number of tests, which can otherwise muddy up what they should convey to other developers. Look for those in a future post.
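Here is one way the emailer could be sketched in C#. Every name here (IRecipientSource, ITemplateRepository, IEmailSender, Mailer, the "{recipient}" placeholder, and the test doubles) is an assumption for illustration, and skipping blank recipients is just an example of a "correct conditions" rule, not a requirement from anywhere.

```csharp
using System;
using System.Collections.Generic;

// The three things outside the program's boundary, as abstractions.
public interface IRecipientSource { IReadOnlyList<string> GetRecipients(); }
public interface ITemplateRepository { string GetTemplate(string name); }
public interface IEmailSender { void Send(string to, string body); }

// Business logic: fetch recipients, merge with the template, send.
public sealed class Mailer
{
    private readonly IRecipientSource _recipients;
    private readonly ITemplateRepository _templates;
    private readonly IEmailSender _sender;

    public Mailer(IRecipientSource recipients, ITemplateRepository templates, IEmailSender sender)
    {
        _recipients = recipients;
        _templates = templates;
        _sender = sender;
    }

    // Sends one email per non-blank recipient and returns how many were sent.
    public int SendAll(string templateName)
    {
        var template = _templates.GetTemplate(templateName);
        var sent = 0;
        foreach (var recipient in _recipients.GetRecipients())
        {
            if (string.IsNullOrWhiteSpace(recipient)) continue; // assumed rule
            _sender.Send(recipient, template.Replace("{recipient}", recipient));
            sent++;
        }
        return sent;
    }
}

// Test doubles: fixed inputs and a sender that only records what it was given.
public sealed class FixedRecipients : IRecipientSource
{
    private readonly string[] _values;
    public FixedRecipients(params string[] values) => _values = values;
    public IReadOnlyList<string> GetRecipients() => _values;
}

public sealed class FixedTemplate : ITemplateRepository
{
    private readonly string _template;
    public FixedTemplate(string template) => _template = template;
    public string GetTemplate(string name) => _template;
}

public sealed class RecordingSender : IEmailSender
{
    public List<(string To, string Body)> Sent { get; } = new();
    public void Send(string to, string body) => Sent.Add((to, body));
}
```

A test builds a Mailer from the three doubles, calls SendAll, and then inspects RecordingSender.Sent. No API, database, or SMTP server is involved.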

Some of the resources that the emailer system consumes can change (though that is mostly unlikely) and the Hex pattern can ease the transition. More importantly though, patterns like Repository aid writing and running the test code, which is there to guide and make clear the intent of the business functions. The tests are there to offer an example of how the modules are to be used and to show how they should behave given specific scenarios. The patterns are there to separate the business logic from the data access code, so you and other developers who work on the system don't have to wade through oodles of data access muck to sort out the business functioning of the system. In these ways the total cost of ownership (TCO) of the system can be reduced, since changes can be applied more smoothly.

Friday, March 25, 2016

If You Write The DataBase First You Are Doing It Wrong!

In an epiphany, I figured out why developers want to make the database first. In root cause analysis fashion, let's play the 5-whys game and find out.


1. Why start with the DB? You need data in order to start developing the rest of the app.


2. Why do you need data first? Because you aren't practicing TDD.


3. Why aren't you practicing TDD? Because it takes you longer.


4. Why do you think it takes longer? Because it means you have to write more code.


5. Why does TDD necessarily mean that you have to write more code? Because you've written Unit Tests before and it's difficult to write them when your business logic depends on the data, so the Unit Tests become lengthy, take a long time to run, and are costly to maintain.


So it comes down to a myth that is invented from having done testing wrong in the first place. Perhaps there's an assumption that TDD is about testing, when it is really about design. Dr. Dobb's sums up the concept in this article, basically pointing out that the tests make you think about the code. There are endless sources on the web, in books, and in magazines that can take you through the details. I will stay focused on how TDD helps avoid the costs of developing the data layer first.


If your development efforts start with a test, the first thing you may soon notice is that you will need to provide some kind of data to the business logic. However, rather than writing a database right then and there you will use one or more of several patterns for substituting an actual DB in your code - repository pattern, resource, fixture, stub, mock, etc. This will allow you to focus on what the app DOES instead of the low-level details of a datastore. You will control the data that drives the logic in a scientific way for each scenario by providing the combinations of values that are expected for each scenario. The art is in knowing how to write the tests, which takes practice.
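To make "combinations of values for each scenario" a bit more concrete, here is a small table-driven sketch in C#. The inventory example and every name in it (IInventory, StubInventory, OrderPolicy, OrderPolicyScenarios, "SKU-1") are invented for illustration. In a real project a test framework would normally run the rows; a plain loop is used here only to keep the sketch self-contained.

```csharp
using System;
using System.Collections.Generic;

// The data the logic needs, behind an abstraction. No database yet.
public interface IInventory { int StockFor(string sku); }

// Stub: always reports the stock level the scenario asked for.
public sealed class StubInventory : IInventory
{
    private readonly int _stock;
    public StubInventory(int stock) => _stock = stock;
    public int StockFor(string sku) => _stock;
}

// What the app DOES: decide whether an order can be fulfilled.
public sealed class OrderPolicy
{
    private readonly IInventory _inventory;
    public OrderPolicy(IInventory inventory) => _inventory = inventory;

    public bool CanFulfil(string sku, int quantity) =>
        quantity > 0 && _inventory.StockFor(sku) >= quantity;
}

public static class OrderPolicyScenarios
{
    // One row per scenario: stock on hand, quantity requested, expected answer.
    public static readonly (int Stock, int Requested, bool Expected)[] Cases =
    {
        (0, 1, false), // nothing in stock
        (5, 5, true),  // exactly enough
        (5, 6, false), // one short
        (5, 0, false), // nonsensical quantity
    };

    // Returns a description of every failing scenario; empty means all passed.
    public static List<string> Run()
    {
        var failures = new List<string>();
        foreach (var (stock, requested, expected) in Cases)
        {
            var policy = new OrderPolicy(new StubInventory(stock));
            var actual = policy.CanFulfil("SKU-1", requested);
            if (actual != expected)
                failures.Add($"stock={stock} requested={requested}: expected {expected}, got {actual}");
        }
        return failures;
    }
}
```

Only one variable changes per row, which is what keeps the approach "scientific": when a row fails, you know which input caused it.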


Imagine if you had a DB first and you operated under the assumption that certain bits of data, or certain methods of accessing the data, would be needed. Now when it turns out they are not needed, or that your assumptions were incorrect, you've actually just done a lot of unnecessary work - i.e. you wrote more code, which took longer and wasn't needed.


Eventually, and maybe in the first few iterations of test-then-code, you will begin to write some models that can be represented by a data model. As your application takes shape, you should be safely refactoring so that the entire application becomes more cohesive in time. You have the tests to back you in the refactoring process. This is one of the reasons to start TDD at the system interfaces.


Additionally, as you add features during the project, your manual testing or even Unit Tests will take longer to execute. You will end up creating and deleting data, and maintaining scripts that set up a bunch of data, which is more code to maintain. In the end you will do more work than if you'd written the tests while designing - iterating test, then code to make it pass, then test...bit by bit.


When you eventually get to the point where you will need to integrate, you will now understand what you really need to persist to a database, but not until it is needed. If you start in this way, the way of TDD, then you will know that you do NOT need the database to write the application and you will see it as the outer layer of the onion that it is.


One final nugget - the most important reason for a repository pattern is NOT so that you can swap the underlying data store technology, though it is a compelling myth. More details about this and how to start TDD in future posts.




Tuesday, March 1, 2016

Const and Static in C#

I learned the true difference between const and static in C# when it comes to class members. This did not come the hard way, but through a PluralSight course. I'm thankful for that!


Here's the difference:


const - set at compile time, and the value is inlined into the code that uses it. In other words if you have a const FEET from a class Yard like so:


class Yard
{
    public const int FEET = 3;
}


and you use it like this:


...
var bar = baz * Yard.FEET;
...


You would be ok if Yard was in the same assembly. But if it's in another assembly and Yard.FEET changes, and the calling assembly doesn't get recompiled, the calling assembly will still have the old value. Plus, a const cannot be changed once compiled, so you cannot compute it from something else like a config value.


With a static readonly member, you gain the benefit of computing the value at runtime AND, if the value changes (either by config or by code) and is in a separate assembly from the consumer (think GAC here), your calling code will retrieve the new value.
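A small sketch of the two side by side. The Inches member, the Converter class, and the YARD_INCHES environment variable are made up for illustration; the environment variable just stands in for "a config value".

```csharp
using System;

public static class Yard
{
    // Compile-time constant: the literal 3 is copied into every
    // assembly that uses Yard.Feet when that assembly is compiled.
    public const int Feet = 3;

    // Computed once at runtime when the type is initialized.
    // Consumers read the field itself, so they see whatever value
    // the currently loaded Yard assembly produces.
    public static readonly int Inches = LoadInches();

    private static int LoadInches()
    {
        // YARD_INCHES is a hypothetical setting; fall back to 36 if
        // it is missing or not a valid integer.
        var raw = Environment.GetEnvironmentVariable("YARD_INCHES");
        return int.TryParse(raw, out var parsed) ? parsed : 36;
    }
}

public static class Converter
{
    // After compilation this is effectively "yards * 3".
    public static int YardsToFeet(int yards) => yards * Yard.Feet;

    // This one reads Yard.Inches every time it runs.
    public static int YardsToInches(int yards) => yards * Yard.Inches;
}
```

If Yard lived in its own assembly and Feet were later changed, Converter would keep multiplying by 3 until it was recompiled, while YardsToInches would pick up the new Inches value right away.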


Why use a constant ever? It's a bit more efficient during runtime since it gets inlined and saves a field lookup. But let's be honest here, if you are using C# is that minor gain in efficiency really worth it? If that bit of performance makes a difference in your world, wouldn't you be using C or C++ instead? I'm sure a case can be made, but it's not likely going to be one unless you have millions of calls for the same thing. In that case there may be other optimizations that come first. Just saying...