Tuesday, January 31, 2017

Database Development The Iterative Way

As I've recently become more involved in developing a data layer (database included), I've been learning a new style of database migration. Previously, I wrote plain SQL migrations for each deployment and handed them off to a DBA to run in each environment.

These days, I've entered the realm of Liquibase. Liquibase is my first experience with this sort of tool, which allows for source-controlled database versioning. Changes are written in a declarative style and translated into the underlying SQL. It's cross-platform and supports multiple declarative formats (XML, YAML, JSON, and even plain SQL).

Here's how it's changed my development process: in the old days, I would write the SQL and run each file, or put it all in one file and run it as a batch. Then I'd write some queries to validate the results, interrogate the schema, etc. I did mostly up-front design of the database components, and most of the queries and commands were generated by EF or built in code. Other factors have changed that as well.

Nowadays, I'm writing a lot more stored procedures, with none of the SQL in code or generated by code. I'm writing tests first, then writing stored procs, then creating or modifying tables declaratively via Liquibase. Sometimes I don't care about the name of a table or column until it's really needed. Sometimes the name of a column morphs during an iteration (a short cycle of failing test-code-passing test, on the order of minutes). Nowadays, no problem! It's relatively simple to change a column name.
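For illustration, here's roughly what a column rename looks like as a Liquibase changeset in the XML format (the table, column, and changeset names here are made up for the example):

```xml
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.5.xsd">

  <!-- Each changeSet is applied once per database and tracked by id/author -->
  <changeSet id="rename-users-login-column" author="dev">
    <renameColumn tableName="users"
                  oldColumnName="login"
                  newColumnName="user_name"
                  columnDataType="varchar(100)"/>
  </changeSet>
</databaseChangeLog>
```

Liquibase records applied changesets in a tracking table, which is what makes the "only apply what's new" behavior below possible.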

The big trick/advantage is that everyone else can pull in those source-controlled changes and have their local databases updated by running a batch file or using a git hook. Liquibase only applies changes that are new on the target database, so if someone hasn't updated in a while, it'll apply all the outstanding changes until they're up to date.

It's all good! I never really had a chance to dig into EF code-first migrations to this degree. If it's the same or similar in efficiency, I'd recommend it for sure! Or any tool that enables rapid changes to the backing store with acceptable trade-offs (here's looking at you, NoSQL).

Wednesday, January 18, 2017


I've been hearing the phrase "words have meanings" a lot lately. I've also been reading more of Domain-Driven Design by Eric Evans, in which he stresses the importance of Ubiquitous Language. When we communicate about our work, when we collaborate as social human beings (even the most antisocial rely on social networks for survival - see the NatGeo show Mick Dodge), we use language. It's what makes us human; it's what makes us powerful. Our very fabric and existence is a product of our ability to communicate, coordinate, and work together.

So when we communicate, we are more effective when we choose our words well. Words do have meaning. And they have meaning in context. In the past, I've seen the term Unit Testing used where Functional Testing was more appropriate in my context. To me (and, I assumed, the rest of the world), Unit Testing is when a developer writes automated tests at the lowest level of code she owns. To the other person, it meant manually testing each functional unit as exposed to the user - which I know as Functional Testing.

Here was a case where we were working at cross purposes whenever we talked about Unit Testing. She pulled rank and had the term redefined in her world, so context switching had to happen...for all developers on the team. When she uttered the phrase "do Unit Testing," our minds immediately went to writing code with test doubles, scenarios, and test cases for each string parser and logical branch. But we had to go against what we knew and live in this other reality for a bit - fighting neurological pathways forged over years.

Every conversation in which more than one term is used to describe the same thing and especially where one term requires special domain knowledge is going to see a reduction in effectiveness. Words have meaning in context. What do we do about this? Especially when the same word has meaning in different contexts...

I propose the following: Combine context with the subject so that ambiguous terms become less ambiguous. Instead of Employee say Payroll Employee. Instead of User say Blog User. Say Functional Unit Testing, Ad Campaign, War Campaign, Political Campaign, Shipping Container, Bank Account, User Account, User Role, etc. Use contexts as containers for the concepts in which the words have meaning.

One term came up recently...owner. Owner is such an abstract concept that it must be paired with some subject to show context, or there will be wholesale confusion, particularly when more than one owner context is involved. Home Owner and Business Owner have very different connotations...think of how many differences there are just in terms of taxes. So what if tax were the context? Might we say the Tax Homeowner or the Tax Business Owner? Or is it enough to imply that tax is the context? It depends.
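In code, the same idea shows up as context-qualified type names. A minimal, purely illustrative sketch (all names invented for the example): instead of one ambiguous Owner, each context gets its own explicitly named concept.

```python
from dataclasses import dataclass

# Two different "owner" concepts, disambiguated by context in the name,
# rather than a single abstract Owner class that means something
# different to every reader.

@dataclass
class HomeOwner:
    """Owner in the property-tax context."""
    name: str
    property_tax_rate: float

@dataclass
class BusinessOwner:
    """Owner in the business-tax context."""
    name: str
    business_tax_rate: float

alice = HomeOwner("Alice", property_tax_rate=0.012)
bob = BusinessOwner("Bob", business_tax_rate=0.21)
```

The class names carry the context, so a conversation about "the HomeOwner's rate" can't be confused with "the BusinessOwner's rate."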

Thursday, January 5, 2017


Alright, so estimates suck and we all know it. There are plenty of reasons why, and plenty of models that try to get it right. But hey, we're human, and we're subject to all kinds of flaws. Well, here's a method that a colleague and I came up with as a joke at first. But now that I think more about it...it seems to be just as good as (maybe better than) any other method.

What you'll need:

1 sheet of paper, whatever size.
1 writing device (pen, pencil, marker, crayon - go crazy)
1 coin with 2 different sides (heads/tails)

What to do:

Step 1: Prepare by drawing a circle on the paper and placing the paper on a flat horizontal surface (desk or floor).

Step 2: Pick a number for the task as a starting point. esty := n;

Step 3: Flip the coin at the paper.

Step 4: If coin lands in circle, return esty.

Step 5: If coin is heads, increase esty and loop back to Step 3.

Step 6: If coin is tails, decrease esty and loop back to Step 3.
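The steps above can be sketched as a simulation. This is a toy, of course (the whole point is a physical coin), so the circle and the coin are stood in for by injectable functions you could replace with anything:

```python
import random

# Doubling-style scale from the post: 0.5, 1, 2, 4, 8, 16, 32 hours.
SCALE = [0.5, 1, 2, 4, 8, 16, 32]

def coin_toss_estimate(gut_index,
                       lands_in_circle=lambda: random.random() < 0.3,
                       coin_is_heads=lambda: random.random() < 0.5):
    """Simulate the coin-toss estimation loop.

    gut_index: index into SCALE for the gut-feel starting estimate (Step 2).
    lands_in_circle / coin_is_heads stand in for the physical toss.
    """
    esty = gut_index
    while True:
        if lands_in_circle():                    # Step 4: landed in the circle
            return SCALE[esty]
        if coin_is_heads():                      # Step 5: heads -> bump up
            esty = min(esty + 1, len(SCALE) - 1)
        else:                                    # Step 6: tails -> bump down
            esty = max(esty - 1, 0)

estimate = coin_toss_estimate(gut_index=2)       # gut says 2 hours
```

Clamping at both ends of the scale is my own addition so the simulation can't walk off the list.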

Why is this method awesome? Because first of all you use your gut to produce the original estimate - always go with your gut.

Second of all, if you really feel strongly about your number, then you'll hit the circle. Else, the randomness of the coin introduces some real-world randomness that you'll encounter in the real-world...for real!

Some models call this a Monte Carlo factor and use a pseudo-random number generator to add pseudo-reality to the equation...that's weak compared to the good old trusty randomness of a Wheat-Back Penny!

Be sure to break down those tasks a bit. You wouldn't want a single coin toss to determine the fate of the whole project - you want more samples to produce the desired results!

Oh, and generally I'd use a scale for the numbers like: 0.5, 1, 2, 4, 8, 16, 32... in hours. If you throw bigger than 32 (that's more than a full week, if you really get down to it), maybe you'd better break the task down a bit more.

Ok, there you have it! Make those circles and start tossing your way to successful estimation!

Tuesday, November 29, 2016

Making the Most of Memory

Always so much to learn. For example, did you know that the time it takes to access your computer's main memory is MANY times slower than the time it takes to access the CPU caches? What does this mean for how we structure our programs and architectures? We'd want to take full advantage of the L1 and L2 caches if we are at all concerned with application performance. That means structuring components and code so that the caches can actually be used.

Even if we go only to DRAM (and not to disk), a DRAM cell only holds its data for tens of milliseconds before having to be refreshed...this means you have under 100 ms (as of the article, from 2008) to process everything in memory before it cycles and has to be refreshed.

All of this has implications for the design of your objects, structs, etc. The SRAM in the L2 cache is limited in size due to power consumption and cost. Given this, it seems logical that keeping your data bags as small as possible would be a great advantage for performance. Additionally, limiting loops and batches could also help.
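A toy illustration of the access-pattern idea. This is pure Python, where interpreter overhead and pointer-chasing blur the hardware effects, so the point is the pattern rather than the timings: the row-major loop walks memory sequentially (cache- and prefetch-friendly in a language with contiguous arrays), while the column-major loop strides across rows.

```python
N = 500
grid = [[1] * N for _ in range(N)]

def row_major_sum(g):
    """Visit elements in the order they are laid out: sequential access."""
    total = 0
    for row in g:
        for v in row:
            total += v
    return total

def col_major_sum(g):
    """Visit one column at a time: every step jumps to a different row."""
    total = 0
    for j in range(N):
        for i in range(N):
            total += g[i][j]
    return total

# Same answer either way; only the traversal order differs.
assert row_major_sum(grid) == col_major_sum(grid) == N * N
```

In C or another language with contiguous 2D arrays, the same two loops typically show a large measurable gap for big N, which is the effect the article describes.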

Architecturally speaking, since instructions are also cached in the L1, you'd want to keep the same operations (looking at you, data access) on the same CPU and its L1. So much to learn...

Read about it in depth at https://lwn.net/Articles/250967/

Tuesday, November 15, 2016

Bugs and Defects, What Issues Really Mean

In season 1, episode 3 of Mr. Robot (http://m.imdb.com/title/tt4730002/), there's a well-worn metaphor about bugs. The lead character (an elite hacker) narrates that most people think finding a bug is about fixing the bug. He goes on to say that it's really about finding the flaw in thinking. Sounds a little like root cause analysis.

The first bug ever was called a bug because it was literally caused by a bug - a moth - in a relay. Certainly not working as designed. Read more about that at http://www.computerhistory.org/tdih/September/9/

Jimmy Bogard provided a clear interpretation of what a bug is (and isn't) and how to deal with it. He and his colleagues break work down into Bugs, Defects, and Stories. They "stop the line" to fix Bugs - Bugs are showstoppers - while user feedback is classified as issues. They don't believe in bug tracking in their shop.

Think of it this way-if the first Bug ever were not removed immediately, how would that system be expected to work?

Seems to me that Bugs are mostly caused by a change in the environment that was not accounted for in the design (or a complete failure to verify the implementation against acceptance criteria). Defects, on the other hand, come down to a breakdown in understanding - a Defect is when the software meets the acceptance criteria but doesn't do what it needs to do.

There's a saying attributed to Jeffrey Palermo - "a process that produced defects is itself defective."

When Defects and Bugs creep into a system, sure, we can just go ahead and fix 'em and move on. But sometimes, and especially if it happens a lot, it's time to fix the underlying problems that are causing the Issues (Defects and Bugs): process re-engineering, training, the environment (are we on a stable platform?), deeper analysis, etc.

That first bug ever? Perhaps it led to a cleaner environment for running programs. Maybe it led to innovations in storage mechanisms. Maybe it led to nothing. Certainly, if it was a recurring problem, something would've been done.

Tuesday, November 1, 2016

A Spec File Should...

When a spec for an interface (the public-facing functions/methods of a software module/API) is written in the same language as the module, and that spec is a runnable test that exercises the module, it's a fantastic way to show others how the module should work and how to use the interface. So the spec should be written in a way that conveys these things to the reader.

Imagine a spec written for a Users API. That API has a getUsers method, and the spec has the following description:

"it should return all active employees"
and the API is exercised like this:

getUsers(false);

How safe is it to assume that the first input parameter is isActive? Or could it be null? Or some other flag?

To remove ambiguity in the spec, ALWAYS pass named variables...

includeInactive = false;
getUsers(includeInactive);

Yeah, yeah, most editors and IDEs show the parameter name somehow. But that's not always going to be accurate, or clear, or even built yet (write the spec first, anyone?).
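The same point in Python terms (the function and data here are invented for the example). A bare boolean at the call site tells the reader nothing; a named variable, or better yet a keyword argument, makes the spec read like the description it verifies:

```python
def get_users(include_inactive):
    """Toy stand-in for the Users API under test."""
    users = [{"name": "Ada", "active": True},
             {"name": "Bob", "active": False}]
    if include_inactive:
        return users
    return [u for u in users if u["active"]]

# Ambiguous: what does False mean here? isActive? includeDeleted?
get_users(False)

# Self-documenting: the intent is visible at the call site.
include_inactive = False
active_only = get_users(include_inactive)

# Or, idiomatic in languages that support it, a keyword argument:
active_only = get_users(include_inactive=False)
assert all(u["active"] for u in active_only)
```

Languages without keyword arguments can get the same effect with the named local variable, as in the snippet above.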

Tuesday, October 18, 2016

Polymorphism Can Save Time and Make Code More Fun

After a day of working with some rather procedural code and shoehorning it into something a bit more OO, I've concluded that this code base would've benefited from some SOLID OOP early on.

This code base takes a set of data in a tree-like structure and generates an html tree view based on certain attributes of each data point. The tree views are not basic like a file tree, but are based on that with more features. And it needs to display differently depending on where it's being rendered. User authorization for each data point is an additional consideration.

Reasoning about this, we can see how polymorphism can benefit us here. For instance (pun intended), the basic flow of walking the tree of data would not change, except that some business logic decides whether a node should be added or not - there could be several reasons a node would not appear in the view (e.g., security, user choice, etc.). Filters could be used to achieve this, depending on requirements. Or a class which implements filtering. Either way we slice it, the tree of data needs to be walked.

While walking the tree, whatever is rendering the tree view can either construct it on the go, or construct/serialize in one go. For the latter choice, it can hold a collection which can be added to, or receive a collection. That collection would closely model the data tree, but with whatever visibility rules applied to it by any business logic classes.

Whatever is walking the data tree should then take in the business logic as a dependency. The polymorphism part is that different types of nodes can be added to the tree view depending on the result of that business logic. I'm thinking of an interface that takes in the data and returns a TreeViewNode. TreeViewNodes are FolderNodes or FileNodes (leaf nodes), but they have additional properties which control other aspects of the nodes.

Hey, sounds like we can do this better than procedural with functional programming too!

(buildview (map transformToViewDataList (filter context dataList)))

But I digress...with OO, we'd have different implementations of the interface for different business cases, but the same recursion. Heck, you could even have different implementations of the tree recursion if needed -

// set up and kick off the walk
nodeMapperFactory = GetFactory(context);
rootNode = GetRootNode(data);
treeWalker.WalkTree(rootNode, nodeMapperFactory);

// inside WalkTree, for each data node:
nodeMapper = nodeMapperFactory.Get(currentDataNode);
viewNode = nodeMapper.MapNode(currentDataNode);
foreach (childNode in children) WalkTree(childNode, nodeMapperFactory);
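Here's a minimal runnable sketch of that design in Python (every name is illustrative, not from the actual code base): a NodeMapper interface hides the business logic, the walker owns the recursion, and returning None from a mapper is how a node gets filtered out of the view.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class DataNode:
    """A node in the source data tree."""
    name: str
    is_leaf: bool = False
    children: list = field(default_factory=list)

@dataclass
class TreeViewNode:
    """Base type for nodes in the rendered tree view."""
    label: str
    children: list = field(default_factory=list)

class FolderNode(TreeViewNode): pass
class FileNode(TreeViewNode): pass      # leaf node

class NodeMapper(ABC):
    """Business logic lives behind this interface; swap implementations per context."""
    @abstractmethod
    def map_node(self, data_node):
        """Return a TreeViewNode for this data node, or None to filter it out."""

class DefaultMapper(NodeMapper):
    def map_node(self, data_node):
        cls = FileNode if data_node.is_leaf else FolderNode
        return cls(label=data_node.name)

def walk_tree(data_node, mapper):
    """Same recursion for every business case; only the mapper varies."""
    view_node = mapper.map_node(data_node)
    if view_node is None:
        return None                      # filtered out by business logic
    for child in data_node.children:
        child_view = walk_tree(child, mapper)
        if child_view is not None:
            view_node.children.append(child_view)
    return view_node

root = DataNode("docs", children=[DataNode("readme.txt", is_leaf=True)])
view = walk_tree(root, DefaultMapper())
```

A security-aware mapper, a rendering-context-aware mapper, and so on would each be another NodeMapper implementation; the walker never changes.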