Wednesday, February 1, 2017

Fix Your Coding Style Cause it Causes Errors!


This enthralling academic paper http://web.stanford.edu/~engler/p401-xie.pdf titled "Using Redundancies to Find Errors", came out of Stanford in 2002. It shows that analysis of redundancies in code "seemed to flag confused or poor programmers who were prone to other error types."


Some examples of the kinds of redundancies mentioned in the paper include: unused variables, reassignment of variables for no purpose, mathematical redundancies such as x/x, x|x, etc., logical branch and case statements which either always eval to true or false or are never reached, dead code - statements that are never reached due to an early break or continue in a loop.


Most of the examples given in the paper are the result just plain sloppy and/or lazy programming style. You see it too many times - copy-paste code, terse conditionals, disconnected variable names, poor structure and flow. My advice to anyone seeking to get better at programming is to reread your code before you commit. Write it three times - once to make it work, a second time so you can read it and a third time so everyone else can read it. When your done with it, you should be able to show one of your QA engineers the code and they'll know what it does! Then you know what you should do? Rewrite it again - this time to make it more concise, you could be missing out on something important like an opportunity to reduce the size of the code base, crate some reusability, or remove some diabolical nested if statements (one of the worst sins of programmerkind).


Look, if you were writing a book, would you be allowed to write the first draft and sell it as soon as you've written the last word? Hell NO! No one would want to read it anyways. It would contain several grammatical errors as well as redundancies, illogic, contradictions, and shit that just plain don't make any sense! Why would you abuse any language in such a manner? Languages are made to communicate. Programming languages are no different.

Tuesday, January 31, 2017

Database Development The Iterative Way

As I recently became more involved in developing a data layer (including database) I've been learning a new style of database migration. Previously, I've written plain SQL for database migrations for each deployment. Those were handed off to a DBA to run in each environment.


These days, I've entered the realm of Liquibase. Liquibase is my first experience with this sort of tool that allows for source controlled, database versioning. It's written in declarative style and translates the declarations into the underlying SQL. It's cross-platform and supports multiple declarative languages (JSON, XML).


Here's how it's changed my development process: In the old days, I used to write the SQL and run each file or put it all in one file and run as a batch. Then write some queries to validate, interrogate the schema etc. I would do mostly up-front design of the database components. Most of the queries and commands were generated by EF or in code. Other factors have changed that as well.


Nowadays, I'm writing a lot more stored procedures with none of the sql in code or generated by code. I'm writing tests first. Then I'm writing stored Procs, then creating/modifying tables declaratively via Liquibase. Sometimes I don't care about the table of the column until it's really needed. Sometimes the name of a column morphs during an iteration (a short cycle of failing test-code-passing test on the order of minutes). Nowadays, no problem! It's relatively simple to change a column name.


The big trick/advantage is that everyone else can pull in those source controlled changes and have their local databases updated by running a batch file or using a git hook. It only applies changes that are new on the target database, so if they haven't updated in awhile it'll apply all those changes until they are up to date.


It's all good! I never really had a chance to dig into EF code-first migrations to this degree. If its the same or similar in efficiency, I would recommend it for sure! Or any tool that enabled rapid changes to the backing store which has acceptable trade-offs (here's looking at you noSQL).

Wednesday, January 18, 2017

Names

I've been hearing this phrase a lot lately "words have meanings". I've also been reading more of Domain Driven Design by Eric Evans, in which he stresses the importance of Ubiquitous Language. When we communicate about our work, when we collaborate as social human beings (even the most antisocial rely on social networks for survival - see the NatGeo show Mick Dodge) we use language. It's what makes us human, its what makes us powerful. Our very fabric and existence is a product of our ability to communicate, coordinate, and work together.


So when we communicate, we are more effective when we do so effectively. We do so partially through word selection. Words do have meaning. And, they have meaning in context. In the past, I've been exposed to the usage of Unit Testing used where the word Functional Testing was more appropriate in my context. To me (and I assumed the rest of the world), Unit Testing is when a developer writes automated tests at the lowest level of code she owns. To the other person, it meant manually testing each functional unit as exposed to the user - which I know as Functional Testing.


Here was a case where we were working at cross purposes whenever we were talking about Unit Testing. She pulled rank and had the term redefined in her world so that context switching had to happen...for all developers on the team. When she uttered the phrase "do Unit Testing" our minds immediately went to thoughts about writing code with test doubles, scenarios, and defining test cases for each string parser and logical branch. But we had to go against what we know and live in this other reality for a bit - fighting our neurological pathways which were forged over the years.


Every conversation in which more than one term is used to describe the same thing and especially where one term requires special domain knowledge is going to see a reduction in effectiveness. Words have meaning in context. What do we do about this? Especially when the same word has meaning in different contexts...


I propose the following: Combine context with the subject so that ambiguous terms become less ambiguous. Instead of Employee say Payroll Employee. Instead of User say Blog User. Say Functional Unit Testing, Ad Campaign, War Campaign, Political Campaign, Shipping Container, Bank Account, User Account, User Role, etc. Use contexts as containers for the concepts in which the words have meaning.


One term came up recently...owner. Owner is such an abstract concept that it must be used with some subject to show context or there will be wholesale confusion, particularly when there is more than one owner context involved. Home Owner and Business Owner have very different connotations...think of how many just in terms of taxes. So what if taxes was the context? Might we say the Tax Homeowner or the Tax Business Owner? Or is it enough to imply tax is the context - depends.

Thursday, January 5, 2017

Best...Estimation...Method...Ever!!!

Alright, so estimates suck and we all know it. Plenty of reasons why and models to try to get it right. But hey, were human and we're subject to all kinds of flaws. Well, here's a method that a colleague and I came up with as a joke at first. But now that I think more about it...it seems to be just as good (maybe better than) as any other method.


What you'll need:


1 sheet of paper, whatever size.
1 writing device (pen, pencil, marker, crayon, go crazy
1 coin with 2 different sides (heads/tails)


What to do:


Step 1: Prepare by drawing a circle on the paper and place paper on flat horizontal surface (desk or floor).


Step 2: Pick a number for the task as a starting point. esty := n;


Step 3: Flip the coin at the paper.


Step 4: If coin lands in circle, return esty.


Step 5: If coin is heads, increase esty and loop back to Step 3.


Step 6: Decrease esty and loop back to Step 3.


Why is this method awesome? Because first of all you use your gut to produce the original estimate - always go with your gut.


Second of all, if you really feel strongly about your number, then you'll hit the circle. Else, the randomness of the coin introduces some real-world randomness that you'll encounter in the real-world...for real!


Some models call this a Monte-Carlo factor and use some pseudo-random number generator to add pseudo-reality to the equation...that's weak compared to the good old trusty randomness of a Wheat-Back Penny!


Be sire to break down those tasks a bit. You wouldn't want a single coin toss to determine the fate of the whole project - you want more samples to produce the desired results!


Oh, and generally I'd use a scale for the numbers like: .5, 1, 2, 4, 8, 16, 32,... In hours. If you throw bigger than 32 (that's more than full week if you really get down to it) maybe you better break it down a bit more.


Ok, there you have it! Make those circles and start tossing your way to successful estimation!

Tuesday, November 29, 2016

Making the Most of Memory


Always so much to learn. For example, did you know that the time it takes to access your computers memory is MANY times slower than the time to access the CPU caches? What does this mean for how we structure our programs and architectures? We'd want to take full advantage of the CPU Cache (L1) and the L2 cache if we are concerned at all with application performance. What this means is structuring components and code so that caches can be used.


Even if we go only to the DRAM (and not disk cache), it only holds the data for ms before having to be refreshed...this means you have under 100ms (as of the article in 2008) to process everything in memory before it cycles and has to be refreshed.


All of this has implications on the design of your objects, structs, etc. SRAM in the L2 cache is limited in memory space due to power consumption and costs. Given this, it seems logical that keeping your data bags as small in size as possible would be of great advantage to performance. Additionally, limiting loops and batches could also help to improve performance.


Architecturally speaking, since operations are also cached in the L1, you'd want to keep the same operations (looking at you data access) on the same CPU/L1. So much to learn...


Read about it in depth at https://lwn.net/Articles/250967/

Tuesday, November 15, 2016

Bugs and Defects, What Issues Really Mean

In episode 3 season 1 of Mr Robot http://m.imdb.com/title/tt4730002/ there's a highly used metaphor about bugs. The lead character's (an elite hacker) narration says that most people think finding a bug is about fixing the bug. Basically goes on to say that it's really about finding the flaw in thinking. Sounds a little like root cause analysis.


The first bug ever was called a bug because it was literally caused by a bug in a relay. Certainly not working as designed. Read more about that at http://www.computerhistory.org/tdih/September/9/


Jimmy Bogard provided a clear interpretation of what a bug is (and isn't) and how to deal with it. He and his affiliates break it down to Bugs, Defects and Stories. They "stop the line" to fix Bugs. Bugs are showstoppers, while user feedback are classified as issues. They don't believe in Bug tracking in their shop.


Think of it this way-if the first Bug ever were not removed immediately, how would that system be expected to work?


Seems to me that Bugs are mostly caused by a change in the environment which was not accounted for in the design (or a complete failure to verify implementation against acceptance criteria). Defects, on the other hand, come down to a breakdown in understanding - a Defect is when the software meets the acceptance criteria, but doesn't do what it needs to do.


There's a saying attributed to Jeffrey Palermo - "a process that produced defects is itself defective."


When Defects and Bugs creep into a system, sure we can just go ahead and fix em, move on and push on. But sometimes, and especially if this happens a lot, it's time to fix the underlying defects that are causing the Issues (Defects and Bugs). Process re-engineering, training, the environment (are we on a stable platform?), deeper analysis, etc.






That first bug ever? Perhaps that lead to a cleaner environment when running programs. Maybe it lead to innovations in storage mechanisms. Maybe it lead to nothing. Certainly, if it was a recurring problem, something would've been done.

Tuesday, November 1, 2016

A Spec File Should...

When writing a spec for an interface (public facing functions/methods of a software module/api) and that spec is written in the same language and that spec is a runnable test that executes the module...it is a fantastic way to show others how the module should work and how to use the interface...so the spec should be written in a way as it conveys these things to the reader.


Imagine a spec written for a users API. And that API has a method to getUsers and that spec has the following description:


"it should all active employees"
and the API is exercised like this:


getUsers(true);


How safe is it to assume that the first input parameter is isActive. Or could it be null? Or some other flag?


To remove ambiguity in the spec, ALWAYS pass named variables...


includeInactive = false;
getUsers(includeInactive);


yeah, yeah most editors and IDEs show the variable name somehow. Not always going to be accurate, or clear, or built yet (write the spec first anyone?).