Tuesday, March 31, 2015

ADO.NET Bulk Insert

The SqlBulkCopy class in the System.Data.SqlClient namespace in .NET significantly increased the efficiency of inserting thousands of rows into a table. Inserting about 100k rows with 24 columns of mixed data types went from minutes to seconds after switching from a row-by-row update with a stored procedure to the bulk insert. I set it to insert 1,000 rows at a time with no callbacks and left the default timeout of 30 seconds.

The source data came from a query that the same code owns, so I put all of the column names in an array of strings and used a LINQ .Select() to add them to the ColumnMappings. It didn't work at first; I found that I had to call .ToArray() after the .Select() to force the deferred query to execute, and therefore actually add the mappings, before calling WriteToServer to perform the insert.
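The gotcha here is LINQ's deferred execution. A minimal standalone sketch (using a plain list as a stand-in for SqlBulkCopy.ColumnMappings, with hypothetical column names) shows why nothing happens until the query is enumerated:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        var mappings = new List<string>(); // stand-in for ColumnMappings
        string[] columns = { "Id", "Name", "CreatedOn" }; // hypothetical names

        // Deferred: the Select has not been enumerated, so nothing was added.
        var query = columns.Select(c => { mappings.Add(c); return c; });
        Console.WriteLine(mappings.Count); // 0

        // ToArray() forces enumeration, which runs the side effects.
        query.ToArray();
        Console.WriteLine(mappings.Count); // 3
    }
}
```

A plain foreach over the column array avoids the surprise entirely, since it has no deferred step to forget.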

I can see how this could be wrapped up and made dynamic by walking the ColumnName property of each column in a DataTable and adding ColumnMappings when they are 1:1. Another potential pattern would be to wrap the table handling in a class that knows about the mappings and abstracts it away, since this is all a bit low-level to sit next to business logic.
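A sketch of the dynamic 1:1 idea, using a hypothetical table: walk each DataColumn and emit the name-to-name pair that would be handed to SqlBulkCopy.ColumnMappings.Add.

```csharp
using System;
using System.Data;

class Program
{
    static void Main()
    {
        // Hypothetical source table; in real code this comes from the query.
        var table = new DataTable("Source");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));

        // 1:1 mappings: the same name on both sides. With a SqlBulkCopy
        // in hand, this would be bulkCopy.ColumnMappings.Add(name, name).
        foreach (DataColumn column in table.Columns)
        {
            Console.WriteLine($"{column.ColumnName} -> {column.ColumnName}");
        }
    }
}
```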

Thursday, March 26, 2015

Strategic Improvement

After soaking for a bit in the bath of refactoring, I took a walk down the road of a major overhaul of some components of one application. The "component" in question is responsible for several things: determining whether user-entered data from a control should be saved (inserted, updated, or deleted), mapping the values from the control to the data object, formatting the data before saving, coordinating the data updates, and updating contextual data. That seemed like a lot for one class to do, and it is all done in a rather procedural way. Somehow, parts of it are covered with "unit tests".

The tests themselves are not written in a way that is easy to follow, but I eventually got through them and understood what they cover. The specific scenario we found a defect in was not covered.

The first approach I took was to get the code itself into a state where it made more sense. I did some reorganizing and a little refactoring. After a bit of this, I understood enough to make the thing a bit more OO. I started down that road and ran into some of the more complex behaviors of the system.

Near the end of my allotted time, two things happened: I had spent more time than I originally anticipated, and some other work I had done earlier to clean up the rotting code turned out to be buggy (still in dev, of course).

After a conversation with a colleague about strategic improvement (planned) vs. just doing it because it needs to be done (unplanned), I abandoned the effort of making the poorly written and poorly maintained code more OO. I went back to the unit tests and found that a bunch were failing. I fixed those up and continued with a little more reorganization. I added the missing test, made it pass, and now the defect is repaired.

All in all, I'm glad I went down the path I did, since it offered a valuable experience and lesson. I learned a great deal about the existing code, which is valuable when we consider strategic improvement. Because I backed out, I didn't have to learn the hardest way: buggy code in QA or, worse, prod. The overhaul may or may not have caused more harm than good, but one thing is for sure: it was taking more time.

If you have the time planned (as we did in this particular case), you can use it to learn a great deal, but don't abuse it. Leave options open. One way to do this is to branch if possible; if not, create new classes and files, then hook them into the existing code in a minimally invasive way so you can undo the changes easily without affecting existing code. This approach allowed me to bail on plan A and go to plan B when it was clear that plan A was doomed. I hadn't even integrated the changes from plan A into the existing code base yet, so it was easy to abandon them.

Friday, March 20, 2015

Pasta, Webs, Balls of String and Rats Nests

All of these things have something in common. I put them in the title because that's what I've been looking at lately. It reminds me of an episode of Hoarders: there's code in there that might be useful someday, that we just don't want to pitch out, and frankly, code we keep because we just don't know how it all works and holds itself together. In fact, if we remove the wrong thing in an attempt to clean up, the whole pile can cave in on us.

Over the years we've cleaned up here and there and made more messes in other places. All in all, it will take some concerted effort to clean up the mess. But like hoarders sometimes do, we are delaying the inevitable and even passing it on to our next of kin to deal with, who will likely have their own messes to deal with.

Outcome: I don't have time to blog much this week because I'm cleaning up the mess that was left behind. Of course, I don't have time to clean up the mess in any real way, since that wasn't part of the plan, but now I'm forced to because a preexisting condition (a defect) is killing the current enhancement. Technical debt rears its ugly head, and now we pay some back.

Tuesday, March 17, 2015

When Your Test is Defective

It helps to design and verify the test together with the domain expert. This saves much time downstream.

I had a defect in a system that I partly wrote last year. The defect was related to how we were fetching data from another system, and it boiled down to a misinterpretation of the requirements. When it came into my work queue, I decided to take a pseudo-BDD approach.

The business analyst joined me at my desk and we wrote the acceptance criteria together in Gherkin syntax. Next, I created a unit test via MSTest with the help of the Moq framework. Then I implemented the fix.
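The scenario below is a hypothetical stand-in (the post doesn't give the actual domain or filter values), just to show the Given/When/Then shape we wrote together:

```gherkin
# Hypothetical scenario; the real feature, entities, and rule are not
# given in the post.
Feature: Filtered data retrieval
  Scenario: Only active assignments are returned
    Given a client "Acme" with an active assignment for "Pat"
    And a client "Globex" with an inactive assignment for "Sam"
    When I request the filtered assignment list
    Then the list contains only the assignment for "Pat"
```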

I had to break the logic out first in order to run the test properly; the original class is more like a big run-on chapter with several too many methods. It could be a facade, except that it's all implemented in one fell class.

After making my logic testable, I created some fixtures right there in the test class. I needed two fixtures, one for each entity involved in the source data. In reality, each comes from a separate source, but they are related, similar to how a client and its assigned employees would be related.

With my fixtures in place, I set up fakes for the data access classes, which now return my fixtures instead. The point of using fixtures is that the tests can be deterministic when analyzing the results: fixture data does not change through some external force the way data from a database can.

Next, I executed my method that gets the filtered data. This method grabs the data from both sources, joins it together to return the data I need, then passes it through a filter. That filtering is where the business rule rubber meets the road. And I got it wrong on the first pass...
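Here's a minimal sketch of that shape: fixture data for two related entities, joined and then filtered. All names and the filter rule are invented for illustration; the real predicate is the part I initially got wrong.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entities; the post likens them to a client and its
// assigned employees.
record Client(int Id, string Name);
record Assignment(int ClientId, string Employee, bool Active);

class Program
{
    static void Main()
    {
        // Fixtures: fixed data standing in for the two external sources,
        // so the result is deterministic.
        var clients = new List<Client> { new(1, "Acme"), new(2, "Globex") };
        var assignments = new List<Assignment>
        {
            new(1, "Pat", true),
            new(2, "Sam", false),
        };

        // Join the two sources, then apply the business-rule filter --
        // the predicate is where the rubber meets the road.
        var filtered =
            from c in clients
            join a in assignments on c.Id equals a.ClientId
            where a.Active
            select $"{c.Name}: {a.Employee}";

        foreach (var row in filtered)
            Console.WriteLine(row);
    }
}
```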

You see what happened was...

Three things. First, we didn't have a complete setup in our fixture, because we hadn't learned that lesson yet - we're new at this BDD thing. Second, I set up the fixtures wrong. And third, I implemented the initial query predicate wrong.

The first and second were resolved when the BA and I worked together to go through the test more thoroughly. The third was resolved after I did some testing against real test data via the data access API. It turns out I had misinterpreted the meaning of one of the filter values.

Ultimately, several lessons were learned here.

Many forms of testing are ultimately needed to produce real working code.

Fixtures defined by the acceptance criteria should be passed to the tests in an automated way to avoid that kind of copying error.

That last point would also have forced me to write a domain-specific object, an aggregate of the two entities with only the properties I really needed for the filtering. I took some shortcuts there.

Domain experts and code experts working together...priceless.

Thursday, March 12, 2015

Patterns for List-Based Pub/Sub with WCF and MSMQ

The topic of using queues came up today. The debate was between contract-based topics and content-based topics. In WCF, a pub/sub service can use MSMQ to receive and relay messages to subscribers. We only considered the list-based subscriber model, with transient subscribers as a potential case. Broadcasts and other patterns were out of scope for the conversation.

The debate was about whether to have an explicit event operation on the service contract describe the topic, or to have a data-based description of the topic. In other words, would an event be published as follows:

IEmailSentEventPublisher publisherProxy =
  factory.GetProxy&lt;IEmailSentEventPublisher&gt;();
publisherProxy.EmailSent(emailId);

or in a data-controlled way, as follows:

IEventPublisher publisherProxy =
  factory.GetProxy&lt;IEventPublisher&gt;();
publisherProxy.PublishEvent(eventInfo, data);

where eventInfo contains metadata about the event, such as the topic ("email", "success", etc.), and data holds the contextual data, such as the id of the email in the database.

or perhaps a variation of the latter:

IEventPublisher publisherProxy =
  factory.GetProxy(eventInfo);

where the factory sets up the proxy to send the event metadata in the message headers.
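To make the two contract shapes concrete, here's a plain-C# sketch. In real WCF each interface would carry [ServiceContract] and one-way [OperationContract] attributes; all names, the EventMetadata record, and the console implementation are hypothetical.

```csharp
using System;

// Contract-per-topic: the operation itself names the event.
interface IEmailSentEventPublisher
{
    void EmailSent(int emailId);
}

// Data-driven topic: one generic operation; the topic lives in the payload.
record EventMetadata(string Topic, string Outcome);

interface IEventPublisher
{
    void PublishEvent(EventMetadata metadata, string data);
}

// Console stand-in for a real WCF/MSMQ publisher.
class ConsolePublisher : IEmailSentEventPublisher, IEventPublisher
{
    public void EmailSent(int emailId) =>
        Console.WriteLine($"EmailSent: {emailId}");

    public void PublishEvent(EventMetadata metadata, string data) =>
        Console.WriteLine($"{metadata.Topic}/{metadata.Outcome}: {data}");
}

class Program
{
    static void Main()
    {
        var publisher = new ConsolePublisher();
        publisher.EmailSent(42);
        publisher.PublishEvent(new EventMetadata("email", "success"), "42");
    }
}
```

The trade-off in miniature: the first style gives compile-time checking per topic but a growing contract; the second keeps one contract but pushes topic validation to runtime.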

Wednesday, March 11, 2015

Quick Assignment Shortcut in JS

This is an old trick, but a good one. I showed a seasoned developer this one yesterday so I figured it's worth sharing here.

    var ns = ns || {};  // keep ns if it already exists, otherwise create it

Another variant is:

    var a = a || b;  // default a to b when a is undefined (or otherwise falsy)


Friday, March 6, 2015

More about Separation of Concerns (SOC)

It slices, it dices, it looks up data, it bakes cookies, it does business logic...and more! It's the incredible security class!

I've always gotten a headache every time I tried to understand the security logic inside this all-in-one class. That's bad, since the original devs are long gone. Finally, I began to refactor the beast!

My ultimate goal is to move away from using cookies, or at least to have a choice. But the work I've been doing offers so much more.

Here's what I've done so far...

First, everything that consumed the cookie directly in the code now consumes a UserSecurityReader abstraction that has getters to read each value from the cache (that's as much as the consumer knows). This allows the consuming code to flow freely without having to know the details of how user settings are persisted. Also, it gives us the ability to swap out the caching model at any time in the future without having to update tons of consumer code.
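A sketch of that abstraction. The interface, value names, and dictionary-backed implementation are all hypothetical, but the shape is the point: consumers see only the getters, so the storage behind them can change without touching consumer code.

```csharp
using System;
using System.Collections.Generic;

// Consumers read user security values through this abstraction
// instead of touching the cookie directly.
interface IUserSecurityReader
{
    string GetUserName();
    bool GetIsAdmin();
}

// One possible implementation backed by a cookie-like key/value store;
// swapping in a different caching model later only touches this class.
class CookieSecurityReader : IUserSecurityReader
{
    private readonly IDictionary<string, string> _values;

    public CookieSecurityReader(IDictionary<string, string> cookieValues) =>
        _values = cookieValues;

    public string GetUserName() => _values["userName"];
    public bool GetIsAdmin() => _values["isAdmin"] == "true";
}

class Program
{
    static void Main()
    {
        IUserSecurityReader reader = new CookieSecurityReader(
            new Dictionary<string, string>
            {
                ["userName"] = "jdoe",
                ["isAdmin"] = "true",
            });

        Console.WriteLine(reader.GetUserName());
        Console.WriteLine(reader.GetIsAdmin());
    }
}
```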

Next up, the cookie setting logic. The app sets a cookie in the user agent on app startup. The global class consumes the security class which does everything in one shot. From a consumption standpoint, this is great! The implementation however is another matter.

I decided to pull everything except the cookie-setting logic out of this class. I made a builder and what is effectively a lightweight model of a user. The builder handles most of the business logic for now: it gathers data, processes some business rules, and constructs the model. I wanted to keep any data access out of the model so that it can be passed around more freely. It also breaks some dependencies that could be issues for consumers.

Let's say we wanted to centralize the builder in a service. That service would need references to some other services. Whatever consumes the model, whether it caches the data or not, should not be forced to carry the same dependencies.
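A rough sketch of the builder/model split, with invented names and stubbed data access: the model carries no dependencies, so anything can cache it or pass it around without dragging service references along.

```csharp
using System;
using System.Collections.Generic;

// Dependency-free model: plain data, safe to pass around or cache.
record UserModel(string Name, IReadOnlyList<string> Roles, bool CanApprove);

// The builder owns the data gathering and the business rules.
class UserModelBuilder
{
    // Stub standing in for real data access / service calls.
    private static List<string> FetchRoles(string userName) =>
        new List<string> { "Editor", "Approver" };

    public UserModel Build(string userName)
    {
        var roles = FetchRoles(userName);

        // Example business rule, invented for illustration.
        bool canApprove = roles.Contains("Approver");

        return new UserModel(userName, roles, canApprove);
    }
}

class Program
{
    static void Main()
    {
        var model = new UserModelBuilder().Build("jdoe");
        Console.WriteLine(model.Name);
        Console.WriteLine(model.CanApprove);
    }
}
```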

I did come across one business rule that crosses the boundary. My next step is to remedy that. I'm planning on either separating all of the security business logic from the builder or finding another alternative. The first option seems best right now, though there may be some redundant data access calls that I would like to avoid. It shouldn't be too difficult to put it all in the right order.

Wednesday, March 4, 2015

What Does an Interface Do?

I've been knee deep in HTML5, web media, graphjs, and fftjs, trying to fit it all together to produce a radial graph of an audio spectrum. I've also been toying around with cryptographic algorithms in graphic form. Essentially, I'm creating cs-art: that is, art based on computer science.

In the meantime, I've taken on a mentee. The project we are undertaking is a library for analyzing server data such as running services, uptime, disk capacity, etc. While there is some degree of reinventing the wheel, we are taking a behavior-driven development (BDD) approach, using SpecFlow (Cucumber for .NET) as our automated user acceptance testing tool.

Currently, we have one scenario defined. In implementing the test, we created an IServer interface. Upon reflection, I think this interface is more like a facade than what an interface should be. Let me clarify.

The current interface definition contains one method, GetServices, which returns an IEnumerable of WindowsService (our own DTO). Perhaps this contract should be specific to what we need to do rather than to what the object is. In this way, we would be better aligned with the interface segregation principle (ISP).

Perhaps we have an interface called IWindowsServiceReader.

This definition makes the consumer's intent clear; interfaces are all about the consumer. The implementation can live in one or many classes, and one class can implement several interfaces.

So, if we have several interfaces that read server info, we may have a single class called WindowsServer2012 that implements IWindowsServiceReader, IWindowsTaskReader, IDiskInfoReader, etc. ISP at its finest!
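A sketch of that layout with hypothetical members (the DTO's properties, IDiskInfoReader's method, and the stubbed values are invented): several narrow, consumer-focused interfaces, one class implementing them all.

```csharp
using System;
using System.Collections.Generic;

// Our own DTO, per the post; the members here are invented.
record WindowsService(string Name, bool Running);

// Narrow, intent-revealing contracts: named for what the consumer needs.
interface IWindowsServiceReader
{
    IEnumerable<WindowsService> GetServices();
}

interface IDiskInfoReader
{
    long GetFreeBytes(string drive);
}

// One class can satisfy several of the small interfaces.
class WindowsServer2012 : IWindowsServiceReader, IDiskInfoReader
{
    public IEnumerable<WindowsService> GetServices() =>
        new[] { new WindowsService("Spooler", true) };

    public long GetFreeBytes(string drive) => 1_000_000; // stubbed value
}

class Program
{
    static void Main()
    {
        // Consumers depend only on the slice they actually use.
        IWindowsServiceReader reader = new WindowsServer2012();
        foreach (var svc in reader.GetServices())
            Console.WriteLine($"{svc.Name}: {svc.Running}");
    }
}
```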