Monday, October 9, 2017

Cause of Death: Optimization

So you've got this team - developers, business reps, product managers, QA engineers - and they all need to work toward building out products that add value to the business (directly, by reducing risk, or by increasing efficiency). The business has some goals. You have some goals. Bonuses, career advancement and, frankly, job security are tied to those goals.

At the same time, it's difficult to find the right people. There are plenty of applicants, but who can be trusted? Who has the right level of expertise to jump right in and get things done? The backlog is growing, the work is piling up, and everyone wants to know whether you can deliver and, of course, when.

The New York Times wrote up a great article on how to find the right candidate. The thing is, once you've found the right candidate (well-qualified, energetic, enthusiastic, etc.), what happens next depends. Let's walk through the process and see why.

Ramping up


In order to make good on your deliverables, you need more headcount so you can get more done. You're at capacity and just can't ask your people to put in any more. Once that's approved, you have the task of finding the right person. You need to throw someone in straight away and start knocking things out because you're way behind already and, in order to justify the added cost, you want to show results.

So now you ask around internally - this is the standard pattern - to see if anyone knows someone who is a ninja at xyz. I'll tell you right now: everyone who is a ninja is currently working. You'll need to do some sales - or, typically, hire a salesperson (a recruiter).

So now you're in it with the recruiters for about 20%, give or take. For a $100k salary, that's $20k...say, I bet that $20k would make a great training budget! And the person you really think you need actually costs more than $100k. In a supply and demand economy where supply is limited and demand is high, you're on the downside of that. So figure $130k, plus the recruiter fee and all the other costs.
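To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 20% fee and the two salary figures come from the paragraph above; the function name is made up for illustration, and the "other costs" are left out because they aren't quantified here.

```python
def recruiter_fee(salary, fee_percent=20):
    """Recruiter fee in whole dollars, assuming a flat percentage of base salary."""
    return salary * fee_percent // 100

# The two salary figures mentioned above.
for salary in (100_000, 130_000):
    fee = recruiter_fee(salary)
    print(f"salary ${salary:,} -> fee ${fee:,} -> first-year outlay ${salary + fee:,}")
```

Swap in your own fee percentage and salary band; the point is only that the fee alone is already a respectable training budget.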

After the Onboarding

Once you've shelled out the upfront costs and this maverick is on your team, they'll need to be brought up to speed. For a while this will slow down overall production. One or more of your people will need to work regularly with the newbie to get them up on the tools, practices and procedures.

It'll take some time before the dust settles. Let's estimate 4-6 months, depending on the learning curve, for a well-designed, documented, unit tested, and maintained system...1-2 years for most systems...longer for train wrecks. You are likely to be in the 1-2 year range, and please don't lie to yourself, because that's counterproductive and we're focusing on being more productive. If you know I'm wrong in your case, then you probably don't need me to tell you all of this anyway; you can close the tab and carry on as you were.

Once the Dust Settles


If you're still with me and haven't gone back to balancing your budget or sipping your morning coffee while touring your friends' feeds, I'm glad you decided to stay because it's about to get real.

Taken on the surface and without considering long-term stability, hiring the right person in that moment is a greedy optimization. Greedy optimizations seek to optimize the whole by optimizing each part on its own, with the hope and assumption...well...that doing so will improve overall performance. Active stock traders often attempt to do the same. Rarely does it work out; one reason is that active traders are likely to sell when things go south. In stock trading, emotions sabotage long-term results. In the same way, needs-of-the-moment hiring practices sabotage the long-term health of the team when they aren't considered along with overall team dynamics, culture, and performance implications.
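If "greedy optimization" is a new term, a toy example may help. The sketch below uses the classic coin-change puzzle (my choice of illustration, nothing to do with hiring data): the greedy version always grabs the biggest coin that fits, while the second version looks at the whole problem. The function names and coin values are assumptions made up for the example.

```python
def greedy_coins(amount, coins):
    """Always take the largest coin that still fits (the locally best move)."""
    picked = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            picked.append(coin)
            amount -= coin
    return picked


def optimal_coins(amount, coins):
    """Dynamic programming: build the best answer for every total up to amount."""
    best = {0: []}
    for total in range(1, amount + 1):
        candidates = [best[total - c] + [c] for c in coins if total - c in best]
        if candidates:
            best[total] = min(candidates, key=len)
    return best.get(amount)


coins = [1, 3, 4]
print("greedy: ", greedy_coins(6, coins))   # grabs the 4 first, then pays for it
print("optimal:", optimal_coins(6, coins))  # passes on the 4 and needs fewer coins
```

Grabbing the 4 looks like the best move in the moment, yet it leaves you needing three coins where two would do. Same shape of mistake as the needs-of-the-moment hire.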

A rockstar-unicorn-ninja-whatever is not going to solve your productivity issues if those issues arise from broader optimization issues - morale, communications and team cohesion especially. What WILL happen if there are deeper rooted issues is that the ninja will come on for a bit, do some things, and ninja-smoke off once they figure out how deeply rooted the issues are, because those issues degrade the three factors most important to said ninja - autonomy, mastery and purpose. Not only do we strive for those three, but we are inherently social beings - our nature calls for social well-being. When our social well-being is not upheld, we fall short of our potential.

And you're still on the downside of the market, so there's that problem. I will guarantee you that salespeople are at it regularly, pitching the great folks on your team. Any given one of them has an inbox full of emails and messages from recruiters. The other side of most of those openings is a mirror image of yours, in that they're either adding or replacing staff because they've got a pile of work to get done yesterday. Some aren't, but most are.

Solutions

You've got to offer something they don't. Compensation, bonuses, and benefits are a given in any decent market. Anyone worthwhile can get those anywhere. Your business may operate in non-IT markets and perhaps is on the upside of the supply chain there (unlike in IT). But in your technology departments you're on the demand side - you need good people and the supply is limited.

In order to combat such a position and come out ahead, you have to optimize for the long haul. There may be some short-term goals that occupy much space in the present, but even if those are met you may end up in a position that doesn't set you up well for the next steps, the next goals...it's all about being strategic! Look for more global optimizations rather than local ones - unless the local optimizations truly do lead to a global optimum.

What you've got to do is focus on 2 things - optimizing your workflow and retaining talent. Both of those involve investing in the talent you've got on hand. If you are willing to take a $20k hit for hiring new personnel, or a $120k+ hit annually, you certainly have the ability to work on improving the personnel you already have. Additionally, you can certainly afford to improve your system or process. Take a serious look at any bottlenecks - technical or otherwise - and address knowledge gaps, personnel conflicts and process issues BEFORE adding more calamity to the mix by hiring.

Get out there and spread some buzz in the developer community about your organization. When you're ready to hire - when your house is in order and you are prepared to properly on-board - you'll have developers coming to you rather than having to hard-sell. Basically you'd be marketing which makes it easier to sell. Having a great team, process, and organization to brag about is a great way to attract the top talent that you will now be able to retain.

Non-Solutions


Consider hiring all rockstar devs. They may all be fantastic in their own right. And at the same time if they can't work well together or communicate or the process hinders productivity anyways or they end up building the wrong things - it doesn't matter how good each one is.

And how about the case of pushing to get a release done? By pushing I mean everyone working late and just churning away...dropping principles and mounting up debt to get the thing done. That's a way to burn folks out and, in this heyday of IT, that's a recipe for exodus. As far as turnover goes, going back to the active traders, fees are costly and chew away at the gains. You've got to have retention for the long haul.

But that's not the only danger with churn and burn. What's left behind may end up slowing things down significantly in the future. Move too fast, slap things together, drop important practices and time for reflection, team building, etc., and it'll only lead to further entrenchment of bad habits - a vicious cycle that feeds on itself. This has been well-known for a long time now.

Adding people will always slow things down for a while. Adding them to a late project is a classic mistake that no one should be making anymore.

Outcomes of Successful Solutions

Once you've applied a successful solution, you should be able to compare the outcome to the baseline. That's right, make sure to capture your current state so that you can KNOW that you've solved the problem.

You should be seeing better employee retention, improved skills, lower stress, higher morale, more engagement and proactive behavior, a lower bug count if you'd like to address that metric, and more valuable production if your process is really in order. That last one may be beyond your own sphere of control, though perhaps not beyond your sphere of influence. Since this post is about the productivity of your own group, it may well be out of scope for now. That's arguable, however, since your productivity should really be measured by the actual value delivered. I'm sure to do a post in the future about that.

Friday, October 6, 2017

Cultural Evolution Part Three

First off, I must apologize for skipping over the topic I promised in the last post in this series. I'm working very hard on that promised post about the algorithm that looks good on paper. Something has come up during the course of a book I'm writing that I feel an urgency to blog about. It fits nicely with the theme of this series, so here it is.


Think about how we are biased from a very young age to play against other teams. To compete and even to think of the others as lesser beings or the enemy. We play soccer, softball, football and chess with the intent to beat the other side and celebrate victory. In our passive involvement in professional spectator sports, we get fired up about our team and even hurl insults at fans of the opposing team in and out of the stadiums and parks. Some of these are our colleagues we work with, others just fellow humans - probably good people too. But that's how we're biased from such a young age. It crosses political boundaries as well - cities, states, nations. As well as cultural boundaries. Always there is this notion of "the others".


In business, and at your company, you've got many teams working in different areas. This built-in "team bias" has to be overcome if your company is to act as a whole. While there will always be some level of internal competition, for better or worse, you can tilt the scales toward better by setting and socializing the overarching goals of the company. In this way all of your players will be running in the right direction.


How does this help? Setting, socializing, and tracking business goals at the organizational level puts everyone together on the same team. Who or what do you compete for in that scenario? Rather than competing between departments, units, or teams, the competition is aligned toward achieving those goals. As always, the organization's goals need to be in alignment with its mission, vision and values!




Thursday, September 28, 2017

Cultural Evolution Part Two

Yesterday I posted about how I evolved the process by improving the work tracking and planning tools. Today I will tell you about what we did that brought people together into a more cohesive team with high morale.


One day, I got an awesome board game as a gift (thanks to my amazingly thoughtful wife). It was a fantastic little strategy game that anyone could play. Gameplay involved dynamic shifts in both political alliances and strategy. This game was called Quoridor. There is beauty in its simplicity. It's easy to pick up, which makes it the perfect catalyst for bringing people together for some clean, competitive fun. We played over lunch maybe a couple of times per week. Our manager joined us and we all had a good time together!


What inspired this? Something stuck with me from when I was in the live sound business. One day, while working a gig at the Cadillac Theater I saw the house crew playing chess backstage to pass the time between setup and performance. There's a lot of hurry-up-and-wait in that business and usually you'd try to get a little shut eye since the nights were late and the days were long. I recall many days of just sort of sitting there trying to get some rest or exploring; but playing chess was how I really wanted to pass the time. That was the only crew I ever saw passing time that way and it stayed with me.


When I started my development career, I was itching to play some strategy games with colleagues. We'd go out to lunch together and they even had a Foosball table in the lunchroom. But none of us really spent a lot of time in the lunchroom because it was on another floor. Game playing started small, maybe a couple of us. Then we had some regular players and swapped others into the fun.


It was great for morale and a fantastic way to take a break from the work and go back in with a fresh start. Another benefit was the way it brought people together. The benefits of bringing teams together are enormous...companies will send their teams on weekend retreats for such things. They'll have weekend events with BBQs and after-work parties. Those are all well and fine, though often a little forced - those mandatory extra-curricular activities that disrupt busy personal schedules. On the other hand, game playing over lunch is easy, optional, and not disruptive.


Something like this has to be driven by the team! And it needs to be supported by management. There's a fine line there. Sometimes a manager can just be a person and this is the context for it. At the same time, managers can have a BIG influence in the success of such things by jumping in and especially by not shutting something like this down. Heck, take some time for these things during working hours! Maybe on that Sprint Wrap up day after you've shown off your accomplishments for the Sprint. What else would you do for the rest of the day? Try to jam in another feature? No way! Sharpen the saw! Do games, coding challenges, put together a jigsaw puzzle...whatever the team decides!


I know there are some objections to this (I've encountered them)...we're not being paid to play games, we're being paid to build features. That's true. Fortunately, teams that play together stay together, and there's no better way to get more done than with great team cohesion and high morale. It's an investment in a more productive team. So keep on organizing the work at a nice steady pace and helping each other out (by working together) and you'll have the time anyway - it's a positive feedback loop!


Next post I'll talk a bit about local optimization vs overall optimization as it applies to teamwork and workflow. Here's a hint - there's a class of algorithm involved.

Wednesday, September 27, 2017

Cultural Evolution

I've been thinking a lot lately about organizational culture and how it might evolve. Actually, I've always thought about this and since my entrée into the software world I've applied myself to evolving both culture and process. I've worked through the evolution of tools and practices that enable collaboration. Here is one way I've applied tool improvement to help along the first team I worked with.


This team used a SharePoint list to track work - each work request had an entry and was printed out and handed to the developers with screenshots and a bit of write up about what was to be done. The process worked ok because when you went to the list you would be able to filter it and find your work. There was room for improvement so I jumped right into it like this -


First, I created a view for developers (where assigned to = Me). I shopped this around to other developers and my manager. She saw the potential there and asked for a view for managers. We briefly discussed and I went off and created it.


At the same time the managers changed their practices a bit and started using the views to plan on a weekly basis with quick checkins each morning. Afterwards, it was much easier for everyone to know what they should be working on. Micromanagement wasn't an issue. Managers managed the work flow and distribution, not each individual's work. Having the right tools in place helped with that because they could see right away where things were - no need to hover or constantly interrupt to ask or waste time in meetings that should be used to discuss impediments.


Pro tip: Managers need to know how the work is going - proactive updates will keep them informed. Imagine if your mom were in a long surgery and you were left for hours and hours wondering how things were coming along. Or maybe you have something in for repair for weeks with no word about how it's going. Not knowing can be troubling, and managers have to answer for those things to their own managers or to clients.


Hopefully sharing this experience helps to illuminate the value in having the right tools to track and communicate about how the work is going. This form of passive communication enables anyone to check in on the work without bugging or micromanaging (both counterproductive activities).



Wednesday, September 20, 2017

Amazon Cognito?

I'm looking into interfacing with Amazon Cognito for user account management. I'm leaning toward abstracting it rather than using it directly in the web code via their JS SDK - maybe by using a server-side SDK instead. I could just use what ASP.NET has built in, which seems like it might be a bit easier...the documentation for Cognito isn't straightforward, as is the case with all their docs.

Wednesday, August 23, 2017

AWS Elasticache, Memcached, and Enyim

I've been knee deep in caching lately since I've been looking into an issue revolving around AWS Elasticache and Enyim. I've learned a lot so I'll share some things. Let's start at the beginning -

Enyim is a client for memcached, which is a cache server - basically an in-memory key-value store with a timer on each value so it can be expired after a certain amount of time to free up memory.
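If the "key-value store with a timer on each value" idea is new to you, here is a minimal Python sketch of it. This is a toy model for illustration only, not how memcached is implemented; the class and method names are made up, and the injectable clock is there just so the expiry can be demonstrated without waiting.

```python
import time


class TinyCache:
    """Toy in-memory key-value store where every value carries an expiry time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (value, expires_at)

    def store(self, key, value, ttl_seconds):
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None              # never stored: a cache miss
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]     # expired: free the memory, report a miss
            return None
        return value


# Demo with a fake clock so we can "wait" instantly.
now = [0.0]
cache = TinyCache(clock=lambda: now[0])
cache.store("my data key", "someData", ttl_seconds=30)
fresh = cache.get("my data key")   # still inside the 30 second window
now[0] += 31
stale = cache.get("my data key")   # the timer ran out, so this is a miss
```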

memcached itself doesn't support clustering but Enyim can be configured to allow clusters. So you can have 2 or more memcached instances on one or more servers and use Enyim to talk to them all as if they were one cache...sort of.

Since abstractions leak, Enyim, as an abstraction of multiple cache nodes, leaks in its own way. When a node goes down, you get nothing back in the client. Well, that's not quite true: you get Success == false and StatusCode == null, that's what you get. Those are the default values for the Result object. And when you do get that, it means a cache node went down in your cluster - but not to worry!

Here's an interesting thing about those cache clusters and how Enyim manages to use them. Enyim uses a hashing algorithm (actually, there are several you can select from) which shards based on the key and the number of nodes in the cluster. It picks a node based on the key. Additionally, provided the servers are added to the cache client in the same order, it will pick the same node no matter where you are calling from. You could be in any number of producers and consumers of the cache and it will pick the right node.


Let's say you've got 4 nodes and you have a key 'my data key'.

Let's say you use Enyim to Store("my data key", someData, ...) and hypothetically that key hashes to a match for node 3. 

Now when your consumer of that data - could be on a different server in a different part of the world for all it cares - calls Get("my data key"), that key will also hash to match server 3 and you'll get 'someData' in the response!
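To make the "same key, same node" idea concrete, here is a small Python sketch using plain modulo hashing. To be clear, this is NOT Enyim's actual algorithm (it has its own node locators); the function name and node addresses are made up. It only shows why the result is the same everywhere as long as the node list is in the same order.

```python
import hashlib


def pick_node(key, nodes):
    """Map a key to one node: hash the key, then take it modulo the node count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(nodes)
    return nodes[index]


# Hypothetical node addresses, listed in the same order on every client.
nodes = ["cache-1:11211", "cache-2:11211", "cache-3:11211", "cache-4:11211"]

producer_choice = pick_node("my data key", nodes)        # the side that stores
consumer_choice = pick_node("my data key", list(nodes))  # a different process that reads
shuffled_choice = pick_node("my data key", nodes[::-1])  # same nodes, wrong order
```

The producer and the consumer land on the same node without ever talking to each other. Reverse the list and the same key lands somewhere else, which is why the configured order matters.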

Internally, Enyim calls those nodes servers. It marks each server as IsActive when it makes a successful socket connection to the cache instance. When you use the Enyim CacheClient to make a call to the cache and the socket cannot connect, it marks the server/node as inactive (IsActive = false) and adds it to the dead server queue.

Don't worry though, it tries to reconnect to the servers in the dead queue. When the connection is re-established, the node automatically becomes available in the cache cluster again. There is a caveat when it fails, though. When your client calls and the node is dead, the node isn't marked dead BEFORE the call, but after. It's up to you to test for those conditions (null StatusCode) and retry the call so that it WILL use the next node in the hash ring.

In the same scenario above, let's say node 3 is unreachable. Well you could see how network connectivity could be an issue if they aren't co-located. In that case it could be down for only some clients but not others. Let's ignore that issue for this scenario and say the server is being rebooted for patching or something.

Here's what will happen...your consumer will make the first call and the key will resolve to node 3. Enyim will attempt to establish a connection to that node. When the socket connection times out it will mark the server as "dead". It will return Success == false and not set a StatusCode in the response object. The VERY next call will resolve to the next server for that cache key while the node is down.
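That sequence is easier to see in a small simulation. The Python below is a toy model of the behavior just described, not Enyim's code: the class names, the modulo hashing, and the numeric status codes are all assumptions made up for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Result:
    success: bool = False       # defaults mirror the "nothing came back" case
    status_code: object = None
    value: object = None


@dataclass
class Node:
    name: str
    reachable: bool = True      # the real state of the server
    is_active: bool = True      # what the client currently believes
    data: dict = field(default_factory=dict)


class ToyCluster:
    def __init__(self, nodes):
        self.nodes = nodes

    def locate(self, key):
        """Hash to a starting node, then walk forward to the first active one."""
        start = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % len(self.nodes)
        for offset in range(len(self.nodes)):
            node = self.nodes[(start + offset) % len(self.nodes)]
            if node.is_active:
                return node
        return None

    def get(self, key):
        node = self.locate(key)
        if node is None:
            return Result()
        if not node.reachable:
            node.is_active = False   # marked dead only AFTER the failed call
            return Result()          # Success == false, no status code
        if key in node.data:
            return Result(True, 0, node.data[key])
        return Result(False, 1)      # an ordinary miss (illustrative status code)


cluster = ToyCluster([Node(f"node-{i}") for i in range(1, 5)])
home = cluster.locate("my data key")
home.reachable = False               # e.g. rebooted for patching
first = cluster.get("my data key")   # fails, and only now is the node marked dead
second = cluster.get("my data key")  # routed to the next node instead
```

Notice that the second call is an ordinary miss with a status code: the next node is alive but has never seen that key, so the consumer has to be ready to repopulate the cache.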

With Elasticache in the mix, there's another layer of abstraction in play. Elasticache offers its own grouping and node discovery so that you don't have to specify each node address manually. This means you can add nodes without re-configuring the client. When using this option, you should definitely use Amazon's cluster configuration for Enyim. It handles the node discovery for you, meaning you only configure 1 endpoint per cluster. This is a good thing, but what they haven't put in the mix is handling node failures on their end. It would be nice if they did, but they don't. So just think about that...

Still, when you are using cache clusters in Elasticache, the best way to handle a node reboot is to simply make the call and the amazing Enyim will do its alchemy and switch to a live node until the dead one is back up and running again. Works the same with adding and removing nodes too. It's all automagic!

Summary -

When using Enyim, just put some retry logic in your code - preferably in your own wrapper class so you aren't repeating yourself everywhere and forgetting to do it somewhere.
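Here's roughly what that wrapper logic could look like, sketched in Python rather than C# so it stays independent of any particular client API. Everything named here is hypothetical: the helper, the result fields (modeled on the Success and StatusCode values described above), and the fake cache call used for the demo.

```python
from types import SimpleNamespace


def call_with_retry(operation, max_attempts=2):
    """Run a cache call; retry only when the result looks like a dead-node response."""
    result = operation()
    attempts = 1
    # A failed result with no status code is the "node just went down" signature.
    while attempts < max_attempts and not result.success and result.status_code is None:
        result = operation()
        attempts += 1
    return result


# Demo: a fake cache call that fails once (dead node) and then succeeds.
responses = iter([
    SimpleNamespace(success=False, status_code=None, value=None),
    SimpleNamespace(success=True, status_code=0, value="someData"),
])
calls = []


def flaky_get():
    calls.append("get")
    return next(responses)


result = call_with_retry(flaky_get)
```

A genuine miss comes back with a status code, so it is returned as-is and doesn't burn a retry. Keeping this in one place means every caller gets the same behavior for free.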

Wednesday, August 16, 2017

Where to Begin the Agile Project

So, you're a programmer and you get a new project to work on. And let's assume this project involves some data source(s), probably some persistence, and user interaction. How do we approach the construction of this system?


In the first place, we'll want to understand what the system needs to do and what it will generally "look like". When I say look like, I mean how it will generally be architected - how does the data flow? What layers? What components? Where do they interface? This involves some basic design and there are several patterns to draw from that would get the job done.


Either way you design the general structure of the system, you'll need to understand the data flow. You'll be presenting some data to a user and that data will come from somewhere. It seems obvious that the first step is to begin where the data flow starts...begin at the beginning. However, I say NO!


Through years of trials and errors of beginning at the beginning, I've found that you generally have to change things near the data source at least a few times throughout the life of an "Agile" project.


I've spent much time modeling databases and building data layers for applications only to find that once the front-end was built on top, the data model had some fundamental flaws. The main trouble is that those flaws only came out when the business stakeholders were able to interact with the system - when they were working with something concrete. Only then did the details truly solidify. But by then, there were layers built up below the presentation layer - interfaces, implementations, tests - that all needed to change to support this new understanding. This leads to much rework!


If we can have a better understanding of the system before there is a full foundation baked into the system, then we can reduce the amount of wasteful rework on a project.


To solve this problem, we have some options. We can build wireframes and prototypes in a Design Sprint. This sort of design-up-front is useful inasmuch as our understanding of the system is complete at the beginning of the project. If the requester has a good general understanding, mockups will be relatively more useful - they may trigger insight and understanding earlier in the "Agile" project. It's a way of saying "if I were to build this, would it solve your problem?" I'm for it and it does a lot of good, but it's not where we should end the game!


Another technique I like to use, and I'm urging you to do the same, is to "start at the end"! By starting at the end - meaning the desired outcome of the feature we're building this cycle - we can set some concrete earlier in the sprint (preferably on Day 1 - Sprint Planning). If you're building an analysis system, start with the graphs and charts. Mock up the data that feeds them. Then build backwards from there. It's best to find out how the system will provide value. Is it an invoicing system? Start with the invoice itself! Work back to entering the billing info.
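As a rough sketch of what "start with the invoice itself" can mean in code, here is a tiny Python example. The invoice renderer is the real thing the stakeholder gets to react to, while the data behind it is mocked. All names, fields, and amounts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    description: str
    quantity: int
    unit_price_cents: int


def mock_line_items():
    """Stand-in for the billing data layer, which has not been built yet."""
    return [
        LineItem("Consulting", 3, 15_000),
        LineItem("Hosting", 1, 4_999),
    ]


def render_invoice(items):
    """The 'end' of the feature: what the stakeholder actually looks at."""
    lines = [
        f"{item.description:<12}{item.quantity:>3} x {item.unit_price_cents / 100:>8.2f}"
        for item in items
    ]
    total_cents = sum(item.quantity * item.unit_price_cents for item in items)
    lines.append(f"{'TOTAL':<12}{total_cents / 100:>14.2f}")
    return "\n".join(lines)


print(render_invoice(mock_line_items()))
```

Once the stakeholder is happy with what comes out of render_invoice, mock_line_items gets replaced by the real data source, and only then do the layers underneath get built.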


When we do this - and it doesn't have to look perfectly pretty yet - we give the business stakeholder something to interact with BEFORE we build up all the supporting layers underneath it and put on the fresh coat of paint. It's like building a custom bike starting with the seat and handlebars first...then you put on the wheels and a temporary engine to test the rake of the front forks, footpegs, shifter, grips, brake levers, etc...the user interface. Then they know it's built to what they need - after all it is a custom!


But software is a bit different than building a hog. Users can't see or feel the engine directly. They interact through whatever UI you put on there - there can be mockups and design drawings and diagrams, but it's the concretion that will have the most meaning. Besides, mockups are throwaway - waste. You can build a mockup with the actual code, it's easy these days to get a basic UI up and running quickly. Better to build the real thing then iterate from there one little manageable decision at a time. Better to have to refactor early in one layer than have to chase a change up the whole stack!