Friday, August 29, 2014

Efficient SQL searching

If you don't want to use Fulltext indexing, or you need to order your search results by some criteria (which columns take precedence) here is a way that this can be accomplished with decent efficiency.

Suppose we have a People table in a Sql Server database and we want to find people by last name, then first name, then email address.

we can write the query as:

WITH lastNameBegins AS
(
    SELECT id,
    1 criteriaNum,
    ROW_NUMBER() OVER (ORDER BY lastName) AS rowNum
    FROM People
    WHERE lastName like @searchTerm + '%'
),
firstNameBegins AS
(
    SELECT id,
    2 criteriaNum,
    ROW_NUMBER() OVER (ORDER BY firstName) AS rowNum
    FROM People
    WHERE firstName like @searchTerm + '%'
),
emailBegins AS
(
    SELECT id,
    3 criteriaNum,
    ROW_NUMBER() OVER (ORDER BY email) AS rowNum
    FROM People
    WHERE email like @searchTerm + '%'
),
SELECT id, name
FROM People as p
JOIN (
    SELECT id, MIN(criteriaNum) criteriaNum, MIN(rowNum) rowNum
    FROM (
        SELECT * FROM lastNameBegins
        UNION
        SELECT * FROM firstNameBegins
        UNION
        SELECT * FROM emailBegins
    ) AS a
    GROUP BY id
) AS o
ON p.id = o.id
ORDER BY criteriaNum, rowNum;

And now you have a search that is ordered by criteria. The CTEs defined the ranking of the criteria, we union those together, there could be duplicates so we group those out and take the min vals for our ordering. I suggest limiting the size of the inner set 'a' for efficiency's sake.

Monday, August 25, 2014

Break from the Switch-Case

If you find yourself reacting to states via switch-case statements, consider using a GoF pattern called the State Pattern
http://www.codeproject.com/Articles/489136/UnderstandingplusandplusImplementingplusStateplusP
described in C#.

Basically, it uses state objects that know how to transition to other states. The main object has a reference to the current state. Possibly a state manager could offer more extensibility. A state manager would manage the transition between states instead of the state object itself. Another option could be to create a manager that wraps the main object, and the state object(s). Then a momento could contain the previous states. Whatever the situation dictates, break from switch-case.

Friday, August 22, 2014

Articles on Technical Debt

Here is a great article about the true cost of Technical Debt by Aaron Erickson. The author gives a good example of the cost over time and relates it to Financial Debt in a way that should make sense to management and appropriately supports the metaphor.

I read another article of his that talks about killing the project in favor of small releases. That was great too and can be found on the same site.

When I read the example of commissions, I immediately thought of a commission service that could be called from anywhere on the intranet to calculate commissions for the company. I thought of reports and of SSRS reports and realized how far behind it would be to write reports with a technology that does not support services as data sources.

Wouldn't it be great to write some business rule in a first class code language and have all consumers call that centralized logic to do the processing? It would be contained in a service of course, not in a DLL or some other form of shared code; if it were, each place that uses that code would have to update in order to get the new rule. So your change would still involve some Technical Debt a la Architectural Debt (your architecture introduces more future work required to change some core business logic that can change). IDesign refers to this likelihood of change as volatility.

This volatility factor is a natural occurrence in software, whether recognized or not.  Martin Fowler points out that this type of ignorance is wreckless, it is likened to going on a shopping spree and maxing out credit cards that you don't know you have to pay back. Then you will get the calls from the bill collectors. "The business needs to change xyz factor to yyz factor and now!" So now you don't sleep for a week and everyone is run ragged and angry because of this inexperience that cost just the right price.

In the case of Aaron's article, huge deals can run afoul and what was once an exiting merger can turn into a costly technical nightmare. I would like to hear of an example where this was the case. Office Depot and Office Max merged recently and stated the projected costs and benefits of the synergies of the merger. I wonder how that is going from a technical standpoint...

If I find anything, I'll surely have to blog on it.

Wednesday, August 20, 2014

Data Modelling and DB design


In a Sql Server Pro article from 2005
written by Michelle A. Poolet I encountered some great advice and a great example on how to normalize an existing table that is in first normal form (all the data in one table).

Not to pick on the author here, nor criticize, but I will go further with the normalization process for this database and take it from second normal form (where the author ended up) and go closer to third normal form by creating lookup tables for wood type, finish, size, and a whole other root for shipping. As the author states, we don't know enough about the business process to do so in the real world. I will do this to further illustrate the potential of doing so (after collecting proper usage details of course).

The example is for a custom furniture shop, which means that they use wood and other supplies to build furniture. They probably have some standard wood types that they use and possibly keep stock of some common types. If the database were expanded to include wood type, a lookup table would be really useful to link supply to demand.

I know, I know...build for current use cases, not future use cases. I hear you loud and clear. But let's just assume that they have some standard wood types for sake of this exercise. And let's assume that we've done our due dilligence and we are ready to take it to the next level.

If wood type alone becomes a lookup table, the root table will lose a varchar(30) [2B + nB, n=len(x)] worth of data and gain a smallint [2B] worth. Assuming that the type name is on avg 15 chars, that means a savings of 17B/record or 72 - 17 = 55. Now we can fit 25% more rows per page, about 137 rows per page. If we offloaded ALL of the varchar cols to other tables, we would have even better numbers here.

When we pull off those values into other tables, we will have to perform joins to them, but as long as the record count in those is small (I'm thinking most of it is standard anyways so they might be lookup tables that can be added to for new vals) they should be loaded into the cache and well reused. If anything, they may not be used every time users access the data. Again I don't have the use cases, but I'm making reasonable suggestions about them.

I look at any kind of notes or comments fields as something that should always be it's own table. Refer to the ProcessOrderNotes column in the ProcessOrderStatus table in Figure2 of the article. Each comment can be related in a number of ways to other tables either directly via an FK to the note, or to multiple entities via linking tables. In this way, the comments or notes become easier to query across if someone wants to search for something specific within a note. Eventually, this will be a use case so why not structure the data model to be more adaptable to begin with?

Think open-closed principle here where the model is open to extension, closed to modification. Of you have a new table that requires notes, you can just add an FK or a new linking table depending on what is needed. Now the app code can go on about calling it's addNote function, get the PK and write it to the table as an FK as needed. No reason to reinvent that. Well, suppose that all notes now need a timestamp and username. Guess what? Just add a table related to the Notes table, update the code to save to that each time the note is updated. Or store a new note record each time for auditing purposes and show them all - use cases?

Now I'm getting carried away, so I best stop here.

Friday, August 15, 2014

RMDB Surrogate v Natural Key

An integral part of transactional relational databases is the Primary Key (PK). A PK is a value that defines the data row and that related tables use to relate via a Foreign Key (FK).

Four standard criterion are commonly used to measure a PK.

1. Is it unique?
2. Is it immutable?
3. Is it minimal?
4. Is it never null?

A key can consist of one or more columns in a table as long as the combination meets the criteria.

My recommendation is that you stick to a Surrogate Key if you don't know. Here's why-

Natural keys made under an assumption that some business rule will not change can cause significant problems later when it does.

Most arguments for natural keys place emphasis on ease of use. If you find yourself joining multiple columns because your PK is a combo key then you lose that value. Create views to ease your suffering from joining a few more tables in your queries.


The exception I would make is for SOME lookups. Esp when a well defined code is used as the natural key. Bit even then, shit changes all the time. Suppose you assumed that the US state codes were immutable. Or telephone area codes. You would have to write scripts to update all those FKs that use 'Calif' or use some hacked in solution to fix your data and update all the apps that consume the data to support the hack. Contain volatility in your data!

Process v Procedure

Today I learned the difference between a business process and a business procedure. I looked through a few sources on the web and found that a process is made up of many procedures and tasks, and can cross departments. A procedure is a set of instructions to complete one unit of work. I would equate these to separation of concerns in software.

Think of an enterprise application environment where there are several standard concerns such as - application access security, sending emails, logging errors, reporting on data, saving transactional data, sending data to an external destination, etc, etc. These are re-usable procedures. A process collects them into something meaningful.

Since most common concerns can be thought of as procedures. It would be helpful to think of procedures as abstractions when describing a process.

Consider a task of explaining the process to a non-technical person, or someone outside your organization. The details about the procedures are specific to your domain, also they could change without the overall process changing. Is it a process or procedure? Could it be changed without affecting the path from point A to point B? Can parts of the process be standardized or abstracted into procedures?

For a more concrete example, a procedure could be "communicate with someone". There may be some criterion that will change the communications type but the end result is communication. Perhaps the communication is an email, could be a phone call, a text, in person, etc. The point is that a communication needs to take place, and the details are not a concern of a process.

Perhaps some Protocol will define what steps to follow for the communications procedure. Perhaps the communications needs to be saved for auditing purposes. That would limit the possibilities of forms of communication. In any case, those are procedural details. The process is about linking procedures.

Wednesday, August 13, 2014

Process Documentation Style

Finding that I post more on planning and management than on technical topics. This stems from current efforts more than anything.
The main thing I want to cover is that a process is repeatable. If it's not understood and documented, then there ain't no process. Or, as I just found out, it's the components of the business process that do not exist.

http://m.businessdictionary.com/definition/process.html

However, I would state that a process that is different for each individual or instance of input, cannot be defined by a single meaningful process. In order to understand a current process in such state, much effort is required to understand the process from each point of view.

Unless a consistent process has been defined, the process is to collect whatever procedures make sense at the time in order to achieve the outcome.

In some cases, perhaps in a results-oriented environment, ad hoc may be the best approach. This style of process management relies on the individual more than the structure of the process.

There are certain risks involved in taking such an approach including un-repeatable results and inconsistent data for analysis. Another drawback is the increased risk in taking on technical debt, depending on the individual.

The benefits are that the end result can be achieved without following a possibly complex and lengthy chain of procedures - results are dependent upon the individuals involved.

A well-defined process can be implemented in a process-oriented environment. This style is suitable for consistent and repeatable behavior, as long as people are following process.

The major risks are that a process may be ill-defined, overbearing, or not well understood. It may serve to stifle creativity as well as lead people to go off process if it is not understood, is too complex to follow, or if time to results is inadequate to the situation.

The benefits can be great once data is tracked and proper analysis techniques can be applied. Resources can be directed according to analysis, and long-term planning can be better achieved.

Possibly, one of the most important aspects of defining a process would be to keep the long-term goals in mind. If the process becomes heavy for the sake of the process, it may do more damage than good.

Sometimes, tools can limit the performance of a process. Appropriate tooling should be applied to the appropriate process; the process should not be limited or hindered in effectiveness by the tools.

Tuesday, August 12, 2014

how not to beg in the streets

Don't write your life story on a piece of cardboard, no one has time. Just ask for money.

Maybe I'm getting lazy with this one, but I saw a couple writing something on some cardboard that looked like a life story. I caught the first part of it - "Traveling..." that's about as far as anyone would get. So to any would-be travelers who run out of money - write really big "Traveling, need money! Please help!" that's it. Three lines, really big.

I also learned don't derail a train that's barreling down the tracks! Personal story - working hard on cleaning house and making great headway. Out of left field comes a request from management that would completely sidetrack the effort! I still need to push back on that one, at least it should wait until we clear out the playing field. Right thought, wrong timing!
Sudden change is a huge source of stress. "Stress can initiate the "fight or flight" response, a complex reaction of neurologic and endocrinologic systems.
 http://www.medicinenet.com/script/main/mobileart.asp?articlekey=20104

Resulting in high employee turnover. Those in leadership positions should be aware of the stress that their activities may cause. One way is to gauge current workload and progress before introducing new work. And keep the timelines and scope realistic.

Monday, August 11, 2014

Plan to Organize

Before you begin, set forth a plan. This is the most important part.

Having an adequate plan is especially important if you need to communicate and coordinate with one or more people. And hey, of you have more than one thing to do, why not put those in some central place that all parties can see so you don't drop the ball on on one or more of them? Try tracking things like progress, blocking, and work in this way too.

Devise a system of communication and a workflow that works for you and your organization so that it's not all in your head. This is the start to doing great things. Manage your work effectively, and you will work more efficiently.

Thursday, August 7, 2014

Thoughts of branching

If your code base has projects that are shared across products, consider creating a branch for each release.

This will allow bug fixes/hot fixes to be applied without having to play catch-up when the shared code has breaking changes. Bug fixes are typically unplanned activities, therefore spending extra time on them means that more time is spent on unplanned activities. These unplanned activities will surely include communication, additional testing, and resource leaks.

The time to bring a code base up to speed is during a planned release cycle. Be sure to include the updates in your release plan (before you start coding and during the planning phase). If you do not, this hidden input will creep in and cause some time pressures during the development phase, also the test phase will have some creep in it which will undoubtedly increase time unexpectedly as well.

To hot-fix a bug, branch from the current release branch (not the main/trunk), fix it, test it, release it. To scope a release, be sure to include in the plan any updates that have to be done to keep the code current. Branch from main/trunk for the release, do your work, merge main/trunk to your branch, test, deploy. Deprecate the old release branch if you are not using it.

Wednesday, August 6, 2014

listen up...to your users

I overheard an interesting conversation today regarding the use of encrypted storage. There was a wise person who had some very good ideas on the topic. The advice was to leave it up to the responsible individual to decide what should be encrypted and what should not. This was the conclusion along with some good backing points.

One of those were that often folks in IT assume too much of the users without getting the picture of reality. We live in a bubble of misconception and assumption when we don't reach out to our users for input on decisions we are making for them. It's one thing to give users the tools to do what they need to do, it's quite another to restrict users without accurate input. And even worse, to "Big Brother" users under the assumption that they need to be culled into submission and watched over.

One such "Big Brother" practice of storing attempted passwords in a database borders on unethical. How many times have you forgotten your password but you know it's either something like this, or maybe that other one? Is that the same password that you use for your bank account? Hope not!  Shame on you if it is, but shame on those who decided to store that information! My advice is to keep that information all in one place, on paper, in a safe but accessible place. This way, you know if someone got to it because it will be missing! If you try the shotgun approach by guessing every password you have, you never know if they're being captured by wolves.

The corrective action to take when implementing some new process is to either take a passive approach and measure popularity, or actively solicit feedback. Many times users will be unwilling to give feedback (unless negative) so the passive option may be preferable since you would get skewed data from solicitation (unless it's very active).

But Phil, you say how do we know if we should implement xyz until we know if the users want it? And if we can't solicit for xyz because the data will be skewed, how do we know? Perhaps there is a way to offer incentive for giving feedback. Bringing users into the fold regularly will be key to delivering successful solutions. Is it more costly to put something out there that never gets used, or to pay for active participation during the decision making process? Perhaps monetary incentives are not necessary; perhaps users will participate because they want to make their own lives better, or see their ideas in action, or just plain joy of contributing to a decision making process.

Here's my take on some certain approaches that I've seen in terms of user voice:

The online user voice boards where users add and vote for features is ok, but I think most of those get lost in the shuffle for larger apps, and only relatively active users participate. The ones with the most votes get bumped to the top and many duplicates are created then ignored.

Soliciting users via popups in the middle of their work, or when they are opening some page or application is intrusive and annoying. Often users are in the middle of a train of though (on a mission so to speak). DON'T INTERRUPT YOUR USERS!

A panel of beta users might be a very rational answer to soliciting feedback, but that's very exclusive and misses many users, if not the most common users.

A suggestion I might make is to allow users to submit suggestions for review (categorized). Once those suggestions are in and aggregated into coherent features, have a suggestion panel (like an ad rotator) in the application (the active information section that should exist in your app anyways and if not add it). Make it simple -  how much value do you place on xyz? 1-5 skip. If you have a logged in user, this can be collected per user and tracked along with usage data. In this way you can gather useful metrics on the most useful features for the most active users. For a portable device or small screen with local apps, each app should have it's own feedback link that's easily accessible and displays a few suggestions for voting. Wii had an awesome game called Everybody Votes, that was fun! User's got to see where they stood against the trend-line nationally and internationally. If they were in the highest group they got props - rewards!

The point here is to include users in the conversations when making decisions that affect them. Get the largest data sample possible and give good exposure to anything worth doing to gauge value (e.g. if it's really worth doing).

Measure what can be taken away as well! How much CRUFT is out there clogging up implementation when it's not even used? Lose it!

Tuesday, August 5, 2014

Make no assumptions about data you don't control

Today I learned from a legacy-app where a developer made an assumption about a data element that the application did not own. I would say never make assumptions about any data, but especially for data coming from another source - huge no-no.