Wednesday, 29 March 2006

Interview Programming Problems

Another Saturday done, another interviewing round finished. Thought I would put down into words what I look for when reviewing the programming problems done by candidates. I don't really care if candidates know what I look for - if they can do it in an interview, they can do it in their daily work. Especially so when code reviewers are likely to be at least as watchful as I am in an interview.

As an illustration, I'm using a problem we previously used in our written tests. We replaced it recently because everyone answered it in almost the same way, making it useless as a differentiator between candidates. The problem is as follows:

Implement a function "intersection". The function takes two ASCII strings and returns an ASCII string containing only those characters, that are simultaneously present in both arguments. The result should be as short as possible. For example:
intersection("abde", "bexy") may return "be" or "eb"
intersection("exoweb", "candidate") should return "e"

Almost all solutions (that work) are variants of the same form. Below was the minimum acceptable code to pass first round screening in Exoweb, in a prettied up python format:

def intersection(a, b):
intersections = ""
for char1 in a:
for char2 in b:
if char1 == char2:
intersections = intersections + char1
return intersections

(trivia - usage of the "in" keyword is up to 3 times faster than a str.find() call on my laptop)

(trivia #2 - something like 50% of candidates who make it to the written test are unable to write even the above snippet. After this test, 90% of all candidates who have submitted their resume have been eliminated)

There are two problems with the code above, one pretty obvious (not reading the requirements) and the other a lot less so (performance problem). The first is repeated characters in the return statement and the second is an algorithm that does not run in linear time.

The first problem is easy. Given "aaabbb" and "bbbccc", the algorithm above returns "bbbbbbbbb". The problem specification says "The result should be as short as possible." Failure to read the spec or forgetting to check for this is bad, but not fatal as long as one spots this quickly.

The second problem is one that less than 1% of the candidates manage to avoid - algorithmic complexity. If strings a and b were of length n, the double for loops in the algorithm result in a O(n^2) algorithm. For every character added to the length of string n, the computer can end up doing up to n+1 times more computations. This quickly becomes impossible.

On my laptop, with a data set tweaked for the worst case scenario, I get the following execution times:

(n=1,000) 0.00372 seconds
(n=10,000) 0.29497 seconds
(n=100,000) 30.20992 seconds

For every time I increase n by an order of magnitude (*10), the execution time increases by roughly two orders of magnitude (*100). Following this progression, a value of n=1,000,000 would take around 3,000 seconds or 50 minutes!

This problem is relatively easily solved and there are multiple solutions. For those languages without rich libraries, one solution is to build a 128 char length array (the problem specifies ASCII, which is only 128 values) and to run through each string once, putting a value into the array to specify that the character was found. Once complete, it's a matter of scanning all 128 elements to see what was found in both strings. All these operations are in linear time. This has also the advantage of ensuring that the returned result has no duplicates.

For languages with richer libraries or built ins, you can also use hashed containers or even set data types. We disallow using Sets in python because it would simply be too trivial. In Python 2.4, Set data types are built in and the code would look like this:

def intersection(a, b):
return ''.join(set(a).intersection(set(b)))

Sets in Java aren't quite so feature rich, lacking the intersection() method, so we still allow it in Java. A non-set method in Python, using just the standard built-in data types might look like this:

def intersection(a, b):
char_seen = {}
intersections = {}
for char in a:
char_seen[char] = True

for char in b:
if char_seen.has_key(char)
intersections[char] = True

return ''.join(intersections.keys())

With a n=1,000,000 string size, this takes 1.7 seconds on my laptop, much faster than the expected 50 minutes required by the inefficient, O(n^2) algorithm. With the n = 100,000 string size, the algorithm takes 0.18 seconds, the expected roughly linear decrease in time.

The algorithm above can certainly be optimized further, for different focus areas. Using two dictionaries does waste a bit of memory, and there are probably faster ways of doing this. There are probably readability tweaks too.

In our interviews, it does not matter if the code has flaws on the first try (it must work though), as long as the interviewee can understand the problem when pointed out and fix them. No one is perfect and mistakes are to be expected. We just try to minimize them and fix them as soon as possible.

Saturday, 18 March 2006

Teaching Software Engineering

Heh. Having spent a not insignificant proportion of the last 1.5 years doing HR work, I feel a great deal of sympathy when reading of the plight of others when doing HR. Some amusement too, as I recognize the problems and issues faced.

Today's fun article comes courtesy of planetpython.org, regarding teaching the Waterfall method in schools. I wince in sympathy because almost all of the people I interviewed, if they knew anything about development, knew only this method. Yet it is a method we (as in Exoweb) know doesn't work very well for us.

It's a nice, sunny Saturday afternoon so I'm too lazy to ruminate on why schools put too much emphasis on the Waterfall method and SEI methodologies, but I have been recently rambling to colleagues about a few complaints I had with my own college experience in software engineering:

  • Overly simplified
  • Short term projects
  • No Challenge
Overly Simplified

This is related to the Waterfall issue. I realize that colleges first try to teach us the basics, then try to teach us the more complex stuff. But sometimes, the basics are so overly simplified that we learn the wrong things. e.g. the Waterfall method. To me, the failure of the Waterfall method is the assumption that it is possible to get perfect requirements and that they will never change. Working life has taught me that no plan survives first contact with reality. That lesson was most painfully learned.

What is sad is that too many people I meet still stubbornly stick to what they were taught in college. I still see too many people/organizations spending months trying to gather all the requirements while competitors gain a head start by producing an imperfect but workable product. I see man-years of developer time spent haggling over little requirement details, only to find the client or market has changed requirements in the time it took for them to sort out the exact details.

Yes, requirements are important and it is the cheapest stage in the software development process to make changes. Cowboy hacking just as frequently leads to disasters. However, there is a point of diminishing returns and most people following the Waterfall process go way past this point. Agile Development offers the best middle ground that I have found to date.

So, to wrap up this section of the rant, if schools would quit simplifying stuff too much, the tragedy of the 1 year requirements gathering phase would not occur.

Short Term Projects

Almost all college projects are for the duration of a single class - a single semester of a few months in length. This means that a student typically spends an entire semester building a system that works, then forgets about it afterwards.

The problem with this approach is that, like construction, it is much easier to build a small shack than it is to build a skyscraper. If you are just slapping a few pieces of wood together to cover some random stuff in your backyard, you really aren't concerned about how good the foundation is or if the darned thing collapses a few months later. It's not that hard to rebuild it. On the other hand, screw up the foundation of a skyscraper and very horrible things happen. Like software, those screw ups become apparently very late, when the cost of changing things (or failure) is very high. Yet the one semester projects mostly teach us the habits required to build small shacks.

Challenge

There is a quote from Peopleware that I enjoy about good builders:

"The minimum that will satisfy them is more or less the best quality they have achieved in the past."

This seems to be true for myself (not that I consider myself a great builder) and for many great developers that I respect. I cannot be sure that this applies to everyone, but it seems true enough for most.

The problem is that most schools don't really hold their students up to high standards or even show them that it exists. If the "best" that they've done is code that doesn't even compile (I know quite a few professors don't even bother to check this), then they will always be satisfied producing crap because they don't know any better.

I see this in some fresh grads that I interview - they are, in theory, some of the smartest kids graduating that year from their college. They have the highest grades, they've achieved more than their peers ... they think they are the king of the world. The only problem is that compared to the truly best in the world, they are crap. They don't automatically strive to improve their code, they use suboptimal algorithms, miss various corner cases, etc.

I have had classmates that have graduated after 2 years of courses taught in C++, yet still not know what a pointer is. I have interviewed candidates who graduated with a bachelor's degree in computer science, but have never written a line of code in their life. These schools do a great disservice to our profession and society in general (i.e. think of the cost of all that crappy code out there).

I know this has been suggested before by others, but perhaps one thing that would make things better would be a minimum competency exam, administered by a certification board. Professions such as law, medicine and accounting all have professional organizations that set minimum standards and administer an exam that all practicing members of that profession must pass in order to practice being a lawyer, doctor or certified public accountant. Perhaps we are approaching a time when software developers too must meet a minimum competency before being allowed to work on things like nuclear power plant controls or medical equipment. I know I would sleep better at night knowing my pointer-incompetent classmate was not writing the code for medical equipment that would one day be used on me.

Wednesday, 1 March 2006

HR at Exoweb

Greg and I got curious this morning about what our interviewees were writing about their experience on the web and decided to do a bit of searching. This ended up in me getting curious about a batch of HR related matters. Final result is a bunch of weird trivia:

Interview stats:

  • Distinct resumes received in February: 1308
  • Called for pre-screening interviews: 186
  • Passed pre-screening: 19
  • Job offers given: 3

Ouch. We have a huge attrition rate (0.2% get offers). Will write in more detail about the transition from stages 1-2 and 2-3 in later blogs and what typically kills a candidate.

Other fun tidbits we found from scanning bbs posts:

"Those guys must be poor! They're sharing offices with another company! Don't work there!"

Heh. When we moved to this office in 2004, Exoweb was all of 12 people, but we found this large space to renovate into a great loft. We ended up inviting 2 other companies owned by good friends (and fellow FOSS users) to join us. Since then, all of us have at least doubled in size, filling up the entire loft space and overflowing. Although it doesn't look like it, we actually take up the entire top floor of our building, except for one stubborn company that refuses to move out and give us total control of the floor.

"They have an all you can drink policy! Bunch of drunkards!"

We have an all you can drink soft drinks benefit. But I guess it doesn't help that some of the pictures of our office posted on the web have included pictures of "herb liquor tasting party" or "empty bottles after christmas party".