Tuesday, 18 December 2012

Reversing a string in C# - is it really that difficult!

I've been interviewing people for a C# developers position recently which has taken the usual "technical questions, chat, random questions, any questions for us" kind of format which I'm sure most people who've been looking for a job recently will be familiar with.

One of the technical questions we've been asking recently is "Can you describe how you would reverse a string in C#". Note, we're not asking anyone to code a full answer, just describe how they would do it.

This is one of those pretty standard interview questions, much like the FizzBuzz question, which is presented to see if candidates can demonstrate a basic understanding of a programming language and show that they can understand simple requirements. And yet, it seems to throw so many candidates!

The requirements are simple.

Take a string such as "ignoring the voices"
Reverse the characters of the string
Output the result (i.e. "seciov eht gnirongi")

As with most things there are a number of ways to achieve this goal, some more efficient than others, some more geeky than others, but still there are varied number of ways to do it, so I wanted to demonstrate a couple of methods here to show it is possible. To date I've only received a single correct answer, the other answers have been either non-existent (I don't know how) or far from the mark.

Method 1 - The agnostic approach

This method is the agnostic approach as it shows an algorithm rather than a platform specific solution. This is a good solution in an interview as it shows that you understand the problem and can devise algorithms to solve it, although it's worth saying that there may be a solution more relevant to the platform. The solution is simple, work your way in to the middle of the string (character array) from the start and end of the string, swapping the characters as you go. The important thing is to stop when you get half way through the string, otherwise you'll reverse the same characters again and arrive back at the original string. The example is presented as an extension method.

Method 2 - The C# (.NET 4) approach

The .NET framework provides a large number of methods available on all of the standard types, these can help solve this problem in a way which is specific to C# and the .NET platform. The important thing to remember is that strings are just arrays of characters, and so any methods which can be applied to arrays can be applied to strings as character arrays. More recently some methods which were only for arrays have been moved to strings as well, conveniently one of them is the Reverse method. That's not to say that you can just reverse a string directly as the Reverse method returns an IEnumerable object, but the IEnumerable object can return an array and the string constructor can take a character array to initialize a string object with, so we get the following method.

This is a vastly simpler implementation than the first, but is it any better? Well it shows an understanding of the language which is a positive, but not of algorithm implementation, so it's neither better or worse than the first method. The best answer to give would be the first answer followed by the second to show an understanding of algorithms and of the required platform.

Of course from these you can devise variations on the problem, such as reversing the words in a string whilst keeping the characters of each word in the correct direction.

Or even reversing the characters of each word, but keeping the words themselves in the correct order.

So why do some programmers struggle with these kind of questions? Is this indicative of programmers as a whole or is this just a lull? I can only guess at the reason, but non-the-less it feels concerning at the moment.

Wednesday, 24 October 2012

Playing around with Go, prime numbers and ancient Greece

I've been playing around with Google Go (golang if you want to Google it) a lot lately as I've been of sick and needed something to stop me from going nuts.  It's been a while since I played with the language properly and so I started off by reading through a new book which has been written by Caleb Doxsey entitled An Introduction to Programming in Go which you can read for free on-line (or download as a PDF) or purchase from Amazon.  I read it for free on-line but purchased a Kindle copy of it from Amazon afterwards as it's a really good book and really well priced.

So after going through the book I needed something to practice on, not wanting to throw myself into an active project I decided to head over to Project Euler.  A great resource if you want to practice using a language without resorting to made up scenarios or over elaborate "Hello World" applications.  There are a number of problems (399 at the time of writing) which are mathematical/programming based in nature.  If you're not too good at maths or it's been a while since you were in full time education then you may need to Google a bit to get an idea of the problem but they're not beyond the average person.

The Problem

The one I want to look at here specifically is problem 7 which (at the time of writing) is a problem to find the 10,001st prime number.  There are two obvious ways to solve this, the first being a prime number generator.  This is a function which keeps on incrementing a count, checks to see if the next number is prime and if it is the number is returned from the function.  The second method is to use a sieve, this takes a sequence of numbers and "removes" all of the ones which are not prime numbers, it does this by:

1. Create a sequence of numbers, marking each as a "prime candidate"
2. Starting from 2, check to see if the entry is marked as a prime candidate
  a. If the entry is a prime candidate then mark each multiple in the sequence as non-prime
3. Return the collection of prime of numbers still marked as prime candidates as the final set of prime numbers

This is an over-simplified explanation of the process, for a more detailed explanation see the Wikipedia entry on the Sieve of Eratosthenes.

The issue with creating a sieve is that you must specify an upper limit, that is to say "find all prime numbers less than N".  The problem however poses the question as "find the first N prime numbers".  This means that in order to use a sieve we have to know roughly where the Nth prime will be, or use a high enough limit to generate enough primes to get N.  Because of this it would seem that using a generator would be a better option.  One thing to keep in mind is that, although there is no time limit to the problems, generally speaking if the solution takes longer than a minute then it may not be the best solution.

The Solution

Before I start

One thing I should probably mention here is that I'm only checking to see if a number is divisible by prime numbers instead of checking for all possible divisors.  The theory here is Prime Factorization.  So a prime number is a number which is only divisible by 1 and itself, other positive integers (other than 1 which is special) are composite numbers, meaning that they are divisible by integers other than 1 and themselves.  Consider the number 20, this can be decomposed as follows:

20 / 2 = 10
10 / 2 = 5
5 / 5 = 1

So the number 20 can be represented as 2 x 2 x 5 (or 2^2 x 5).

Because numbers which are not primes must be evenly divisible by at least one prime number less than itself then this reduces the number of numbers to check for to see if a number is prime.

Solution 1

Because I wanted to play with Go I took the opportunity to create a prime number generator using goroutines (think light-weight threads).  This was done by creating a number "emitter" at one end and a "listener" at another end, when a number leaves the emitter it passes through a series of checkers, these check to see if the number is evenly divisible by a given prime number.  So the first checker will check to see if the number is evenly divisible by 2, if not then it is passed to the next checker which checks to see if it is evenly divisible by 3.  The "listener" watches the output from the last checker, if a number appears here then it is not divisible by any of the currently known primes and so must be a prime number itself.  In this event, the listener creates a new checker for the new prime number which listens to the output from the last checker, the listener then listens for output from this new checker, it also puts the new prime number into a channel and waits for it to be read by the client.

Emitter > Check For 2 > Check For 3 > Check For 5 > Listener

In implementation I thought this to be a rather elegant solution, however for production use this turns out to be quite a time consuming exercise.  It allowed me however to play with channels and goroutines and so was incredibly useful.  Next up I decided to implement a similar generator again, but without goroutines and channels.

Solution 2

This solution is more simple as it simply builds up a list of prime numbers and runs in a single goroutine (the main one).  The theory is almost identical though; keep on checking numbers to see if they are evenly divisible by the current list of known prime numbers, if not then it is a prime number and add it to the list.  The implementation this time takes a number which is the Nth prime to find, this means that the function will keep on running until it has found that many prime numbers and will return the last one found.

This was straight forward to implement and ran so much faster than the first solution (something along the lines of 38 seconds for solution 1 to 6 seconds for solution 2).  But I still felt that it was too slow.

Solution 3 - The Prime Sieve

So I investigated the sieve solution.  I gave a brief outline of what is required above but there are a couple of optimizations to be made.  The first is that for a given sequence of numbers you only need to look for prime numbers up to the square root of the limit, numbers after this point are either multiples of the prime numbers discovered to that point, or are prime numbers themselves.  Say we wanted to find all of the prime numbers up to 16 then we would limit ourselves to only looking up to 4 (the square root of 16):

1 N/A
2 Prime Number
3 Prime Number
4 Divisible by 2
Search limit
5 Prime Number
6 Divisible by 3
7 Prime Number
8 Divisible by 2
9 Divisible by 3
10 Divisible by 2
11 Prime Number
12 Divisible by 2
13 Prime Number
14 Divisible by 2

The second optimization is fairly obvious from the table above, every even number with the exception of 2 is non-prime.  This means that we can initially mark every even candidate as non-prime (except 2), then starting from 3 check every odd number (e.g. 3, 5, 7, 9...), this halves the number of candidates which need checking.

In order to get around the "known limit" problem I implemented the solution to sieve out all primes up to 1,000,000 (a suitably large number I thought).  There are ways in which you can determine the upper limit, but the math is a bit beyond me right now, this is Prime Number Theorem, feel free to take a look.  If this were implemented then we would be checking fewer potential candidates meaning less work again.

(Edit, thanks to compoasso from the comments I implemented a function to get the upper limit so that it is no longer checking from primes up to 1 million)

I didn't hold out much hope for this solution as I am sieving for primes in a large data set to find 1 specific number, so I figured I that I wasn't going to do much better than the ~6 seconds from solution 2.  So the result surprised me a little.  After putting together the solution and running it with the "time" command on my HP Mini 210 (Intel Atom N450, dual core 1.6Ghz) running Ubuntu Linux I got the correct result in 98 milliseconds!

So, what now?

Well, if you want to check out the full source files you can find them on GitHub along with the above snippets.

I absolutely love playing with Go, it is very much like creating C++ application but with the convenience of something like Python or Ruby.  Project Euler is a great playground for experimenting with new languages as well, it also allows you to think about how to create a solution which is appropriate to the language (instead of making a generic solution work across all languages).

Now I just need to remember to keep on posting :-)

Tuesday, 18 September 2012

News of my demise has been greatly exagerated

I am still here, honest!

I can't believe it's been almost a year since my last post, things have been manic around here over the last year.  I started with the best of intentions after my daughter was born, but with 2 children and a new position at work consuming most of my time, it's left me little time to blog and learn new things.

I am still intending on continuing the C++11 series; it's not so new any more but features are still being added to compilers and people are still finding it so I think it's worth going on with.  I'm also considering running a few posts on Google Go which I've been playing with for a while but I want to start doing more with it.

Now I just need to stay focused and get on with it :-)