Monday 21 November 2011

The var keyword, or what I meant to say

Whilst I'm waiting for something to compile (and for baby number 2 to turn up) I thought I'd post about my thoughts on the "var" keyword in C#.

In previous posts you might have seen that I was somewhat enthusiastic about the "auto" keyword in C++, so it might be natural to assume that I'd feel the same about the "var" keyword in C#.  Unfortunately this is not the case, so allow me to explain why my feelings about the same functionality in two different languages are not quite the same.

I'll do it tomorrow, honest!
I'm sort of afraid a little that I'm going to start upsetting people here, so I shall start by saying this.  I am generalizing here and not stereotyping, not everyone is the same and you may indeed be an exception to the rule, but this is generally what I find to be true.

C++ programmers tend to be less lazy than their C# counterparts.  What do I mean by this?  Well, normally when I'm looking at code written by a C++ programmer (in any language) it tends to be easier to read and maintain.  It's because C++ can be painful enough without having to add extra complexity or obfuscation.  Code written by C# programmers tends to be a little lazier, things need tidying up here, stuff is left lying around over there and it generally has that "I'll do it tomorrow" kind of feel to it.

Now don't get me wrong, there is a lot of very nicely written C# code out there, but I tend to find that the people who have written it come from a C/C++ background or have a lot of experience with those or similar languages.

So why should this matter?  Well, when I think of C++ programmers using the "auto" keyword I tend to think of code coming out looking like this:

map<int, vector<string>> MyFunction() { ... }
void SomeOtherFunction()
{
    auto result = MyFunction();
}

Which is easy to follow.  When I look at "SomeOtherFunction" I know that I just need to find "MyFunction" to see what the type will be (or use the functionality of the IDE), and importantly I know what the code is trying to do without looking this information up.  When I think of C# developers using the "var" keyword I tend to think of code coming out looking like this (and I have seen this):

void MyFunction()
{
    var a = 1;
    var b = 2;
    var c = "Something";
    ...
    var x = a + b;
}

Which, okay, is readable and I can make out what is happening, but I no longer have a clue about the intent of the code; is "a" meant to be a short, an int or a long, or maybe it should have been a double?  We could have put some modifiers in there, but that's still not as easy to read.  I just know that if a C++ programmer had written it we'd have some types in there and the intent would become obvious.  And this isn't just me worrying about something that probably won't happen, I've seen numerous people write code in this way.

What I meant was...
The thing is that the intent of the code is about as important as the code itself.  If I say that a variable is a 64-bit integer then it means that I'm expecting some pretty big values in there; similarly, if I declare it to be a 16-bit integer then I'm expecting very small values.  This kind of information can be invaluable to a maintainer, who might not be some unknown person looking at the code 5 years after you've written it; it might be you after you've spent 2 weeks on a different project and can't quite remember why you wrote something a specific way.
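The same question comes up with "auto" in C++, so here is a minimal sketch in that language of what the compiler deduces from a bare literal compared with a declaration that states its intent.  The variable names and the AddBytes function are made up for illustration, and the fixed-width types come from the standard <cstdint> header.

```cpp
#include <cstdint>
#include <type_traits>

// What the compiler deduces when the initializer is a bare literal.
auto a = 1;     // int
auto b = 2L;    // long, because of the suffix
auto c = 1.0;   // double

static_assert(std::is_same<decltype(a), int>::value, "a is deduced as int");
static_assert(std::is_same<decltype(b), long>::value, "b is deduced as long");
static_assert(std::is_same<decltype(c), double>::value, "c is deduced as double");

// Stating the type tells the maintainer what range of values to expect.
std::int64_t total_bytes = 0;   // expected to get very large
std::int16_t day_of_year = 0;   // expected to stay small

// Hypothetical helper: the signature alone documents the expected sizes.
std::int64_t AddBytes(std::int64_t current, std::int32_t chunk)
{
    return current + chunk;
}
```

Nothing here stops you from writing "auto total_bytes = std::int64_t{0};" if you prefer, the point is only that the size is written down somewhere the reader can see it.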

So is "var" a good thing?  Well I would say it is, but like most things it should be used responsibly and never at the cost of losing the intent of what you are trying to write.  If you're not sure about it, then talk to someone about it, or write the code the way you want and give it to someone who hasn't seen it and ask them if they know what it's trying to do.  If they pull a face then change it; if they know what the intent of the code is without asking too many (what you would consider) obvious questions then it's good to go.

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Monday 14 November 2011

Living in a "dynamic" C# world

Taking a brief break from the C++11 posts (I'm working on the next one, I promise), I thought I'd quickly cover a small problem I came up against in C# and how something I'd previously dismissed really helped me out.  If you want to try out any of the code below I'd strongly recommend looking at LinqPad which is a great tool for trying out sample code, expressions and for querying your databases using Linq.

I can't remember the number of times I've looked at the new "dynamic" keyword and thought of it as ugly, and I will admit that maybe I've not been its greatest advocate.  Recently however I went on a training course during which we spent some time calling IronPython scripts from C#, so I could see a use for dynamic, but not so much outside of this use-case.

Today however I encountered a problem and the dynamic keyword came to my rescue.  The problem was this; I'm loading in data from an XML document (and no, XML is not my problem), this document has a number of sections which identify how to check something from another document, so it might have an entry which says "You're expecting a value in a field called 'x' of type 'y' and I want to check it like this...".  So as an example, say I'm picking up a value which is a double precision value, and I want to check it against another value of the same type but using a tolerance.  So if 's' is my source value, 'x' is my expected value and 't' is my delta then I would want to check it using the following:

// |s - x| < t
var s = 1.0005;
var x = 1.0004;
var t = 1.0001;

return Math.Abs(s - x) < t;

Great, but here's the problem, when I'm writing the code the function first needs to check the type and convert it from a string value to the correct type, which I only know about because the type is held in another variable.  Again, not too tricky as I can just write the following (where "type" is a Type variable holding the type I need to use):

var convertedValue = Convert.ChangeType(s, type);

The compiler has no problem with this and lets me carry on my merry way, but when I add the subtraction in the following snippet the compiler starts to shout and tells me I'm an idiot for even attempting to apply the operator "-" to operands of type "object" and "object"!

var sourceValue = "1.0005";
var expectedValue = "1.0004";
var tolerance = 1.0001;
var type = typeof(double);

var convertedSource = Convert.ChangeType(sourceValue, type);
var convertedExpected = Convert.ChangeType(expectedValue, type);
var result = Math.Abs(convertedSource - convertedExpected) < tolerance;

Console.WriteLine(result);

The thing is, I know that my converted values are doubles but I need to tell the compiler that I know what I'm doing here and it can compile this.  Well this is where "dynamic" comes to save the day, it allows me to bypass compile-time type checking and instead have this checked at run-time.  So changing the code to the following:

var sourceValue = "1.0005";
var expectedValue = "1.0004";
var tolerance = 1.0001;
var type = typeof(double);

dynamic convertedSource = Convert.ChangeType(sourceValue, type);
dynamic convertedExpected = Convert.ChangeType(expectedValue, type);
var result = Math.Abs(convertedSource - convertedExpected) < tolerance;

Console.WriteLine(result);

I get the expected result of "True" when I run the code.

I know there are probably other ways of doing this, and the example code I've presented doesn't exactly portray the complexity I was attempting to deal with, but I do think it's quite a nice little solution.  Hopefully after reading this you might also re-consider looking at the "dynamic" keyword, you never know when you might have a genuine use for it.
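For comparison with the C++ posts, here is a rough sketch of one of those other ways.  C++ has no equivalent of "dynamic", so this version dispatches on a type name by hand and lets a template do the conversion.  All of the function names are hypothetical, and the type names "double" and "int" are simply assumed to be what the XML would contain.

```cpp
#include <cmath>
#include <sstream>
#include <stdexcept>
#include <string>

// Converts text to T, throwing if the text cannot be read as a T.
template <typename T>
T ParseAs(const std::string& text)
{
    std::istringstream in(text);
    T value{};
    if (!(in >> value))
    {
        throw std::invalid_argument("cannot convert: " + text);
    }
    return value;
}

// Performs the |s - x| < t check once the type is known at compile time.
template <typename T>
bool WithinToleranceAs(const std::string& source,
                       const std::string& expected,
                       double tolerance)
{
    const double difference = static_cast<double>(ParseAs<T>(source)) -
                              static_cast<double>(ParseAs<T>(expected));
    return std::fabs(difference) < tolerance;
}

// Picks the conversion at run-time from a type name held in a string.
bool WithinTolerance(const std::string& typeName,
                     const std::string& source,
                     const std::string& expected,
                     double tolerance)
{
    if (typeName == "double") return WithinToleranceAs<double>(source, expected, tolerance);
    if (typeName == "int")    return WithinToleranceAs<int>(source, expected, tolerance);
    throw std::invalid_argument("unsupported type: " + typeName);
}
```

Calling WithinTolerance("double", "1.0005", "1.0004", 1.0001) performs the same check as the C# above.  The cost is that every supported type has to be listed by hand, which is exactly the bookkeeping that "dynamic" saved me from.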


Thursday 10 November 2011

C++11: Initializer lists and range-for statements

In my previous post I wrote about the auto keyword, using it as a return type and the decltype operator.  Hopefully you've had a chance to use these and hopefully you've been finding them incredibly useful.  I said that in my next post I would look at initializer lists and range-for statements, so let's get stuck in.

Initializer Lists
Perhaps one of the most annoying things I tend to have to do is create an array or vector and initialize it with some known values, if it's held in configuration then it's not too bad but I still end up having to do it sometimes.  Previously if you've wanted to just use an array this has been fairly trivial, we'd just write the following:

int a[] = { 1, 2, 3 };

But if we wanted to use a vector then we'd end up with 4 lines of code to do the same job:

vector<int> a;
a.push_back(1);
a.push_back(2);
a.push_back(3);

Which isn't particularly nice to write and can lead to RSI.  But now functions can accept a {} list by taking an argument of type std::initializer_list<T>; this includes constructors, in which case they are referred to as initializer-list constructors.  This has been pushed into the STL so our favourite containers should now accept a {} list for initialization.

vector<int> a = { 1, 2, 3 };

map<int, vector<string>> c({ 
    {1, { "Ignoring", "The", "Voices" } },
    {2, { "In", "My", "Head" } }
});

Doesn't that just look a lot better, and it's certainly easier to type.  The nice thing about this new type is that it means we can write our own functions which take initializer lists, whether we're creating our own container class or just writing a function which can accept a {} list of values.


template<class T> void MyFunction(initializer_list<T> values)
{
    cout << "Number of items in initializer list: " << values.size() << endl;
    for (auto i = begin(values); i != end(values); ++i)
    {
        cout << *i << " ";
    }
    cout << endl;
}

This method then works by simply calling it in the following manner:

MyFunction<int>({ 1, 2, 3 });

Now you may have noticed something different about the for loop in MyFunction: instead of using "values.begin()" and "values.end()" it's using "begin(values)" and "end(values)".  These are two stand-alone functions which return iterators to the beginning and end of the collection.  The nice thing about them is that they work on any type which provides begin() and end() member functions returning STL-style iterators (i.e. something which implements operator++, operator!= and operator*), and also on plain arrays whose size is known at compile time.  They won't work on dynamically allocated arrays, because all you have there is a pointer which carries no information about the length.
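To tie those two ideas together, here is a minimal sketch of a container class with an initializer-list constructor, plus a function that uses std::begin and std::end so it can accept an array, a vector or the new class.  IntBag, SumOf and SumExamples are names invented for this example and are not part of any library.

```cpp
#include <initializer_list>
#include <iterator>
#include <vector>

// Hypothetical container, named purely for illustration.
class IntBag
{
public:
    // Initializer-list constructor: allows IntBag bag = { 1, 2, 3 };
    IntBag(std::initializer_list<int> values) : items_(values) {}

    // Providing begin()/end() is what lets std::begin and std::end work.
    std::vector<int>::const_iterator begin() const { return items_.begin(); }
    std::vector<int>::const_iterator end() const { return items_.end(); }

private:
    std::vector<int> items_;
};

// Sums anything that std::begin and std::end can handle.
template <class Range>
int SumOf(const Range& range)
{
    int total = 0;
    for (auto it = std::begin(range); it != std::end(range); ++it)
    {
        total += *it;
    }
    return total;
}

int SumExamples()
{
    int fixed[] = { 1, 2, 3 };                // plain array, size known
    std::vector<int> numbers = { 4, 5, 6 };   // STL container
    IntBag bag = { 7, 8, 9 };                 // our own class

    // int* heap = new int[3];
    // SumOf(heap);   // would not compile: a pointer carries no length

    return SumOf(fixed) + SumOf(numbers) + SumOf(bag);
}
```

The same SumOf template serves all three because the free functions hide the difference between an array and a class with begin() and end() members.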

Range-For Statements

If you're used to working in languages such as C# or Python then the chances are you're used to seeing statements like these:

C#: foreach (int i in my_list) { ... }
Python: for i in my_list: ...

These are statements which provide a simple syntax for working with each item in an iterable structure.  To perform something similar in C++ we would previously have written something more like this:

for (vector<int>::iterator it = my_list.begin(); it != my_list.end(); ++it) { ... }

Which works and does what we want it to, but secretly we've been looking over the shoulders of the C#, Java and other developers and coveting their range loops.  Well not any more, now we too have a range loop which works on any iterable structure (i.e. anything you can iterate through like an STL-sequence defined by a begin() and end(), [1]), including initializer lists.

for (auto i : my_list) { ... }

So to give a more complete example, and using what we covered earlier we can do the following:

vector<string> a = { "Ignoring", "The", "Voices" };
for (const auto& s : a)   // by reference, so each string isn't copied
{
    cout << s << endl;
}

Which looks a whole lot different from the following, which we would have needed to write beforehand to accomplish the same thing.

vector<string> a;
a.push_back("Ignoring");
a.push_back("The");
a.push_back("Voices");

for (vector<string>::const_iterator it = a.begin(); it != a.end(); ++it)
{
    cout << *it << endl;
}
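One detail worth a quick sketch is the choice between "auto", "auto&" and "const auto&" for the loop variable, since it decides whether each element is copied, can be modified, or is only read.  The function names below are invented for this example.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

// auto& gives a reference, so the elements themselves are changed.
std::vector<std::string> Shout(std::vector<std::string> words)
{
    for (auto& word : words)
    {
        word += "!";
    }
    return words;
}

// const auto& reads each element without copying it.
std::size_t TotalLength(const std::vector<std::string>& words)
{
    std::size_t total = 0;
    for (const auto& word : words)
    {
        total += word.size();
    }
    return total;
}

// Plain auto copies each element, which is fine for cheap types like int.
// The loop here runs directly over a {} list.
int SumOfList()
{
    int total = 0;
    for (auto value : { 1, 2, 3 })
    {
        total += value;
    }
    return total;
}
```

As a rule of thumb, "const auto&" is a reasonable default for anything bigger than a built-in type, with "auto&" reserved for loops that really do need to modify the collection.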

The next post I'm planning on doing is about lambda expressions.  These are another fantastic language feature which I use a lot in other languages such as C# and Python, so I'm glad that they've finally made their way into C++ as well.  As it's a fairly sizable topic by itself I'll probably just do a single post on those and then single posts for other features as well.  I think that the items I've covered in this post and my last are the easiest to get going with and the ones which have a fairly large impact on the code we write daily.

References
[1] Bjarne Stroustrup C++11 FAQ

All code provided in this article is provided under a BSD license.  If you spot an error then please do let me know so that we can make this better for anyone else reading it.

Sunday 6 November 2011

Getting started with C++11

Wow, has it really been this long since my last post?  It's been a mad few months getting the bathroom and kitchen finished off, dealing with a young child and now baby number 2 only a couple of weeks away.  With all of this going off I've somewhat neglected this blog and it's about time I started putting some articles up.

So I thought I'd try to get back into the swing of things by writing a few brief entries on getting going with some of the new features of the recently published C++11 standard.  I've been keeping an eye on the process over the last year and I'm excited by the new features being introduced in newer versions of the compilers.  All of the code I'll be putting up has been written and compiled on a laptop running Ubuntu 11.10 (Oneiric Ocelot) which has GCC 4.6.1 in the repositories (the GCC project publishes a list of the C++11 features available in each release).  Most things are coming along nicely but concurrency is still a way off.

Auto's

Quite possibly the most useful new feature in day-to-day use is the new "auto" keyword - if you know C# then this can be compared to the "var" keyword (which is not the same as the VB variant type) - which can be used in place of specifying a variable type where the type can be inferred by the compiler at the point of declaration.  So instead of typing:

int x = 42;

You can instead use:

auto x = 42;

This means that the compiler will infer x as an integer, and after this point x will always be an integer (in the current scope).  This is most likely not something you will do day-to-day (personally I'm not likely to) but then this isn't where its use shines through, so let's instead look at another example:

std::vector<std::string> my_collection;
my_collection.push_back("Hello");
my_collection.push_back("World");

for (std::vector<std::string>::iterator it = my_collection.begin(); it != my_collection.end(); ++it)
{
    cout << *it << endl;
}

So, a simple collection that we then iterate over, writing each value out to the console.  So where can the "auto" keyword help here?  Well, that for loop isn't looking very pretty, is it?  Wouldn't it be nice if there was some way we could tidy it up a little, maybe get it looking a little more like this:

for (auto it = my_collection.begin(); it != my_collection.end(); ++it)
{
    cout << *it << endl;
}

And guess what, we can (yay!).  This is because when we declare "it" the compiler can infer its type, so we don't have to clutter up our code specifying the type when we already know what it is.  There are actually a few more things we can do to this example to make it even easier to read with new features but they'll come later.
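The saving grows with the size of the type.  Here is a small sketch, assuming a map of word lists similar to the ones used elsewhere in these posts; CountWords is a made-up name for the example.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Counts every word held in every group of the map.
std::size_t CountWords(const std::map<int, std::vector<std::string>>& groups)
{
    std::size_t count = 0;

    // Without auto the iterator would have to be declared as
    // std::map<int, std::vector<std::string>>::const_iterator
    for (auto it = groups.begin(); it != groups.end(); ++it)
    {
        count += it->second.size();   // it->second is the vector of words
    }
    return count;
}
```

Because "groups" is a const reference, the compiler infers a const_iterator here without being asked, which is one less thing to get wrong by hand.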

As a Return Type

Yep, we can use the "auto" keyword in place of a return type as well.  How does this work, though, when there's no initializer to infer the type from?  Well, we can now specify the return type at the end of the function declaration, so instead of:

int Sum(int x, int y) { ... }

We can instead use the "auto" keyword and specify the type at the end:

auto Sum(int x, int y) -> int { ... }

Which doesn't look much better does it?  Well again this isn't really the intended use of the syntax, but if this isn't then what is? Well one place is where the type being returned is not known to the compiler at the point of definition.  Consider the following snippet from a header file:

class Test
{
public:
    enum TestEnum { One, Two, Three };
    void SetField(TestEnum t);
    TestEnum GetField();
private:
    TestEnum _field;
};

Implementing the setter is easy in the source file, we just write the following:

void Test::SetField(TestEnum t) { ... }

And for the getter we just write this:

TestEnum Test::GetField() { return _field; }

Don't we?  Well, no actually.  The compiler will return an error because TestEnum is declared inside Test, and at the point where we write the return type we are not yet inside the scope of the class.  To get this to work we would need to do the following:

Test::TestEnum Test::GetField() { return _field; }

Alternatively, using the "auto" keyword as the return type and using the new return type syntax we could type the following instead:

auto Test::GetField() -> TestEnum { return _field; }

This works because the compiler knows about TestEnum at the point where we now define the return type.  Still this doesn't look like it provides much benefit, but it will when we introduce the final new piece of syntax for this post.

decltype

This is an operator which is used to determine the type of an expression or variable so you can create a variable based on that type, like this:

int x = 3;
decltype(x) y = 5;       // same as int y = 5
decltype(x - y) z = 7;   // same as int z = 7

So far so good but again it doesn't look like it's bringing much to the party.  So what if we do the following instead:

std::map<int, std::vector<std::string>> MyFunction() { ... }
auto MakeCollection() -> decltype( MyFunction() )
{
    auto val = MyFunction();
    return val;
}

Take a second, read it again, now think of all those poor keys on your keyboard, don't they deserve a break?  At this point the use of decltype and the new return type syntax and the new auto keyword all should hopefully look really useful and the kind of things you might want to start using a bit more frequently, they did for me when I first figured it out.  The whole thing looks even more appealing when you start considering templated functions as well when sometimes the return type can be more difficult if not impossible to figure out.  Also you are reading that right, I did write ">>" in there, the new specification treats this the way we read it which makes a lot more sense thankfully.

Just for completeness, here's the above snippet of code written using the more traditional syntax:

std::map<int, std::vector<std::string> > MyFunction() { ... }
std::map<int, std::vector<std::string> > MakeCollection()
{
    std::map<int, std::vector<std::string> > val = MyFunction();
    return val;
}

Anyone who says that last snippet is easier to read is either lying or wants their head examining, it's bad enough typing it!  So go on and give these new features a try, if you're not wanting to use them most of the time after a week I'll be shocked.
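The earlier remark about templated functions deserves a quick sketch.  When the return type depends on the template arguments it can't be written before the parameter names exist, so the trailing return type with decltype is the natural place for it.  Add is a hypothetical function written for this example.

```cpp
#include <type_traits>

// The return type is whatever "left + right" produces for the given types.
// It has to go at the end because the parameter names are not in scope
// where a traditional return type would be written.
template <class T, class U>
auto Add(T left, U right) -> decltype(left + right)
{
    return left + right;
}

// The compiler works out the result type for each combination.
static_assert(std::is_same<decltype(Add(1, 2)), int>::value,
              "int + int gives int");
static_assert(std::is_same<decltype(Add(1, 2.5)), double>::value,
              "int + double gives double");
static_assert(std::is_same<decltype(Add(1, 2L)), long>::value,
              "int + long gives long");
```

Writing this with the traditional syntax would need something like decltype applied to made-up values of T and U, which is a good deal harder to read.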

I'm planning my next post of this type to be about initializer lists and range-for loops, after which I will hopefully look at lambda expressions and smart pointers.  The items discussed above and the ones coming up are, I feel, the first things which make C++ based on the new C++11 standard feel like a modern programming language and, hopefully, will keep new and experienced programmers coming back to it for years to come.

References
Wikipedia - C++11
CProgramming.com - C++11 articles
Bjarne Stroustrup - C++11 FAQ

All code provided in this article is provided under a BSD license.  If you spot an error then please do let me know so that we can make this better for anyone else reading it.