Ignoring the Voices

Azure Security and Automation

2017-01-30T20:46:00.001+00:00

Recently I've been doing a lot of work automating some of my more mundane daily tasks in Azure so that I can free up time to work with the cool stuff (don't we all really just want to play with the shiny stuff?). Having not blogged anything for a while, and finding that online examples of pulling some of these things together weren't as clear as they could be, I thought I'd write something up.

All of these examples can be run from an Azure Automation account. Before you do though, you will need to update the Azure and AzureRM modules to the latest version. I don't know why they aren't already at the latest version and it's a pain to do them in the right order, but it's worth it in the long run.

So what kinds of things have I been automating? Well most of it is enabling security related features or changing settings which are less secure than they could be.

Please note that I don't list anything here as a silver bullet to prevent attacks; if someone is determined to get in then they most likely eventually will (if they haven't already). You can make their lives more difficult though, hopefully persuading them to move on to easier targets or minimising the impact of a successful attack. A common mantra you will hear often now is "assume breach". Put simply, you must assume that your environment is already compromised and then ask how you manage things in a way that minimises impact and reduces the time between detection and resolution.

Also, I'm aware that most of the Powershell here could be improved on and/or simplified. Whilst I've been writing these my style has changed and I tend to write them out in longer form so that they're easier to follow. Feel free to take the code and re-arrange and modify as much as you want, no attribution required (but always welcome).

Virtual Machines

If you create a VM from the market place into a new resource group (I work almost exclusively in the new ARM portal) then along with the VM itself you'll get a virtual network with at least one subnet, a storage account and a network security group with an RDP rule in place (assuming it's a Windows based VM). So what's wrong here? Well, let's have a look.

RDP Rule

That RDP rule in the Network Security Group is an Any-Any rule on the standard RDP port (3389). This makes it incredibly easy for anyone with a relatively simple script to scan a large range of addresses and see if anything is listening on that port. From this they can then launch a brute-force attack (other types of attack are available) and if you've not used a particularly good password, along with an obvious username, then it won't take long for an attacker to gain access. Once in, if you have a number of VMs on the same virtual network with the same usernames/passwords then traversal becomes fairly trivial and it's game over.

But hope there is, if changes you make. The most obvious changes you can make when setting up the VM are:

Make sure that you use strong passwords and don't use the same password everywhere
Don't use obvious usernames (e.g. admin)

Following that you can also modify your RDP rule so that RDP access is whitelisted (if it's needed at all). You can do this using CIDR blocks and it is pretty trivial. If you're a subscription administrator then you can also look for any wide open RDP rules and disable them.
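
To illustrate the idea of hunting for wide open RDP rules, here is a minimal Python sketch. It is not the Automation runbook from this post: the rule dictionaries, their field names and the rule names are all hypothetical stand-ins for whatever your tooling returns, and the list of "any" sources is an assumption.

```python
import ipaddress

RDP_PORT = 3389
# Assumed spellings of "any source"; adjust to match your own rule export.
ANY_SOURCES = {"*", "0.0.0.0/0", "Internet", "Any"}


def port_range_includes(port_range, port):
    """Return True if a range such as '*', '3389' or '3000-4000' covers port."""
    if port_range == "*":
        return True
    if "-" in port_range:
        low, high = port_range.split("-", 1)
        return int(low) <= port <= int(high)
    return int(port_range) == port


def is_wide_open_rdp(rule):
    """Flag inbound allow rules that expose RDP to any source address."""
    return (
        rule["direction"] == "Inbound"
        and rule["access"] == "Allow"
        and rule["source"] in ANY_SOURCES
        and port_range_includes(rule["destination_port"], RDP_PORT)
    )


def is_valid_cidr(block):
    """Check that a whitelist entry is a well-formed CIDR block."""
    try:
        ipaddress.ip_network(block, strict=True)
        return True
    except ValueError:
        return False


# Hypothetical rules: one default Any-Any rule and one whitelisted rule.
rules = [
    {"name": "default-allow-rdp", "direction": "Inbound", "access": "Allow",
     "source": "*", "destination_port": "3389"},
    {"name": "office-rdp", "direction": "Inbound", "access": "Allow",
     "source": "203.0.113.0/24", "destination_port": "3389"},
]

flagged = [rule["name"] for rule in rules if is_wide_open_rdp(rule)]
print(flagged)
```

Only the first rule is flagged; the second is restricted to a CIDR block and so is left alone.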

If you're proficient with Desired State Configuration you could also look at changing the RDP port to a non-standard port. Whilst this isn't a fix it will stop a large number of "lazy" scans where attackers are just looking for the standard ports.

Storage Account

Storage accounts now support encryption services for blob storage across all regions. Whilst this might not be important to you personally, some organisations are pretty insistent on using it to ensure compliance with their own requirements or those of their customers. Given how simple it is to enable, it's worth getting used to working with it and switching it on by default.

Ideally you should create your storage account before you create your VM, this is because only data added to the storage account after encryption is enabled will be encrypted, any existing data will remain unsecured. So if your storage account is created as part of the VM provisioning then the VHD files will not be encrypted.

Virtual Machine drives

The final thing is that most market place images do not support BitLocker or DM-Crypt drive encryption as part of their standard provisioning. This is useful to have in place because if an attacker does gain access to the storage account hosting the VHD files they could just download them and browse through them at their leisure; if the drives are encrypted then this becomes more difficult. I won't cover how to do this here as Microsoft's own documentation is already pretty good and it involves a few more steps than simply running a PowerShell cmdlet.

SQL Servers and Databases

This might not come as a surprise but Microsoft are actually pretty good at managing their own infrastructure, because they do this well and at scale in Azure a lot of people are realising that actually leaving them to get on with it and utilising the services they provide on top of this infrastructure is a better option. SQL Servers are a great example of this, why should I have to bother with managing OS upgrades, security patches and version upgrades if someone else who knows this stuff inside out can do it for me?

This doesn't make the service fool-proof and there are still ways to improve on it. Azure SQL offers a couple of features which can beef up security.

Transparent Data Encryption

A lot like encryption services for blob storage this may or may not be something you want to implement, but again a number of organisations have an "encryption at rest" requirement which this feature addresses. Again, because it is so trivial to implement it's worth getting used to enabling this by default. Unlike storage accounts however, enabling this feature will encrypt all existing data.

Auditing and Threat Detection

Capturing your audit events to blob storage is a fairly obvious thing to want to do, if something does happen you'll want to know when, how and what. Unfortunately this hasn't been rolled out to all regions at the time of writing, for example the UK regions are still missing this feature.

Threat detection can monitor for a number of threat types, such as SQL injection attacks, and can email the subscription co-admins along with any number of other recipients to alert them when a threat is identified.

Setting this up is not tricky but is a little more involved than the transparent data encryption setting. However this is the kind of thing that will let you capture an attack earlier and so it's worth enabling. Note that this script makes use of Automation variables which will need to be created and configured to ensure that the script runs correctly (i.e. doesn't break).

Security Center

This should be something you have open pretty much at all times; it should be regularly monitored and actions taken from it. Everything I have outlined above is an item which is monitored and reported on by Security Center. Some issues, such as transparent data encryption and deployment of end-point protection, can be rolled out directly from the Security Center blade, making a few of the issues incredibly simple to resolve. Also reported on from here are identified threats such as malware found on VMs, brute force RDP attacks etc. These are detailed with a priority, a description of what was detected, the resource being attacked and often steps for remediation.

Before showing everyone how great it is though it's worth preparing them for it, sometimes the amount of information can be overwhelming to which people may respond negatively, this is often when the "Azure is too insecure" arguments can start. A lot of organisations would have nothing close to this in their on-premise environments and so have the opposite view that because they can't see this information it must be more secure (ignorance is bliss right?).

As with most services in Azure, Security Center is constantly being improved upon with new features being delivered often. I'm already pretty sure this entire post will be out-of-date in a few months if not sooner!

SonarQube and LetsEncrypt

2016-04-29T09:00:00.000+01:00

Recently I've completed moving our "temporary" SonarQube to something which is a bit more production ready. This pretty much looks like a Windows server hosted in Azure, backed by an Azure SQL database with a reverse proxy in front of it so that we can enable HTTPS. Migrating from the old server was relatively painless although if you're about to do the same I'd suggest looking at the SQL Database Migration Wizard available on CodePlex to move the database.

Rather than messing around with multiple Azure provisioned names we decided to purchase a domain name for our internal development systems. Again this was really easy to do through the Azure App Service blade and within a couple of minutes we had the name and I'd set up the A record for our SonarQube server. A quick test proved that I could now access the system over HTTPS using our new domain name. Just one problem, it was using a self-signed certificate!

After looking around I decided to try out LetsEncrypt.org to get a certificate. The biggest problem with this is that the tools they provide to get a certificate don't work on Windows. Fortunately there's a pretty good Windows utility written by a community member which works brilliantly on Windows called letsencrypt-win-simple.

To run this tool I had to temporarily disable the reverse proxy rules I'd created in IIS, make sure that the site was backed by a folder on the local drive and open up HTTP access (by binding the appropriate port in IIS and enabling access to the server over HTTP in the Azure resource group's Network Security Group). Once I had this I could run the tool from the command line. Other than a couple of prompts, such as selecting the correct website from those it had auto-discovered, the process was completely automated and in about 15 seconds I had a valid SSL certificate installed and configured against the correct binding; the self-signed certificate had been replaced with the new valid certificate. After this it was a trivial task to reset the changes I'd made previously: re-enabling the reverse proxy rules, removing the HTTP binding and removing the rule in the Network Security Group.

So, SonarQube was up and running with all of my old data migrated, user accounts were set up and working and I could log in over an HTTPS connection. A quick change of settings in Visual Studio Team Services and the end-point was now pointing at the new server as well.

But then...

I ran a quick test build which ran analysis using the SonarQube quality profiles and it failed!

The error in the build output was pretty long but buried towards the bottom of it was this little gem of an error message.

sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

So the Java process doesn't like the certificate? It came as a bit of a surprise as I'd tested the site out in a number of browsers and none of them had reported a certificate problem. A quick Google (other search engines are available) and I came across this thread on the LetsEncrypt community boards. It turns out that LetsEncrypt along with a number of others such as StartSSL are not included in the out-of-box Java client trust store. In the thread someone has posted a quick URL reader class you can use to prove the point by pointing a request at https://helloworld.letsencrypt.org and naturally this fails with the same error.

There's quite a few suggestions of how to work around this on the thread but the most common theme is to add the chain certificate to the trusted store manually. I tried out a few of the suggestions in the thread but had a couple of problems.

Some of the solutions are Linux based
Almost all of them talk about a chain.pem file and I had no idea what that was!

All of the solutions largely focused on the Java keytool.exe utility, which you can use to add a certificate to the client's trusted store, so I just needed to figure out which certificate to install.

I'll not cover the process I took to work it out, but eventually I got the LetsEncrypt X3 cross signed certificate which you can get from their website.

To use keytool.exe you need the latest Java runtime installed on the server running the Build Agent; the utility should then be available under %JAVA_HOME%\bin\keytool.exe. To try and automate the process a bit I created a PowerShell script which will download the certificate and execute the keytool.exe utility to install the certificate to the trusted key store. It needs to be run as an administrator and naturally the machine needs internet access. Feel free to use or modify as needed.
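
The PowerShell script itself isn't reproduced here, but the keytool step can be sketched in a few lines of Python. This is an illustration under assumptions, not the original script: the function name, the example paths and the alias are hypothetical, the trust store is assumed to be the JRE default at lib\security\cacerts, and "changeit" is the default store password, which may have been changed on your machine.

```python
from pathlib import PureWindowsPath
from typing import List


def build_keytool_command(java_home: str, cert_file: str, alias: str) -> List[str]:
    """Build the keytool arguments that import a certificate into the JRE trust store."""
    java = PureWindowsPath(java_home)
    keytool = java / "bin" / "keytool.exe"
    # Assumption: JAVA_HOME points at a JRE, so the trust store is lib\security\cacerts.
    cacerts = java / "lib" / "security" / "cacerts"
    return [
        str(keytool),
        "-importcert",
        "-trustcacerts",
        "-noprompt",
        "-alias", alias,
        "-file", cert_file,
        "-keystore", str(cacerts),
        "-storepass", "changeit",  # default password; an assumption for your server
    ]


# Hypothetical paths and alias, for illustration only.
cmd = build_keytool_command(
    r"C:\Java\jre",
    r"C:\certs\lets-encrypt-x3-cross-signed.pem",
    "letsencrypt-x3",
)
print(" ".join(cmd))
```

On the build server the resulting list could be handed to subprocess.run(cmd, check=True) from an elevated session, after downloading the certificate to the path given.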

Once I'd run this on the build server I re-ran the build and this time, SUCCESS! Now I just need to remember to do this each time the JRE gets updated.

Comparing with the previous row using Microsoft Power BI

2016-02-22T21:59:00.002+00:00

This is something which has come up recently at work which at first glance seems like it should be straight forward (tl;dr it is when you know how) but if you're new to Power BI or Power Pivot then it's something which takes some thinking about. Please note that for this post I will be sticking with Power BI, but if you want to see how this can work with Power Pivot then have a look at Dany Hoter's post over at PowerPivot(Pro).

The Problem

Let's say that you have some data, which is spread over time and you want to be able to create a BI visualisation showing how that value changes as a delta from the previous value.

Implementing a solution

To walk through how to do this, in a way which is hopefully easy to follow, I'm going to grab some sample data which is available on-line. That data is the Annual Sheep Population in England & Wales 1867 - 1939 (measured in 1000's) from DataMarket. No idea why I picked sheep populations but there we go.

The goal here is to be able to create a simple chart which can show the sheep population for a given year and the population difference from the previous year (except for 1867 which should show a difference of 0).

So step 1 is to go and download the data, I've opted for CSV for this example but if you want to try another format then go for it. Once you've downloaded the data fire up Power BI.

Connecting to the data source

Once Power BI has opened hit "Get Data" to connect to your data source and follow these steps:

Getting data into Power BI

Select the CSV connector
Navigate to and open the sample data set from your local machine
Once the preview window has opened select the "Edit" button

Configuring your data source

One of the first things I've done is renamed the query over in the Query Properties section to "sheep", you don't have to do this but it makes things easier to read for me and also makes the screen shots a little less cluttered.

If you look at your data in the Query Editor window and scroll to the bottom you'll see that there are 3 rows which were in the CSV file which aren't part of the data. To get rid of these select the "Remove Rows" button from the ribbon and then select the "Remove Bottom Rows" option. When the option dialog appears enter the value "3" and press "OK", this will remove the rows from the bottom of the data set (scroll down and check).

Removing rows from the query

Next we'll rename the columns to "Year" and "Annual Population", the first of these is fine but the second column will need to be changed, do this by right-clicking on the column title and selecting the "Rename" option. Alternatively you can select the column and use the "Rename" item in the "Transform" ribbon section.

Now right-click on each column title and under "Change Type" select the "Whole Number" option. This will ensure that we are working with the data correctly, and you'll notice that the year was most likely set to "Text" because of the content of the rows we previously removed.

Almost there, now we need to sort the "Year" column in ascending order, you can use the options in the ribbon or click on the drop-down menu next to the column title and select the correct sort option.

Finally we need to add an index column (we'll see why in a bit) to the data source, in the "Add Column" ribbon. If you don't select any options then the default is to start from zero, which we'll do in this example, but you can choose to start from 1 or a custom value with a custom increment. You can leave this new column name as Index.

Adding the index column

Now that we've done all of the prep work you can click the "Close & Apply" option under the Home section in the ribbon.

Creating the new population difference column

Back in Power BI we'll need to be under the "Data" section on the left.

Viewing the data in Power BI

From here add a new column to the data collection by right clicking on the "sheep" (or originally named query if you didn't change it earlier) and selecting the "New Column" option.

Adding a new column to the query

You'll now have a new column imaginatively called "Column" and a query expression editor at the top. Just to get something for the moment let's enter the following:

Column = SUM(sheep[Annual Population])

This will give us a new column called column where every row will now have the sum of the Annual Population column.

Our first new column

As you can see this isn't particularly useful, so let's change that expression to something a bit more useful for our purposes.

Population Difference =
    'sheep'[Annual Population] - IF(
        'sheep'[Index] = 0,
        'sheep'[Annual Population],
        LOOKUPVALUE(
            'sheep'[Annual Population],
            'sheep'[Index],
            'sheep'[Index] - 1
        )
    )

That's quite a lot to take in so we'll break it down a bit and talk through what this is doing.

The first bit is quite simple as we're just saying to take the current sheep population and subtract a number from it. When we get to the IF statement we're checking to see if the current Index is 0 (I created my Index column to start from zero) and if it is return the current sheep population, this is so that we get a difference value of 0. If the current index is not zero then we're doing a lookup, with this we're looking up the sheep population value by checking the Index column for a value which is the current Index value less 1 (i.e. the previous value).
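
If it helps to see the same logic outside of DAX, here is a small Python sketch of it. The function name is mine and the numbers are made up for illustration; they are not values from the sheep data set.

```python
def population_differences(populations):
    """Mirror the DAX column: current value minus the value at Index - 1."""
    # Equivalent of the Index column: map 0, 1, 2, ... to each population value.
    by_index = dict(enumerate(populations))
    differences = []
    for index, current in by_index.items():
        # The IF branch: the first row subtracts itself, giving a difference of 0.
        # Otherwise look up the row whose index is one less (the LOOKUPVALUE call).
        previous = current if index == 0 else by_index[index - 1]
        differences.append(current - previous)
    return differences


# Made-up sample values, sorted by year in ascending order.
sample = [2203, 2360, 2254, 2165]
print(population_differences(sample))  # [0, 157, -106, -89]
```

As in the Power BI version, this only works because the rows were sorted by year before the index was added.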

If we change our column to this expression we should get the following.

Our data with a population difference value

Visualizing the data

From here we can go back into the Reports section and create a number of visualizations against this data. Here I've provided an example where I'm using the "Line and Stacked Column Chart" visualization with the annual population for the column values and the population difference for the line values. Play around with it and see what looks good for you though.

Visualizing the data

I've provided the data and the pbix file for download if you want to have a look at the version I've put together and play around with it. Just click on the OneDrive link below.

Coming up next...

One other thing that comes up once in a while is producing a moving value such as a moving average. Next post we'll extend this basic example to show how we can accomplish this.

Working with Entity Framework Code First and SQL Bulk Copy

2014-09-24T21:57:00.000+01:00

There's a few of these that I haven't written, but it seems that you could keep a blog going pretty well with just "Working with Entity Framework and ..." posts. That's not because Entity Framework is particularly bad, I really quite like it myself, but because if you're writing an application of any considerable size or complexity you will at some point reach the limitations of what is possible "out of the box". I'm also aware of the number of "ORMs are evil" posts circulating currently, and possibly you're someone who thinks the same, but for what I'm working on now they make perfect sense, and I'm all for using the right tool for the job.

So what's the problem this time?

General purpose ORMs are great when you're working with transactions, it's what they're meant for. A user interacts with the system and you need to persist the data from their most recent transaction. The problem this time is that as the system grows you suddenly find yourself with a couple of use cases where you need to persist a lot of data. In the example application I've put together for this post I'm using an XML data source with 10,000 records which need to be persisted.

The problem is that when running with a data set of this size (with auto-tracking of changes disabled) it takes around 40 seconds to run. 10,000 records in 40 seconds is certainly more than I can process manually, but for a modern computer it's not so great. As you add more records to the context it gets bloated: it has to start tracking more and more entities, and then each entity is persisted individually. That last point is important because each insert in Entity Framework inserts the new record, pulls back the newly created ID code and updates the entity in memory with the new ID code, which is not a trivial amount of work.

So what are the solutions?

Disable features: The first thing to check is that you are telling the context to not auto-track changes, it's a small thing but you're giving the context less work to do, performance isn't about making your code faster, it's about making it do less.

If you were to run this again you would find that you've taken a few seconds off the total run time, which is better but it's still nowhere near fast enough.

Commit early, commit often: A fairly obvious option but rather than waiting until we've attached all of the entities to the context before saving, save more frequently (e.g. every 100 entities). Again this is reducing the amount of work the context is having to perform when it figures out which entities it needs to persist and makes a more significant impact in our figures, but it's still got a way to go.

You might also remember that I mentioned about the context getting bloated, well we can do something about that as well by re-creating the context after each time we save the changes. This stops the context from getting too bloated and again reduces the amount of effort needed to work out which entities need to be persisted. We've added in some work now for context initialisation but this is typically cheaper. This does take a bit more effort to maintain and ensure that we're not breaking the system by doing anything stupid, but it again takes a bit more of a chunk out of the run time.
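
The shape of that "save every N entities, then start with a fresh context" loop can be sketched independently of Entity Framework. The Python below is only an illustration of the pattern: FakeContext is a hypothetical stand-in for a DbContext, and the batch size of 100 is just the example figure from above.

```python
from itertools import islice


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


class FakeContext:
    """Hypothetical stand-in for a short-lived context that tracks pending entities."""

    def __init__(self, store):
        self._store = store
        self._pending = []

    def add(self, entity):
        self._pending.append(entity)

    def save_changes(self):
        self._store.extend(self._pending)
        self._pending = []


def persist_in_batches(entities, store, batch_size=100):
    """Save each batch through a brand new context so none of them grows large."""
    saves = 0
    for batch in batches(entities, batch_size):
        context = FakeContext(store)  # re-created per batch, never reused
        for entity in batch:
            context.add(entity)
        context.save_changes()
        saves += 1
    return saves


store = []
save_count = persist_in_batches(range(250), store, batch_size=100)
print(save_count, len(store))  # 3 250
```

The trade-off is the one described above: more context initialisations in exchange for each context tracking far fewer entities.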

Get to the DbSet differently: The typical route to adding an entity is to add it to the context's DbSet collection. Bizarrely this collection doesn't have an AddRange method, but there is a way to get at one by asking the context for the set directly. By adding the entities using the AddRange method we can skip all of the tedious foreach looping and adding the entities one at a time. So we can now make a simple call to AddRange followed by a call to SaveChanges. This is much more performant than the previous solutions, getting down to some slightly more reasonable numbers.

But what about SQL Bulk Copy?

So the title of the post is about bulk copying, and having read through the above you're probably wondering why I didn't just jump to this as a solution. Well, Entity Framework has no out of the box support for SQL Bulk Copy because it's database agnostic. But I've done this before when working with SQL FILESTREAM so why can't I do it again?

Being a lazy developer the first thing I did was look for an existing solution and one turned up in the shape of EntityFramework.BulkInsert. It seems pretty popular online for this kind of a problem, is available through NuGet and is pretty easy to use. After adding the package and creating a method to try it out I ran the sample application and waited for it to finish. It took me a while before I realised that it already had! For 10,000 records it ran in under 1 second.

So surely EntityFramework.BulkInsert is the answer then? Well if you want to stop reading here and go and download it then please do, it's a great little package. Naturally there are a few things that you need to take into consideration. First of all bulk copying doesn't bring the ID codes back, so if you need the values you will have to think of a way around this (think SQL 2012 sequences and sp_sequence_get_range). Next you have to think about how bulk copying works and make sure you get the bulk copy options correct. By default it won't check any constraints you have in place and it might not observe NULL values, instead putting in default values for the column type. It also works within its own transaction (unless you provide a TransactionScope), but if you can work around these then you have a great little solution in your hands.

SQL Bulk Copy

I'm a lazy developer but I'm also a fascinated one, I wanted to know if I could still write the code using the System.Data.SqlClient.SqlBulkCopy class instead of relying on 3rd party packages or falling back to ADO.NET (which is an option, but not one I'm going to cover).

I already know that I can get the connection information from the context, and I've previously shown how to get the mapped table name for a given entity, so surely this is possible. But I am going to be a little bit lazy and not implement an IDataReader for my collection, instead I'm going to load the entities into a DataTable and use that (note, this option really isn't going to scale well).

This is actually a fairly easy solution to implement with probably the most complicated piece being a fairly simple extension method which pulls out the entity properties and their types, then using this to create a DataTable and copy the data using reflection (again, this isn't going to scale well). Once you have that you just need to write the data to the database for your chosen batch size.
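
As a rough analogue of that extension method, the Python sketch below uses a dataclass in place of an entity and builds a list of (name, type) columns plus a list of row tuples in place of a DataTable. The Person type and the to_table name are hypothetical, and this is not the C# code from the project.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    """Hypothetical entity used only for this illustration."""
    id: int
    name: str


def to_table(entities, entity_type):
    """Reflect over the entity's fields to produce column definitions and rows."""
    # Column definitions: the property names and their declared types.
    columns = [(field.name, field.type) for field in fields(entity_type)]
    # Rows: read each property by name, much like reflection-based copying.
    rows = [
        tuple(getattr(entity, name) for name, _ in columns)
        for entity in entities
    ]
    return columns, rows


columns, rows = to_table([Person(1, "Ada"), Person(2, "Grace")], Person)
print(columns)
print(rows)
```

Like the DataTable approach, this copies every entity into a second in-memory structure, which is why it won't scale well to very large collections.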

This solution isn't quite as fast as the EntityFramework.BulkInsert component, mostly for the reasons I mention, but it can still persist 100,000 records in about 1 second.

I've created a project which is available on GitHub under an MIT license for you to grab and look at. I've done this because the code isn't really that difficult to follow, is pretty similar to my previous post on SQL FILESTREAM, and me talking through lines of code is boring. Also available is a LinqPad file which I used to create the input data files; just change the number of entities and run it. But for convenience I've added 1,000 and 10,000 entity files into the project anyway.

Where does your FILESTREAM data live?

2014-07-08T16:54:00.002+01:00

This is hopefully just a short one but follows up on a previous post about using FILESTREAM with Entity Framework.

After implementing the solution almost as posted we ran into some problems on the environment it was being deployed to (for reference, do NOT disable NetBIOS!). Whilst investigating these issues we needed to figure out where on the server the FILESTREAM data was being stored. Looking around the internet I found many posts about getting the path name from a SELECT statement, but that's useless if you want to know "my data is at C:\<path>" because you want to check disk space, permissions etc. Not having the DBA who installed it available wasn't useful either, and there doesn't appear to be a way to get this information back from SQL Server Management Studio.

But, there is a way to find it from the system tables. So I put together a SQL statement which pulls the information out. It worked for me but you may want to tweak it to give you the information you want, to filter stuff out and so on.

Working with Entity Framework Code First and SQL FILESTREAM

2014-01-30T22:00:00.001+00:00

Whilst looking around at these two technologies I couldn't find any information which was particularly useful about how to get the solution working. So I wanted to put this up so that should anyone else want to try something similar then there's some (hopefully) useful information out there.

What am I trying to do?

The system that I am involved with at the moment has the need to store files with some of its entities. We've started out by using Entity Framework 5 Code First (although in this post I'll be working with Entity Framework 6) to create the data model and database mappings, which is working nicely and is proving itself a very useful framework.

When looking at saving file data along with an entity there are a few choices:

Store the file contents directly in the database
Save the file contents to the file system
Use a database table with FILESTREAM enabled
Use a FILETABLE

The first option is fine for small files, but I don't really want to load all of the file data each time I query the table and the files might get pretty big.

The next option is a reasonable option and with recent versions of Windows Server we have a transactional file system. So far so good, but it would be nice to not have to think about two persistence mechanisms.

FILESTREAMs were introduced in SQL Server 2008 and allow you to store unstructured data on the file system, so it feels like we're using the right tools for the job but they're all nicely in the same package. The problem here is that Entity Framework doesn't support FILESTREAM.

Lastly there's FILETABLE which was introduced with SQL Server 2012. This is like FILESTREAM, but rather than defining it at a column level you get a table created for you which provides information from the file system. It is a really nice system, but it didn't quite fit with how we're structuring data and it's also not supported by Entity Framework.

So the option that I would ideally like to work with here is the FILESTREAM option as it gives me all of the database goodness but with the performance of the file system. But there is just that minor sticking point of it not being supported by Entity Framework. After a fair amount of playing around with the technology and research into the problem I figured that I could probably make it work by falling back to basic ADO.NET for handling the FILESTREAM part of the requests. Whilst this was an option I didn't really want to start having different technology choices for doing database work, so the goal was to see how much I could get away with in Entity Framework.

Setting up the test solution

The database server

With SQL Server the default install options will not give you a FILESTREAM enabled server but you can enable it. I'm not going to go into how with this post as Microsoft have some pretty good documentation available on how to do this.

This also means that we can't let Entity Framework create the database for us, so you will need to create an empty, FILESTREAM enabled database and point to that.

The outline of the project

I created the solution in Visual Studio 2013, and delving into the most creative parts of my mind I came up with a solution that has hotels, with rooms and multiple pictures of each room (okay, not my most creative moment, but it gets the job done).

So in this solution I have my data model which is pretty simple. I have some locations, at each location there are a number of hotels, each hotel has rooms and each room has photos.

Solution Data Model

Of all of these the important one is Photo. This entity has some basic properties, Title and Description, which describe the photo, then there are the navigation properties for getting back to the room, and lastly there's the Data property which is intended to hold the content of the file. Normally Entity Framework would see this property and its type (a byte array) and map it to an appropriately named column of type VARBINARY(max). Whilst we could still let it do this, it would somewhat defeat the purpose of the exercise as we'd be storing the contents of the file directly in the database, so we need to add some configuration to tell Entity Framework to ignore this property when mapping.

Photo entity configuration

I'm using the Fluent API here, but you should be able to do this using Data Annotations as well.

At this point if we were to deploy the database we would get a table with no data information and a blank property in our entity. What we need to do next before any of this is useful is to somehow get a FILESTREAM column into the Photo table. The solution to this is to use Entity Framework migrations, the basics of which I'll not cover here and leave it as an exercise to the reader.

Migrations provides us with a migration class for each migration added to uplift and roll-back the changes to the database. The useful method for us in this class is the Sql method which allows us to execute SQL commands; using this we can add our ROWGUID column and our FILESTREAM column with all the constraints we need (and of course the appropriate commands to remove it all again as well for the Down method).

Migrations SQL commands

Now if we run the Update-Database command from the Package Manager Console we get a table with all the right columns of the right types for being able to use FILESTREAM.

So that's half the battle won, the next challenge is being able to read from and write to the table.

Storing and retrieving file data

So how do we query data in a FILESTREAM column? Well this is the bit where we fall back to the System.Data.SqlTypes namespace, specifically the SqlFileStream class. We use this class to read the contents of the file back from the server as a stream, but this only works in the context of a SQL transaction.

So the first thing we need to do is get the file path and the SQL transaction information. We can then pass this to the SqlFileStream constructor to get our stream, after which it's just a case of reading from the byte array in our entity and writing to the SqlFileStream stream. To get this information we need to run a custom SQL statement. We could do this using a SqlCommand object, but I still want to stick to Entity Framework a bit more. Fortunately there's the DbContext.Database.SqlQuery<TElement> method which we can use to run raw SQL statements. It also handles parameters so we can parametrize the query (great for guarding against SQL injection attacks), and it returns an enumerable collection mapped to TElement (which does not have to be a part of our data model).

Raw Data Query

The FileStreamRowData class here is a custom class with a string property for the path, and a byte array for the transaction context.

Running all of this inside of a transaction scope will get the information required (the call to "First" will enumerate the collection) to pass to the SqlFileStream constructor, we can then use this to write data to the stream.

Writing to the FILESTREAM

The same applies when writing to the database as well, but with the source and destination reversed. Also when writing to the database you would need to save the entity first. Wrapping up the Entity Framework bit in the same transaction scope means that even if you call "SaveChanges" on your context, if the transaction does not successfully complete then the changes are still rolled back.

So does it work?

Well, yes it does, and it works pretty nicely as well. It's maybe not the final solution that I'll use as I'm still investigating a couple of other options, but it's certainly not a solution that I would be upset at using, and by hiding the complexity in the data services the client need never know how the file information is being held in the database or using which technologies.

How do I play with it?

You could probably work most of what you need out from this post, but for convenience sake I've also put up the whole solution onto GitHub, so feel free to head over and take a look. If you come up with any suggestions or improvements then feel free to contribute.

Playing with CoffeeScript

2013-07-30T21:30:00.001+01:00

I've been playing around with CoffeeScript lately and, as I have a tendency to do, I decided to crack open a prime number challenge and see what the solution looked like.

The Challenge

I recently set this up as a challenge at work as a bit of fun which goes as follows.

"Calculate the first 10,000 prime numbers, output the largest prime number and the sum of all palindromic primes"

I've implemented the solution a number of times using C#, C++, Python, Go and JavaScript and it is the latter that I wanted to compare the solution to. The JavaScript solution I created typically ran in about 15ms after a fair amount of tweaking and profiling to squeeze as much out of it as I could do (within my own abilities).

In all I found writing the solution a pleasant experience, with a very simple and expressive syntax which abstracts away some of the ugliness that is JavaScript (not that I particularly dislike JavaScript, it's quite fun actually). List comprehensions were incredibly useful and powerful as they are in other languages and writing iterations and checks were very simple.

Anyway, here's my solution. I'm sure it's not perfect, and there are probably some CoffeeScript tricks that I've not picked up yet. But the solution is as fast as my JavaScript implementation and much easier to read, so all positives in my book :)

My JavaScript solution along with a couple of others is available on the gist as well.

EDIT: Since posting this I tweaked the solution a little bit after noticing that I was evaluating all of the values in the array of candidates, instead of working only up to the square root of the upper value.
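For anyone who wants to try the challenge, here is a minimal sketch of one way to approach it. It is written in C++ rather than CoffeeScript and is not the solution from the gist; the function names (FirstPrimes, IsPalindrome, SumPalindromicPrimes) are made up for illustration. It applies the square root cut-off mentioned in the edit above by only dividing each candidate by known primes whose square does not exceed it.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns the first `count` primes. Each odd candidate is divided only by
// known primes up to its square root.
std::vector<std::uint64_t> FirstPrimes(std::size_t count)
{
    std::vector<std::uint64_t> primes;
    if (count == 0) return primes;
    primes.push_back(2);
    for (std::uint64_t candidate = 3; primes.size() < count; candidate += 2)
    {
        bool isPrime = true;
        for (std::uint64_t p : primes)
        {
            if (p * p > candidate) break;                        // past the square root
            if (candidate % p == 0) { isPrime = false; break; }  // found a divisor
        }
        if (isPrime) primes.push_back(candidate);
    }
    return primes;
}

// True when the decimal digits read the same in both directions.
bool IsPalindrome(std::uint64_t value)
{
    const std::string digits = std::to_string(value);
    return std::equal(digits.begin(), digits.begin() + digits.size() / 2, digits.rbegin());
}

// Adds up every palindromic value in the list.
std::uint64_t SumPalindromicPrimes(const std::vector<std::uint64_t>& primes)
{
    std::uint64_t sum = 0;
    for (std::uint64_t p : primes)
    {
        if (IsPalindrome(p)) sum += p;
    }
    return sum;
}
```

Calling FirstPrimes(10000) and taking the last element gives the largest prime, and passing the same list to SumPalindromicPrimes gives the second half of the answer.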

SQL Injection with Entity Framework 5 and Static Code Analysis

2013-07-03T22:10:00.000+01:00

It's interesting times here at the moment having started a new, big, project which is using MVC 4 Web API and Entity Framework 5 and some other cool stuff. There is also a big push for secure coding at the moment so I'm evaluating various tools to help out with code and solution analysis for this.

Thinking about exploits

So the project is using Entity Framework 5, Code First. It's a really nice technology even if it does have some limitations and quirks, but then that's the trade-off you make when you choose not to do everything manually.

Whilst using it, and looking at static code analysis tools I got to wondering how easy it would be to create a SQL injection exploit and, more importantly, how well do the code analysis tools pick it up. The reason this kind of attack came to mind was because I was at the time reading the updated 2013 OWASP Top 10 and this is pretty close to the top (actually, I think it might be number 1).

Typically when using Entity Framework you would use LINQ to query your data source, this helps in that the framework will create parametrized queries, a great line of defence (maybe, more on this later). But, what if you want to do something funky, well EF5 has you covered and allows you to run any SQL you want.

So I created a basic Web API project to retrieve data from a books database (imaginative!). It included a couple of entity POCOs for the books and authors with some referential integrity, a DbContext class, and of course the API controller. The method I implemented simply takes a string value and searches for all books with that text in the title. What could possibly go wrong? On to the code.

Exploiting the code

So if I run the code from Visual Studio 2012 and point my web browser to the resource method at "/api/books/?title=shadow" I get the single book in my limited database back.

Great, working so far. So what else can we provide as an API request? Well how about the following

/api/books/?title=' drop database test --'

Well, what does the response say?

That looks fine doesn't it? Well yes, until I look in my database server and realise that my "test" database has suddenly vanished!

Looking at the exploit

As you can see, it's a fairly obvious exploit and one that should be picked up quite easily during a code review, although it frankly should never get to that point anyway and I would hope that most developers wouldn't think that this is acceptable. But, with deadlines looming and managers breathing down your neck, even the best of us can make silly mistakes, so what can we do?

Well, one option is static code analysis. This is where a tool looks at your code and determines if you are breaking any rules. The most obvious choice here is the tool which comes with Visual Studio (in 2012 this is now available in the professional edition), also known as FxCop. It comes with a built-in rule-set called "Microsoft Security Rules" and running it is as simple as going to the Code Analysis window, clicking the settings icon, setting the rule set and then running it.

Unfortunately it doesn't detect this exploit, so having code analysis running on your build environment and reporting issues isn't going to save you here. So I tried the version of Code Analysis which comes with Visual Studio 2013 Preview Edition, still no luck.

Next up was CAT.NET, this is a tool which has a very limited set of rules and hasn't seen much attention from Microsoft since 2010, but it's still available so I tried it. As expected it didn't find anything!

Last up was a commercial product which I'm not going to name here, but is in the NIST list. It touts a large list of security rules and in demonstrations is very impressive, but here it failed to detect the issue. It did detect that I should be using column names instead of "SELECT *", but that's not going to save the day really.

So let's make it parametrized

So will making it a parametrized query help? Well let's see what it looks like in code (thanks to Troy Hunt for highlighting this in the first place).

This uses a stored procedure (even safer right?), the code for which looks like this.

So is this better? Well, no.

The problem here is that I've shifted the injection vulnerability to the database. In a large development team where some people look after the database and others write the C# code, without proper oversight of the full product you might find that you run into this problem.

So what do we do

Well, the only way to prevent this kind of simple mistake is education. Make sure that developers don't think that using an ORM like Entity Framework protects them completely from exploits, and make sure that they're aware of coding defensively.

For now we're going to have to keep a closer eye on the code, but keep on pushing the vendors to support the latest technologies so that some of these checks can be automated, then we can turn our attention to less obvious problems.

Update

Thanks to Rowan Miller in the comments below for suggesting that I update the post to show how to use the above pattern safely with Entity Framework. When I wrote this post I wanted to highlight how easily you could misuse a good framework, and how tools out of the box failed to detect it. The trouble is that it's easy to forget to show how to do something positively when you're looking at a negative.

So, with that in mind, here is an example using the above scenario, but this time using Database.SqlQuery<T> with a parametrized query and using a SqlParameter (although you can pass the value in directly in an object array if you don't want to create a parameter).

It's perhaps still not the best way of doing it, but if you have to write code this way then it's better that you do it right and reduce the risk of attack rather than bypassing all the secure features in place for you and opening up your application for exploits.

Reversing a string in C# - is it really that difficult!

2012-12-18T21:59:00.001+00:00

I've been interviewing people for a C# developers position recently which has taken the usual "technical questions, chat, random questions, any questions for us" kind of format which I'm sure most people who've been looking for a job recently will be familiar with.

One of the technical questions we've been asking recently is "Can you describe how you would reverse a string in C#". Note, we're not asking anyone to code a full answer, just describe how they would do it.

This is one of those pretty standard interview questions, much like the FizzBuzz question, which is presented to see if candidates can demonstrate a basic understanding of a programming language and show that they can understand simple requirements. And yet, it seems to throw so many candidates!

The requirements are simple.

Take a string such as "ignoring the voices"
Reverse the characters of the string
Output the result (i.e. "seciov eht gnirongi")

As with most things there are a number of ways to achieve this goal, some more efficient than others, some more geeky than others, so I wanted to demonstrate a couple of methods here to show it is possible. To date I've only received a single correct answer, the other answers have been either non-existent (I don't know how) or far from the mark.

Method 1 - The agnostic approach

This method is the agnostic approach as it shows an algorithm rather than a platform specific solution. This is a good solution in an interview as it shows that you understand the problem and can devise algorithms to solve it, although it's worth saying that there may be a solution more relevant to the platform. The solution is simple, work your way in to the middle of the string (character array) from the start and end of the string, swapping the characters as you go. The important thing is to stop when you get half way through the string, otherwise you'll reverse the same characters again and arrive back at the original string. The example is presented as an extension method.
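As a rough sketch of the algorithm described above, here it is in C++ (the post's own example is a C# extension method, so treat this as a language-neutral illustration; the function name Reverse is just a label). Note that it swaps individual char values, so it assumes single-byte text such as ASCII and would scramble multi-byte encodings.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Reverses a string by walking in from both ends and swapping as we go.
std::string Reverse(std::string text)
{
    if (text.empty()) return text;

    std::size_t left = 0;
    std::size_t right = text.size() - 1;

    // Stop when the two positions meet in the middle, otherwise we would
    // swap everything back again.
    while (left < right)
    {
        std::swap(text[left], text[right]);
        ++left;
        --right;
    }
    return text;
}
```

Calling Reverse("ignoring the voices") gives "seciov eht gnirongi". Taking the parameter by value means the caller's string is left untouched.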

Method 2 - The C# (.NET 4) approach

The .NET framework provides a large number of methods on all of the standard types, and these can help solve this problem in a way which is specific to C# and the .NET platform. The important thing to remember is that a string can be treated as a sequence of characters, so methods which work on sequences can be applied to strings as well. Conveniently, one of these is the LINQ Reverse extension method. That's not to say that you can just reverse a string directly, as the Reverse method returns an IEnumerable object rather than a string, but the IEnumerable object can be turned into an array and the string constructor can take a character array to initialize a string object with, so we get the following method.

This is a vastly simpler implementation than the first, but is it any better? Well it shows an understanding of the language which is a positive, but not of algorithm implementation, so it's neither better nor worse than the first method. The best answer to give would be the first answer followed by the second to show an understanding of algorithms and of the required platform.

Of course from these you can devise variations on the problem, such as reversing the words in a string whilst keeping the characters of each word in the correct direction.

Or even reversing the characters of each word, but keeping the words themselves in the correct order.
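Both variations can be sketched in a few lines. The following is an illustrative C++ version rather than the post's own C# code, and the helper names (SplitWords, JoinWords, ReverseWordOrder, ReverseEachWord) are assumptions of mine. It splits on whitespace, so runs of spaces are collapsed to a single space in the output.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Splits text into whitespace-separated words.
std::vector<std::string> SplitWords(const std::string& text)
{
    std::istringstream stream(text);
    std::vector<std::string> words;
    std::string word;
    while (stream >> word) words.push_back(word);
    return words;
}

// Joins words back together with single spaces.
std::string JoinWords(const std::vector<std::string>& words)
{
    std::string result;
    for (const std::string& word : words)
    {
        if (!result.empty()) result += ' ';
        result += word;
    }
    return result;
}

// Reverses the order of the words, leaving each word readable.
std::string ReverseWordOrder(const std::string& text)
{
    std::vector<std::string> words = SplitWords(text);
    std::reverse(words.begin(), words.end());
    return JoinWords(words);
}

// Reverses the characters of each word, leaving the word order alone.
std::string ReverseEachWord(const std::string& text)
{
    std::vector<std::string> words = SplitWords(text);
    for (std::string& word : words) std::reverse(word.begin(), word.end());
    return JoinWords(words);
}
```

For "ignoring the voices" the first gives "voices the ignoring" and the second gives "gnirongi eht seciov".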

So why do some programmers struggle with these kinds of questions? Is this indicative of programmers as a whole or is this just a lull? I can only guess at the reason, but nonetheless it feels concerning at the moment.

Playing around with Go, prime numbers and ancient Greece

2012-10-24T11:18:00.002+01:00

I've been playing around with Google Go (golang if you want to Google it) a lot lately as I've been off sick and needed something to stop me from going nuts. It's been a while since I played with the language properly and so I started off by reading through a new book which has been written by Caleb Doxsey entitled An Introduction to Programming in Go which you can read for free on-line (or download as a PDF) or purchase from Amazon. I read it for free on-line but purchased a Kindle copy of it from Amazon afterwards as it's a really good book and really well priced.

So after going through the book I needed something to practice on, not wanting to throw myself into an active project I decided to head over to Project Euler. A great resource if you want to practice using a language without resorting to made up scenarios or over elaborate "Hello World" applications. There are a number of problems (399 at the time of writing) which are mathematical/programming based in nature. If you're not too good at maths or it's been a while since you were in full time education then you may need to Google a bit to get an idea of the problem but they're not beyond the average person.

The Problem

The one I want to look at here specifically is problem 7 which (at the time of writing) is a problem to find the 10,001st prime number. There are two obvious ways to solve this, the first being a prime number generator. This is a function which keeps on incrementing a count, checks to see if the next number is prime and if it is the number is returned from the function. The second method is to use a sieve, this takes a sequence of numbers and "removes" all of the ones which are not prime numbers, it does this by:

1. Create a sequence of numbers, marking each as a "prime candidate"
2. Starting from 2, check to see if the entry is marked as a prime candidate
a. If the entry is a prime candidate then mark each multiple in the sequence as non-prime
3. Return the collection of numbers still marked as prime candidates as the final set of prime numbers

This is an over-simplified explanation of the process, for a more detailed explanation see the Wikipedia entry on the Sieve of Eratosthenes.
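The steps above can be sketched as follows. This is a plain, unoptimized illustration in C++ (the solutions in this post are written in Go), and the name SievePrimes is my own.

```cpp
#include <cstddef>
#include <vector>

// Returns every prime number less than or equal to `limit`.
std::vector<std::size_t> SievePrimes(std::size_t limit)
{
    // Step 1: mark every number as a prime candidate (0 and 1 never are).
    std::vector<bool> isCandidate(limit + 1, true);
    isCandidate[0] = false;
    if (limit >= 1) isCandidate[1] = false;

    // Step 2: starting from 2, cross off the multiples of each candidate.
    for (std::size_t n = 2; n <= limit; ++n)
    {
        if (!isCandidate[n]) continue;
        for (std::size_t multiple = n * 2; multiple <= limit; multiple += n)
        {
            isCandidate[multiple] = false;
        }
    }

    // Step 3: whatever is still marked is prime.
    std::vector<std::size_t> primes;
    for (std::size_t n = 2; n <= limit; ++n)
    {
        if (isCandidate[n]) primes.push_back(n);
    }
    return primes;
}
```

SievePrimes(16) returns 2, 3, 5, 7, 11 and 13.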

The issue with creating a sieve is that you must specify an upper limit, that is to say "find all prime numbers less than N". The problem however poses the question as "find the first N prime numbers". This means that in order to use a sieve we have to know roughly where the Nth prime will be, or use a high enough limit to generate enough primes to get N. Because of this it would seem that using a generator would be a better option. One thing to keep in mind is that, although there is no time limit to the problems, generally speaking if the solution takes longer than a minute then it may not be the best solution.

The Solution

Before I start

One thing I should probably mention here is that I'm only checking to see if a number is divisible by prime numbers instead of checking for all possible divisors. The theory here is Prime Factorization. So a prime number is a number which is only divisible by 1 and itself, other positive integers (other than 1 which is special) are composite numbers, meaning that they are divisible by integers other than 1 and themselves. Consider the number 20, this can be decomposed as follows:

20 / 2 = 10
10 / 2 = 5
5 / 5 = 1

So the number 20 can be represented as 2 x 2 x 5 (or 2^2 x 5).

Because every number which is not prime must be evenly divisible by at least one prime number smaller than itself, we only need to test a candidate against the primes found so far, which reduces the number of divisions needed to see if it is prime.
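The decomposition of 20 shown above can be reproduced with a small trial division routine. This is a sketch in C++ for illustration only (PrimeFactors is a made-up name, and it is not part of the Go solutions that follow).

```cpp
#include <cstdint>
#include <vector>

// Returns the prime factors of `value` in ascending order, with repeats.
std::vector<std::uint64_t> PrimeFactors(std::uint64_t value)
{
    std::vector<std::uint64_t> factors;
    for (std::uint64_t divisor = 2; divisor * divisor <= value; ++divisor)
    {
        // Divide out this factor as many times as it goes in evenly.
        while (value % divisor == 0)
        {
            factors.push_back(divisor);
            value /= divisor;
        }
    }
    // Anything left over that is bigger than 1 is itself prime.
    if (value > 1) factors.push_back(value);
    return factors;
}
```

PrimeFactors(20) returns 2, 2 and 5, matching the working above. A divisor that is not prime never gets pushed because its own prime factors have already been divided out by the time the loop reaches it.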

Solution 1

Because I wanted to play with Go I took the opportunity to create a prime number generator using goroutines (think light-weight threads). This was done by creating a number "emitter" at one end and a "listener" at another end, when a number leaves the emitter it passes through a series of checkers, these check to see if the number is evenly divisible by a given prime number. So the first checker will check to see if the number is evenly divisible by 2, if not then it is passed to the next checker which checks to see if it is evenly divisible by 3. The "listener" watches the output from the last checker, if a number appears here then it is not divisible by any of the currently known primes and so must be a prime number itself. In this event, the listener creates a new checker for the new prime number which listens to the output from the last checker, the listener then listens for output from this new checker, it also puts the new prime number into a channel and waits for it to be read by the client.

Emitter > Check For 2 > Check For 3 > Check For 5 > Listener

In implementation I thought this to be a rather elegant solution, however for production use this turns out to be quite a time consuming exercise. It allowed me however to play with channels and goroutines and so was incredibly useful. Next up I decided to implement a similar generator again, but without goroutines and channels.

Solution 2

This solution is simpler as it just builds up a list of prime numbers and runs in a single goroutine (the main one). The theory is almost identical though; keep on checking numbers to see if they are evenly divisible by the current list of known prime numbers, if not then it is a prime number and add it to the list. The implementation this time takes a number which is the Nth prime to find, this means that the function will keep on running until it has found that many prime numbers and will return the last one found.
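The approach described here looks roughly like the following. This is a sketch in C++ rather than the Go code from the repository, with NthPrime as an assumed name, and it checks each candidate against every known prime as the description suggests (no square root cut-off).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the nth prime (1-based), or 0 if n is 0.
std::uint64_t NthPrime(std::size_t n)
{
    if (n == 0) return 0;

    std::vector<std::uint64_t> known;
    for (std::uint64_t candidate = 2; known.size() < n; ++candidate)
    {
        bool divisible = false;
        for (std::uint64_t p : known)
        {
            if (candidate % p == 0) { divisible = true; break; }
        }
        // Not divisible by any known prime, so it must be prime itself.
        if (!divisible) known.push_back(candidate);
    }
    return known.back();
}
```

NthPrime(6) returns 13, and NthPrime(10001) returns 104743, which is the answer to the Project Euler problem.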

This was straightforward to implement and ran so much faster than the first solution (something along the lines of 38 seconds for solution 1 against 6 seconds for solution 2). But I still felt that it was too slow.

Solution 3 - The Prime Sieve

So I investigated the sieve solution. I gave a brief outline of what is required above but there are a couple of optimizations to be made. The first is that for a given sequence of numbers you only need to look for prime numbers up to the square root of the limit, numbers after this point are either multiples of the prime numbers discovered to that point, or are prime numbers themselves. Say we wanted to find all of the prime numbers up to 16 then we would limit ourselves to only looking up to 4 (the square root of 16):

1	N/A
2	Prime Number
3	Prime Number
4	Divisible by 2
Search limit
5	Prime Number
6	Divisible by 3
7	Prime Number
8	Divisible by 2
9	Divisible by 3
10	Divisible by 2
11	Prime Number
12	Divisible by 2
13	Prime Number
14	Divisible by 2

The second optimization is fairly obvious from the table above, every even number with the exception of 2 is non-prime. This means that we can initially mark every even candidate as non-prime (except 2), then starting from 3 check every odd number (e.g. 3, 5, 7, 9...), this halves the number of candidates which need checking.

In order to get around the "known limit" problem I implemented the solution to sieve out all primes up to 1,000,000 (a suitably large number I thought). There are ways in which you can determine the upper limit using the Prime Number Theorem, but the math is a bit beyond me right now, so feel free to take a look. If this were implemented then we would be checking fewer potential candidates, meaning less work again.

(Edit, thanks to compoasso from the comments I implemented a function to get the upper limit so that it is no longer checking for primes up to 1 million)
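As a sketch of how the upper limit and the two optimizations could fit together, here is a C++ version (not the Go code from the repository; the function names are mine). It assumes the known bound that for n of 6 or more the nth prime is less than n x (ln n + ln ln n), and simply uses 13 as the limit for smaller n. Whether this is the same formula used in the actual solution is not something the post states.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Upper limit that is guaranteed to contain the nth prime.
// Assumes the bound p(n) < n * (ln n + ln ln n), which holds for n >= 6.
std::size_t UpperBoundForNthPrime(std::size_t n)
{
    if (n < 6) return 13;  // 13 is the 6th prime, so it covers the first five
    const double x = static_cast<double>(n);
    return static_cast<std::size_t>(std::ceil(x * (std::log(x) + std::log(std::log(x)))));
}

// Returns the nth prime (1-based) using a sieve over odd numbers only.
// Returns 0 if n is 0.
std::size_t NthPrimeBySieve(std::size_t n)
{
    if (n == 0) return 0;
    if (n == 1) return 2;  // the only even prime is handled up front

    const std::size_t limit = UpperBoundForNthPrime(n);
    std::vector<bool> isComposite(limit + 1, false);

    // Only sieve with odd numbers up to the square root of the limit.
    for (std::size_t i = 3; i * i <= limit; i += 2)
    {
        if (isComposite[i]) continue;
        // Step by 2 * i so that only odd multiples are visited.
        for (std::size_t multiple = i * i; multiple <= limit; multiple += 2 * i)
        {
            isComposite[multiple] = true;
        }
    }

    std::size_t found = 1;  // we have already counted 2
    for (std::size_t candidate = 3; candidate <= limit; candidate += 2)
    {
        if (!isComposite[candidate] && ++found == n) return candidate;
    }
    return 0;  // not reached while the bound above holds
}
```

For n = 10001 the limit works out at a little over 114,000 rather than 1,000,000, and NthPrimeBySieve(10001) returns 104743.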

I didn't hold out much hope for this solution as I am sieving for primes in a large data set to find 1 specific number, so I figured that I wasn't going to do much better than the ~6 seconds from solution 2. So the result surprised me a little. After putting together the solution and running it with the "time" command on my HP Mini 210 (Intel Atom N450, dual core 1.6GHz) running Ubuntu Linux I got the correct result in 98 milliseconds!

So, what now?

Well, if you want to check out the full source files you can find them on GitHub along with the above snippets.

I absolutely love playing with Go, it is very much like creating a C++ application but with the convenience of something like Python or Ruby. Project Euler is a great playground for experimenting with new languages as well, it also allows you to think about how to create a solution which is appropriate to the language (instead of making a generic solution work across all languages).

Now I just need to remember to keep on posting :-)

News of my demise has been greatly exaggerated

2012-09-18T21:46:00.000+01:00

I am still here, honest!

I can't believe it's been almost a year since my last post, things have been manic around here over the last year. I started with the best of intentions after my daughter was born, but with 2 children and a new position at work consuming most of my time, it's left me little time to blog and learn new things.

I am still intending on continuing the C++11 series; it's not so new any more but features are still being added to compilers and people are still finding it so I think it's worth going on with. I'm also considering running a few posts on Google Go which I've been playing with for a while but I want to start doing more with it.

Now I just need to stay focused and get on with it :-)

C++11: Lambda Expressions

2011-12-16T08:06:00.000+00:00

Posts in this series
Getting started with C++11
C++11: Initializer lists and range-for statements

So what is a lambda expression?
A lambda expression is a way in which to write a function in-line in your code, the typical use case is where you call a function which expects a pointer to another function in order to tailor it to your own needs. For example, if you had a list of integers (4, 1, 6, 2, 13) and you wanted them sorting, you would typically call a sort method, passing it your list of integers. Now that sort method may accept a pointer to another function in which you can specify how your list is to be sorted, typically this means that you would have one function defining how to sort in an ascending manner, and another defining a descending manner. Let's start with an example as to how you would currently do this.

bool SortAscending(const int x, const int y)
{
    return x < y;
}
...
vector<int> myList = { 4, 1, 6, 2, 13 };
sort(myList.begin(), myList.end(), SortAscending); // 1, 2, 4, 6, 13

Full example code

So now, if we wanted to sort this in a descending manner then we would need to write a new method to tell the sort algorithm how to sort the values and pass this to the sort method.

This is a really simple function so why do we have to write a whole method for it? Well using lambda expressions we no longer have to, with the above being written as follows instead.

vector<int> myList = { 4, 1, 6, 2, 13 };
sort(myList.begin(), myList.end(), [](int x, int y) -> bool { return x < y; }); // 1, 2, 4, 6, 13

Full example code

So what did we do here? Well let's have a look at the lambda expression.

[](int x, int y) -> bool { return x < y; }

The opening brackets "[]" are a capture list, more on this later. Next we define the expression's parameter list "(int x, int y)" just as we would if we were defining a normal method.

The next part is the return type "-> bool", this is an optional part of the expression and we could have easily left it out as the compiler can easily determine from the expression body that the return type is a bool; if the return type is not specified and the expression body does not return any value then a return type of void is deduced. It is useful, however, to specify a return type if you want to explicitly instruct the compiler what the type being returned should be.

Finally there is the expression body which is everything between the "{}" braces.

Capture lists
Sometimes when we use lambda expressions we may want to use or modify values from the surrounding code. Normally we would just pass these in using a parameter, but with lambda expressions we have to provide a parameter list which the receiving method is expecting, in the case of the sort algorithm method, it is expecting an expression which takes two integers and returns a bool.

To access these surrounding values we can use capture lists. You've already seen a simple use of a capture list in the previous example where we wrote "[]", this tells the compiler that we are not capturing any values for use in the lambda expression. In this next example we are going to capture an integer variable called "stepCount" which will be incremented each time our lambda expression is used by the sort algorithm, this will tell us how many times the expression was used to completely sort our list.

int stepCount = 0;
sort(myList.begin(), myList.end(), [&](int x, int y) -> bool { ++stepCount; return x < y; }); // stepCount is 9

Full example code

The capture list this time is written as "[&]", this instructs the compiler to capture any local variables used in the lambda expression and pass-by-reference. This allows us to have the sort method update a local variable in our calling code. Another option for the capture list is "[=]" which tells the compiler to pass-by-value. If we were to use this in our code however we would get a compiler error as any values passed by value are read-only.

There are other options available for capture lists, these are as follows:

[]	Capture nothing
[&]	Capture variables by reference
[=]	Capture variables by value (make a copy)
[=,&foo]	Capture foo by reference, all other variables by value
[foo]	Only capture foo and do so by value
[this]	Capture the this pointer of the enclosing class
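A few of these options are easier to see side by side. The following is a small illustrative sketch (the function and variable names are made up for the example) showing capture by reference, capture by value, and a mix of the two.

```cpp
// Capture by reference: the lambda changes the caller's variable.
int CaptureByReferenceDemo()
{
    int counter = 0;
    auto increment = [&counter]() { ++counter; };
    increment();
    increment();
    return counter;  // 2
}

// Capture by value: the lambda keeps its own copy, taken when it is created.
int CaptureByValueDemo()
{
    int value = 10;
    auto readCopy = [value]() { return value; };
    value = 20;         // does not affect the copy held by the lambda
    return readCopy();  // 10
}

// Mixed: total is captured by reference, everything else (step) by value.
int MixedCaptureDemo()
{
    int total = 0;
    int step = 5;
    auto add = [=, &total]() { total += step; };
    add();
    add();
    return total;  // 10
}
```

Trying to write "++value" inside the second lambda would fail to compile, because variables captured by value are read-only inside the expression body.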

One thing to be careful of here, however, is that if you return a lambda expression from a method (we'll see how in a second) and you capture one of that method's local variables by reference, then you are going to run into problems, as the moment you return the expression the local variable you captured has gone out of scope. Other than that, hopefully you can see just how useful these are by now.

Accepting lambda expressions in my own code
One of the great new features in C++11 is the std::function type (and std::bind which I'll cover in a later post) which is a great way for us to start passing around lambda expressions as parameters or return types; in fact it allows us to use lambda expressions and functions. The structure of the type is std::function<return_type (parameter list)>, so if we wanted to write a method that accepted a lambda expression we could write the following.

double Sum(const vector<double>& values, function<double (double x)> f)
{
    double result = 0.0;
    for (auto d : values)
    {
        result += f(d);
    }
    return result;
}

The "f" parameter is a function which takes a single double parameter and which returns a double value, this is then used in the method body as part of the accumulation process. The method can then be used as follows.

double Complex(double x)
{
    double result = sin(x * 3.14159265);
    result -= floor(result);
    return result;
}
int main(int argc, char** argv)
{
    vector<double> myValues = { 1.3, 2.1, 7.4, 9.6 };
    double result1 = Sum(myValues, [](double x) { return x; }); // 20.4
    double result2 = Sum(myValues, Complex); // 0.597887
}

Full example code

Here we're calling the "Sum" method twice, the first time with a lambda expression which simply returns the value passed into it so that we sum all of the values. The second call passes a reference to the "Complex" method, which may not be that complex but illustrates the point about being able to pass standard methods as well as lambda expressions.
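The std::function type also works as a return type, which is how a lambda expression can be returned from a method. Here is a minimal sketch, with MakeScaler as a made-up name for illustration. It captures "factor" by value so that the returned expression carries its own copy; capturing it by reference here would leave the expression referring to a local that no longer exists once MakeScaler returns.

```cpp
#include <functional>

// Returns a function which multiplies its argument by `factor`.
std::function<double (double)> MakeScaler(double factor)
{
    // Capture by value: the copy of factor lives on inside the lambda
    // after this function has returned.
    return [factor](double x) { return x * factor; };
}
```

The result can be stored and called like any other function, for example "auto twice = MakeScaler(2.0);" followed by "twice(3.5)", which gives 7.0. It could equally be passed straight to a method that accepts a std::function parameter.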

Lambda expressions provide an easy and convenient method of providing short functions to other functions which require them. Even if you're not sure if you want to use lambda expressions, making sure that your methods use the std::function type means that you can carry on writing helper functions if you want to but that you or others have the option of using lambda expressions if they want to.

For the next post I should be looking at smart pointers, these give us the benefits of pointers in C++ but also provide better memory management, which can only be a good thing.

References
CProgramming.com - Lambda Expressions in C++ - the definitive guide
Bjarne Stroustrup - C++11 FAQ

All code provided in this article is provided under a BSD license. If you spot an error then please do let me know so that we can make this better for anyone else reading it.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

A bit quiet for a couple of weeks

2011-12-04T12:22:00.000+00:00

So I thought I'd best mention that I'm likely to be quiet for a couple of weeks as my little girl has finally decided to make an appearance into the world, she was born on the 26th November 2011 weighing 2209g (about 5lbs). We named her Elora Christine, both she and my wife are doing well :)

Baby Elora

The var keyword, or what I meant to say

2011-11-21T15:50:00.000+00:00

Whilst I'm waiting for something to compile (and for baby number 2 to turn up) I thought I'd post about my thoughts on the "var" keyword in C#.

In previous posts you might have seen that I was somewhat enthusiastic about the "auto" keyword in C++, so it might be natural to assume that I'd feel the same about the "var" keyword in C#. Rather unfortunately this is not the case, so allow me to expand a little on why my feelings about the same functionality in two different languages are not quite the same.

I'll do it tomorrow, honest!
I'm sort of afraid a little that I'm going to start upsetting people here, so I shall start by saying this. I am generalizing here and not stereotyping, not everyone is the same and you may indeed be an exception to the rule, but this is generally what I find to be true.

C++ programmers tend to be less lazy than their C# counterparts. What do I mean by this? Well, normally when I'm looking at code written by a C++ programmer (in any language) it tends to be easier to read and maintain. It's because C++ can be painful enough without having to add extra complexity or obfuscation. Code written by C# programmers tends to be a little lazier, things need tidying up here, stuff is left lying around over there and it generally has that "I'll do it tomorrow" kind of feel to it.

Now don't get me wrong, there is a lot of very nicely written C# code out there, but I tend to find that the people who have written it come from a C/C++ background or have a lot of experience with those or similar languages.

So why should this matter? Well, when I think of C++ programmers using the "auto" keyword I tend to think of code coming out looking like this:

map<int, vector<string>> MyFunction() { ... }
void SomeOtherFunction()
{
auto result = MyFunction();
}

Which is easy to follow, I know when I look in the "SomeOtherFunction" code that I just need to find the "MyFunction" method to see what the type will be (or use the functionality of the IDE), and importantly I know what the code is trying to do without looking this information up. When I think of C# developers using the "var" keyword then I tend to think of code coming out looking like this (and I have seen this):

void MyFunction()
{
var a = 1;
var b = 2;
var c = "Something";
...
var x = a + b;
}

Which, okay, is readable and I can make out what is happening but I no longer have a clue about the intent of the code; is "a" meant to be a short, an int or a long, maybe it should have been a double? We could have put some modifiers in there, but that's still not as easy to read. I just know that if a C++ programmer had written it we'd have some types in there and the intent would become obvious. And this isn't just me worrying about something that probably won't happen, I've seen numerous people write code in this way.

What I meant was...
The thing is that the intent of the code is about as important as the code itself. If I say that a variable is a 64-bit integer then it means that I'm expecting some pretty big values in there, similarly if I proclaim it to be a 16-bit integer then I'm expecting much smaller values. This kind of information can be invaluable to a maintainer, who might not be some unknown person looking at the code 5 years after you've written it, it might be you after you've spent 2 weeks on a different project and can't quite remember why you wrote something a specific way.

So is "var" a good thing? Well I would say it is, but like most things it should be used responsibly and never at the cost of losing the intent of what you are trying to write. If you're not sure about it, then talk to someone about it, or write the code the way you want and give it to someone who hasn't seen it and ask them if they know what it's trying to do. If they pull a face then change it, if they know what the intent of the code is without asking too many (what you would consider) obvious questions then it's good to go.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Living in a "dynamic" C# world

2011-11-14T17:23:00.001+00:00

Taking a brief break from the C++11 posts (I'm working on the next one, I promise), I thought I'd quickly cover a small problem I came up against in C# and how something I'd previously dismissed really helped me out. If you want to try out any of the code below I'd strongly recommend looking at LinqPad which is a great tool for trying out sample code, expressions and for querying your databases using Linq.

I can't remember the number of times I've looked at the new "dynamic" keyword and thought of it as ugly, and I will admit that maybe I've not been its greatest advocate. Recently however I went on a training course during which we spent some time calling IronPython scripts from C#, so I could see a use for dynamic, but not so much outside of this use-case.

Today however I encountered a problem and the dynamic keyword came to my rescue. The problem was this; I'm loading in data from an XML document (and no, XML is not my problem), this document has a number of sections which identify how to check something from another document, so it might have an entry which says "You're expecting a value in a field called 'x' of type 'y' and I want to check it like this...". So as an example, say I'm picking up a value which is a double precision value, and I want to check it against another value of the same type but using a tolerance. So if 's' is my source value, 'x' is my expected value and 't' is my delta then I would want to check it using the following:

// |s - x| < t
var s = 1.0005;
var x = 1.0004;
var t = 1.0001;

return Math.Abs(s - x) < t;

Great, but here's the problem, when I'm writing the code the function first needs to check the type and convert it from a string value to the correct type, which I only know about because the type is held in another variable. Again, not too tricky as I can just write the following (where "type" is a Type variable holding the type I need to use):

var convertedValue = Convert.ChangeType(s, type);

The compiler has no problem with this and lets me carry on my merry way, but when I add the following line the compiler starts to shout and tells me I'm an idiot for even attempting to apply the operator "-" to operands of type "object" and "object"!

var sourceValue = "1.0005";
var expectedValue = "1.0004";
var tolerance = 1.0001;
var type = typeof(double);

var convertedSource = Convert.ChangeType(sourceValue, type);
var convertedExpected = Convert.ChangeType(expectedValue, type);
var result = Math.Abs(convertedSource - convertedExpected) < tolerance;

Console.WriteLine(result);

The thing is, I know that my converted values are doubles but I need to tell the compiler that I know what I'm doing here and it can compile this. Well this is where "dynamic" comes to save the day, it allows me to bypass compile-time type checking and instead have this checked at run-time. So changing the code to the following:

var sourceValue = "1.0005";
var expectedValue = "1.0004";
var tolerance = 1.0001;
var type = typeof(double);

dynamic convertedSource = Convert.ChangeType(sourceValue, type);
dynamic convertedExpected = Convert.ChangeType(expectedValue, type);
var result = Math.Abs(convertedSource - convertedExpected) < tolerance;

Console.WriteLine(result);

I get the expected result of "True" when I run the code.

I know there are probably other ways of doing this, and the example code I've presented doesn't exactly portray the complexity I was attempting to deal with, but I do think it's quite a nice little solution. Hopefully after reading this you might also re-consider looking at the "dynamic" keyword, you never know when you might have a genuine use for it.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

C++11: Initializer lists and range-for statements

2011-11-10T02:48:00.000+00:00

In my previous post I wrote about the auto keyword, using it as a return type and the decltype operator. Hopefully you've had a chance to use these and hopefully you've been finding them incredibly useful. I said that in my next post I would look at initializer lists and range-for statements, so let's get stuck in.

Initializer Lists
Perhaps one of the most annoying things I tend to have to do is create an array or vector and initialize it with some known values, if it's held in configuration then it's not too bad but I still end up having to do it sometimes. Previously if you've wanted to just use an array this has been fairly trivial, we'd just write the following:

int a[] = { 1, 2, 3 };

But if we wanted to use a vector then we'd end up with 4 lines of code to do the same job:

vector<int> a;
a.push_back(1);
a.push_back(2);
a.push_back(3);

Which isn't particularly nice to write and can lead to RSI related injuries. But now functions (including constructors, which are then referred to as initializer-list constructors) can accept a {} list by accepting an argument with the type std::initializer_list<T>. This has been pushed into the STL so our favourite containers should now accept a {} list for initialization.

vector<int> a = { 1, 2, 3 };

map<int, vector<string>> c({
{1, { "Ignoring", "The", "Voices" } },
{2, { "In", "My", "Head" } }
});

Doesn't that just look a lot better, and it's certainly easier to type. The nice thing about this new type is that it means we can write our own functions which take initializer lists, whether we're creating our own container class or just writing a function which can accept a {} list of values.

template<class T> void MyFunction(initializer_list<T> values)
{
cout << "Number of items in initializer list: " << values.size() << endl;
for (auto i = begin(values); i != end(values); ++i)
{
cout << *i << " ";
}
cout << endl;
}

This method then works by simply calling it in the following manner:

MyFunction<int>({ 1, 2, 3 });

Now you may have noticed something different with the for loop in that method, instead of using "values.begin()" and "values.end()" it's using "begin(values)" and "end(values)". These are two stand-alone functions which return iterators to the beginning and end of the collection; the nice thing about them is that they work on any container which provides begin() and end() methods and also on fixed-size built-in arrays. They won't work on dynamically allocated arrays though, as a bare pointer carries no information about the size of the array.
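To illustrate the stand-alone begin and end functions, here is a minimal sketch of a helper which walks any collection they accept. "CountItems" is a hypothetical name used only for this illustration.

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: counts the elements of a collection by walking
// from std::begin(c) to std::end(c).
template<class Container>
std::size_t CountItems(const Container& c)
{
    std::size_t count = 0;
    for (auto it = std::begin(c); it != std::end(c); ++it)
    {
        ++count;
    }
    return count;
}
```

The same function accepts a std::vector<int> and a built-in array such as "int fixed[] = { 4, 5, 6, 7 };", because the array's size is part of its type. Passing a pointer obtained from "new int[n]" would fail to compile, as there is nothing for end to work from.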

Full example Code

Range-For Statements

If you're used to working in languages such as C# or Python then the chances are you're used to seeing statements like these:

C#: foreach (int i in my_list) { ... }
Python: for i in my_list: ...

These are statements which provide a simple syntax for working with each item in an iterable structure. To perform something similar in C++ we would write something more like this:

for (vector<int>::iterator it = my_list.begin(); it != my_list.end(); ++it) { ... }

Which works and it does what we want it to, but secretly we've been looking over the shoulders of the C#, Java etc... developers and coveting their range loops. Well not any more, now we too have a range loop which works on any iterable structure (i.e. anything you can iterate through like an STL-sequence defined by a begin() and end(), [1]), including initializer lists.

for (auto i : my_list) { ... }

So to give a more complete example, and using what we covered earlier we can do the following:

vector<string> a = { "Ignoring", "The", "Voices" };
for (const auto& s : a)
{
cout << s << endl;
}

Full example code

Which looks a whole lot different from the following, which we would have needed to write beforehand to accomplish the same thing.

vector<string> a;
a.push_back("Ignoring");
a.push_back("The");
a.push_back("Voices");

for (vector<string>::const_iterator it = a.begin(); it != a.end(); ++it)
{
cout << *it << endl;
}
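The two features from this post also combine nicely in our own types. The following is only a sketch: "Bag" and "Join" are hypothetical names, and the class has just enough interface (an initializer-list constructor plus begin() and end()) to be filled from a {} list and used in a range-for loop.

```cpp
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical container with the bare minimum needed for {} initialization
// and range-for support.
class Bag
{
public:
    // Initializer-list constructor, so "Bag b = { ... };" works.
    Bag(std::initializer_list<std::string> items) : _items(items) {}

    // The range-for statement only needs begin() and end() returning iterators.
    std::vector<std::string>::const_iterator begin() const { return _items.begin(); }
    std::vector<std::string>::const_iterator end() const { return _items.end(); }

private:
    std::vector<std::string> _items;
};

// Hypothetical helper: joins every item with a single space using range-for.
std::string Join(const Bag& bag)
{
    std::string result;
    for (const auto& s : bag)
    {
        if (!result.empty())
        {
            result += " ";
        }
        result += s;
    }
    return result;
}
```

Taking each item as "const auto&" avoids copying every string on the way through the loop.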

The next post I'm planning on doing is about lambda expressions, these are another fantastic language feature which I use a lot in other languages such as C# and Python so I'm glad that they've finally made their way into C++ as well. As it's a fairly sizable topic by itself I'll probably just do a single post on those and then single posts for other features as well. I think that the items I've covered in this post and my last are really the easiest to get going with and have a fairly large impact on the code we write daily.

References
[1] Bjarne Stroustrup C++11 FAQ

All code provided in this article is provided under a BSD license. If you spot an error then please do let me know so that we can make this better for anyone else reading it.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Getting started with C++11

2011-11-06T21:53:00.000+00:00

Wow, has it really been this long since my last post? It's been a mad few months getting the bathroom and kitchen finished off, dealing with a young child and now baby number 2 only a couple of weeks away. With all of this going off I've somewhat neglected this blog and it's about time I started putting some articles up.

So I thought I'd try to get back into the swing of things by writing a few brief entries on getting going with some of the new features of the recent C++11 standard. I've been keeping an eye on the process over the last year and I'm excited by the new features available to us being introduced in newer versions of the compilers. All of the code I'll be putting up has been written and compiled on a laptop running Ubuntu 11.10 (Oneiric Ocelot) which has GCC 4.6.1 in the repositories (this has a list of the C++11 features available at the various releases), most things are coming along nicely but concurrency is still a way off.

Auto's

Quite possibly the most useful new feature in day-to-day use is the new "auto" keyword - if you know C# then this can be compared to the "var" keyword (which is not the same as the VB variant type) - which can be used in place of specifying a variable type where the type can be inferred by the compiler at the point of declaration. So instead of typing:

int x = 42;

You can instead use:

auto x = 42;

This means that the compiler will infer x as an integer, after this point x will always be an integer (in the current scope). This is most likely not something you will do day-to-day (personally I won't be likely to) but then this isn't where its use shines through, let's instead look at another example:

std::vector<std::string> my_collection;
my_collection.push_back("Hello");
my_collection.push_back("World");

for (std::vector<std::string>::iterator it = my_collection.begin(); it != my_collection.end(); ++it)
{
cout << *it << endl;
}

So, a simple collection that we then iterate over and write the value out to the console. So where can the "auto" keyword help here? Well that for loop is looking pretty unwieldy, isn't it? Wouldn't it be nice if there was some way we could tidy it up a little, maybe get it looking a little more like this:

for (auto it = my_collection.begin(); it != my_collection.end(); ++it)
{
cout << *it << endl;
}

Full example code

And guess what, we can (yay!). This is because when we declare "it" the compiler can infer its type so we don't have to clutter up our code specifying the type when we already know what it is. There are actually a few more things we can do to this example to make it even easier to read with new features but they'll come later.

As a Return Type

Yep, we can use the "auto" keyword in place of a return type as well, how does this work though as we're not specifying a variable, so how do we infer type? Well we can now specify the return type at the end of the function declaration, so instead of:

int Sum(int x, int y) { ... }

We can instead use the "auto" keyword and specify the type at the end:

auto Sum(int x, int y) -> int { ... }

Which doesn't look much better does it? Well again this isn't really the intended use of the syntax, but if this isn't then what is? Well one place is where the type being returned is not known to the compiler at the point of definition. Consider the following snippet from a header file:

class Test
{
public:
enum TestEnum { One, Two, Three };
void SetField(TestEnum t);
TestEnum GetField();
private:
TestEnum _field;
};

Implementing the setter is easy in the source file, we just write the following:

void Test::SetField(TestEnum t) { ... }

And for the getter we just write this:

TestEnum Test::GetField() { return _field; }

Don't we? Well, no actually. The compiler will return an error as the return type TestEnum is not known to the compiler at the point where we define the return type. To get this to work we would need to do the following:

Test::TestEnum Test::GetField() { return _field; }

Alternatively, using the "auto" keyword as the return type and using the new return type syntax we could type the following instead:

auto Test::GetField() -> TestEnum { return _field; }

Full example code

This works because the compiler knows about TestEnum at the point where we now define the return type. Still this doesn't look like it provides much benefit, but it will when we introduce the final new piece of syntax for this post.

decltype

This is an operator which is used to determine the type of an expression or variable so you can create a variable based on that type, like this:

int x = 3;
decltype(x) y = 5; // same as int y = 5
decltype(x - y) z = 7; // same as int z = 7

So far so good but again it doesn't look like it's bringing much to the party. So what if we do the following instead:

std::map<int, std::vector<std::string>> MyFunction() { ... }
auto MakeCollection() -> decltype( MyFunction() )
{
auto val = MyFunction();
return val;
}

Full code example

Take a second, read it again, now think of all those poor keys on your keyboard, don't they deserve a break? At this point the use of decltype and the new return type syntax and the new auto keyword all should hopefully look really useful and the kind of things you might want to start using a bit more frequently, they did for me when I first figured it out. The whole thing looks even more appealing when you start considering templated functions as well when sometimes the return type can be more difficult if not impossible to figure out. Also you are reading that right, I did write ">>" in there, the new specification treats this the way we read it which makes a lot more sense thankfully.

Just for completeness, here's the above snippet of code written using the more traditional syntax:

std::map<int, std::vector<std::string> > MyFunction() { ... }
std::map<int, std::vector<std::string> > MakeCollection()
{
std::map<int, std::vector<std::string> > val = MyFunction();
return val;
}

Anyone who says that last snippet is easier to read is either lying or wants their head examining, it's bad enough typing it! So go on and give these new features a try, if you're not wanting to use them most of the time after a week I'll be shocked.
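The point made earlier about templated functions can be sketched as follows. "Add" is a hypothetical function, not something from the example code linked in this post; its return type depends on both template arguments, so it is expressed with decltype and the new return type syntax.

```cpp
#include <string>
#include <type_traits>

// Hypothetical helper: the return type is whatever "t + u" produces,
// which is only known once T and U have been deduced.
template<class T, class U>
auto Add(T t, U u) -> decltype(t + u)
{
    return t + u;
}

// int + double gives a double, and the compiler works that out for us.
static_assert(std::is_same<decltype(Add(1, 2.5)), double>::value,
              "int + double should be double");
```

The same function also accepts a std::string and a string literal, in which case the deduced return type is std::string.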

I'm planning my next post of this type to be about initializer lists and range-for loops, after which I will hopefully look at lambda expressions and smart pointers. The items discussed above and the ones coming up - I feel - are the first things which make C++ based on the new C++11 standard feel like a modern programming language and, hopefully, keep new and experienced programmers coming back to it for years to come.

References
Wikipedia - C++11
CProgramming.com - C++11 articles
Bjarne Stroustrup - C++11 FAQ

All code provided in this article is provided under a BSD license. If you spot an error then please do let me know so that we can make this better for anyone else reading it.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Thoughts on open-source development methods (and congratulations to Ubuntu)

2011-04-29T21:34:00.000+01:00

First of all I want to say congratulations and thank you to all of the people at Canonical and in the Ubuntu community for all of their hard work in getting out the latest release of the Ubuntu Linux distribution. Despite all of the criticism the distribution comes under from time-to-time these people strive hard to put out a new version every 6 months.

When you think about their achievement you soon come to realise it's no small feat that they have pulled off, bringing together the best-of-breed free and open-source applications into an easy to install and use distribution readily available to the entire world for free. It becomes even more amazing when you consider that the people working on the projects are spread all over the world, even the teams working on a single feature might be spread across multiple countries perhaps only meeting in person a few times a year. The same is true of many free and open-source projects where there may be hundreds, if not thousands, of contributors to a single project spread around the globe. And yet the projects come together and produce incredible results, such as the Linux kernel itself which powers so many of the devices and much of the infrastructure we use each day without realising, the chances are that you have at least one gadget in your house powered by the Linux kernel.

The main reason I find all of this amazing, and why I felt compelled to blog about it following the very recent release of Ubuntu 11.04 (Natty Narwhal), is this geographic spread of developers, document writers, testers, packagers etc... set against my own experiences of working in teams within a corporate environment. I have worked in a number of places now where people have found it difficult (if not impossible) to work with people who are not sat immediately within the same vicinity as them. Projects have been delayed and delivered late because people have had difficulties in working across time zones, and in a few instances because people have been in different offices in the same building. So I suppose I'm curious as to why people working with free software all over the globe can meet these 6 monthly deadlines with amazing frequency and yet companies with money to throw at the same or similar problem have so much trouble!

I have seen a few problems in corporate environments which are often quickly overcome by open-source companies and communities, but I'm sure that these are not the only problems.

The first is often something as simple as choice of version control system. Whereas the recent adoption of distributed systems such as Git and Bazaar in the open-source world has allowed people to work within a project more efficiently, companies I have worked in seem to stick to and prefer the older check-out, check-in systems such as SourceSafe. This type of system, although easier for inexperienced developers to understand, is often slower and imposes bottlenecks on the development process, whereas a distributed system allows users to work remotely without requiring constant access to a centralised server, meaning that the developer only requires access to the main branch when retrieving a revision and finally merging changes.

A second problem I've often seen is one of communication. Often in corporate environments communication in teams is limited to office chat, email or meetings. The problem here is that chat often excludes a large number of the team, email is limited to the people on the To and CC list and meetings are more often restricted to a single geographical location. These factors typically lead to large numbers of team members becoming excluded from conversations and vital information, some companies try to limit this by creating procedures for disseminating information but these are not always followed (let's be honest, most of us despise more procedure!). In open-source projects conversations take place in the open, normally using social systems such as blogs, microblogging, mailing lists, wikis and internet chat, other systems such as mumble are also being adopted for having open meetings over the internet where anyone can join in. Typically a project will let contributors know which are the preferred methods for keeping up to date with project information and developments and which channels are preferred for informal chats.

Whilst I do not think that open-source development methods are perfect, I do believe that there is a lot companies can learn from them if they are willing to break away from traditional models. Perhaps if they do then maybe they too can hit deadlines repeatedly and successfully in the same way many open source projects do.

Catching Up #1

2011-03-24T22:23:00.000+00:00

Well it's been a fun couple of weeks so I thought I'd quickly jot down what's been happening here and what I've been getting up to.

Last week I started working on the kitchen, well I really started the week before but that was only a small amount of preparation. Last week the gloves came off and we managed to get most of the kitchen out, walls and all. Currently we still have a sink and the free-standing gas oven left in but that's it, so we're mostly living out of the back room and plastic boxes. Tyler thinks it's great as all the fun toys like the measuring jugs which were behind locked doors are now in reach. He's actually been a very good boy whilst we've been decorating and unusually for him has been happy to sit and watch!

There have been a few fun moments such as blowing the upstairs lighting circuit fuse after we found out that the previous occupants had wired in a 13 amp socket to it! Cables barely below the surface of the walls, plaster falling off with the wallpaper and we even found the old door from the front room to the kitchen which hadn't been covered up properly. That's all sorted now and the plasterers have done a good job in levelling out the two problem walls. So now all we need to do is:

Fit the new units on one side
Get rid of the sink unit on the other side and install the new units there
Replace the boxing
Buy and fit new appliances
Replace the lighting
Decorate

Sounds like a lot but now the room is looking better as a shell it seems doable.

When I've had a few moments and not been reading (love my Kindle by the way, post coming soon) I've been trying out the Vala programming language. Its syntax is very close to C# but instead of compiling to assembler or another intermediate language it compiles to C and is then compiled with the platform's standard C compiler, so you get the bonus of not having to worry so much about memory management and benefit from a more modern programming syntax but you also get the performance benefits of a natively compiled C application.

Hopefully when I've spent a bit more time with it I'll be able to do another post about it. In the meantime if you want to see what it's capable of I'd strongly recommend checking out applications like Shotwell, a photo manager for Gnome which is written in Vala. It's very cool and is coming on very quickly.

Other than that it's been pretty much the same, but I'm hoping to try and post a bit more frequently here so keep checking back to see what else is going on. Alternatively subscribe to the feed and keep up to date from the comfort of your favourite news aggregator, personally I'm a fan of Google Reader but that's because it fits nicely with my Android phone.

Staying in the game

2011-03-08T20:07:00.000+00:00

One of my biggest concerns as a developer (and I have quite a few) is being able to stay relevant. This doesn't necessarily mean making myself the only go-to-guy for a project or getting upset when I'm not invited to meetings but staying relevant as an experienced developer.

So to stay relevant what do I need to do?

Well the first thing is to keep my existing skill-set up-to-date. Calling myself a .NET developer is all well and good, but if I only know about version 1.1 and not 2, 3, 3.5 and 4 then how useful am I, and how am I able to help influence technical direction if I don't know about what's new?

Next is a harsh one, but necessary. Know when to move on from an existing skill. I know we all feel comfortable with what we know but if you're a C++ programmer and there are no C++ based programs left to maintain or write then should you spend as much - if any - time investing in those skills? I'm not saying forget about them, and from time-to-time it's nice to come back and brush up a little, but sticking with it as a core skill means you'll be slowly phased out like the programs you maintain.

Try to keep up-to-date with new theories and practices. Sometimes people do re-invent the wheel and sometimes it's a good thing, maybe it's a new design pattern or a new way of looking at threading; but knowing about these can help make you a better programmer.

Keep your eye on the horizon. Sounds a little managerial I know but looking at what's coming up is really useful as it will help to figure out where you should be spending your time. Maybe looking at a new language instead of an entrenched one will help with a new product or problem that you know is coming up, or maybe it might just be more fun.

Enjoy what you do. Sounds obvious but if you go to work every day and churn out code without really enjoying it you won't have the motivation to spend the time learning new things and before you know it you're out of touch and out of date. This can be a tricky one though, if the project you're on isn't that interesting then how can you stay enthused about it? Well look for little things around the project you can do in your spare time to make it more interesting, such as writing a little app to make a repetitive task more efficient. Contribute to an open-source project you like or just write a little app for yourself, some of the best applications have been written to scratch your own itch.

The last thing is a tricky one for some people but here goes. You DO NOT know it all, you might have at some point but things move on, and quickly so you will need to as well. But there are people at the other end of this scale and to those people I say you DO know something, there is no such thing as a perpetual noob, every day you learn something you're more experienced than the day before. Look at it this way as well, even people who write programming languages don't know every little aspect of it as other people contribute ideas and write libraries and frameworks and they don't know how all of them work!

The advantage of not being connected

2010-10-18T22:26:00.000+01:00

There are normally a few articles published each week on version control, an odd "How-To" guide, a tale of how a version control system saved someone numerous hours worth of work or the odd rant. Over the last few weeks however there seems to be a little more background noise when it comes to version control, more grumblings on the internet and a lot more where I work.

With my current employer we have numerous projects being worked on by teams internationally, but we don't seem to have any real consensus on which tool set to use. This is fine for some of the work as we have client apps and web apps written in Java/.Net/C++, but a common version control system would be nice. I've so far worked on applications where the code has been committed to CVS, SourceSafe and Subversion repositories, we also have Team Foundation Server (TFS) and a couple of others which I rarely remember around somewhere as well. I know the people working on CVS complain that it's slow, SourceSafe users complain that it's ancient and unfit for purpose, the TFS guys complain that the system is unusable and the Subversion people actually tend to be quite happy.

There was one occasion however where none of them were happy, and that was the day a virus hit the servers. The infection itself wasn't that serious but the downtime was crippling while the network monkeys checked the servers and slowly brought them all back online. Most people were able to continue working for a short while until they needed to check bits out or check them back in. So I got a few odd looks when I was sat at my desk working away on some code, performing regular commits and relatively oblivious to the world around me.

As it happened I was in a lucky situation. First of all I was working on a new piece of code so had little reliance on others, and secondly a few weeks earlier I had added another version control system into my tool set. I had installed Bazaar and had worked it into my processes so that when I started a new piece of work I would use a local Bazaar repository. I would push this up to a remote location every few commits, and then when I was ready for everyone else to get their hands on it I would export it to a new location and commit it to the standard (whichever one it was that day) version control repository.

A number of times since, it has proven invaluable to have a local repository. For instance I have a folder where I keep all my snippets and tests; these are all version controlled using Bazaar and again I push them to a remote location every once in a while. Sometimes I will remove folders and files just to keep it tidy and relevant, but on the odd occasion I will need to go back in time to some folder where I had worked on something which I need again. If I've deleted it then I can restore it, get the information I need and, if I feel the need, delete it again.

Not having to rely on a central repository has been a life-saver on numerous occasions. A few people have also gotten wind of what I'm doing and are investigating switching newer projects over to a distributed version control system to avoid the kind of downtime I mentioned earlier. Naturally there are a few doubters, some of whom still aren't convinced by version control itself, but they're the ones who'll be losing their hair quicker when it all goes wrong.

Belated Birthday Wishes C++

2010-10-16T22:18:00.000+01:00

So 25 years ago, on October 14th 1985, a little-known programming language called C++ was released to the world. Since then it has been the cause of many arguments, mostly around complexity, efficiency and the differences between procedural and object-orientated languages. Say what you will, but for a language to still be in as much use as it is now (and currently at number 3 in the TIOBE index) is for me at least incredible. Admittedly the fact that 'C' is still in widespread use since its first appearance back in 1972 is simply astounding, but I don't think that should detract from C++'s success.

So when C++ was released commercially I was 5 years old, playing with friends with far fewer cares in the world. I might have been aware of computers but possibly didn't care as much; I do remember that doors opening by themselves was kind of amazing still! And here, 25 years later, I'm using a programming language which is almost as old as I am, day in and day out, to earn my living. Sure it may not be as pretty as some languages, or as expressive, you have to consider memory and resources, and doing some basic stuff like sending data over the internet is a lot harder work, but I quite like it. It's a challenge, and in these days of automatic garbage collection considering memory and resources is quite refreshing. It makes you consider how wasteful some modern languages and frameworks are.

So happy 25th birthday C++, here's looking forward to the next 25.

Living online

2010-09-08T23:20:00.000+01:00

I have a blog, that at least is self-evident. I'm on Twitter (rarely), I have an account on identi.ca (used most of the time), I spend time on Facebook, and I have a Google Reader account pulling in about 40 news feeds. My phone is hooked up to most of these accounts so I can access information on the go (it even pulls up the weather and some stock prices when my alarm goes off so I can see what the day holds). I write documents online using Google Docs, and my email is provided by Google, as is my calendar. I have a paid-for Ubuntu One account where most of my photos and all of my important documents are stored, and I'm thinking of paying for more space with Picasa so I can put more photos up there as well. Oh, and my bookmarks are automatically synced between all my devices via my browser.

Writing it all out like that it suddenly becomes very clear how much I rely on the internet and good internet access. I could have a cheaper home ADSL account but I pay a little more for cable so I can get the speed. That's important to me, as having "up to" 4Mb is useless: I don't want to leave my laptop on all weekend so I can upload the photos that I've taken during the week to my on-line storage account. Likewise I want the download speed so that my other devices can pull down new files quicker.

Put simply, my entire life is in the hands of corporations with my data stored on their anonymous servers distributed across the world. I'm not really going anywhere with this train of thought, I'm not one who thinks that all these services should be open - realistically no individual is going to have the resources to match something like Amazon's Cloud Services - and I'm not scaremongering you into thinking that these faceless corporations are out to own us all, although I'm sure one or two of them are. Rather I'm looking at this situation in astonishment.

When I first got on-line the biggest problem I had was how to carry around 20 floppy discs safely in my bag without bending or breaking them. The future then was how we would have devices so we could carry our digital lives around in our pockets. Back when I was 16 if you had told me that instead of having storage in my pocket I would have a device to access my on-line virtual storage I probably would have given you a sympathetic look and asked if you wanted to sit down for a while.

My 6-month-old son is now living in a world where he's had an on-line presence since he was 40 minutes old. Since he was born mobile devices have gotten faster, new video codecs have arrived looking to replace flash movies, and self-healing solar cells have been demonstrated. It makes me wonder if, when he's old enough to start using a computer, telling him about storing data on your computer (or indeed 20 floppy discs in your school bag) might elicit the same response I would have given when I was 16 and on-line storage meant having your own website.

New Phone

2010-09-03T22:25:00.000+01:00

After spending a year with my HTC Magic I finally decided it was time for an upgrade (or rather that I was finally able to), so I upgraded to the Samsung Galaxy S. Initially I was hoping to go for the Google Nexus One but contract prices weren't quite in the right ball-park for me, but after seeing the Samsung in use with someone who I work with I figured I'd take a shot.

Initially I was concerned that I would find the phone too big for practical use but in reality it's a good size and feels nice to hold. I've been using my phone as an e-book reader since I first got my HTC Magic and the larger screen on the Samsung makes reading them much easier.

One thing I was eager to try was the camera. I've always been disappointed by camera phones: although the quality of the images is sometimes fairly good, the shutter response speed has always been a huge disappointment. So I headed off to Markeaton Park with the family and with a little trepidation fired up the camera app and started snapping. The camera app doesn't review the pictures after each shot so I took a few before checking and I was pleasantly surprised.

The quality of the photos on the Super AMOLED screen was more than impressive and the shutter response was really quick, still not quite as good as a real camera but close enough that I wouldn't be worried about using my phone as a camera when I'm out and about. One very small gripe here though is that even though the phone supports multi-touch the camera app uses the phone's volume buttons to manage the zoom functionality. Where the camera does redeem itself though is the touch-focus function, so after framing the shot you can just tap the part of the scene you want to focus on and the phone does the rest.

So next I moved on to the video. This was also very impressive for a phone, and the quality during recording and playback was as good as the photos, I think.

In terms of general use I'm very impressed with the phone. The battery may not last too long but I've come to expect that from modern smart-phones and I've gotten into the habit of charging while I'm at work. One trick I did find was that turning off automatic syncing overnight means that the battery is barely touched by morning, so I don't have to worry about the alarm not going off (besides I have my 6-month-old alarm to wake me up anyway). It would be nice to be able to remove some of the pre-installed apps but with 2GB of internal storage I'm not too concerned.

So far, so good. I'm impressed with the phone and as always I'm more than impressed with the Android operating system. I'm not sure when Froyo is coming to the device but I'm looking forward to seeing what it brings.

Getting going

2010-08-24T22:37:00.000+01:00

So this is the first post to my new blog. I figured that as I generally have less time to code in my spare time these days, I'd move my blog to one which is a little less focused on coding and is more just about me and what I'm up to. I'll still post code bits from time to time but they will be mixed in with posts about life, family and anything else I find amusing, entertaining or just plain weird!

Anyway, I was looking out of my bedroom window last night and the sky looked amazing so I took the opportunity to take a couple of photos and play around with shutter speeds and exposure settings. I quite like how it turned out, although I'm sure in a year's time I'll be thinking I should have changed something else or shot in RAW format! But for now here's a sample of my efforts, enjoy.
