Monday 30 January 2017

Azure Security and Automation

Recently I've been doing a lot of work on automating some of my more mundane daily tasks in Azure so that I can free up time to work with the cool stuff (don't we all really just want to play with the shiny stuff?). Having not blogged anything for a while, and finding that the online examples for pulling some of these things together weren't as clear as they could be, I thought I'd write something up.

All of these examples can be run from an Azure Automation account. Before you do that, though, you will need to update the Azure and AzureRM modules to the latest versions. I don't know why they aren't already at the latest version, and it's a pain to update them in the right order, but it's worth it in the long run.

So what kinds of things have I been automating? Well, mostly it's enabling security-related features or changing settings which are less secure than they could be.

Please note that I don't present anything here as a silver bullet to prevent attacks; if someone is determined to get in then they most likely will, eventually (if they haven't already). You can make their lives more difficult though, hopefully persuading them to move on to easier targets or minimising the impact of a successful attack. A common mantra you will hear now is "assume breach": put simply, you must assume that your environment is already compromised, and then ask how you manage things in a way that minimises impact and reduces the time between detection and resolution.

Also, I'm aware that most of the PowerShell here could be improved on and/or simplified. Whilst writing these my style has changed and I now tend to write them out in longer form so that they're easier to follow. Feel free to take the code and re-arrange and modify it as much as you want, no attribution required (but always welcome).

Virtual Machines


If you create a VM from the marketplace into a new resource group (I work almost exclusively in the new ARM portal) then along with the VM itself you'll get a virtual network with at least one subnet, a storage account and a network security group with an RDP rule in place (assuming it's a Windows-based VM). So what's wrong here? Well, let's have a look.

RDP Rule


That RDP rule in the Network Security Group is an Any-Any rule on the standard RDP port (3389), which makes it incredibly easy for anyone with a relatively simple script to scan a large range of IP addresses and see if anything is listening on that port. From this they can then launch a brute-force attack (other types of attack are available), and if you've not used a particularly good password, along with an obvious username, then it won't take long for an attacker to gain access. Once in, if you have a number of VMs on the same virtual network with the same usernames and passwords, then traversal becomes fairly trivial and it's game over.

But there is hope, if you make a few changes. The most obvious ones to make when setting up the VM are:
  • Make sure that you use strong passwords and don't use the same password everywhere
  • Don't use obvious usernames (e.g. admin)
Following that you can also modify your RDP rule so that RDP access is whitelisted (if it's needed at all); you can do this using CIDR blocks and it's pretty trivial, as the sketch below shows. If you're a subscription administrator then you can also look for any wide-open RDP rules and disable them.
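As an illustration, a minimal sketch for locking an existing RDP rule down to a known address range might look something like this; the resource group, NSG, rule name and CIDR block are all assumptions you would swap for your own values.

# Assumed names - replace with your own resource group, NSG and rule name
$nsg = Get-AzureRmNetworkSecurityGroup -ResourceGroupName "my-rg" -Name "my-vm-nsg"

# Restrict the existing RDP rule to a whitelisted CIDR block instead of Any
Set-AzureRmNetworkSecurityRuleConfig -NetworkSecurityGroup $nsg `
    -Name "default-allow-rdp" `
    -Access Allow -Protocol Tcp -Direction Inbound -Priority 1000 `
    -SourceAddressPrefix "203.0.113.0/24" -SourcePortRange "*" `
    -DestinationAddressPrefix "*" -DestinationPortRange "3389"

# Push the updated rule set back to Azure
$nsg | Set-AzureRmNetworkSecurityGroup

In a runbook you could equally loop over every Network Security Group in the subscription and flip a rule's -Access to Deny on anything still allowing 3389 from any source.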

If you're proficient with Desired State Configuration you could also look at moving RDP to a non-standard port (a sketch of the registry change follows). Whilst this isn't a fix, it will stop a large number of "lazy" scans where attackers are only looking for the standard ports.
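For reference, the port itself is just a registry value, so a minimal DSC sketch along these lines could apply the change; the port number here is purely an example, and you'd need to update the Windows Firewall and your NSG rules to match, then restart the Remote Desktop Services service.

Configuration NonStandardRdpPort
{
    Import-DscResource -ModuleName PSDesiredStateConfiguration

    Node "localhost"
    {
        # RDP listens on the port held in this registry value
        Registry RdpPortNumber
        {
            Key       = "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server\WinStations\RDP-Tcp"
            ValueName = "PortNumber"
            ValueType = "Dword"
            ValueData = "3395"   # example port only
            Force     = $true
            Ensure    = "Present"
        }
    }
}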


Storage Account


Storage accounts now support encryption services for blob storage across all regions. Whilst this might not be important to you personally, some organisations are pretty insistent on using it to ensure compliance with their own requirements or those of their customers. Given how simple it is to enable, it's worth getting used to working with it and switching it on by default.

Ideally you should create your storage account before you create your VM; this is because only data added to the storage account after encryption is enabled will be encrypted, and any existing data will remain unencrypted. So if your storage account is created as part of the VM provisioning then the VHD files will not be encrypted.
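Enabling it really is a one-liner. A minimal sketch with placeholder names, creating the account with encryption on before the VM exists (or switching it on for an account you already have):

# Create a new storage account with the blob encryption service enabled
New-AzureRmStorageAccount -ResourceGroupName "my-rg" -Name "mystorageaccount" `
    -Location "westeurope" -SkuName "Standard_LRS" `
    -EnableEncryptionService Blob

# Or switch it on for an account that already exists
Set-AzureRmStorageAccount -ResourceGroupName "my-rg" -Name "mystorageaccount" `
    -EnableEncryptionService Blob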


Virtual Machine drives

The final thing is that most marketplace images do not enable BitLocker or DM-Crypt drive encryption as part of their standard provisioning. This is useful to have in place because, if an attacker does gain access to the storage account hosting the VHD files, they could just download them and browse through them at their leisure; if the drives are encrypted then this becomes much more difficult. I won't cover how to do this here as Microsoft's own documentation is already pretty good, and it involves a few more steps than simply running a PowerShell cmdlet.

SQL Servers and Databases


This might not come as a surprise, but Microsoft are actually pretty good at managing their own infrastructure. Because they do this well and at scale in Azure, a lot of people are realising that leaving them to get on with it and using the services they provide on top of that infrastructure is the better option. SQL Server is a great example of this: why should I have to bother with managing OS upgrades, security patches and version upgrades when someone else who knows this stuff inside out can do it for me?

This doesn't make the service foolproof and there are still ways to improve on it. Azure SQL offers a couple of features which can beef up security.

Transparent Data Encryption


A lot like encryption services for blob storage, this may or may not be something you want to implement, but again a number of organisations have an "encryption at rest" requirement which this feature addresses. Because it is so trivial to implement (see the sketch below), it's worth getting used to enabling this by default. Unlike storage accounts, however, enabling this feature will encrypt all existing data.
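A minimal sketch, with placeholder names, for switching it on for a single database:

# Turn on Transparent Data Encryption for one database
Set-AzureRmSqlDatabaseTransparentDataEncryption -ResourceGroupName "my-rg" `
    -ServerName "my-sql-server" `
    -DatabaseName "my-database" `
    -State Enabled

From an Automation runbook you could wrap that in a loop over Get-AzureRmSqlDatabase to make it the default for every database on a server.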


Auditing and Threat Detection


Capturing your audit events to blob storage is a fairly obvious thing to want to do; if something does happen you'll want to know when, how and what. Unfortunately this hasn't been rolled out to all regions at the time of writing; the UK regions, for example, are still missing this feature.
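Where the region does support it, pointing a database's audit log at a storage account is a single cmdlet. A sketch with placeholder names:

# Capture audit events for a database into a storage account
Set-AzureRmSqlDatabaseAuditingPolicy -ResourceGroupName "my-rg" `
    -ServerName "my-sql-server" `
    -DatabaseName "my-database" `
    -StorageAccountName "myauditstorage"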

Threat detection monitors for a number of threat types, such as SQL injection attacks, and can email the subscription co-admins along with any number of other recipients when a threat is identified.

Setting this up is not tricky, but it is a little more involved than the transparent data encryption setting. However, this is the kind of thing that will let you catch an attack earlier, so it's worth enabling. Note that my script makes use of Automation variables which will need to be created and configured to ensure that it runs correctly (i.e. doesn't break); a sketch of the idea follows.
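My own script isn't reproduced here, but the core of it is along these lines. The Automation variable names are examples and would need to exist in your Automation account, and threat detection expects auditing to already be configured for the database.

# Pull configuration from Automation variables (names here are examples)
$resourceGroup = Get-AutomationVariable -Name "SqlResourceGroupName"
$serverName    = Get-AutomationVariable -Name "SqlServerName"
$databaseName  = Get-AutomationVariable -Name "SqlDatabaseName"
$alertEmails   = Get-AutomationVariable -Name "ThreatDetectionEmails"

# Enable threat detection, emailing the co-admins plus the listed recipients
Set-AzureRmSqlDatabaseThreatDetectionPolicy -ResourceGroupName $resourceGroup `
    -ServerName $serverName `
    -DatabaseName $databaseName `
    -NotificationRecipientsEmails $alertEmails `
    -EmailAdmins $true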


Security Center


This should be something you have open pretty much at all times; it should be regularly monitored and actions taken from it. Everything I have outlined above is an item which is monitored and reported on by Security Center. Some fixes, such as transparent data encryption and deployment of endpoint protection, can be rolled out directly from the Security Center blade, making a few of the issues incredibly simple to resolve. Also reported here are identified threats, such as malware found on VMs, brute-force RDP attacks and so on. These are detailed with a priority, a description of what was detected, the resource being attacked and often steps for remediation.

Before showing everyone how great it is, though, it's worth preparing them for it; sometimes the amount of information can be overwhelming and people may respond negatively, which is often when the "Azure is too insecure" arguments start. A lot of organisations have nothing close to this in their on-premises environments and so take the opposite view: because they can't see this information it must be more secure (ignorance is bliss, right?).

As with most services in Azure, Security Center is constantly being improved, with new features delivered often. I'm already pretty sure this entire post will be out of date in a few months, if not sooner!

Friday 29 April 2016

SonarQube and LetsEncrypt

Recently I've completed moving our "temporary" SonarQube to something which is a bit more production ready.  This pretty much looks like a Windows server hosted in Azure, backed by an Azure SQL database with a reverse proxy in front of it so that we can enable HTTPS.  Migrating from the old server was relatively painless although if you're about to do the same I'd suggest looking at the SQL Database Migration Wizard available on CodePlex to move the database.

Rather than messing around with multiple Azure-provisioned names we decided to purchase a domain name for our internal development systems. Again this was really easy to do through the Azure App Service blade, and within a couple of minutes we had the name and I'd set up the A record for our SonarQube server. A quick test proved that I could now access the system over HTTPS using our new domain name. Just one problem: it was using a self-signed certificate!

After looking around I decided to try out LetsEncrypt.org to get a certificate. The biggest problem with this is that the tools they provide to get a certificate don't work on Windows. Fortunately there's a pretty good utility written by a community member, letsencrypt-win-simple, which works brilliantly on Windows.

To run this tool I had to temporarily disable the reverse proxy rules I'd created in IIS, make sure that the site was backed by a folder on the local drive, and open up HTTP access (by binding the appropriate port in IIS and allowing access to the server over HTTP in the resource group's Network Security Group). Once I had this I could run the tool from the command line. Other than a couple of prompts, such as selecting the correct website it had auto-discovered, the process was completely automated and in about 15 seconds I had a valid SSL certificate installed and configured against the correct binding; the self-signed certificate had been replaced with the new valid certificate. After this it was a trivial task to reset the changes I'd made previously, i.e. re-enabling the reverse proxy rules, removing the HTTP binding and removing the rule in the Network Security Group.

So, SonarQube was up and running with all of my old data migrated, user accounts were set up and working and I could log in over an HTTPS connection.  A quick change of settings in Visual Studio Team Services and the end-point was now pointing at the new server as well.

But then...

I ran a quick test build which ran analysis using the SonarQube quality profiles and it failed!

The error in the build output was pretty long but buried towards the bottom of it was this little gem of an error message.
sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
So the Java process doesn't like the certificate?  It came as a bit of a surprise as I'd tested the site out in a number of browsers and none of them had reported a certificate problem.  A quick Google (other search engines are available) and I came across this thread on the LetsEncrypt community boards.  It turns out that LetsEncrypt along with a number of others such as StartSSL are not included in the out-of-box Java client trust store.  In the thread someone has posted a quick URL reader class you can use to prove the point by pointing a request at https://helloworld.letsencrypt.org and naturally this fails with the same error.

There's quite a few suggestions of how to work around this on the thread but the most common theme is to add the chain certificate to the trusted store manually.  I tried out a few of the suggestions in the thread but had a couple of problems.

  1. Some of the solutions are Linux based
  2. Almost all of them talk about a chain.pem file and I had no idea what that was!

All of the solutions largely focused on the Java keytool.exe utility, which you can use to add a certificate to the client's trusted store, so I just needed to figure out which certificate to install.

I'll not cover the process I took to work it out, but eventually I got to the LetsEncrypt X3 cross-signed certificate, which you can get from their website.

To use keytool.exe you need the latest Java runtime installed on the server running the Build Agent; the utility should then be available under %JAVA_HOME%\bin\keytool.exe. To try and automate the process a bit I created a PowerShell script which downloads the certificate and executes the keytool.exe utility to install it into the trusted key store. It needs to be run as an administrator and naturally the machine needs internet access. Feel free to use or modify as needed.
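My script isn't embedded here, but a minimal sketch of the idea looks something like this; the certificate URL, alias and the default "changeit" store password are assumptions to check against your own setup.

# Download the Let's Encrypt X3 cross-signed intermediate certificate
$certUrl  = "https://letsencrypt.org/certs/lets-encrypt-x3-cross-signed.pem"
$certFile = Join-Path $env:TEMP "lets-encrypt-x3-cross-signed.pem"
Invoke-WebRequest -Uri $certUrl -OutFile $certFile

# Import it into the JRE's trusted store using keytool.exe
$keyTool  = Join-Path $env:JAVA_HOME "bin\keytool.exe"
$keyStore = Join-Path $env:JAVA_HOME "lib\security\cacerts"

& $keyTool -importcert -noprompt -trustcacerts `
    -alias "letsencrypt-x3" `
    -file $certFile `
    -keystore $keyStore `
    -storepass "changeit"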


Once I'd run this on the build server I re-ran the build and this time, SUCCESS!  Now I just need to remember to do this each time the JRE gets updated.

Monday 22 February 2016

Comparing with the previous row using Microsoft Power BI

This is something which has come up recently at work and which at first glance seems like it should be straightforward (tl;dr: it is when you know how), but if you're new to Power BI or Power Pivot then it's something which takes some thinking about. Please note that for this post I will be sticking with Power BI, but if you want to see how this can work with Power Pivot then have a look at Dany Hoter's post over at PowerPivot(Pro).

The Problem

Let's say that you have some data spread over time, and you want to be able to create a BI visualisation showing how a value changes as a delta from the previous value.

Implementing a solution

To walk through how to do this, in a way which is hopefully easy to follow, I'm going to grab some sample data which is available online. That data is the Annual Sheep Population in England & Wales 1867 - 1939 (measured in 1000's) from DataMarket. No idea why I picked sheep populations, but there we go.

The goal here is to be able to create a simple chart which can show the sheep population for a given year and the population difference from the previous year (except for 1867 which should show a difference of 0).

So step 1 is to go and download the data, I've opted for CSV for this example but if you want to try another format then go for it.  Once you've downloaded the data fire up Power BI.

Connecting to the data source

Once Power BI has opened hit "Get Data" to connect to your data source and follow these steps:

Getting data into Power BI
  1. Select the CSV connector
  2. Navigate to and open the sample data set from your local machine
  3. Once the preview window has opened select the "Edit" button

Configuring your data source

One of the first things I've done is renamed the query over in the Query Properties section to "sheep", you don't have to do this but it makes things easier to read for me and also makes the screen shots a little less cluttered.

If you look at your data in the Query Editor window and scroll to the bottom you'll see that there are 3 rows which were in the CSV file which aren't part of the data.  To get rid of these select the "Remove Rows" button from the ribbon and then select the "Remove Bottom Rows" option.  When the option dialog appears enter the value "3" and press "OK", this will remove the rows from the bottom of the data set (scroll down and check).

Removing rows from the query
Next we'll rename the columns to "Year" and "Annual Population". The first of these is fine, but the second column will need to be changed; do this by right-clicking on the column title and selecting the "Rename" option. Alternatively you can select the column and use the "Rename" item in the "Transform" ribbon section.

Now right-click on each column title and under "Change Type" select the "Whole Number" option. This will ensure that we are working with the data correctly, and you'll notice that the year was most likely set to "Text" because of the content of the rows we previously removed.

Almost there, now we need to sort the "Year" column in ascending order, you can use the options in the ribbon or click on the drop-down menu next to the column title and select the correct sort option.

Finally we need to add an index column (we'll see why in a bit) to the data source, using the "Index Column" option in the "Add Column" ribbon. If you don't select any options then the default is to start from zero, which we'll do in this example, but you can choose to start from 1 or from a custom value with a custom increment. You can leave this new column's name as Index.

Adding the index column
Now that we've done all of the prep work you can click the "Close & Apply" option under the Home section in the ribbon.

Creating the new population difference column

Back in Power BI we'll need to be under the "Data" section on the left.

Viewing the data in Power BI
From here add a new column to the data collection by right clicking on the "sheep" (or originally named query if you didn't change it earlier) and selecting the "New Column" option.

Adding a new column to the query
You'll now have a new column imaginatively called "Column" and a query expression editor at the top.  Just to get something for the moment let's enter the following:
Column = SUM(sheep[Annual Population]) 
This will give us a new column called "Column" where every row will now have the sum of the Annual Population column.

Our first new column
As you can see this isn't particularly useful, so let's change that expression to something a bit more useful for our purposes.
Population Difference =
    'sheep'[Annual Population] - IF(
        'sheep'[Index] = 0,
        'sheep'[Annual Population],
        LOOKUPVALUE(
            'sheep'[Annual Population],
            'sheep'[Index],
            'sheep'[Index] - 1
        )
    )
That's quite a lot to take in so we'll break it down a bit and talk through what this is doing.

The first bit is quite simple as we're just saying to take the current sheep population and subtract a number from it.  When we get to the IF statement we're checking to see if the current Index is 0 (I created my Index column to start from zero) and if it is return the current sheep population, this is so that we get a difference value of 0.  If the current index is not zero then we're doing a lookup, with this we're looking up the sheep population value by checking the Index column for a value which is the current Index value less 1 (i.e. the previous value).

If we change our column to this expression we should get the following.

Our data with a population difference value

Visualizing the data

From here we can go back into the Reports section and create a number of visualizations against this data. Here I've provided an example where I'm using the "Line and Stacked Column Chart" visualization, with the annual population for the column values and the population difference for the line values. Play around with it and see what looks good for you though.

Visualizing the data

I've provided the data and the pbix file for download if you want to have a look at the version I've put together and play around with it.  Just click on the OneDrive link below.


Coming up next...

One other thing that comes up once in a while is producing a moving value such as a moving average. Next post we'll extend this basic example to show how we can accomplish this.

Wednesday 24 September 2014

Working with Entity Framework Code First and SQL Bulk Copy

There's a few of these that I haven't written, but it seems that you could keep a blog going pretty well with just "Working with Entity Framework and ..." posts. That's not because Entity Framework is particularly bad, I really quite like it myself, but because if you're writing an application of any considerable size or complexity you will at some point reach the limitations of what is possible "out of the box". I'm also aware of the number of "ORMs are evil" posts circulating currently, and possibly you're someone who thinks the same, but for what I'm working on now they make perfect sense, and I'm all for using the right tool for the job.

So what's the problem this time?

General purpose ORMs are great when you're working with transactions, it's what they're meant for.  A user interacts with the system and you need to persist the data from their most recent transaction.  The problem this time is that as the system grows you suddenly find yourself with a couple of use cases where you need to persist a lot of data.  In the example application I've put together for this post I'm using an XML data source with 10,000 records which need to be persisted.

The problem here is that when running with this size of data set (with auto-tracking of changes disabled) it takes around 40 seconds to run. 10,000 records in 40 seconds is certainly more than I can process manually, but for a modern computer it's not so great. The problem is that as you're adding more records to the context it's getting bloated; it has to track more and more entities, and then each entity is persisted individually. That last point is important because each insert in Entity Framework inserts the new record and then pulls back out the newly created ID code and updates the entity in memory with the new ID code, which is not a trivial amount of work.

So what are the solutions?

Disable features: The first thing to check is that you are telling the context not to auto-track changes. It's a small thing, but you're giving the context less work to do; performance isn't just about making your code faster, it's about making it do less.

If you were to run this again you would find that you've taken a few seconds off the total run time, which is better but still nowhere near fast enough.

Disabling auto-detect changes


Commit early, commit often: A fairly obvious option but rather than waiting until we've attached all of the entities to the context before saving, save more frequently (e.g. every 100 entities).  Again this is reducing the amount of work the context is having to perform when it figures out which entities it needs to persist and makes a more significant impact in our figures, but it's still got a way to go.

You might also remember that I mentioned about the context getting bloated, well we can do something about that as well by re-creating the context after each time we save the changes.  This stops the context from getting too bloated and again reduces the amount of effort needed to work out which entities need to be persisted.  We've added in some work now for context initialisation but this is typically cheaper.  This does take a bit more effort to maintain and ensure that we're not breaking the system by doing anything stupid, but it again takes a bit more of a chunk out of the run time.

Committing frequently


Get to the DbSet differently: The typical route to adding an entity is to add it to the context's DbSet collection.  Bizarrely this collection doesn't have an AddRange method, but there is a way to get at one by asking the context for the set directly.  By adding the entities using the AddRange method we can skip all of the tedious foreach looping and adding of entities one at a time.  So we can now make a simple call to AddRange followed by a call to SaveChanges; this is much more performant than the previous solutions, getting down to some slightly more reasonable numbers.

Using AddRange


But what about SQL Bulk Copy?

So the title of the post is about bulk copying, and having read through the above you're probably wondering why I didn't just jump to this as a solution.  Well, Entity Framework has no out of the box support for SQL Bulk Copy because it's database agnostic.  But I've done this before when working with SQL FILESTREAM so why can't I do it again?

Being a lazy developer the first thing I did was look for an existing solution and one turned up in the shape of EntityFramework.BulkInsert.  It seems pretty popular online for this kind of a problem, is available through NuGet and is pretty easy to use.  After adding the package and creating a method to try it out I ran the sample application and waited for it to finish.  It took me a while before I realised that it already had!  For 10,000 records it ran in under 1 second.

Using EntityFramework.BulkInsert


So surely EntityFramework.BulkInsert is the answer then?  Well if you want to stop reading here and go and download it then please do, it's a great little package.  Naturally there are a few things that you need to take into consideration.  First of all bulk copying doesn't bring the ID codes back, so if you need the values you will have to think of a way around this (think SQL 2012 sequences and sp_sequence_get_range).  Next you have to think about how bulk copying works and make sure you get the bulk copy options correct.  By default it won't check any constraints you have in place and it might not observe NULL values, instead putting in default values for the column type.  It also works within its own transaction (unless you provide a TransactionScope), but if you can work around these then you have a great little solution in your hands.

SQL Bulk Copy

I'm a lazy developer but I'm also a fascinated one, I wanted to know if I could still write the code using the System.Data.SqlClient.SqlBulkCopy class instead of relying on 3rd party packages or falling back to ADO.NET (which is an option, but not one I'm going to cover).

I already know that I can get the connection information from the context, and I've previously shown how to get the mapped table name for a given entity, so surely this is possible.  But I am going to be a little bit lazy and not implement an IDataReader for my collection, instead I'm going to load the entities into a DataTable and use that (note, this option really isn't going to scale well).

This is actually a fairly easy solution to implement with probably the most complicated piece being a fairly simple extension method which pulls out the entity properties and their types, then using this to create a DataTable and copy the data using reflection (again, this isn't going to scale well).  Once you have that you just need to write the data to the database for your chosen batch size.

Using System.Data.SqlClient.SqlBulkCopy

This solution isn't quite as fast as the EntityFramework.BulkInsert component, mostly for the reasons I mention, but it can still persist 100,000 records in about 1 second.

I've created a project which is available on GitHub under an MIT license for you to grab and look at.  I've done this because the code isn't really that difficult to follow and is pretty similar to my previous post on SQL FILESTREAM and me talking through lines of code is boring.  Also available is a LinqPad file which I used to create the input data files, just change the number of entities and run it.  But for convenience I've added a 1,000 and 10,000 entity files into the project anyway.

Tuesday 8 July 2014

Where does your FILESTREAM data live?

This is hopefully just a short one but follows up on a previous post about using FILESTREAM with Entity Framework.

After implementing the solution almost as posted we ran into some problems in the environment it was being deployed to (for reference, do NOT disable NetBIOS!).  Whilst investigating these issues we needed to figure out where on the server the FILESTREAM data was being stored.  Looking around the internet I found many posts about getting the path name from a SELECT statement, but that's useless if you want to know "my data is at C:\<path>" because you want to check disk space, permissions etc...  Not having the DBA who installed it available didn't help either, and there doesn't appear to be a way to get this information back from SQL Server Management Studio.

But, there is a way to find it from the system tables.  So I put together a SQL statement which pulls the information out.  It worked for me but you may want to tweak it to give you the information you want, to filter stuff out and so on.

Thursday 30 January 2014

Working with Entity Framework Code First and SQL FILESTREAM

Whilst looking around at these two technologies I couldn't find any information which was particularly useful about how to get the solution working.  So I wanted to put this up so that should anyone else want to try something similar then there's some (hopefully) useful information out there.

What am I trying to do?

The system that I am involved with at the moment has the need to store files with some of its entities.  We've started out by using Entity Framework 5 Code First (although in this post I'll be working with Entity Framework 6) to create the data model and database mappings, which is working nicely and is proving itself a very useful framework.

When looking at saving file data along with an entity there are a few choices:
  1. Store the file contents directly in the database
  2. Save the file contents to the file system
  3. Use a database table with FILESTREAM enabled
  4. Use a FILETABLE
The first option is fine for small files, but I don't really want to load all of the file data each time I query the table and the files might get pretty big.

The next option is a reasonable option and with recent versions of Windows Server we have a transactional file system.  So far so good, but it would be nice to not have to think about two persistence mechanisms.

FILESTREAMs were introduced in SQL Server 2008 and allow you to store unstructured data on the file system, so it feels like we're using the right tools for the job but they're all nicely in the same package.  The problem here is that Entity Framework doesn't support FILESTREAM.

Lastly there's FILETABLE which was introduced with SQL Server 2012.  This is like FILESTREAM, but rather than defining it at a column level you get a table created for you which provides information from the file system.  It is a really nice system, but it didn't quite fit with how we're structuring data and it's also not supported by Entity Framework.

So the option that I would ideally like to work with here is the FILESTREAM option as it gives me all of the database goodness but with the performance of the file system.  But there is just that minor sticking point of it not being supported by Entity Framework.  After a fair amount of playing around with the technology and research into the problem I figured that I could probably make it work by falling back to basic ADO.NET for handling the FILESTREAM part of the requests.  Whilst this was an option I didn't really want to start having different technology choices for doing database work, so the goal was to see how much I could get away with in Entity Framework.

Setting up the test solution

The database server

With SQL Server the default install options will not give you a FILESTREAM enabled server but you can enable it.  I'm not going to go into how with this post as Microsoft have some pretty good documentation available on how to do this.

This also means that we can't let Entity Framework create the database for us, so you will need to create an empty, FILESTREAM enabled database and point to that.

The outline of the project

I created the solution in Visual Studio 2013, and delving into the most creative parts of my mind I came up with a solution that has hotels, with rooms and multiple pictures of each room (okay, not my most creative moment, but it gets the job done).

So in this solution I have my data model which is pretty simple.  I have some locations, at each location there are a number of hotels, each hotel has rooms and each room has photos.

The Location, Hotel, Room and Photo entities
Solution Data Model
Of all of these the important one is Photo.  This entity has some basic properties, Title and Description, which describe the photo; then there are the navigation properties for getting back to the room; and lastly there's the Data property, which is intended to hold the content of the file.  Normally Entity Framework would see this property and its type (a byte array) and map it to an appropriately named column of type VARBINARY(max).  Whilst we could still let it do this, it would somewhat defeat the purpose of the exercise as we'd be storing the contents of the file directly in the database, so we need to add some configuration to tell Entity Framework to ignore this property when mapping.

Photo configuration information
Photo entity configuration
I'm using the Fluent API here, but you should be able to do this using Data Annotations as well.

At this point if we were to deploy the database we would get a table with no column for the file data and an unmapped property on our entity.  What we need to do next, before any of this is useful, is to somehow get a FILESTREAM column into the Photo table.  The solution to this is to use Entity Framework migrations, the basics of which I'll not cover here and will leave as an exercise for the reader.

Migrations provides us with a migration class for each migration added to uplift and roll-back the changes to the database.  The useful method for us in this class is the Sql method which allows us to execute SQL commands; using this we can add our ROWGUID column and our FILESTREAM column with all the constraints we need (and of course the appropriate commands to remove it all again as well for the Down method).

Migrations code
Migrations SQL commands
Now if we run the Update-Database command from the Package Manager Console we get a table with all the right columns of the right types for being able to use FILESTREAM.

So that's half the battle won, the next challenge is being able to read to and write from the table.

Storing and retrieving file data

So how do we query data in a FILESTREAM column?  Well this is the bit where we fall back to the System.Data.SqlTypes namespace, specifically the SqlFileStream class.  We use this class to read the contents of the file back from the server as a stream, but this only works in the context of a SQL transaction.

So the first thing we need to do is get the file path and the SQL transaction information; we can then pass these to the SqlFileStream constructor to get our stream, after which it's just a case of reading from the byte array in our entity and writing to the SqlFileStream stream.  To get this information we need to run a custom SQL statement.  We could do this using a SqlCommand object, but I still want to stick to Entity Framework a bit more; fortunately there's the DbContext.Database.SqlQuery<TElement> method which we can use to run raw SQL statements.  It also handles parameters, so we can parametrize the query (great for guarding against SQL injection attacks), and it returns an enumerable collection mapped to TElement (which does not have to be a part of our data model).
Raw Data Query
The FileStreamRowData class here is a custom class with a string property for the path, and a byte array for the transaction context.

Running all of this inside of a transaction scope will get the information required (the call to "First" will enumerate the collection) to pass to the SqlFileStream constructor; we can then use this to write data to the stream.
Writing to the FILESTREAM
The same applies when writing to the database as well, but with the source and destination reversed.  Also, when writing to the database you would need to save the entity first.  Wrapping the Entity Framework bit up in the same transaction scope means that even if you call "SaveChanges" on your context, if the transaction does not successfully complete then the changes are still rolled back.

So does it work?

Well, yes it does, and it works pretty nicely as well.  It's maybe not the final solution that I'll use as I'm still investigating a couple of other options, but it's certainly not a solution that I would be upset at using, and by hiding the complexity in the data services the client need never know how the file information is being held in the database or using which technologies.

How do I play with it?

You could probably work most of what you need out from this post, but for convenience sake I've also put up the whole solution onto GitHub, so feel free to head over and take a look.  If you come up with any suggestions or improvements then feel free to contribute.

Tuesday 30 July 2013

Playing with CoffeeScript

I've been playing around with CoffeeScript lately, and as I have a tendency to do I decided to crack open a prime number challenge and see what the solution looked like.

The Challenge

I recently set this up as a challenge at work as a bit of fun which goes as follows.

"Calculate the first 10,000 prime numbers, output the largest prime number and the sum of all palindromic primes"

I've implemented the solution a number of times using C#, C++, Python, Go and JavaScript and it is the latter that I wanted to compare the solution to.  The JavaScript solution I created typically ran in about 15ms after a fair amount of tweaking and profiling to squeeze as much out of it as I could do (within my own abilities).

In all I found writing the solution a pleasant experience, with a very simple and expressive syntax which abstracts away some of the ugliness that is JavaScript (not that I particularly dislike JavaScript, it's quite fun actually).  List comprehensions were incredibly useful and powerful as they are in other languages and writing iterations and checks were very simple.

Anyway, here's my solution.  I'm sure it's not perfect, and there are probably some CoffeeScript tricks that I've not picked up yet.  But the solution is as fast as my JavaScript implementation and much easier to read, so all positives in my book :)

My JavaScript solution along with a couple of others is available on the gist as well.



EDIT: Since posting this I tweaked the solution a little bit after noticing that I was evaluating all of the values in the array of candidates, instead of working only up to the square root of the upper value