Tuesday, 28 July 2009

Server-side validation

I must admit, I had assumed that server-side validation was something everyone just did. I thought it was obvious that the server should validate whatever data it receives, rather than relying on client-side validation. It seems, however, that I was mistaken in that belief - some people remain convinced that validation of, for example, mandatory fields in a form should be done on the client... and only done on the client. What follows is a brief explanation of why all applications must do server-side validation.



Let me give you an obvious example of where things can very easily go wrong: APIs. If your application has an API, you are essentially making your services available to third parties. Now, that might not necessarily mean people outside of your company, but it almost certainly means someone other than you, the "back-end guy" (or gal).

So when you provide an API, you have no control over what data gets passed to your application - callers could pass down entirely bogus data: null values; incorrect data types; values outside of certain bounds (strings too long, numbers too big, future birth dates). Because you have no control over what they pass down, it's clear that you need to have some measures in place to prevent that dodgy data from causing your application to topple over and your database to be corrupted.

Now, there are some of you out there who may think that a try-catch routine does all the work for you; you can be lazy and let the built-in error handling do your validation on your behalf. This isn't so and here's why:

  • If you are INSERTING to or UPDATING a database in the "TRY", you will need to roll back any changes in the "CATCH";

    • This not only means you have to put a rollback mechanism in place (okay, you can use a transaction), but you spend time undoing something you've just done (which is all a transaction does for you).


  • If you're accessing a database at all, there is an overhead in establishing a connection to the database and doing whatever it is you want to do there;

  • If you are altering a session state, you will need to roll that back if things go wrong;

  • If you are processing a form in something like ASP.NET, you will need to retain the ViewState and pass it back to the client, reporting the exception, which is costly both in terms of processing and response time;

  • Indeed, any processing you do on the server that could fail if the data is invalid is wasted processing if it turns out the data is invalid.



So, try-catch routines are not the be-all and end-all. Have a look at this simple example in C#, assuming all of the parameters are mandatory (i.e., the corresponding columns are set to be NOT NULL in the database), and that the user must be 18 years or over to sign up:

[csharp]public Boolean UserSignup( String email, String forename, String surname, DateTime dateOfBirth )
{
    // Validate the mandatory parameters before touching the database.
    if ( String.IsNullOrEmpty( email ) )
        throw new ArgumentNullException( "email" );

    if ( String.IsNullOrEmpty( forename ) )
        throw new ArgumentNullException( "forename" );

    if ( String.IsNullOrEmpty( surname ) )
        throw new ArgumentNullException( "surname" );

    // The user must be at least 18 years old today.
    if ( dateOfBirth.AddYears( 18 ) > DateTime.Today )
        throw new UserTooYoungException();

    try
    {
        // Insert the user into the database.
        ...
    }
    catch ( SqlException sqlEx )
    {
        // Handle any exceptions from SQL, such as violation of email uniqueness.

        // Report failure.
        return false;
    }

    // Report success.
    return true;
}[/csharp]
Now, in that example, we tested three strings to make sure they are actually there. If I had constraints on the length of any of those strings, I should also have checked them there, but I've kept it simple for now. Also, the email parameter should be tested with a regular expression to ensure it is indeed a valid email address (not that it has an endpoint, just that it is of the correct format), but I've left that out for brevity. The user's date of birth is also tested to make sure they're at least declaring themselves old enough to sign up for the site.
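
The length and format checks I left out could look something like the sketch below. The class and method names, the 254-character limit and the regular expression are all assumptions made for illustration; the pattern is deliberately loose and only confirms the general shape of an address, not that it can receive mail.

[csharp]using System;
using System.Text.RegularExpressions;

public static class SignupChecks
{
    // Hypothetical limit; use whatever your column definition allows.
    private const Int32 MaxEmailLength = 254;

    // Deliberately loose: something, an @, something, a dot, something.
    private static readonly Regex EmailShape =
        new Regex( @"^[^@\s]+@[^@\s]+\.[^@\s]+$", RegexOptions.CultureInvariant );

    public static Boolean IsPlausibleEmail( String email )
    {
        // Reject missing or over-long values before running the pattern.
        if ( String.IsNullOrEmpty( email ) || email.Length > MaxEmailLength )
            return false;

        return EmailShape.IsMatch( email );
    }
}[/csharp]
A call such as SignupChecks.IsPlausibleEmail( email ) would then sit alongside the other checks at the top of the method.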

There we go - four quick tests that will process so quickly on the server I won't even be able to calculate an average processing time unless I run them in a loop several thousand times. And the code remains clean, which is the complaint I've been hearing from people who don't want to do server-side validation. You could, of course, do some nice modularisation on the whole validation section, and you might want to collate and report on all the exceptions, rather than raising just the first one you come across, but you get the picture.
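
As a rough idea of what collating the problems might look like, here is a minimal sketch that gathers every failure into a list instead of throwing on the first one. The SignupValidator class, the Validate method and the message wording are all hypothetical; the checks themselves are the same four as above.

[csharp]using System;
using System.Collections.Generic;

public static class SignupValidator
{
    // Returns every problem found, or an empty list if the input is valid.
    public static List<String> Validate( String email, String forename, String surname, DateTime dateOfBirth )
    {
        List<String> errors = new List<String>();

        if ( String.IsNullOrEmpty( email ) )
            errors.Add( "Email is required." );

        if ( String.IsNullOrEmpty( forename ) )
            errors.Add( "Forename is required." );

        if ( String.IsNullOrEmpty( surname ) )
            errors.Add( "Surname is required." );

        if ( dateOfBirth.AddYears( 18 ) > DateTime.Today )
            errors.Add( "You must be 18 or over to sign up." );

        return errors;
    }
}[/csharp]
The caller can then go ahead with the insert only when the returned list is empty, and otherwise send the whole list back in one response.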

So we have, for all intents and purposes, 0ms of processing on the server to validate those four parameters. I've certainly not added a significant overhead to the processing and I've ensured that any expensive calls to the database are much more likely to be successful. You'll notice, though, that I've still put that part inside of a try-catch routine as it allows me to catch any exceptions the database may raise where, for example, unique constraints are violated*.

Now consider this very basic example:
[csharp]public Double MyDivisionFunction( Double numerator, Double denominator )
{
return ( numerator / denominator );
}[/csharp]
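If my denominator is 0 (zero), things go wrong. With integer or Decimal operands, a DivideByZeroException is thrown. With Double, as here, no exception is thrown at all: you quietly get infinity (or NaN for zero divided by zero), which is arguably worse because the bad value carries on through the rest of your processing. If your service is doing any sort of division, you have to check to see if you're attempting to divide by zero - I can't imagine there are still developers out there who don't put at least that check in place. Actually, I imagine there are lots of them, but it's better to pretend they don't exist - they give the rest of us a bad name.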

All it would take in there is the simple check (most of) you would do anyway:

[csharp]public Double MyDivisionFunction( Double numerator, Double denominator )
{
    if ( denominator != 0 )
    {
        return ( numerator / denominator );
    }
    else
    {
        // Throw an exception here, or return something to
        // represent infinity if that suits the caller better.
        throw new DivideByZeroException();
    }
}[/csharp]
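Besides returning a sentinel value or throwing, a third option is the "Try" convention used elsewhere in .NET (Int32.TryParse and friends): report success through the return value and hand the answer back through an out parameter. This is only a sketch; SafeMath and TryDivide are names I've made up for the purpose.

[csharp]using System;

public static class SafeMath
{
    // Returns false, and leaves result at 0, when the denominator is zero.
    public static Boolean TryDivide( Double numerator, Double denominator, out Double result )
    {
        if ( denominator == 0 )
        {
            result = 0;
            return false;
        }

        result = numerator / denominator;
        return true;
    }
}[/csharp]
The caller then decides what a failed division means for them, without needing a try-catch of their own.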
Hopefully, it should be clear just from those two quick examples that server-side validation is important. Yes, you absolutely should do client-side validation as much as possible, but mostly because of the user experience - validating without making a round trip to the server means the user can see almost immediately where there are things they need to resolve, and you've saved on all that server processing to boot. It absolutely must not be the only point of validation - even disregarding APIs - because all sorts of things can go wrong with client-side validation:

  • In a browser, client-side validation often requires JavaScript to be enabled. If JavaScript is disabled (as it can be in locked-down corporate environments), you don't get any validation;

  • There can be locale-related quirks that you haven't accounted for in the client-side validation:

    • You can't control the locale on the client machine, but you certainly can on the server - that's yours to do with as you please - and it makes a whole lot more sense to write that validation once than to attempt to account for all the differences there may be in the clients.
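
To make the locale point concrete, here is a minimal sketch of parsing a number on the server with a fixed culture, so the result doesn't depend on anyone's regional settings. AmountParser and TryParseAmount are hypothetical names, and I'm assuming the agreed format uses "." as the decimal separator with no thousands separators.

[csharp]using System;
using System.Globalization;

public static class AmountParser
{
    // Parses with the invariant culture, so "." is always the decimal
    // separator regardless of the regional settings of the machine.
    public static Boolean TryParseAmount( String input, out Decimal amount )
    {
        return Decimal.TryParse(
            input,
            NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint,
            CultureInfo.InvariantCulture,
            out amount );
    }
}[/csharp]
With this in place "12.50" is accepted, while "12,50" is rejected rather than silently read as something else.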





Anyway, that's my tuppence on the subject. If you weren't convinced about it before, hopefully you can now see that server-side validation isn't the overhead you thought it was, actually makes your life much easier as a developer, and isn't just a nice-to-have, when-I-get-round-to-it thing - it's a must.

I came up with a bit of an analogy for this: not doing server-side validation is a bit like buying a new car without first knowing the make, the model, the price, the colour, how it feels to drive, and your own bank balance. You just wouldn't do it.

* It is far quicker to check within the SQL script or stored procedure for uniqueness (or any other database constraints, for that matter) than to make two (or more) separate database calls: one to check for uniqueness; the second to insert the new record, if the first call confirms no constraints will be broken. Indeed, splitting it into two calls increases the likelihood of a failure at the second step as, on applications with a high transaction rate, the data may well be out-of-date by the time the insert is attempted (consider a system for booking flights or one used to buy and sell stocks and shares).
