Primarily, the argument seems to be that it's not needed if you're writing both the client and server components: you're happy to rely on your own ability as a front-end developer and don't bother with server-side validation. In this post, I will attempt to demonstrate how even data with a high level of accuracy can benefit from server-side validation, and how you can do that validation in an entirely unobtrusive way.
[plain highlight="1" gutter="false"]Data Correctness | No Validation (ms) | Validation (ms) | Validation as % of No Validation
100% Accurate | 0.583 | 0.588 | 101%
99% Accurate | 0.577 | 0.593 | 103%
95% Accurate | 0.579 | 0.559 | 97%
90% Accurate | 0.587 | 0.533 | 91%
0% Accurate | 0.579 | 0.025 | 4%[/plain]
That's a quick set of results to show you the potential performance implications of a basic bit of server-side validation. As you can see, when you have a very high level of confidence in the data the server is receiving, the overhead is in the region of a few percent (the ratio actually varied between 98% and 104%, depending on the run, for accuracy of 100% and 99%). But as you become less certain about the data (i.e., as you lose or give away control of the client/interface), the performance gains from validating become more and more significant. When all of the data you are receiving is bad, you're looking at a performance saving in the region of 95-96% compared with not validating, and even at 90% accuracy you're looking at a saving of close to 10%!
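To make the arithmetic behind those savings explicit, here is a minimal sketch that turns a pair of timings from the table into a relative saving. The class and method names are my own, purely for illustration; only the input figures come from the table.

```csharp
using System;

// Hypothetical helper, not part of the original test harness.
public static class ValidationSavings
{
    // Fraction of time saved by validating, relative to not validating.
    // Positive means validation was faster; negative means it added overhead.
    public static double Saving(double noValidationMs, double validationMs)
    {
        if (noValidationMs <= 0)
            throw new ArgumentOutOfRangeException("noValidationMs");

        return 1.0 - (validationMs / noValidationMs);
    }
}
```

Fed the 0% accurate row, `ValidationSavings.Saving(0.579, 0.025)` comes out at roughly 0.957; the 90% row, `Saving(0.587, 0.533)`, gives roughly 0.092; and the 100% row, `Saving(0.583, 0.588)`, is slightly negative, which is the small overhead described above.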
So, let me tell you how I came up with those admittedly very simplistic statistics: I ran a simple series of tests, attempting database inserts a number of times (the results above are averages over 50 runs of 100 attempted inserts each). Let me show you the code:
[csharp]protected void Page_Load(object sender, EventArgs e)
{
    Double accurate100pcDurationNoValidation = 0;
    Double accurate100pcDurationValidation = 0;
    Double accurate99pcDurationNoValidation = 0;
    Double accurate99pcDurationValidation = 0;
    // etc.

    // The loops.
    // NO_VALIDATION and VALIDATION are Boolean constants passed as the
    // skipValidation argument (presumably true and false respectively).
    for (int i = 0; i < 50; i++)
    {
        accurate100pcDurationNoValidation +=
            RunLoop(connectionString, 0, "Joe", "Bloggs", "joe@test.com", validDoB, NO_VALIDATION);
        accurate100pcDurationValidation +=
            RunLoop(connectionString, 0, "Joe", "Bloggs", "joe@test.com", validDoB, VALIDATION);
        accurate99pcDurationNoValidation +=
            RunLoop(connectionString, 1, "Joe", "Bloggs", "joe@test.com", validDoB, NO_VALIDATION);
        accurate99pcDurationValidation +=
            RunLoop(connectionString, 1, "Joe", "Bloggs", "joe@test.com", validDoB, VALIDATION);
        // etc.
    }

    // Display the results, calculating the averages.
}

private Double RunLoop(String connectionString, Double numBad, String forename,
    String surname, String email, DateTime dateOfBirth,
    Boolean skipValidation)
{
    SqlConnection conn = null;

    // Start the stopwatch.
    DateTime startRun = DateTime.Now;

    for (Int32 i = 0; i < 100; i++)
    {
        // Force in some bad ones, giving an invalid birth date.
        if (i < numBad)
        {
            try
            {
                // Use an invalid Date of Birth (today, so the user is under 18).
                User user = new User(forename, surname, email, DateTime.Today, skipValidation);
                conn = new SqlConnection(connectionString);
                conn.Open();
                user.Save(conn);
            }
            catch (Exception)
            {
                // Do nothing with the exception - just keep going.
                continue;
            }
            finally
            {
                // Close the connection (it is null if validation failed first).
                if (conn != null)
                    conn.Close();
            }
        }
        else
        {
            try
            {
                User user = new User(forename, surname, email, dateOfBirth, skipValidation);
                conn = new SqlConnection(connectionString);
                conn.Open();
                user.Save(conn);
            }
            catch (Exception)
            {
                // Do nothing with the exception - just keep going.
                continue;
            }
            finally
            {
                // Close the connection (it is null if validation failed first).
                if (conn != null)
                    conn.Close();
            }
        }
    }

    // Stop the stopwatch.
    DateTime endRun = DateTime.Now;

    // Do some tidying up, including clearing out the records we just added.

    return (endRun.Subtract(startRun).TotalMilliseconds);
}[/csharp]
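A side note on the timing itself: the test above takes two DateTime.Now readings, whose resolution can be fairly coarse on some systems. If you want to repeat the experiment, a minimal sketch using System.Diagnostics.Stopwatch might look like the following. This is not what I used for the figures above, and the Timing class and MeasureMilliseconds method are names I've made up for illustration.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper, not part of the original test harness.
public static class Timing
{
    // Runs the action the given number of times and returns the total
    // elapsed time in milliseconds. Exceptions are swallowed so that a
    // failed attempt still counts towards the total, as in RunLoop.
    public static double MeasureMilliseconds(Action action, int iterations)
    {
        if (action == null)
            throw new ArgumentNullException("action");

        Stopwatch stopwatch = Stopwatch.StartNew();

        for (int i = 0; i < iterations; i++)
        {
            try
            {
                action();
            }
            catch (Exception)
            {
                // Do nothing with the exception - just keep going.
            }
        }

        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds;
    }
}
```

You would pass the body of a single attempted insert as the action, for example `Timing.MeasureMilliseconds(() => AttemptInsert(), 100)`, where AttemptInsert stands in for whatever creates and saves the User.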
The User class referenced in the above code is the key thing here. Whereas I previously described up-front validation within a given method, I've put the validation in the property setters of the User class, which is where it absolutely belongs. I've seen people put some validation in the getter and, in some very specific circumstances, that makes sense, but the setter is almost always the best place to validate the data: you can keep your validation logic self-contained within the class; any attempts to enter invalid data are immediately rejected by the class itself; it's clean; and it's easy for you to manage because there's only one place you need to validate that value for that class.
For the purposes of the testing, I used the property setters when the "skipValidation" parameter was set to "false" and assigned the private fields directly, circumventing the setters, when it was set to "true":
[csharp htmlscript="false"]public class User
{
    #region Properties

    // User Id
    private Int64 _pkUserId;
    public Int64 Id
    {
        get { return _pkUserId; }
    }

    // Forename
    private String _forename;
    public String Forename
    {
        get { return _forename; }
        set
        {
            if (String.IsNullOrEmpty(value))
                throw new ArgumentNullException("Forename");
            else if (value.Length > 50)
                throw new ArgumentOutOfRangeException("Forename");
            else
                _forename = value;
        }
    }

    // Surname (same as Forename)

    // Email (same as Forename, but length can be up to 255)

    // DateOfBirth
    private DateTime _dateOfBirth;
    public DateTime DateOfBirth
    {
        get { return _dateOfBirth; }
        set
        {
            // DateTime is a value type and can never be null, so an unset
            // date is detected by comparing against DateTime.MinValue.
            if (value == DateTime.MinValue)
                throw new ArgumentNullException("DateOfBirth");
            else if (value.AddYears(18) > DateTime.Today)
                throw new ArgumentOutOfRangeException("DateOfBirth", "User must be at least 18");
            else
                _dateOfBirth = value;
        }
    }

    #endregion

    #region Constructor

    public User(String forename, String surname, String email, DateTime dateOfBirth,
        Boolean skipValidation)
    {
        if (skipValidation)
        {
            // Avoid the property setters, skipping validation.
            _forename = forename;
            _surname = surname;
            _email = email;
            _dateOfBirth = dateOfBirth;
        }
        else
        {
            // Go through the property setters, validating each value.
            Forename = forename;
            Surname = surname;
            Email = email;
            DateOfBirth = dateOfBirth;
        }
    }

    #endregion

    #region Helper Methods

    public void Save(SqlConnection conn)
    {
        // Attempt an INSERT.
    }

    #endregion
}[/csharp]
One other thing I need to point out is that the validation in the property setters directly reflects what is in the database - the forename, for example, is NOT NULL and has a data type of nvarchar(50). Also, the observant among you may have noticed that, each time a validation exception was raised, it came from the absolute last thing that was checked (the over-18 age restriction). I did that deliberately: failing on an earlier check would have flattered the validation timings and made for an unfair test.
Again, the test setup was basic, but also consider these factors:
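To show why a rejected value never costs you a database call, here is a minimal, self-contained sketch. Person is a cut-down, hypothetical stand-in for the User class with a single validated property, and the save delegate stands in for the INSERT; neither is the code I used in the tests.

```csharp
using System;

// Hypothetical stand-in for the User class: one validated property only.
public class Person
{
    private String _forename;
    public String Forename
    {
        get { return _forename; }
        set
        {
            // Same rules as the Forename setter: required, 50 characters at most.
            if (String.IsNullOrEmpty(value))
                throw new ArgumentNullException("Forename");
            if (value.Length > 50)
                throw new ArgumentOutOfRangeException("Forename");

            _forename = value;
        }
    }

    public Person(String forename)
    {
        Forename = forename;
    }
}

public static class SaveAttempts
{
    // Returns true if the save delegate was invoked, or false if
    // validation rejected the value before any save was attempted.
    public static bool TrySave(String forename, Action<Person> save)
    {
        Person person;
        try
        {
            person = new Person(forename);
        }
        catch (ArgumentException)
        {
            // Both exceptions thrown by the setter derive from ArgumentException.
            return false;
        }

        save(person);
        return true;
    }
}
```

Calling `SaveAttempts.TrySave(new String('x', 51), p => { /* INSERT here */ })` returns false without ever running the delegate, which is exactly the wasted round trip the validation is there to avoid.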
- The database was on the local machine, so there was no network latency, which would have skewed the results even more in favour of validation;
- Web services and remote procedure calls can be quite slow (often much slower than direct database calls) - if you can avoid making wasted calls because of bad data, you should;
- Wrapping multiple database interactions in a transaction adds a further overhead, but a necessary one, even with validation (I didn't use transactions in my simple test);
- If you do your validation explicitly, you know exactly what you're checking for and what you want to eliminate - if you rely on built-in database logic and native error handling, you may well get unexpected results, lock-ups, and/or corrupted data.