On Validation: What's your first grade teacher's middle name?

I'm in the process of trying to figure out how i want to handle validation within MG Unity. There are a few options to explore. I'm studying how Reactor handles validation out of the box, using the specs for each field in the database as its guide, and it reminded me of something ...

I needed to order a copy of my birth certificate a few weeks ago. There's a company online called Vitalchek.com that handles it for you for all 50 US states. There's only one "small" problem i ran into ... the validation routines screwed up my application entirely, because the developer looked at validation from a narrow data-centric perspective. As a result, the validation routine did not allow me to enter my valid phone number or zip code, and my birth certificate was sent to a non-existent city in Maine. UPS couldn't call me either to try and figure out what went wrong, so they just sent the birth cert back to the sender.

The developer was stupid enough (sorry, but i can't find a kinder adjective) to prepend a 0 in front of my 4 digit zip code when i kept trying to follow the error handling's directive and "enter a VALID zip code". He also stripped the phone number of the useless extra digits and characters, reformatted it according to the US standard and UPS tried to call me at some phone number in Toronto, at ext. 7 no less.

I run into this all the time because i live in Europe. My valid zip code is 4 digits long. My valid phone number is 11 digits long including the country code. I should almost certainly include the "+" in front of it to indicate that it's a country code, which makes my valid phone number a string that is now 12 characters long. If i had an extension, it could easily be a string with something like 20 or so characters.

If i lived in Britain, my zip code would include letters and digits.

There is no reason whatsoever for a zip code or a phone number to be stored as numerics or validated as numerics (after maybe stripping them of certain characters) - because our applications have no reason to use them as numerics.
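A rough sketch of the difference, in Python. The record type, field names, and the phone number are made up for illustration; the point is only that a text field keeps what the user typed, while a numeric one silently changes it or refuses it.

```python
from dataclasses import dataclass


@dataclass
class ContactDetails:
    """Hypothetical record: postal code and phone are kept as free-form text."""
    postal_code: str
    phone: str


def clean(value: str) -> str:
    # Trim surrounding whitespace only; never pad, strip digits, or reformat.
    return value.strip()


def as_number(postal_code: str) -> int:
    # Roughly what a numeric column does: coerce the entry to an integer.
    return int(postal_code)


# Stored as text, the values survive exactly as entered (made-up examples).
details = ContactDetails(postal_code=clean(" 04101 "), phone=clean("+12345678901"))

# Coerced to a number, a leading zero is lost ...
lost_zero = as_number("04101")

# ... and a postal code containing letters cannot be stored at all.
try:
    as_number("SW1A 1AA")
    letters_accepted = True
except ValueError:
    letters_accepted = False
```

Nothing in the application ever adds, multiplies, or sorts these values numerically, so the text representation costs nothing.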

The bottom line is this. If a user doesn't want to fill in their correct zip code or phone number, "validating" those fields as numerics that are 5 and 10 digits long respectively, or in any other way a developer can dream up, will not force anyone to enter their valid data. A valid phone number for a particular person is one you can use to call them. A valid zip code for a particular person is the one the post office or a courier can use to deliver a package to them. A valid email address for a particular person is one that gets to their inbox. There's no regex in the world that can prove an email to be valid for a given person, and that's what we're asking for when we ask someone to fill out an email field. Please give us an email address where we can contact you.

In this sense, the user is solely in control of whether the data they enter is valid or not. All our clever validation routines and regexes can only get in the way. In my case, they get in the way all the time.

What i find particularly annoying is when a developer decides that a field can be at the most 20 characters long, for instance, and i get a "Please enter a VALID address" message back when i need 25 or 30 characters for the address. Again, i can't enter my valid address where the post office can find me.

The ONLY reason i see for validation is if the user's entry could cause the application to error, and there's no way for the programmer to work around that. Say for instance you allow your user to decide what tax rate they want to pay, carte blanche, and you know you have to multiply by that number to get their total. Then in my mind it's necessary to validate the taxRate as a numeric before you let the application try to multiply by it. But cases like this are rare.
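The taxRate case might look something like this in Python. It is only a sketch: the function names are hypothetical, and it assumes the rate is entered as a fraction (0.08 for 8%). The check exists purely to protect the multiplication, not to second-guess which rate the user chose.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_tax_rate(raw: str) -> Optional[Decimal]:
    """Return the rate as a Decimal, or None if it cannot be used in arithmetic."""
    try:
        rate = Decimal(raw.strip())
    except InvalidOperation:
        return None
    # "NaN" and "Infinity" parse successfully but would poison the total.
    return rate if rate.is_finite() else None


def total_with_tax(subtotal: Decimal, raw_rate: str) -> Decimal:
    """Add tax to the subtotal, refusing only input that cannot be multiplied."""
    rate = parse_tax_rate(raw_rate)
    if rate is None:
        raise ValueError(f"tax rate must be a number, got {raw_rate!r}")
    return subtotal + subtotal * rate


total = total_with_tax(Decimal("100"), "0.08")
```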

And what about those fancy regexes used to validate email addresses? I've used them in the past, but i won't anymore, and i'll tell you why. I've never gone back and updated them in all the applications i've built in the past when new domains are released. Have you? And i've run into a bunch of validation routines that only validate for .com, .net, and .org. Personally, I'm fine there with my gmail.com address, but this certainly trips people up. Only a keen developer would suspect an outdated regex when they see a "Please enter a VALID email address" error message on a valid .biz email, and try a .com address instead. A normal user would have no clue what's going on.
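To make the failure concrete, here is a sketch in Python. The pattern is a stand-in for the kind of regex described above, not one taken from any particular site, and looks_like_email is a hypothetical name for a much looser check that only catches obvious slips.

```python
import re

# A stand-in for the outdated pattern: only .com, .net, and .org are accepted.
OUTDATED = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)*\.(com|net|org)$", re.IGNORECASE)


def looks_like_email(address: str) -> bool:
    """Loose sanity check: something before the last "@" and something after it."""
    local, sep, domain = address.strip().rpartition("@")
    return bool(local and sep and domain)


# The outdated pattern turns away a perfectly deliverable .biz address ...
rejected_by_regex = OUTDATED.match("someone@example.biz") is None

# ... while the loose check lets it through and still catches a missing "@".
accepted_loosely = looks_like_email("someone@example.biz")
typo_caught = not looks_like_email("someone.example.biz")
```

Neither check proves the address reaches anyone. If the application really depends on the email, the only real test is sending a message to it.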

And we as developers never find out about all these validation snafus that live within our own work. Our users just freak out and give up, like i do, when confronted by what seem to them to be senseless error messages and forms that don't let them enter their valid data.

I've run into validation brick walls like this where the validation routines would not allow me to enter my valid data on some very high profile sites, including macromedia.com, adobe.com, and symantec.com. And it's always either something in the address or the phone number. I run into it surprisingly often, more times than not, and am forced to enter invalid data in order to trick the form into working.

The Vitalchek developer was really clever tho'. I have to give him credit for ignoring my corrections after several attempts to warn me with the INVALID messages, and just correcting my data for me. As in "whoops, this user is too stupid to fill in valid information, so we'll just do it for him after 3 tries". He had me fooled into thinking the form accepted my corrections! So i didn't have a chance to employ any of my own tricks.

We actually keep a Skype number with a Los Angeles area code for just this purpose. Now if only Skype would offer virtual addresses for filling out forms on the internet that would be forwarded to your real address, we'd be all set.

In my mind, validation should be as user-centric as possible. I've run into way too many problems with web applications that obviously took a database-centric approach toward validation to believe otherwise. If your app has to send an email to the user, then require the email address. If it doesn't, then don't. As much as possible, let the user decide what is valid and what isn't in regards to their data, and give them plenty of room in your fields to do that. Let them decide what they want to tell you about themselves, unless the functionality that the user is "contracting" with you to provide by filling out the form really requires that information.

And please, please, please, if you have to ask a secret question, let me as the user decide what it is! I honestly don't know the answer to most of the secret questions out there in those dropdowns!

Comments
I feel your pain!
I also think you forgot one of the most common mistakes USA-based developers make. They keep pretending your mailing address includes a "state". Needless to say, this doesn't apply in most countries (including the whole of Europe).
# Posted By Massimo Foti | 9/27/06 12:11 AM
Hey Massimo,

Well, i didn't want to rub it in too much, but yes, of course! I run into that very often.

After so many years on the web, i almost find it strange that the envelope manufacturers of the world still let you write ANYTHING on an envelope. Man, are they behind the times! ;-)

ciao!
Nando
# Posted By Nando | 9/27/06 1:33 PM
Well said Nando, I have to agree with you on this.
Just wondering which portion/form at adobe.com you had trouble with. Please let me know ?
# Posted By Rahul Narula | 9/28/06 2:07 AM
I think there was something on the registration form for Adobe Developer Week ??? but i could be mistaken. I was able to work around it quickly so it only left a small impression in my mind "Aha ... Adobe.com! That's surprising. Macromedia merged with a US-centric company ..."

I DO remember the bug report form ... i think at that time it was still on macromedia.com. I'd been seeing an inconsistency with CFAS picking up changes to code i pre-compiled and uploaded mixed with source code i uploaded to the server and let CFAS compile on first use. It was a little complex to explain, and i very carefully typed the steps to reproduce into the description field, in as concise and accurate a way as i could, read it over several times and corrected, restated things more accurately, clicked Submit, and got back an "Invalid Description" error, explaining to me that a valid bug description was less than 2000 characters long.

Jez that pissed me off. No offence to anyone involved. Really. I know how hard it is to get everything "right" in this business from all the different perspectives you need to look at things. But i was really pissed, nonetheless, and almost gave up. It seemed like an important bug to fix, so in the end, i decided to persist and try again.

I opened Word, so i could track the number of characters, and started again. I can't remember whether the validation script wiped my description or not, but my impression is that it did. It took me a long time to figure out how to say it in less than 2000 characters, but finally i managed it at exactly 1999 characters by leaving out some details which i thought were important and clipping my sentences into phrases, and sure enough, almost as a tribute to the developer who very accurately built that form, my bug report was accepted. ;-)

The bug has not been fixed to my knowledge, and i've always wondered if it was because of how i shortened the description.

I'm not sure if the bug report form on Adobe.com has the same validation approach now as it did then. I do remember i complained somewhere, and got a response from someone at Macromedia that they'd look into it. But i never heard if it had been changed. Maybe the engineers really want the descriptions to be no longer than 2000 characters, and did not accept the suggestion.

I was in what turned out to be a very heated discussion with one of our clients in LA a few weeks ago. I pointed out an overly restrictive validation routine on their site and just said it should be corrected, indicating all the ways it could fail. They didn't want to do it. The director of marketing, no less, insisted that they absolutely needed to have VALID data if their business was going to succeed. He would not be persuaded. And we're talking about a form here where the first and last name HAVE to be less than 15 characters long each, the email address can only be on the domains .com, .net and .org, and HAS to be less than 20 characters, and the phone number ... well, you know. 10 digits, numeric characters only, no spaces either.

Well, I'm way over 2000 characters, so I'd better leave it at this before someone gets annoyed at my loquaciousness! ;-) n.
# Posted By Nando | 9/28/06 7:14 AM
Hi Nando, suppose i want to track all comments of a post, without commenting it: i must add a comment, with a non-empty text, that will be published, while i would only receive comments to the post: is this another example of bad validation?
I hope I've been clear. A hug to you.
# Posted By salvatore fusto | 9/29/06 1:06 AM
Hi Salvatore,

I'm not sure what's good and bad, but i do know that validation needs to be much more user-centric than it is these days on the web.

I needed to send my birth certificate back to the US to be recertified as valid (talk about validation! i'm not enough proof that i've been born) and used FedEx to do it. Their online form insisted on spell checking our company address, and Centro Nord Sud became Central Nord Sued ... again, without my consent. Nobody seems to know how to collect my valid address anymore on their web forms. The bigger the company, the worse it seems to be.
# Posted By Nando | 10/3/06 8:33 PM
This is why the company I work for uses a US/Canada validation schema and an international one. The international one is extremely generous in its validation (basically provide us with address 1 and a country - the rest is gravy). The system is set up so that we can add additional schemas per country if we get lots of bad data and it becomes a problem - however, when it comes to addresses outside of the US/Canada I'll assume that the user knows better than any research that I can do.

BTW, nice to see my captcha here.
# Posted By Peter J. Farrell | 10/5/06 1:01 PM