Wednesday, February 11, 2015

Google's No reCAPTCHA in PHP

Google bought reCAPTCHA in 2009, and recently they staged a small revolution by replacing the usual, craaaazy skewed text with a simple checkbox. In a few cases you may still get a regular captcha or a small popup asking you to identify pictures of cats:


On my website I was still using the old library with PHP classes and functions. Now they provide a RESTful JSON web service to do the same. Unfortunately Google provides no example, and if you are using PHP on shared hosting you will run into problems with the methods commonly found on the Internet, Stack Overflow included.

So here it is.

On the page displaying the captcha:
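A minimal sketch of such a page, assuming the standard No reCAPTCHA widget: Google's `api.js` script plus a `div` with class `g-recaptcha` and a `data-sitekey` attribute. The helper name `recaptcha_widget_html`, the placeholder key and the `handle-form.php` target are hypothetical, so adapt them to your site.

```php
<?php
// Hypothetical helper: build the widget markup for a given public (site) key.
// htmlspecialchars() keeps the key safe inside the HTML attribute.
function recaptcha_widget_html($siteKey)
{
    return '<div class="g-recaptcha" data-sitekey="'
        . htmlspecialchars($siteKey, ENT_QUOTES, 'UTF-8') . '"></div>';
}

// Placeholder: replace with the public key Google gave you.
$siteKey = 'YOUR_PUBLIC_KEY';
$widget = recaptcha_widget_html($siteKey);

// Google's script turns the div into the checkbox and adds a hidden
// "g-recaptcha-response" field to the form when the user passes.
echo <<<HTML
<script src="https://www.google.com/recaptcha/api.js" async defer></script>
<form action="handle-form.php" method="post">
  $widget
  <input type="submit" value="Send">
</form>

HTML;
```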


In the PHP script handling the form:
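A minimal sketch of the handler, assuming Google's `siteverify` endpoint with its `secret`, `response` and `remoteip` parameters and a JSON reply containing a `success` flag. The function names, the placeholder key and the 10-second timeout are my own choices for illustration, not part of Google's API.

```php
<?php
// Pure helper: decide from the JSON body whether Google accepted the answer.
function recaptcha_is_success($json)
{
    $data = json_decode((string) $json, true);
    return is_array($data) && isset($data['success']) && $data['success'] === true;
}

// Ask Google's siteverify endpoint to check the token, using cURL.
function recaptcha_verify($privateKey, $token, $remoteIp = null)
{
    $fields = array('secret' => $privateKey, 'response' => $token);
    if ($remoteIp !== null) {
        $fields['remoteip'] = $remoteIp; // optional parameter
    }

    $ch = curl_init('https://www.google.com/recaptcha/api/siteverify');
    curl_setopt_array($ch, array(
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => http_build_query($fields),
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_TIMEOUT => 10,          // assumed value, tune as needed
    ));
    $body = curl_exec($ch);

    // curl_exec() returns false on a transport error; treat that as a failed check.
    return $body !== false && recaptcha_is_success($body);
}

// Only runs for a form submission; 'YOUR_PRIVATE_KEY' is a placeholder.
if (isset($_SERVER['REQUEST_METHOD']) && $_SERVER['REQUEST_METHOD'] === 'POST') {
    $token = isset($_POST['g-recaptcha-response']) ? $_POST['g-recaptcha-response'] : '';
    $ip = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : null;

    if (recaptcha_verify('YOUR_PRIVATE_KEY', $token, $ip)) {
        // Captcha passed: process the form.
    } else {
        // Captcha failed: show the form again with an error message.
    }
}
```

Keeping the JSON check in its own function means it can be exercised without any network access.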



The public and private keys are provided by Google for each reCAPTCHA instance.

The Internet tells you to use the one-liner file_get_contents(url) to retrieve the JSON response. Unfortunately, on shared hosting this is often not an option, and you have to fall back to cURL instead.
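A likely cause (an assumption, since hosts differ) is that the host turns off the `allow_url_fopen` setting, which `file_get_contents()` needs in order to open URLs. The small diagnostic below, with the hypothetical name `available_http_transports`, is one way to see what your own host allows.

```php
<?php
// Report which HTTP transports this PHP installation allows.
function available_http_transports()
{
    $transports = array();

    // file_get_contents() can only open URLs when allow_url_fopen is enabled.
    // ini_get() returns a string such as "1", "On", "0" or "".
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        $transports[] = 'file_get_contents';
    }

    // The cURL extension is present if its functions are defined.
    if (function_exists('curl_init')) {
        $transports[] = 'curl';
    }

    return $transports;
}

print_r(available_http_transports());
```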

The 1st generation

The story behind reCAPTCHA is an interesting one.
Originally the slogan was "Stop spam. Read books." because each day millions of captchas were solved to help digitize books where OCR technology was unable to do so.

They had this system with two different OCR engines, and if neither of them was able to find a word from the dictionary, then the word would be presented to users. Each guess from OCR counts for 0.5 point, and each answer from a human counts for 1.0 point. When an image reaches 2.5 points for the same deciphered word, that word is considered accepted.
Words considered valid are then shown next to unknown words in order to tell humans and machines apart (this is called a Turing test). These valid words can be words that were decoded correctly by the two OCR engines or words resulting from the process described in the previous paragraph.
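The scoring rule described above can be sketched in a few lines. This is only an illustration of the arithmetic as stated here (0.5 per OCR guess, 1.0 per human answer, accepted at 2.5 points), not reCAPTCHA's actual implementation; the function name and the shape of the input array are hypothetical.

```php
<?php
// Tally guesses for one scanned word image; return the accepted word or null.
// Each guess is array('word' => string, 'source' => 'ocr' or 'human').
function accepted_word(array $guesses, $threshold = 2.5)
{
    $weights = array('ocr' => 0.5, 'human' => 1.0);
    $scores = array();

    foreach ($guesses as $guess) {
        $word = $guess['word'];
        if (!isset($scores[$word])) {
            $scores[$word] = 0.0;
        }
        $scores[$word] += $weights[$guess['source']];

        // The first reading to reach the threshold wins.
        if ($scores[$word] >= $threshold) {
            return $word;
        }
    }

    return null; // not enough agreement yet
}

// One OCR guess plus two matching human answers: 0.5 + 1.0 + 1.0 = 2.5
$result = accepted_word(array(
    array('word' => 'morning', 'source' => 'ocr'),
    array('word' => 'morning', 'source' => 'human'),
    array('word' => 'morning', 'source' => 'human'),
));
var_dump($result); // string(7) "morning"
```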

The New York Times Archive

At some point, the slogan disappeared because reCAPTCHA stopped reading books... Apparently it helped digitize 13 million articles from The New York Times.

The 2nd generation

On websites not using the new API demonstrated in the animated GIF above, the captcha image looks like this:

It is only today that I realized the number on the right comes straight from Google Street View, and made the link between Google buying reCAPTCHA and these house numbers. So instead of helping culture, millions of people are now helping Google with its business and working for the company for free. (Yet this blog is hosted by Google, I know ;-))

Like me, you are probably amazed that the guys at Google can automatically detect house numbers in all the images their cameras take, so you may ask yourself: is it really that complicated to identify a three-digit house number?
No, it is not! But Google wants to make sure its OCR engine is right rather than fill its database with wrong information, so that it can give its users accurate information and make more money.

The 3rd generation

It's great news for us humans, who only have to click a checkbox instead of trying to decipher impossibly complex CAPTCHAs like those found on download websites. Remember the one with the cats and dogs? And I am not talking about Microsoft's latest finding.

Alternatives

TextCaptcha is another good alternative you should check out.

Good reads

At Stanford they studied how CAPTCHAs have become hard to solve for humans.

Last but not least, I couldn't resist posting this, a Turing test letting only computers in:



(Source: http://www.smbc-comics.com/?id=2999)
