I recently evaluated CAPTCHA for a client requirement. Here is a brief overview (sources: blogs, wikis, and white papers found through Google search).
CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart”. It refers to a technology familiar to anyone who has registered on a popular website: the “what word is shown in this image” challenge. As “Turing test” suggests, the purpose is to distinguish between humans and computers.
Types of CAPTCHA:
* Photo/Image CAPTCHA
* Animated CAPTCHA
* Sound CAPTCHA
* Multiple choice questions
* Logic questions
CAPTCHA does not keep hackers or other attackers off a site. It only attempts to block automated bots and the spam they submit.
How does CAPTCHA work?
CAPTCHA fools bots by asking questions or generating pictures that only a human can answer. The pictures contain distorted letters, often drawn in different shapes and against different backgrounds. After the user submits an answer, the CAPTCHA validates it. Because bots have difficulty isolating and recognizing each letter, this is fairly difficult to break.
How to create CAPTCHA?
CAPTCHA can be written in any programming language, including Java. The code should provide three main functions. First, it should generate a random picture with varying properties. Second, it should validate the user's answer. Third, it should keep these pictures secure. There are also many ways to make the code harder to defeat:
* Rotate the text randomly
* Add random spaces between characters
* Use TTF fonts and change the font randomly every time
* Use a random text and image size every time
* Use more advanced text distortion and colors
* Move the lines randomly
* Store the expected answer, in hashed form, in a randomly named cookie
In addition, there are CAPTCHA creator programs that let users choose their CAPTCHA shapes. Sophisticated libraries often provide extension points so that developers can plug in their own algorithm for drawing images.
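The first two functions described above (generating a random picture and validating the answer) could be sketched roughly as follows. This is only a minimal illustration built on the standard `java.awt` classes; the class name `SimpleImageCaptcha`, its method names, and the chosen sizes and angles are assumptions made for this example, not part of any library listed in this post.

```java
import java.awt.Color;
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;
import java.security.SecureRandom;

// Hypothetical sketch; class and method names are illustrative only.
class SimpleImageCaptcha {
    // Easily confused characters (0/O, 1/I/L) are left out to spare human users.
    private static final String ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    // Pick a random challenge text of the requested length.
    static String randomText(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // Draw each character with a random size, color, rotation and spacing,
    // then add a few random lines as noise.
    static BufferedImage render(String text) {
        int width = 40 * text.length() + 20;
        int height = 70;
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
                               RenderingHints.VALUE_ANTIALIAS_ON);
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            int x = 10;
            for (char c : text.toCharArray()) {
                int size = 28 + RANDOM.nextInt(10);
                g.setFont(new Font(Font.SANS_SERIF, Font.BOLD, size));
                g.setColor(new Color(RANDOM.nextInt(120), RANDOM.nextInt(120),
                                     RANDOM.nextInt(120)));
                double angle = Math.toRadians(RANDOM.nextInt(41) - 20); // -20..20 degrees
                AffineTransform saved = g.getTransform();
                g.rotate(angle, x + size / 2.0, height / 2.0);
                g.drawString(String.valueOf(c), x, 45);
                g.setTransform(saved);
                x += 30 + RANDOM.nextInt(10); // random gap between characters
            }
            g.setColor(Color.GRAY);
            for (int i = 0; i < 5; i++) {
                g.drawLine(RANDOM.nextInt(width), RANDOM.nextInt(height),
                           RANDOM.nextInt(width), RANDOM.nextInt(height));
            }
        } finally {
            g.dispose();
        }
        return image;
    }

    // Compare the submitted answer with the expected text, ignoring case
    // and surrounding whitespace.
    static boolean validate(String expected, String answer) {
        return expected != null && answer != null
                && expected.equalsIgnoreCase(answer.trim());
    }
}
```

In a web application, the text returned by `randomText` would typically be kept on the server side (for example in the user's session), and only the image produced by `render` would be sent to the browser.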
Applications of CAPTCHA
* Preventing comment spam in blogs
* Protecting website registration
* Protecting email addresses from scrapers
* Protecting online polls
* Keeping search engine bots out
* Stopping worms and spam
Implementations of CAPTCHA
There are several implementations of CAPTCHA, both commercial and open-source, in almost every programming language. The following are some CAPTCHA frameworks that can be used from Java:
* reCaptcha (most popular, available as a web service)
* SimpleCaptcha (Java)
* JBoss Seam Captcha (for Java Seam based projects, works out of the box, algorithm can be extended)
* jCaptcha (Java)
* Kaptcha (very simple Java alternative to jCaptcha)
* … and many more
Disadvantages of CAPTCHA
* When CAPTCHAs become so distorted that even people with 20/20 vision start struggling, accessibility is a distant prospect
* Prone to common attacks
* The image challenge is inaccessible to visually impaired users. This problem is usually addressed by providing an alternative audio CAPTCHA for these users. However, many audio CAPTCHAs are difficult to understand even for those with good hearing, because of background noise and distorted pronunciation.
* Image CAPTCHAs are not infallible. A number of projects have shown that automatic character recognition software can often read the letters in the image.
There are several alternatives to CAPTCHA. Most of them can be easily incorporated into an existing web framework:
- Dummy form elements
Dummy form elements can be added to trick bots into filling them in, while CSS hides them from human users. The dummy elements should be named suggestively to fool the bots, for example subject, name, or URL. When the form is submitted, the system checks whether any of these fields has been filled in; if so, the submission can be flagged as coming from a bot.
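The server-side part of this check can be quite small. The sketch below assumes the submitted form arrives as a simple map of field names to values; the class name `HoneypotCheck` and the decoy field names are hypothetical and chosen only for illustration.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the field names are examples, not a fixed convention.
class HoneypotCheck {
    // Decoy fields that the page hides from human users with CSS.
    static final List<String> DECOY_FIELDS = List.of("subject", "name", "url");

    // Returns true when any decoy field arrives with a non-blank value.
    static boolean looksLikeBot(Map<String, String> formParams) {
        for (String field : DECOY_FIELDS) {
            String value = formParams.get(field);
            if (value != null && !value.trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }
}
```

A human never sees the decoy fields, so they arrive empty; a bot that fills every field it finds gives itself away.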
- Session variable / GET request detection
This is not strictly a CAPTCHA alternative, but it can be used to filter out spam bots. A variable is stored in the session when the GET request for the form is made, and when the form is submitted the system checks the session for that variable. This filters out bots that POST directly without first fetching the page that contains the form. However, the check is easily fooled by a bot that behaves like a web browser.
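As a rough sketch of the idea, the session is modelled below as a plain `Map`; in a real servlet application it would be the container's session object. The class name `FormSessionCheck` and the attribute name `formServed` are assumptions made for this example.

```java
import java.util.Map;

// Hypothetical sketch; the session is modelled as a plain Map.
class FormSessionCheck {
    static final String FORM_SERVED = "formServed"; // illustrative attribute name

    // Called while serving the GET request for the page that holds the form.
    static void onFormGet(Map<String, Object> session) {
        session.put(FORM_SERVED, Boolean.TRUE);
    }

    // Called while handling the POST. The marker is removed so that
    // each page load allows only one submission.
    static boolean onFormPost(Map<String, Object> session) {
        return session.remove(FORM_SERVED) != null;
    }
}
```

Removing the marker on POST is a design choice of this sketch rather than something the technique requires.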
- Session variable with time computation
This is similar to the session idea above: the time at which the form was loaded is recorded in the session. On POST, the system calculates the elapsed time, and if it is less than, say, 5 seconds, the submission is rejected as spam. However, spam bots could easily adjust for this by building in a delay.
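A minimal sketch of the timing check follows, again with the session modelled as a plain `Map` and the current time passed in explicitly so the logic is easy to test. The names `FormTimingCheck` and `formLoadedAtMillis` are hypothetical, and the 5-second threshold is simply the example value from the text.

```java
import java.util.Map;

// Hypothetical sketch; the session is modelled as a plain Map.
class FormTimingCheck {
    static final String LOADED_AT = "formLoadedAtMillis"; // illustrative attribute name
    static final long MIN_FILL_MILLIS = 5_000;            // example threshold from the text

    // Called when the form page is served.
    static void onFormGet(Map<String, Object> session, long nowMillis) {
        session.put(LOADED_AT, nowMillis);
    }

    // Called when the form is posted. Returns true when the submission
    // should be treated as spam.
    static boolean isTooFast(Map<String, Object> session, long nowMillis) {
        Object loadedAt = session.remove(LOADED_AT);
        if (!(loadedAt instanceof Long)) {
            return true; // the form was never loaded in this session
        }
        return nowMillis - (Long) loadedAt < MIN_FILL_MILLIS;
    }
}
```

In production code the caller would pass `System.currentTimeMillis()` for `nowMillis`.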
- 5 Layer Spam Filter
It uses some cunning techniques to identify bots without having to resort to CAPTCHA:
* Do fields hidden off-screen still get filled in?
* Is the form filled in within seconds?
* Does Akismet mark it as spam?
* Etc …
- Forced Visual cues
This is a simpler alternative. A web page can present “yes” and “no” radio buttons, with “no” selected by default, and ask visitors to state that they are not spammers by selecting “yes”.
- SAPTCHA (text-based CAPTCHA)
SAPTCHA stands for Semi Automatic Public Turing Test to Tell Computers and Humans Apart.
The key concept is the same as with CAPTCHA: the user is presented with a test question or instructions and must give the correct answer to use the resource. The main difference is that the computer does not try to automatically generate a “unique” test question on each query; only verification of the answer is automatic. Instead, a unique test question and its answer(s) are set by the moderator or owner when SAPTCHA is installed, and should be easy to change if needed.
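To illustrate how little code this needs, here is a minimal sketch of a moderator-configured question with automatic answer checking. The class name `Saptcha` and its methods are hypothetical, and the normalization (trimming and lower-casing) is an assumption of this example rather than part of the SAPTCHA description.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of a moderator-configured question and answer check.
class Saptcha {
    private String question;
    private Set<String> answers = new HashSet<>();

    Saptcha(String question, String... acceptedAnswers) {
        configure(question, acceptedAnswers);
    }

    // Lets the moderator replace the question and its accepted answers.
    final void configure(String question, String... acceptedAnswers) {
        Set<String> normalized = new HashSet<>();
        for (String answer : acceptedAnswers) {
            normalized.add(normalize(answer));
        }
        this.question = question;
        this.answers = normalized;
    }

    String question() {
        return question;
    }

    // Only this verification step is automatic.
    boolean verify(String userAnswer) {
        return userAnswer != null && answers.contains(normalize(userAnswer));
    }

    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }
}
```

For example, a moderator might create `new Saptcha("Type the word 'orange' backwards", "egnaro")` and later call `configure` with a new question once spammers have learned the old one.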
Comparison of SAPTCHA versus CAPTCHA features
Advantages of SAPTCHA over CAPTCHA:
1. SAPTCHA software is much easier to implement than CAPTCHA
2. Textual SAPTCHA does not discriminate against disabled users who can use the internet. [Offering an audio CAPTCHA in addition to a visual CAPTCHA would double the effort and is thus very uncommon in practice]
3. There are methods for breaking image-based CAPTCHAs. Even with a popular CAPTCHA, the system may still get spammed by an entirely automatic bot. SAPTCHAs can be much more varied, and there will be no common method of breaking them until computers can interpret instructions written in normal human language.
Advantages of CAPTCHA over SAPTCHA (disadvantages of SAPTCHA):
1. With SAPTCHA, the moderator must enter a new question and answer whenever a spammer is banned. CAPTCHA, however, is harder to implement (point 1 above), and CAPTCHA code will not remain useful forever either, so for all but extremely popular websites it seems highly unlikely that CAPTCHA would save work even in the long run.
2. If SAPTCHA is used to protect registration, it is easier to register many accounts at once than with CAPTCHA; this may matter for popular email services.
3. A verbal SAPTCHA is problematic for a multi-language resource that needs frequent changes.
- Mouse Intervention CAPTCHA
A simple Mouse Intervention CAPTCHA can be implemented as a Java applet. The server generates some drawings and asks the user to click on all drawings with an odd number of edges. The mouse click events are recorded. As long as the mouse is clicked within the dark area of drawings with an odd number of edges, access is granted.
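The click validation could look something like the sketch below, which models each drawing as a `java.awt.Polygon` and treats the number of vertices as the number of edges. The class name `MouseInterventionCheck` is hypothetical, and the exact acceptance rule (every odd-edged drawing clicked, no click anywhere else) is one possible reading of the description above, not a documented algorithm.

```java
import java.awt.Polygon;
import java.util.List;

// Hypothetical sketch; drawings are modelled as closed polygons.
class MouseInterventionCheck {
    // A click is valid when it falls inside a drawing with an odd number of edges.
    static boolean isValidClick(List<Polygon> drawings, int x, int y) {
        for (Polygon p : drawings) {
            if (p.npoints % 2 == 1 && p.contains(x, y)) {
                return true;
            }
        }
        return false;
    }

    // Grants access only when every click is valid and every odd-edged
    // drawing received at least one click. Each click is an {x, y} pair.
    static boolean grantAccess(List<Polygon> drawings, List<int[]> clicks) {
        for (int[] c : clicks) {
            if (!isValidClick(drawings, c[0], c[1])) {
                return false;
            }
        }
        for (Polygon p : drawings) {
            if (p.npoints % 2 == 0) {
                continue; // even-edged drawings must not be clicked
            }
            boolean clicked = false;
            for (int[] c : clicks) {
                if (p.contains(c[0], c[1])) {
                    clicked = true;
                    break;
                }
            }
            if (!clicked) {
                return false;
            }
        }
        return true;
    }
}
```

The recorded mouse events would be sent to the server as coordinate pairs and checked there, so that the geometry of the correct drawings never has to be trusted to the client.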
CAPTCHA is widely used across the internet, including by Google, Yahoo and Microsoft. Hence it should be discarded only in favor of a ground-breaking alternative. With CAPTCHA, the first decision is whether to use an image/audio-based CAPTCHA or a text-based CAPTCHA for the project, depending on the target user base. Once that is decided, one of the several free libraries can be chosen to fit into the existing technology stack. A relatively “light” library that provides easy hooks for custom CAPTCHA algorithms would be ideal.