some smart people said:
First Use - AltaVista
In 1997 AltaVista sought ways to block or discourage the automatic submission of URLs to its search engine. This free "add-URL" service was important to AltaVista since it broadened the engine's search coverage. Yet some users were abusing the service by automating the submission of large numbers of URLs in an effort to skew AltaVista's importance-ranking algorithms.
Andrei Broder, Chief Scientist of AltaVista, and his colleagues developed a filter. Their method randomly generates an image of printed text that machine vision (OCR) systems cannot read but humans still can. In January 2002 Broder stated that the system had been in use for "over a year" and had reduced the number of "spam add-URL" by "over 95%." A U.S. patent was issued in April 2001.
Yahoo's Chat Room Problem
In September 2000, Udi Manber of Yahoo! described this "chat room problem" to researchers at CMU: 'bots' were joining on-line chat rooms and irritating the people there by pointing them to advertising sites. How could all 'bots' be refused entry to chat rooms?
CMU's Prof. Manuel Blum, Luis A. von Ahn, and John Langford articulated some desirable properties of a test, including:
the test's challenges can be automatically generated and graded
the test can be taken quickly and easily by human users
the test will accept virtually all human users with high reliability while rejecting very few
the test will reject virtually all machine users
the test will resist automatic attack for many years even as technology advances
CMU's CAPTCHA Research
The CMU team developed a 'hard' GIMPY CAPTCHA which picked English words at random and rendered them as images of printed text under a wide variety of shape deformations and image occlusions, the word images often overlapping. The user was asked to transcribe some number of the words correctly.
A simplified version of GIMPY (EZ GIMPY), using only one word-image at a time, was installed by Yahoo! and is currently in use in their chat rooms to restrict access to human users only.
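A minimal Python sketch of the challenge-and-grade logic behind the 'hard' GIMPY scheme described above. It leaves out image rendering entirely. The word list, the count of seven displayed words, and the pass mark of three are assumptions for illustration, not figures taken from the text.

```python
import random

# Hypothetical word list; the dictionary GIMPY actually used is not given here.
WORDS = ["apple", "river", "stone", "cloud", "tiger", "lemon", "chair", "plant"]


def make_challenge(rng, count=7):
    """Pick `count` distinct words at random; each would be rendered as a distorted image."""
    return rng.sample(WORDS, count)


def grade(challenge, answers, required=3):
    """Pass if at least `required` of the displayed words were transcribed correctly."""
    shown = set(challenge)
    typed = {a.strip().lower() for a in answers}
    return len(shown & typed) >= required


rng = random.Random(0)          # seeded only to make the example repeatable
challenge = make_challenge(rng)
```

Because both generation and grading are plain string operations, the test can be produced and checked automatically, which is the first of the desirable properties listed earlier.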
Pioneering CAPTCHA Research at PARC
PARC’s research builds on its pattern and image analysis competencies to create reading-based CAPTCHAs. Principal Scientist Henry Baird, an expert on computer vision and document image analysis, also organized the first NSF-funded International Workshop on Human Interactive Proofs, held at PARC in January 2002.
Baird also collaborated with Richard Fateman and Allison Coates of UC Berkeley to develop PessimalPrint, a CAPTCHA that uses a model of document image degradations that approximates ten aspects of the physics of machine-printing and imaging of text. This model included spatial sampling rate and error, affine spatial deformations, jitter, speckle, blurring, thresholding, and symbol size. Their paper, PessimalPrint: a Reverse Turing Test, was the first refereed technical publication on CAPTCHAs.
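To give a feel for what a degradation model does, here is a toy Python sketch covering three of the listed effects (blurring, thresholding, speckle) on a small grayscale bitmap. It is not the PessimalPrint model. The function names, the 3x3 box blur, and the default parameter values are all assumptions chosen for illustration.

```python
import random


def blur(img):
    """3x3 box blur over a grayscale image stored as a list of rows of floats."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def threshold(img, t):
    """Turn a grayscale image into black (1) and white (0) pixels."""
    return [[1 if v >= t else 0 for v in row] for row in img]


def speckle(img, rng, p):
    """Flip each binary pixel independently with probability p."""
    return [[1 - v if rng.random() < p else v for v in row] for row in img]


def degrade(img, rng, t=0.4, p=0.05):
    """Blur, then threshold, then add speckle noise (assumed ordering)."""
    return speckle(threshold(blur(img), t), rng, p)
```

Lowering the threshold thickens strokes and raising it thins or breaks them, so even these three knobs produce a wide range of damaged-looking text from one clean image.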
Bracing for the Arms Race
Most CAPTCHA research to date has been limited to academic applications. Far more powerful algorithms will be required for commercial CAPTCHAs. As CAPTCHAs become more prevalent, bot programmers are expected to unleash armies of bots bent on breaking them.
Most research programs focus on either building CAPTCHAs or breaking them through, e.g., dictionary and computer-vision attacks. PARC research is unique in that it does both: we play both offense and defense. From exploring how to break them, researchers are discovering new techniques for building CAPTCHAs that are less vulnerable. For example, BaffleText uses non-English pronounceable character strings to defend against dictionary-driven attacks, and Gestalt-motivated image-masking degradations to defend against image restoration attacks.
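The dictionary-attack defence can be illustrated with a short Python sketch that produces pronounceable strings and rejects any that are real words. This is a simple stand-in that alternates consonants and vowels, not the generator BaffleText itself uses. The letter sets, the tiny dictionary, and the function names are hypothetical.

```python
import random

CONSONANTS = "bdfgklmnprstv"
VOWELS = "aeiou"
# Hypothetical stand-in for a full English dictionary.
DICTIONARY = {"banana", "tomato", "potato", "limit", "robot"}


def pseudo_word(rng, length=6):
    """Alternate consonants and vowels so the string is easy to pronounce."""
    return "".join(
        rng.choice(CONSONANTS if i % 2 == 0 else VOWELS) for i in range(length)
    )


def challenge_word(rng, length=6):
    """Redraw until the string is not a dictionary word."""
    while True:
        word = pseudo_word(rng, length)
        if word not in DICTIONARY:
            return word
```

An attacker who matches partly recognised characters against a word list gains nothing here, because the correct answer is by construction not in the list.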
User-focused studies
PARC’s user-focused approach makes BaffleText algorithms more commercially viable by ensuring they are not too frustrating for people to use. Drawing on PARC’s long tradition of workplace studies that merge insights from both social and computer sciences, researchers have conducted usability studies to confirm the human legibility and user acceptance of BaffleText images.
PARC is seeking corporate partners interested in using PARC CAPTCHA technology inside their own products and applications. To learn more, please contact Julie Chen, Business Development, 650-812-4758.
The funny thing is that a CAPTCHA of this kind, reCAPTCHA, is used to help decode old scanned texts. You are shown two words: one is already known and the other is not. The program knows which is which and has you type both. If you get the known word correct, the computer logs what you typed for the unknown one, and after enough people have typed the same string for that word it assumes they all got it right and makes it a known word. When the publisher gets all the words back, the document is cleared for printing.
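The two-word scheme described above can be sketched in a few lines of Python. This is only an illustration of the voting idea. The class name, the word identifiers, and the threshold of three matching answers are assumptions, since the description only says "so many people".

```python
from collections import Counter, defaultdict

REQUIRED_VOTES = 3  # assumed threshold, not a documented value


class WordPairChecker:
    def __init__(self):
        self.votes = defaultdict(Counter)  # unknown word id -> counts of typed strings
        self.resolved = {}                 # unknown word id -> agreed transcription

    def submit(self, known_answer, typed_known, unknown_id, typed_unknown):
        """Return True if the user passes; record their guess for the unknown word."""
        if typed_known.strip().lower() != known_answer.lower():
            return False  # control word wrong, so the guess is discarded
        guess = typed_unknown.strip().lower()
        if unknown_id not in self.resolved:
            self.votes[unknown_id][guess] += 1
            if self.votes[unknown_id][guess] >= REQUIRED_VOTES:
                self.resolved[unknown_id] = guess  # now treated as a known word
        return True
```

Only the known word decides whether a person gets in; the unknown word is just collected, which is why a typo on it never locks anyone out.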
Kinda interesting in a geek sort of way.