JISE

CAPTCHA is now commonly used as a standard security technology to tell computers and humans apart. The most widely deployed CAPTCHAs are text-based schemes. In this paper, we document how we broke text-based schemes that use the "connecting characters together" (CCT) principle. CAPTCHAs of this kind can be classified into three types: CAPTCHAs with overlap but no noise arcs; CAPTCHAs with noise arcs but no overlap; and CAPTCHAs with neither noise arcs nor overlap. The Yahoo! CAPTCHA, the Baidu CAPTCHA, and reCAPTCHA were selected as representatives of the three types. CCT CAPTCHAs effectively resisted the segmentation and recognition techniques of early attacks. In contrast to earlier work, which recognized the text only after segmenting it, we combine recognition with segmentation to break CCT CAPTCHAs: our method segments the text by extracting the characters it has recognized. Experiments show that our extraction and segmentation attack on the Yahoo! CAPTCHA achieved a success rate of 78%, and an overall success rate (including individual character recognition with OCR) of 55%. The segmentation and recognition success rate on the Baidu CAPTCHA was 34%. On reCAPTCHA we achieved a 34% success rate.