Secure Hindi Handwritten CAPTCHA Generation using OCR Operations
Keywords:
CAPTCHA, Handwriting synthesis, Hindi script, Cyber security, Spam
Abstract:
Handwritten CAPTCHAs can be generated from pre-written or synthesized words, with distortions and noise added so that they withstand OCR attacks. This paper takes a different approach and uses OCR operations themselves to secure the CAPTCHAs. We apply a number of operations found in many handwriting recognition systems, such as segmentation and baseline detection, to distort the pre-written word image itself, so that breaking the resulting CAPTCHA becomes more difficult. These OCR operations are applied in addition to the global image distortions that are generally used on CAPTCHAs. The proposed method is reported for Hindi handwritten words, as the cursive script of Hindi allows various OCR operations on it. To the best of our knowledge, this work is the first to generate Hindi handwritten CAPTCHAs. We evaluate our method on the KATT database of offline Hindi handwritten text. In terms of usability, we achieve 88% to 90% accuracy. For security, we evaluate an attack based on holistic word recognition, which achieves an accuracy of less than 0.5%. Lexicon-based attacks are made difficult by working at the Hindi sub-word level and randomly selecting sub-words to build a CAPTCHA.
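
The random sub-word selection mentioned above can be illustrated with a minimal sketch. It is not the procedure used in this work, only an illustration under stated assumptions: the pool of 500 sub-words, the string labels standing in for segmented sub-word images, the challenge length `k`, and the function names `build_challenge` and `challenge_space` are all hypothetical. It shows why a fixed word lexicon is of limited use to an attacker: the number of possible challenges grows as the pool size raised to the power `k`.

```python
import secrets


def build_challenge(subword_pool, k=3):
    """Pick k sub-word identifiers uniformly at random (repeats allowed)."""
    if not subword_pool:
        raise ValueError("sub-word pool must not be empty")
    # secrets.choice draws from a cryptographically strong source,
    # so the selection is hard to predict from earlier challenges.
    return [secrets.choice(subword_pool) for _ in range(k)]


def challenge_space(pool_size, k):
    """Number of distinct ordered challenges a lexicon would need to cover."""
    return pool_size ** k


# Hypothetical pool: labels standing in for segmented sub-word images.
pool = [f"subword_{i:03d}" for i in range(500)]

challenge = build_challenge(pool, k=3)
print(challenge)                       # three randomly chosen labels
print(challenge_space(len(pool), 3))   # 125000000
```

In a real generator the selected items would be sub-word images that are concatenated and then distorted; the labels here only make the combinatorics visible.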
