There is hardly a business in the world that does not use forms to some extent. Because they are useful for collecting and recording data from a large number of individuals, forms are used for invoices, purchase orders, insurance claims, medical records, tax statements, credit card applications, and many other purposes. However, processing forms requires data entry, which is costly to a business in both time and money. Manual data entry is also prone to error, which is why many businesses seek an automated solution. Automated forms processing software relies on OCR/ICR (optical/intelligent character recognition) technology, and can make forms processing much more efficient and economical. Automated forms processing technology is relatively new and has not been perfected, but there are ways of optimizing its performance.

When a document or form is initially scanned into a computer, it is typically stored as an image file such as a TIFF, JPG, or PDF. On screen, a user can read these documents just as easily as if the paper were right in front of them. To the computer, however, these image files are just meaningless pictures; it cannot ‘read’ the text in them as a human can. This is where OCR software comes in. OCR software analyzes the light and dark areas of an image to locate text, and then identifies the characters it finds. OCR is extremely accurate when used with clear, high-resolution PDFs. However, it is not good at recognizing handwriting. ICR, intelligent character recognition, is designed for handwriting recognition, and employs various handwriting-specific recognition algorithms. Unfortunately, ICR technology has not advanced to the point where the user can just run an ICR program on any document and receive machine-editable text. Luckily, there are ways to help ICR software do its job well, and to use it effectively for forms processing.
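To make the idea of analyzing light and dark areas concrete, here is a minimal Python sketch. It is only an illustration, not how any particular OCR product works: it assumes the scan is already available as a grid of grayscale values from 0 (black) to 255 (white), and the function names and the fixed threshold of 128 are hypothetical choices. Real engines use far more sophisticated methods.

```python
def binarize(gray, threshold=128):
    """Mark each grayscale pixel (0-255) as 1 (dark, likely ink) or 0 (light, background).

    The fixed threshold is an assumption made for this sketch.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def ink_bounding_box(binary):
    """Return (top, left, bottom, right) of the dark pixels, or None if the page is blank."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        return None
    width = len(binary[0])
    cols = [c for c in range(width) if any(row[c] for row in binary)]
    return (rows[0], cols[0], rows[-1], cols[-1])


# A tiny 4x4 "scan": a few dark pixels on a white background.
scan = [
    [255, 255, 255, 255],
    [255,  10,  20, 255],
    [255, 255,  30, 255],
    [255, 255, 255, 255],
]
region = ink_bounding_box(binarize(scan))
```

The resulting region is where a recognition step would then look for characters. Everything outside it can be ignored.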

ICR and Forms Processing

One way that forms processing software solutions help ICR do its job effectively is by optimizing document images for ICR processing. This often means despeckling and deskewing images in order to make the writing as clear as possible. Automated forms processing solutions are also generally custom-tailored to their applications in order to provide as much helpful information as possible to the ICR engine. For example, if it is known beforehand that a given field of a form will contain only numbers, that information can be passed on to the ICR engine, greatly reducing the possibility of error. These specifications are called validation rules, and because ICR is a new and developing technology, they are crucial to developing an accurate forms processing solution. Generally, ICR-based solutions also include a manual validation step to confirm the software’s results. ICR-based forms processing solutions can be very useful, provided they are properly designed and implemented for the task at hand. Always be wary of stock ICR software, as it is likely to be highly inaccurate.
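The numbers-only example above can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not the interface of any real ICR engine: the field names, the rule table, and the idea that the engine reports several candidate characters with confidence scores for each position are all hypothetical.

```python
import string

# Hypothetical validation rules: field name -> set of characters allowed in it.
FIELD_RULES = {
    "zip_code": set(string.digits),
    "last_name": set(string.ascii_letters + " -'"),
}


def apply_rule(field, candidates):
    """Pick, for each character position, the best candidate the field's rule allows.

    `candidates` holds one list per position of (character, confidence) pairs,
    an assumed stand-in for whatever an ICR engine reports.
    Returns (text, needs_review). needs_review is True when some position had
    no allowed candidate, so the field should go to manual validation.
    """
    allowed = FIELD_RULES[field]
    chars = []
    needs_review = False
    for options in candidates:
        valid = [(ch, conf) for ch, conf in options if ch in allowed]
        if valid:
            # Keep the most confident candidate among those the rule permits.
            chars.append(max(valid, key=lambda pair: pair[1])[0])
        else:
            chars.append("?")
            needs_review = True
    return "".join(chars), needs_review


# The engine slightly prefers the letters "O" and "l", but the rule allows only digits.
guesses = [
    [("O", 0.60), ("0", 0.40)],
    [("l", 0.50), ("1", 0.45)],
    [("5", 0.90)],
]
text, needs_review = apply_rule("zip_code", guesses)
```

With the digits-only rule, the lookalike letters are discarded and the field reads as 015. A field with no acceptable candidate is flagged for the manual validation step instead of being guessed.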

OCR and Forms Processing

Forms processing solutions requiring only OCR are generally much simpler to implement. OCR technology has made huge advances in recent years, and most OCR software is over 90% accurate. However, OCR only works on machine-generated fonts and can’t be used for handwriting. Using good-quality, clean scans of forms helps OCR software do its job properly. Even so, a degree of customization is still needed to develop an OCR solution specific to a given task.

