Using High Volume Scanning, Capture and Workflow to Maximize Business Efficiency
By Ian Llado, Sales Account Manager, Optical Image Technology, Inc. and Arthur Gehring, Director of Marketing, Datacap Inc.
How many pages do you process daily? Five hundred? Five thousand? More? Are you wading in piles of papers that are waiting to be scanned and indexed? Sometimes the backlog can seem insurmountable. Fret no longer! High volume scanning is a cost-efficient answer to the challenge of eliminating paper and centralizing information. It can reap benefits well beyond the initial investment. When it is integrated with data capture and a strong, automated workflow product, the possibilities for increasing efficiency are limited only by the imagination of the business managers and users.
High volume scanning and its complementary components help at every stage, from document preparation through scanning, indexing, data capture, storage and processing. Together, they dramatically accelerate information availability and delivery. You may ask yourself, “What is high volume scanning? What complementary components are there, and what do they do? How can this help our organization to improve our business processes? What should we look for in choosing a vendor and solution for scanning, indexing, and improving our business processes?” Read on for answers to these questions and more.
Inputs and Options for Effective High Volume Scanning
Regularly scanning your paper documents is the key first step. High volume scanning provides a path out from under the paper mountain, literally, but that is just the beginning. It can be the first in a series of steps that revolutionize the way you do business. Adding scanning and its complementary tools results in improved processes, elevated accuracy, and reduced costs. When a business is looking for a high volume scanning system, there are four key factors to take into consideration: document types; scanning location; scanning equipment and software; and document preparation, identification and classification.
- Document Types – What types of documents are you going to be scanning? You will want to take into account whether the document layouts are very structured or variable. Will changing regulations periodically alter the way the document looks, where key information on the form is located, or the form's size or color? Is the information contained on the document subject to frequent reorganization? (This will affect how key values can be extracted for indexing purposes.) Believe it or not, a document's page color can affect the quality of the electronic image produced during scanning, depending upon the scanner that is selected.
How many documents will you need to process in a day? You wouldn’t want to use an everyday desktop scanner to scan in thousands of documents a day. A typical scanner would not be fast enough to handle the workload; it wasn’t designed for that type of duty. What image quality do you need from the scanning process? In what condition is the original paper document? Certain scanners are designed to produce high-quality images and also offer the ability to tweak the settings to produce the best possible images from the paper copy.
- Scanning Location – While you perform scanning, you can also use data capture to transform designated content from paper documents into digital information. You can scan and index remotely, centrally, or use a combination of both. Will the scanning be taking place at the point of origin, or will it be done at a central location? There would be vastly different requirements for a scanner at a remote office that might get 1,000 documents a day to scan and a central office for 10 remote offices that needs to scan 10,000 documents a day. Cost savings can result from scanning at the point of origin, since this saves money on shipping paper copies of documents to a central location. Time savings are also realized as less time is spent copying, packaging and shipping, which also reduces cycle times. No matter which route you choose, the earlier you get the data into an electronic format, the sooner you can start processing it.
- Scanning Equipment and Software – All the prior questions come into play here. What type of scanner do you need? Do you just need a high volume scanner? What bells and whistles are you looking for on your scanner? Do you need a scanner with image enhancement and processing capabilities? You must capture the highest quality image possible for the most accurate data recognition. If you just need to scan a lot of paper, your requirements in a scanner will differ from a site that needs to scan a lot of paper while enhancing the electronic image produced. Scanner speed is also a very important factor in high volume scanning settings. You may need to use tools that enable you to de-skew images or drop colors so you can enhance the recognition quality. You will also need to choose the right type of scanner for your document types and regulatory requirements. For example, if you have documents with a variety of font colors, and you need to maintain that look for company or regulatory requirements, you will need a color scanner.
- Document Preparation, Identification and Classification – How much document preparation needs to happen before scanning can begin? How are these documents broken into batches or groups? How are you going to identify and classify these documents? Will you insert separator sheets to identify the batches of documents, or can the documents be separated using barcodes or some other indicator to automatically identify them? Document preparation encompasses everything from opening the envelope, to flattening out the enclosed paperwork, to removing staples and/or paperclips that were holding the documents together. All of this must happen before the scanner is even in the picture. What types of documents are you scanning: all one type or mixed types? Are you scanning individual pages or batches of pages? Are the pages single or double sided? Are the pages universally portrait style, landscape style, or is there a mix? The scanner and software you choose must be able to handle a variety of input types. Some software packages can identify documents using an image, pattern, text, or even fingerprint analysis. Automatically identifying the document type reduces manual intervention, document preparation time, and overall cycle time.
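To make the idea of barcode-based batch separation concrete, here is a minimal sketch in Python. It assumes the capture software has already decoded any barcode on each page; the `Page` class, the `BATCH-` prefix and the `UNASSIGNED` label are hypothetical names chosen for illustration, not features of any particular product.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

SEPARATOR_PREFIX = "BATCH-"  # hypothetical convention printed on separator sheets


@dataclass
class Page:
    number: int
    barcode: Optional[str] = None  # value decoded by the capture software, if any


def split_into_batches(pages: List[Page]) -> Dict[str, List[int]]:
    """Group page numbers under the most recent separator sheet."""
    batches: Dict[str, List[int]] = {}
    current: Optional[str] = None
    for page in pages:
        if page.barcode and page.barcode.startswith(SEPARATOR_PREFIX):
            current = page.barcode
            batches.setdefault(current, [])
            continue  # the separator sheet itself is not kept
        # Pages scanned before any separator are held for manual review.
        key = current if current is not None else "UNASSIGNED"
        batches.setdefault(key, []).append(page.number)
    return batches


scanned = [Page(1), Page(2, "BATCH-A"), Page(3), Page(4), Page(5, "BATCH-B"), Page(6)]
print(split_into_batches(scanned))
```

In this sketch, any page that arrives before the first separator sheet is collected under `UNASSIGNED` rather than dropped, so that a person can decide where it belongs.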
Variations on Data Capture and Recognition
Document indexing should be a main focus when evaluating data capture and recognition programs. However, you must do your homework first. You need to know about the documents from which you will be extracting information. From which fields within the document do you wish to pull data? Do these fields vary in location from document to document, or revision to revision, or are these fields in the same constant location? The technology required to extract data from a static location varies from that required to extract data from constantly changing locations on a page. This is something you need to keep in mind. If the location of the information frequently changes, you will need to ensure that the solution you choose can adapt to it.
Another document indexing factor to consider is the type of technology you want to use to pull the data from the document. Do you need to use OCR, ICR, OMR, or barcode recognition? Some sites combine recognition tools for greater flexibility, for example, an ICR system validating an OCR system.
- OCR, optical character recognition software, reads typed (machine printed) characters. This type of recognition is good if you are capturing information from text documents. However, it does have its limitations. It does not recognize fonts below 8 points very well, or fonts above 24 points. OCR is also greatly affected by the quality of the image. The more faded or fuzzy the image, the less accurate the captured data will be.
- ICR, intelligent character recognition software, primarily reads handwritten characters. It can also read machine-printed characters, but not quite as well as an OCR engine. It is important to note that ICR works best on neat handwriting and when characters are not touching one another (occurrences of which can be decreased by savvy form design).
- OMR, optical mark recognition software, reads checkboxes and bubbles on forms. However, sometimes persons completing forms mark outside of the designated answer areas. You will want to ensure that your OMR program flags these instances for human follow-up to determine the intention and categorize the response correctly. Another consideration to keep in mind with OMR is to make sure you can set separate outputs for a human reviewer’s display and for the electronic recording of the data. What a reviewer needs to see may vary greatly from how you want to store the information electronically. Also keep in mind that if your forms allow multiple answers to a question, the OMR software needs to be able to export that data correctly for review and easy compilation.
- Barcode recognition is an additional means of document indexing. This provides automated indexing for sites that are set up to capitalize on it. Barcode indexing can use barcodes to denote where document batches should be separated. It can also pull data from the barcode and use it to populate information for indices. Barcode recognition has a high accuracy rate; however, barcodes are generally unreadable by people. Thus, the best option is to have a human-readable version available on each page, too, in case the barcode is not recognized and the document has to be processed manually.
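One way to picture two recognition engines checking each other is the short Python sketch below. It assumes each engine reports the text it read along with a confidence score between 0 and 1; the `Reading` type and the `cross_check` function are hypothetical and stand in for whatever your recognition products actually return.

```python
from typing import NamedTuple, Tuple


class Reading(NamedTuple):
    text: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical engine


def cross_check(ocr: Reading, icr: Reading) -> Tuple[str, bool]:
    """Return (value, needs_review) for one field read by both engines."""
    if ocr.text == icr.text:
        return ocr.text, False  # engines agree, accept the value
    # Engines disagree: propose the more confident reading, but flag it.
    best = ocr if ocr.confidence >= icr.confidence else icr
    return best.text, True


print(cross_check(Reading("12345", 0.98), Reading("12345", 0.91)))
print(cross_check(Reading("1Z345", 0.60), Reading("12345", 0.85)))
```

When the two readings disagree, the sketch still proposes a value but marks it for human follow-up, which keeps a person in the loop for exactly the cases where automation is least reliable.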
Forms processing can help eliminate the indexing bottleneck by automating the indexing of images, which is normally a very time-consuming process. Forms processing acts on business information to feed another data system. This tool reads the data completed on forms and exports it to the desired destination. When evaluating forms processing software, keep in mind what page values and data types you are expecting to process. You will want a forms processor that is flexible enough to handle the variety. You also need to ensure that the forms processor you choose works well with the data recognition technologies you need. For example, can the forms processor read handwriting on forms? You will also need to ensure that the forms processor you choose can output information to the system(s) and/or application(s) you choose. Can it process the information and export it to where it needs to go?
Document indexing and forms processing can be performed simultaneously. Things to keep in mind for both document indexing and forms processing are:
- Delivery – What will you be delivering, in what format, and to what systems? Will you be delivering images, data or indexes? Will you be sending the data to an electronic document management (EDM) system, an automated workflow system, an enterprise resource planning (ERP) system such as SAP, or to one or more databases or applications? You need to determine what you will do with the data, and then ensure that the product(s) you select are able to meet these needs. If you are just saving electronic images in a repository, your system processing needs will be different from those of someone who is exporting data to a database as indices. Depending upon what format you need to deliver, you may need to make sure your solution can convert your information as needed. For example, you may need to convert your information to PDF, XML, TIFF, JPG or PNG. Make sure the solution you choose can handle this conversion.
- Validation – One reason for choosing an automated system is to reduce the number of errors. Validation is a way in which the system itself checks the values it extracts. Do you need to look up data to check it before accepting a value? Do you want to perform this lookup manually, or using an automated system? For example, some systems will allow you to query against a database to ascertain if an entered value is legitimate based on field characters, values, or masks. If the value matches, the validation is successful. If not, the entered value is rejected and can be sent for review.
- Verification – This step is performed by a person who checks values that failed validation, were not recognized by OCR, ICR, OMR, or barcode products, or were given a low confidence rating for accuracy. Once the person reviews the item in question, it can then be run through validation again as a double-check.
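The validation and verification steps described above can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the policy-number mask, the 0.90 confidence threshold, and the small set that stands in for a real database lookup.

```python
import re
from typing import Tuple

KNOWN_POLICY_NUMBERS = {"PN-100234", "PN-100587"}  # stand-in for a database lookup
POLICY_MASK = re.compile(r"PN-\d{6}")  # hypothetical field mask
MIN_CONFIDENCE = 0.90  # hypothetical threshold for recognition confidence


def validate(value: str, confidence: float) -> Tuple[str, str]:
    """Return ("accept", "") or ("verify", reason) for one extracted value."""
    if confidence < MIN_CONFIDENCE:
        return "verify", "low confidence"
    if not POLICY_MASK.fullmatch(value):
        return "verify", "mask mismatch"
    if value not in KNOWN_POLICY_NUMBERS:
        return "verify", "not found in lookup"
    return "accept", ""


# A misread value goes to a person, whose correction is validated again.
first_pass = validate("PN-1OO234", 0.95)   # letter O read instead of zero
second_pass = validate("PN-100234", 1.0)   # value keyed in by the reviewer
print(first_pass, second_pass)
```

Values that come back as "verify" would be queued for a person to check, and the corrected value is then run through the same function again as the double-check.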
Workflow and Business Process Management Tools
Once you are scanning a high volume of data into your system, what can you do to use that data as effectively as possible? Workflow, a business process management tool, allows you to perform your manual business processes electronically. Automated workflow electronically routes your documents through the appropriate business processes. This helps to facilitate decision-making by ensuring that the right people get the right information at the right time. Users can route scanned images and files automatically from any place, at any time, reducing processing time, increasing information security, and assisting with compliance. Check to make sure your intended workflow system tracks the actions performed in order to help you meet regulatory requirements. You can even leverage workflow to improve your processes. Automated workflow allows for the integration of data with other systems. It can utilize the extracted data from capture, pushing or pulling data to or from mainframes and other back-end systems. For example, a capture application might pull key indexing information from a newly submitted claims form, and that extracted data can then be used to launch a new claims-processing workflow.
One of the features of a workflow system is that it will allow you to set time requirements for tasks. For example, you can configure the system to allow a request for a signature to take one day. If desired, alerts can be automatically sent if the signature is not completed in the allotted time. Another feature of a workflow system is the ability to deal with absentees. If a user is absent from work or is on vacation, that user's work can be electronically redirected to another employee. Workflow systems can be set up to deliver jobs according to your instructions. Jobs can be sent to a specific person, a certain group of people, a person in a certain role or position, or even assigned to several people in a queue in order to balance the workload. Users can even be notified via email when they have a new workflow job that requires action.
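The following Python sketch shows how time limits, absentee redirection and workload balancing might fit together. It is an illustration only: the `Job` and `Router` classes, the user names and the one-day limit are all hypothetical, and a real workflow product would expose these features through its own configuration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Set


@dataclass
class Job:
    name: str
    created: datetime
    time_limit: timedelta
    assignee: Optional[str] = None

    def is_overdue(self, now: datetime) -> bool:
        """True once the allotted time has passed, so an alert can be sent."""
        return now > self.created + self.time_limit


@dataclass
class Router:
    delegates: Dict[str, str] = field(default_factory=dict)  # absent user -> stand-in
    absent: Set[str] = field(default_factory=set)
    open_jobs: Dict[str, int] = field(default_factory=dict)  # user -> jobs in queue

    def resolve(self, user: str) -> str:
        """Redirect work away from an absent user when a stand-in is configured."""
        if user in self.absent and user in self.delegates:
            return self.delegates[user]
        return user

    def assign(self, job: Job, candidates: List[str]) -> str:
        """Give the job to the candidate with the fewest open jobs."""
        available = [self.resolve(user) for user in candidates]
        chosen = min(available, key=lambda user: self.open_jobs.get(user, 0))
        self.open_jobs[chosen] = self.open_jobs.get(chosen, 0) + 1
        job.assignee = chosen
        return chosen


router = Router(delegates={"alice": "bob"}, absent={"alice"})
signature = Job("sign claim", datetime(2006, 1, 2, 9, 0), timedelta(days=1))
print(router.assign(signature, ["alice"]))
print(signature.is_overdue(datetime(2006, 1, 3, 10, 0)))
```

Here the job addressed to an absent user is redirected to that user's stand-in, and the overdue check is the hook where an alert or email notification would be triggered.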
Workflows can be configured to start automatically when any document, a certain type of document, or a group of objects enters the system or a specific part of the system. The workflow can be based on indices assigned to the document or characteristics of the document. Workflows can be configured to require that certain steps be performed on a document and can even launch external applications such as Microsoft Word, Microsoft Excel or email programs.
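As a rough illustration of starting a workflow from a document's indices, the Python sketch below matches extracted index values against an ordered list of rules. The rule format, index names and workflow names are invented for this example and do not reflect any specific product's configuration.

```python
from typing import Dict, List, Optional, Tuple

Rule = Tuple[Dict[str, str], str]  # (required index values, workflow to launch)

# Hypothetical rules, listed from most specific to most general.
RULES: List[Rule] = [
    ({"doc_type": "claim", "region": "east"}, "east_claims_review"),
    ({"doc_type": "claim"}, "claims_intake"),
    ({"doc_type": "invoice"}, "accounts_payable"),
]


def select_workflow(indices: Dict[str, str], rules: List[Rule]) -> Optional[str]:
    """Return the first workflow whose conditions all match the document's indices."""
    for conditions, workflow in rules:
        if all(indices.get(key) == value for key, value in conditions.items()):
            return workflow
    return None  # no rule matched, so no workflow starts automatically


print(select_workflow({"doc_type": "claim", "region": "west"}, RULES))
```

Because the rules are plain data rather than program logic, adding a document type or changing a routing decision means editing the list, not the code that evaluates it.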
A key to successful workflow implementations is flexibility. Workflow is a dynamic process; therefore, you want to ensure that it can be updated on the fly and that you have an easily adaptable system. For example, if you are using a rules-based system to feed your workflow, you can easily make changes. This type of adaptability is critical in a high volume scanning environment. You need to ensure that your chosen system can handle changes in document types, in business rules, and to the workflow itself. As you expand your use of workflow, it needs to be able to grow with you.
Flexibility is Key
Throughout every aspect of this process, from scanning to workflow, flexibility is a key requirement. Your system needs to be able to accommodate a variety of document types and styles. If your paper document should change, can every system component handle it – from scanning to data recognition to document indexing to workflow? If your workflow can handle the change, but the recognition software cannot get it to workflow, there will be a breakdown in your processing. What if your business rules change? Is your system dynamic enough to adjust? Having a capture system that accepts the change and a workflow system that doesn’t recognize the change will result in a process breakdown.
Making High Volume Scanning Work for You
Combining scanning, data recognition and workflow in one end-to-end solution makes the movement of text, data and image files faster, easier, and more accurate, which results in improved customer service and reduced manual handling. This technology potpourri can help speed data capture and delivery and become a vital link in the business information supply chain. Ensuring that these varied components work together can be a challenge, but with the right partners and preparation, it is one that can be met and surpassed. The many benefits derived from this type of investment make it well worth the effort.
For information about Datacap’s scanning, indexing and forms processing products, visit their website at www.datacap.com or contact Art Gehring at (914) 366-0100 or sales@datacap.com. To learn more about Optical Image Technology’s indexing, barcode server or workflow products, visit their website at www.docfinity.com, call Ian Llado at (814) 238-0038 or contact info@docfinity.com.
©2006 Optical Image Technology, Inc. All rights reserved. DocFinity, IntraVIEWER, and XML FormFLOW are trademarks or registered trademarks of Optical Image Technology, Inc.



