Ottawa PC Users' Group, Inc.
 Software Reviews


Optical Character Recognition (OCR) Software
by Dunc Petrie

The startling drop in scanner prices has pushed a niche peripheral into the mainstream and added OCR to the vocabulary.

Simply put, OCR software takes a scan of text-containing material, which is otherwise just a bitmap graphic (an ordered collection of dots, or pixels), and converts it to text that a word processor can accept.
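To make the bitmap-to-text idea concrete, here is a deliberately tiny sketch in Python. It is not how Textbridge, OmniPage or any commercial engine works; it assumes a made-up 3x5 dot grid per character and three hypothetical templates, and it simply picks the template that agrees with the scanned dots most often.

```python
# Toy illustration only: real OCR engines are far more sophisticated.
# Each hypothetical template is a 3x5 bitmap; '#' marks an inked dot.
TEMPLATES = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}


def match_score(glyph, template):
    """Count the dots on which the scanned glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the character whose template agrees with the glyph on most dots."""
    return max(TEMPLATES, key=lambda ch: match_score(glyph, TEMPLATES[ch]))


def recognize_line(glyphs):
    """Convert a sequence of glyph bitmaps to a text string."""
    return "".join(recognize(g) for g in glyphs)


# A scanned "T" with one stray dot of noise in the fourth row.
noisy_t = ("###", ".#.", ".#.", "##.", ".#.")
print(recognize_line([TEMPLATES["L"], TEMPLATES["I"], noisy_t]))
```

Even with the stray dot, the noisy glyph still agrees with the "T" template on more dots than with any other, which is the basic reason a clean scan matters so much.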

OCR software divides readily into two groups: basic and high-end. The former includes most products bundled with a scanner (often called limited or special editions); many offer surprising sophistication and could suffice for the majority of casual users. High-end programs are more sophisticated and are better suited to multiple columns of text or pages that combine text, charts and images. If that describes your needs, read on.

Two leading contenders are Xerox's Textbridge Pro 98 and Caere's OmniPage Pro, version 8. I tried their demo versions (limited to 15 days for Textbridge and 25 scans for OmniPage). I used each product on a variety of documents and was hard-pressed to choose a winner.

Ideally, text recognition would be 100 percent accurate. In practice, accuracy depends on many interrelated factors and will vary: one program tended to edge the other on one document but yielded on the next. Some judicious tinkering with the brightness and contrast controls often helps. Regardless, both processed documents faster than I could keyboard them myself. When pages are complex (text and images, for example), manual or automatic zoning is often necessary to separate the components.
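For readers who want to put a number on "100 percent", one common yardstick is character accuracy: compare the OCR output with a hand-checked copy and count the single-character corrections needed. The Python sketch below is my own illustration, not something either vendor supplies; the function names and the sample strings are assumptions.

```python
def edit_distance(reference, recognized):
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    previous = list(range(len(recognized) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, rec_char in enumerate(recognized, start=1):
            cost = 0 if ref_char == rec_char else 1
            current.append(min(
                previous[j] + 1,         # drop a reference character
                current[j - 1] + 1,      # extra character in the OCR output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference, recognized):
    """Percentage of reference characters recovered (100.0 is perfect)."""
    if not reference:
        return 100.0 if not recognized else 0.0
    errors = edit_distance(reference, recognized)
    return max(0.0, 100.0 * (len(reference) - errors) / len(reference))


# A classic confusion: the letter "m" read as the pair "rn".
truth = "modern scanner"
ocr_output = "rnodern scanner"
print(edit_distance(truth, ocr_output))
print(round(character_accuracy(truth, ocr_output), 1))
```

A single misread letter in a 14-character phrase already costs two corrections here, which shows how quickly small recognition errors add up over a full page.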

Caere OmniPage Pro

OmniPage Pro is expensive if you must purchase the full version (about $600). Caere has reversed its long-standing policy of offering upgrades solely to users of its own products (including entry-level and bundled products); it now offers the upgrade (about $170) in local retail outlets to anyone who owns any OCR program. Textbridge is also available locally as a full version or as an upgrade (from any OCR program); it is usually less expensive (the upgrade is about $100).

Both programs offer integration with recent versions of WordPerfect and MS Word for Windows, and output formats that support an extensive choice of applications. Check first to determine if your particular product/version is supported. Only Textbridge outputs documents in Adobe's Acrobat (PDF) format; normal, image only and image plus text modes are supported. Both offer support for TWAIN scanners; most recently manufactured scanners support this standard. Textbridge supports many ISIS-standard scanners; OmniPage Pro integrates support for Hewlett-Packard's AccuScan feature.

A chance conversation with an OmniPage owner revealed one annoying feature. After purchasing the product you must register it (by toll-free phone call or on the company's Web site) or it will cease to function after 25 sessions; copy protection is alive and well! Worse, this process must be repeated each time the program is removed and reinstalled.

Xerox application

Xerox also offers two document management programs. Both provide direct scanning into most office suites, OCR, image editing (using MGI PhotoSuite), document indexing and searching (including boolean, proximity and natural language searches), a colour copy function (which needs a colour scanner and printer), a fax facility and a forms fill-in module. The more robust of the two, Pagis Pro version 2, incorporates Textbridge Pro 98 and costs about $130. Its less expensive companion, Pagis ScanWorks, substitutes Textbridge Classic but retains the other features; it costs about $100. A few stores may still offer Pagis Pro 97. For about $100 retail it offers a full version of Textbridge (Textbridge 96, not 98) and the associated document management features, but no image editor.

Pagis Pro/ScanWorks documents are stored in a proprietary format, XIF, that can store different parts of a page in different ways (for example, text and graphics at different resolutions and bit depths) to optimize file size, resolution and colour fidelity. To implement this scheme, your TWAIN driver is overlaid with a proprietary interface. Some TWAIN drivers will not cooperate and generate error messages. In these stubborn cases, an option to install the scanner's own proprietary interface is provided. (Be careful: this option may not be available in the earlier Pagis Pro 97.) Unfortunately, incompatible scanners may lose some XIF-based features.
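Some back-of-the-envelope arithmetic shows why storing text and graphics at different resolutions and bit depths pays off. The Python sketch below is an illustration only: it ignores compression, knows nothing about XIF's actual internals, and assumes a hypothetical letter-size page with 8 inches of text above a 3-inch band of colour graphics.

```python
def raw_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size of a scanned region (row padding ignored)."""
    columns = round(width_in * dpi)
    rows = round(height_in * dpi)
    return (columns * rows * bits_per_pixel + 7) // 8


# Whole page (8.5 x 11 inches) scanned uniformly at 300 dpi in 24-bit colour.
uniform = raw_bytes(8.5, 11, 300, 24)

# Same page stored selectively (hypothetical layout):
# the text band at 300 dpi in 1-bit black and white,
# the graphics band at 150 dpi in 24-bit colour.
text_band = raw_bytes(8.5, 8, 300, 1)
graphics_band = raw_bytes(8.5, 3, 150, 24)
mixed = text_band + graphics_band

print(uniform, mixed, round(uniform / mixed, 1))
```

Under these assumptions the selectively stored page needs roughly a tenth of the space, about 2.5 MB against 25 MB, before any compression is applied.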

Pagis Pro is an intriguing product; it sells for little more (about $130) than the basic Textbridge Pro and it has received very good reviews. Check the supported scanner list at www.pagis.com before purchasing.

Adobe Capture

Finally, no look at advanced OCR software is complete without mentioning Adobe Capture, a module within Adobe's Acrobat suite. This program offers a unique approach to OCR. The programs described above translate the bitmap graphics to text or, failing that, to their "best guess," which might include wrong characters or simply gibberish. If Capture cannot reliably translate the information, it simply places the original bitmap in the document - no guessing. "Average" users are served by the integrated module, but high-volume (presumably corporate) users can upgrade: the trade-off is more features in exchange for additional licensing fees. Other Adobe Acrobat modules provide additional document management features that not only maintain the original format of complex documents but also permit extensive reformatting, indexing and even multimedia additions. While Pagis Pro 2 exhibits some of these features, the Adobe Acrobat suite remains unique in others; however, the latter is more expensive (about $300).
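The difference between "best guess" and "no guessing" can be sketched in a few lines of Python. This is a hypothetical model, not Adobe's method: the confidence score, the 0.9 threshold and the image_ref handle are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str          # the recognizer's best guess
    confidence: float  # hypothetical score from 0.0 (no idea) to 1.0 (certain)
    image_ref: str     # hypothetical handle to the word's original bitmap


def best_guess(words):
    """Always emit text, however doubtful the recognition."""
    return " ".join(word.text for word in words)


def text_or_bitmap(words, threshold=0.9):
    """Emit text only when confident; otherwise keep the original bitmap."""
    return [
        word.text if word.confidence >= threshold
        else f"[bitmap:{word.image_ref}]"
        for word in words
    ]


page = [Word("scanner", 0.98, "w1"), Word("pr1ces", 0.41, "w2")]
print(best_guess(page))
print(text_or_bitmap(page))
```

The first function passes the doubtful "pr1ces" through as text; the second leaves a placeholder for the original image in its place, so the reader sees exactly what was on the page.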

From my limited perspective, Adobe Acrobat documents appear pervasive on the Web. Another benchmark: today's CD-ROM based software frequently includes the user help manuals as Acrobat files. However, these personal observations do not constitute a formal survey. If price is the principal consideration, then Pagis Pro is the least expensive choice. However, needs such as cross-platform compatibility, publication on the Web, large-scale enterprise publication and extensive multimedia integration would likely drive the decision into Adobe's camp. Both companies are powerhouses in document management; I expect that each product has a devoted following.


Bottom Line:

Proprietary packages:
OmniPage Pro ($600) from Caere
Pagis Pro Version 2 ($130) and Pagis ScanWorks ($100) from Xerox
Web Site: http://www.pagis.com
Adobe Capture Module in Adobe Acrobat ($300) from Adobe


Copyright and Usage
Ottawa Personal Computer Users' Group (OPCUG), Inc. 
3 Thatcher St. 
Ottawa, Ontario 
K2G 1S6 

The opinions expressed in these reviews may not necessarily
represent the views of the OPCUG or its members.
Original HTML coding for this page provided by Alan German
Page created: 25-Nov-98