The startling drop in scanner prices has pushed a
        niche peripheral into the mainstream and added OCR
        (optical character recognition) to the everyday vocabulary. 
        Simply put, OCR software accepts a scan of text-containing
        material, which is otherwise just a bitmap graphic (an
        ordered collection of dots, or pixels), and converts it
        to text that can be accepted by a word processor. 
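        To make the bitmap-to-text idea concrete, here is a deliberately
        tiny sketch that matches each scanned glyph against stored
        letter templates and picks the closest one. The 3x5 glyphs, the
        four-letter alphabet and the function names are all assumptions
        made for illustration; commercial OCR engines are far more
        elaborate and this is not how either reviewed product works
        internally.

```python
# Toy illustration only: glyphs are 3x5 bitmaps written as five strings,
# where '#' is ink and '.' is paper. The alphabet is a hypothetical subset.
TEMPLATES = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "O": ("###", "#.#", "#.#", "#.#", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}


def pixel_matches(glyph, template):
    """Count the pixels on which a scanned glyph and a template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the template letter that agrees with the glyph on the most pixels."""
    return max(TEMPLATES, key=lambda letter: pixel_matches(glyph, TEMPLATES[letter]))


def bitmap_to_text(glyphs):
    """Convert a sequence of glyph bitmaps into a string, one letter per glyph."""
    return "".join(recognize(glyph) for glyph in glyphs)


# A "scan" of four glyphs; the second one carries a speck of noise.
scan = [
    ("###", ".#.", ".#.", ".#.", ".#."),
    ("###", "#.#", "###", "#.#", "###"),
    ("###", ".#.", ".#.", ".#.", "###"),
    ("#..", "#..", "#..", "#..", "###"),
]
recognized = bitmap_to_text(scan)
```

        Choosing the nearest template means a noisy glyph still maps to
        some letter, which is exactly the "best guess" behaviour that
        produces occasional wrong characters.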
        OCR software is readily divided into two groups:
        basic-featured and high-end. The former group includes most
        products bundled with a scanner (often called limited or
        special editions); many offer surprising
        sophistication and could suffice for the majority of
        casual users. High-end OCR programs are more
        sophisticated and better suited to dealing
        with multiple columns of text or pages that combine text,
        charts and images. If that describes your
        needs, read on. 
        Two leading contenders are Xerox's Textbridge Pro 98
        and Caere's OmniPage Pro, version 8. I tried their demo
        versions (limited to 15 days for Textbridge and 25 scans
        for OmniPage). I used each product on a variety of documents
        and was hard-pressed to choose a winner. 
        Ideally, 100 percent text recognition is desired.
        In practice, accuracy depends on many interrelated factors
        and varies from document to document: one program tended
        to edge out the other on one document but yield to it on
        the next. Some judicious
        tinkering with the brightness and contrast controls often
        helps. Regardless, both processed documents faster than I
        could type them myself. When pages are complex (text
        and images, for example), manual or automatic zoning
        is often necessary to separate the components. 
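        For readers who want to put a number on "recognition", one
        common approach is to compare the OCR output against a
        hand-typed reference and count character-level errors. The
        sketch below uses edit (Levenshtein) distance for that; it is
        an assumed method offered for illustration, not the procedure
        used for the comparisons in this article, and the function
        names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + (char_a != char_b)  # substitute, or free match
            ))
        previous = current
    return previous[-1]


def character_accuracy(ocr_text, reference):
    """Percentage of the reference text recovered by OCR, floored at zero."""
    if not reference:
        return 100.0 if not ocr_text else 0.0
    errors = edit_distance(ocr_text, reference)
    return max(0.0, 100.0 * (1 - errors / len(reference)))


# Example: the scanner misread the letter O as the digit 0.
score = character_accuracy("0CR software", "OCR software")
```

        Running the same reference page through each program, and again
        after adjusting brightness and contrast, gives a simple way to
        see which settings help.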
        Caere OmniPage Pro 
        OmniPage Pro is expensive if you must purchase the full
        version (about $600). Caere has reversed a long-standing policy
        of offering upgrades solely to users of its own products
        (including entry-level and bundled products); it now offers the
        upgrade (about $170) in local retail outlets to anyone
        with any OCR program. Textbridge is also available
        locally as a full version or as an upgrade (from any OCR
        program); it is usually less expensive (the upgrade is
        about $100). 
        Both programs offer integration with recent versions
        of WordPerfect and MS Word for Windows and output formats
        that support an extensive choice of applications. Check
        first to determine whether your particular product and
        version are supported. Only Textbridge outputs documents in
        Adobe's Acrobat (PDF) format; normal, image-only and
        image-plus-text modes are supported. Both offer support
        for TWAIN scanners; most recently manufactured scanners
        support this standard. Textbridge supports many ISIS-standard
        scanners; OmniPage Pro integrates support for
        Hewlett-Packard's AccuScan feature. 
        A chance conversation with an owner of the Caere product
        revealed one annoying feature. After purchasing the product
        you are forced to register (by toll-free phone call or on the
        company's Web site) or it will cease to function after 25
        sessions; copy protection is alive and well! Worse, this
        process must be repeated each time the program is removed and
        reinstalled. 
        Xerox application 
        Xerox also offers two document management programs
        that provide direct scanning into most office suites,
        OCR, image editing (using MGI PhotoSuite), document
        indexing and searching (including boolean, proximity and
        natural language), a colour copy function (which needs a
        colour scanner and printer), a fax facility and a forms
        fill-in module. The more robust of the two, Pagis Pro version 2,
        incorporates Textbridge Pro 98 and costs about $130. Its
        less expensive companion, Pagis ScanWorks, substitutes
        Textbridge Classic but retains the other features; it
        costs about $100. A few stores may still offer Pagis Pro
        97. For about $100 retail, it offers the full version of
        Textbridge (but Textbridge 96, not 98) and the associated
        document management features, but no image editor. 
        Pagis Pro/ScanWorks documents are stored in a
        proprietary format, XIF, that supports selective
        storage methods (for example, different resolutions and
        bit depths for text and graphics) within the file
        to optimize file size, resolution and colour
        fidelity. To implement this scheme, your TWAIN driver is
        overlaid with a proprietary interface. Some TWAIN drivers
        will not cooperate and generate error messages. In these
        stubborn cases, an option to install the scanner's
        proprietary interface is provided. (Be careful: this
        feature may not be available in the earlier Pagis Pro 97.)
        Unfortunately, incompatible scanners may lose some XIF-based
        features. 
        Pagis Pro is an intriguing product; it sells for little
        more (about $130) than the basic Textbridge Pro and it
        has received very good reviews. Check the supported
        scanner list at www.pagis.com
        before purchasing. 
        Adobe Capture 
        Finally, any look at advanced OCR software is
        incomplete without mentioning Adobe Capture, a module
        within Adobe's Acrobat suite. This program offers a
        unique approach to OCR. The previous programs translate
        the bitmap graphics to text, falling back on a "best guess"
        when unsure; that guess might include wrong characters or
        simply gibberish. If Acrobat Capture cannot reliably translate
        the information, it simply places the original bitmap
        in the document, with no guessing. "Average" users
        are served by the integrated module, but high-volume (presumably
        corporate) users can upgrade; the trade-off is more
        features for additional licensing fees. Other Adobe
        Acrobat modules provide additional document management
        features that not only maintain the original format of
        complex documents but also permit extensive reformatting,
        indexing and even multimedia additions. While Pagis Pro 2
        exhibits some of these features, the Adobe Acrobat suite
        remains unique in others; however, the latter is more
        expensive (about $300). 
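        The "no guessing" behaviour described above can be pictured
        as a confidence threshold: text the engine is sure of is kept
        as text, and anything else is left as its original image. The
        sketch below is only an illustration of that idea. Adobe's
        actual implementation is not described here, so the Word
        record, the confidence scores and the 0.9 threshold are all
        assumptions.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str          # the engine's best guess for this word
    confidence: float  # 0.0 to 1.0, from a hypothetical recognition engine
    image_id: str      # handle to the cropped bitmap of this word


def compose(words, threshold=0.9):
    """Keep confident guesses as text; fall back to the original bitmap otherwise."""
    output = []
    for word in words:
        if word.confidence >= threshold:
            output.append(("text", word.text))
        else:
            output.append(("image", word.image_id))
    return output


# Hypothetical page fragment: the second word was smudged on the original.
page = [
    Word("Total", 0.98, "img-1"),
    Word("$1,2O0", 0.41, "img-2"),
]
result = compose(page)
```

        Lowering the threshold yields more searchable text at the cost
        of more misread characters; raising it keeps the page faithful
        but leaves more of it as pictures.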
        From my limited perspective, Adobe Acrobat documents
        appear pervasive on the Web. Another benchmark: today's
        CD-ROM based software frequently includes the user help
        manuals as Acrobat files. However, these personal
        observations do not constitute a formal survey. If price
        is the principal consideration then Pagis Pro is the
        least expensive choice. However, a need for cross-platform
        compatibility, documents for publication on the Web,
        large-enterprise publishing or extensive multimedia
        integration would likely drive the decision into Adobe's
        camp. Both companies are powerhouses in document
        management; I expect that each product has a devoted
        following.  
Bottom Line:
OmniPage Pro (Proprietary, $600)
Caere Corporation
Pagis Pro Version 2 (Proprietary, $130)
Pagis ScanWorks (Proprietary, $100)
Xerox Corporation
http://www.pagis.com
Adobe Capture Module in Adobe Acrobat (Proprietary, $300)
Adobe Systems Incorporated
Originally published: November, 1998