
Updated OCR AppleScripts for Acrobat Pro X, PDFpen

On pages 73–79 of Take Control of Your Paperless Office, I describe a way of automating OCR with Acrobat Pro, PDFpen/PDFpenPro, or Readiris Pro using a combination of AppleScripts that I supply and Mac OS X’s Folder Actions feature. (This process isn’t needed at all when your scanner’s software can perform OCR automatically, or can direct incoming scans to another application that does.)

The scripts I originally provided work fine with Acrobat Standard 7 and Acrobat Pro versions 7, 8, and 9. But the recently released Acrobat Pro X is quite different, so I had to revise the Acrobat scripts significantly. While I was at it, I also updated the script for PDFpen and PDFpenPro to better detect and use whichever version of the software you have installed.

If you’d like the details of why Acrobat Pro X required changes and how I handled them, skip ahead to “Acrobat Pro X and Scriptability.” Or, if you just want to get down to business, read on.

To begin with, download the new scripts and follow the steps on pages 75–76 to install and configure them.

Next, note that Step 2 at the bottom of p. 76 under “Acrobat Scripts” needs a small revision to work with Acrobat Pro X. It should now read:

2. Choose the Recognize Text command appropriate to the version of Acrobat you’re using:
  • Acrobat Pro X: Click Tools, then Recognize Text, then In This File.
  • Acrobat Pro 8 or 9: Choose Document > OCR Text Recognition > Recognize Text Using OCR.
  • Acrobat Pro or Standard 7: Choose Document > Recognize Text Using OCR > Start.

In addition, the description of the scripts on p. 77, while still accurate for Acrobat versions 7–9, should be modified as follows to cover X too:

Now you’re ready to try out your script—either by scanning a document or by dragging an existing scanned image into the folder to which the AppleScript is attached. I provide two different Acrobat scripts. Both work with Acrobat Standard version 7 and Acrobat Pro versions 7, 8, 9, and X, but they have slightly different behaviors:
  • OCR This (Acrobat): This script causes versions of Acrobat prior to X to recognize the text in PDF documents and then save the file (with the existing name, in the existing location) and close it with no need for interaction at all. For Acrobat Pro X, it runs the “OCR This” Action (see below), after first adding the Action if necessary.
  • OCR This (Acrobat) with Save As: This script causes versions of Acrobat prior to X to recognize the text and then prompt you to enter a name and select a location; after saving the file, the script then instructs Acrobat to close the window. (There may be a very brief delay before the window closes.) For Acrobat Pro X, it runs the “OCR This” Action (see below), after first adding the Action if necessary.

    With this script, the original file remains in the folder to which you’ve attached the folder action; you can later delete it manually if you wish.
WARNING: Do not save the file in the folder to which the OCR This folder action is attached! If you do, the script will run again on the newly saved file.

The “OCR This” Action in Acrobat Pro X

The design of Acrobat Pro X prevents AppleScript from running the OCR command directly. However, Acrobat Pro X does support a type of built-in automation called Actions. So, for Acrobat Pro X, my OCR This scripts work around Acrobat’s limitations by creating (a basic version of) an Action that performs OCR, and then running that Action. As created by either of the OCR This (Acrobat) scripts, the “OCR This” Action prompts you to choose a file name and location for saving the searchable file. For complicated reasons, I was unable to reliably automate creation of an Action that saves a file in place, although you can easily modify the Action to do so yourself with a few clicks. To change the way the Action behaves, follow these steps:
  1. In Acrobat Pro X, choose File > Action Wizard > Edit Actions.
  2. Select OCR This and click Edit.
  3. To change OCR settings, click the Options button next to “Recognize Text (using OCR)”; this gives you the same settings as described earlier (see Step 3 under “Acrobat Scripts,” p. 77).
  4. To make the Action save the file in place (rather than prompting you with a Save As dialog), choose The Same Folder Selected at Start from the Save To pop-up menu. Alternatively, to specify another folder, choose A Folder on My Computer, navigate to a folder, and click Choose.

Acrobat Pro X and Scriptability

I had hoped that I could simply modify my earlier “OCR This (Acrobat)” scripts to work the same way with Acrobat Pro X as they did with versions 7, 8, and 9. However, Adobe made such dramatic (and, in my opinion, unfortunate) changes in the design of Acrobat Pro X that I was forced to rethink my approach entirely.

Incredibly, Acrobat Pro X takes a massive step backward in scriptability—and that’s saying something, because earlier versions were barely scriptable at all. Adobe has implemented an entirely new user interface in Acrobat Pro X that is largely invisible to Apple’s Accessibility API. This means all the tricks I used previously to work around Acrobat’s lack of scriptability no longer function. I not only can’t tell Acrobat to run OCR directly, I also can’t tell it to choose a menu command to run OCR (since there is no longer any such command), and I can’t even use UI scripting to tell AppleScript to pretend to be a mouse clicking controls on the screen, because the controls aren’t standard UI elements.

After several hours of hair-pulling, I finally came up with a hack that should, for most users anyway, get fairly close to the results I had earlier. I’ll be the first to admit that this hack is ugly and fragile, and I can’t guarantee that it will work for you, but it did work for me on two Macs (both running Snow Leopard), and I offer it here entirely AS IS.

Here’s how it works.

One of the new features in Acrobat Pro X is Actions—configurable processes that perform repetitive behaviors, vaguely like Automator actions (but not nearly as good). It turns out that you can create an Action that performs OCR, and then you can use UI scripting to run that Action. What’s more, I was able to automate the creation of (a basic version of) a suitable Action using UI scripting. So, although it happens in a roundabout way, I can approximate the end result of getting an AppleScript to perform OCR in Acrobat Pro X.

As created by either of my scripts, the “OCR This” Action prompts you to choose a file name and location for saving the searchable file. I would have liked, when creating the Action, to specify further options, such as whether to save the file in place or prompt for a new name and location. And in theory I could have used keystrokes to “walk” down pop-up menus to do just that. Alas, those menus get keyboard focus only when the pointer is over the Create New Action window, and that’s something I can’t guarantee with plain-vanilla AppleScript (moving the pointer requires a third-party AppleScript addition, and odds are the average user won’t have that installed).

All that said, I think these scripts do a reasonably good job of solving the problem, and you can of course customize the Action to your liking if need be.

—Posted by Joe Kissell on February 27, 2011
