Quote:
Originally Posted by Jim Thompson
I talked to ABBYY technical support this morning. The guy put me on hold and asked his colleagues. Then he said that I would need to touch each page to achieve the kind of zoning that I want: to automatically cut off the header/page number of every page in a novel, so that a series of words that began on page 4 and ended on page 5 could be searched as text and found as words in a single sentence. Do you think my misunderstanding was with him or with you? It's very important to me that I transform thousands of pages into searchable text without having to look at each page.
My guess: he's wrong in practice. He and his colleagues have very likely never zoned thousands of pages or looked for shortcuts to make the process faster. Technically, he's correct: there is no automatic cutoff of headers/footers. However, there are ways to keep them out of your zones without zoning every page by hand.
*IF* the pages are the same size & shape, you can open one of them, zone it, save the zoned blocks, and apply those to all pages. I do this often. And then I scroll through the thumbnail view of the pages to see if there are any specific pages needing different zoning--first pages of chapters, or double-column index, or table of contents.
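To make the logic of that shortcut concrete, here is a minimal sketch in Python. It is not FineReader's API or file format; the `Zone` class, the `apply_template` function, and the page sizes are all hypothetical and only illustrate the idea: reuse one saved body zone on every page that matches the template's size, and set the rest aside for a manual look.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A rectangular text zone in pixels (hypothetical, not a FineReader type)."""
    left: int
    top: int
    right: int
    bottom: int


def apply_template(page_sizes, template_size, template_zone):
    """Assign the saved zone to every page whose size matches the template.

    page_sizes maps page number -> (width, height) in pixels.
    Returns (zones, needs_review): zones maps page number -> Zone, and
    needs_review lists the pages that did not match and need manual zoning.
    """
    zones = {}
    needs_review = []
    for page_no, size in page_sizes.items():
        if size == template_size:
            zones[page_no] = template_zone
        else:
            needs_review.append(page_no)
    return zones, needs_review


# Made-up scan: pages 1 and 2 share a size, page 3 was scanned differently.
page_sizes = {1: (2480, 3508), 2: (2480, 3508), 3: (2400, 3300)}
body = Zone(left=200, top=400, right=2280, bottom=3300)  # excludes the header
zones, needs_review = apply_template(page_sizes, (2480, 3508), body)
```

Here `needs_review` plays the role of the thumbnail pass: it only catches size mismatches, so chapter openers, indexes, and tables of contents still need an eyeball check.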
If the pages aren't the same size & shape because the scanning wasn't done with that in mind, I will auto-zone all the pages, flip through them one at a time, and manually adjust the zones, either deleting the tiny box around the header, or dragging it down to just the main body content. This takes longer, but still not as long as manually zoning each page.
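The manual adjustment described above boils down to two rules: delete a zone that sits entirely in the header band, and pull down the top edge of a zone that strays into it. A rough Python sketch of those rules follows. The `trim_header` function, the tuple layout, and the 8% header band are assumptions made for illustration, not anything FineReader exposes.

```python
def trim_header(zones, page_height, header_fraction=0.08):
    """Drop or shrink zones that overlap the header band at the top of a page.

    zones is a list of (left, top, right, bottom) tuples in pixels.
    header_fraction is an assumed share of the page height taken by the header.
    """
    cutoff = int(page_height * header_fraction)
    trimmed = []
    for left, top, right, bottom in zones:
        if bottom <= cutoff:
            # The "tiny box around the header": delete it entirely.
            continue
        # Zone reaches into the header band: drag its top edge down.
        trimmed.append((left, max(top, cutoff), right, bottom))
    return trimmed


# Made-up auto-zoning result: one header box and one body box that starts too high.
auto_zones = [(100, 50, 2000, 120), (100, 200, 2000, 3200)]
body_zones = trim_header(auto_zones, page_height=3508)
```

A fixed fraction is a crude stand-in for human judgment, so pages with unusual layouts would still need a look.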
The zoning, however it's done, is much quicker and less frustrating than the OCR correction. FineReader's OCR is very good--but checking it requires stopping at each *suspect*, most of which were recognized correctly & just need to be confirmed.