03-24-2009, 04:50 PM | #1 |
Groupie
Posts: 159
Karma: 170
Join Date: Feb 2009
Device: PRS-505
|
ABBYY FineReader and text formatting
Hi,
I always seem to turn to you guys for the best info on editing, conversions, and books, and I have another one for you if you have some experience with FineReader 9 Pro. Each time I OCR a PDF and use the formatted-text option, a lot of the paragraphs get mixed up. So my next option is to use exact copy. That's great, but if I then save to HTML, it gives horrible results when converting to LRF or EPUB. So next I tried converting to DOC. In the DOC I get those horrible text boxes. I tried Ctrl-A to select all, but nothing happens, as all the text is in the boxes. Saving to TXT yields no results. So does anyone have any idea how to either get formatted text from FineReader without losing paragraph order, or remove all the boxes in Word so I can then copy to HTML and later convert to EPUB? Any ideas are welcome! Thanks! |
03-24-2009, 05:16 PM | #2 |
Retired & reading more!
Posts: 2,764
Karma: 1884247
Join Date: Sep 2006
Location: North Alabama, USA
Device: Kindle 1, iPad Air 2, iPhone 6S+, Kobo Aura One
|
I use FineReader 9 Pro and have seen the text boxes you refer to, but only with PDFs that I got from somewhere else. Most of what I do is with PDFs from my ScanSnap scanner, and they work fine. One possible suggestion: you might try converting the PDF pages to an image format (e.g. JPEG) and feeding the images to FineReader. I have a program called "PDF to Image Converter" that I've used. You can get more info about it here.
I don't know if that will help. Where I've had the most problems with the text boxes is with brochures I've downloaded & OCRed. They are very frustrating. Good luck. |
03-24-2009, 06:31 PM | #3 |
Grand Sorcerer
Posts: 5,185
Karma: 25133758
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié)
|
It helps to manually zone the Finereader batch; if it's zoning the pages with text boxes, they'll be text boxes in a Word doc, but if it's all zoned as one big block, that should be standard text on the page.
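If the boxes have already ended up in the Word file, one possible workaround is to pull the text out of them with a script instead of by hand. The following is only a rough sketch, assuming the file was saved as .docx (not the older binary .doc) and that the boxes are stored as `w:txbxContent` elements inside `word/document.xml`. The function names are made up for illustration. The text comes out in the order the boxes appear in the file, which may not match the reading order on the page.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by WordprocessingML (.docx) documents.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
MC = "http://schemas.openxmlformats.org/markup-compatibility/2006"


def paragraph_text(p):
    """Join the text of every w:t run inside one w:p element."""
    return "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))


def textbox_paragraphs(document_xml):
    """Return the non-empty paragraphs found inside text boxes, in file order.

    Hypothetical helper: takes the contents of word/document.xml
    (str or bytes) and returns a list of strings.
    """
    root = ET.fromstring(document_xml)
    found = []

    def walk(node, in_box):
        for child in node:
            if child.tag == f"{{{MC}}}Fallback":
                # Newer files store each box twice (modern + legacy copy);
                # skip the legacy copy so the text is not duplicated.
                continue
            if child.tag == f"{{{W}}}txbxContent":
                walk(child, True)
            elif in_box and child.tag == f"{{{W}}}p":
                text = paragraph_text(child)
                if text.strip():
                    found.append(text)
            else:
                walk(child, in_box)

    walk(root, False)
    return found


def docx_textbox_paragraphs(path):
    """Open a .docx (a zip archive) and extract its text-box paragraphs."""
    with zipfile.ZipFile(path) as z:
        return textbox_paragraphs(z.read("word/document.xml"))
```

Something like `"\n\n".join(docx_textbox_paragraphs("book.docx"))` (the file name is just an example) would then give plain text that can be pasted into an HTML editor, though the paragraph order still needs checking by eye.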
|
03-24-2009, 10:12 PM | #4 | |
Groupie
Posts: 159
Karma: 170
Join Date: Feb 2009
Device: PRS-505
|
Quote:
|
|
03-24-2009, 10:13 PM | #5 |
Groupie
Posts: 159
Karma: 170
Join Date: Feb 2009
Device: PRS-505
|
That might be an idea. I'll check the settings; zoning everything in one block would work!
|
03-24-2009, 10:20 PM | #6 |
Groupie
Posts: 159
Karma: 170
Join Date: Feb 2009
Device: PRS-505
|
I can't seem to find any way to read the page as one block. Analyse always selects the read regions itself. And formatted text does a horrible job, mixing up all the paragraphs... I don't understand it; it shouldn't be too hard to see that what it scans first doesn't come after what it scans second... weird bug!
|
12-15-2011, 06:37 PM | #7 | |
Enthusiast
Posts: 27
Karma: 10
Join Date: Jul 2011
Device: Kindle Paperwhite
|
Quote:
What you do is select the "Formatted Text" layout in the top bar. I have ABBYY FineReader 11 Pro, so I'm not sure if your version has it. But version 11 is simply amazing. I selected that, and it got rid of all the stupid boxes and put the text in a nice flat format. It also lets you convert the file straight to EPUB, and it even has an option to send to Kindle (through email)!! I got a low-quality image PDF about PHP and SQL programming converted with VERY few errors, which were easy to correct! Last edited by linnx88; 12-15-2011 at 08:03 PM. |
|
|