12-01-2016, 09:48 AM | #1 |
eBook FANatic
Posts: 18,301
Karma: 16071131
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
|
ABBYY Fine Reader
Either I have a complete misunderstanding, or the ABBYY people don't understand and answer my queries with references to their manual, which is useless as ......
I wish to control the "mode" (e.g. greyscale or RGB) and the "format" (e.g. JPG or PNG) of illustrations. I find references to these in the "Tools" menu, but no place to select values. When I'm through editing in Fine Reader, I save the book as HTML. If anyone can help, he or she will receive my blessings to the seventh generation, and I'm sure HarryT will ensure this gets to the proper forum. |
12-10-2016, 12:18 AM | #2 | |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
The only way I know of to specify the output image type is to right-click the thumbnail of the page on the left side and press "Save Selected Pages as Images". You can then select whatever image type you want from the dropdown. The only catch is that this exports the entire page as an image, so you will have to do your image manipulation in an outside program. Last edited by Tex2002ans; 12-10-2016 at 12:23 AM. |
|
12-10-2016, 04:05 AM | #3 |
eBook FANatic
Posts: 18,301
Karma: 16071131
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
|
Thank you, sir.
You describe what I have also found. I was hoping for more control. |
12-29-2018, 07:05 AM | #4 |
Connoisseur
Posts: 77
Karma: 2178856
Join Date: Oct 2013
Device: Kobo Clara HD
|
Problem with letter-spaced type (Sperrschrift)
In older books set in letter-spaced type (Sperrschrift), FineReader's OCR often turns the spaced words into words with spaces between the letters. When you make an epub from the result, the affected words can no longer be found by searching. How can the words with spaces be converted automatically into words without spaces? |
12-29-2018, 08:25 AM | #5 |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
I'm afraid that any OCR process is going to involve manual editing afterwards to get a usable file. OCR is pretty good, but it's far from perfect.
|
12-29-2018, 11:54 AM | #6 |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
The wiki articles OCR and OCR villains list some things to watch for. A spell checker is often a good way to find the initial problems in OCR output. As HarryT said, you will need to proofread and manually fix errors.
Dale |
12-31-2018, 05:33 AM | #7 | |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
Quote:
|
|
01-05-2019, 04:46 AM | #8 |
Connoisseur
Posts: 77
Karma: 2178856
Join Date: Oct 2013
Device: Kobo Clara HD
|
@HarryT
@DaleDe What I meant is this: after saving the book with FineReader as an epub, words that were printed in letter-spaced type (Sperrschrift) in the original are represented as words with spaces between the letters (for example: W o r t or w o r d). In Sigil, I would like to use regex to find every W o r t or w o r d that is shown with spaces, and then replace the words found with the same words, but without spaces. Maybe someone has an idea how to simplify this with regex? So the topic concerns FineReader as the problem (the cause of the error), but also Sigil (or regex) as the solution (the correction of the error). Strictly speaking, the thread belongs not only to FineReader, but also to Sigil. |
01-05-2019, 06:37 AM | #9 | |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
Search: \b(\w) (\w) (\w) (\w)\b
Replace: \1\2\3\4
That should point you towards all of these spaced out words, so:
Find: a b c d
Replace: abcd
Or maybe you can start out with more \w... like 7 or 8 of them, then work your way down. |
|
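For anyone who would rather script this than step through Sigil's Find & Replace, here is a minimal Python sketch of the approach described above: build the same fixed-length pattern for n letters and work down from 8 to 4. The function names and the default lengths are assumptions made for illustration, not anything from Sigil or FineReader.

```python
import re

def fixed_length_pattern(n):
    # n capture groups of one word character each, separated by single spaces.
    # For n = 4 this is exactly \b(\w) (\w) (\w) (\w)\b from the post above.
    return re.compile(r"\b" + " ".join([r"(\w)"] * n) + r"\b")

def join_spaced_words(text, longest=8, shortest=4):
    # Start with the longest pattern and work down, as suggested above.
    # (hypothetical helper; the length limits are illustrative defaults)
    for n in range(longest, shortest - 1, -1):
        text = fixed_length_pattern(n).sub(lambda m: "".join(m.groups()), text)
    return text

print(join_spaced_words("ein W o r t hier"))
```

Two limitations to keep in mind: a spaced word with more letters than `longest` will be joined in pieces, so raise that value for long words, and two spaced words standing next to each other can be merged into one. Run it on a backup copy and proofread the result.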
01-05-2019, 12:06 PM | #10 | |
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
@famfam I found a suitable regex in a German forum and used it to create a simple throwaway plugin that should automatically remove all unwanted spaces. Please make a backup copy before running this plugin! Note that if you uncomment the following line in plugin.py by removing the # sign, the plugin will wrap all replaced words in <span> tags:
Code:
#unspaced_word = '<span class="italics">{}</span>'.format(unspaced_word)
And for completeness' sake here are instructions for Calibre Editor:
BTW, you also might want to post your question in the German MR subforum. Last edited by Doitsu; 01-06-2019 at 06:42 AM. |
|
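The plugin's source is not reproduced in this thread, so the following is only a rough Python sketch of what such a replacement step might look like, including the optional <span> wrapping from the commented-out line above. The regex here is an assumption (it is not the one found in the German forum), and `unspace` and `wrap_in_span` are hypothetical names.

```python
import re

# Assumed pattern: four or more single word characters separated by
# single spaces, e.g. "W o r t". Not the regex used by the actual plugin.
SPACED_WORD = re.compile(r"\b\w(?: \w){3,}\b")

def unspace(text, wrap_in_span=False):
    def fix(match):
        unspaced_word = match.group(0).replace(" ", "")
        if wrap_in_span:
            # Same formatting expression as the line quoted from plugin.py
            unspaced_word = '<span class="italics">{}</span>'.format(unspaced_word)
        return unspaced_word
    return SPACED_WORD.sub(fix, text)

print(unspace("<p>ein W o r t</p>", wrap_in_span=True))
```

This sketch works on the raw text and does not distinguish element content from attribute values, so a quoted attribute containing single letters separated by spaces would be changed as well. Treat it as a starting point and keep a backup.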
01-07-2019, 09:47 AM | #11 |
Connoisseur
Posts: 77
Karma: 2178856
Join Date: Oct 2013
Device: Kobo Clara HD
|
@Doitsu
Bingo. I tried the plugin 'spacer'. I have tested the plugin and find the result encouraging, but I will keep testing. FineReader simply makes so many unpredictable OCR errors that even the best plugin cannot fix everything. "the plugin will wrap all replaced words in <span> tags. ..." I have not yet tested this and the following tip. I'll get back to you as soon as I have understood or misunderstood it. Last edited by famfam; 01-07-2019 at 09:56 AM. |
05-25-2019, 04:38 AM | #12 |
Connoisseur
Posts: 77
Karma: 2178856
Join Date: Oct 2013
Device: Kobo Clara HD
|
Area type image or Background picture in Finereader
When should I choose which area type in FineReader: Image or Background Picture? What is the Background Picture area type intended for? The images will be used in an epub (Kindle Paperwhite and Kobo Clara). |
05-25-2019, 05:32 AM | #13 |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
You only have to change the area type if Finereader chose it wrong automatically. These are pretty much the only 3 you're ever going to see:
If Finereader misses marking a box around some text, like a tricky header/footer or a caption, you can click on the Draw Recognition Area box and manually create your own:
Once you run OCR on that page, it'll automatically resize the box and convert it into one of the above 3 types.
As for the Background Picture type: probably when you have a transparent image behind the text or some sort of watermark. I've never seen this in the wild, or even seen it work correctly, but I almost exclusively work on books + B&W. Maybe it's more prevalent in business documents and color. |
06-06-2019, 12:44 PM | #14 |
mostly an observer
Posts: 1,515
Karma: 987654
Join Date: Dec 2012
Device: Kindle
|
My third novel was about ski-bums, and when I scanned it through Finereader to a Word doc, the "m" was every time rendered as "rn", so that it became a book about ski-burns.
|
06-06-2019, 01:10 PM | #15 | |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
Quote:
Dale Last edited by DaleDe; 06-07-2019 at 12:13 PM. |
|