Old 12-01-2016, 09:48 AM   #1
crutledge
eBook FANatic
Posts: 18,301
Karma: 16071131
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
ABBYY Fine Reader

Either I have a complete misunderstanding, or the ABBYY people don't understand and answer my queries with references to their manual, which is useless as ......

I wish to control the "mode" (e.g. greyscale or RGB) and "format" (e.g. JPG or PNG) of illustrations.

I find references to these in the "Tools" menu, but no place to select values.

When I'm through editing in Fine Reader, I save the book as HTML.

If anyone can help, he or she will receive my blessings to the seventh generation, and I'm sure HarryT will ensure this gets to the proper forum.
Old 12-10-2016, 12:18 AM   #2
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by crutledge View Post
I wish to control the "mode" (e.g. greyscale or RGB) and "format" (e.g. JPG or PNG) of illustrations.
I don't believe you can have Finereader automatically export the images to a specific image format. (If I recall correctly, it exports Color as JPG and B&W as PNG).

The only way I know of to specify the output image is to Right Click the thumbnail of the page on the left side and press "Save Selected Pages as Images":

[Screenshot: FinereaderImages1.png]

then you can select whatever image type you want from the dropdown:

[Screenshot: FinereaderImages2.png]

The only thing is this will export the entire page as an image... So you will have to do your image manipulation in an outside program.

Last edited by Tex2002ans; 12-10-2016 at 12:23 AM.
Old 12-10-2016, 04:05 AM   #3
crutledge
eBook FANatic
Posts: 18,301
Karma: 16071131
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
Thank you, sir.

You describe what I have also found. I was hoping for more control.
Old 12-29-2018, 07:05 AM   #4
famfam
Connoisseur
Posts: 77
Karma: 2178856
Join Date: Oct 2013
Device: Kobo Clara HD
Problem with spaced type (Sperrschrift)

In older books set in letter-spaced type (German: Sperrschrift), Finereader's OCR often turns the spaced words into single letters separated by spaces. When you make an EPUB from the result, those words can no longer be found by searching. How can the words with spaces be converted automatically into words without spaces?
Old 12-29-2018, 08:25 AM   #5
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
I'm afraid that any OCR process is going to involve manual editing afterwards to get a usable file. OCR is pretty good, but it's far from perfect.
Old 12-29-2018, 11:54 AM   #6
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
The wiki articles OCR and OCR villains can provide some things to watch for. A spell checker is often a good way to find initial problems in the output of OCR documents. As HarryT said, you will need to proofread and manually fix errors.

Dale
Old 12-31-2018, 05:33 AM   #7
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Quote:
Originally Posted by DaleDe View Post
The wiki articles OCR and OCR villains can provide some things to watch for. A spell checker is often a good way to find initial problems in the output of OCR documents. As HarryT said, you will need to proofread and manually fix errors.

Dale
I've just added another entry to the "OCR villains" page which wasn't there, and that's the misinterpretation of the letter pair "cl" as "d", so you end up with "clock" as "dock", "close" as "dose", etc. That's one I've come across a lot.
Old 01-05-2019, 04:46 AM   #8
famfam
Connoisseur
Posts: 77
Karma: 2178856
Join Date: Oct 2013
Device: Kobo Clara HD
@HarryT
@DaleDe

What I meant is this:
After saving the book from Finereader as EPUB, words that were printed in letter-spaced type (Sperrschrift) in the original are represented as words with spaces between the letters (for example: W o r t or w o r d). In Sigil, I would like to use a regex to find every W o r t or w o r d that is shown with spaces, and then replace each match with the same word without the spaces.
Maybe someone has an idea how to simplify this with regex?
So the topic concerns Finereader as the problem (the cause of the error), but also Sigil (or regex) as the solution (the correction of the error). Really, the thread belongs not only under Finereader, but also under Sigil.
Old 01-05-2019, 06:37 AM   #9
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by famfam View Post
After saving the book from Finereader as EPUB, words that were printed in letter-spaced type (Sperrschrift) in the original are represented as words with spaces between the letters (for example: W o r t or w o r d). In Sigil, I would like to use a regex to find every W o r t or w o r d that is shown with spaces, and then replace each match with the same word without the spaces.
Maybe someone has an idea how to simplify this with regex?
I doubt there are many legitimate 4+ single characters by themselves:

Search: \b(\w) (\w) (\w) (\w)\b
Replace: \1\2\3\4

That should point you towards all of these spaced out words, so:

Find: a b c d
Replace: abcd

Or maybe you can start out with more \w... like 7 or 8 of them, then work your way down.
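
The "start long, then work your way down" idea above can also be folded into one pattern with a minimum length. The following is only a sketch in plain Python, not something Sigil runs as-is; the function name `join_spaced` and the `min_letters` parameter are made up for illustration.

```python
import re

def join_spaced(text, min_letters=4):
    """Join runs of single letters separated by single spaces.

    min_letters is a hypothetical knob: the shortest run treated as a
    letter-spaced word, mirroring the four (\\w) groups in the search above.
    """
    # One letter, then at least (min_letters - 1) further " letter" pairs.
    pattern = re.compile(r'\b\w(?: \w){%d,}\b' % (min_letters - 1))
    return pattern.sub(lambda m: m.group(0).replace(' ', ''), text)

print(join_spaced("ein W o r t hier"))
print(join_spaced("S p e r r s c h r i f t ist alt"))
```

Because the run length is open-ended, a long spaced word is joined in one pass instead of being cut after the fourth letter. Two spaced words that follow each other with only a single space between them would be merged into one, so the result still needs proofreading.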
Old 01-05-2019, 12:06 PM   #10
Doitsu
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by Tex2002ans View Post
I doubt there are many legitimate 4+ single characters by themselves
Since blackletter fonts have no italics, German printers had to use increased letter spacing for emphasis, i.e., there might be even longer words.

@famfam I found a suitable regex in a German forum and used it to create a simple throwaway plugin that should automatically remove all unwanted spaces. Please make a backup copy before running this plugin!

Note that if you uncomment the following line in plugin.py by removing the # sign:

Code:
#unspaced_word = '<span class="italics">{}</span>'.format(unspaced_word)
the plugin will wrap all replaced words in <span> tags.

And for completeness' sake, here are instructions for the Calibre Editor:
  • Paste (?<=[ ])([\w][ ]+){1,}[\w](?=[ .,:;!?]) in the Find box.
  • Select Regex-function from the Mode drop-down box.
  • Click Create/Edit, paste the following code in the Code box:

    Code:
    def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
        spaced_word = match.group(0)
        unspaced_word = spaced_word.replace(' ', '')
        #unspaced_word = '<span class="italics">{}</span>'.format(unspaced_word)
        return unspaced_word
  • Enter a function name, e.g. Spacer, and click OK.
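
To preview what the expression from the steps above does before running it on a book, it can be tried in plain Python. This is only a sketch: the wrapper name `unspace` and its `wrap` flag are made up here, and the `wrap` flag stands in for uncommenting the `<span>` line.

```python
import re

# Same expression as in the Find box above.
SPACED = re.compile(r'(?<=[ ])([\w][ ]+){1,}[\w](?=[ .,:;!?])')

def unspace(text, wrap=False):
    """Remove the spaces inside each match; optionally wrap it in a span."""
    def _replace(match):
        unspaced_word = match.group(0).replace(' ', '')
        if wrap:
            # Equivalent to uncommenting the <span> line in the function.
            unspaced_word = '<span class="italics">{}</span>'.format(unspaced_word)
        return unspaced_word
    return SPACED.sub(_replace, text)

print(unspace("Das ist ein W o r t."))
print(unspace("ein W o r t hier", wrap=True))
print(unspace("W o r t ist"))
```

The last call shows a limit of the lookbehind: a spaced word that is not preceded by a space (at the very start of the text, or directly after a tag) is only partly joined, so such cases still have to be fixed by hand.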

BTW, you also might want to post your question in the German MR subforum.
Attached Files
File Type: zip Spacer_v0.0.1.zip (1.2 KB, 234 views)

Last edited by Doitsu; 01-06-2019 at 06:42 AM.
Old 01-07-2019, 09:47 AM   #11
famfam
Connoisseur
Posts: 77
Karma: 2178856
Join Date: Oct 2013
Device: Kobo Clara HD
@Doitsu

Bingo. I tried the plugin 'Spacer'.

I have tested the plugin and find the result encouraging. I will keep testing, though. Finereader simply makes so many unpredictable OCR errors that even the best plugin cannot fix everything.

"the plugin will wrap all replaced words in <span> tags. ..."

I have not yet tested this or the tip that follows. I'll get back to you as soon as I have understood or misunderstood it.

Last edited by famfam; 01-07-2019 at 09:56 AM.
Old 05-25-2019, 04:38 AM   #12
famfam
Connoisseur
Posts: 77
Karma: 2178856
Join Date: Oct 2013
Device: Kobo Clara HD
Area type image or Background picture in Finereader

When to choose which area type in finereader?

For what purpose is the area type Background picture intended?

The images will be used in epub (Kindle Paperwhite and Kobo Clara).
Old 05-25-2019, 05:32 AM   #13
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by famfam View Post
When to choose which area type in finereader?
You only have to change the area type if Finereader chose it wrong automatically. These are pretty much the only 3 you're ever going to see:
  • Text (Green)
  • Table (Blue)
  • Picture (Red)

If Finereader misses marking a box around some text, like a tricky header/footer or a caption, you can click on the Draw Recognition Area box and manually create your own:
  • Recognition Area (Gray)

Once you run OCR on that page, it'll automatically resize the box and convert into one of the above 3 types.

Quote:
Originally Posted by famfam View Post
For what purpose is the area type Background picture intended?
Probably when you have a transparent image behind the text or some sort of watermark.

I've never seen this in the wild or even work correctly, but I almost exclusively work on books + B&W. Maybe it's more prevalent in business documents and color.
Old 06-06-2019, 12:44 PM   #14
Notjohn
mostly an observer
Posts: 1,515
Karma: 987654
Join Date: Dec 2012
Device: Kindle
Quote:
Originally Posted by HarryT View Post
I've just added another entry to the "OCR villains" page which wasn't there, and that's the misinterpretation of the letter pair "cl" as "d", so you end up with "clock" as "dock", "close" as "dose", etc. That's one I've come across a lot.
My third novel was about ski-bums, and when I scanned it through Finereader to a Word doc, the "m" was every time rendered as "rn", so that it became a book about ski-burns.
Old 06-06-2019, 01:10 PM   #15
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Originally Posted by Notjohn View Post
My third novel was about ski-bums, and when I scanned it through Finereader to a Word doc, the "m" was every time rendered as "rn", so that it became a book about ski-burns.
Check the OCR villains in our wiki. Both rn and cl were there and several more.

Dale
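
A spell checker will not catch "dock" or "burns", because both are real words. As a rough illustration only, the two confusions mentioned in this thread can be used to flag words for manual review. The table `CONFUSIONS`, the function `suspects`, and the tiny word set in the example are all hypothetical; a real run would need a proper word list.

```python
import re

# OCR output fragment -> what the printed page may really have said.
# Only the two pairs mentioned in this thread; the table is illustrative.
CONFUSIONS = {"d": "cl", "rn": "m"}

def suspects(text, known_words):
    """Return (ocr_word, possible_original) pairs worth checking by eye."""
    found = []
    for word in re.findall(r"[A-Za-z]+", text):
        lower = word.lower()
        for seen, intended in CONFUSIONS.items():
            start = lower.find(seen)
            while start != -1:
                # Swap one occurrence and see whether a known word results.
                candidate = lower[:start] + intended + lower[start + len(seen):]
                if candidate in known_words:
                    found.append((word, candidate))
                start = lower.find(seen, start + 1)
    return found

words = {"clock", "close", "bums"}  # stand-in for a real dictionary
print(suspects("The dock struck ten; ski-burns everywhere", words))
```

This only suggests candidates; deciding whether "dock" really should be "clock" still takes a human reading the sentence.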

Last edited by DaleDe; 06-07-2019 at 12:13 PM.