Old 02-26-2014, 06:47 PM   #1
u238110
Connoisseur
u238110 began at the beginning.
 
Posts: 91
Karma: 10
Join Date: Feb 2014
Location: Long Island, NY
Device: Aura, N514KUBKKEP, 4.7.10.413
ABBYY FineReader doesn't seem to retain page layout

The screenshot speaks for itself. Also, is there anything else a beginner should know about ABBYY FineReader?




Does EPUB even support centered text and things like that? The approach of these e-reader formats seems to be to fit the text to the display rather than to preserve the original look of the document. In any case, I still want to know why ABBYY FineReader is doing this.
Old 02-26-2014, 07:04 PM   #2
DaleDe
Grand Sorcerer
DaleDe ought to be getting tired of karma fortunes by now.
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Certainly ePub supports centered text and all sorts of things like that. Read about ePub in our Wiki and check out CSS.

Dale
Old 02-26-2014, 11:53 PM   #3
Tex2002ans
Wizard
Tex2002ans ought to be getting tired of karma fortunes by now.
 
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
EPUB export is only available as "Formatted Text" or "Plain Text". See the explanation in the User's Guide, page 24:

http://www.abbyy.com/fr11guide_en.pdf

Quote:
[...]
c. Formatted Text
Retains fonts, font sizes, and paragraphs, but does not retain the exact spacing or locations of the objects on the page. The text produced will be left–aligned. Texts in right–to–left scripts will be right–aligned.

Note: Vertical texts will be changed to horizontal in this mode.

d. Plain Text
This mode does not retain text formatting.
That description is slightly off, though: the EPUB export strips out even the fonts and font sizes, which makes for even cleaner code.

If you want more complex formatting carried over, you have to export to one of the formats supported by "Exact Copy" or "Editable Copy", or you can try cleaning up the HTML exported by "Flexible Copy" (although, let me tell you, that HTML is UGLY).

Last edited by Tex2002ans; 02-27-2014 at 12:05 AM.
Old 02-27-2014, 02:29 PM   #4
u238110
That explains it! Thanks! Now, going by DaleDe's post, it seems like ABBYY FineReader uses a stripped-down implementation of EPUB, meaning they don't allow users to use the full extent of the format. Any idea why this is?
Old 02-27-2014, 03:31 PM   #5
Tex2002ans
Quote:
Originally Posted by AIDM2
That explains it! Thanks! Now, going by DaleDe's post, it seems like ABBYY FineReader uses a stripped-down implementation of EPUB, meaning they don't allow users to use the full extent of the format. Any idea why this is?
Well, the EPUB export was only introduced in FineReader 11, so they can probably still improve the export in future versions (I see FineReader 12 was released earlier this month, although with no mention of better EPUB support).

Since the document is stored in a proprietary format, the FineReader engine itself probably transforms its data into some intermediary, and then transforms THAT into the desired output format:
  • When choosing "Formatted Text"
    • Step 1: Full Representation
    • Step 2: Formatted Text intermediary (stripping out most formatting, making it left aligned, only carrying over basic formatting)
    • Step 3: Export to chosen format
  • When choosing "Plain Text"
    • Step 1: Full Representation
    • Step 2: Plain Text intermediary (stripping everything, only carrying over basic text)
    • Step 3: Export to chosen format.

So in order to support a new format, they probably only have to create a new "template" at Step 3, to take everything in that intermediary, and convert it into its equivalent in the output format.

So it probably isn't as simple as just "let's add center/right/justified alignment when exporting to EPUB". Carrying alignment over in Step 2 would mean updating all of the Step 3 transformations (most likely very complex) for all the other supported formats.
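To make that guess concrete, here is a minimal Python sketch of such a three-step pipeline. It is only an illustration of the speculation above, not FineReader's actual design: every class and function name in it (FullParagraph, FormattedParagraph, to_formatted_text, export_epub, export_styled_html) is made up for the example.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class FullParagraph:
    """Step 1: hypothetical full representation of one recognized paragraph."""
    text: str
    font: str
    size_pt: float
    align: str  # "left", "center", "right" or "justify"


@dataclass
class FormattedParagraph:
    """Step 2 intermediary for "Formatted Text": note there is no align field."""
    text: str
    font: str
    size_pt: float


def to_formatted_text(paragraphs):
    """Step 2: keep basic formatting, drop alignment and page position."""
    return [FormattedParagraph(p.text, p.font, p.size_pt) for p in paragraphs]


def export_epub(formatted):
    """Step 3 template for EPUB: emits bare paragraphs (fonts dropped too)."""
    return "\n".join(f"<p>{escape(p.text)}</p>" for p in formatted)


def export_styled_html(formatted):
    """Step 3 template for some other format that keeps font and size."""
    return "\n".join(
        f'<p style="font-family: {escape(p.font)}; font-size: {p.size_pt}pt">'
        f"{escape(p.text)}</p>"
        for p in formatted
    )


page = [FullParagraph("Chapter One", "Times", 14.0, "center")]
intermediary = to_formatted_text(page)
print(export_epub(intermediary))  # the centering is already gone at this point
```

In this model no Step 3 template can output alignment, because the Step 2 intermediary never carries it. Adding an align field to FormattedParagraph would mean revisiting every exporter that reads it.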

If you want to carry over more formatting, you have to choose one of the many formats supported by "Exact Copy", "Editable Copy", or "Flexible Copy" (HTML).

I personally think that the EPUB output is the best/most minimal output, with the cleanest code. Then you can easily just add your little tweaks (alignment, headings, split chapters, etc. etc.) using Sigil or Calibre's Editor. If you need any help with that, I would be glad to help... I use Finereader every day converting books.
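As a rough idea of what one of those tweaks looks like, here is a small Python sketch that adds centering back after export. It assumes the exported XHTML uses bare <p>...</p> paragraphs with no attributes; the class name "center", the CENTER_RULE string, and the center_paragraphs helper are made up for this example. A regex is good enough for markup this simple, but it is not a general HTML parser.

```python
import re

# CSS rule to paste into the EPUB's stylesheet (the class name is an assumption).
CENTER_RULE = "p.center { text-align: center; text-indent: 0; }"


def center_paragraphs(xhtml, should_center):
    """Add class="center" to each bare <p>...</p> whose content passes should_center."""
    def fix(match):
        inner = match.group(1)
        if should_center(inner):
            return f'<p class="center">{inner}</p>'
        return match.group(0)  # leave every other paragraph untouched

    return re.sub(r"<p>(.*?)</p>", fix, xhtml, flags=re.DOTALL)


source = "<p>CHAPTER I</p>\n<p>It was a dark night.</p>"
# Example rule: treat all-uppercase paragraphs as headings that should be centered.
print(center_paragraphs(source, lambda text: text.isupper()))
```

In Sigil or Calibre's Editor you would do the same thing by hand or with a search and replace; the script only shows that the fix is a class on the paragraph plus one line of CSS.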

Last edited by Tex2002ans; 02-27-2014 at 03:38 PM.
Old 02-27-2014, 11:05 PM   #6
u238110
Thanks a lot. Where can I post an ad asking if someone can digitize books for me (destructive method)?
Old 02-28-2014, 09:41 AM   #7
DaleDe
Quote:
Originally Posted by AIDM2
Thanks a lot. Where can I post an ad asking if someone can digitize books for me (destructive method)?
Check our wiki. There are plenty of people who will do this. See E-book_conversion#Conversion_Services (mostly conversion services) and also Digitizing Paper Books to Ebooks.

Dale

Last edited by DaleDe; 02-28-2014 at 09:44 AM.
Old 03-01-2014, 03:41 AM   #8
Hitch
Bookmaker & Cat Slave
Hitch ought to be getting tired of karma fortunes by now.
Posts: 11,462
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by AIDM2
Thanks a lot. Where can I post an ad asking if someone can digitize books for me (destructive method)?
AIDM2:

I can highly recommend someone; PM me if you need references for someone to whom we refer our own clients.

Hitch
Old 03-15-2014, 03:20 AM   #9
u238110
So, there's a book that has Sanskrit text at the bottom of almost every page (see image). Would it be possible to program ABBYY FineReader to ignore every instance of that text?

Source: The Materia Medica of the Hindus (1877), Udoy Chand Dutt, Sir George King
Attached image: uHooYL4.gif (44.2 KB)

Last edited by WT Sharpe; 07-31-2014 at 08:43 AM. Reason: Changed oversized graphic to thumbnail attachment.
Old 03-15-2014, 05:17 AM   #10
Tex2002ans
Quote:
Originally Posted by AIDM2
So, there's a book that has Sanskrit text at the bottom of almost every page (see image).
Also, here is a link to the Archive.org version:

https://archive.org/details/materiamedicahi00kinggoog

Quote:
Originally Posted by AIDM2
Would it be possible to program ABBYY FineReader to ignore every instance of that text?
You will have to manually intervene in those cases.

You can either (ranked from worst to best):
  • Ignore those sections and clean them up in the output.
  • Manually delete the garbage text from the Text Window (right side).
    [Images: SanskritDelete.png, SanskritDelete2.png]
  • Resize the "Text Recognition Box" so that it does not include the footnote.
    • Don't forget to "Read" the page again.
    [Image: SanskritResize.png]
  • Resize the text box, and then mark the footnote as an Image instead.
    • I would choose this method.
    • Don't forget to "Read" the page again.
    [Image: SanskritImage.png]

Marking the Sanskrit footnotes as images lets you export them all. You can then feed those images into an OCR tool that can read Sanskrit, transcribe the text manually more easily, or, as a last resort, embed the Sanskrit in the book as images (although I HIGHLY recommend against embedding text as images).
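If you do go with the first (worst) option and clean up after export, a script can at least point you at the suspicious paragraphs. The Python sketch below is only a heuristic built on an assumption: that misrecognized Sanskrit comes out as text with an unusually high share of symbols and other non-letter characters. The function names, the sample garbage string, and the 0.3 threshold are all made up for the example, and the result is meant for manual review, not automatic deletion.

```python
import string


def noise_ratio(text):
    """Fraction of non-space characters that are not plain ASCII letters."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    noisy = sum(1 for c in chars if c not in string.ascii_letters)
    return noisy / len(chars)


def flag_suspect_paragraphs(paragraphs, threshold=0.3):
    """Return the indexes of paragraphs that look like OCR garbage."""
    return [i for i, p in enumerate(paragraphs) if noise_ratio(p) > threshold]


paragraphs = [
    "The root is used in fever.",
    "^fT*f ^T ?T^ft f^%",  # invented stand-in for misread Sanskrit
]
print(flag_suspect_paragraphs(paragraphs))
```

Paragraphs full of digits, dosages, or accented words will also score high, so tune the threshold on a few real pages before trusting it.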

Last edited by Tex2002ans; 03-15-2014 at 06:53 AM.
Old 03-15-2014, 05:57 AM   #11
u238110
Thanks for taking the time to type up that quality description.
Old 03-15-2014, 09:22 AM   #12
mrmikel
Color me gone
mrmikel ought to be getting tired of karma fortunes by now.
 
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
If you are going to save anything as an image and the resolution is settable, make it as high as possible if there is any chance you might try to recognize it later. You can always reduce resolution, but you can't go up.
Old 03-15-2014, 02:47 PM   #13
Tex2002ans
Quote:
Originally Posted by mrmikel
If you are going to save anything as an image and the resolution is settable, make it as high as possible if there is any chance you might try to recognize it later. You can always reduce resolution, but you can't go up.
Yes, always a good idea in ANY program to export as source/highest resolution.

I always assume this (although I guess if someone is not familiar with Finereader, or just getting into working with OCR/images, they wouldn't set this the first time around).

In FineReader, you can do this by going into Tools > Options, clicking the little "Save" tab, then going through each format and making sure "Picture Settings" is set to "Best quality (source image resolution)":

[Image: FinereaderImageExport.png]


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.