Old 11-27-2011, 03:07 PM   #1
TopCat
Junior Member
TopCat began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Nov 2010
Device: iPad
Overlapping text when converting html to mobi/epub

I had a PDF that I scanned, initially without OCR. Today I decided to use ABBYY to OCR the book and export it to HTML, which I assumed would be easier to convert to .mobi. However, the exported HTML copy of the book has huge gaps at the boundaries between the text on consecutive pages. This causes Calibre to overlap large blocks of text when converting to mobi or epub. I tried playing around with the conversion options, but haven't found a way to make it ignore or remove that white space. Is there a way to do that in Calibre?

[Attached images: Before, After]

Last edited by TopCat; 11-27-2011 at 03:09 PM.
Old 11-28-2011, 01:42 AM   #2
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Enable heuristics. ABBYY produces some ugly markup that tries to look exactly like the original book, and this markup is often quite screwy and fails to work as designed. The huge gap is a page break. With heuristics enabled you can also turn on the unwrap lines option. The heuristics routines try to preserve as much of the original formatting as possible while cleaning out the garbage.
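For anyone who prefers the command line to the GUI, here is a minimal sketch of the same idea using calibre's `ebook-convert` tool. `--enable-heuristics` and `--html-unwrap-factor` are existing `ebook-convert` options; the helper name `build_convert_command` and the file names are made up for illustration, and the sketch assumes `ebook-convert` is on your PATH.

```python
import shlex


def build_convert_command(source, target, unwrap_factor=None):
    """Build an argv list for calibre's ebook-convert with heuristics enabled.

    `source` and `target` are file paths; the output format is taken from
    the extension of `target` (for example .epub or .mobi).
    """
    cmd = ["ebook-convert", source, target, "--enable-heuristics"]
    if unwrap_factor is not None:
        # Optional: tune the scale used to decide where lines get unwrapped.
        cmd += ["--html-unwrap-factor", str(unwrap_factor)]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; print the command instead of running it.
    print(shlex.join(build_convert_command("book.html", "book.epub")))
```

To actually run the conversion, pass the returned list to `subprocess.run(cmd, check=True)`. Building the list separately just makes it easy to inspect the command first.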
Old 11-28-2011, 03:53 AM   #3
TopCat
Junior Member
TopCat began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Nov 2010
Device: iPad
Heuristics didn't do anything to fix the problem, unfortunately. Given that I'm using a trial of ABBYY, do you have any suggestions on a neater OCR app that can help with the transition from textual PDF to mobi/epub?
Old 11-28-2011, 04:19 AM   #4
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Unfortunately, I think ABBYY is probably the best in terms of maintaining the original formatting. If you want, you could open a bug, attach the original ABBYY-generated html book, mark it private, and have Kovid assign it to me. I maintain the function that attempts to clean up ABBYY markup, but to be honest it's had limited testing, just a few of my own docs that I'd converted, and it's possible that markup generated by different versions of ABBYY won't work with the function.
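In the meantime, one thing that can be tried by hand is pre-cleaning the exported html before feeding it to Calibre. The sketch below is not the Calibre cleanup function. It is a rough illustration that assumes the gaps come from inline `style="..."` declarations such as absolute positioning or fixed heights, which may not match what a given ABBYY version emits. The property list and the name `strip_layout_styles` are hypothetical, so check the real export and adjust.

```python
import re

# Assumed list of layout properties to drop; adjust after inspecting the export.
LAYOUT_PROPS = (
    "position", "top", "left", "height",
    "margin-top", "margin-bottom",
    "page-break-before", "page-break-after",
)

# Only handles double-quoted style attributes, which is enough for a sketch.
STYLE_ATTR = re.compile(r'style="([^"]*)"', re.IGNORECASE)


def strip_layout_styles(html):
    """Remove layout-related declarations from inline style attributes."""
    def clean(match):
        kept = []
        for decl in match.group(1).split(";"):
            name = decl.split(":", 1)[0].strip().lower()
            # Keep non-empty declarations whose property is not in the list.
            if decl.strip() and name not in LAYOUT_PROPS:
                kept.append(decl.strip())
        return 'style="%s"' % "; ".join(kept)
    return STYLE_ATTR.sub(clean, html)


if __name__ == "__main__":
    sample = '<p style="position:absolute; top:900px; font-weight:bold">x</p>'
    print(strip_layout_styles(sample))
```

Run it on a copy of the file and compare the result in a browser before converting, since stripping styles blindly can also remove formatting you wanted to keep.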

Last edited by ldolse; 11-28-2011 at 04:23 AM.
Old 11-28-2011, 07:13 AM   #5
GRiker
Comparer of the Ephemeris
GRiker ought to be getting tired of karma fortunes by now.
 
Posts: 1,497
Karma: 424627
Join Date: Mar 2009
Device: iPad
I have had some better results using ABBYY to convert to RTF, then to the destination format. It's not perfect, but it does seem to do a better job with page breaks.

G
All times are GMT -4.