Old 11-27-2011, 02:07 PM   #1
TopCat
Junior Member
Posts: 4
Karma: 10
Join Date: Nov 2010
Device: iPad
Overlapping text when converting HTML to MOBI/EPUB

I had a PDF that I scanned, initially without OCR. Today I decided to use ABBYY to OCR the book and export it to HTML, which I assumed would be easier to convert to .mobi. However, the exported HTML copy of the book has huge gaps where the page boundaries fall between the text of consecutive pages. These gaps cause Calibre to overlap large blocks of text when converting to MOBI or EPUB. I tried playing around with the conversion options, but haven't found a way to make it ignore or remove that white space. Is there a way to do that in Calibre?

Before
After

Last edited by TopCat; 11-27-2011 at 02:09 PM.
Old 11-28-2011, 12:42 AM   #2
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Enable heuristics. ABBYY produces ugly markup that tries to look exactly like the original book, and this markup is often quite screwy and fails to work as designed. The huge gap is a page break. You can also enable the unwrap lines option that comes with heuristics. The heuristics routines try to preserve as much of the original formatting as possible while cleaning out the garbage.
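For anyone who prefers the command line to the conversion dialog, here is a minimal sketch of the same suggestion. It assumes Calibre's `ebook-convert` tool is on your PATH and uses its `--enable-heuristics` and `--disable-unwrap-lines` options; the helper name `build_convert_command` and the file names are made up for illustration. As far as I know, line unwrapping is on by default once heuristics are enabled, so the sketch only adds a flag when you want it off.

```python
def build_convert_command(src, dst, heuristics=True, unwrap_lines=True):
    """Build an ebook-convert argument list (hypothetical helper, not part of Calibre)."""
    cmd = ["ebook-convert", src, dst]
    if heuristics:
        # Turns on Calibre's heuristic cleanup of the input markup.
        cmd.append("--enable-heuristics")
        if not unwrap_lines:
            # Assumption: unwrapping is a default heuristic, so only the opt-out needs a flag.
            cmd.append("--disable-unwrap-lines")
    return cmd


# Example with placeholder file names:
command = build_convert_command("book.html", "book.epub")
print(" ".join(command))
```

To actually run the conversion, pass the list to `subprocess.run(command, check=True)`. Building the list separately makes it easy to inspect the exact options before anything is converted.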
Old 11-28-2011, 02:53 AM   #3
TopCat
Heuristics didn't do anything to fix the problem, unfortunately. Given that I'm using a trial of ABBYY, do you have any suggestions on a neater OCR app that can help with the transition from textual PDF to mobi/epub?
Old 11-28-2011, 03:19 AM   #4
ldolse
I think ABBYY is probably the best in terms of maintaining the original formatting, unfortunately. If you want, you could open a bug, attach the original ABBYY-generated HTML book, mark it private, and have Kovid assign it to me. I maintain the function that attempts to clean up ABBYY markup, but to be honest it's had limited testing, just a few of my own docs that I'd converted, and it's possible that markup generated by different versions of ABBYY won't work with the function.
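To give a rough idea of what cleaning up layout-oriented markup can involve, here is a small sketch. It is not Calibre's actual function, and it rests on an assumption this thread does not confirm: that the overlap comes from inline `style` attributes carrying fixed positioning or heights. The property list and the name `strip_layout_css` are hypothetical.

```python
import re

# Assumption: these inline CSS properties are what pins text to fixed page positions.
LAYOUT_PROPS = {"position", "top", "left", "height"}

# Match style="..." attributes, but not attributes such as data-style="...".
STYLE_ATTR = re.compile(r'(?<![\w-])style="([^"]*)"', re.IGNORECASE)


def strip_layout_css(html):
    """Drop layout-related declarations from inline style attributes."""
    def clean(match):
        kept = []
        for decl in match.group(1).split(";"):
            name = decl.split(":", 1)[0].strip().lower()
            # Keep every non-empty declaration that is not in the layout list.
            if decl.strip() and name not in LAYOUT_PROPS:
                kept.append(decl.strip())
        return 'style="%s"' % "; ".join(kept)
    return STYLE_ATTR.sub(clean, html)


sample = '<p style="position:absolute; top:900px; font-style:italic">x</p>'
print(strip_layout_css(sample))
```

Run it on a copy of the exported HTML and compare the result in a browser before converting. If the gaps are produced some other way, for example by empty paragraphs or a separate stylesheet, this will not help.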

Last edited by ldolse; 11-28-2011 at 03:23 AM.
Old 11-28-2011, 06:13 AM   #5
GRiker
Comparer of the Ephemeris
Posts: 1,496
Karma: 424697
Join Date: Mar 2009
Device: iPad
I have had some better results using ABBYY to convert to RTF, then to the destination format. It's not perfect, but it does seem to do a better job with page breaks.

G
All times are GMT -4.