Old 11-29-2011, 11:52 PM   #1
dmorris68
Junior Member
Posts: 4
Karma: 10
Join Date: Nov 2011
Device: Nook Color
Looking for advice on tricky conversion

Hi all, long time lurker, first time poster.

I have a PDF (yes, I know the caveats of PDF conversion) copy of a book I'm trying to get converted to ePub to read on my NC because I just don't care to read PDFs there. I own the physical book as well but obviously would prefer not to lug it around since I always have my NC with me.

Took a Hail Mary shot at a straight conversion with calibre and as expected the results were less than stellar. No page breaks, table contents not converted but spread down the page one cell per line, etc. Also the book has a lot of simple math notation, primarily fractions and ratios -- the fractions are expressed in horizontal bar form, not like x/y, and the numerators and bars get dropped in the conversion, leaving only the denominator. Stuff like that. Basically makes the conversion useless.

Okay, so realizing the limitations of PDF conversion, I loaded up the PDF in NitroPDF Pro and exported as a Word DOC. It exported cleanly and looks beautiful in Word 2007. But calibre doesn't support DOC, so I saved as ODT which is supported. LibreOffice refused to open the ODT, so I suspect some special Microsoft sauce had been applied (big surprise), and trying to get a valid ODT out of Libre failed because it somewhat mangles the DOC file presentation. Calibre churned on the Word-generated ODT for nearly 20 minutes before producing an even worse conversion than from the PDF.

Seeing that RTF is a supported source format, I tried that next. Saved as RTF, previewed RTF in Word, looked great. Tried to convert, it's also worse than the PDF attempt.

Next tried HTML. Saved as HTML from Word, calibre bundled it all up nicely in a ZIP before producing a likewise worse-than-PDF conversion. I'm aware of the horrible HTML Word tends to generate so not terribly surprised there, I guess.

I'm trying to find some combination of format conversions that will help me, but no luck so far. I have this perfect DOC file just taunting me -- I feel like it should be easy to get it into ePub even if by some circuitous route, but I'm stumped. I'm actually surprised that the PDF conversion is the most readable, such as it is, of all the formats I've tried thus far.

Any ideas on intermediate conversions or settings tweaks I might be able to make to get this done? I've tried with heuristics on and off, but otherwise haven't attempted to tweak much.

TIA!

David
Old 11-30-2011, 02:59 AM   #2
itimpi
Wizard
Posts: 4,553
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
I would normally expect HTML to convert well. You mentioned saving as HTML; I assume you used the "Web (filtered)" option - this is needed to stop Word inserting all sorts of Microsoft-specific features into the HTML.

When you did a Save As to "Web (filtered)" from Word, did you try loading the result into a browser to see if it displayed OK? If so, I would expect it to convert OK.
Old 11-30-2011, 09:00 AM   #3
dmorris68
I don't recall which HTML option I chose, but I'll go back and do it again to be sure it's the "Web (filtered)" option, and I'll preview in a browser first. At work atm so will try when I get home tonight.

Thanks for the tip!
Old 11-30-2011, 07:13 PM   #4
dmorris68
Unfortunately, that didn't help much.

I thought it would at first -- I re-saved as HTML choosing the "Web (filtered)" option and noticed a different confirmation dialog, so I obviously hadn't chosen that the first time. I opened it in Chrome and it looked semi-decent. Not great, but I figured there's only so much fidelity to be retained with a plain HTML conversion, and it still might be better than the PDF conversion. So I deleted the old HTML ZIP from calibre, re-imported the new one, and fired up a conversion. When it finished and I previewed it within calibre, the book's TOC was still mangled, but not so badly that I couldn't touch it up in Sigil, and if not, no big deal -- I don't use the TOC much anyway. I moved through a few pages and although things were off, it looked like it still might be a better option, although the pages seemed to be too long.

Deleted the old ePub from the NC and transferred the new one. No dice, it was horrible. This is a 223 page book, it came in at something like 681 pages in the ePub. Lines were not wrapped properly (words broken in the middle) and pages would abruptly stop a few lines into a page and continue on the next page. No images imported either, as they did from the PDF.

So back to the drawing board.

I have another idea to try -- total shot in the dark but maybe I get lucky and see an improvement. I'll report back with the results, but welcome further advice.
Old 12-01-2011, 06:02 AM   #5
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
Since you own the physical book, it might be easier to scan it and clean it up than try to make a PDF work. At least it might be worthwhile to try a page or two and see what you get.
Old 12-02-2011, 11:33 AM   #6
mrsquash
Connoisseur
Posts: 78
Karma: 648
Join Date: Jul 2009
Location: USA
Device: K4-NT, Sony T1
dmorris68,
You might have a look at the fully functional 30-day free trial of Atlantis Word Processor. It will save directly from RTF (or .doc, for that matter) to epub and does a very good job. I've often used ABBYY PDF Transformer to get a doc file, styled it up in Word, then opened it in Atlantis and saved as epub.

I've had some problems with Atlantis' styles dropping italics in the past, so that's why I usually do any styling in Word.

More info on its ebook functions:
http://www.atlantiswordprocessor.com/en/help/ebook.htm
Old 12-02-2011, 12:50 PM   #7
Divingduck
Wizard
Posts: 1,166
Karma: 1410083
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
@ dmorris68,
The problem with converting from PDF to Word is that you end up with a document that includes a lot of styles, and that is what is causing your formatting troubles when generating an Epub. The converters try to simplify the styles and often fail to find the right structure.
When I prepare a PDF document for Epub, I use OCR to extract the text and image information, then save this in a Word document. So far that is more or less the same as what you are doing. Now take a look at the styles in Word. You will find that there are hundreds of them in the file. This is what makes the document look so nice in Word but unusable for an Epub conversion. You need to simplify this to a set of styles that you can then translate into CSS (only one font, a set of sizes, bold, italic, a set of headings h1, h2, ...). Then you can save the file as .rtf or .html and use a converter like Calibre and/or Sigil. By the way, if you have well-defined headings, you can use them to generate the TOC in Sigil with one click.
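
To get a rough idea of how many styles a Word export is carrying before and after cleanup, something like the following could be run over the saved HTML. This is only a sketch: it uses Python's standard library, the function name `audit_styles` is made up for illustration, and the sample markup is invented rather than taken from the book in this thread.

```python
from collections import Counter
from html.parser import HTMLParser


class StyleAudit(HTMLParser):
    """Tally class names and inline style strings found on start tags."""

    def __init__(self):
        super().__init__()
        self.classes = Counter()
        self.inline_styles = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if name == "class":
                # A class attribute may hold several space-separated names.
                self.classes.update(value.split())
            elif name == "style":
                self.inline_styles[value.strip()] += 1


def audit_styles(html_text):
    """Return (class counts, inline style counts) for an HTML string."""
    parser = StyleAudit()
    parser.feed(html_text)
    parser.close()
    return parser.classes, parser.inline_styles


# Invented sample: two paragraphs that differ only by a tiny margin change.
sample = """
<p class="MsoNormal" style="margin-left:.5in">One</p>
<p class="MsoNormal" style="margin-left:.51in">Two</p>
<h1 class="ChapterTitle">Title</h1>
"""

classes, inline = audit_styles(sample)
print(len(classes), "distinct classes;", len(inline), "distinct inline styles")
```

If the counts run into the hundreds for a simple book, that suggests the styles still need to be consolidated in Word before converting.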

Last edited by Divingduck; 12-02-2011 at 12:52 PM.
Old 12-02-2011, 01:32 PM   #8
mrsquash
dmorris68,
I'd definitely second Divingduck's recommendations. The first conversion I did created a new and very slightly different style for almost every line in the document and the resultant css file was huge, about 250K if I remember correctly. Going back and re-styling the .doc file in a word-processor was pretty fast and fixed everything up.

I've only worked with novels, though, not a complex document such as yours.
Old 12-02-2011, 05:55 PM   #9
dmorris68
Thanks for all the tips!

I'll check the trial of Atlantis, and also look at just restyling everything in Word.

Worst case attempt would be a plain text extraction. There aren't that many images so replacing them wouldn't be hard, but my main concern is the math notation -- it would be stripped and I'd have to hand-edit it back in. This is a 223 page book and while it's not heavy on advanced math, it does have a lot of simple equations and fractions. Not looking forward to essentially proofing and hand-editing 200 pages, but it may be my only choice if I want a truly clean epub.

Hopefully the style edit in Word will do the trick, until then I guess I'm stuck with the PDF. At least I have an electronic copy, it's just not in the most convenient format (the PDF reader on the NC sucks).
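
If it does come down to hand-editing the fractions back in, a small helper might at least save some typing. The sketch below is an assumption about one possible approach, not something tested on the NC: `fraction_html` is a made-up name, and it produces a slanted fraction (superscript, fraction slash, subscript) rather than a true horizontal-bar fraction.

```python
from html import escape


def fraction_html(numerator, denominator):
    """Return inline HTML for a slanted fraction such as 3/4.

    Uses <sup>/<sub> around the fraction slash (U+2044, written as a
    numeric character reference so it is valid in XHTML as well).
    """
    return "<sup>{}</sup>&#8260;<sub>{}</sub>".format(
        escape(str(numerator)), escape(str(denominator))
    )


print(fraction_html(3, 4))
# prints: <sup>3</sup>&#8260;<sub>4</sub>
```

The output can be pasted into the HTML source in Sigil wherever the conversion dropped a numerator and bar.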