Preserving Formatting through conversion?


Ham88
07-12-2009, 09:10 PM
There may be a thread on this already but I couldn't find one, so here it is. I have downloaded a book that is only available in PDF, so far as I could find anyway. So I went and converted it to epub through Calibre. The formatting is entirely messed up. For example, all of the words with two l's in them have lost one (call is now cal), and all of the dialog between characters has become one paragraph. So basically my question is: is there a way to preserve the formatting through conversion without having to go through the file manually and fix it?

Kostas
07-12-2009, 09:38 PM
Hi Ham88,

Indeed, that's strange behavior. Calibre usually gives very good results (at least for LRF conversion, which is the format I use). Maybe your PDF source is not a "normal" text-based file.
My only suggestion (I'm pretty sure other experienced members will give you more) would be to give it another try with the following online converter:
http://www.lib2go.com/

I have tested it with LRFs and it gives fair results.
Good luck!

Ham88
07-12-2009, 09:55 PM
Lib2go hates my files, as it claims they are too big. I also noticed that it's based on Calibre, so I would get the same results anyway. Any other ideas?

Timoleon
07-12-2009, 09:58 PM
How about running it through soPDF first, and then letting Calibre tackle it?

Ham88
07-12-2009, 10:07 PM
I just looked up soPDF. I may be wrong, but it appears to be command-line driven, which is something I avoid because I'm incompetent with the lovely command line. I'm currently using PDFread and am hoping this will yield the results that I want. The only problem is that this requires me to be patient, something that I lack. If this doesn't work I'll try soPDF and hope for the best. Thanks for the help so far.

frabjous
07-13-2009, 12:08 AM
I wrote a barebones GUI for SoPDF. If you go further down in the SoPDF thread, you'll find discussion of it, and a download link.

I'm not sure SoPDF is the right tool here, however. PDFread is a good one to try. Even though you're not using a Sony, running PDFLRF on it, and then running that through Calibre, may give you a good result too.

doreenjoy
07-13-2009, 02:19 AM
I have the same problem when using Calibre to convert PDFs: all the paragraph breaks are lost, and I end up with one long long text file. I have yet to find a good PDF to LRF or PDF to ePUB converter.

Ham88
07-13-2009, 09:17 PM
All of the things I have tried have failed. soPDF made the problem worse, as the eventual epub file had only one or two words per line on average. I think I'm going to have to read this book as a PDF on my PC, unfortunately. Thanks for the help.

HarryT
07-18-2009, 06:12 AM
Calibre is an excellent program, but PDF conversion is not one of its strengths. Try "Book Designer" - it converts PDFs as well as anything I've come across.

Unfortunately, however, PDF files sometimes simply cannot be converted well. A PDF file does not contain "text" - it has no paragraphs, lines, words, etc; just drawing instructions for individual letters. As such, it is extraordinarily difficult to convert.
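
To make the "drawing instructions" point concrete, here is a minimal sketch in Python. It is not how Calibre, Book Designer, or any particular converter works; the glyph tuples, the `place` helper, and the spacing thresholds are all assumptions made up for illustration. It shows that words, lines, and paragraphs have to be guessed from letter positions alone.

```python
# Toy model: a "page" is only a list of positioned letters (x, y, char).
# There are no spaces, line breaks, or paragraph marks in the data.
# Assumption: y grows downward and letters on one line share the same y.

def place(text, y, x0=0.0, char_width=6.0):
    """Hypothetical helper: lay out text as glyphs, dropping the spaces
    (a space is just a gap, nothing gets drawn for it)."""
    return [(x0 + i * char_width, y, ch)
            for i, ch in enumerate(text) if ch != " "]

def glyphs_to_paragraphs(glyphs, char_width=6.0, line_height=12.0):
    """Guess words, lines, and paragraphs from positions alone."""
    lines = []
    for x, y, ch in sorted(glyphs, key=lambda g: (g[1], g[0])):
        if lines and abs(y - lines[-1]["y"]) < line_height * 0.5:
            line = lines[-1]
            # A wide horizontal gap is assumed to be a word space.
            if x - line["last_x"] > char_width * 1.5:
                line["text"] += " "
            line["text"] += ch
            line["last_x"] = x
        else:
            lines.append({"y": y, "last_x": x, "text": ch})

    paragraphs = []
    prev_y = None
    for line in lines:
        # Only a larger-than-normal vertical gap starts a new paragraph.
        if prev_y is not None and line["y"] - prev_y <= line_height * 1.5:
            paragraphs[-1] += " " + line["text"]
        else:
            paragraphs.append(line["text"])
        prev_y = line["y"]
    return paragraphs

# Extra vertical space before "Why?" is detected as a paragraph break.
page = place("Call me", 0) + place("later.", 12) + place("Why?", 36)
print(glyphs_to_paragraphs(page))

# Dialog set on consecutive lines with no extra gap merges into one paragraph.
dialog = place('"Hi."', 0) + place('"Hello."', 12)
print(glyphs_to_paragraphs(dialog))
```

The second call shows the kind of result reported earlier in the thread: when each line of dialog is just a new line with normal spacing, a position-based guess has nothing to tell it that a new paragraph started.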

DDHarriman
07-19-2009, 07:01 AM
Hi

My advice is to OCR the PDF file, save the result in a format you will be able to edit (for example, Microsoft Word), proofread it and correct the errors found, and then create the final eBook in ePub or another format your reader can handle.

Two of the best software applications for OCR are FineReader and OmniPage.

Best regards,

frabjous
07-19-2009, 12:06 PM
I don't think that falls under the category of "Preserving format through conversion" "without having to go through the file manually and fix it", which is what was asked for.

I continue to maintain that PDFread and/or PDFLRF > Calibre are the best ways to go to preserve the look of the original PDF, while formatting better for an e-ink device.

DDHarriman
07-19-2009, 02:14 PM
Hi

Frabjous

You are perfectly correct, I missed that the final intent from Ham88 was:
"to preserve the formatting through conversion without having to go through the file manually and fix it?"

Ham88

Rephrasing, the answer to the question (cited above) you have posed is: no!

Best regards,

JSWolf
07-19-2009, 02:18 PM
There is no such thing as a novel length PDF that will convert from PDF to any other format without errors.

frabjous
07-19-2009, 03:28 PM
The tools I suggested convert the PDF pages to images, but then remove the margins and cut up the images into manageable chunks, and also change the file format--not that it matters much if it's just a sequence of images. I'm not sure what you have in mind by "errors", but they should preserve the look more or less exactly, and there would be nothing to manually fix. The only question is whether the results would look nice enough for the purposes of reading.
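
Roughly, the image approach can be sketched like this in Python. This is not the actual PDFRead or PDFLRF code; the page representation (a grid of 0/1 pixels, 1 meaning ink) and the function names are assumptions for illustration only.

```python
# Toy model: a page image is a list of rows, each row a list of 0/1 pixels.

def crop_margins(page):
    """Trim blank rows and columns around the inked area."""
    inked_rows = [i for i, row in enumerate(page) if any(row)]
    if not inked_rows:
        return []  # completely blank page
    inked_cols = [j for j in range(len(page[0]))
                  if any(row[j] for row in page)]
    top, bottom = inked_rows[0], inked_rows[-1]
    left, right = inked_cols[0], inked_cols[-1]
    return [row[left:right + 1] for row in page[top:bottom + 1]]

def split_chunks(page, chunk_height):
    """Cut the page into screen-sized horizontal strips."""
    return [page[i:i + chunk_height]
            for i in range(0, len(page), chunk_height)]

page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
cropped = crop_margins(page)
print(cropped)
print(split_chunks(cropped, 1))
```

This sketch cuts at a fixed height for simplicity; a usable tool would want to cut in the white space between text lines so that no line gets sliced in half. Since no text is ever extracted, there is nothing to mis-convert, which is why the look is preserved.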

JSWolf
07-19-2009, 03:37 PM
I meant converting the PDF to some reflowable format, not to images. That cannot be done without errors.

AJ Starr
07-19-2009, 04:44 PM
I assume that since you have converted it, this PDF is not DRM'd. One thing I use that I've not seen mentioned anywhere on these threads is my word processor, WordPerfect, which reads PDFs into its own format. I know you didn't want to go through file manipulation, but in my experience WordPerfect converts the file excellently, though I've only done chapter-by-chapter PDFs. Then it shouldn't be too hard to run the result back through Calibre.

AJ

MerLock
08-31-2009, 03:22 AM
I used Book Designer (BD) to convert some PDFs I have and it seems to be doing a fairly good job. However, the one thing it does poorly is maintaining italic and bold typefaces. Am I missing a setting, or can this just not be done?