07-12-2009, 08:10 PM | #1 |
Zealot
Posts: 134
Karma: 994
Join Date: Apr 2009
Location: Maine, United States
Device: Ectaco Jetbook
|
Preserving Formatting through conversion?
There may be a thread on this already, but I couldn't find one, so here it is. I have downloaded a book that, so far as I could find, is only available as a PDF, so I converted it to epub with calibre. The formatting is entirely messed up: for example, every word with a double l has lost one of them (call is now cal), and all of the dialog between characters has been merged into one paragraph. So my question is: is there a way to preserve the formatting through conversion without having to go through the file manually and fix it?
|
07-12-2009, 08:38 PM | #2 |
Still wondering why
Posts: 253
Karma: 800
Join Date: Jun 2009
Location: Athens, Greece
Device: PRS 505, (BlackBerry Bold ?)
|
Hi Ham88,
Indeed, it's strange behavior. Calibre usually gives very good results (in lrf conversion, which is the format I use). Maybe your pdf source is not a "normal" text-based file. My only suggestion (I'm pretty sure other experienced members will give you more) would be to try the following online converter: http://www.lib2go.com/ I have tested it for lrfs and it gives fair results. Good luck! |
|
07-12-2009, 08:55 PM | #3 |
Zealot
Posts: 134
Karma: 994
Join Date: Apr 2009
Location: Maine, United States
Device: Ectaco Jetbook
|
Lib2go hates my files; it claims they are too big. I also noticed that it's based on calibre, so I would get the same results anyway. Any other ideas?
|
07-12-2009, 08:58 PM | #4 |
Time Enough at Last
Posts: 387
Karma: 1151316
Join Date: Feb 2008
Location: New England
Device: iPad 3, iPhone 5, Kindle 3, Fire, Sony PRS-350
|
How about running it through soPDF first, and then letting Calibre tackle it?
|
07-12-2009, 09:07 PM | #5 |
Zealot
Posts: 134
Karma: 994
Join Date: Apr 2009
Location: Maine, United States
Device: Ectaco Jetbook
|
I just looked up soPDF. I may be wrong, but it appears to be command-line driven, which is something I avoid because I'm incompetent with the lovely command line. I'm currently trying PDFread and am hoping it will yield the results that I want. The only problem is that this requires me to be patient, something that I lack. If this doesn't work I'll try soPDF and hope for the best. Thanks for the help so far.
|
|
07-12-2009, 11:08 PM | #6 |
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
|
I wrote a barebones GUI for SoPDF. If you go further down in the SoPDF thread, you'll find discussion of it, and a download link.
I'm not sure SoPDF is the right tool here, however. PDFread is a good one to try. Even though you're not using a Sony, running PDFLRF on it and then running the result through calibre may give you a good result too. |
07-13-2009, 01:19 AM | #7 |
01000100 01001010
Posts: 1,889
Karma: 2400000
Join Date: Mar 2009
Device: Polyamorous
|
I have the same problem when using Calibre to convert PDFs: all the paragraph breaks are lost, and I end up with one long long text file. I have yet to find a good PDF to LRF or PDF to ePUB converter.
|
07-13-2009, 08:17 PM | #8 |
Zealot
Posts: 134
Karma: 994
Join Date: Apr 2009
Location: Maine, United States
Device: Ectaco Jetbook
|
Everything I have tried has failed; soPDF worsened the problem, as the resulting epub file had only one or two words per line on average. I think I'm going to have to read this book as a PDF on my PC, unfortunately. Thanks for the help.
|
07-18-2009, 05:12 AM | #9 |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
Calibre is an excellent program, but PDF conversion is not one of its strengths. Try "Book Designer" - it converts PDFs as well as anything I've come across.
Unfortunately, however, PDF files sometimes simply cannot be converted well. A PDF file does not contain "text" - it has no paragraphs, lines, words, etc.; just drawing instructions for individual letters. As such, it is extraordinarily difficult to convert. |
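To illustrate why that makes conversion hard, here is a minimal sketch in Python of what a converter has to do with positioned letters. It is not how calibre or any other tool mentioned in this thread actually works; the `(x, y, character)` glyph model, the function name `glyphs_to_lines`, and the `line_tol` and `space_gap` thresholds are all hypothetical and chosen only for the example. The point is that lines and word breaks have to be guessed from coordinates, and a wrong threshold gives merged words or lost breaks.

```python
from typing import List, Tuple

# Hypothetical simplified model: each glyph is (x, y, character).
Glyph = Tuple[float, float, str]


def glyphs_to_lines(glyphs: List[Glyph],
                    line_tol: float = 2.0,
                    space_gap: float = 9.0) -> List[str]:
    """Guess text lines from individually positioned characters."""
    # Sort top to bottom (larger y first), then left to right.
    ordered = sorted(glyphs, key=lambda g: (-g[1], g[0]))

    # Group glyphs whose y is within line_tol of the row's first glyph.
    rows: List[Tuple[float, List[Tuple[float, str]]]] = []
    for x, y, ch in ordered:
        if rows and abs(rows[-1][0] - y) <= line_tol:
            rows[-1][1].append((x, ch))
        else:
            rows.append((y, [(x, ch)]))

    lines: List[str] = []
    for _, row in rows:
        row.sort()  # left to right within the row
        text = ""
        prev_x = None
        for x, ch in row:
            # A horizontal jump larger than space_gap is guessed to be a space.
            if prev_x is not None and x - prev_x > space_gap:
                text += " "
            text += ch
            prev_x = x
        lines.append(text)
    return lines


# Letters arrive in arbitrary order, 6 units apart, with a 12-unit word gap.
sample = [
    (18.0, 100.0, "y"), (0.0, 88.0, "o"), (0.0, 100.0, "H"),
    (6.0, 88.0, "k"), (24.0, 100.0, "o"), (6.0, 100.0, "i"),
]
result = glyphs_to_lines(sample)
print(result)
```

Paragraph breaks would have to be guessed in the same way, from vertical gaps or indentation, which is one plausible reason they get lost.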
07-19-2009, 06:01 AM | #10 |
Guru
Posts: 860
Karma: 4380
Join Date: Feb 2008
Location: Almada, Portugal
Device: Cybook Gen3, Sony PRS 505, Kindle DXG and Samsung Galaxy Note
|
Hi
My advice is to OCR the PDF file, save the result in a format you will be able to edit (for example, Microsoft Word), proofread it and correct the errors found, then create the final eBook in ePub or another format your reader can handle. Two of the best software applications for OCR are Finereader and Omnipage. Best regards, |
07-19-2009, 11:06 AM | #11 | |
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
|
Quote:
I continue to maintain that PDFread and/or PDFLRF>Calibre are the best ways to go to preserve the look of the original PDF, while formatting better for an e-ink device. |
|
07-19-2009, 01:14 PM | #12 |
Guru
Posts: 860
Karma: 4380
Join Date: Feb 2008
Location: Almada, Portugal
Device: Cybook Gen3, Sony PRS 505, Kindle DXG and Samsung Galaxy Note
|
Hi
Frabjous, you are perfectly correct; I missed that Ham88's final intent was: “(…)to preserve the formatting through conversion without having to go through the file manually and fix it?” Ham88, rephrasing: the answer to the question you posed (cited above) is: no! Best regards, |
07-19-2009, 01:18 PM | #13 |
Resident Curmudgeon
Posts: 75,834
Karma: 134321338
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
There is no such thing as a novel-length PDF that will convert from PDF to any other format without errors.
|
07-19-2009, 02:28 PM | #14 |
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
|
The tools I suggested convert the PDF pages to images, but then remove the margins and cut up the images into manageable chunks, and also change the file format--not that it matters much if it's just a sequence of images. I'm not sure what you have in mind by "errors", but they should preserve the look more or less exactly, and there would be nothing to manually fix. The only question is whether the results would look nice enough for the purposes of reading.
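A rough Python sketch of the idea, not the actual code of PDFread or PDFLRF: it assumes a page image is already available as a grid of pixels (1 = ink, 0 = blank), and the names `crop_margins` and `split_into_chunks` are made up for the example. Because the page is treated purely as a picture, no text is ever interpreted, so nothing can be misread.

```python
from typing import List

# Hypothetical page model: rows of pixels, 1 = ink, 0 = blank.
Page = List[List[int]]


def crop_margins(page: Page) -> Page:
    """Trim the page to the bounding box of its non-blank pixels."""
    if not page:
        return []
    inked_rows = [r for r, row in enumerate(page) if any(row)]
    if not inked_rows:
        return []  # completely blank page
    inked_cols = [c for c in range(len(page[0]))
                  if any(row[c] for row in page)]
    top, bottom = inked_rows[0], inked_rows[-1]
    left, right = inked_cols[0], inked_cols[-1]
    return [row[left:right + 1] for row in page[top:bottom + 1]]


def split_into_chunks(page: Page, chunk_height: int) -> List[Page]:
    """Cut the page into horizontal strips of at most chunk_height rows."""
    return [page[i:i + chunk_height]
            for i in range(0, len(page), chunk_height)]


page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
cropped = crop_margins(page)
chunks = split_into_chunks(cropped, 2)
print(cropped)
print(chunks)
```

The trade-off is the one described above: the look is preserved, but the text cannot be resized or reflowed, so readability depends on how well the chunks fit the screen.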
|
07-19-2009, 02:37 PM | #15 | |
Resident Curmudgeon
Posts: 75,834
Karma: 134321338
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
Tags |
calibre, conversion, pdf |
|