10-28-2009, 01:42 PM | #1 |
Hibernian eBook Warrior
Posts: 184
Karma: 1264
Join Date: Aug 2009
Location: Cork, Ireland
Device: Sony Reader
|
Desperately seeking... advice on ePub conversion?
Hi Guys.
I am interested in converting PDF files into eBooks. I am currently outsourcing this work, which is proving costly, although to be fair the quality is excellent. I would like to be able to do this work myself, but I'm unsure of the best method, and I don't want to start down one path only to find out another is better. I'm asking for your advice and experiences. I want to produce professional-quality ePubs, fully indexed etc. I'm very familiar with the technical issues, but should I use one of the automated packages or start from scratch with XHTML/CSS? Or is Adobe InDesign the best overall tool? I'd appreciate all your comments and feedback. |
10-28-2009, 01:45 PM | #2 |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
Moved to the appropriate forum section.
|
10-29-2009, 12:27 AM | #3 |
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
|
I've tried various methods, but the one that has proved most successful for me is to edit the text in Word (making proper use of styles etc.) and then use Atlantis to generate the ePub file. You could use Atlantis for everything, but I find Word 2007 easier to use for editing (mostly because I'm used to it and I have it anyway). The $35 for Atlantis is very reasonable for what it offers. The advantage of this approach is that I can edit the text in a word processor and don't have to guess what it will look like or fiddle with XML to set things up properly.
I have access to InDesign CS4 and have tried it for ePubs, and frankly it's inferior to Atlantis: you need to split the book into separate documents yourself to make sure no single file goes over the mobileADE 300k limit, which is just the sort of extra hassle I can do without. InDesign does offer more flexibility with ToC generation, has options for image manipulation and makes it easy to embed fonts in the ePub, but none of these justify the extra effort unless you have special needs.

I'm sure you can get excellent results converting RTF files in calibre as well, which has the benefit of being free. For best results you'll probably want to tweak the CSS settings and the XPath options for the ToC etc. There are also a few free add-ons for Word floating around that might be worth checking out, though they tend to enforce their own particular notions and can be fiddly (hard to moan when they're free though).

One thing you need to realise is that PDFs are a real pain to convert. Very few are fully tagged, meaning that you need to scan through the text to correct broken paragraphs and incorrectly inserted line breaks or hyphens. (For paragraphs I use a Word wildcard search, Find: ([!."\?\!\)])^13 Replace:\1 though you still need to check each instance.) Each document will offer its own variation on the problems you can run into. I'm afraid there is no one-click solution: converting a PDF can easily take a couple of hours, or much more, depending on how much you need to rework the text. A lot depends on how much variation there is in your text and how much of it you want to preserve in the finished item.

There are various options for saving the PDF as a .docx file for editing. I happen to use Nuance PDF Converter, which generally does a decent job of stripping out headers and footers, though it can still trip up at times. |
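Outside Word, the same paragraph repair can be approximated with a regular expression. Below is a rough Python sketch (my own translation, not the poster's tool) that assumes the converted text has one line break per Word paragraph mark, and joins a line to the next one unless it ends in `.`, `"`, `?`, `!` or `)`, the same characters excluded by the wildcard search. Unlike the Word replace, it puts a space where the break was, assuming the converter didn't leave one. It also includes a hypothetical `split_into_chunks` helper for the 300k problem: it groups paragraphs so that no output file's UTF-8 text exceeds a byte limit, assumed here to be 300 × 1024 bytes. Leave headroom for the XHTML markup you wrap around each chunk, and, as with the Word search, check every join by hand.

```python
import re

# Mirrors the Word wildcard ([!."\?\!\)])^13 -> \1: a line break that does NOT
# follow . " ? ! or ) is treated as a broken paragraph and joined to the next line.
# Assumption: one "\n" per Word paragraph mark. Curly quotes are not covered.
BROKEN_BREAK = re.compile(r'([^."?!)\n])\n')


def join_broken_lines(text):
    # Replace the break with a space (the Word version replaces it with nothing).
    return BROKEN_BREAK.sub(r"\1 ", text)


def split_into_chunks(paragraphs, limit_bytes=300 * 1024):
    """Group paragraphs so each chunk's UTF-8 size stays within limit_bytes.

    A single paragraph larger than the limit still gets a chunk of its own.
    """
    chunks, current, size = [], [], 0
    for para in paragraphs:
        n = len(para.encode("utf-8"))
        if current and size + n > limit_bytes:
            chunks.append(current)  # close the current file before it overflows
            current, size = [], 0
        current.append(para)
        size += n
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then become one XHTML file in the ePub's spine.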
10-29-2009, 12:32 AM | #4 |
Snooty Bestselling Author
Posts: 1,485
Karma: 1000000
Join Date: Aug 2009
Location: Ipswich, QLD, Australia
Device: PRS-650
|
Do you have the source files for the PDFs?
|
10-29-2009, 05:02 AM | #5 |
Hibernian eBook Warrior
Posts: 184
Karma: 1264
Join Date: Aug 2009
Location: Cork, Ireland
Device: Sony Reader
|
Hi Guys.
Thanks for the advice. Publishers give me PDF/Quark files of their books, and I then outsource their conversion to ePub. The biggest challenge, I feel, will be converting the PDFs back to Word or text files. Are any of the automated programs any good for this? Thanks again |
10-29-2009, 07:11 AM | #6 |
Wizard
Posts: 1,244
Karma: 3439432
Join Date: Feb 2008
Device: Amazon Kindle Paperwhite (300ppi), Samsung Galaxy Book 12
|
Don't start from the PDFs; use the Quark source instead.
Dump to XPress Tags or .html or some other sort of tagged format, then massage that, adding back in anything which wasn't in the main text flow (or get a specialized XTension/utility such as textractor). PDFs convert the formatting into localized text changes and positional information which is difficult to extract. If you must use a PDF as a source, use a utility such as Marcel Weiher's TextLightning.app, which will analyze that positional information and then allow you to use global search-replace techniques to convert the local formatting into proper styles.
William |
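If you go the XPress Tags route, much of the massaging is mechanical. The Python sketch below handles only a tiny, assumed subset of the format: a paragraph starting with `@Style Name:` gets that style sheet, and `<B>` and `<I>` toggle bold and italic. Real XPress Tags files carry a version/encoding header, style definitions and many more codes, so strip or map those first; the `xtags_to_html` function and its class-naming rule are hypothetical. Each run of text is wrapped separately, so the output stays well-formed even when the toggles overlap.

```python
import html
import re

# Assumed subset only: "@Style Name:" at the start of a paragraph, <B>/<I> toggles.
STYLE_TAG = re.compile(r"^@([^:<>]+):")
TOGGLE = re.compile(r"<([BI])>")


def css_class(style):
    # "Body Text" -> "body-text" (hypothetical naming rule)
    return re.sub(r"[^a-z0-9]+", "-", style.strip().lower()).strip("-")


def wrap(text, bold, italic):
    # Wrap each run on its own so the emitted tags never overlap.
    text = html.escape(text, quote=False)
    if italic:
        text = f"<em>{text}</em>"
    if bold:
        text = f"<strong>{text}</strong>"
    return text


def xtags_to_html(paragraphs):
    out = []
    for para in paragraphs:
        m = STYLE_TAG.match(para)
        cls = css_class(m.group(1)) if m else "normal"
        body = para[m.end():] if m else para
        bold = italic = False
        runs, pos = [], 0
        for t in TOGGLE.finditer(body):
            if t.start() > pos:
                runs.append(wrap(body[pos:t.start()], bold, italic))
            if t.group(1) == "B":
                bold = not bold
            else:
                italic = not italic
            pos = t.end()
        if pos < len(body):
            runs.append(wrap(body[pos:], bold, italic))
        out.append(f'<p class="{cls}">{"".join(runs)}</p>')
    return out
```

Mapping style sheets to CSS classes like this keeps the publisher's structure, so a single stylesheet can then control the look of the whole book.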
10-29-2009, 09:21 AM | #7 |
reader
Posts: 6,975
Karma: 5183568
Join Date: Mar 2006
Location: Mississippi, USA
Device: Kindle 3, Kobo Glo HD
|
The only book on a similar subject I am aware of is Kindle Formatting: The Complete Guide. This is probably worth buying for anyone intending to format multiple ebooks. I don't remember if it discusses starting with PDFs though. Since this is somewhat Kindle-specific, there probably is room for a similar "ePub Formatting" book.
|
10-30-2009, 05:48 AM | #8 |
Connoisseur
Posts: 92
Karma: 50000
Join Date: Oct 2009
Device: none
|
I'm facing the same problems as Direct Ebooks. I have PDF documents as source files and I need some easy way to edit them. I can work in MS Word and easily take the text from there into InDesign to create an e-book; the problem is the conversion from PDF to DOC. I checked TextLightning.app, which WillAdams mentioned, but it's only for Mac and I have Win XP. I searched Google for "pdf to doc converter", but most of the programs I found are shareware. I also tried to find open source software on http://www.sourceforge.net but didn't find any. So far the only free one I have found is http://www.somepdf.com/downloads.html. I tried it, but it brings new problems.
As you can see from the "pdf_sample.gif" file, there are 2 hyphens which just tell the reader that the word continues on the next line. If I copy and paste those words manually into Notepad, the hyphens disappear and the words show correctly, but the line breaks are wrong, as you can see from "notepad_sample.gif". When I use the Some PDF tool to convert PDF to DOC, it leaves all the hyphens in and the words show incorrectly, as you can see from "word_sample.gif". I have to check all the hyphens manually because sometimes they are necessary, so I can't just use find and replace to erase them all. Also, a line break sometimes creates an extra space, so some words have a hyphen followed by an empty space. That means I really need to check every case manually to see whether there is a hyphen or a hyphen plus a space. So either I check all the hyphens manually or every line break, and both options are very troublesome for books with hundreds of pages. I'm using MS Word to set up a few styles and then exporting the DOC file to InDesign to create the e-book. Can you recommend any programs to ease my workflow, or any other suggestions to make it easier? |
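One way to avoid checking every hyphen by hand is to let the text itself decide. The following rough Python sketch (an assumed heuristic, not a feature of any tool mentioned in this thread) joins a word split across a line break, tolerating stray spaces around the break, and drops the hyphen only if the joined word also occurs unhyphenated somewhere in the book; otherwise it keeps the hyphen and lists the case for manual review. The function names are hypothetical. It only touches hyphens at line ends, so the line breaks themselves still need fixing, and merging a proper word list into `vocabulary` would catch more cases.

```python
import re

# A word, a hyphen, optional spaces/tabs, a line break, optional spaces, a word.
HYPHEN_BREAK = re.compile(r"(\w+)-[ \t]*\n[ \t]*(\w+)")


def vocabulary_from(text):
    # Every run of letters in the book, lower-cased. Line-end fragments get in
    # too, but they are harmless because only the *joined* word is looked up.
    return {w.lower() for w in re.findall(r"[^\W\d_]+", text)}


def dehyphenate(text, vocabulary):
    """Return (fixed_text, kept) where kept lists hyphenations left for review."""
    kept = []

    def fix(m):
        head, tail = m.group(1), m.group(2)
        if (head + tail).lower() in vocabulary:
            return head + tail  # e.g. "con-" + "tinuing" -> "continuing"
        kept.append(f"{head}-{tail}")
        return f"{head}-{tail}"  # probably a real compound: keep the hyphen

    return HYPHEN_BREAK.sub(fix, text), kept
```

Usage: `fixed, kept = dehyphenate(text, vocabulary_from(text))`, then search the document only for the words listed in `kept`.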
10-30-2009, 10:59 AM | #9 | |
Resident Curmudgeon
Posts: 75,907
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
But why not start with the source that was used to create the PDF? |
|
10-30-2009, 02:58 PM | #10 |
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
|
Chang: going by the output you provide I wouldn't bother trying to get SomePDF to work properly. If it can't even handle hyphens correctly it's not worth using.
If you're looking for a free program, have you tried Mobipocket Creator? You can use that to convert a PDF to HTML, and from some brief tests it seems to respect tags reasonably well. Tagged paragraphs that are not separated by a blank line are simply given a break tag at the end, which shows up as a manual line break in Word, but a simple search-replace is all that's needed to convert those back into paragraph marks. It also doesn't get confused by hyphens (as long as they're soft hyphens, which any decent PDF-creation program should use for words that are split at line breaks).

I wouldn't worry about ragged line ends such as the ones you show in notepad_sample.gif. You're creating reflowable text, and the reader will handle the line lengths when it lays out the eBook.

As I said before, a lot depends on whether the PDF was properly tagged when it was initially created. If it wasn't, then there's no magic program to help and nothing for it but to go through the text and correct it by hand. |
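The break-tag clean-up described above is easy to script as well. This is a minimal Python sketch with a hypothetical function name, assuming the converted HTML is a flat run of text in which each paragraph simply ends in a `<br>` tag (check your own output first). It can optionally strip soft hyphens, whether they arrive as the U+00AD character or as the `&shy;` entity, if you don't want them carried into the eBook.

```python
import re

# Matches <br>, <br/>, <BR />, etc. plus any whitespace after it.
BR_TAG = re.compile(r"<br\s*/?>\s*", re.IGNORECASE)


def br_to_paragraphs(fragment, strip_soft_hyphens=True):
    """Turn 'text<br>text<br>' into one <p> element per line (assumed input shape)."""
    if strip_soft_hyphens:
        # Soft hyphens may appear as the character U+00AD or the &shy; entity.
        fragment = fragment.replace("\u00ad", "").replace("&shy;", "")
    parts = (p.strip() for p in BR_TAG.split(fragment))
    return "\n".join(f"<p>{p}</p>" for p in parts if p)
```

Empty pieces (for example after a trailing `<br>`) are skipped, so you don't end up with empty paragraphs.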
11-02-2009, 03:03 PM | #11 |
Time Enough at Last
Posts: 387
Karma: 1151316
Join Date: Feb 2008
Location: New England
Device: iPad 3, iPhone 5, Kindle 3, Fire, Sony PRS-350
|
For a free solution you might try Book Designer. You'll need to clean up the output a bit, but after you do, save it as a LIT file and then convert that to ePub using Calibre.
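If you'd rather script that last step, calibre ships a command-line converter, `ebook-convert`, which takes an input and an output file and works out the formats from their extensions. The Python sketch below only builds the argument list; the helper name is hypothetical, and it assumes calibre is installed with `ebook-convert` on your PATH. `--extra-css` and `--level1-toc` are calibre options (the latter takes an XPath expression, which is where the ToC tweaking comes in), but check `ebook-convert --help` for your version.

```python
def ebook_convert_args(src, dest, extra_css=None, level1_toc=None):
    """Build an argument list for calibre's ebook-convert (hypothetical helper)."""
    args = ["ebook-convert", src, dest]
    if extra_css:
        # Stylesheet appended to the book's own style rules
        args += ["--extra-css", extra_css]
    if level1_toc:
        # XPath selecting top-level ToC entries, e.g. "//h:h1"
        args += ["--level1-toc", level1_toc]
    return args
```

To run it: `subprocess.run(ebook_convert_args("book.lit", "book.epub", level1_toc="//h:h1"), check=True)`. Passing a list rather than a shell string avoids quoting problems with file names that contain spaces.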
|
11-03-2009, 10:19 AM | #12 | |
Resident Curmudgeon
Posts: 75,907
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Seeking advice: My reference book | Steven Lyle Jordan | Writers' Corner | 31 | 11-30-2009 09:49 AM |
Student Seeking advice | Stabiliser | Which one should I buy? | 3 | 04-28-2009 09:33 AM |
Desperately Seeking Software for Digitizing Books. | harryE123 | Reading and Management | 8 | 12-17-2008 08:33 PM |
Newbie seeking advice on what to buy | tarq | Which one should I buy? | 15 | 07-25-2008 01:23 PM |
Yet another noob seeking advice | Voice of Reason | Which one should I buy? | 6 | 04-01-2008 03:49 PM |