10-28-2009, 01:42 PM   #1
Direct Ebooks
Hibernian eBook Warrior
Posts: 184
Karma: 1264
Join Date: Aug 2009
Location: Cork, Ireland
Device: Sony Reader
Desperately seeking.... advice on epub conversion?

Hi Guys.

I am interested in converting PDF files into ebooks. I currently outsource this work, which is proving costly, although to be fair the quality is excellent.

I would like to be able to do this work myself, but I'm unsure what the best method is, and I don't want to start down one path only to find out that another is better. I'm asking for your advice and experiences.

I want to produce professional-quality ePubs, fully indexed etc. I'm very familiar with all the technical issues, but should I use one of the automated packages or start from scratch with XHTML/CSS?
Or is Adobe InDesign the best overall tool?

I'd appreciate all your comments and feedback.
10-28-2009, 01:45 PM   #2
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Moved to the appropriate forum section.
10-29-2009, 12:27 AM   #3
charleski
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
I've tried various methods, but the one that has proved most successful for me is to edit the text in Word (making proper use of styles etc.) and then use Atlantis to generate the ePub file. You could use Atlantis for everything, but I find Word 2007 easier to use for editing (mostly because I'm used to it and I have it anyway). The $35 for Atlantis is very reasonable for what it offers. The advantage of this approach is that I can edit the text in a word processor and don't have to guess what it will look like or fiddle with XML to set things up properly.

I have access to InDesign CS4 and have tried it for ePubs, and frankly it's inferior to Atlantis: you need to split the book into separate documents yourself to ensure that it doesn't go over the mobile ADE 300k limit, which is just the sort of extra hassle I can do without. InDesign does offer more flexibility with ToC generation, has options for image manipulation, and makes it easy to embed fonts in the ePub, but none of these justifies the extra effort involved unless you have special needs.
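
For anyone wanting to check a finished file against that limit, here is a minimal Python sketch. It assumes the limit applies to the uncompressed size of each XHTML/HTML document inside the ePub and that "300k" means 300 × 1024 bytes. Both are assumptions, and the function name is made up for illustration.

```python
import zipfile

# Assumed reading of the "300k limit": 300 * 1024 bytes per content document.
LIMIT_BYTES = 300 * 1024

def oversized_documents(epub_file, limit=LIMIT_BYTES):
    """Return (name, uncompressed size) for content documents above the limit.

    epub_file may be a path or a file-like object; an ePub is a zip archive.
    """
    with zipfile.ZipFile(epub_file) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if info.filename.lower().endswith((".xhtml", ".html", ".htm"))
            and info.file_size > limit
        ]

# Example use (the file name is a placeholder):
# for name, size in oversized_documents("book.epub"):
#     print(name, size)
```

Any file this reports would need splitting into smaller documents before the book is packaged again.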

I'm sure you can get excellent results converting RTF files in calibre as well, which has the benefit of being free. For best results you'll probably want to tweak the CSS settings and the XPath options for the ToC etc. There are also a few free add-ons for Word floating around that might be worth checking out, though they tend to enforce their own particular notions and can be fiddly (hard to moan when they're free, though).
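
As a rough illustration of the kind of tweaking meant here, calibre's command-line converter, ebook-convert, accepts an --extra-css option and XPath options such as --level1-toc for building the ToC. The sketch below only assembles such a command in Python. The file names, the CSS rule and the XPath expression are placeholder assumptions, not recommended settings, and whether the XPath matches anything depends on how the headings in the RTF come through.

```python
import subprocess

def build_convert_command(source, target, extra_css, level1_toc_xpath):
    """Assemble an ebook-convert command line; nothing is executed here."""
    return [
        "ebook-convert", source, target,
        "--extra-css", extra_css,            # extra CSS applied to the output
        "--level1-toc", level1_toc_xpath,    # XPath for top-level ToC entries
    ]

# Placeholder inputs for illustration only.
command = build_convert_command(
    "book.rtf",
    "book.epub",
    "p { text-indent: 1.5em; margin: 0 }",
    "//h:h1",
)
print(" ".join(command))

# To actually run the conversion (requires calibre installed and on the PATH):
# subprocess.run(command, check=True)
```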

One thing you need to realise is that PDFs are a real pain to convert. Very few are fully tagged, meaning that you need to scan through the text to correct broken paragraphs and incorrectly inserted line breaks or hyphens (I use a Word wildcard search for paragraphs, Find: ([!."\?\!\)])^13 Replace: \1 though you still need to check each instance). Each document will offer its own variation of the particular problems you can run into. I'm afraid there is no one-click solution: converting a PDF can easily take a couple of hours, or much more, depending on how much you need to rework the text. A lot depends on how much variation there is in your text and how much of it you want to preserve in the finished item. There are various options for saving the PDF as a docx file for editing. I happen to use Nuance PDF Converter, which generally does a decent job of stripping out headers and footers, though it can still trip up at times.
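
Outside Word, the same idea can be sketched in Python. This is only an approximation of the wildcard search above, not a replacement for checking each instance: it joins a line to the next one when the line does not end in ., ?, !, ) or ", and unlike the Word replace it puts a single space at the join. Lines ending in a hyphen are joined too, hyphen included, so they still need separate attention. The function name is hypothetical.

```python
import re

# A character that is NOT sentence-ending punctuation, optional trailing
# spaces, then a line break that is followed by more text (not a blank line).
BROKEN_LINE = re.compile(r'([^.?!)"\n])[ \t]*\n(?=\S)')

def join_broken_paragraphs(text: str) -> str:
    """Join lines that were probably split in the middle of a paragraph."""
    return BROKEN_LINE.sub(r"\1 ", text)

sample = "It was a dark and\nstormy night.\nThe rain fell."
print(join_broken_paragraphs(sample))
```

Blank lines between paragraphs are left alone, because the pattern only fires when the line break is followed directly by text.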
10-29-2009, 12:32 AM   #4
nomesque
Snooty Bestselling Author
Posts: 1,485
Karma: 1000000
Join Date: Aug 2009
Location: Ipswich, QLD, Australia
Device: PRS-650
Do you have the source files for the PDFs?
10-29-2009, 05:02 AM   #5
Direct Ebooks
Hibernian eBook Warrior
Posts: 184
Karma: 1264
Join Date: Aug 2009
Location: Cork, Ireland
Device: Sony Reader
Hi Guys.

Thanks for the advice.
Publishers present me with PDF/Quark files of their books and I then outsource their conversion to ePub.
The biggest challenge, I feel, will be converting the PDFs back to Word/text files.
Are any of the auto programs any good for this?

Thanks again
10-29-2009, 07:11 AM   #6
WillAdams
Wizard
Posts: 1,234
Karma: 3350652
Join Date: Feb 2008
Device: Amazon Kindle Paperwhite (300ppi), Samsung Galaxy Book 12
Don't start from the .pdfs --- instead use the Quark source.

Dump to XPress Tags or .html or some other sort of tagged format, then massage that, adding back in anything which wasn't in the main text flow (or get a specialized XTension/utility such as textractor).

PDFs convert the formatting into localized text changes and positional information which is difficult to extract. If you must use a .pdf as a source, use a utility such as Marcel Weiher's TextLightning.app which will analyze that positional information and then allow you to use global search-replace techniques to convert the local-formatting into proper styles.

William
10-29-2009, 09:21 AM   #7
wallcraft
reader
Posts: 6,975
Karma: 5183568
Join Date: Mar 2006
Location: Mississippi, USA
Device: Kindle 3, Kobo Glo HD
The only book on a similar subject I am aware of is Kindle Formatting: The Complete Guide. This is probably worth buying for anyone intending to format multiple ebooks. I don't remember if it discusses starting with PDFs though. Since this is somewhat Kindle-specific, there probably is room for a similar "ePub Formatting" book.
10-30-2009, 05:48 AM   #8
Chang
Connoisseur
Posts: 87
Karma: 50000
Join Date: Oct 2009
Device: none
I'm facing the same problems as Direct Ebooks. I have PDF documents as source files and I need an easy way to edit them. One option is MS Word, from which I can easily take the text into InDesign and create an e-book. The problem is the conversion from PDF to DOC. I checked TextLightning.app, which WillAdams mentioned, but it's Mac-only and I have Win XP. I searched Google for "pdf to doc converter", but most of the programs I found are shareware. I also looked for open source software on http://www.sourceforge.net but didn't find any. So far I have only found http://www.somepdf.com/downloads.html, which is free. I tried it, but I'm running into new problems with it.

As you can see in "pdf_sample.gif", there are two hyphens that simply tell the reader that the word continues on the next line. If I copy and paste those words manually into Notepad, the hyphens disappear and the words show correctly, but the line breaks are wrong, as you can see in "notepad_sample.gif".

When I use the Some PDF tool to convert the PDF to DOC, it leaves all the hyphens in and the words show incorrectly, as you can see in "word_sample.gif". I have to check every hyphen manually because some of them are necessary, so I can't just use find and replace to erase them all. The line break also sometimes leaves an extra space between the word parts, so some words have a hyphen followed by a space. That means I really need to check every case manually to see whether there is a hyphen alone or a hyphen plus a space.

The problem is that I must check either every hyphen or every line break manually. Both are very tedious for books with hundreds of pages. I'm using MS Word to set up a few styles and then exporting the DOC file to InDesign to create an e-book. Can you recommend any programs to ease this process, or do you have any other suggestions to make it easier?
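
One way to reduce the manual checking described above is to let a script join only those hyphenated fragments that form a known word, and list everything else for review. The following is a minimal Python sketch of that idea, not a tested tool: the tiny word set is a stand-in for a real dictionary, and the function name is made up. It handles both cases mentioned, a hyphen followed by a line break and a hyphen followed by one stray space.

```python
import re

# A word fragment, a hyphen, then either a line break (with optional spaces
# around it) or a single stray space, then the rest of the word.
HYPHEN_BREAK = re.compile(r"(\w+)-(?:[ \t]*\n[ \t]*| )(\w+)")

def dehyphenate(text, known_words):
    """Join split words found in known_words; collect the rest for review."""
    review = []

    def fix(match):
        joined = match.group(1) + match.group(2)
        if joined.lower() in known_words:
            return joined            # e.g. "conver- sion" -> "conversion"
        review.append(match.group(0))
        return match.group(0)        # leave uncertain cases untouched

    return HYPHEN_BREAK.sub(fix, text), review

# Stand-in dictionary; a real word list would be loaded from a file.
words = {"conversion", "documents"}
sample = "The conver- sion of docu-\nments is a well-known pre- and post-processing task."
cleaned, to_check = dehyphenate(sample, words)
print(cleaned)
print(to_check)
```

Hyphens inside a word with no break after them, such as "well-known", are never touched, so only the items in the review list need a manual look.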
Attached Thumbnails: word_sample.GIF (9.9 KB), notepad_sample.GIF (4.3 KB), pdf_sample.GIF (14.9 KB)
10-30-2009, 10:59 AM   #9
JSWolf
Resident Curmudgeon
Posts: 73,851
Karma: 128597114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by Direct Ebooks
Hi Guys.

I am interested in converting PDF files into ebooks.
What worked for me the last time I did it was to use Adobe Acrobat Professional to convert the PDF. Then you have to take the converted file and the PDF and carefully compare them. That is the only way to do it without ending up with a file full of errors.

But why not start with the source that was used to create the PDF?
10-30-2009, 02:58 PM   #10
charleski
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
Chang: going by the output you provide I wouldn't bother trying to get SomePDF to work properly. If it can't even handle hyphens correctly it's not worth using.

If you're looking for a free program, have you tried Mobipocket Creator? You can use that to convert a PDF to HTML, and from some brief tests it seems that it respects tags reasonably well. Tagged paragraphs that are not separated by a blank line are simply given a break tag at the end, which shows up as a manual line break in Word, but a simple search-replace is all that's needed to convert those back into paragraph marks. It also doesn't get confused by hyphens (as long as they're soft hyphens, which any decent PDF-creation program should use for words that are split at line breaks).
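
If you would rather clean the HTML directly than go through Word, the same search-replace can be sketched in Python. This assumes the converted HTML marks paragraph ends with plain br tags and uses the Unicode soft hyphen (U+00AD) inside split words. The exact markup Mobipocket Creator produces may differ, and the function name is hypothetical.

```python
import re

SOFT_HYPHEN = "\u00ad"  # invisible hyphenation hint, safe to drop in reflowable text
BREAK_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)

def breaks_to_paragraphs(html_fragment: str) -> str:
    """Drop soft hyphens and wrap each break-delimited run of text in <p>."""
    cleaned = html_fragment.replace(SOFT_HYPHEN, "")
    pieces = [piece.strip() for piece in BREAK_TAG.split(cleaned)]
    return "\n".join(f"<p>{piece}</p>" for piece in pieces if piece)

sample = "First para\u00adgraph.<br>Second one.<BR />"
print(breaks_to_paragraphs(sample))
```

Hard hyphens are deliberately left alone here, since those may be genuine.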

I wouldn't worry about ragged line ends such as the ones you show in notepad_sample.gif. You're creating reflowable text, and the reader will handle the line lengths when it lays out the eBook.
As I said before, a lot depends on whether the PDF was properly tagged when it was initially created. If it wasn't, then there's no magic program to help and nothing for it but to go through the text and correct it by hand.
11-02-2009, 03:03 PM   #11
Timoleon
Time Enough at Last
Posts: 387
Karma: 1151316
Join Date: Feb 2008
Location: New England
Device: iPad 3, iPhone 5, Kindle 3, Fire, Sony PRS-350
For a free solution you might try Book Designer. You'll need to clean up the output a bit, but after you do, save it as a LIT file and then convert that to an ePub file using calibre.
11-03-2009, 10:19 AM   #12
JSWolf
Resident Curmudgeon
Posts: 73,851
Karma: 128597114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by Direct Ebooks
Hi Guys.

Thanks for the advice.
Publishers present me with PDF/Quark files of their books and I then outsource their conversion to ePub.
The biggest challenge, I feel, will be converting the PDFs back to Word/text files.
Are any of the auto programs any good for this?

Thanks again
Use the Quark file and export it to HTML if possible and go from there. Using the PDF is going to be no end of hassle.