Old 05-21-2012, 12:36 PM   #1
ralphiedee
Zealot
Posts: 103
Karma: 1370
Join Date: Mar 2012
Device: none
pdf files to epub OR ibooks

I have a set of PDFs that I need to export to EPUB. The request was actually for this to be done in iBooks Author, but I will get to that after I explain. I'm no EPUB guru; I know only a little about this after doing a bunch of tests.

The client has the files as a PDF, and after looking at it, the PDF consists mostly of text. After splitting the PDF into a few separate PDFs, I began to test.

First in Pages, looking at the .epub file on an iPad:
both views are too small; you cannot read the text within the PDF.

Then in iBA (iBooks Author), looking at the .ibooks file on an iPad:
same thing here.

With that known, before I inform the client, I need to know the following:

Is there a way to extract elements of a PDF file, convert them into text, and preserve the styling?

I don't think so, which means plan B: have the client send the files in the original app they were created in, Microsoft Publisher. That is bad, as I heard that those files can only be opened in THAT app. Do you know of a way to convert that type of file into a Word doc?


Last, let's say the client gets me the files in either Word or another app like InDesign. Am I correct in assuming that the paragraph styles, images, fonts and any other elements must be saved and included when copying the files to either iBA or Pages?

If not, do I have to do this from scratch for each page?

Anyone?

RD
Old 05-21-2012, 01:44 PM   #2
DaleDe
Grand Sorcerer
Posts: 9,653
Karma: 5072002
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2
PDF is the worst source document format to convert. Can you get hold of the source documents in their original format? That would be much better. There should be a way to export even Microsoft Publisher files into some other format.
Old 05-21-2012, 02:40 PM   #3
ralphiedee
Zealot
Well, I figured it goes one of two ways. Even if the client can get me those files in Word or InDesign, I have to restyle the files for EPUB no matter what. I just finished copying and pasting text from one of the PDF pages I converted to MS Word. The client must understand that this will add a lot of time to the project, as I have to take the background image, make it a template in iBA or Pages (whichever the client chooses), then style each paragraph to get close to the PDF style.

If you know of an easier way let me know.

RD
Old 05-21-2012, 05:53 PM   #4
signum
Connoisseur
Posts: 58
Karma: 45332
Join Date: Aug 2011
Device: none
Quote:
Originally Posted by ralphiedee
Well, I figured it goes one of two ways. Even if the client can get me those files in Word or InDesign, I have to restyle the files for EPUB no matter what. I just finished copying and pasting text from one of the PDF pages I converted to MS Word. The client must understand that this will add a lot of time to the project, as I have to take the background image, make it a template in iBA or Pages (whichever the client chooses), then style each paragraph to get close to the PDF style.

If you know of an easier way let me know.

RD
Here's what works well for me: do the "heavy lifting" part of the conversion with Calibre and the final polishing with Sigil. Your favorite HTML editor can be handy also.

As stated before, PDF is the WORST format to try to convert from. However, one of the amazing capabilities of the Calibre program is its ability to convert many PDFs. Usually, I have no particular need for the library management and synching features of Calibre, so I use just the batch convert program, called ebook-convert. It's not very well-known, but is part of the Calibre package. Use this to convert from PDF to HTML. This is not a useless step because EPUB is just HTML+CSS. You can check your progress from time to time by looking at the HTML with your browser. Once you have this polished to your satisfaction, convert the HTML to EPUB using Sigil directly on the HTML. (Don't panic! This can take a while.) Remember to "save as" an EPUB file. Polish the EPUB a little more with Sigil. Proofread carefully, comparing to the original PDF, and you're done!
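For anyone who would rather script that first step than type it by hand, here is a minimal Python sketch of calling ebook-convert. It assumes calibre's command-line tools are installed and on the PATH. The function names are made up for illustration, and the .htmlz (zipped HTML) output extension is an assumption about what calibre accepts for HTML output; check the formats your calibre version lists before relying on it.

```python
import shutil
import subprocess


def build_convert_command(pdf_path, out_path):
    """Return the argument list for one ebook-convert run."""
    # ebook-convert takes the input file first, then the output file;
    # the output format is inferred from the output file's extension.
    return ["ebook-convert", str(pdf_path), str(out_path)]


def convert_pdf(pdf_path, out_path):
    """Run the conversion; return True only if ebook-convert ran and succeeded."""
    if shutil.which("ebook-convert") is None:
        return False  # calibre's command-line tools are not on PATH
    result = subprocess.run(build_convert_command(pdf_path, out_path))
    return result.returncode == 0
```

Calling convert_pdf("chapter1.pdf", "chapter1.htmlz") (hypothetical file names) would run one conversion. The result should be a zip archive, so unzip it to get at the HTML and CSS before opening them in a browser or in Sigil.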

The biggest problem with the PDF format is that it discards the entire document structure. Even paragraph boundaries are lost. Everything is just glyphs and graphics placed at x-y locations on some virtual paper. Even so, the Calibre people have done some amazing things to recognize most paragraphs, although not always perfectly. Graphic elements are saved with links to them in the text. Likewise, styling touches such as italics and bold are handled nicely. Even headings and indents are usually recognized. However, if there is more than one column of text or a multi-column table, you've got some work to do because these are converted in the order found in the PDF, that is, linearized. For instance, a two-column table might have all of column 1 converted first into appropriate HTML, followed by all of column 2. There's no way to tell from the PDF that it's even displaying a table. Newspaper columns sometimes have the opposite result: the lines of the columns are interleaved.
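To make the linearized-table case concrete, here is a small Python sketch that re-pairs cells which came out column by column. It is only an illustration: it assumes you have already collected the cell texts into a flat list and that you know the number of columns from looking at the PDF, since the converter cannot tell you. The function name is hypothetical.

```python
def rebuild_rows(cells, n_cols):
    """Turn cells listed column by column back into a list of rows."""
    if n_cols <= 0 or len(cells) % n_cols != 0:
        raise ValueError("cell count must be a multiple of the column count")
    n_rows = len(cells) // n_cols
    # Slice the flat list into one chunk per column...
    columns = [cells[i * n_rows:(i + 1) * n_rows] for i in range(n_cols)]
    # ...then read across the chunks to recover each row.
    return [list(row) for row in zip(*columns)]
```

For example, rebuild_rows(["a", "b", "c", "1", "2", "3"], 2) gives [["a", "1"], ["b", "2"], ["c", "3"]], which can then be written out as proper table rows in the HTML.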

One final observation. Proofreading takes much more time than any of the converting and editing steps mentioned above. At least you know you have something to work with that's close to what you want.
Old 05-22-2012, 01:45 AM   #5
huebi
Zealot
Posts: 121
Karma: 5070
Join Date: Dec 2010
Device: none
PDF is a format made not for exchanging data but for displaying data to humans in exactly the way the publisher wants it displayed. There is no structural formatting at all: headers are merely big and bold, and paragraphs are not bound together but only have margins to other objects.

Calibre produces shit, like every other converting tool. In that case it is better to strip all CSS rules and rebuild the book from scratch. Even after that, you have split paragraphs that belong together, soft hyphens that were not removed, misinterpreted italics, and so on and so on.
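Two of those leftovers, soft hyphens and paragraphs broken across lines, can be reduced with a few lines of Python before the manual pass. This is a rough sketch, not a fix for everything: it assumes plain text where a blank line marks a real paragraph break, and it will wrongly join a genuinely hyphenated compound that happened to break at a line end, so the result still needs proofreading. The function name is hypothetical.

```python
import re

SOFT_HYPHEN = "\u00ad"  # invisible "optional hyphen" character


def clean_extracted_text(text):
    """Remove soft hyphens and rejoin lines that belong to one paragraph."""
    # Drop soft hyphens left over from the PDF's line breaking.
    text = text.replace(SOFT_HYPHEN, "")
    # Rejoin words split with a hard hyphen at a line end: "conver-\nsion".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # A single newline becomes a space; blank lines stay as paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    return text
```

Run it on the text of each chapter, then compare against the PDF, paying particular attention to words that should have kept their hyphen.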

The best way of converting is to use an OCR program like ABBYY FineReader or OmniPage, export the text as .doc, open it with Atlantis Word Processor, check all misinterpreted sections, and then convert it to EPUB (Atlantis does have a suitable EPUB export). Even after that you need to check the result with Sigil or any other source-based tool.

There is NO easy process from PDF to EPUB, regardless of what the tool programmers promise.

The very best way is to have another format.
Old 05-22-2012, 07:05 AM   #6
ralphiedee
Zealot
Great info to know, as I'm a web developer and just got a client who has a lot of EPUB / iBooks projects.

thx

RD
Old 05-22-2012, 07:43 AM   #7
mrmikel
Book Twiddler
Posts: 2,086
Karma: 1444487
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
You might want to get clear in your own mind the capabilities of epub vs ibooks.

To my mind, EPUB is more for e-ink readers whereas iBooks files are for tablet computers. Both have their advantages and limitations, but they are not the same, though derived from the basic EPUB source. It's important to make sure the client does not imagine full-motion video or animation in e-ink books, for example.

PDF is a terrible source; even plain text is better because you don't have to rip anything out. As for the text size problem, you may be able to look at the EPUB in Sigil and see what is making the text small. That problem should be relatively easy to see for a web developer. However, rooting out hundreds or thousands of extra entries will test your patience and sanity. Hopefully regular expressions are part of your toolkit. They can reduce insanity to mere extreme aggravation!
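As one example of the kind of regular-expression cleanup meant here, the Python sketch below strips font-size declarations out of markup. It assumes the small text comes from font-size declarations scattered through the converted HTML, which is a guess about the cause rather than a diagnosis, and the function name is hypothetical. Try it on a copy of the file first.

```python
import re

# One "font-size: ...;" declaration, with or without the trailing semicolon.
FONT_SIZE_DECL = re.compile(r"font-size\s*:\s*[^;\"']+;?\s*", re.IGNORECASE)
# A style attribute that has been left with nothing inside it.
EMPTY_STYLE_ATTR = re.compile(r"\s+style=([\"'])\s*\1")


def strip_font_sizes(html):
    """Remove font-size declarations, then drop style attributes left empty."""
    html = FONT_SIZE_DECL.sub("", html)
    return EMPTY_STYLE_ATTR.sub("", html)
```

With the forced sizes gone, the reader's own default font size applies, and any sizing you do want can be added back deliberately in a single stylesheet.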
Old 05-22-2012, 08:25 AM   #8
Toxaris
Wizard
Posts: 3,011
Karma: 3594657
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
iBooks is not for tablet computers; it is for iPads. Not all tablets are Apple! On other tablets ePUB will work fine. iBooks is limited to Apple products.
iBooks is a derivative of the ePUB standard.
Old 05-22-2012, 09:06 AM   #9
pholy
Booklegger
Posts: 1,799
Karma: 7999034
Join Date: Jun 2009
Location: Toronto, Ontario, Canada
Device: BeBook(1 & 2010), PEZ, PRS-505, Kobo BT, PRS-T1, Playbook, Kobo Touch
Quote:
iBooks is a deviant of the ePUB standard.
Fixed that for you...
Old 05-22-2012, 01:39 PM   #10
DaleDe
Grand Sorcerer
Quote:
Originally Posted by pholy
Fixed that for you...
HaHa! However, iBooks is aimed toward the ePub 3 version, not the current version. It was a bit ahead of the finalizing of that version; it mostly conforms but is missing some ePub 3 features.

Dale
Old 05-27-2012, 10:44 PM   #11
JSWolf
Resident Curmudgeon
Posts: 37,693
Karma: 18475502
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Sony Reader PRS-650, iPad, nook STR
Quote:
Originally Posted by Toxaris
iBooks is not for tablet computers; it is for iPads. Not all tablets are Apple! On other tablets ePUB will work fine. iBooks is limited to Apple products.
iBooks is a derivative of the ePUB standard.
iBooks is also for iPhones, sort of. The stupid margins make it an abomination on an iPhone.
Old 05-27-2012, 11:53 PM   #12
Hitch
Bookmaker & Cat Slave
Posts: 2,504
Karma: 13870735
Join Date: Apr 2010
Location: Phoenix, AZ
Device: Kindle2, iPad, KindleFire and NookColor
Quote:
Originally Posted by ralphiedee
I have a set of PDFs that I need to export to EPUB. The request was actually for this to be done in iBooks Author, but I will get to that after I explain. I'm no EPUB guru; I know only a little about this after doing a bunch of tests.

The client has the files as a PDF, and after looking at it, the PDF consists mostly of text. After splitting the PDF into a few separate PDFs, I began to test.

First in Pages, looking at the .epub file on an iPad:
both views are too small; you cannot read the text within the PDF.

Then in iBA (iBooks Author), looking at the .ibooks file on an iPad:
same thing here.

With that known, before I inform the client, I need to know the following:

Is there a way to extract elements of a PDF file, convert them into text, and preserve the styling?

I don't think so, which means plan B: have the client send the files in the original app they were created in, Microsoft Publisher. That is bad, as I heard that those files can only be opened in THAT app. Do you know of a way to convert that type of file into a Word doc?


Last, let's say the client gets me the files in either Word or another app like InDesign. Am I correct in assuming that the paragraph styles, images, fonts and any other elements must be saved and included when copying the files to either iBA or Pages?

If not, do I have to do this from scratch for each page?

Anyone?

RD
You can export MSPublisher files as html, txt, PostScript, filtered html, Word, Works--PDF, for that matter--almost any format you can imagine. It doesn't work perfectly, but, trust me when I tell you, almost ANYTHING is better than trying to convert PDF into anything else. We do this for a living around here, and I've lost track of the number of so-called "Word" files we get from prospective clients (who know we charge more for PDF conversions than for conversions from any word-processor-type program) that are the result of so-called "conversion programs." I won't speak to Calibre; Kovid's done a lot of good work.

But every other "convert from PDF" program out there produces exactly what Huebi said it does. I opened two files this past week that gave me "bad juju" (in the original sense, not the new urban sense), and exported them out in html, only to be staring at every single word having not one, but two, spans: one span encompassing the first letter of the word (inexplicably--no formatting that would explain that) and another encompassing the remainder of the word. This is invariably the result of attempting to export PDF to either Word or html; I've yet to see any program function differently. You'll have to be prepared for a boatload of proofing in html or in Word, just for the text issues, and THEN in html, for the garbage-code issues. If you're doing this for yourself, that's fine--but for a client? You need to clean out that code, particularly if you're attempting to format for iBooks, which is notoriously finicky; those rampant wild spans will wreak havoc with any text formatting you intend to add or keep. Be prepared for a LOT of regex. Using iBooks Author won't change any of that.
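A sketch of what that regex work can look like, in Python: the function below unwraps spans that carry a known junk class and contain only text, so the two halves of each word rejoin. The class names in the example are hypothetical; real converter output will use its own names, so inspect the markup first and list only the classes that carry no formatting you want to keep. It also assumes the spans have a single double-quoted class attribute and nothing else.

```python
import re


def unwrap_spans(html, junk_classes):
    """Remove <span class="..."> wrappers for the given classes, keeping the text."""
    names = "|".join(re.escape(name) for name in junk_classes)
    # Match only spans whose content is plain text (no nested tags).
    pattern = re.compile(r'<span\s+class="(?:%s)"\s*>([^<]*)</span>' % names)
    previous = None
    # Repeat so that a junk span wrapping another junk span is unwrapped too.
    while previous != html:
        previous = html
        html = pattern.sub(r"\1", html)
    return html
```

For instance, unwrap_spans('<p><span class="c1">H</span><span class="c2">ello</span></p>', ["c1", "c2"]) returns '<p>Hello</p>'. Spans with other classes, and tags such as i or b, are left alone.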

Just my $.02,
Hitch
Old 05-28-2012, 12:31 PM   #13
JSWolf
Resident Curmudgeon
And one other thing you need to be prepared for is that you have to do a 100% A/B comparison to make sure your conversion has not caused any errors in the text. And from a PDF conversion there are errors in the text that you will have to fix.
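Part of that A/B comparison can be automated. The Python sketch below uses the standard difflib module to list the words that differ between two plain-text versions. It assumes you can get plain text out of both the PDF and the converted book, which is itself imperfect, so treat the output as a list of places to look at rather than a replacement for reading. The function names are hypothetical.

```python
import difflib


def normalize(text):
    """Collapse all runs of whitespace so line wrapping does not count as a change."""
    return " ".join(text.split())


def word_differences(original, converted):
    """Return (kind, original_words, converted_words) for each differing stretch."""
    a = normalize(original).split(" ")
    b = normalize(converted).split(" ")
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs
```

For example, word_differences("The quick brown fox", "The quick brovvn fox") returns [("replace", "brown", "brovvn")]. On whole books it is faster to compare chapter by chapter than in one pass.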
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.