05-27-2012, 11:53 PM #12
Hitch
Bookmaker & Cat Slave
Hitch ought to be getting tired of karma fortunes by now.
 
 
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by ralphiedee
I have a set of PDFs that I need to export to EPUB. The request was actually for this to be done in iBooks Author, but I will get to that after I explain. I'm no EPUB guru, and I know only a little about this after doing a bunch of tests.

The client has the files in a PDF, and after looking at it, I found the PDF consists mostly of text. After extracting the PDF into a few separate PDFs, I began to test.

First, in Pages, looking at the .epub file on an iPad:
both views are too small; you cannot see the text within the PDF.

Then in iBA, looking at the .ibooks file on an iPad:
same thing here.

With that known, before I inform the client, I need to know the following:

Is there a way to extract elements of a PDF file, convert them into text, and preserve the styling?

I don't think so, which means plan B: have the client send the files in the format of the original app they were created in, Microsoft Publisher. That is bad, as I heard that those files can only be opened in THAT app. Do you know of a way to convert that type of file into a Word doc?


Last, let's say the client gets me the files in either Word or another app like InDesign. Am I correct in assuming that the paragraph styles, images, fonts, and any other elements must be saved and included when copying the files into either iBA or Pages?

If not, do I have to do this from scratch for each page?

Anyone?

RD
You can export MS Publisher files as HTML, TXT, PostScript, filtered HTML, Word, Works, even PDF, for that matter: almost any format you can imagine. It doesn't work perfectly, but trust me when I tell you, almost ANYTHING is better than trying to convert PDF into anything else. We do this for a living around here, and I've lost track of the number of so-called "Word" files we get from prospective clients (who know we charge more for PDF conversions than for conversions from any WP-type program) that are really the output of so-called "conversion programs." I won't speak to Calibre; Kovid's done a lot of good work.

But every other "convert from PDF" program out there produces exactly what Huebi said it does. This past week I opened two files that gave me "bad juju" (in the original sense, not the new urban sense) and exported them to HTML, only to find myself staring at every single word wrapped in not one but two spans: one around the first letter of the word (inexplicably; there was no formatting that would explain it) and one around the rest of the word. This is invariably the result of attempting to export PDF to either Word or HTML; I've yet to see any program behave differently.

You'll have to be prepared for a boatload of proofing in HTML or in Word just for the text issues, and THEN in HTML for the garbage-code issues. If you're doing this for yourself, that's fine, but for a client? You need to clean out that code, particularly if you're formatting for iBooks, which is notoriously finicky; those rampant wild spans will wreak havoc with any text formatting you intend to add or keep. Be prepared for a LOT of regex. Using iBooks Author won't change any of that.
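Since cleaning up those doubled spans usually comes down to regex, here is a minimal Python sketch of what that kind of cleanup might look like. It assumes the split always shows up as two directly adjacent spans, the first holding a single letter and the second holding the rest of the word, and that neither span carries formatting worth keeping. The names `SPLIT_WORD`, `BARE_SPAN`, `unsplit_words`, `strip_bare_spans`, and `clean` are hypothetical. Converter output varies from file to file, so check the patterns against your own markup before running them over a whole book.

```python
import re

# Hypothetical pattern: two directly adjacent spans where the first wraps a
# single word character and the second wraps the rest of the word, e.g.
#   <span class="c1">W</span><span class="c2">ord</span>
# No whitespace is allowed between the spans, so separate words stay separate.
SPLIT_WORD = re.compile(r'<span\b[^>]*>(\w)</span><span\b[^>]*>(\w+)</span>')

# Hypothetical pattern: a span with no attributes wrapping plain text only.
BARE_SPAN = re.compile(r'<span>([^<]*)</span>')


def unsplit_words(html: str) -> str:
    """Rejoin words whose first letter was split off into its own span.

    Assumes both spans carry no formatting you need; their attributes are dropped.
    """
    return SPLIT_WORD.sub(r'\1\2', html)


def strip_bare_spans(html: str) -> str:
    """Unwrap spans that have no attributes and contain no nested tags."""
    return BARE_SPAN.sub(r'\1', html)


def clean(html: str) -> str:
    """Run both passes: rejoin split words, then unwrap leftover bare spans."""
    return strip_bare_spans(unsplit_words(html))


if __name__ == "__main__":
    sample = ('<p><span class="c1">T</span><span class="c2">his</span> '
              '<span class="c1">i</span><span class="c2">s</span> '
              '<span>messy</span></p>')
    print(clean(sample))  # prints: <p>This is messy</p>
```

Requiring the two spans to touch (nothing between `</span>` and the next `<span`) is what keeps a genuine one-letter word like "a" from being glued onto the word after it. Spans that do carry real styling are left alone, and you would still have to proof the result by eye.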

Just my $.02,
Hitch