Old 05-04-2013, 03:53 PM   #1
crashnburn
Groupie
Posts: 154
Karma: 2160280
Join Date: Jul 2009
Device: iPad1 iOS 5.1.1b, iPhone 4
Steps/best practices for converting PDFs to ePUBs? Thoughts and Ideas

To keep this knowledge and information in its own topic, I am creating a fresh thread here. Please post replies and ideas relating to the conversion discussion here.

ORIGINAL INFO:
I have some PDFs that I am reading and highlighting in GoodReader. I'd like to convert them to ePub so that I can read and highlight them in Marvin instead.

PS: I am posting additional details, thoughts, and questions relating to my original post, along with what I have learned from your replies. Please check them out and share your thoughts.

UPDATED INFO:
- I understand that PDFs vary widely, and that the structural and placement-related markers inside them can make conversion complex, sometimes next to impossible.
- If these are scanned books/pages, then OCR conversion is another major step.
- If the text and graphics are placed in various boxes in weird ways, god help us.

BUT:
IF the PDF is fairly simply structured and has a single-column flow of text (with selectable text, so no OCR is needed), where structure and markers are relatively simple...
...
WHICH TOOLS and STEPS would you suggest?

I have one such PDF. It looks almost like a Word document saved as a PDF.

Quote:
Originally Posted by crashnburn
Is there a thread/ location/ tutorial that outlines steps/ best practices for converting PDFs to ePUBs?

I have some PDFs that I am reading & highlighting in Good Reader I'd like to push to ePub so that I could do so in Marvin instead.

I am sure faterson has expertise on this. Wondering if there is a thread/ tutorial / steps that are a recommended read.
Quote:
Originally Posted by Faterson
PS: If, despite the above warning, you decide to go ahead, I recommend not to use Calibre for PDF conversion. I've had better results using the old Mobipocket Reader (killed by Amazon similarly to Stanza). Here is the install file for Windows. The result of the automatic, one-click conversion will still be very poor, but that's just the way it is.

For optimal quality of conversion from PDF to EPUB, you need to sacrifice all those extra hours of manual work, and use top-quality OCR software such as FineReader. I create a HTML file in FineReader, then fine-tune that HTML file by manual coding (using the EditPlus plain-text editor for Windows) until its code is approved by W3C's validator. No fluff must be left inside the file -- the CSS must be minimalistic. Finally, I convert the HTML file to EPUB in Calibre, and that's it.
Quote:
Originally Posted by Faterson
I indeed have expertise on that, and that expertise says: don't bother!

It's sad but true. My daily reading is split roughly evenly between Marvin and GoodReader, precisely because converting PDFs to EPUBs is often a hopeless undertaking. And, many books (especially old, scanned editions) are only available as PDF files -- or the EPUB versions of the same texts are of such ridiculously bad quality, they are unreadable. (Yes, I'm talking about you, archive.org.)

Only this weekend, I was converting a short novel (novelette), 25 thousand words, 45 pages of PDF source file, from PDF to EPUB, precisely so that I could enjoy reading it in Marvin, rather than GoodReader.

I used the best available OCR software for the conversion, which is FineReader.

Even so, it took me nearly 5 hours (!) to convert the PDF file so that I was satisfied with the EPUB result. It's just impractical. However, this was a novelette I deeply cared about, so I was willing to sacrifice the 5 hours of my time for the conversion. I would, of course, not be ready to do that on a regular basis, because my remuneration for the work was exactly 0 cents. The only reward I'll get will be the pleasure of reading that file in Marvin. Hell, that's enough for me (in this special case).
Quote:
Originally Posted by Jessica Lares
I will add to that too and also give the same opinion. PDFs are usually designed to be printed and are made in programs like InDesign, Quark, and Acrobat which pretty much work as WYSIWYG (what you see is what you get) editors.

Most of the text is done in individual boxes, one for the heading, one for each paragraph, column, etc. And they're layered, so you're just hoping that the writer did add them one after another, which is never the case. This becomes apparent when you're making selections and something else is being highlighted.

Stick any PDF document into Adobe's Acrobat editor, and you literally see how awful the setup is.

I would think OCR would work better with a flattened image, as long as it was 300dpi or more.
Thoughts?

Last edited by crashnburn; 05-04-2013 at 03:58 PM.
Old 05-04-2013, 04:03 PM   #2
crashnburn
Attempt 1:

I opened this PDF in Acrobat XI. As I move the text cursor down, it moves up and down through the few pages almost as it would in a Word document.

The PDF has no security/locks on it.

Save As Other > Word and Word 03: both fail with this message:

Save As failed to process this document.

I will try Bluebeam PDF as well.
Old 05-04-2013, 04:09 PM   #3
crashnburn
Bluebeam PDF was able to start the conversion but then threw an error, which shows some information about how the PDF was being converted.

Code:
Description:
  Stopped working

Problem signature:
  Problem Event Name:	CLR20r3
  Problem Signature 01:	bluebeam.exporter.exe
  Problem Signature 02:	9.0.4241.25966
  Problem Signature 03:	4e45aa32
  Problem Signature 04:	SolidFramework
  Problem Signature 05:	7.0.1285.0
  Problem Signature 06:	4d06172a
  Problem Signature 07:	2be
  Problem Signature 08:	e1
  Problem Signature 09:	System.InvalidOperationException
  OS Version:	6.1.7601.2.1.0.256.1
  Locale ID:	1033

Read our privacy statement online:
  http://go.microsoft.com/fwlink/?linkid=104288&clcid=0x0409

If the online privacy statement is not available, please read our privacy statement offline:
  C:\Windows\system32\en-US\erofflps.txt
Old 05-04-2013, 04:15 PM   #4
crashnburn
OK. I installed Mobipocket Reader as per your suggestion. I can open the PDF in it, but how do I get an ePub or Mobi out of it that I can put into Marvin?
Old 05-04-2013, 05:31 PM   #5
DSpider
Evangelist
Posts: 450
Karma: 343115
Join Date: Nov 2009
Location: Romania
Device: PW2 2014
Get the latest version of FineReader, OCR the PDF, proofread it, save it as HTML, and import it into Sigil. Or, I guess that with the latest version (FineReader 11) you could export it directly as an ePub, but the styles will probably be all over the place.

I would export it as DOC/DOCX, run my custom Word macro to "dumb" the text down to its core components (bolds, italics, etc.), and then either redo the layout in InDesign, or apply quick styles in Word, save as Filtered HTML, and import that into Sigil. Hmmm... There are a few routes and software programs that you can use, but just make sure that you proofread the final product. Proofreading is damn important.

Oh, and one last thing: FineReader will OCR the PDFs as a bunch of images, so the images that it saves as "Pictures" are pretty much screenshots of a screenshot, so to speak. You're probably better off exporting the images from Adobe Acrobat or something like that, so that you don't lose quality.
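
For anyone without a Word macro, the "dumb it down" step can also be done on the exported HTML. Below is only a rough sketch in Python (standard library only), not the macro mentioned above: the list of tags to keep and the function name `dumb_down` are assumptions made for illustration. It drops every attribute and every tag that is not in the keep list, while preserving the text.

```python
from html import escape
from html.parser import HTMLParser

# Tags kept in the output (an assumed whitelist; adjust to taste).
# Any other tag is dropped, but the text inside it is preserved.
KEEP = {"p", "b", "i", "em", "strong", "h1", "h2", "h3", "br"}
# Tags whose content is discarded entirely.
SKIP_CONTENT = {"style", "script", "head"}


class CoreMarkupFilter(HTMLParser):
    """Re-emit only whitelisted tags, without any attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
        elif tag in KEEP and not self.skip_depth:
            # Attributes (class, style, ...) are intentionally thrown away.
            self.out.append("<br/>" if tag == "br" else f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in KEEP and tag != "br" and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape &, < and > so the output stays valid markup.
            self.out.append(escape(data, quote=False))


def dumb_down(html_text):
    """Return html_text reduced to the whitelisted tags and plain text."""
    parser = CoreMarkupFilter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    sample = '<p class="x" style="color:red"><span style="font-size:12pt">Hello <i>world</i></span></p>'
    print(dumb_down(sample))
```

The result still needs proofreading and restyling in Sigil; the sketch only removes the styling clutter so that there is less to clean up by hand.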
Old 05-04-2013, 05:52 PM   #6
Byrdie
Walking Library
Posts: 253
Karma: 3869938
Join Date: Sep 2012
Location: Canada
Device: Kobo Libra H20, Kindle Paperwhite 5, 16 gb version
I just got PDFtoEpub for free here: http://www.pdftoepub.com/authorspromotion.asp and so far it has done very well converting some books I had that even Mobipocket Reader/Creator couldn't handle. The resulting ePubs may still need a few corrections in Sigil to fix things like spacing, the infamous "missing" apostrophes, and, on occasion, apostrophes that somehow get switched to reversed quotation marks. To be fair, many programs do that when converting .pdf files, not just this one. So far I like it; you may want to give it a try.
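
The apostrophe cleanup can be partly scripted before opening the file in Sigil. This is just a sketch in Python under some assumptions: the set of "wrong" characters and the short contraction list are guesses for illustration, not something taken from any particular converter. A quote-like mark sitting between two letters gets replaced with a proper apostrophe, while missing apostrophes are only flagged for manual review, because fixing them blindly is risky.

```python
import re

# Quote-like characters that may replace an apostrophe inside a word.
# This set is an assumption; adjust it to what your converter produces.
WRONG_MARKS = "\u2018\u201b`\u00b4"

# A wrong mark with a letter directly before and directly after it.
INSIDE_WORD = re.compile(rf"(?<=[^\W\d_])[{WRONG_MARKS}](?=[^\W\d_])")

# A few contractions that are unambiguous without their apostrophe
# (an illustrative list, deliberately short).
LIKELY_MISSING = re.compile(
    r"\b(dont|didnt|doesnt|isnt|wasnt|couldnt|wouldnt)\b", re.IGNORECASE
)


def fix_apostrophes(text):
    """Replace wrong marks between two letters with U+2019."""
    return INSIDE_WORD.sub("\u2019", text)


def flag_missing(text):
    """List words that probably lost their apostrophe, for manual review."""
    return [m.group(0) for m in LIKELY_MISSING.finditer(text)]


if __name__ == "__main__":
    print(fix_apostrophes("don\u2018t stop"))
    print(flag_missing("I dont know, isnt it?"))
```

Opening quotation marks at the start of a word are left alone because they have no letter in front of them.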
Old 05-04-2013, 07:50 PM   #7
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
I use PDF Converter Professional by Nuance to convert PDF to Word, and then Atlantis can easily convert Word to ePub.

Dale
Old 05-07-2013, 09:12 AM   #8
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
pdftoepub does not allow redistribution.
Old 05-07-2013, 12:11 PM   #9
JSWolf
Resident Curmudgeon
Posts: 79,665
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
The only way to convert PDF > ePub properly is to A/B compare the ePub to the PDF. That means every character, every space, every punctuation mark, every graphic, every bold, every italic, and anything else in the PDF. There is no way to convert a novel-length PDF that won't introduce errors.

Yes, it will take time to properly A/B. But that's the only way to do it.
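
A script can narrow down where to look, although it does not replace the manual A/B pass. The sketch below (Python, standard library `difflib`) assumes the text of both files has already been extracted into plain strings by some other means; the function name `ab_differences` is made up for illustration. It compares word by word, so it catches wrong characters and dropped or added words, but not spacing, graphics, bold, or italics.

```python
import difflib


def ab_differences(pdf_text, epub_text):
    """Return (operation, pdf_words, epub_words) for every mismatch.

    Both arguments are plain text; how they were extracted from the PDF
    and the ePub is outside the scope of this sketch.
    """
    a = pdf_text.split()
    b = epub_text.split()
    # autojunk=False keeps difflib from ignoring very frequent words
    # in long texts.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    diffs = []
    for op, a1, a2, b1, b2 in matcher.get_opcodes():
        if op != "equal":
            diffs.append((op, " ".join(a[a1:a2]), " ".join(b[b1:b2])))
    return diffs


if __name__ == "__main__":
    for diff in ab_differences("It was the best of times",
                               "It was tbe best of times"):
        print(diff)
```

Each reported tuple points at a spot worth checking by eye in both files.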
Old 05-07-2013, 05:23 PM   #10
mrmikel
Any conversion from columns is especially difficult because it can go along being perfect for 10-20 pages and then jumble things up, putting parts of paragraphs above or below where they belong, in a way that is invisible to casual scanning. It gets especially bad near any illustration or small block of text.

Removing extra colors and font sizes that were not in the original text is another devil's playground, since the search-and-replace operations involved can go wrong too.
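
One way to make that kind of search and replace a bit less error-prone is to touch only the declarations inside inline style attributes instead of running a broad pattern over the whole file. A rough Python sketch, assuming the converter wrote the colors and sizes as double-quoted `style="..."` attributes (styles in a CSS file or in single quotes are not handled); the function name is hypothetical:

```python
import re

# Matches a double-quoted inline style attribute, including the
# whitespace in front of it.
STYLE_ATTR = re.compile(r'\sstyle="([^"]*)"')
# CSS properties to remove; anything else in the attribute is kept.
UNWANTED = {"color", "font-size"}


def strip_color_and_size(html_text):
    """Remove color and font-size declarations from inline styles."""

    def clean(match):
        kept = []
        for decl in match.group(1).split(";"):
            prop = decl.split(":", 1)[0].strip().lower()
            if decl.strip() and prop not in UNWANTED:
                kept.append(decl.strip())
        if not kept:
            # Nothing left: drop the whole style attribute.
            return ""
        joined = "; ".join(kept)
        return f' style="{joined}"'

    return STYLE_ATTR.sub(clean, html_text)


if __name__ == "__main__":
    print(strip_color_and_size('<p style="color: #000; font-size: 12pt">x</p>'))
```

Because the property name is compared exactly, something like `background-color` is left untouched. It is still worth diffing the file before and after.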