Old 04-06-2019, 11:59 AM   #1
Ubiquity
Member
Posts: 14
Karma: 10
Join Date: Apr 2019
Device: Android phone
Most accurate method to convert PDF -> ePub

Hello, could you please suggest the best way to convert a retail PDF ebook (i.e. not scanned pages stored as bitmaps, but true selectable text) to ePub? I tried Calibre's built-in converter, but the result isn't satisfactory: the resulting ebook contains nonsense characters in places and seems to be missing parts of the text. I also tried ABBYY PDF Transformer, but the result was even worse. Of course, I need a readable ebook that reflows the text and preserves the original formatting and structure as closely as possible.
Old 04-06-2019, 05:20 PM   #2
BetterRed
null operator
Posts: 14,009
Karma: 11423372
Join Date: Mar 2012
Location: Sydney Australia
Device: none
In my experience there's no single most accurate method; one invariably has to do some post conversion editing.

Some folks are prepared to spend time tweaking and trying different converters to find the best, yet still imperfect, settings for individual PDFs.

Others use one or two methods to do 'rough and ready' conversions and then deal with issues such as broken paragraphs, botched ligatures etc. in the output, using saved searches, epub editor plugins, add-ons for Word etc. I'm one of these.
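As an illustration of the ligature cleanup mentioned above, here is a minimal Python sketch. It assumes the converter left the ligatures in the text as Unicode ligature characters (U+FB00 to U+FB06, such as the single-character "fi"); ligatures that were turned into unrelated garbage characters still need a manual search and replace. The function name `expand_ligatures` is made up for this example and is not part of Calibre or any editor plugin.

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    """Replace Unicode ligature characters (U+FB00..U+FB06) with plain letters.

    Hypothetical helper for illustration only. NFKC normalization is applied
    to the ligature characters alone, so other characters (accents, fractions,
    typographic quotes) are left exactly as they were.
    """
    return "".join(
        unicodedata.normalize("NFKC", ch) if "\ufb00" <= ch <= "\ufb06" else ch
        for ch in text
    )


# "\ufb01" is the "fi" ligature, "\ufb02" is "fl", "\ufb03" is "ffi".
print(expand_ligatures("The \ufb01rst \ufb02oor o\ufb03ce"))  # The first floor office
```

Restricting the normalization to the ligature range is a deliberate choice here: running NFKC over the whole text would also rewrite characters such as fraction glyphs, which is usually not wanted in a book.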

BR
Old 04-10-2019, 08:41 AM   #3
willus
Fuzzball, the purple cat
Posts: 1,057
Karma: 8500441
Join Date: Jun 2011
Location: California
Device: iPad
My page on the topic.
Old 04-10-2019, 07:52 PM   #4
Pablo
Guru
Posts: 959
Karma: 4100001
Join Date: Mar 2009
Location: Rosario, Argentina
Device: SONY PRS-505, PRS-T2
Quote:
Originally Posted by willus View Post
My page on the topic.
Interesting page! Thank you.
Old 04-10-2019, 07:56 PM   #5
Pablo
Guru
Posts: 959
Karma: 4100001
Join Date: Mar 2009
Location: Rosario, Argentina
Device: SONY PRS-505, PRS-T2
My usual workflow:

1. Crop with Briss to do away with headers and footers, if necessary.
2. Convert with Mobipocket Creator.
3. Load the HTML with Sigil.
4. Edit for endless hours.

On willus' page you can find links to the Briss download page. The Mobipocket Creator site is down for good, but I suppose you can get the installer somewhere else if you look hard enough.

I have yet to try loading a PDF into Word to see how it converts, as willus suggests.

Problems normally found when converting pdf files:

1. Broken paragraphs, especially at page breaks.
2. Lost scene changes.
3. Lost formatting: italics, bold, font size changes.
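The first problem in the list, paragraphs broken at page breaks, can be partly repaired with a simple heuristic. The sketch below is an assumption about one way to do it, not something taken from Sigil or Briss: a paragraph is glued to the previous one when the previous one does not end in sentence-closing punctuation and the new one starts with a lowercase letter. The name `rejoin_paragraphs` and the punctuation set are illustrative choices.

```python
import re

# Sentence-closing punctuation, optionally followed by closing quotes or brackets.
# The exact character set is an assumption and may need tuning per book.
SENTENCE_END = re.compile(r"[.!?:][\"')\]\u201d\u2019]*$")


def rejoin_paragraphs(paragraphs):
    """Merge a paragraph into the previous one when the break looks accidental.

    Hypothetical helper: takes a list of paragraph strings (for example the
    text of each <p> element) and returns a new list with suspicious breaks
    removed. Empty paragraphs are dropped.
    """
    merged = []
    for para in paragraphs:
        para = para.strip()
        if not para:
            continue
        previous_is_open = bool(merged) and not SENTENCE_END.search(merged[-1])
        if previous_is_open and para[0].islower():
            merged[-1] = merged[-1] + " " + para
        else:
            merged.append(para)
    return merged


fixed = rejoin_paragraphs(["He walked to the", "door and opened it.", "Then he left."])
print(fixed)  # ['He walked to the door and opened it.', 'Then he left.']
```

A heuristic like this will miss breaks that happen to fall after a full stop and cannot restore lost scene changes or formatting, so the result still has to be checked by eye.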

Good luck!

Last edited by Pablo; 04-10-2019 at 08:04 PM.
Old 04-11-2019, 06:58 PM   #6
DaleDe
Grand Sorcerer
Posts: 11,239
Karma: 12666678
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
One of the great things about converting to Word, either by importing into Word or by using Acrobat (which can now convert to Word), is that you are left with a file that is easy to edit and whose styles automatically become CSS when it is converted to ePub. Word itself is page-oriented, so it is easy to compare against the PDF, and yet the text in Word reflows easily as well.

Dale
Old 04-14-2019, 02:17 PM   #7
Hitch
Bookmaker & Cat Slave
Posts: 7,764
Karma: 75464863
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, and NookColor. 2 Droid, 1 Win8 ePUB rdrs
Quote:
Originally Posted by DaleDe View Post
One of the great things about converting to Word, either by importing into Word or by using Acrobat (which can now convert to Word), is that you are left with a file that is easy to edit and whose styles automatically become CSS when it is converted to ePub. Word itself is page-oriented, so it is easy to compare against the PDF, and yet the text in Word reflows easily as well.

Dale
BUT...if that really worked, commercial firms like mine would do that. And we don't.

To this day, we still use ABBYY FineReader, and "convert" the PDF via scanning & OCR. Now...that's a lot of time, effort and money to spend, if "save as Word" genuinely worked worth a damn.

The cruft that's created underneath, using ANY "save as Word" or one of the ubiquitous websites that all pretty much use Calibre's API, is mind-boggling.

In short, IMHO, there's no "good" way to convert a PDF to ePUB. It's a lot of steps and a lot of work.

Hitch
Old 04-14-2019, 03:11 PM   #8
Tarana
Wizard
Posts: 2,344
Karma: 20302328
Join Date: Sep 2012
Location: Minneapolis
Device: Voyage, K3, PPW 2 3, HDX, KBasic 5, 7 & 8, Nook Glo3, Echos, Nanos
I just have to say that I'm really surprised that ABBYY Transformer did a crappy job. I use that all the time to convert pdf files created from scanned books (I can't read paper books any longer) which are nearly all novels. Is what you are converting a textbook or something with lots of graphics? Cookbooks often don't convert well, for instance with all those fractions and formatting complexities.
Old 04-24-2019, 02:54 PM   #9
Toxaris
Wizard
Posts: 4,521
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
As far as I know, there is only one real method: OCR (preferably with ABBYY), edit the outcome to the fullest either in a word processor or as ePUB, and correct *ALL* OCR errors, including missing commas and the like. Proofread the book at least 3 times and then you will have caught most errors.

Now, my Word add-in can help you a lot in catching OCR errors, but for sure not all (unless the source is superb, which it never is).
Old 04-25-2019, 09:19 AM   #10
Hitch
Bookmaker & Cat Slave
Posts: 7,764
Karma: 75464863
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, and NookColor. 2 Droid, 1 Win8 ePUB rdrs
Quote:
Originally Posted by Toxaris View Post
As far as I know, there is only one real method: OCR (preferably with ABBYY), edit the outcome to the fullest either in a word processor or as ePUB, and correct *ALL* OCR errors, including missing commas and the like. Proofread the book at least 3 times and then you will have caught most errors.

Now, my Word add-in can help you a lot in catching OCR errors, but for sure not all (unless the source is superb, which it never is).
And that is exactly right. We've NEVER found a better way, and God knows, I wish we could. It's onerous to have to do that for every bloody book that shows up at my shop, in PDF, but...there is no faster, easier, and MORE ACCURATE way than that. To get all three, you're stuck with scanning & OCR.

Hitch
Old 04-25-2019, 10:44 AM   #11
JSWolf
Resident Curmudgeon
Posts: 52,799
Karma: 47787659
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Aura H2O, Sony PRS-650, Sony PRS-T1, nook STR, iPad 4, iPhone 5
There is only one accurate way to convert a novel-sized PDF > ePub. That is to convert it however you want. Here comes the accurate part. You have to compare the PDF to the ePub. You have to compare every space, every punctuation mark, every word, everything to make sure the ePub matches the PDF. That's the most accurate way of converting a PDF. There is no automated way to do it. You have to do the comparing no matter how you convert.
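A script cannot do the proofreading, but it can point at the places where the two texts disagree. The following is a rough Python sketch using the standard `difflib` module. It assumes you have already extracted plain text from both the PDF and the ePub by some other means, and the function name `word_differences` is invented for this example. Because it splits on whitespace, it ignores spacing differences, so it narrows the search rather than replacing the comparison described above.

```python
import difflib


def word_differences(pdf_text: str, epub_text: str):
    """List every word-level mismatch between two plain-text strings.

    Hypothetical helper. Returns tuples of (operation, words_in_pdf,
    words_in_epub), where operation is 'replace', 'delete' or 'insert'
    as reported by difflib.SequenceMatcher.
    """
    a, b = pdf_text.split(), epub_text.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# A typical OCR-style slip: "brown" came out as "brovvn".
for change in word_differences("The quick brown fox", "The quick brovvn fox"):
    print(change)  # ('replace', 'brown', 'brovvn')
```

An empty result only means the word sequences match; formatting such as italics, scene breaks and paragraph boundaries still has to be checked against the PDF by hand.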
Old 04-25-2019, 04:51 PM   #12
Hitch
Bookmaker & Cat Slave
Posts: 7,764
Karma: 75464863
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, and NookColor. 2 Droid, 1 Win8 ePUB rdrs
Quote:
Originally Posted by JSWolf View Post
There is only one accurate way to convert a novel-sized PDF > ePub. That is to convert it however you want. Here comes the accurate part. You have to compare the PDF to the ePub. You have to compare every space, every punctuation mark, every word, everything to make sure the ePub matches the PDF. That's the most accurate way of converting a PDF. There is no automated way to do it. You have to do the comparing no matter how you convert.
Wolfie:

Yes, pretty much, although we do our comparing BEFORE we make the ePUB. SSDD, though. The comparison has to be done, you are absolutely right about that.

Hitch
Old 04-29-2019, 01:28 PM   #13
lumpynose
Guru
Posts: 938
Karma: 4444444
Join Date: Jul 2012
Device: Palm Pilot M105
Quote:
Originally Posted by willus View Post
My page on the topic.
Thanks for the link to Sumatra. I used to install an alternative to Adobe's Reader but never found one I really liked.

The current size of Adobe Reader DC on Windows 64 is 306 MB. I had no idea it was so bloated. (Not that it really matters these days with today's drive sizes, but let's get serious.)
Old 04-30-2019, 12:19 PM   #14
Hitch
Bookmaker & Cat Slave
Posts: 7,764
Karma: 75464863
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, and NookColor. 2 Droid, 1 Win8 ePUB rdrs
I wanted to add an interesting experience here, that I just dealt with, yesterday.

A longish story as short as possible--I was contacted by a prospective client, who had typed his Opus on a Brother dedicated WP, back in the mid-'90s. Had it in print, so had that scanned, a few years back. Of course, it turned out that the PDF is an image-only PDF, with no text layer.

I'd started to tell the client that we'd have to OCR it again, yadda, but...on a flyer, I ran it through OCR, in Acrobat Pro. Then, I exported the new PDF, which now had a text layer, to Word.

And I'll be damned, but the resulting file is NOT horrible. I mean, with a modicum of cleanup--not beyond the regular person--it could be entirely usable. I was pretty gobsmacked because the source PDF was not wonderful. It wasn't the worst I've ever seen (a scanned copy of a multiply-faxed document--that was the worst), but it wasn't crisp, either, and the pages were not wildly straight. But it worked, and the resulting Word file was not bad at all.

So...there are, sometimes, shortcuts to the ABBYY scan/OCR process that can work. I would not have ever thought that they existed; in a decade, I've never seen it work before, but it did this time. I would then suggest that you at least try the shortcut methods, to see if you can pull one out of the hat, too. It's worth the 5-10 minutes of time, compared to what the longer routes take.

Hitch