|02-23-2013, 01:39 PM||#16|
Join Date: Nov 2009
Device: PW2 2014
That's what happens with PDF files that contain text (FineReader converts them into images), so I'm a bit sceptical that it somehow extracts the JPGs from the PDF without further processing...
|02-23-2013, 03:44 PM||#17|
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
It is a structurally different PDF, so I would not be surprised. The tools to extract the images/JPGs from a PDF are open source and easy to get. Why would ABBYY not incorporate those algorithms? It is not that hard.
|01-23-2014, 05:41 PM||#18|
Join Date: Nov 2010
Device: Kobo Aura HD, Sony PRS (T1,T2), PocketBook 902
Therefore I do this:
OCR in FineReader -> save as odt -> open in OpenOffice -> run PerfectEpub (after possibly other cleaning with regex find/replace, etc.) -> writer2ePub (or save as odt or as html and then use Calibre converter - whichever works better) -> SIGIL (where you again can do regex find/replace, merge/split if necessary, etc.)
However it is better to get rid of any page numbers / headers before PerfectEpub.
FineReader 11 is pretty good at recognizing headers/footers so they are not much of a problem.
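For anyone who wants to script the page-number cleanup before running PerfectEpub, here is a minimal sketch in Python. It assumes the text has been exported as plain text and that a page number sits alone on its line, either bare ("12") or wrapped in dashes ("- 12 -"). The function name and the pattern are illustrative only; they are not part of FineReader or PerfectEpub.

```python
import re

# Assumption: a page number is a line holding only 1-4 digits,
# optionally wrapped in dashes (e.g. "12" or "- 12 -").
PAGE_NUMBER_LINE = re.compile(r"^[ \t]*(?:-[ \t]*)?\d{1,4}(?:[ \t]*-)?[ \t]*$")


def strip_page_numbers(text: str) -> str:
    """Drop lines that look like bare page numbers; keep everything else."""
    kept = [line for line in text.splitlines() if not PAGE_NUMBER_LINE.match(line)]
    return "\n".join(kept)


sample = "Chapter 1\n12\nIt was 1984 then.\n- 13 -\nThe end."
print(strip_page_numbers(sample))
```

Note that a line containing nothing but a year (say "1984") would also be dropped, so check the result before going further down the chain.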
PerfectEpub joins wrongly split lines (paragraphs) with one click and also splits wrongly joined lines, etc. I don't understand why FineReader can't do this itself, though. If it can, I need to find out how...
I use PerfectEpub on already made epubs and other formats too, if they have wrongly split lines or wrongly joined lines in them.
For an epub, I do this: epub -> htmlz -> extract files -> open in OpenOffice -> run PerfectEpub -> save back to html (or run writer2ePub)
When the information about the original pages is no longer available, the line joining / splitting can be done directly in Sigil with regex find/replace, but it takes several regular expressions, and they differ for pretty much every epub, so PerfectEpub is a much quicker solution.
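To give an idea of what such a regex pass looks like, here is a rough sketch in Python. It is not what PerfectEpub does internally (that is not known here); it only applies one common heuristic, which is an assumption: a line break is treated as a wrong split when the line does not end in sentence-closing punctuation and the next line starts with a lowercase letter. Words hyphenated across the break are glued back together first. All names are made up for the example.

```python
import re

# Assumption: "word-\nrest" is one word broken across two lines.
HYPHEN_SPLIT = re.compile(r"(?<=[a-z])-[ \t]*\n[ \t]*(?=[a-z])")

# Assumption: a break is wrong if the text before it does not end in
# closing punctuation (or whitespace) and the next line starts lowercase.
WRONG_SPLIT = re.compile(r'(?<![.!?:;")\s])[ \t]*\n[ \t]*(?=[a-z])')


def join_split_lines(text: str) -> str:
    """Rejoin paragraphs that were split in the middle of a sentence."""
    text = HYPHEN_SPLIT.sub("", text)   # "con-\nverted" becomes "converted"
    return WRONG_SPLIT.sub(" ", text)   # "brown\nfox" becomes "brown fox"


sample = (
    "The quick brown\n"
    "fox jumped over the con-\n"
    "verted fence.\n"
    "A new paragraph starts here."
)
print(join_split_lines(sample))
```

The heuristic will glue together genuinely hyphenated words ("well-\nknown") and will miss splits where the next line starts with a capitalized name, which is exactly why the expressions end up different for each book.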
Last edited by parkher; 01-23-2014 at 05:47 PM.
|04-19-2014, 01:40 AM||#19|
Join Date: Mar 2014
Device: Kobo Aura
I have tried so many PDF to epub converters that I despaired of trying one more. However, after reading up on the internet, I decided to look up ABBYY FineReader and, lo and behold, they actually have a direct PDF to epub converter called ABBYY PDF Converter.
After trying the trial (converts 100 pages max), I decided to plonk down the money for the full version.
End result: I think I converted more than 40 books and ONLY one didn't convert properly. Most converted with graphics intact, and more than 30% had their TOC links done properly!
I regret spending all that time with all those other PDF to epub converters; you name it, I have tried it (Calibre, EPUB Converter, Doremisoft, 3DPageFlip, Vibosoft, PDFMate, iStonsoft, Go4ePub, which is an online site...). For some reason a lot of them look suspiciously alike, so either they share the same underlying product and just customized the look and feel, or some of them pirated it from a main source and spun it off on their own.
The reason I gave ABBYY a chance is because they're a reputable OCR software vendor too so I figured if they can do OCR well, they surely can do something about the pesky PDF internal structure/markups.
I have absolutely no business interest in ABBYY; I only wonder why they didn't market this product better. If you guys are still vexed by the conversion, you should take a look.
|04-22-2014, 04:05 PM||#20|
Bookmaker & Cat Slave
Join Date: Apr 2010
Location: Phoenix, AZ
Device: Kindle2, iPad, KindleFire and NookColor
One of the things that Abbyy is very good about is making everything look pretty good on the surface, but once you delve into it--lo, here there be dragons. Have you spent a lot of time looking at the coding of the ePUBS that you've created, just out of curiosity?
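For anyone who wants a quick look under the hood without opening every file in Sigil, here is a small sketch in Python. An epub is a ZIP container, so the standard `zipfile` module can read it. What counts as a sign of messy markup is an assumption here: the sketch simply counts `<span>` tags and inline `style=` attributes per content file, and the function name is made up for the example.

```python
import io
import re
import zipfile


def markup_report(epub_file) -> dict:
    """Count <span> tags and inline style attributes per (X)HTML file in an epub."""
    report = {}
    with zipfile.ZipFile(epub_file) as book:
        for name in book.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                source = book.read(name).decode("utf-8", errors="replace")
                report[name] = {
                    "spans": len(re.findall(r"<span\b", source)),
                    "inline_styles": len(re.findall(r"\bstyle\s*=", source)),
                }
    return report


# Tiny in-memory stand-in for a real epub, just to show the call.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as z:
    z.writestr("ch1.xhtml", '<p><span style="font-size:1em">Hi</span> <span>x</span></p>')
    z.writestr("style.css", "p { margin: 0 }")
print(markup_report(demo))
```

With a real book, pass the path to the .epub file instead of the in-memory buffer. High counts relative to the amount of text are only a hint that the markup deserves a closer look, not proof that anything is wrong.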