02-23-2013, 12:39 PM | #16 |
Evangelist
Posts: 450
Karma: 343115
Join Date: Nov 2009
Location: Romania
Device: PW2 2014
|
That's what happens with PDF files that contain text (FineReader converts them into images), so I'm a bit sceptical that it somehow extracts the JPGs from the PDF without further processing...
|
02-23-2013, 02:44 PM | #17 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
It is a structurally different PDF, so I would not be surprised. The tools to extract the images (JPEGs) from a PDF are easy to get and open source. Why would ABBYY not incorporate those algorithms? It is not that hard.
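To illustrate why pulling JPEGs out of a scanned PDF needs no re-encoding, here is a minimal Python sketch. It assumes the page images are stored as raw JPEG data inside `stream ... endstream` blocks, which is the usual case for DCTDecode images. The function name `extract_jpegs` is made up for this example. It ignores the stream's `/Length` entry, encryption, and any additional filters, so treat it as a rough illustration and not as what ABBYY or any particular open source tool actually does.

```python
import re

# Every PDF stream body sits between the keywords "stream" and "endstream".
_STREAM = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)

# A JPEG file always begins with the SOI marker FF D8 followed by another FF.
_JPEG_MAGIC = b"\xff\xd8\xff"


def extract_jpegs(pdf_bytes: bytes) -> list[bytes]:
    """Return the raw bytes of every stream that looks like a JPEG.

    Hypothetical helper: it scans the file naively and does not parse
    the PDF object structure, so unusual files may be missed.
    """
    return [
        match.group(1)
        for match in _STREAM.finditer(pdf_bytes)
        if match.group(1).startswith(_JPEG_MAGIC)
    ]
```

Each returned item can be written straight to a `.jpg` file, byte for byte, which is the "without further processing" case being discussed.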
|
01-23-2014, 04:41 PM | #18 |
Evangelist
Posts: 467
Karma: 369018
Join Date: Nov 2010
Device: BL Alita/Mimas/Ares, OB Note2/Note, KA One/H2O/HD, S PRS T2/T1, PB 902
|
So far I have not been able to find anything better than the PerfectEpub extension for OpenOffice.
Therefore I do this: OCR in FineReader -> save as odt -> open in OpenOffice -> run PerfectEpub (possibly after other cleaning with regex find/replace, etc.) -> writer2ePub (or save as odt or as html and then use the Calibre converter, whichever works better) -> Sigil (where you can again do regex find/replace, merge/split if necessary, etc.)
However, it is better to get rid of any page numbers and headers before running PerfectEpub. FineReader 11 is pretty good at recognizing headers and footers, so they are not much of a problem. PerfectEpub joins wrongly split lines (paragraphs) with one click and also splits wrongly joined lines. I don't understand why FineReader can't do this itself, though. If it can, I need to find out how...
I also use PerfectEpub on existing epubs and other formats if they have wrongly split or wrongly joined lines in them. For an epub, I do this: epub -> htmlz -> extract files -> open in OpenOffice -> run PerfectEpub -> save back to html (or run writer2ePub)
When the information about the original pages is no longer available, the line joining and splitting can be done with regex find/replace directly in Sigil, but it takes several regular expressions, and a different set for pretty much each epub, so PerfectEpub is a much quicker solution.
Last edited by parkher; 01-23-2014 at 04:47 PM. |
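For anyone curious what the regex approach to line joining looks like, here is a rough Python sketch. It is not what PerfectEpub does internally (the post does not say how it works), and the two patterns and the name `join_split_lines` are assumptions for illustration. The heuristic: a line break is treated as a wrong split when the next line starts with a lowercase letter and the previous line does not end in sentence punctuation, and a hyphen at the end of a line is treated as a word broken across lines.

```python
import re

# A word broken across lines: "para-\ngraph" should become "paragraph".
# This heuristic also merges genuine hyphenated compounds split at the hyphen.
_HYPHEN_BREAK = re.compile(r"(\w)-\n(?=[a-z])")

# A line break that is not preceded by sentence-ending punctuation or a
# blank line, and is followed by a lowercase letter, is assumed to be a
# wrong split inside a paragraph.
_SOFT_BREAK = re.compile(r"(?<![.!?:;”’)\n])\n(?=[a-z])")


def join_split_lines(text: str) -> str:
    """Rejoin lines that were wrongly split inside a paragraph."""
    text = _HYPHEN_BREAK.sub(r"\1", text)  # drop the hyphen and the break
    return _SOFT_BREAK.sub(" ", text)      # replace the break with a space
```

As noted in the post, real books usually need these patterns adjusted per title (dialogue, headings, and lists all break the heuristic), which is why a one-click tool is quicker.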
04-19-2014, 12:40 AM | #19 |
Junior Member
Posts: 3
Karma: 10
Join Date: Mar 2014
Device: Kobo Aura
|
I have tried so many PDF to EPUB converters that I despaired of trying one more. However, after reading around on the internet, I decided to look up ABBYY FineReader and, lo and behold, they actually have a direct PDF to EPUB converter called ABBYY PDF Converter.
After trying the trial (which converts 100 pages max), I decided to plonk down the money for the full version. End result: I think I converted more than 40 books and ONLY one didn't convert properly. Most converted with graphics intact and more than 30% had their TOC links done properly! I regretted spending all that time with all those other PDF to EPUB converters. You name it, I have tried it (Calibre, EPUB Converter, Doremisoft, 3DPageFlip, Vibosoft, PDFMate, iStonsoft, Go4ePub, which is an online site...), and for some reason a lot of them look suspiciously alike, so either they share the same underlying product and just customized the look and feel, or some of them pirated it from a main source and spun it off as their own. The reason I gave ABBYY a chance is that they are a reputable OCR software vendor too, so I figured if they can do OCR well, they surely can do something about the pesky PDF internal structure and markup. I have absolutely no business interest in ABBYY, other than wondering why they didn't market this product better. If you guys are still vexed by the conversion, you should take a look. |
04-22-2014, 03:05 PM | #20 | |
Bookmaker & Cat Slave
Posts: 11,462
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
One of the things that Abbyy is very good at is making everything look pretty good on the surface, but once you delve into it, lo, here there be dragons. Just out of curiosity, have you spent much time looking at the code of the ePUBs that you've created?
Hitch |
|
|