MobileRead Forums > E-Book Formats > Workshop

02-23-2013, 12:39 PM #16
DSpider
That's what happens with PDF files that contain text (FineReader converts them into images), so I'm a bit sceptical that FineReader somehow extracts the JPGs from the PDF without further processing...

02-23-2013, 02:44 PM #17
Toxaris
It is a structurally different PDF, so I would not be surprised. Tools that extract the images (JPGs) from a PDF are easy to get and open source. Why would ABBYY not incorporate those algorithms? It is not that hard.
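
As an illustration of how simple the basic idea is: Poppler's open-source `pdfimages` tool, run with `-j`, saves DCT-encoded images as JPEG files. Below is a rough Python sketch of the same idea, assuming each image is stored as a plain DCTDecode stream, so that the stream body is already a complete JPEG file. It is a naive byte scan, not a real PDF parser, and not a description of what ABBYY actually does; `extract_jpegs` and `save_jpegs` are hypothetical names.

```python
import re
from pathlib import Path

# The 'stream' keyword followed by an end-of-line, but not 'endstream'
STREAM_START = re.compile(rb"(?<!end)stream\r?\n")
JPEG_SOI = b"\xff\xd8"  # JPEG start-of-image marker
JPEG_EOI = b"\xff\xd9"  # JPEG end-of-image marker


def extract_jpegs(pdf_bytes: bytes) -> list[bytes]:
    """Return the raw data of every PDF stream that starts with a JPEG SOI marker.

    Assumption: images are stored as plain DCTDecode streams. Streams that are
    additionally compressed (e.g. FlateDecode + DCTDecode) or packed inside
    object streams are missed by this byte scan.
    """
    images = []
    for match in STREAM_START.finditer(pdf_bytes):
        start = match.end()
        if not pdf_bytes.startswith(JPEG_SOI, start):
            continue  # not JPEG data (page content, fonts, other image encodings)
        end = pdf_bytes.find(b"endstream", start)
        if end == -1:
            continue
        # The image ends at the last EOI marker before 'endstream'
        eoi = pdf_bytes.rfind(JPEG_EOI, start, end)
        if eoi != -1:
            images.append(pdf_bytes[start:eoi + len(JPEG_EOI)])
    return images


def save_jpegs(pdf_path: str, out_dir: str) -> int:
    """Write each extracted JPEG to out_dir as image_000.jpg, image_001.jpg, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    images = extract_jpegs(Path(pdf_path).read_bytes())
    for i, data in enumerate(images):
        (out / f"image_{i:03d}.jpg").write_bytes(data)
    return len(images)
```

For example, `save_jpegs("scan.pdf", "images")` (hypothetical file names) writes the images it finds and returns how many there were. Because the JPEG bytes are copied unchanged, there is no recompression loss, which is the point of extracting rather than re-rendering pages.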

01-23-2014, 04:41 PM #18
parkher
So far I have not been able to find anything better than the PerfectEpub extension for OpenOffice.
Therefore, this is what I do:

1. OCR in FineReader, then save as ODT.
2. Open the ODT in OpenOffice and run PerfectEpub (possibly after other cleanup with regex find/replace, etc.).
3. Run writer2ePub, or save as ODT or HTML and then use the Calibre converter, whichever works better.
4. Finish in Sigil, where you can again do regex find/replace, merge/split if necessary, etc.

However, it is better to get rid of any page numbers and headers before running PerfectEpub.
FineReader 11 is pretty good at recognizing headers and footers, so they are not much of a problem.
PerfectEpub joins wrongly split lines (paragraphs) with one click, and it also splits wrongly joined lines, etc. I don't understand why FineReader can't do this itself, though. If it can, I need to find out how...
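
PerfectEpub's actual rules aren't described here, so the following is only a rough Python sketch of the two cleanup steps, working on a list of paragraphs as they come out of OCR (one physical line per paragraph). The heuristics are assumptions: a paragraph made only of digits is a page number; a short line repeated at least `min_repeats` times (a hypothetical threshold) is a running header; a paragraph that does not end in sentence punctuation, or is followed by one starting in lowercase, continues in the next one. The function names are hypothetical.

```python
import re
from collections import Counter

PAGE_NUMBER = re.compile(r"^\s*\d+\s*$")  # Arabic numerals only; Roman numerals are not handled
SENTENCE_END = (".", "!", "?", ":", '"', "\u201d")


def strip_headers(lines: list[str], min_repeats: int = 3) -> list[str]:
    """Drop bare page numbers and short lines that repeat often (running headers).

    Caveat: a legitimately repeated short line (e.g. a '***' scene break)
    is dropped as well.
    """
    counts = Counter(line.strip() for line in lines if line.strip())
    return [
        line for line in lines
        if not PAGE_NUMBER.match(line)
        and not (counts[line.strip()] >= min_repeats and len(line.strip()) < 60)
    ]


def join_broken_lines(paragraphs: list[str]) -> list[str]:
    """Merge paragraphs that were wrongly split at OCR line breaks."""
    merged: list[str] = []
    for text in (p.strip() for p in paragraphs):
        if not text:
            continue  # empty paragraphs carry no text
        if merged and (not merged[-1].endswith(SENTENCE_END) or text[0].islower()):
            prev = merged[-1]
            if prev.endswith("-") and text[0].islower():
                merged[-1] = prev[:-1] + text  # rejoin a word hyphenated at the line end
            else:
                merged[-1] = prev + " " + text
        else:
            merged.append(text)
    return merged
```

Headings without final punctuation (e.g. "Chapter 1") would be merged into the following text, and genuinely hyphenated compounds split at a line end lose their hyphen, so the result still needs a review, for example in Sigil.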

I also use PerfectEpub on existing EPUBs and other formats if they contain wrongly split or wrongly joined lines.
For an EPUB, I do this: EPUB -> HTMLZ -> extract the files -> open in OpenOffice -> run PerfectEpub -> save back to HTML (or run writer2ePub)
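
The first steps of that chain can be scripted, assuming Calibre is installed: its `ebook-convert` command converts between formats based on the input and output file extensions, and an HTMLZ file is a ZIP archive that Python's standard `zipfile` module can unpack. A minimal sketch; `epub_to_html_folder` and `extract_htmlz` are hypothetical helper names.

```python
import subprocess
import zipfile
from pathlib import Path


def extract_htmlz(htmlz: Path, out_dir: Path) -> list[str]:
    """Unpack an HTMLZ archive into out_dir and return the member names."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(htmlz) as archive:
        archive.extractall(out_dir)
        return archive.namelist()


def epub_to_html_folder(epub: Path, out_dir: Path) -> list[str]:
    """EPUB -> HTMLZ with Calibre's ebook-convert, then extract the files.

    Assumes ebook-convert is on the PATH; raises CalledProcessError if it fails.
    """
    htmlz = epub.with_suffix(".htmlz")
    subprocess.run(["ebook-convert", str(epub), str(htmlz)], check=True)
    return extract_htmlz(htmlz, out_dir)
```

The extracted HTML file can then be opened in OpenOffice for the PerfectEpub step.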

When the information about the original pages is no longer available, the line joining/splitting can also be done directly in Sigil with regex find/replace, but that takes several expressions, and they differ for almost every EPUB, so PerfectEpub is a much quicker solution.
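
For comparison, here is what two such expressions might look like, written with Python's `re` module: they join two `<p>` elements when the first ends mid-sentence (a lowercase letter or comma, or a word fragment ending in a hyphen) and the second starts with a lowercase letter. A similar pattern should work in Sigil's regex mode, but the exact markup (classes on `<p>`, inline spans) differs from book to book, which is why several variations are usually needed. The patterns and the function name are illustrative assumptions, not a general solution.

```python
import re

# A paragraph ending mid-sentence (lowercase letter or comma) followed by one
# that starts in lowercase: the break between them is probably a line-break artifact.
BROKEN_BREAK = re.compile(r"([a-z,])</p>\s*<p(?:\s[^>]*)?>([a-z])")
# The same, but the first paragraph ends in a word fragment with a hyphen.
HYPHEN_BREAK = re.compile(r"([a-z])-</p>\s*<p(?:\s[^>]*)?>([a-z])")


def join_split_paragraphs(html: str) -> str:
    """Remove paragraph breaks that fall in the middle of a sentence."""
    html = HYPHEN_BREAK.sub(r"\1\2", html)   # "hyphen-</p><p>ated" -> "hyphenated"
    return BROKEN_BREAK.sub(r"\1 \2", html)  # "the</p><p>lazy" -> "the lazy"
```

In Sigil the replacement strings would be written with the same backreferences; a break after a sentence-ending period is left alone, since it may be a real paragraph end.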

Last edited by parkher; 01-23-2014 at 04:47 PM.

All times are GMT -4.

MobileRead.com is a privately owned, operated and funded community.