Old 02-23-2013, 12:39 PM   #16
DSpider
Evangelist
Posts: 450
Karma: 343115
Join Date: Nov 2009
Location: Romania
Device: PW2 2014
That's what happens with PDF files that contain text (FineReader converts them into images), so I'm a bit sceptical that it somehow extracts the JPGs from the PDF without further processing...
Old 02-23-2013, 02:44 PM   #17
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
It is a structurally different PDF, so I would not be surprised. The tools to extract the images (JPGs) from a PDF are easy to get and open source. Why would ABBYY not incorporate those algorithms? It is not that hard.
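As a rough illustration of why this need not involve any further processing: JPEG images stored in a PDF with the /DCTDecode filter are kept as ordinary JPEG data, so a naive scan can copy them out without re-encoding. The sketch below is only an example under those assumptions, not what ABBYY or any particular tool does. `extract_jpegs` is a hypothetical helper, and it ignores encrypted files, object streams and images stored with other filters. For real use, an open-source tool such as Poppler's pdfimages is the better choice.

```python
import re

# Assumption: the image stream uses only the /DCTDecode filter, so the bytes
# between "stream" and "endstream" are a complete JPEG file.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)


def extract_jpegs(pdf_bytes: bytes) -> list[bytes]:
    """Return the raw bytes of every stream that starts with the JPEG SOI marker."""
    images = []
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        if data.startswith(b"\xff\xd8"):  # JPEG start-of-image marker
            images.append(data)
    return images


if __name__ == "__main__":
    import sys

    # Usage: python extract_jpegs.py scanned.pdf
    with open(sys.argv[1], "rb") as handle:
        for index, jpeg in enumerate(extract_jpegs(handle.read()), start=1):
            with open(f"page-image-{index}.jpg", "wb") as out:
                out.write(jpeg)
```

Because the bytes are copied verbatim, the extracted files are identical to the images embedded in the PDF.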
Old 01-23-2014, 04:41 PM   #18
parkher
Evangelist
Posts: 467
Karma: 369018
Join Date: Nov 2010
Device: BL Alita/Mimas/Ares, OB Note2/Note, KA One/H2O/HD, S PRS T2/T1, PB 902
So far I have not been able to find anything better than the PerfectEpub extension for OpenOffice.
Therefore I do this:

OCR in FineReader -> save as odt -> open in OpenOffice -> run PerfectEpub (after possibly other cleaning with regex find/replace, etc.) -> writer2ePub (or save as odt or as html and then use Calibre converter - whichever works better) -> SIGIL (where you again can do regex find/replace, merge/split if necessary, etc.)

However, it is better to get rid of any page numbers and headers before running PerfectEpub.
FineReader 11 is pretty good at recognizing headers and footers, so they are not much of a problem.
PerfectEpub joins wrongly split lines (paragraphs) with one click and also splits wrongly joined lines, etc. I don't understand why FineReader can't do this itself, though. If it can, I need to find out how...

I also use PerfectEpub on existing epubs and other formats if they contain wrongly split or wrongly joined lines.
For an epub, I do this: epub -> htmlz -> extract files -> open in OpenOffice -> run PerfectEpub -> save back to html (or run writer2ePub)
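The "htmlz -> extract files" step can be scripted. This is a minimal sketch, assuming the HTMLZ file was produced by Calibre (for example with `ebook-convert book.epub book.htmlz`) and is an ordinary ZIP archive. `extract_htmlz` is a hypothetical helper name, and the file names in the usage lines are placeholders.

```python
import zipfile
from pathlib import Path


def extract_htmlz(htmlz_path: str, out_dir: str) -> list[str]:
    """Unpack an HTMLZ archive (assumed to be a plain ZIP) and list its members."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(htmlz_path) as archive:
        archive.extractall(out)
        return archive.namelist()


if __name__ == "__main__":
    # Placeholder paths; the extracted HTML file is what gets opened in OpenOffice.
    for name in extract_htmlz("book.htmlz", "book_extracted"):
        print(name)
```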

When the information about the original pages is no longer available, the line joining and splitting can also be done directly in Sigil with regex find/replace. However, that takes several regular expressions, and they differ for pretty much every epub, so PerfectEpub is a much quicker solution.
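To give an idea of what such a regex pass looks like, here is a minimal sketch in Python rather than in Sigil's find/replace dialog. It is not what PerfectEpub does internally. The heuristic is an assumption: a paragraph break counts as wrong when the text before it does not end in sentence punctuation and the text after it starts with a lowercase letter. `join_split_paragraphs` is a hypothetical helper, and the patterns will need adjusting for each book.

```python
import re

# Word hyphenated across a bad break: "sen-</p><p>tence" -> "sentence".
HYPHEN_BREAK = re.compile(r'(?<=[a-z])-</p>\s*<p[^>]*>(?=[a-z])')

# Break in the middle of a sentence: no closing punctuation before the break,
# lowercase letter right after it.
BAD_BREAK = re.compile(r'(?<![.!?:;"])</p>\s*<p[^>]*>(?=[a-z])')


def join_split_paragraphs(html: str) -> str:
    """Merge paragraphs that were split in mid-sentence."""
    html = HYPHEN_BREAK.sub("", html)  # rejoin hyphenated words first
    return BAD_BREAK.sub(" ", html)    # then join the remaining bad breaks


if __name__ == "__main__":
    sample = (
        "<p>The line was split in the</p>\n"
        "<p>middle of a sen-</p>\n"
        "<p>tence.</p>\n"
        "<p>A new paragraph.</p>"
    )
    print(join_split_paragraphs(sample))
```

Genuine hyphenated compounds that happen to fall on a break would lose their hyphen, which is one reason a single set of expressions rarely fits every book.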

Last edited by parkher; 01-23-2014 at 04:47 PM.
Old 04-19-2014, 12:40 AM   #19
mav8rick
Junior Member
Posts: 3
Karma: 10
Join Date: Mar 2014
Device: Kobo Aura
I have tried so many PDF to EPUB converters that I despaired of trying one more. However, after reading the stuff on the internet, I decided to look up ABBYY FineReader and, lo and behold, they actually have a direct PDF to EPUB converter called ABBYY PDF Converter.
After trying the trial (converts 100 pages max), I decided to plonk down the money for the full version.

End result: I think I converted more than 40 books and ONLY one didn't convert properly. Most converted with graphics intact, and more than 30% had their TOC links done properly!

I regret spending all that time with all those other PDF to EPUB converters. You name it, I've tried it (Calibre, EPUB Converter, Doremisoft, 3DPageFlip, Vibosoft, PDFMate, iStonsoft, Go4ePub, which is an online site...). For some reason, a lot of them look suspiciously alike, so either they share the same underlying product and just customize the look and feel, or some of them pirated it from a main source and spun it off on their own.

The reason I gave ABBYY a chance is that they're a reputable OCR software vendor too, so I figured if they can do OCR well, they surely can do something about the pesky PDF internal structure/markup.

I have absolutely no business interest in ABBYY; I just wonder why they didn't market this product well. If you guys are still vexed by the conversion, you should take a look.
Old 04-22-2014, 03:05 PM   #20
Hitch
Bookmaker & Cat Slave
Posts: 11,462
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote: Originally Posted by mav8rick (post #19)
Well...pretty much everyone here that does this either in large quantities, or professionally, as I do, already uses ABBYY (FineReader). While it's viable for the initial conversion, the "cruft" left in the underlying code of the file isn't very attractive, and requires a fair amount of clean-up. I don't have any reason to think that ABBYY's "ABBYY PDF Converter" is anything different from the PDF-->Converter that's in AFR 11, and I'd be surprised if it were.

One of the things that ABBYY is very good at is making everything look pretty good on the surface, but once you delve into it--lo, here there be dragons. Have you spent a lot of time looking at the coding of the ePUBs that you've created, just out of curiosity?
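If you want a quick first look without opening every file by hand, something like the following sketch gives a rough count. It assumes the ePUB is an ordinary ZIP archive with its text in .xhtml/.html files; `summarize_epub_markup` is a hypothetical helper, and the numbers are only a hint of how much clean-up is waiting, not a quality score.

```python
import zipfile
from collections import Counter
from html.parser import HTMLParser


class _TagCounter(HTMLParser):
    """Counts start tags and elements that carry an inline style attribute."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.inline_styles = 0

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        if any(name == "style" for name, _ in attrs):
            self.inline_styles += 1


def summarize_epub_markup(epub_path: str):
    """Return (tag counts, number of elements with inline styles) for an ePUB."""
    counter = _TagCounter()
    with zipfile.ZipFile(epub_path) as book:
        for name in book.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                counter.feed(book.read(name).decode("utf-8", errors="replace"))
                counter.close()
                counter.reset()  # clears parser state only; the counts are kept
    return counter.tags, counter.inline_styles


if __name__ == "__main__":
    import sys

    tags, inline = summarize_epub_markup(sys.argv[1])
    print(tags.most_common(10), "inline styles:", inline)
```

A very high span count relative to the paragraph count, or a lot of inline styles, is usually the kind of cruft I mean.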

Hitch
All times are GMT -4.

