MobileRead Forums > E-Book Formats > Workshop
02-23-2013, 12:39 PM   #16
DSpider
Evangelist
Posts: 407
Karma: 326969
Join Date: Nov 2009
Location: Romania
Device: iPod touch 2G (16 GB)
That's what happens with PDF files that contain text (FineReader converts them into images), so I'm a bit sceptical that it somehow extracts the JPGs from the PDF without further processing...
02-23-2013, 02:44 PM   #17
Toxaris
Wizard
Posts: 2,906
Karma: 2909045
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
It is a structurally different PDF, so I would not be surprised. The tools to extract the images (JPGs) from a PDF are easy to get and open source. Why would ABBYY not incorporate those algorithms? It is not that hard.
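To illustrate why this is not hard: many PDFs store photos as DCTDecode streams, which are ordinary JPEG files embedded byte-for-byte. The sketch below uses plain Python with no PDF library and simply scans for JPEG start markers, cutting each image at the following `endstream` keyword. It is a heuristic, not a PDF parser: images using other filters (Flate, JBIG2, JPX) or sitting inside compressed object streams are missed. The function names are hypothetical; for real work, Poppler's `pdfimages -j` saves DCT images as JPEG files properly.

```python
from pathlib import Path


def extract_jpegs(pdf_bytes: bytes) -> list[bytes]:
    """Return JPEG blobs stored byte-for-byte (DCTDecode) inside a PDF.

    Heuristic only: images using other filters or living inside
    compressed object streams are not found.
    """
    jpegs = []
    pos = 0
    while True:
        # A JPEG file begins with the SOI marker FF D8, followed by another marker (FF ..).
        soi = pdf_bytes.find(b"\xff\xd8\xff", pos)
        if soi == -1:
            break
        # The image data ends where the PDF stream is closed.
        end = pdf_bytes.find(b"endstream", soi)
        if end == -1:
            break
        # Cut at the last EOI marker (FF D9) before "endstream": this drops the
        # end-of-line bytes PDF puts before the keyword and avoids stopping early
        # at the EOI of an embedded EXIF thumbnail.
        eoi = pdf_bytes.rfind(b"\xff\xd9", soi, end)
        if eoi != -1:
            jpegs.append(pdf_bytes[soi:eoi + 2])
        pos = end + len(b"endstream")
    return jpegs


def save_jpegs(pdf_path: str, out_dir: str = ".") -> int:
    """Write each extracted JPEG as image_001.jpg, image_002.jpg, ... and return the count."""
    blobs = extract_jpegs(Path(pdf_path).read_bytes())
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, blob in enumerate(blobs, 1):
        (out / f"image_{i:03d}.jpg").write_bytes(blob)
    return len(blobs)
```

Because the JPEG bytes are copied unchanged, no recompression happens, which is the point DSpider raised about "further processing".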
01-23-2014, 04:41 PM   #18
parkher
Addict
Posts: 300
Karma: 361752
Join Date: Nov 2010
Device: Kobo Aura HD, Sony PRS (T1,T2), PocketBook 902
So far I have not been able to find anything better than the PerfectEpub extension for OpenOffice.
Therefore I do this:

OCR in FineReader -> save as ODT -> open in OpenOffice -> run PerfectEpub (possibly after other cleanup with regex find/replace, etc.) -> writer2ePub (or save as ODT or HTML and then use the Calibre converter, whichever works better) -> Sigil (where you can again do regex find/replace, merge/split files if necessary, etc.)

However, it is better to remove any page numbers and headers before running PerfectEpub.
FineReader 11 is pretty good at recognizing headers/footers so they are not much of a problem.
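For page numbers or running headers that do slip through into exported text, a minimal sketch of the cleanup might look like the following. It assumes plain text with one line per OCR line; the function name and the `min_repeats`/`max_len` thresholds are hypothetical and would need tuning per book.

```python
import re
from collections import Counter

# A line holding only a page number, optionally decorated as "- 12 -".
PAGE_NUMBER = re.compile(r"^\s*(?:-\s*)?\d{1,4}(?:\s*-)?\s*$")


def strip_page_furniture(text: str, min_repeats: int = 3, max_len: int = 60) -> str:
    """Drop bare page numbers and short lines repeated on many pages (running headers).

    min_repeats and max_len are guesses: with these defaults a repeated
    scene break such as "* * *" would also be removed, so review the result.
    """
    lines = text.splitlines()
    counts = Counter(line.strip() for line in lines if line.strip())
    kept = []
    for line in lines:
        stripped = line.strip()
        if PAGE_NUMBER.match(line):
            continue  # bare page number
        if stripped and len(stripped) <= max_len and counts[stripped] >= min_repeats:
            continue  # short line seen on many pages: probably a running header
        kept.append(line)
    return "\n".join(kept)
```

Counting repeats across the whole book is what separates a running header (same text on every page) from ordinary body text.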
PerfectEpub joins wrongly split lines (paragraphs) with one click and also splits wrongly joined lines, etc. I don't understand why FineReader can't do this itself, though. If it can, I need to find out how...

I also use PerfectEpub on already-made epubs and other formats, if they have wrongly split or wrongly joined lines in them.
For an epub, I do this: epub -> htmlz -> extract files -> open in OpenOffice -> run PerfectEpub -> save back to html (or run writer2ePub)
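The "epub -> htmlz -> extract files" step can be scripted. calibre's command-line converter produces the HTMLZ (`ebook-convert book.epub book.htmlz`), and an HTMLZ file is a ZIP archive. The sketch below assumes calibre's layout, where the book text is in `index.html`; check that against your own files. The function name is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_htmlz(htmlz_path: str, out_dir: str) -> Path:
    """Unpack an HTMLZ archive and return the path of its main HTML file.

    Assumes calibre's layout, where the book text is in index.html.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(htmlz_path) as zf:
        # Also unpacks any images and stylesheets stored next to index.html.
        zf.extractall(out)
    index = out / "index.html"
    if not index.exists():
        raise FileNotFoundError(f"no index.html found in {htmlz_path}")
    return index
```

The returned `index.html` is the file to open in OpenOffice before running PerfectEpub.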

In such cases, when information about the original pages is no longer available, the line joining/splitting can be done directly in Sigil with regex find/replace, but it takes several regular expressions, and they differ for almost every epub, so PerfectEpub is a much quicker solution.
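As an example of the regex approach, the most common joining case (a paragraph that ends without closing punctuation, followed by one that starts in lowercase) can be handled with a single pattern. This Python sketch is hypothetical and covers only that case plus words hyphenated across the break; it assumes each line is wrapped in its own `<p>` element. The same patterns work in Sigil's regex find/replace, with `\1 \2` and `\1\2` as the replacements.

```python
import re

# "...was</p>\n<p>split..." : no closing punctuation, next paragraph starts lowercase.
SPLIT_PARA = re.compile(r"([a-z,;])</p>\s*<p\b[^>]*>([a-z])")
# "...exam-</p><p>ple..." : a word hyphenated across the break.
HYPHEN_SPLIT = re.compile(r"([a-z])-</p>\s*<p\b[^>]*>([a-z])")


def join_split_paragraphs(html: str) -> str:
    """Merge paragraphs that OCR split mid-sentence.

    Caution: genuine compounds broken at the hyphen ("well-known") are
    joined as "wellknown", so the result still needs a review pass.
    """
    html = HYPHEN_SPLIT.sub(r"\1\2", html)   # drop the hyphen and the break
    return SPLIT_PARA.sub(r"\1 \2", html)    # replace the break with a space
```

Splitting wrongly joined paragraphs is harder to express as one pattern, which fits the point that each book needs its own set of expressions.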

Last edited by parkher; 01-23-2014 at 04:47 PM.
04-19-2014, 12:40 AM   #19
mav8rick
Junior Member
Posts: 1
Karma: 10
Join Date: Mar 2014
Device: Kobo Aura
For me, I have tried so many, many PDF-to-EPUB converters that I despaired of trying one more. However, after reading about it on the internet, I decided to look up ABBYY FineReader and, lo and behold, they actually have a direct PDF-to-EPUB converter called ABBYY PDF Converter.
After trying the trial (converts 100 pages max), I decided to plonk down the money for the full version.

End result: I think I converted more than 40 books and ONLY one didn't convert properly. Most converted with graphics intact, and more than 30% had their TOC links done properly!

I regretted spending all that time on all those other PDF-to-EPUB converters. You name it, I've tried it (Calibre, EPUB Converter, Doremisoft, 3DPageFlip, Vibosoft, PDFMate, iStonsoft, Go4ePub, which is an online site...), and for some reason a lot of them look suspiciously alike, so either they share the same underlying product and just customized the look and feel, or some of them pirated it from a main source and spun it off on their own.

The reason I gave ABBYY a chance is that they're also a reputable OCR software vendor, so I figured if they can do OCR well, they can surely do something about the pesky PDF internal structure/markup.

I have absolutely no business interest in ABBYY; I just wonder why they didn't market this product better. If you guys are still struggling with the conversion, take a look.
04-22-2014, 03:05 PM   #20
Hitch
Bookmaker & Cat Slave
Posts: 2,317
Karma: 12005829
Join Date: Apr 2010
Location: Phoenix, AZ
Device: Kindle2, iPad, KindleFire and NookColor
Quote:
Originally Posted by mav8rick
For me, I have tried so many, many pdf to epub readers that I despaired of trying one more. However, after reading the stuff on the internet, I decided to look up ABBYY FineReader and lo and behold, they actually have a direct pdf to epub converter called ABBYY PDF Converter. [...]
Well... pretty much everyone here who does this either in large quantities or professionally, as I do, already uses ABBYY FineReader. While it's viable for the initial conversion, the "cruft" it leaves underneath, in the code, isn't very attractive and requires a fair amount of cleanup. I have no reason to think that ABBYY's "ABBYY PDF Converter" is any different from the PDF converter in AFR 11 (ABBYY FineReader 11), and I'd be surprised if it were.

One of the things ABBYY is very good at is making everything look pretty good on the surface, but once you delve into it, lo, here there be dragons. Have you spent much time looking at the code of the ePUBs you've created, just out of curiosity?
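One quick way to get a feel for that cruft without reading every file is to count markup that OCR exports tend to pile up: `<span>` elements, inline `style` attributes, and distinct CSS class names per XHTML file. The sketch below is a hypothetical helper using only Python's standard library; the regexes are rough and there are no agreed thresholds, so treat the numbers as a prompt to open the file, not a verdict.

```python
import re
import zipfile

TAG_SPAN = re.compile(rb"<span\b")
INLINE_STYLE = re.compile(rb"\sstyle\s*=")
CLASS_ATTR = re.compile(rb'\sclass\s*=\s*"([^"]*)"')


def cruft_report(epub_path: str) -> dict[str, dict[str, int]]:
    """Per content file of an EPUB (a ZIP archive), count spans, inline styles and class names."""
    report = {}
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if not name.lower().endswith((".xhtml", ".html", ".htm")):
                continue  # skip mimetype, OPF, CSS, images, ...
            data = zf.read(name)
            # class="a b" holds two class names, so split each attribute value.
            classes = {c for attr in CLASS_ATTR.findall(data) for c in attr.split()}
            report[name] = {
                "spans": len(TAG_SPAN.findall(data)),
                "inline_styles": len(INLINE_STYLE.findall(data)),
                "distinct_classes": len(classes),
            }
    return report
```

Running it on a FineReader-generated ePUB and on a hand-cleaned one side by side makes the difference under the surface easy to see.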

Hitch
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.