01-05-2011, 12:17 PM #3
uhkp
Junior Member
Posts: 3
Karma: 14
Join Date: Jan 2011
Location: Germany
Device: PocketBook 302
Quote:
Originally Posted by elemenoP
Have you tried converting PDFs with it? How well does it work?

eP
I downloaded it some time ago. EPUB is not the only output option; you can also convert to HTML and MOBI.

I have only tried HTML so far. It's better than most converters out there.

It's a six-step process.

1. Import the pdf file.

2. Select the area you want extracted from all pages.
This selection can only be made for the whole book, not per page.
But I love this feature: nothing is worse than page numbers and similar information scattered through an HTML document.

3. Fix characters in the extracted text.
I had no problems with the recognition, but that may depend on the font. Every glyph that is found is shown with its corresponding character, and you can correct any errors you find.
(BTW, a, a and a in different styles count as different glyphs and come with their formatting information on extraction.)

4. Choose a rule set.
I haven't played with this one yet and stuck to the default.

5. Rules again: you can adjust the rule set you chose in step 4, e.g. change the language.

6. Convert
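To make steps 2 and 3 a bit more concrete, here is a rough Python sketch of the idea: one selection rectangle applied to every page (so page numbers outside it are dropped), plus a glyph-to-character table that your own corrections override. The Fragment/Area model, the glyph ids and the coordinate convention are all made up for illustration; I have no idea how the tool actually does it internally.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Fragment:
    """One glyph placed on a page (hypothetical model; y grows downward)."""
    page: int
    x: float
    y: float
    glyph: str  # hypothetical glyph id, e.g. "g12"


@dataclass
class Area:
    """One rectangle used for every page, like the whole-book selection in step 2."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, f: Fragment) -> bool:
        return self.left <= f.x <= self.right and self.top <= f.y <= self.bottom


def extract(fragments: List[Fragment], area: Area, glyph_map: Dict[str, str],
            corrections: Optional[Dict[str, str]] = None) -> Dict[int, str]:
    """Keep glyphs inside the area on every page and map them to characters.

    Fragments are assumed to arrive in reading order. User corrections
    (step 3) override the automatic glyph_map; unknown glyphs become U+FFFD.
    """
    mapping = {**glyph_map, **(corrections or {})}
    pages: Dict[int, List[str]] = {}
    for f in fragments:
        if area.contains(f):
            pages.setdefault(f.page, []).append(mapping.get(f.glyph, "\ufffd"))
    return {page: "".join(chars) for page, chars in sorted(pages.items())}
```

For example, with the area set to the text block, a page number printed at the bottom margin never reaches the output, and correcting one glyph fixes it everywhere in the book.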

The outcome is an HTML file plus images, with the formatting applied inline. That's what bothers me most: I'm not a fan of <br /> and &nbsp;.

But the HTML file looks the same as the PDF, and I found no major problems with it. You don't get a picture behind each page, and you get one complete HTML file, not multiple ones.
I figure I can always use a script to change the formatting of the HTML file to my preferences.
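A minimal sketch of such a cleanup script in Python, assuming the converter writes PDF line ends as <br> tags, spacing as &nbsp; and inline formatting as style="..." attributes (my guesses, not anything the tool documents). It uses plain regular expressions, which is fine for a quick pass over one generated file but is not a real HTML parser.

```python
import re
import sys


def clean_html(html: str) -> str:
    """Strip layout-preserving markup a PDF converter tends to leave behind.

    Assumes (hypothetically) <br> for PDF line ends, &nbsp; for spacing
    and style="..." attributes for inline formatting.
    """
    # Drop inline style attributes so the reader's own stylesheet takes over.
    html = re.sub(r'\s+style="[^"]*"', "", html)
    # Join lines that were broken at the PDF's line ends.
    html = re.sub(r"\s*<br\s*/?>\s*", " ", html, flags=re.IGNORECASE)
    # Turn non-breaking spaces (entity or literal U+00A0) into normal spaces.
    html = html.replace("&nbsp;", " ").replace("\u00a0", " ")
    # Collapse runs of spaces and tabs left behind by the steps above.
    return re.sub(r"[ \t]{2,}", " ", html)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python clean_html.py input.html output.html
    with open(sys.argv[1], encoding="utf-8") as src:
        cleaned = clean_html(src.read())
    with open(sys.argv[2], "w", encoding="utf-8") as dst:
        dst.write(cleaned)
```

Note that dropping every style attribute also drops alignment, so centered or right-aligned lines fall back to the reader's default; a more careful version would keep text-align and remove only the rest.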


Out of curiosity I also checked the EPUB output. The procedure is the same as for HTML, but you can add a front and back cover.

While looking through the result I found a few page breaks inside chapters that left half the page empty, and a few formatting errors (some lines centered instead of right-aligned).


It's not the answer to everything, but it's a pretty decent tool if you can't or don't want to read PDFs on your reader. I also preferred its result to calibre's.

__________

You can keep all the errors you find in this text; English is not my first language.