Converting from PDF to ePub using Abbyy Fine Reader


Mr Davo
06-18-2013, 11:55 PM
Hi Everyone,

I am new to the forums and am not sure if this is where I should post a question about converting a PDF to an ePub (so please bear with me).

At the moment I am using ABBYY FineReader (v11) to convert "The Tibetan Book of Living & Dying" from PDF to ePub. The process is relatively straightforward; however, I am stuck on a few points:

1). Page 2 shows a little yellow exclamation point in the bottom right-hand corner of the page thumbnail.

http://www.systemcontrol.com.au/images/captures/Page%202%20Thumbnail.JPG

and it then goes on to state "Page Not Recognized"

http://www.systemcontrol.com.au/images/captures/Page%202%20not%20recognized.JPG

so I am wondering how to overcome this issue.

2). Whilst the PDF includes the cover of the book, when I export to ePub the cover is no longer present. Once again the "cover page" states "Page Not Recognized"; however, this time there is no yellow exclamation point next to the page thumbnail.

3). Even when I go into properties and set the Author, as pictured below,

http://www.systemcontrol.com.au/images/captures/Document%20Options.JPG

when I export to ePub format the Author information is not retained.

If anybody can offer any help with any of these issues it will be greatly appreciated.

Kind Regards,

Davo

Toxaris
06-19-2013, 02:31 AM
The resulting ePUB is not a good start. The document will contain a lot of errors and mistakes due to the OCR process.
You will be better off choosing another export format and cleaning the source before creating the ePUB, or cleaning the ePUB itself.

eskimo49
06-19-2013, 07:09 AM
You could try using Calibre, which has a number of different import formats (including PDF) and also outputs to most common e-publishing formats including EPUB and Mobi.

mrmikel
06-19-2013, 07:29 AM
This book is available in Kindle format from Amazon. That might be a better starting place. It is also most definitely in copyright, being published in 1994, so distributing the results of your work would not be legal in any country.

The source book for this book, the Tibetan Book of the Dead, though published in 1927, is not out of copyright even in Canada, since the author died in 1965.

Toxaris
06-19-2013, 09:16 AM
You could try using Calibre, which has a number of different import formats (including PDF) and also outputs to most common e-publishing formats including EPUB and Mobi.

That is making it even worse...

DSpider
06-19-2013, 09:27 AM
ABBYY FineReader isn't very good at exporting ePub directly... But I guess it works fine, considering that the feature was just added in version 11. Try updating it to the latest build from the official website. Otherwise, you're gonna need Sigil to add a cover and metadata info.

The part about retaining the "Author information" in the ePub sounds like a bug in FineReader. It's most likely added to the file if you export it as a DOC, DOCX, or PDF, but not ePub. This was probably fixed in later builds, so make sure that you have the latest one.
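
To check whether the author actually made it into an exported file, you can look inside the ePub yourself. An ePub is a ZIP archive: META-INF/container.xml points to the OPF package file, and the author is stored there as dc:creator. Below is a minimal Python sketch, assuming a well-formed, unencrypted ePub. The function name read_epub_metadata is made up for illustration and has nothing to do with FineReader or Sigil.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard namespaces used by the ePub container file and Dublin Core metadata.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
DC = "{http://purl.org/dc/elements/1.1/}"


def read_epub_metadata(epub_path):
    """Return the title and the list of authors stored in an ePub's OPF file.

    Hypothetical helper for illustration; assumes a well-formed ePub.
    """
    with zipfile.ZipFile(epub_path) as epub:
        # container.xml names the OPF package document inside the archive.
        container = ET.fromstring(epub.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.attrib["full-path"]
        opf = ET.fromstring(epub.read(opf_path))

    titles = [e.text for e in opf.iter(DC + "title")]
    authors = [e.text for e in opf.iter(DC + "creator")]
    return {"title": titles[0] if titles else None, "authors": authors}
```

If the "authors" list comes back empty for the file FineReader produced, the metadata was not written, and adding it in Sigil is the practical fix.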

patrik
06-19-2013, 09:42 AM
I have just started to play around with creating epubs, so I am sure there are much better ways to do things than what I currently do.

What I do is this:

- Export cover from pdf to a single file
- Open and ocr pdf in Finereader 11
- "Verification" through the whole book to fix ocr errors
- Adjust area for figures/graphs/illustrations
- Save as html
- Open html in Sigil
- (Here you can spend as much time as you like formatting, fixing typos, etc.)
- Create Chapters (separate files), creating toc
- Import cover from the cover-file
- Add metadata (author, title, published date, etc.)
- Save as epub
- Import in Calibre
- Send to device
- Read the book, either fix errors directly in Sigil or highlight in the book and fix later
- Be happy :-)


I would be interested to hear from all of you using Word/OO between Finereader and Sigil, what do you do that is not easily done in Sigil?

Notjohn
06-19-2013, 11:23 AM
How do you "Save as html"? Using the Save As / Web Page (whatever) in Word?

Toxaris
06-19-2013, 11:59 AM
I use Word for the following steps:
- "Verification" through the whole book to fix ocr errors
- (Here you can spend as much time as you like formatting, fixing typos, etc.)
- Import cover from the cover-file
- Add metadata (author, title, published date, etc.)
- Save as epub

After that I open it in Sigil and make the final touches like some formatting and TOC.

patrik
06-19-2013, 02:43 PM
How do you "Save as html"? Using the Save As / Web Page (whatever) in Word?
If the question was to me, I choose "File"->"Save document as"->"html" from Finereader.

DSpider
06-20-2013, 05:59 AM
I would be interested to hear from all of you using Word/OO between Finereader and Sigil, what do you do that is not easily done in Sigil?

You know how FineReader creates styles for bolds and italics? Yeah, I hate those, so I run a custom Word 2010 macro that will turn the document into plain text (yes, that's right) with formatting intact and then have it come back squeaky clean. Then I go through the whole thing and recreate the layout.

I do not recommend converting. PDF isn't the most friendly format out there, and if it wasn't saved as a tagged PDF, as is the case for over 90% of PDFs out there, then it's really not worth trying to convert using Mobipocket or whatever. (A quick test: select some random text. If the selection looks like several separate letters and groups of letters, it's not a tagged PDF.) OCR it instead, because conversion software will have to approximate the location of paragraphs (since each of those groups has individual coordinates, like on a blank piece of paper), and that may result in paragraphs within paragraphs, or a paragraph placed before the wrong paragraph, and so on. No, thanks!

The title of this thread is wrong. You do not convert with ABBYY FineReader. You OCR with it, and then manually tweak the stuffing out of it with some other software. Think of FineReader as an extraction tool. You extract text from images, and that's it. There are no layout options in FineReader.
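
To illustrate the point about paragraphs being approximated from coordinates, here is a toy Python sketch of the kind of guessing a converter has to do when all it gets is text fragments with positions. Everything in it is an assumption made up for the example (the Fragment class, the function name, and the two thresholds). It is not how any particular converter works.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    x: float   # left edge of the fragment on the page
    y: float   # vertical position, measured from the top of the page
    text: str


def group_into_paragraphs(fragments, line_tol=2.0, para_gap=18.0):
    """Guess paragraphs from positioned fragments (illustrative heuristic).

    line_tol: fragments whose y differs by at most this share a line.
    para_gap: a vertical jump larger than this starts a new paragraph.
    Both thresholds are arbitrary values chosen for the example.
    """
    # Step 1: bucket fragments into lines by similar vertical position.
    lines = []  # each entry is (y of first fragment, [fragments])
    for frag in sorted(fragments, key=lambda f: (f.y, f.x)):
        if lines and abs(frag.y - lines[-1][0]) <= line_tol:
            lines[-1][1].append(frag)
        else:
            lines.append((frag.y, [frag]))

    # Step 2: join lines into paragraphs, splitting on large vertical gaps.
    paragraphs, current, prev_y = [], [], None
    for y, frags in lines:
        line_text = " ".join(f.text for f in sorted(frags, key=lambda f: f.x))
        if prev_y is not None and y - prev_y > para_gap:
            paragraphs.append(" ".join(current))
            current = []
        current.append(line_text)
        prev_y = y
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

Pick the thresholds slightly wrong, or feed it a two-column page, and you get exactly the merged or misplaced paragraphs described above.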

Notjohn
06-20-2013, 06:30 AM
If the question was to me, I choose "File"->"Save document as"->"html" from Finereader.

Ah! Thank you. (And yes, the question was directed to you.) Does Finereader do a good job of the html?

patrik
06-20-2013, 01:44 PM
Ah! Thank you. (And yes, the question was directed to you.) Does Finereader do a good job of the html?
I am not sure I have enough experience to judge the html quality. It looks mostly fine to me. There are, of course, various errors, and as you can see some even prefer to remove all formatting and redo it all themselves.
But for doing ocr, IMHO Finereader is really, really good.

I tried to export directly to epub from Finereader, but that was a bit of a mess. Much better, IMHO again, to export as html and let Sigil + some editing do the conversion to epub.

But I am fairly new at this epub-creating thing, so I may do something completely different in a couple of months. :D
I am slowly reading through the forums here and learning all the time.

Toxaris
06-20-2013, 03:43 PM
The HTML export of ABBYY is usually full of internal styling, making it cluttered.
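
For anyone who wants to strip that clutter before opening the file in Sigil, here is a rough Python sketch using only the standard library's html.parser. The names StyleStripper and strip_inline_styling are made up, and the choice of what to drop (style and class attributes, span and font tags) is an assumption, not something taken from ABBYY's documentation. Be aware that if the export carries bold and italics through classes or spans, this blunt approach throws that formatting away too. It also drops comments and the doctype, so it is meant for body markup rather than whole pages with embedded style sheets or scripts.

```python
from html import escape
from html.parser import HTMLParser

# Assumed clutter to remove; adjust to what the exported file really contains.
DROP_ATTRS = {"style", "class"}
DROP_TAGS = {"span", "font"}


class StyleStripper(HTMLParser):
    """Re-emit HTML without the attributes and wrapper tags listed above."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _emit_start(self, tag, attrs, closing=""):
        if tag in DROP_TAGS:
            return  # drop the wrapper tag but keep the text inside it
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name not in DROP_ATTRS
        )
        self.out.append(f"<{tag}{kept}{closing}>")

    def handle_starttag(self, tag, attrs):
        self._emit_start(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs, closing=" /")

    def handle_endtag(self, tag):
        if tag not in DROP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser, so re-escape text.
        self.out.append(escape(data, quote=False))


def strip_inline_styling(html_text):
    """Return html_text with inline styling attributes and wrapper tags removed."""
    parser = StyleStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Run it on a copy of the exported file and compare the result in a browser before committing to it.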