Old 05-30-2009, 05:13 AM   #1
Kino
Enthusiast
Posts: 41
Karma: 10
Join Date: May 2009
Device: CyBook
Backing up\converting hard copy to eBook

Hi,

I have many hard- and paperback books in my collection and quite a few are showing signs of tanning and wear & tear.

I would like to convert these to an electronic format and be able to read them on the CyBook.

I have just purchased a book scanner (OpticBook 3600) but my initial results are rather disappointing...

The scanner output (jpg or tif) is not much better than any other scanner but it does have the advantage of auto-rotating the scanned images for odd & even pages.

A one-page scan as a jpg image results in a file 280 - 350 kB in size (this will obviously vary depending on the size of the book).

The OCR software I have tried is too labour-intensive: the output has to be proof-read line by line.

Has anybody successfully created an ebook using a scanner such as the above? If so, would they care to share their knowledge?

If I have to create a jpg image of each page is there a way to combine them into a book (other than pasting into a document)?

Thanks for your help.
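On the question of combining one jpg per page into a single book file: one possible approach, sketched here and not tested on a CyBook, is to bundle the page images into a multi-page PDF. The sketch assumes Python with the third-party Pillow library installed, and file names such as page1.jpg, page2.jpg (the folder and file names are hypothetical). Whether a given reader opens the resulting PDF is a separate question.

```python
import re
from pathlib import Path


def natural_key(name):
    """Sort key that puts 'page10.jpg' after 'page9.jpg' (numeric, not alphabetical)."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def ordered_pages(folder, pattern="*.jpg"):
    """Return the page image paths in reading order."""
    return sorted(Path(folder).glob(pattern), key=lambda p: natural_key(p.name))


def images_to_pdf(folder, output_pdf):
    """Combine every page image in `folder` into one multi-page PDF."""
    from PIL import Image  # third-party dependency (assumption): pip install Pillow

    # Greyscale keeps the file smaller for text-only pages.
    pages = [Image.open(p).convert("L") for p in ordered_pages(folder)]
    if not pages:
        raise ValueError("no page images found")
    pages[0].save(output_pdf, format="PDF", save_all=True,
                  append_images=pages[1:])
```

A call such as images_to_pdf("scans", "book.pdf") would then write one PDF page per scanned image, in numeric page order.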
Old 05-30-2009, 05:18 AM   #2
HarryT
eBook Enthusiast
Posts: 62,464
Karma: 39917965
Join Date: Nov 2006
Location: UK
Device: PW2, iPad Retina Mini, iPhone 4, MS Surface Pro, Onyx T68, N7,
I'm afraid that any book created by OCR has to be proof-read line by line - there are no "shortcuts". It's hard, labour-intensive work. It generally takes me around 100-150 hours' work to properly proof-read one of the books that I upload to MR. However, that's to make a "perfect" book. "Raw" OCR will generally give you good enough results for casual reading.

Last edited by HarryT; 05-30-2009 at 05:38 AM.
Old 05-30-2009, 09:19 AM   #3
Kino
Enthusiast
Posts: 41
Karma: 10
Join Date: May 2009
Device: CyBook
Thanks for the reply.

I have tried OCR and I'm prepared to do some that way but the problem is that the text, when saved and re-loaded into a word-processor or text editor, loses all the paragraph structure. I'd be better off typing the whole thing in manually!

I can achieve a reasonably good result by creating jpg images but they then lose readability when pasted into a document. I've tried playing around with image sizes and paper layout, removing margins, etc., but I'm struggling to find the optimum conditions.

Given the CyBook is approx. the size of a paperback I'm sure there is a solution.

p.s. I've found that the CyBook will not read pdf files created with anything other than Adobe. This means that software like openoffice or other freeware pdf creators are, annoyingly, not catered for.
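The lost paragraph structure described above often comes from plain-text OCR output that ends every printed line with a hard return. A minimal sketch of one way to rebuild paragraphs follows, assuming the OCR text still leaves a blank line between paragraphs (that is an assumption about the output, not something confirmed in this thread). The function name is hypothetical.

```python
import re


def reflow(text):
    """Join hard-wrapped lines into paragraphs; blank lines separate paragraphs."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        joined = ""
        for line in lines:
            if joined.endswith("-") and line[:1].islower():
                # Word split across a line break: drop the hyphen and rejoin.
                # Caveat: a genuine compound such as "proof-read" broken at
                # the hyphen will lose its hyphen too.
                joined = joined[:-1] + line
            elif joined:
                joined += " " + line
            else:
                joined = line
        paragraphs.append(joined)
    return "\n\n".join(paragraphs)
```

If the OCR output has no blank lines at all, this approach cannot tell paragraphs apart, and saving as formatted text (RTF) from the OCR program is likely the better route.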
Old 05-30-2009, 09:48 AM   #4
junior
Connoisseur
Posts: 85
Karma: 10
Join Date: Apr 2008
Location: Fortaleza, Brasil
Device: Palm TX, CyBook Gen3
Hello Kino.

I use ABBYY Fine Reader (OCR software), and it does an excellent job. It has the option to keep the original formatting of the book (fonts, paragraphs, bold ...).

It recognizes text, images and tables.

In 3 hours I can have a 250-page book ready for my Cybook, using my HP G2710 scanner with ABBYY Fine Reader.

A license for ABBYY Fine Reader costs USD 199. It is worth it, and it recognizes 184 languages.

http://finereader.abbyy.com/
Old 05-30-2009, 10:07 AM   #5
HarryT
eBook Enthusiast
Posts: 62,464
Karma: 39917965
Join Date: Nov 2006
Location: UK
Device: PW2, iPad Retina Mini, iPhone 4, MS Surface Pro, Onyx T68, N7,
Quote:
Originally Posted by junior View Post
Hello Kino.

I use ABBYY Fine Reader (OCR software), and it does an excellent job. It has the option to keep the original formatting of the book (fonts, paragraphs, bold ...).

It recognizes text, images and tables.

In 3 hours I can have a 250-page book ready for my Cybook, using my HP G2710 scanner with ABBYY Fine Reader.

A license for ABBYY Fine Reader costs USD 199. It is worth it, and it recognizes 184 languages.

http://finereader.abbyy.com/
Even the "sprint" version of Fine Reader, which I got free with my scanner, does a remarkably good job. It perfectly preserves the format of the original - paragraphs, italics, etc. The only thing it falls down on is that it doesn't "do" accented letters - they come out without accents.
Old 05-30-2009, 10:30 AM   #6
junior
Connoisseur
Posts: 85
Karma: 10
Join Date: Apr 2008
Location: Fortaleza, Brasil
Device: Palm TX, CyBook Gen3
Quote:
Originally Posted by HarryT View Post
Even the "sprint" version of Fine Reader, which I got free with my scanner, does a remarkably good job. It perfectly preserves the format of the original - paragraphs, italics, etc. The only thing it falls down on is that it doesn't "do" accented letters - they come out without accents.

The "PRO" version recognizes accented letters beautifully; just choose the language.

For example, in Portuguese: à, á, ê, ç, ã, ü.
Old 05-30-2009, 10:52 AM   #7
slayda
Retired & reading more!
Posts: 2,738
Karma: 884247
Join Date: Sep 2006
Location: North Alabama, USA
Device: Kindle 1, iPad 4, iPhone 5
The problem with those accented letters arises when languages are mixed, e.g. an English book with some French included. I have just discovered that Finereader Pro 9.0 has the capability under the Tools menu (Language Editor) to "Automatically select languages from the following list" and you can supply the list. This works very well for such included "other" languages.

Kino, Finereader also allows PDF as input. I generally save the output as RTF and do my editing there. HarryT works hard at a perfectly proofed output so he puts more time and effort into his work. I just do a best "readability" effort. There may be a few errors still in it but it is readable. Once I've finished "Spell Checking" in MS Word, I proof the RTF file on my reader, bookmarking where I still find OCR errors and then go back into the RTF to correct them. I'll spend only 4 - 5 labor hours until the "on reader" proof reading. Bottom line is less time and a less perfect (but satisfactory to me) final result.

Finereader Pro also has the ability to split and reorient pages if you're scanning both of the open pages at once. Cropping out the page number and title/author is also relatively easy but manual.
Old 05-30-2009, 11:02 AM   #8
HarryT
eBook Enthusiast
Posts: 62,464
Karma: 39917965
Join Date: Nov 2006
Location: UK
Device: PW2, iPad Retina Mini, iPhone 4, MS Surface Pro, Onyx T68, N7,
Quote:
Originally Posted by slayda View Post
The problem with those accented letters arises when languages are mixed, e.g. an English book with some French included. I have just discovered that Finereader Pro 9.0 has the capability under the Tools menu (Language Editor) to "Automatically select languages from the following list" and you can supply the list. This works very well for such included "other" languages.
Thanks - that's good to know. Looks like I need to put Finereader on my "shopping list"!

Quote:
Kino, Finereader also allows PDF as input. I generally save the output as RTF and do my editing there. HarryT works hard at a perfectly proofed output so he puts more time and effort into his work. I just do a best "readability" effort. There may be a few errors still in it but it is readable. Once I've finished "Spell Checking" in MS Word, I proof the RTF file on my reader, bookmarking where I still find OCR errors and then go back into the RTF to correct them. I'll spend only 4 - 5 labor hours until the "on reader" proof reading. Bottom line is less time and a less perfect (but satisfactory to me) final result.
As I said in an earlier post, Fine Reader is so good that even its "raw" output would be perfectly acceptable for "reading versions" of most novels, I think. It's only for my "serious" authors that I'm going to the trouble of doing "real" proof-reading. As you say, a spell-check will find the vast majority of OCR errors.

Quote:
Finereader Pro also has the ability to split and reorient pages if you're scanning both of the open pages at once. Cropping out the page number and title/author is also relatively easy but manual.
That's good to know. My "freebie" version creates Word docs with two columns if I do that. It's no big deal to copy/paste each page back into its proper sequence afterwards, but the ability to do it automatically would be good.
Old 06-12-2009, 12:44 PM   #9
Santa Fe Painter
Junior Member
Posts: 8
Karma: 10
Join Date: May 2009
Device: Sony PRS-505, Kindle2
I purchased FineReader 9.0 Pro. It recognizes headers and footers, which can be omitted from the saved document (e.g. .rtf, .doc). I am still in the learning and refining stages of converting 250-page books to Sony ebooks. I found that saving the FineReader output as .rtf/formatted text creates a much cleaner input for Book Designer. It retains the formatting, but does not create unwanted paragraphs and hard returns.

I continue to be amazed by the accuracy of FineReader. I am using the Pattern learning options for some misread characters. An example is the italicized "I" being misinterpreted as a slash, "/". With Pattern learning, I now get the correct "I".

FineReader can also automatically split dual pages at the time of the initial scan. You do not have to manually split them afterward.

A big reminder I found on Amazon (a customer review) is that you can buy the FineReader Pro 9.0 Upgrade if you have any other OCR licensed software. It saves quite a bit of money.

Last edited by Santa Fe Painter; 06-12-2009 at 12:48 PM.
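For OCR output that was produced without pattern training, the italic "I" misread as "/" mentioned above can also be patched afterwards in plain text. The following is only a rough sketch, not a FineReader feature: it assumes the misread character appears as a slash standing on its own (or directly before an apostrophe, as in "I'm"), and the function name is hypothetical.

```python
import re

# A "/" with whitespace (or the start of the text) before it, and whitespace,
# an apostrophe, or the end of the text after it. Slashes inside "and/or" or
# "1/2" are left alone.
SOLITARY_SLASH = re.compile(r"(?<!\S)/(?=['’\s]|$)")


def fix_slash_for_i(text):
    """Replace stand-alone slashes with 'I'; return (new_text, replacements)."""
    return SOLITARY_SLASH.subn("I", text)
```

The replacement count is returned so the changes can be reviewed; a deliberately spaced slash such as "either / or" would be converted wrongly, so the result still needs a read-through.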