Old 06-29-2010, 05:52 AM   #1
crackhammer
Enthusiast
Posts: 38
Karma: 10
Join Date: Jun 2009
Device: Nook touch, iPad, Xoom
Ebook readers - should you OCR or not?

Hello folks,

At the moment, I am in the process of scanning almost all of my books before donating them. I plan to buy an ebook reader in the very near future, but haven't decided which one yet. I have Adobe Acrobat 9.0 (which is not great OCR software, but still).

I would like to know whether OCRing scanned books is a good idea if I plan to read them on ebook readers.

Thanks in advance.
Old 06-29-2010, 06:24 AM   #2
theducks
Grand Sorcerer
Posts: 14,859
Karma: 5654321
Join Date: Aug 2009
Location: (The original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Your choice.

OCR takes a lot of touch-up to make a really great book.
Messed-up or lost punctuation marks and run-together letters (for example, "in" OCR'd as "m") need to be corrected.
Pictures and dingbats need to be repaired or replaced.
Page headings and footers need to be removed.
Footnotes need handling as well.
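
Part of the clean-up listed above can be automated. The following is only a minimal Python sketch, assuming the OCR output is available as one plain-text string per page, that page headings and footers show up as the same line on most pages, and that page numbers sit alone on a line. The function names and the 60% threshold are made up for illustration. It does not attempt letter mix-ups such as "in" read as "m"; those still need a spell-check pass or a human proofreader.

```python
import re
from collections import Counter


def strip_running_lines(pages, min_share=0.6):
    """Drop lines that repeat on most pages (likely headers or footers)
    and lines that hold nothing but a page number."""
    counts = Counter()
    for page in pages:
        # Count each distinct non-blank line once per page.
        counts.update({line.strip() for line in page.splitlines() if line.strip()})
    threshold = max(2, int(len(pages) * min_share))
    repeated = {line for line, n in counts.items() if n >= threshold}

    cleaned = []
    for page in pages:
        kept = [
            line
            for line in page.splitlines()
            if line.strip() not in repeated and not re.fullmatch(r"\s*\d+\s*", line)
        ]
        cleaned.append("\n".join(kept))
    return cleaned


def join_hyphenated(text):
    """Rejoin words that were split across a line break with a hyphen."""
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)
```

Anything a pass like this removes should be spot-checked against the scans, since a short line of real text can repeat too.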



Only lightly touching on the Legality of making and having a copy of someone else's book (if you give away the master, it is NOW someone else's book).
Old 06-29-2010, 09:13 AM   #3
crackhammer
Enthusiast
Posts: 38
Karma: 10
Join Date: Jun 2009
Device: Nook touch, iPad, Xoom
I asked this question only because I don't know whether an OCRed PDF appears differently on an ebook reader than a scanned PDF.

Quote:
Originally Posted by theducks
Only lightly touching on the Legality of making and having a copy of someone else's book (if you give away the master, it is NOW someone else's book).
The only reason I am scanning all these books is that I am moving from one country to another (Europe to the US), and carrying these heavy books is a very expensive affair. I have paid for the books, so I guess I don't care about the legal issue (as I don't plan to share these books with anyone).

I wonder how many people think of legality when they photocopy...
Old 06-29-2010, 09:28 AM   #4
omk3
Wizard
Posts: 1,454
Karma: 37243
Join Date: Dec 2009
Location: Europe
Device: pocketbook 360, kindle 4
If you don't OCR, your book will actually be a series of images. This will make the file size rather large, and may result in slower page turns. It may also be difficult to read if your reader doesn't have a big enough screen. You won't be able to change the font size, and though many readers support zooming, it is not very practical to have to pan around a zoomed page to read it.

OCRing is a lot of work, as theducks pointed out. But if you feel like doing it, you will then have pure text. You can change format (pdf is not the optimal format for ebook readers), you can change font size, you can look up words in the dictionary and search if your reader supports it.

I find it very frustrating to read pdf's on my reader, either image or text, and actively avoid it. But ultimately, it's up to you.
Old 06-29-2010, 09:36 AM   #5
crackhammer
Enthusiast
Posts: 38
Karma: 10
Join Date: Jun 2009
Device: Nook touch, iPad, Xoom
Quote:
Originally Posted by omk3
I find it very frustrating to read pdf's on my reader, either image or text, and actively avoid it. But ultimately, it's up to you.
Why? I thought many ebooks are in pdf format. So if you don't read image or text on your reader, what else do you read? I am sorry, I didn't quite understand your statement.
Old 06-29-2010, 09:51 AM   #6
omk3
Wizard
Posts: 1,454
Karma: 37243
Join Date: Dec 2009
Location: Europe
Device: pocketbook 360, kindle 4
Of course I read text, just not in PDF format. PDF is good for fixed formatting. But on your reader you need more control over the way the text appears. Maybe the font is too small for you. Ebook readers give you the option to change font size, and it is a very important option to have, especially if the PDF is based on a page size larger than your screen. PDF ebooks can behave very unexpectedly when you change the font size: sometimes the formatting becomes downright weird and unpleasant. I think some readers that don't support reflow don't let you change the font size of a PDF at all.
A good and popular ebook format is ePub. If you end up buying a Kindle, you could convert your ebooks to mobi (prc). Once you have successfully OCRed, you can convert your book to anything you want. There are a lot more formats, but these two are the most frequently used, and they do their job better than PDF.
Old 06-29-2010, 10:22 AM   #7
theducks
Grand Sorcerer
Posts: 14,859
Karma: 5654321
Join Date: Aug 2009
Location: (The original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Quote:
Originally Posted by crackhammer
Why? I thought many ebooks are in pdf format. So if you don't read image or text on your reader, what else do you read? I am sorry, I didn't quite understand your statement.
PDF is all about making the document appear the same for everyone.
The problem is that most documents ARE NOT designed for a 600x800 e-reader screen, but for something way larger: standard copy paper.
All e-readers try tricks and "reflow" what they get to fit the screen.
Sometimes it is OK, other times there are definite artifacts. Some to the point of un-readability.
Old 06-29-2010, 10:27 AM   #8
Lady Fitzgerald
Wizard
Posts: 2,013
Karma: 251649
Join Date: Apr 2010
Location: Tempe, AZ, USA, Earth
Device: JetBook Lite (away from home) + 1 spare, 32" TV (at home)
When you just scan a book and store it in a PDF, what you are storing are images of each page, similar to if you were to store them as graphic files, such as JPEGs. OCR will "read" those images and convert them to text but, since your computer is not as bright as you are, it will often "misread" the image and make errors. Then you have to go into the new text and correct those errors. Running a massive document like a book through an OCR program is very time consuming. Editing the results is even more time consuming.

Most e-book readers handle text better than images. Images are fixed objects that cannot be reflowed and, to make them fit on the screen of an e-ink reader, the readers will usually display them at a reduced size. Text, on the other hand, behaves as if each character were an individual image that can be enlarged or shrunk and displayed sequentially across the screen. Most PDF e-books for sale have the content as text instead of whole-page images, so they can have the characters enlarged or reduced as needed. PDF text, based on what I've read on the MobileRead forums, does not always work well with e-book readers, so another format, such as epub, would be preferable for storing your text.

I'm doing almost exactly what you are doing except (I'm assuming) instead of leaving the books intact and scanning them, I'm cutting off the spines then running the pages through an ADF (automatic document feed) scanner, saving the scanned pages as a PDF, and destroying the original book (as if cutting off the spine hadn't already done so). Because of the sheer volume of books I have to do (over 1100 estimated), I simply do not have time to bother with OCR and the needed editing afterward so have opted not to. The only problem I have had with that is finding a suitably sized e-ink reader that can zoom the page images to a readable size that is also affordable. I haven't had much luck there but the technology keeps improving and the prices keep coming down so it's just a matter of time before I will be able to get the reader I need.

There is a possible legal issue with copying your books and giving them away. Depending on where you live, it is probably illegal. Most (but not all) legal jurisdictions' copyright laws will allow you to legally make a copy of a book for your own use for archival purposes. They will also allow you to change media (again, most but not all; I think the U.K. doesn't permit it). However, when you copy a book and then give away the original, you have essentially stolen the contents of the book because you have deprived the creator of the content (or the current copyright holder) of the potential income they could have derived had the recipient of the book paid for it. Giving away a book without retaining a copy of it is not the same as what you are proposing, since giving away the book merely transfers ownership, whereas copying the book and giving it away creates two books with two owners while the creator of the content has been compensated for only one. Even though the content of the copy is essentially an arrangement of 1s and 0s, it is still property (albeit now physically intangible). Making matters worse, copyright laws vary somewhat (or, occasionally, dramatically) between nations, have not kept up with advancing technology, and have been corrupted to go beyond the original intent, which was to protect the authors' interests, to protecting big corporations' essentially eternal stranglehold on new literature.

It's up to you to decide if you want to follow the letter of the law where you live, take the moral route and just not distribute copies, thus not essentially stealing, or do what you jolly well please.
Old 06-29-2010, 10:30 AM   #9
Worldwalker
Curmudgeon
Posts: 3,087
Karma: 722357
Join Date: Feb 2010
Device: PRS-505
Quote:
Originally Posted by crackhammer
Why? I thought many ebooks are in pdf format. So if you don't read image or text on your reader, what else do you read? I am sorry, I didn't quite understand your statement.
We read books in formats other than pdf.

As several people have said, the main purpose of PDF is to reproduce a letter-sized document exactly like the original. If you're using it for that intended purpose, this is a very good thing. If you're trying to read that document on a minuscule e-reader screen, that can be a very bad thing. Other formats (epub, prc, etc.) work much better on ebook readers because they reflow the document to fit the screen space.

As to who cares about the legality when photocopying ... me. Contrary to what the media, slavishly printing scare stories fed to them by the advocates of greater technological restrictions and more and harsher laws, would have you believe, not everybody goes around seeking copyrights to violate and cackling joyfully as they do so.
Old 06-29-2010, 05:23 PM   #10
garyyoung
Member
Posts: 14
Karma: 380
Join Date: May 2010
Device: Kindle Paperwhite
This isn't an either/or situation. You won't need the original books to do OCR in the future as long as the pdf's are reasonably legible. You could just create the pdf's, try them on an ereader, and then perform OCR on the pdf's if the results aren't satisfactory.
Old 09-05-2010, 06:08 PM   #11
Acousticvillage
Member
Posts: 15
Karma: 1660
Join Date: Sep 2010
Device: none
I have found the above comments really interesting, and they have prompted me to ask, as a potential buyer of an ereader: are ebooks which are sold and DRM'ed really as properly set out as printed books? The ignoramus in me feels that an ebook is often the product of some person scanning a book, which comes out at 2 or 3 MB and then gets sold for the same price as the paperback version.

Are e-books as "nice" as paperbacks or do they feel just like blocks of text on a screen? I am asking this not to be a troll, but because I have looked at adverts for various e-books - most recently the iRiver 2nd Story. One of the images has the ereader next to a paperback. The text in the paperback looks like a nice font, paragraphs are indented. The e-book shows the sort of pagination and font you would get in an office report. So while the words are there, does it "read" like a book? I'm not sure I think £4 - £15 is a fair price for a 3MB download of a file that looks as if it has simply been copied from a business draft of the original.

I realise you might be able to change fonts, pagination, etc to suit eyesight, and would be grateful for reassurance that people feel they are getting a product specifically designed for them and not just a rudimentary format/version of a better original. I know I know...I am a ditherer and a philistine when it comes to some new technology. I am 52 and I do love my MP3 collection!!! Thanks in advance for any thoughts.
Old 09-05-2010, 06:34 PM   #12
Lady Fitzgerald
Wizard
Posts: 2,013
Karma: 251649
Join Date: Apr 2010
Location: Tempe, AZ, USA, Earth
Device: JetBook Lite (away from home) + 1 spare, 32" TV (at home)
Reading e-books is a bit different from reading p-books. How well books get formatted as e-books depends on how thorough a job was done making the e-book and on the kind of reader you are using. Not all e-books will work on all readers. My suggestion is to try viewing some books on demo readers at various stores. You can also arrange here at MobileRead to meet up with owners of readers to see what their readers are like (it's the only way to view a Kindle, since Amazon doesn't have brick-and-mortar stores where they can be seen). I also suggest reading through the General Discussion forum to see what formats are available and the problems with the various formats, including DRM.

E-books are rarely presented the way they would be in a paper book, but that shouldn't be a problem for most readers. An advantage of most e-books is that they can have the fonts enlarged for easier reading, something really handy in bad light or if the old eyes are going (like mine, you young whippersnapper; I'm 61).
Old 09-05-2010, 08:48 PM   #13
theducks
Grand Sorcerer
Posts: 14,859
Karma: 5654321
Join Date: Aug 2009
Location: (The original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
My 2 cents

GIGO (garbage in, garbage out)... Each individual reader (software) has limits that prevent what you see as correctly formatted on the PC screen from looking the same (downsized, not scaled) on your device. Test what you create.

Toss re-flow into the mix and what you get may not even be close.

Hike over to the EPUB section of MR and you can see what some of the Wizards can do. (Hint: that did not happen with click-and-pray software. TLC, lots of it, and it shows.)

I have massaged a number of files so they render on my E-INK screen very close to what was on the page of the paperback on my shelf. The main difference is that I trim margins to use the whole screen (the device bezel serves as the margins).

In a very few cases I have resorted to a page scan image (jpg) because I just could not render a complex page (usually one that contained uniquely displayed text).
Old 09-06-2010, 02:32 AM   #14
itimpi
Wizard
Posts: 4,071
Karma: 777825
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
A point that might be worth making is that most of the modern e-book formats, such as epub, are based on HTML at their heart. If you think of what happens when you bring up an image in a browser and try to resize it, you will get a feel for the problem with images of text. A text-based page, on the other hand, will re-adjust the text as you change the window size. This is basically the way that ebook readers behave.
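
The window-resizing comparison can be tried with plain text. This is only a rough Python sketch using the standard textwrap module; the two widths are arbitrary stand-ins for a wide PC window and a small reader screen, not real device figures, and the reflow helper is a name invented for the example.

```python
import textwrap

paragraph = (
    "A page that is text based will re-adjust the text as you change "
    "the window size, while an image of the same page can only be "
    "shrunk or panned."
)


def reflow(text, width):
    """Re-break the same words to fit a line width given in characters."""
    return textwrap.wrap(text, width=width)


wide = reflow(paragraph, 60)    # stand-in for a wide PC window
narrow = reflow(paragraph, 30)  # stand-in for a small reader screen

for line in narrow:
    print(line)
```

The words are identical in both versions; only the line breaks move. A scanned page has no words to re-break, which is why it can only be scaled down or panned.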