MobileRead Forums > E-Book Readers > Sony Reader

02-20-2007, 03:40 PM   #1
slayda
Retired & reading more!
Posts: 2,743
Karma: 884247
Join Date: Sep 2006
Location: North Alabama, USA
Device: Kindle 1, iPad 4, iPhone 5
Created first ebook

Well, I bought the ScanSnap S500 based on comments in this forum, and this weekend (since I'd left my Reader at the office last Friday and it spent the weekend there) I created my first ebook.

I made several wrong moves, beginning with using the default Normal setting (automatic color detection & lowest resolution: 150 DPI for color, 300 DPI for B&W). Since the book was old & yellowed, the scanner detected color, so the images were 150 DPI and produced a lot of OCR errors. After a day & a half of proofing/correcting, I managed to create an RTF, which went into Book Designer to create an LRF. Then I decided to read the ScanSnap manual, so next time should be smoother.

Step 1 - De-paging the paperback was fairly easy. I simply pulled the cover off (it was already partly off), then removed the pages one at a time by gently pulling them loose. Every 6-10 pages I needed to trim off the excess glue so the pages came away more easily. Total time - about 20 minutes.

Step 2 - (This should have been: read the ScanSnap manual.) Scanned the pages. This was the easiest part of the procedure. Total time - about 20 minutes.

Step 3 - Used the included ABBYY software to OCR the book & create a "searchable PDF". Total time - about 10 minutes.

Step 4 - Discovered that the included ABBYY OCR wasn't as good as my regular ABBYY OCR version 6.0, so I re-OCRed the book with that. Total time - about 10 minutes.

Step 5 - Checked spelling, made corrections, removed page numbers, etc. Total time - about 10 hours. (Remember, I didn't have my Reader, so I needed some entertainment.)

Step 5a - Used Adobe Acrobat (included with the ScanSnap) to create an RTF file. Some of step 5 came before 5a and some after. Total time - about 1 minute.

Step 6 - Used Book Designer to create an LRF. Total time - about 3 minutes.
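Part of step 5 (removing page numbers) could be automated before proofing starts. Below is a minimal Python sketch, not something used in this thread, assuming the OCR output has been saved as plain text and that each page number ends up alone on its own line. The regex and the name strip_page_numbers are illustrative assumptions. It would also drop a chapter heading that is just a bare number, so skim the result.

```python
import re

# Assumption: after OCR, a page number sits alone on its own line
# (1 to 4 digits, possibly padded with spaces). Running heads are not handled.
PAGE_NUMBER_LINE = re.compile(r"^\s*\d{1,4}\s*$")


def strip_page_numbers(text: str) -> str:
    """Return text with every number-only line removed."""
    kept = [line for line in text.splitlines() if not PAGE_NUMBER_LINE.match(line)]
    return "\n".join(kept)


if __name__ == "__main__":
    ocr_text = "the end of the page.\n217\nThe next page begins here."
    print(strip_page_numbers(ocr_text))
```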


Post-op steps

(Step 7) - Ordered the ABBYY version 8.0 upgrade, which is more accurate & should reduce the step 5 time.

(Step 8) - Read the ScanSnap manual & found out I should have selected B&W with either Better mode (400 DPI) or Best mode (600 DPI).

Notes of interest:
  1. The book was "Soul Rider - Book One, Spirits of Flux & Anchor" by Jack Chalker
  2. This is the first in a series of five paperbacks, all old and in poor condition. It is also one of my favorite series.
  3. I plan to do better on the next volume & will post a status update if anyone is interested.

02-20-2007, 04:24 PM   #2
RWood
Technogeezer
Posts: 7,233
Karma: 1601464
Join Date: Nov 2006
Location: Virginia, USA
Device: Sony PRS-500
Congratulations! I remember the first time I created an ebook: the feeling of pride, the sense of accomplishment. It is a wonderful sensation. Thank you for sharing and stirring (good) old memories.

Yes, please tell us about your future projects. We do want to know.

02-20-2007, 04:33 PM   #3
Bob Russell
Recovering Gadget Addict
Posts: 5,337
Karma: 590871
Join Date: May 2004
Location: Pittsburgh, PA
Device: Note3, MBA, DVP11
Thanks Slayda! I'd love to hear more when you do your next volume!

I'm curious... did you have any misfeeds, or ever accidentally put sets of pages in the wrong way or in the wrong order? If so, how hard is it to handle that situation?

Also, does it scan both sides at once, or does it move the paper somehow and scan twice?

And, does the included software make it easy to scan to RTF?

Looks like it's a valid alternative to the OpticBook if you can produce loose pages, i.e. good for books you are truly converting to e-book and throwing away the paper version. (Obviously if you want to keep/protect the book, something like the OpticBook is a better choice, but it is slower and involves more manual work.)

02-20-2007, 05:15 PM   #4
slayda
Retired & reading more!
Quote:
Originally Posted by Bob Russell
I'm curious... did you have any misfeeds, or ever accidentally put sets of pages in the wrong way or in the wrong order? If so, how hard is it to handle that situation?
No misfeeds, i.e. no hardware problems. There was one time I put the pages in wrong, i.e. operator error. Since I was near the beginning, I just deleted the scan & started over, more cautiously.

The manual says you can set the scanner to "continue", i.e. if it runs out of pages, it waits until you add more or tell it to finish. I found it easy (if you pay attention & are careful) to add more pages as the input queue nears empty and to remove scanned pages from the output queue.

Quote:
Originally Posted by Bob Russell
Also, does it scan both sides at once, or does it move the paper somehow and scan twice?
Yes, it scans both sides at once. You can also set it to scan only one side (simplex mode instead of duplex mode).

Quote:
Originally Posted by Bob Russell
And, does the included software make it easy to scan to RTF?
It scans directly only to 1) single-page PDF, 2) multi-page PDF (the default), or 3) JPEG. However, you can set the application to "ABBYY Scan2Word"; the manual says this automatically starts the ABBYY OCR application once the scan is complete. The software is fairly straightforward, though. I just moved the file to OCR manually.


Quote:
Originally Posted by Bob Russell
Looks like it's a valid alternative to the OpticBook if you can produce loose pages, i.e. good for books you are truly converting to e-book and throwing away the paper version. (Obviously if you want to keep/protect the book, something like the OpticBook is a better choice, but it is slower and involves more manual work.)
Having the software automatically scan & create a multi-page PDF makes it worthwhile for me to destroy the book when it's an old, beat-up paperback. I wouldn't want to destroy my "collector's" hardbacks, though.

02-20-2007, 06:21 PM   #5
RSaunders
Groupie
Posts: 161
Karma: 2054
Join Date: Jan 2007
Device: Sony PRS-500
Quote:
Originally Posted by slayda
Post-op steps

(Step 7) - Ordered the ABBYY version 8.0 upgrade, which is more accurate & should reduce the step 5 time.

I've used ABBYY 8.0 several times, and it is a truly great program. It has a couple of quirky substitutions, like [y,'] to [y/], but you can add macros to fix them. In my experience it makes fewer than 3 mistakes a page, and it detects and highlights 90% of them. After I've worked through the places it flags, the errors MS Word still finds are usually typos in the printed book.
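RSaunders suggests Word macros for ABBYY's repeat substitutions; the same idea can be sketched as a table of regex fix-ups in Python. This is a hypothetical illustration, not his macros: it reads the quirk literally as the three characters y,' coming out of the OCR as y/, and the FIXUPS table and apply_fixups name are made up for the example.

```python
import re

# Hypothetical fix-up table: (regex for a known OCR quirk, replacement).
# The single entry reads RSaunders's example literally: the characters y,'
# come out of the OCR as y/. Add entries as you spot other repeat offenders.
FIXUPS = [
    (re.compile(r"y/"), "y,'"),
]


def apply_fixups(text: str) -> str:
    """Apply every fix-up in order and return the corrected text."""
    for pattern, replacement in FIXUPS:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(apply_fixups("'Go away/ he said."))
```

Keeping the quirks in one list means each new one is a single added line, much like adding another macro.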

02-21-2007, 09:04 AM   #6
readingaloud
Enthusiast
Posts: 34
Karma: 55
Join Date: Jan 2007
Location: Switzerland
Device: iRex iLiad; Sony Reader; Amazon Kindle
I'm in about the same place: I received my ScanSnap S500M (the Mac version) on Monday and am partway through creating my third book.

Here's my workflow:

1. Take the book apart with a box cutter, cutting it into chunks of about 50 leaves (that is, about 100 numbered pages). (5 minutes.)

2. Trim the glued edge off each chunk. I use a mat cutter, which isn't really the right tool for the job, but I happen to have one, and it works fine. (2 minutes.)

3. Scan the pages in batches of about 25 leaves. (Yes, it is a good idea to get the settings right. I scan directly to PDF: black and white, auto de-skew, auto blank-page deletion, one level lighter than the middle setting.) (About a minute per batch, and fun to watch: the scanner seems to just inhale the books.) So far I've had only one misfeed.

4. Open all of the files together in OmniPage and run recognition on them. (I don't know exactly how long this takes because I go do something else in the meantime, but it's a long time: more than an hour.)

5. Use the editing environment in OmniPage to review the file. I review only the possible errors that OmniPage flags, and even among those, there are some I don't correct because I know I can fix them more easily later. (About an hour, depending on how difficult the OCR was.)

6. Save the file as plain text, and then place it into an InDesign document. This is where, with the help of reasonably good search-and-replace functionality, I fix most of the remaining errors. Some problems have to be looked at one by one. For example, I set the find/change function to replace hyphens with em-dashes, but I have to check each one to see whether it actually needs replacing. That is an example of something I don't bother doing in OmniPage, because I know I'll do it more quickly in InDesign. (This takes a couple of hours; you could spend more time getting things perfect, or a good deal less if you skipped anything that has to be handled case by case.) This is also when I make real footnotes and generally make the book look the way I want it to look.

7. Create a PDF and move it to my waiting iLiad. (1 minute)

8. Read the book. Feel smug. Notice errors I should have corrected. Feel less smug. (Time varies.)
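The case-by-case hyphen review in step 6 can also be done outside InDesign. The sketch below is Python, not InDesign's find/change, and its heuristics are assumptions: a double hyphen, or a hyphen with a space on at least one side, is treated as a likely dash, while a hyphen squeezed between two characters (as in "well-known" or "5-10") is left alone. It lists each candidate with some surrounding text so you can still decide one by one, and only replaces once you're satisfied.

```python
import re

# Assumed heuristics, not InDesign's: "--", or a hyphen with a space or tab
# on at least one side, is probably a dash; a hyphen between two characters
# ("well-known", "5-10") is probably a real hyphen and is never matched.
DASH_CANDIDATE = re.compile(r"[ \t]*--[ \t]*|[ \t]+-[ \t]*|[ \t]*-[ \t]+")
EM_DASH = "\u2014"


def dash_candidates(text: str, context: int = 15):
    """Yield (offset, snippet) for each likely dash so a person can review it."""
    for match in DASH_CANDIDATE.finditer(text):
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        yield match.start(), text[start:end]


def replace_dashes(text: str) -> str:
    """Replace every candidate with an unspaced em-dash (use after review)."""
    return DASH_CANDIDATE.sub(EM_DASH, text)


if __name__ == "__main__":
    sample = "It was a well-known fact -- or so he said - that pages 5-10 were lost."
    for offset, snippet in dash_candidates(sample):
        print(offset, repr(snippet))
```

Spaces and tabs are matched instead of any whitespace so that a hyphen at the start of a line is never merged with the previous line.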

02-21-2007, 09:12 AM   #7
readingaloud
Enthusiast
What's the right DPI setting?

One other issue: I see that Slayda recommends higher resolutions for OCR, but I've been following the recommendation in the OmniPage manual, which says that 300 dpi is ideal and that, while you might want to try 400 dpi for very small text, resolutions higher than that actually hurt OCR accuracy.

It makes sense to me that higher resolutions should do better, so I'll run a comparative test when I have the time. Does anyone else have experience with this issue?

02-21-2007, 12:29 PM   #8
slayda
Retired & reading more!
Quote:
Originally Posted by readingaloud
One other issue: I see that Slayda recommends higher resolutions for OCR, but I've been following the recommendation in the OmniPage manual, which says that 300 dpi is ideal and that, while you might want to try 400 dpi for very small text, resolutions higher than that actually hurt OCR accuracy.

It makes sense to me that higher resolutions should do better, so I'll run a comparative test when I have the time. Does anyone else have experience with this issue?
The "auto color select" setting I used on my first attempt apparently saw the yellowed pages as color, so the scan was done at 150 DPI. Had I forced it to B&W, it would have been 300 DPI.

I plan to compare various resolutions and see how they affect speed and OCR accuracy.

At 150 DPI, the OCR confused t, l, 1, i, f, & sometimes J. Also, "cl" was sometimes read as "d", & "ib" as "th". I think 300 DPI should take care of most of these. There was also some interesting confusion, like "lot" being read as "tol".

More later.
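For confusions like these, one proofing aid is to take each word the spell checker rejects, generate the spellings that are one known confusion away, and check them against a word list. A rough Python sketch under stated assumptions: the confusion pairs come from the post above (printed "d" for "cl", printed "th" for "ib", and mix-ups among t, l, 1, i, f, J); the small set of words is a stand-in for whatever word list you actually have; and transpositions such as "lot" read as "tol" are not handled.

```python
# Confusions slayda saw at 150 DPI, as (what the OCR printed, what the page said).
SEQUENCE_CONFUSIONS = [("d", "cl"), ("th", "ib")]
# Single characters that were mistaken for one another.
CONFUSABLE_CHARS = set("tl1ifJ")


def one_confusion_away(word: str):
    """Yield every spelling that differs from word by one listed confusion."""
    for printed, original in SEQUENCE_CONFUSIONS:
        start = word.find(printed)
        while start != -1:
            yield word[:start] + original + word[start + len(printed):]
            start = word.find(printed, start + 1)
    for i, ch in enumerate(word):
        if ch in CONFUSABLE_CHARS:
            for other in CONFUSABLE_CHARS - {ch}:
                yield word[:i] + other + word[i + 1:]


def suggest(word: str, dictionary: set) -> list:
    """Return sorted dictionary words one confusion away; [] if word is known."""
    if word in dictionary:
        return []
    return sorted({c for c in one_confusion_away(word) if c in dictionary})


if __name__ == "__main__":
    words = {"climb", "line", "fine", "little"}  # stand-in for a real word list
    for ocr_word in ["dimb", "1ine", "littie"]:
        print(ocr_word, suggest(ocr_word, words))
```

A word can get more than one suggestion ("1ine" could be "fine" or "line"), so this narrows the choices rather than making the correction for you.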

02-21-2007, 02:33 PM   #9
ThomWill
Enthusiast
Posts: 31
Karma: 68
Join Date: Nov 2006
Device: PRS-500 (Sony Reader)
As an aside, I read those books years ago and loved them! I know my daughter would love them too. Thanks for reminding me.

Thom

02-22-2007, 05:51 AM   #10
readingaloud
Enthusiast
Get the exposure right!

Getting the exposure right makes all the difference!

I fed my system a 700-page novel last night. The introduction, which was set in a light typeface, scanned too light and had so many errors that I'll have to re-scan it. But the main body of the novel, scanned at 400 dpi, was amazing. I kept a tally for part of the time as I proofed it, and for one 100-page section I had 383 words flagged for review and only 18 OCR errors! That's fewer than one every five pages. I actually made 21 corrections, however, because the system also flagged 3 things that were typos in the printed book.
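The tally above is easy to sanity-check. A tiny Python sketch; the function and parameter names are illustrative, and the numbers are readingaloud's:

```python
def proofing_stats(pages: int, flagged: int, ocr_errors: int, print_typos: int) -> dict:
    """Per-page rates from a proofing tally (parameter names are illustrative)."""
    return {
        "flags_per_page": flagged / pages,
        "ocr_errors_per_page": ocr_errors / pages,
        "pages_per_ocr_error": pages / ocr_errors,
        "total_corrections": ocr_errors + print_typos,
    }


if __name__ == "__main__":
    # readingaloud's 100-page tally from the post above.
    stats = proofing_stats(pages=100, flagged=383, ocr_errors=18, print_typos=3)
    for name, value in stats.items():
        print(f"{name}: {value:.2f}" if isinstance(value, float) else f"{name}: {value}")
```

With these inputs it prints 3.83 flags to check per page, 0.18 OCR errors per page (one every 5.56 pages), and 21 total corrections, which matches the post.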

Part of what's happening is that the book was printed better than the others I've tried, and it didn't have zillions of double quotation marks, ellipses, and em-dashes, or extended passages set in italics, all of which seem to drive OmniPage nuts.

But I think one of the main reasons to use a ScanSnap for this purpose is that it makes rescanning easy. Getting the exposure right makes all the difference, but if the scanning process is a pain in the neck, you're more likely to grit your teeth and work with the bad images.

02-22-2007, 01:18 PM   #11
slayda
Retired & reading more!
What a difference resolution makes

Did some resolution test runs last night. On a single page, my results (n1/n2/n3) were:
  1. Auto color select (i.e. color), 150 DPI: 84/10/6 (my original resolution)
  2. B&W, 300 DPI: 65/0/1
  3. B&W, 400 DPI: 57/1/0
  4. B&W, 600 DPI: 10/0/0
  5. B&W, 1200 DPI: 4/0/0
where n1/n2/n3 are:
n1 = the number of places the OCR marked as questionable
n2 = the number of actual OCR errors among the places it marked
n3 = the number of actual OCR errors it did not mark (i.e. found through manual search)

I realize this is not a statistically significant sample, but it does indicate that 600 DPI is the way to go for my scanning.
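To see where the gain comes from, the n1/n2/n3 counts can be folded into two numbers per setting: real OCR errors (n2 + n3) and false alarms, i.e. flagged places that were actually fine (n1 - n2). A small Python sketch using only the figures from the test above; the RESULTS and summarize names are made up for the illustration, and like the original test it covers a single page.

```python
# slayda's single-page results: setting -> (n1, n2, n3)
RESULTS = {
    "Auto color, 150 DPI": (84, 10, 6),
    "B&W, 300 DPI": (65, 0, 1),
    "B&W, 400 DPI": (57, 1, 0),
    "B&W, 600 DPI": (10, 0, 0),
    "B&W, 1200 DPI": (4, 0, 0),
}


def summarize(n1: int, n2: int, n3: int) -> tuple:
    """Return (real OCR errors, false alarms) for one setting."""
    real_errors = n2 + n3   # every actual error, flagged or not
    false_alarms = n1 - n2  # flagged places that turned out to be correct
    return real_errors, false_alarms


if __name__ == "__main__":
    for setting, counts in RESULTS.items():
        real, false_alarms = summarize(*counts)
        print(f"{setting:22} real errors: {real:2}  false alarms: {false_alarms:2}")
```

On these numbers, real errors drop from 16 at 150 DPI to 1 at 300 and 400 DPI and to 0 at 600 DPI, while false alarms fall from 65 at 300 DPI to 10 at 600 DPI. So beyond 300 DPI most of the benefit is less flagged text to check, which is consistent with choosing 600 DPI.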

02-22-2007, 02:15 PM   #12
Bob Russell
Recovering Gadget Addict
Just wanted to include a link to a short ScanSnap review.
I think this is the one that you are using... the Fujitsu ScanSnap S500
http://www.pcmag.com/article2/0,1895,1990588,00.asp

02-23-2007, 12:00 PM   #13
slayda
Retired & reading more!
Quote:
Originally Posted by Bob Russell
Just wanted to include a link to a short ScanSnap review.
I think this is the one that you are using... the Fujitsu ScanSnap S500
http://www.pcmag.com/article2/0,1895,1990588,00.asp

Yes. Thanks, Bob.

02-23-2007, 12:35 PM   #14
UncleDuke
books & doughnuts
Posts: 882
Karma: 37857
Join Date: Jan 2007
Location: usa
Device: sony reader, kindle2
Thanks. I just made my first one too.

02-26-2007, 12:32 PM   #15
slayda
Retired & reading more!
Second scanned book

Well, I did it again, but this time non-destructively, using an HP flatbed scanner and an old hardback book. The book was just the right size to scan fully opened, 2 pages at a time. With ABBYY you can automatically split the two pages and orient them properly.

The scanning was slower & probably would not work as well with a newer, thicker, or larger book. But it had the advantages that I didn't spend time de-paging the book and my pbook is still in reasonably good shape in my library.

I also found that scanning in grayscale rather than black & white resulted in fewer OCR errors, even though the area near the center of the open book, which still curved slightly away from the scanner, caused some additional OCR errors.

All things considered, both the HP flatbed experience and the Fujitsu page-scanner experience were satisfactory. However, I would not do this for just any book; I'll use it only for my favorite "keepers".

Thanks to my fellow MobileReaders for the encouragement and interest.
Similar Threads
Thread Thread Starter Forum Replies Last Post
Wish to edit ebook created by Calibre in Dreamweaver CS5 purcelljf Calibre 1 08-09-2010 12:18 AM
TOC not showing in eBook created by InDesign gabrieleale Sony Reader 0 07-30-2010 05:02 PM
ebook created from news feed - chapter 1 is off on Nook flyash Calibre 0 06-27-2010 11:22 AM
created spreadsheet of 6 inch ebook readers ribcookie News 1 01-01-2009 01:30 PM
ABBYY (AKA - "created first ebook") slayda Workshop 3 02-27-2007 11:02 AM


All times are GMT -4.

