#1 |
Retired & reading more!
Posts: 2,764
Karma: 1884247
Join Date: Sep 2006
Location: North Alabama, USA
Device: Kindle 1, iPad Air 2, iPhone 6S+, Kobo Aura One
|
Created first ebook
Well, I bought the ScanSnap S500 based on comments in this forum & this weekend (since my Reader was forgotten last Friday & spent the weekend at the office) I created my first ebook.
I made several wrong moves, beginning with using the default settings - Normal (automatic color detection & lowest resolution: 150 DPI for color & 300 DPI for B&W). Since the book was old & yellowed, color was detected, so the images were scanned at 150 DPI and produced a lot of OCR errors. After a day & a half of proofing/correcting, I managed to create an RTF, which went into BookDesigner to create an LRF. Then I decided to read the ScanSnap manual, so next time should be smoother.
Step 1 - It was fairly easy to de-page the paperback book. I simply pulled the cover off (it was already partly off). Then I removed the pages, one at a time, by gently pulling them off. Every 6 - 10 pages I needed to trim the excess glue off so the pages were easier to remove. Total time - about 20 minutes.
Step 2 - (Should have been to read the ScanSnap manual.) Scan the pages. This was the easiest part of the procedure. Total time - about 20 minutes.
Step 3 - Used the included ABBYY to OCR the book & create a "searchable PDF". Total time - about 10 minutes.
Step 4 - Discovered that the included ABBYY OCR wasn't as good as my regular ABBYY OCR version 6.0, so re-OCRed the book with it. Total time - about 10 minutes.
Step 5 - Checked spelling, made corrections, removed page numbers etc. Total time - about 10 hours. (Remember I didn't have my reader so I needed some entertainment.)
Step 5a - Used the Adobe Acrobat (included with the ScanSnap) to create an RTF file. Some of step 5 was done before 5a and some after. Total time - about 1 minute.
Step 6 - Used BookDesigner to create an LRF. Total time - about 3 minutes.
Post-op steps:
(Step 7) - Ordered the ABBYY version 8.0 upgrade, which is more accurate & should reduce the step 5 time.
(Step 8) - Read the ScanSnap manual & found out I should have selected B&W, in either Better mode (400 DPI) or Best mode (600 DPI).
Notes of interest:
#2 |
Technogeezer
Posts: 7,233
Karma: 1601464
Join Date: Nov 2006
Location: Virginia, USA
Device: Sony PRS-500
|
Congratulations! I remember the first time I created an ebook -- the feeling of pride, the sense of accomplishment. It is a wonderful sensation. Thank you for sharing and stirring (good) old memories.
Yes, please tell us of the future activities. We do want to know.
#3 |
Recovering Gadget Addict
Posts: 5,381
Karma: 676161
Join Date: May 2004
Location: Pittsburgh, PA
Device: iPad
|
Thanks Slayda! I'd love to hear more when you do your next volume!
I'm curious... did you have misfeeds, or ever accidentally put sets of pages in the wrong way round or in the wrong order? If so, how hard is it to handle that situation? Also, does it scan both sides at once, or does it do some kind of paper movement and scan twice? And does the included software make it easy to scan to RTF?
Looks like it's a valid alternative to the OpticBook if you can create the loose pages, i.e. good for books you are truly converting to ebook and throwing away the paper version. (Obviously, if you want to keep/protect the book, something like the OpticBook is a better choice, but it is slower and more manual work.)
#4 | ||||
Retired & reading more!
|
Quote:
The manual says you can set the scanner to "continue", i.e. if it runs out of pages, it will wait until you add more or tell it to finish. I found that it was easy (when you pay attention & are cautious) to add more pages as the input queue becomes nearly empty and to remove scanned pages from the output queue.
#5 | |
Groupie
Posts: 162
Karma: 2054
Join Date: Jan 2007
Device: Sony PRS-500
|
Quote:
|
|
#6 |
Enthusiast
Posts: 34
Karma: 55
Join Date: Jan 2007
Location: Switzerland
Device: iRex iLiad; Sony Reader; Amazon Kindle
|
I'm in about the same place, having received my ScanSnap S500M (the Mac version) on Monday, and being part of the way through creating my third book.
Here's my workflow:
1. Take the book apart with a box cutter. I cut it into chunks of about 50 leaves (that is, about 100 numbered pages). (5 minutes.)
2. Trim the glued part of the chunk off. I use a mat-cutter, which is not the right tool for the job, but I just happen to have one, and it works fine. (2 minutes.)
3. Scan the pages, in batches of about 25 leaves. (Yes, it is a good idea to get the settings right--I scan direct to PDF, black and white, auto de-skew, auto blank page deletion, one level lighter than the middle setting.) (About a minute per batch, and fun to watch--the scanner seems to just inhale the books.) So far I've had only one misfeed.
4. Open all of the files together in OmniPage, and recognize them. (I don't know how long this takes because I go do something else for a while, but it's a long time--more than an hour.)
5. Use the editing environment in OmniPage to review the file. I only review the possible errors that OmniPage flags, and even among those there are some errors I don't correct because I know I can fix them more easily later. (About an hour, depending on how difficult the OCRing was.)
6. Save the file as plain text, and then place it into an InDesign document. Here's where, with the aid of reasonably good search and replace functionality, I fix most of the remaining errors. Some problems have to be looked at one by one--for example, I set the find/change function to replace hyphens with em-dashes, but have to look at each one to see whether it needs to be replaced or not. This is an example of something I don't bother doing in OmniPage--I don't replace hyphens with em-dashes then because I know that I'll do it more quickly in InDesign. (This takes a couple of hours--you could spend more time getting things perfect, or a good deal less if you didn't bother with anything that you have to do case by case.) This is also when I make real footnotes, and generally make the book look the way I want it to look.
7. Create a PDF and move it to my waiting iLiad. (1 minute.)
8. Read the book. Feel smug. Notice errors I should have corrected. Feel less smug. (Time varies.)
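For anyone doing the case-by-case hyphen review from step 6 outside InDesign, here is a minimal Python sketch of the same idea: list every likely dash with some context, then replace only the ones you approve. The function names and the rule for what counts as a candidate (a double hyphen, or a single hyphen with a space on each side) are my own assumptions, not anything InDesign or OmniPage does.

```python
import re

EM_DASH = "\u2014"

# Assumption: a double hyphen, or a single hyphen with a space on both sides,
# is a likely dash. Hyphens inside words (e.g. "well-known") are left alone.
DASH_CANDIDATE = re.compile(r"--|(?<= )-(?= )")


def find_dash_candidates(text, context=15):
    """Return (start, end, snippet) for every candidate, for manual review."""
    hits = []
    for match in DASH_CANDIDATE.finditer(text):
        lo = max(0, match.start() - context)
        hi = min(len(text), match.end() + context)
        hits.append((match.start(), match.end(), text[lo:hi]))
    return hits


def apply_replacements(text, approved):
    """Replace only the approved (start, end) spans with an em-dash.

    Spans are processed right to left so earlier offsets stay valid.
    """
    for start, end in sorted(approved, reverse=True):
        text = text[:start] + EM_DASH + text[end:]
    return text
```

Typical use would be to print the snippets from `find_dash_candidates`, decide on each one, and pass the approved spans to `apply_replacements`.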
#7 |
Enthusiast
|
what's the right dpi setting?
One other issue--I see that Slayda recommends higher resolutions for OCR, but I've been following the recommendation of the OmniPage manual, which says that 300 dpi is ideal, and that while you might want to try 400 dpi for very small text, resolutions higher than that actually hurt OCR accuracy.
It makes sense to me that higher resolutions should do better, so I'll run a comparative test when I get the leisure. Does anyone else have some experience with this issue?
#8 | |
Retired & reading more!
|
Quote:
I plan to compare various resolutions and see how they affect speed and OCR accuracy. At 150 DPI, I had a lot of confusion between t, l, 1, i, f, & sometimes J. Also "cl" was sometimes seen as "d" & "ib" as "th". I think 300 DPI should take care of most of these. There was some interesting confusion too, like "lot" being seen as "tol". More later.
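Confusions like these are regular enough that a script can propose fixes. The sketch below is only an illustration: it takes a word that is not in a known-word list, swaps one confusable piece at a time using the pairs mentioned above, and keeps any result that is a known word. The names `candidates` and `suggest` and the tiny vocabulary in the usage note are hypothetical, and a single swap will not catch double confusions such as "lot" read as "tol".

```python
# Confusions reported above, as (what the OCR produced, what was printed).
MULTI_CHAR = [("d", "cl"), ("th", "ib")]
# Single characters that were confused with one another.
LOOKALIKES = "tl1ifJ"


def candidates(word):
    """Yield each variant of `word` with exactly one confusable piece swapped."""
    for i in range(len(word)):
        for seen, printed in MULTI_CHAR:
            if word.startswith(seen, i):
                yield word[:i] + printed + word[i + len(seen):]
        if word[i] in LOOKALIKES:
            for ch in LOOKALIKES:
                if ch != word[i]:
                    yield word[:i] + ch + word[i + 1:]


def suggest(word, vocabulary):
    """Return known words reachable from an unknown `word` by one swap."""
    if word in vocabulary:
        return []
    return sorted({c for c in candidates(word) if c in vocabulary})
```

With a vocabulary of `{"clock", "library", "little"}`, the words "dock", "lthrary" and "1ittle" each come back with a single suggestion. A real word list would return more candidates, so the output is a prompt for proofreading rather than an automatic fix.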
|
#9 |
Enthusiast
Posts: 31
Karma: 68
Join Date: Nov 2006
Device: PRS-500 (Sony Reader)
|
As an aside, I read those books years ago and loved them! I know my daughter would love them too. Thanks for reminding me.
Thom
#10 |
Enthusiast
|
get the exposure right!
Getting the exposure right makes all the difference!
I fed my system a 700-page novel last night. The introduction, which was set in a light face, was too light, and had so many errors that I'll have to re-scan it. But the main body of the novel, scanned at 400 dpi, was amazing--I kept a tally part of the time as I proofed it, and for one 100-page section, I had 383 words flagged for review, and only 18 OCR errors!--that's fewer than one in every five pages. I actually made 21 corrections, however: the system flagged 3 things that were typos in the printed book.
Part of what's happening is that the book was printed better than the others I've tried, and it didn't have zillions of double quotation marks, ellipses, and em-dashes, or extended passages set in italic, all of which seem to drive OmniPage nuts. But I think that one of the main reasons to use a ScanSnap for this purpose is that it makes it easy to rescan. Getting the exposure right makes all the difference, but if the scanning process is a pain in the neck, you're more likely to grit your teeth and work with the bad images.
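The tally above is easy to turn into rates. This is a small sketch of the arithmetic using only the figures quoted in the post (100 pages, 383 flagged words, 18 OCR errors, 3 typos from the printed book). The function name is made up, and it assumes all 21 corrections were among the flagged words, which the post implies but does not state outright.

```python
def proofing_stats(pages, flagged, ocr_errors, source_typos=0):
    """Summarise a proofing tally for one section of a scanned book."""
    corrections = ocr_errors + source_typos
    return {
        "errors_per_page": ocr_errors / pages,
        "pages_per_error": pages / ocr_errors,
        # Share of flagged words that really needed a correction
        # (assumes every correction came from a flagged word).
        "flag_hit_rate": corrections / flagged,
    }


stats = proofing_stats(pages=100, flagged=383, ocr_errors=18, source_typos=3)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

On these numbers that works out to roughly one OCR error every five and a half pages, with only about one flagged word in eighteen actually needing a change.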
#11 |
Retired & reading more!
|
What a difference resolution makes
Did some test runs last night for resolution. On a single page, my test included:
n1 = the number of places marked as questionable by the OCR
n2 = the number of actual typos marked as questionable by the OCR
n3 = the number of actual typos not marked as questionable by the OCR (i.e. found through manual search)
I realize that this does not represent a statistically significant sample, but it does indicate that 600 DPI is the way I need to go with my scanning.
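For anyone repeating this kind of test, the three counts combine into a few comparable numbers per resolution. The sketch below is one way to do that. The function name and the derived measures are my own framing of n1, n2 and n3, and the counts in the usage note are invented placeholders, not results from the test above.

```python
def ocr_page_metrics(n1, n2, n3):
    """Derive summary figures from the three counts defined above.

    n1: places the OCR marked as questionable
    n2: actual typos among those marked places
    n3: actual typos the OCR did not mark
    """
    total_typos = n2 + n3
    return {
        "total_typos": total_typos,
        "false_alarms": n1 - n2,
        # Fraction of marked places that were real typos.
        "flag_precision": n2 / n1 if n1 else 0.0,
        # Fraction of real typos that the OCR marked.
        "flag_recall": n2 / total_typos if total_typos else 1.0,
    }
```

For example, with made-up counts of n1 = 10, n2 = 4 and n3 = 1, the page has 5 typos in total, 6 false alarms, and the OCR marked 80% of the real typos. Running the same page at each DPI setting and comparing `total_typos` and `flag_recall` gives a like-for-like view.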
#12 |
Recovering Gadget Addict
|
Just wanted to include a link to a short ScanSnap review.
I think this is the one that you are using... the Fujitsu ScanSnap S500 http://www.pcmag.com/article2/0,1895,1990588,00.asp
#13 | |
Retired & reading more!
|
Quote:
Yes. Thanks Bob.
|
#14 |
books & doughnuts
Posts: 882
Karma: 37857
Join Date: Jan 2007
Location: usa
Device: sony reader, kindle2
|
Thanks. I just made my first one too.
|
#15 |
Retired & reading more!
|
Second scanned book
Well, I did it again, but this time non-destructively, using an HP flatbed scanner and an old hardback book. The book was just the right size to scan fully opened, 2 pages at a time. With ABBYY you can automatically separate the two pages and orient them properly.
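For readers whose OCR software lacks that feature, separating a two-page scan comes down to computing two crop rectangles. This is a rough sketch of that step, not a description of how ABBYY does it. The function name and the `gutter` parameter (a strip around the spine to discard) are hypothetical, and the boxes use the common (left, top, right, bottom) pixel convention so they can be handed to whatever image tool you use.

```python
def split_spread(width, height, gutter=0):
    """Return crop boxes for the left and right pages of a two-page scan.

    Assumes the spine sits at the horizontal centre of the image.
    `gutter` is the total width in pixels to drop around the spine.
    """
    if gutter < 0 or gutter >= width:
        raise ValueError("gutter must be between 0 and the image width")
    mid = width // 2
    half_gutter = gutter // 2
    left_box = (0, 0, mid - half_gutter, height)
    right_box = (mid + half_gutter, 0, width, height)
    return left_box, right_box
```

A 2000 x 1400 pixel scan with a 100 pixel gutter would give a left page of (0, 0, 950, 1400) and a right page of (1050, 0, 2000, 1400). A book that sits off-centre on the glass would need the midpoint adjusted by hand.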
The scanning was slower & probably would not work as well with a newer, thicker, or larger book. But it had the advantages that I didn't spend time de-paging the book and my pbook is still in reasonably good shape in my library. I also found that scanning in grayscale rather than black & white resulted in fewer OCR errors, even though the area near the center, which still curved slightly away from the scanner, caused some added OCR errors. All things considered, both the flatbed HP experience and the Fujitsu page scanner experience were satisfactory. However, I would not do this for just any book. It will be used only for my favorite "keepers". Thanks for the encouragement and interest from mobilereaders.