Post #1 by moz, 03-15-2008, 06:36 AM
Ripping pbooks - should we have a wiki page?

I've just started ripping pbooks and am making all the usual discoveries. So I'm thinking we should have a wiki section for this, or at least a single page, ideally covering a range of topics:
  1. OCR software - links to reviews plus recommendations
  2. using a scanner - tips and tricks
  3. using a camera - tips and tricks
  4. arranging pages and sections when ripping
  5. proofreading tips
  6. output - what format and software to use, how to lay it out

I'm sure there are other things. The one that made my life much easier was deciding to focus on ezaret's comment that "lighting is critical". I cobbled together a light and diffuser onto my tripod, and that hugely increased both the accuracy and the consistency of the OCR, so the last 200 pages of "Matter" took less time than the previous 100.

Should I just start the wiki page or should we sort things out a bit here first?
[Attached image: book-scanner-moz.jpg]
Post #2 by Halk, 03-15-2008, 04:42 PM
I've no intention of doing it myself, but just out of interest, now that you're adept at it, how much time are you talking about to "rip" a book?
Post #3 by slayda, 03-15-2008, 05:27 PM
Quote (originally posted by Halk):
"I've no intention of doing it myself, but just out of interest, now that you're adept at it, how much time are you talking about to "rip" a book?"
With my method (cutting the spine off, feeding the pages through my double-sided scanner with automatic feed, and using ABBYY FineReader), it usually takes about 20 minutes to cut and scan. The OCR time varies with the page size and print quality, and to a lesser extent with the font. For a typical paperback, finding and correcting OCR mistakes takes about 2-3 hours. They don't all show up with a spell checker, and I've found some common OCR errors to look for. These can be fixed with a global find/replace.
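A global find/replace pass like the one described above could be scripted roughly as follows. This is a minimal sketch, not slayda's actual workflow: it assumes the OCR output has been saved as plain text, and the function name `apply_ocr_fixes` and the correction table are hypothetical (the "vou"/"vour" pairs are misreads reported in this thread). Matching whole words only keeps a fix from mangling longer words that happen to contain the same letters.

```python
import re

# Hypothetical correction table: OCR misread -> intended word.
# "vou" and "vour" are misreads reported in this thread; build your own
# table from the errors you actually see in each book.
OCR_FIXES = {
    "vou": "you",
    "vour": "your",
}


def apply_ocr_fixes(text: str, fixes: dict[str, str]) -> str:
    """Replace each misread word with its fix, matching whole words only."""
    for wrong, right in fixes.items():
        # \b word boundaries stop "vou" from matching inside e.g. "vouch"
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
    return text


if __name__ == "__main__":
    print(apply_ocr_fixes("Did vou bring vour book?", OCR_FIXES))
```

The matching is case-sensitive, so a capitalised misread at the start of a sentence (e.g. "Vou") would need its own entry in the table.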
Post #4 by moz, 03-15-2008, 07:12 PM
It takes me longer than that to photograph the pages, but I still have a perfectly good book afterwards. That matters to me a little, as I can still share the book with friends. I don't know the actual time; the first book is probably a lot slower than future ones will be. I hope to do another one this week and see how I go. I'm guessing about 10 hours all up, but a lot of that time was spent fixing OCR errors while I had the lighting wrong.

The standard errors... definitely. In "Matter" about 80% of the instances of "Sarl" came out as "Sari", which might be a side effect of dictionary-based correction, but "vou" and "vour" never got dictionary-corrected.
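Before blindly replacing "Sari" with "Sarl" everywhere, it can help to count how often each spelling appears, in case the misread form is also a legitimate word somewhere in the book. A minimal sketch, assuming the ebook text is available as a plain string; `count_variants` is a hypothetical helper and the sample sentence is made up for illustration.

```python
import re
from collections import Counter


def count_variants(text: str, variants: list[str]) -> Counter:
    """Count whole-word occurrences of each spelling variant in text."""
    counts = Counter()
    for v in variants:
        counts[v] = len(re.findall(rf"\b{re.escape(v)}\b", text))
    return counts


if __name__ == "__main__":
    sample = "Sari said no. Sarl agreed. Sari left."  # made-up sample text
    counts = count_variants(sample, ["Sarl", "Sari"])
    total = sum(counts.values())
    if total:  # avoid dividing by zero when neither spelling occurs
        for name, n in counts.items():
            print(f"{name}: {n} ({100 * n / total:.0f}%)")
```

If the counts show the misread form only ever stands in for the name, a whole-word global replace is reasonably safe; otherwise the hits are worth checking one by one.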

Now that I'm reading the ebook I'm finding even more errors...
Post #5 by slayda, 03-15-2008, 10:27 PM
Quote (originally posted by moz):
"Now that I'm reading the ebook I'm finding even more errors..."
I have to admit that I also find several this way. That's why I'd like my Cybook to be able to highlight a word or phrase for later correction. Often I use my Palm TX just for that purpose.

I have also used a flatbed scanner when I want to keep the book.

I'm not a wiki-literate person but would like to see a wiki for this activity.
Post #6 by moz, 03-16-2008, 08:59 PM
http://wiki.mobileread.com/wiki/Digi...ooks_to_Ebooks
Post #7 by moz, 03-18-2008, 06:41 AM
OK, I just ripped "Pushing Ice" by Alastair Reynolds, and it took ~40 minutes to take 237 page photographs, then about 100 minutes to proofread the result. I expect to do more proofing as I read it, but that's about 140 minutes for 280 pages, or 2 pages a minute (with two pages per image). I was running FineReader while taking photos, so I could start proofreading almost immediately after I finished.
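The throughput figure above works out as follows, and the same rate can give a rough time estimate for another book. This is a small sketch: the 40/100-minute and 280-page numbers come from the post, while the 400-page book is a hypothetical example, and the estimate assumes the rate carries over unchanged, which the post does not claim.

```python
def pages_per_minute(pages: int, minutes: float) -> float:
    """Overall ripping rate: pages produced per minute of work."""
    return pages / minutes


def minutes_for(pages: int, rate: float) -> float:
    """Estimated minutes for a book of `pages` pages at a given rate."""
    return pages / rate


photo_minutes = 40   # photographing, from the post
proof_minutes = 100  # proofreading, from the post
total = photo_minutes + proof_minutes
rate = pages_per_minute(280, total)

if __name__ == "__main__":
    print(f"{total} minutes, {rate:.1f} pages per minute")
    # Hypothetical 400-page book, assuming the same rate holds
    print(f"400 pages: about {minutes_for(400, rate):.0f} minutes")
```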

New conclusion: a book stand and two cameras might work better, but one camera with a flat sheet of glass does actually work. Lighting seems to be more important than utter flatness.
All times are GMT -4.