03-15-2008, 06:36 AM | #1 |
Addict
Posts: 370
Karma: 1553
Join Date: Feb 2008
Location: Melbun
Device: Kobo H2O
|
Ripping pbooks - should we have a wiki page?
I've just started ripping pbooks and am making all the usual discoveries. So I'm thinking we should have a wiki section for this, or at least a single page. Ideally with a range of topics:
I'm sure there are other things. The one that made my life much easier was deciding to focus on ezaret's comment "lighting is critical". I cobbled together a light and diffuser on my tripod and that hugely increased both the accuracy and the consistency of the OCR, so the last 200 pages of "Matter" took less time than the previous 100. Should I just start the wiki page or should we sort things out a bit here first? |
03-15-2008, 04:42 PM | #2 |
Fanatic
Posts: 534
Karma: 469999
Join Date: Feb 2008
Location: Scotland
Device: Sony PRS-650 (PRS+ alpha - thanks Kartu!)
|
I've no intention of doing it myself, but just out of interest, now that you're adept at it, how much time are you talking about to "rip" a book?
|
03-15-2008, 05:27 PM | #3 |
Retired & reading more!
Posts: 2,764
Karma: 1884247
Join Date: Sep 2006
Location: North Alabama, USA
Device: Kindle 1, iPad Air 2, iPhone 6S+, Kobo Aura One
|
For my method (i.e. cutting the spine off & feeding it through my double-sided scanner with automatic feed & using ABBYY FineReader) it usually takes about 20 minutes to cut & scan. The OCR varies based on the page size & print quality - and to a lesser extent on the print font. For a typical paperback book it takes about 2 - 3 hours to find and correct OCR mistakes. They don't all show up with spell checker & I've found some common OCR errors to look for. These can be fixed with a global find/replace.
|
03-15-2008, 07:12 PM | #4 |
Addict
Posts: 370
Karma: 1553
Join Date: Feb 2008
Location: Melbun
Device: Kobo H2O
|
It takes me longer than that to photograph each page, but I still have a perfectly good book afterwards. That matters to me a little, as I can still share the book with friends. I don't know the actual time, and the first book is probably a lot slower than future ones. I'll do another one this week I hope and see how I go. I'm guessing about 10 hours all up, but a lot of that time was spent fixing OCR errors when I had the lighting wrong.
The standard errors... definitely. In "Matter" about 80% of the "Sarl" came out "Sari", which might be a dictionary-based fix, but "vou" and "vour" never got dictionary-corrected. Now that I'm reading the ebook I'm finding even more errors... |
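For anyone who wants to script the global find/replace step, here is a minimal sketch in Python. The correction table is hypothetical and built only from the errors mentioned in this post. The "Sari" entry is book-specific and assumes that spelling never appears legitimately in the text. The function name `fix_ocr_errors` is also just an illustration, not part of any OCR tool.

```python
import re

# Hypothetical correction table taken from the errors mentioned in this thread.
# "Sari" -> "Sarl" is book-specific: it assumes "Sari" is never a real word here.
OCR_FIXES = {
    "vou": "you",
    "vour": "your",
    "Sari": "Sarl",
}

def fix_ocr_errors(text, fixes=OCR_FIXES):
    """Replace whole-word OCR errors; words that merely contain them are left alone."""
    # \b anchors each alternative to word boundaries, so "devour" and
    # "voucher" are not touched when fixing "vour" and "vou".
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, fixes)) + r")\b")
    return pattern.sub(lambda match: fixes[match.group(1)], text)

if __name__ == "__main__":
    sample = "Did vou see vour friend Sari devour the voucher?"
    print(fix_ocr_errors(sample))
```

The match is case-sensitive, so a capitalised "Vou" at the start of a sentence would need its own entry. It is probably worth eyeballing the matches before replacing, since a blind global replace can introduce new errors.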
03-15-2008, 10:27 PM | #5 |
Retired & reading more!
Posts: 2,764
Karma: 1884247
Join Date: Sep 2006
Location: North Alabama, USA
Device: Kindle 1, iPad Air 2, iPhone 6S+, Kobo Aura One
|
I have to admit that I also find several this way. That's why I'd like my Cybook to be able to highlight a word or phrase for later correction. Often I use my Palm TX just for that purpose.
Also I have used a flatbed scanner when I want to keep the book. I'm not a wiki-literate person but would like to see a wiki for this activity. |
03-16-2008, 08:59 PM | #6 |
Addict
Posts: 370
Karma: 1553
Join Date: Feb 2008
Location: Melbun
Device: Kobo H2O
|
|
03-18-2008, 06:41 AM | #7 |
Addict
Posts: 370
Karma: 1553
Join Date: Feb 2008
Location: Melbun
Device: Kobo H2O
|
OK, I just ripped "Pushing Ice" by Alastair Reynolds, and it took ~40 minutes to take 237 page photographs, then about 100 minutes to proof-read the result. I expect to do more proofing as I read it, but that's about 140 minutes for 280 pages, or 2 pages a minute (with two pages per image). I was running FineReader while taking photos, so I could start proofreading almost immediately after I finished taking photos.
New conclusion: a book stand and two cameras might work better, but one camera with a flat sheet of glass does actually work. Lighting seems to be more important than utter flatness. |
|