09-11-2010, 02:38 AM | #1 |
Basculocolpic
Posts: 4,356
Karma: 20181319
Join Date: Jul 2010
Location: Sweden
Device: Kindle 3 WiFi, Kindle 4SO, Kindle for Android, Sony PRS-350 and PRS-T1
|
Scanning project
Kindle 3 owner.
Have just started to dabble in Calibre, in other words I am a complete newbie. I have a bunch of P-books and thought that I could make a multi-year project of turning them into E-books using a scanner and Calibre. Basically I would do two or three books a week. Here is my problem: P-books have headers and page numbering. I realize that I can select scanning zones in the scanning software; that, however, is fairly time consuming. I would prefer to set up a single pan-optimized zone and then have headers, footers and page numbers omitted in the conversion process. Is it possible to do that inside Calibre? |
09-11-2010, 02:58 AM | #2 |
Wizard
Posts: 2,013
Karma: 251649
Join Date: Apr 2010
Location: Tempe, AZ, USA, Earth
Device: JetBook Lite (away from home) + 1 spare, 32" TV (at home)
|
I don't know the answer to your question but have you checked the Workshop forum under E-Book Formats in MobileRead? There has been a lot of discussion on scanning p-books to make e-books. You might find something there or someone might be able to help you.
|
09-11-2010, 05:27 AM | #3 |
Wizard
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
|
Calibre will not be any help with the OCR side of this process, i.e. turning the images into text.
Calibre does have a facility for defining regular expressions for text (typically headers and/or footers) that is to be omitted when converting a book. How well that would work in your case is likely to depend on the quality of the scanning/OCR process: the text has to be predictable enough that a regular expression can be written to match the header/footer text. |
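To give an idea of what such an expression might look like, here is a minimal sketch in plain Python (Calibre's regular expressions follow Python's syntax). It assumes the OCR output puts the running header and the page number on lines of their own; the header text and the function name are made up for illustration, and real OCR output will probably need a looser pattern.

```python
import re

# Hypothetical running header; the real one depends on the book and on
# what the OCR software actually produces.
RUNNING_HEADER = "A TALE OF TWO CITIES"

# Matches a whole line that is either
#   - just a page number (1 to 4 digits), or
#   - the running header, optionally with a page number before or after it.
HEADER_FOOTER = re.compile(
    r"^[ \t]*(?:\d{1,4}|(?:\d{1,4}[ \t]+)?"
    + re.escape(RUNNING_HEADER)
    + r"(?:[ \t]+\d{1,4})?)[ \t]*$\n?",
    re.MULTILINE,
)


def strip_headers_footers(text: str) -> str:
    """Remove header and page-number lines, leaving the body text alone."""
    return HEADER_FOOTER.sub("", text)


sample = (
    "It was the best of times,\n"
    "12\n"
    "A TALE OF TWO CITIES 13\n"
    "it was the worst of times.\n"
)
print(strip_headers_footers(sample))
```

The same kind of pattern, without the Python wrapper, is what would go into Calibre's conversion settings. If the OCR misreads the header now and then, the pattern will miss those lines, which is the predictability problem mentioned above.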
09-11-2010, 06:00 AM | #4 | |
Basculocolpic
Posts: 4,356
Karma: 20181319
Join Date: Jul 2010
Location: Sweden
Device: Kindle 3 WiFi, Kindle 4SO, Kindle for Android, Sony PRS-350 and PRS-T1
|
Quote:
|
|
09-11-2010, 08:34 AM | #5 |
Fanatic
Posts: 551
Karma: 1121392
Join Date: May 2008
Location: USA
Device: HTC One M8
|
What works for me is to OCR with FineReader 9.0 and then save to Word. There's an option buried in FR's Word save options to omit headers/footers. Then I convert the Word file to whatever format I ultimately want. It works for me, although I don't know whether it would fit with anyone else's workflow.
Last edited by wayrad; 09-11-2010 at 08:37 AM. |
09-11-2010, 09:02 AM | #6 | |
Wizard
Posts: 2,013
Karma: 251649
Join Date: Apr 2010
Location: Tempe, AZ, USA, Earth
Device: JetBook Lite (away from home) + 1 spare, 32" TV (at home)
|
Quote:
Since you don't mind destroying the paper books, a guillotine is the way to go. The kind Iain has is the same as the first one I had (the one he has is not the same as the one pictured in his blog unless he changed it recently). Mine broke after about 300-400 books averaging 1" in thickness and, when I found the guillotine I bought was probably a counterfeit of the original design, I jumped through a few hoops and was able to get a full refund. I recently replaced it with a better designed one that is easier and faster to use and seems to be much better made. It's a Perfect G12 Pro. It cost about 50% more than the genuine version of the first guillotine I got would have cost, but it is well worth it. I've had mine only three days and have cut only 32 books with it, so time will tell how well it will hold up. There was a bit of a learning curve finding out the best way to adjust the fence to easily and safely position the book properly for cutting off the spine. I shared with Iain via PMs how to do it and he is now using a variation of my method. Since there seems to be an interest, I probably should post a copy of that on the Workshop forum. How were you planning on scanning your books after cutting off the spines?
Last edited by Lady Fitzgerald; 09-11-2010 at 09:06 AM. |
|
09-11-2010, 09:31 AM | #7 |
Basculocolpic
Posts: 4,356
Karma: 20181319
Join Date: Jul 2010
Location: Sweden
Device: Kindle 3 WiFi, Kindle 4SO, Kindle for Android, Sony PRS-350 and PRS-T1
|
That is one serious guillotine! It could take off a hand unless you are careful. I fear I might end up as a cautionary tale on one of those ER doctor forums. I have two flatbed scanners; one is an old SCSI HP scanner with ADF. I thought that could be set up for specialty use for a book scanning project. I also have OmniPage Pro 12, which should be good enough for OCR needs. Since this is limited to personal use, a few misreads shouldn't be a major problem. |
09-11-2010, 10:19 AM | #8 | |
Wizard
Posts: 2,013
Karma: 251649
Join Date: Apr 2010
Location: Tempe, AZ, USA, Earth
Device: JetBook Lite (away from home) + 1 spare, 32" TV (at home)
|
Quote:
After I tried parting company with my thumb, I worked out a procedure using shims that allowed me to quickly and accurately position the books for cutting and still keep my hands far away from the blade. It's worked out well so I won't need to change my name to Frodo. |
|
09-11-2010, 10:49 AM | #9 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
Not sure if you're aware of some of the alternatives to slicing up the book:
http://diybookscanner.org/
http://bkrpr.org/doku.php
These seem faster, with less chance of mixed-up pages, and they are also non-destructive... There are a lot of software/workflow discussions on the first site as well. |
09-11-2010, 11:31 AM | #10 | |
Wizard
Posts: 2,013
Karma: 251649
Join Date: Apr 2010
Location: Tempe, AZ, USA, Earth
Device: JetBook Lite (away from home) + 1 spare, 32" TV (at home)
|
Quote:
|
|