|03-04-2012, 09:10 PM||#16|
Join Date: Oct 2007
Device: HDX 8.9, AuraHD, Nook HD+, Kindle 2,3,T , Opus, Nexus7, iPhone 6, etc
|03-04-2012, 10:11 PM||#17|
Join Date: Aug 2011
Location: Wouldn't you like to know.
Device: Sony PRS-350:Sony PRS-T1:Rooted Nook Tablet
That is probably the number of people who are truly active on this forum... Either way, HarryT responded that it was basically okay given those circumstances, and I merely asked why go to the trouble when one of the board's 'highest rated' members gives it a green light.
I wonder how many of the TOTAL members of this forum have the Harry Potter series in some e-format...
Last edited by jmaejr; 03-04-2012 at 10:32 PM.
|04-03-2012, 06:22 PM||#18|
Join Date: Feb 2012
Location: Florida USA
Device: Kindle 4 SO (Died), Kindle Fire HD 7"
As the OP, I'd like to give an update:
Finished my first book a couple of weeks ago. It's a paperback of which there is no e-copy available (BTW folks, in this instance, scanning a book which you already own isn't piracy, it is fair use and legal. Same as making a backup copy of a music CD you own, or ripping said CD to MP3.).
I scanned all pages to TIFs, using an ancient Lexmark X1100 series AIO scanner I have here (I was very careful with the book, as I don't like flattening it out on that flatbed scanner). Pages were run through ScanTailor to straighten out any misaligned scans and to cut the double pages apart. Pages were then run through Adobe Acrobat 9 Enhanced's OCR function, with Clear Scan enabled. The OCR output was saved as html, as I didn't know how to save as xhtml then (do now). Files were then opened in Sigil, for editing, proofreading, etc.
I have to say that for this particular book, Acrobat's OCR engine sucks. It took me probably 36 hours of proofing to fix everything, as I had to read and re-read the book to catch all of the errors - everything from a single wrong letter in a word, to entire sentences missing from the text. Forget about italics, they were always wrong or nonexistent.
A few things I'd like to change:
Sigil did a good job formatting the things I thought it would choke on, such as the map at the beginning of the book. It did choke on line drawings at the beginning of each chapter, though, so I had to cut n' paste one from one of the original scans as a bitmap and use that for each chapter. Ugly, but worked.
The gobs of extra lines in the text have to go. Thankfully, I found out how to deal with this, along with paragraph indentation, in Calibre. Sigil has no capacity for this, and it's a serious oversight, as it's touted as a friggin' editor! In this day and age, one shouldn't need to go into the code to do such obvious tweaking.
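For anyone who does end up going into the code: the extra blank lines in OCR output are usually just empty paragraphs, and indentation is a one-line stylesheet rule. Here is a rough Python sketch of that idea. It is not what Calibre or Sigil do internally; the regex, the function name `drop_empty_paragraphs`, and the CSS values are my own assumptions for illustration.

```python
import re

# Matches paragraphs that contain nothing but whitespace, &nbsp; or <br/> tags.
EMPTY_PARAGRAPH = re.compile(
    r"<p\b[^>]*>(?:\s|&nbsp;|&#160;|<br\s*/?>)*</p>\s*",
    re.IGNORECASE,
)

# Assumed stylesheet rule: no gap between paragraphs, first-line indent instead.
INDENT_CSS = "p { margin: 0; text-indent: 1.5em; }"


def drop_empty_paragraphs(html):
    """Remove empty <p> elements that show up as blank lines on the reader."""
    return EMPTY_PARAGRAPH.sub("", html)


sample = "<p>First.</p>\n<p>&nbsp;</p>\n<p><br/></p>\n<p>Second.</p>"
print(drop_empty_paragraphs(sample))
print(INDENT_CSS)
```

Run it on a copy of the chapter files, not the originals. A regex is good enough for flat OCR output like this, but it is not a real HTML parser.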
Sigil changes things in the book once you save it. I saved changes to Chapter Two FOUR TIMES (a simple justify center of the word "TWO" in the beginning of the file). Each time when I opened the book on my device, "TWO" was justify left instead of center. As it is the last noticeable error in the book, I said "screw it" and am leaving it as-is, as I'm not going to mess with it anymore.
If I didn't have the paper copy of the book here to proof the OCR against, I couldn't have finished this sane (and this was only a 250 page paperback!). If I was working solely from original scans, on only a laptop and not a multiple monitor setup, the constant flipping back and forth would have driven me nuts. The next few books are going to be much more challenging, with triple column text on each page, and/or lots of inserted line art or photos. The fonts are a lot older as well, which will (I'm sure) give Acrobat's OCR even more fits. I gotta either figure out how to improve Acrobat's accuracy, or get a different OCR engine.
I am very, very proud of the job I did on this e-book, though. It is as attractive to look at and to read as any commercially published e-book I've read.
Suggestions as to better software or changes to workflow are quite welcome. I'm starting on my second project very soon.
|04-03-2012, 10:47 PM||#19|
Join Date: Aug 2009
Device: Kobo Mini (4GB), Nook Classic wi-fi, iPod Touch (Bluefire Reader)
For scanning, I use digital camera based rigs like those described here, one for hardcovers and one for paperbacks and small hardcovers. I then batch crop the images with JPEGCrops, process them with Scan Tailor, OCR with FineReader, export the text as html, and clean out all the junk code that FineReader can add (and I'm sure Acrobat does too) with Toxaris's excellent Word macro. Then I format the cleaned html into an epub with Sigil.
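To give an idea of what "junk code" means here, this is a small Python sketch that strips the two most common kinds: `<span>` wrappers and inline `style` attributes. It is not Toxaris's macro and does far less; the function name `strip_ocr_junk` and the sample markup are made up for the example.

```python
import re

# Opening and closing <span> tags, whatever attributes they carry.
SPAN_TAG = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)
# Inline style="..." or style='...' attributes on any remaining tag.
STYLE_ATTR = re.compile(r"""\s+style\s*=\s*("[^"]*"|'[^']*')""", re.IGNORECASE)


def strip_ocr_junk(html):
    """Drop <span> wrappers and inline style attributes, keeping the text."""
    html = SPAN_TAG.sub("", html)
    return STYLE_ATTR.sub("", html)


sample = (
    '<p style="margin-left:3pt"><span class="font1">Call me </span>'
    "<i>Ishmael</i>.</p>"
)
print(strip_ocr_junk(sample))
```

One caveat: if the OCR engine marks italics with styled spans instead of `<i>` tags, this throws the italics away too, so check a sample of the export before running it over a whole book.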
|04-04-2012, 03:28 AM||#20|
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
I personally save two formats: one ePUB, since it is an archive of files in an open format, and one PDF/A. The PDF/A contains both the scanned page images and the OCR-ed text, which makes it easy to search the text while still being able to see the original image.
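The "archive with files in an open format" part is easy to see for yourself, since an ePUB is a ZIP file underneath. A minimal Python sketch using only the standard library; the file names in the stand-in archive are invented for the example, and a real book will have its own layout.

```python
import io
import zipfile


def list_epub_members(epub_file):
    """Return the names of the files stored inside an EPUB (a ZIP archive)."""
    with zipfile.ZipFile(epub_file) as archive:
        return archive.namelist()


# Build a tiny stand-in archive in memory so the example runs without a real book.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("OEBPS/chapter02.xhtml", "<h1>TWO</h1>")

buffer.seek(0)
print(list_epub_members(buffer))
```

Pass a path such as `"mybook.epub"` instead of the in-memory buffer to look inside an actual file. This also helps when a saved change does not seem to stick, because you can read the chapter file directly and see what was really written.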