Old 03-04-2012, 09:10 PM   #16
AnemicOak
Bookaholic
Posts: 10,429
Karma: 28936355
Join Date: Oct 2007
Location: Minnesota
Device: HDX 8.9, AuraHD, Nook HD+, Kindle 2,3,T , Opus, Nexus7, iPhone5, etc
Quote:
Originally Posted by jmaejr View Post
That SEEMS to be the consensus...at least the majority opinion here.
Just keep in mind that only 195 people participated in that poll.
Old 03-04-2012, 10:11 PM   #17
jmaejr
Banned
Posts: 132
Karma: 566638
Join Date: Aug 2011
Location: Wouldn't you like to know.
Device: Sony PRS-350:Sony PRS-T1:Rooted Nook Tablet
That is probably the number of people who are truly active on this forum. Either way, HarryT responded that it was basically okay given those circumstances, and I merely asked why go to the trouble when one of the board's 'highest rated' members gives it a green light.

I wonder how many of the TOTAL members of this forum have the Harry Potter series in some e-format...

Last edited by jmaejr; 03-04-2012 at 10:32 PM.
Old 04-03-2012, 06:22 PM   #18
TechSarge
Junior Member
Posts: 7
Karma: 10
Join Date: Feb 2012
Location: Florida USA
Device: Kindle 4 SO (Died), Kindle Fire HD 7"
As the OP, I'd like to give an update:

Finished my first book a couple of weeks ago. It's a paperback for which no e-copy is available. (BTW folks, in this instance, scanning a book which you already own isn't piracy; it is fair use and legal, same as making a backup copy of a music CD you own, or ripping said CD to MP3.)

I scanned all pages to TIFs, using an ancient Lexmark X1100 series AIO scanner I have here (I was very careful with the book, as I don't like flattening it out on that flatbed scanner). Pages were run through ScanTailor to straighten out any misaligned scans and to cut the double pages apart. Pages were then run through Adobe Acrobat 9 Enhanced's OCR function, with Clear Scan enabled. The OCR output was saved as html, as I didn't know how to save as xhtml then (do now). Files were then opened in Sigil, for editing, proofreading, etc.

I have to say that for this particular book, Acrobat's OCR engine sucks. It took me probably 36 hours of proofing to fix everything, as I had to read and re-read the book to catch all of the errors - everything from a single wrong letter in a word, to entire sentences missing from the text. Forget about italics, they were always wrong or nonexistent.

A few things I'd like to change:

Sigil did a good job formatting the things I thought it would choke on, such as the map at the beginning of the book. It did choke on line drawings at the beginning of each chapter, though, so I had to cut n' paste one from one of the original scans as a bitmap and use that for each chapter. Ugly, but worked.

The gobs of extra lines in the text have to go. Thankfully, I found out how to deal with this, along with paragraph indentation, in Calibre. Sigil has no capacity for this, and it's a serious oversight, as it's touted as a friggin' editor! In this day and age, one shouldn't need to go into the code to do such obvious tweaking.
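For anyone who would rather script this than do it in Calibre, here is a minimal Python sketch of the "extra lines" cleanup. It assumes the blank lines come from empty paragraphs in the OCR output (a `<p>` holding only whitespace, `&nbsp;`, or `<br>`), which may not match what Acrobat actually emits for every book. The function name is made up for illustration.

```python
import re

# Assumption: blank lines in the OCR output are empty <p> elements that hold
# only whitespace, non-breaking-space entities, or <br> tags.
EMPTY_PARAGRAPH = re.compile(
    r"<p\b[^>]*>(?:\s|&nbsp;|&#160;|<br\s*/?>)*</p>\s*",
    re.IGNORECASE,
)

def strip_empty_paragraphs(html: str) -> str:
    """Remove empty <p> elements that render as blank lines between paragraphs."""
    return EMPTY_PARAGRAPH.sub("", html)

sample = "<p>First.</p>\n<p>&nbsp;</p>\n<p><br/></p>\n<p>Second.</p>"
cleaned = strip_empty_paragraphs(sample)
print(cleaned)
```

A regular expression is only good enough for simple, flat OCR output; heavily nested markup would call for a real HTML parser. Paragraph indentation is a separate matter and is normally handled in the stylesheet rather than in the text.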

Sigil changes things in the book once you save it. I saved changes to Chapter Two FOUR TIMES (a simple justify center of the word "TWO" in the beginning of the file). Each time when I opened the book on my device, "TWO" was justify left instead of center. As it is the last noticeable error in the book, I said "screw it" and am leaving it as-is, as I'm not going to mess with it anymore.

If I didn't have the paper copy of the book here to proof the OCR against, I couldn't have finished this sane (and this was only a 250 page paperback!). If I was working solely from original scans, on only a laptop and not a multiple monitor setup, the constant flipping back and forth would have driven me nuts. The next few books are going to be much more challenging, with triple column text on each page, and/or lots of inserted line art or photos. The fonts are a lot older as well, which will (I'm sure) give Acrobat's OCR even more fits. I gotta either figure out how to improve Acrobat's accuracy, or get a different OCR engine.

I am very, very proud of the job I did on this e-book, though. It is as attractive to look at and to read as any commercially published e-book I've read.

Suggestions as to better software or changes to workflow are quite welcome. I'm starting on my second project very soon.
Old 04-03-2012, 10:47 PM   #19
Keroberos
Zealot
Posts: 127
Karma: 194002
Join Date: Aug 2009
Device: Kobo Mini (4GB), Nook Classic wi-fi, iPod Touch (Bluefire Reader)
Quote:
Suggestions as to better software or changes to workflow are quite welcome. I'm starting on my second project very soon.
I would definitely recommend switching OCR software (Acrobat's OCR sucks). I use ABBYY FineReader Professional--$170, but worth every penny in my opinion (with training, I don't think I spend more than an hour or two spell checking). They have a cheaper express version for $50, but I don't know how good it is. There are free OCR programs out there, but I can't say how good or user-friendly they are (I tried Tesseract with a GUI front-end but gave up).

For scanning, I use digital camera based rigs like those described here, one for hardcovers and one for paperbacks and small hardcovers. I then batch crop the images with JPEGCrops, process the images with Scan Tailor, OCR with FineReader, export the text as HTML, and clean all the junk code that FineReader can add (and I'm sure Acrobat does too) with Toxaris's excellent Word macro. Then I format the cleaned HTML into an epub with Sigil.
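To give an idea of what "junk code" cleanup can look like, here is a rough Python sketch. This is not Toxaris's macro and does not reproduce what it does; it only assumes the junk is mostly inline `style` attributes and `<span>` wrappers, and the function name is hypothetical.

```python
import re

# Assumption: the unwanted markup is inline style attributes and <span> wrappers.
STYLE_ATTR = re.compile(r'\s+style="[^"]*"', re.IGNORECASE)
SPAN_TAG = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)

def strip_ocr_markup(html: str) -> str:
    """Drop inline style attributes and unwrap <span> tags, keeping their text."""
    html = STYLE_ATTR.sub("", html)
    return SPAN_TAG.sub("", html)

sample = '<p style="margin:0"><span style="font-size:9pt">Chapter</span> text</p>'
cleaned_markup = strip_ocr_markup(sample)
print(cleaned_markup)
```

One caveat: if the OCR program marks italics with a styled span instead of `<i>` or `<em>`, a blunt pass like this throws the italics away too, so check a sample of the output before running it on a whole book.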
Old 04-04-2012, 03:28 AM   #20
Toxaris
Wizard
Posts: 3,101
Karma: 5861069
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
Quote:
Originally Posted by TechSarge View Post
Sigil changes things in the book once you save it. I saved changes to Chapter Two FOUR TIMES (a simple justify center of the word "TWO" in the beginning of the file). Each time when I opened the book on my device, "TWO" was justify left instead of center. As it is the last noticeable error in the book, I said "screw it" and am leaving it as-is, as I'm not going to mess with it anymore.
Centering text can be a bit cumbersome sometimes, but usually that is due to the reading software. Sigil does some sanity checks before saving. If you use a style with the CSS declaration 'text-align: center', it should work.
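As a small illustration of the style-based approach, the Python sketch below builds a stylesheet rule and a matching chapter heading. The class name `chapter-number`, the use of `<h2>`, and the helper function are all made up for the example; the only part that comes from the advice above is 'text-align: center'.

```python
from html import escape

# Hypothetical class name; any name works as long as the CSS and XHTML agree.
CENTER_CSS = ".chapter-number {\n    text-align: center;\n}\n"

def chapter_heading(text: str, css_class: str = "chapter-number") -> str:
    """Return an XHTML heading that takes its alignment from the stylesheet."""
    return f'<h2 class="{escape(css_class)}">{escape(text)}</h2>'

heading = chapter_heading("TWO")
print(CENTER_CSS)
print(heading)
```

The rule goes in the book's stylesheet and the heading in the chapter file. Some reading software applies its own alignment anyway, so a correct style is no guarantee on every device.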

I personally save two formats: one ePUB, since it is an archive with files in an open format, and one PDF/A. The PDF/A contains both the scanned file and the OCR-ed text. That makes it easy to search the text while still being able to see the original image.
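Because an ePUB is a ZIP archive, its contents can be inspected with standard tools. A minimal Python sketch follows; the file names inside the stand-in archive are invented so the example runs without a real book, and the helper name is hypothetical.

```python
import io
import zipfile

def list_epub_files(epub_file) -> list:
    """Return the names of the files stored inside an ePUB (a ZIP archive)."""
    with zipfile.ZipFile(epub_file) as archive:
        return archive.namelist()

# Build a tiny stand-in archive in memory; a real ePUB holds more files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("OEBPS/chapter02.xhtml", "<html>...</html>")
buffer.seek(0)

names = list_epub_files(buffer)
print(names)
```

With a real book, pass the path to the .epub file instead of the in-memory buffer. This is also a quick way to check which XHTML and CSS files a saved book actually contains.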