MobileRead Forums > E-Book Formats > Workshop

11-25-2012, 04:17 AM   #31
neuvivlio
Member
Posts: 15
Karma: 10
Join Date: Nov 2012
Device: none
haven't been on mirc in years, actually.. but why do you correlate lousy punctuation with irc?

i'm going to broadcast a bulletin across darknet & undernet, tell them what you said, include your name here, and point to the forum


Quote:
Originally Posted by DSpider
FineReader doesn't do layout. If you rely on it to look good, you're going to be disappointed sooner or later. Think of this program as an extraction tool, because that's really what it's good for. Then you need to come in and redo the layout (InDesign, Word) after matching the fonts, cleaning up the graphics (maybe vectorizing some of them), etc. Don't expect to simply export as ePub and have it look good. It's not there yet.

I'd say you have a lot to learn. Judging from your posting style (lack of capitalization and punctuation), you probably don't have an eye for detail. This isn't mIRC, you know.
11-27-2012, 03:21 PM   #32
grumbles
Addict
Posts: 238
Karma: 1500000
Join Date: Nov 2009
Location: Toronto
Device: Pandigital Novel (Black), T-2 and 3, Nexus 7
I would highly recommend using Scan Tailor to preprocess the images. It will deskew the images (it will also rotate them and split two-up images), select the text area and convert the images to 2bit (black & white), ready for OCR. A very useful program.
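As a rough illustration of what "convert to black & white, ready for OCR" involves, here is a minimal sketch of global thresholding with Otsu's method. Otsu is a swapped-in, well-known technique chosen for the example, not a description of what Scan Tailor does internally; the input is assumed to be a flat list of 8-bit grayscale values, and the helper names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the cut-off t (0-255) that best separates dark from light pixels.

    Pixels <= t form the 'dark' class (ink), pixels > t the 'light' class
    (paper); t is chosen to maximize the between-class variance.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    n_dark = 0    # number of pixels with value <= t
    sum_dark = 0  # sum of their values
    for t in range(256):
        n_dark += hist[t]
        sum_dark += t * hist[t]
        n_light = total - n_dark
        if n_dark == 0:
            continue
        if n_light == 0:
            break
        mean_dark = sum_dark / n_dark
        mean_light = (sum_all - sum_dark) / n_light
        between = n_dark * n_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels):
    """Map gray pixels to 1-bit values: 0 = black (ink), 1 = white (paper)."""
    t = otsu_threshold(pixels)
    return [0 if p <= t else 1 for p in pixels]
```

A single global cut-off like this works on evenly lit scans; it struggles when the page background varies in brightness.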
11-27-2012, 07:29 PM   #33
DSpider
Evangelist
Posts: 450
Karma: 343115
Join Date: Nov 2009
Location: Romania
Device: PW2 2014
I think you mean 1-bit images (black and white).
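The arithmetic behind the correction: n bits per pixel give 2^n levels, so black and white needs only 1 bit, while "2-bit" would mean four gray levels. A 1-bit page can also be packed eight pixels to a byte. A small sketch with hypothetical helper names; the 2550-pixel width is just an example (8.5 inches at 300 dpi).

```python
import math


def bits_per_pixel(levels):
    """Smallest number of bits that can encode `levels` distinct gray levels."""
    return max(1, math.ceil(math.log2(levels)))


def packed_row_bytes(width, bits):
    """Bytes needed for one row of `width` pixels stored `bits` per pixel,
    with the row padded up to a whole byte."""
    return (width * bits + 7) // 8


# Black and white needs 1 bit; four levels would need 2; 8-bit gray has 256 levels.
print(bits_per_pixel(2), bits_per_pixel(4), bits_per_pixel(256))
# One 2550-pixel row: 319 bytes at 1 bit per pixel vs 2550 bytes at 8 bits.
print(packed_row_bytes(2550, 1), packed_row_bytes(2550, 8))
```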
11-28-2012, 05:21 PM   #34
grumbles
Addict
Posts: 238
Karma: 1500000
Join Date: Nov 2009
Location: Toronto
Device: Pandigital Novel (Black), T-2 and 3, Nexus 7
Yes, 2 levels (black/white), 1 bit. Anyway, Scan Tailor does a great job. I haven't looked at the source (I don't speak C++), but it appears to look at the relative brightness and contrast. It handles uneven backgrounds quite well.
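Judging a pixel by its brightness relative to its surroundings is the idea behind local (adaptive) thresholding, a common way to cope with uneven backgrounds. The sketch below is a naive pure-Python version of that general technique, not a claim about Scan Tailor's actual algorithm; the `radius` and `offset` defaults are arbitrary illustration values.

```python
def adaptive_threshold(img, radius=7, offset=10):
    """Binarize a 2-D list of 8-bit gray values against a local mean.

    A pixel becomes black (0) if it is darker than the mean of its
    (2*radius+1) x (2*radius+1) neighbourhood (clipped at the image edges)
    by more than `offset`; otherwise it becomes white (1).
    Naive O(pixels * window) loop, fine for small examples.
    """
    h, w = len(img), len(img[0])
    out = [[1] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            window = [img[yy][xx] for yy in range(y0, y1) for xx in range(x0, x1)]
            local_mean = sum(window) / len(window)
            if img[y][x] < local_mean - offset:
                out[y][x] = 0
    return out
```

On a row whose paper brightens from 100 to 233 left to right, with ink 80 levels darker than the local paper at two spots, no single global cut-off works (the right-hand ink is lighter than the left-hand paper), while `adaptive_threshold([row], radius=3, offset=20)` marks just those two spots black.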
11-29-2012, 09:05 AM   #35
ath
Addict
Posts: 222
Karma: 110
Join Date: Jun 2006
Location: Malmo, Sweden
Device: iLiad, Sony PRS-505, Kindle Paperwhite & Oasis
Quote:
Originally Posted by neuvivlio
how could i take this roughly scanned book, and convert the text into nice clean, legible text?
No one seems to have said much about the process, so let me add a few points.

Transcription. This is basically what OCR does, but you need to ensure that it does the right thing. (Look at recent Kindle versions of Iain Banks novels, particularly on pages where conversations between Minds are presented: the printed books use indentation in a way similar to email; many eBooks make a mess of this.) That is, you need to look at the typographical presentation in the original and decide how you plan to represent it in the resulting text: italics, small caps, quotations, anything.
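One way to keep those decisions explicit is to mark them during transcription with a simple plain-text convention and convert to ePub-style XHTML later. A minimal sketch; the `_italic_` and `{sc}small caps{/sc}` markers and the `smallcaps` class name are hypothetical choices, not any standard.

```python
import html
import re

# Hypothetical transcription conventions:
#   _text_         -> italics
#   {sc}text{/sc}  -> small caps
ITALIC = re.compile(r"_(.+?)_")
SMALL_CAPS = re.compile(r"\{sc\}(.+?)\{/sc\}")


def to_xhtml(line):
    """Escape &, < and > in a transcribed line, then turn the markers into markup."""
    text = html.escape(line, quote=False)
    text = ITALIC.sub(r"<em>\1</em>", text)
    text = SMALL_CAPS.sub(r'<span class="smallcaps">\1</span>', text)
    return f"<p>{text}</p>"
```

The small caps themselves would then come from a CSS rule for that class in the book's stylesheet.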

OCR may mess up end-of-line hyphenation, so decide how you want to handle it: keep it as is, or 'restore' the hyphenated words.
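If you choose to restore the words, a dictionary check helps decide whether the hyphen was only a line-end break or part of a real compound. A minimal sketch, assuming you have a set of known lowercase words (for example from a spell-checker word list); `restore_hyphenation` is a hypothetical helper, and ambiguous cases still need a human look.

```python
import re

# A word fragment, a hyphen at the end of a line, optional spaces, a newline,
# then the continuation on the next line.
LINE_END_HYPHEN = re.compile(r"(\w+)-[ \t]*\n[ \t]*(\w+)")


def restore_hyphenation(text, known_words):
    """Rejoin words split across lines by end-of-line hyphenation.

    If the joined form is a known word ('exam-' + 'ple' -> 'example'), the
    hyphen is dropped; otherwise it is kept, on the assumption that it is a
    genuine compound ('well-known'). The line break inside the word is
    removed either way.
    """
    def join(match):
        first, second = match.group(1), match.group(2)
        joined = first + second
        if joined.lower() in known_words:
            return joined
        return f"{first}-{second}"

    return LINE_END_HYPHEN.sub(join, text)
```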

Proofreading of transcription. Exactly what it sounds like. Proofread in a 'good' typeface, where you can clearly see the differences between '1', 'I', and 'l', and so on. (I like Palatino and related faces.) Also look for invisible things: two spaces in a row, confusion between 'O' and '0', dashes of the wrong length, hard line breaks coinciding with visual line breaks, etc. If you worked with your text in a Word-like environment, check paragraph and character formatting, as OCR-produced formats can be off by a few points here and there. Spell-checking tends to go here, but you may need to take the original text into account: old texts don't always use modern spelling.
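Some of these invisible problems can be flagged mechanically before the human read-through. A minimal sketch; the patterns are illustrative heuristics that will produce false positives (a catalogue number like 'B1O' is not an error in every book), and `find_suspects` is a hypothetical helper.

```python
import re

# (description, pattern) pairs; illustrative, not exhaustive.
CHECKS = [
    ("two or more spaces in a row", re.compile(r" {2,}")),
    ("letter O next to a digit (probably zero)", re.compile(r"(?<=\d)O|O(?=\d)")),
    ("digit 0 inside a word (probably letter O)", re.compile(r"(?<=[A-Za-z])0(?=[A-Za-z])")),
    ("spaced hyphen used as a dash", re.compile(r" - ")),
]


def find_suspects(text):
    """Return (line_number, description, context) for every suspicious match."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for description, pattern in CHECKS:
            for m in pattern.finditer(line):
                context = line[max(0, m.start() - 5):m.end() + 5]
                hits.append((lineno, description, context))
    return hits
```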

Layout and formatting. This is where you do your own design work.

Just don't imagine you won't need to proofread everything again: a book (the result of a publishing process) needs to 'work' in different ways than a text (the result of a transcription process). For example, you may need to add discretionary hyphens and no-break spaces in critical places to get what you want.
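In Unicode terms, a discretionary hyphen is U+00AD SOFT HYPHEN and a no-break space is U+00A0 (numeric references `&#173;` and `&#160;` in XHTML). A minimal sketch of inserting both; the helper names, the title list, and the break positions are hypothetical examples, and where to break a given word is still your call.

```python
import re

NBSP = "\u00a0"         # NO-BREAK SPACE
SOFT_HYPHEN = "\u00ad"  # SOFT HYPHEN: invisible unless the line breaks there


def tie_titles(text, titles=("Mr.", "Mrs.", "Dr.")):
    """Replace the space after a title with a no-break space, so that a
    title never ends a line with the name starting the next one."""
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in titles) + r") ")
    return pattern.sub(lambda m: m.group(1) + NBSP, text)


def add_break_points(word, positions):
    """Insert soft hyphens into `word` at the given character offsets."""
    parts, last = [], 0
    for pos in sorted(positions):
        parts.append(word[last:pos])
        last = pos
    parts.append(word[last:])
    return SOFT_HYPHEN.join(parts)


print(repr(tie_titles("Dr. Watson met Mrs. Hudson.")))
print(repr(add_break_points("transcription", [4, 9])))
```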

If you are really careful, you'll do copy-editing as well: verifying that your source spells and hyphenates words consistently throughout, perhaps even changing old-fashioned 'Mr.', 'Dr.', etc. to a more modern style (without the periods), and so on.
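Both checks can be started mechanically. A minimal sketch with hypothetical helper names: the first lists words that appear both with and without a hyphen (e.g. 'to-day' and 'today'), and the second drops the period after Mr, Mrs and Dr. Apply the second only if that is the style you have chosen.

```python
import re
from collections import Counter

# Letters, optionally joined by single hyphens ('to-day', 'well-known').
WORD = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def inconsistent_hyphenation(text):
    """Map each hyphenated spelling that also occurs solid to
    (hyphenated_count, solid_spelling, solid_count), case-insensitively."""
    counts = Counter(w.lower() for w in WORD.findall(text))
    report = {}
    for word, n in counts.items():
        if "-" in word:
            solid = word.replace("-", "")
            if solid in counts:
                report[word] = (n, solid, counts[solid])
    return report


def modernize_titles(text):
    """Drop the period after Mr, Mrs and Dr ('Mr. Smith' -> 'Mr Smith')."""
    return re.sub(r"\b(Mr|Mrs|Dr)\.", r"\1", text)
```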

Of course, it all depends on what you plan. If you don't plan for anyone else to read the result, do what you like. But if you do plan for other readers ...

I recently read a reissue of a novel by Eric Ambler (The Schirmer Inheritance) on the Kindle. I am still dismayed by the number of OCR errors that had been allowed to remain in the text. Some were obvious: 'li' where the original had an 'h', 'rn' where the original had an 'm', and so on. And the number of these increased as the book progressed: the proofreaders probably got tired ... It's not an eBook I will return to.
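Errors of exactly this kind can be caught with a dictionary lookup, which doesn't get tired. A minimal sketch that flags words missing from a word list but that become valid words when 'rn' is read as 'm' or 'li' as 'h'; the word list and `suspicious_words` are assumptions for illustration, you would extend `CONFUSIONS` with other pairs you run into, and every hit still needs checking against the scan.

```python
# (what the OCR produced, what the print probably had), from the examples above.
CONFUSIONS = [("rn", "m"), ("li", "h")]


def suspicious_words(words, dictionary):
    """Yield (word, suggestion) for each word not in `dictionary` that becomes
    a dictionary word when one occurrence of a confusion pair is undone."""
    for word in words:
        if word.lower() in dictionary:
            continue
        for wrong, right in CONFUSIONS:
            start = word.find(wrong)
            while start != -1:
                candidate = word[:start] + right + word[start + len(wrong):]
                if candidate.lower() in dictionary:
                    yield word, candidate
                    break
                start = word.find(wrong, start + 1)
```

The dictionary needs to be a full one: with 'lie' missing, the word 'lie' itself would be flagged as a possible 'he'.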

All times are GMT -4.