Old 09-30-2014, 09:04 AM   #16
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
That is in addition to getting the CSS straight, modifying it so it will fit in the space of an e-reader, shrinking pictures so they don't overlap pages, etc. All of which shows that the output of PDF-to-EPUB conversion services can't be usable, even if they pay extremely low wages.

For me it is really hard to stick with it for more than an hour or so. My mind begins to wander. (or wonder why I am putting myself through this!) (LOL) I often work a couple of hours in the morning and move on to other things. It takes several weeks.
Old 09-30-2014, 09:05 AM   #17
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Quote:
Originally Posted by mrmikel View Post
For me it is really hard to stick with it for more than an hour or so. My mind begins to wander. (or wonder why I am putting myself through this!) (LOL) I often work a couple of hours in the morning and move on to other things. It takes several weeks.
Absolutely. I do about an hour a day. No more than that. It's impossible to maintain concentration for longer, for me.
Old 09-30-2014, 10:06 AM   #18
rkomar
Wizard
Posts: 3,054
Karma: 18821071
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
Quote:
Originally Posted by HarryT View Post
Precisely. Errors such as "dock" instead of "clock", "comer" instead of "corner", etc., are commonplace, and spell-checkers won't find them. The only way to find such errors (and I must politely disagree with Hitch's assertion that nobody does so) is to do a word-by-word manual comparison of the original document with the OCR'd text. This is extremely labour-intensive: I've had years of practice at it, and I reckon I can proof-read about 15 pages an hour with a typical novel, so that would be about 33h work for a 500-page book.
I wonder if speed readers could catch those kinds of errors efficiently? I'm a slow reader, and those errors make my comprehension engine go off the rails and put a halt to the reading process immediately. It would be interesting to know if speed readers experience the same crash, or if their comprehension engine doesn't run on a single track like mine.

There was a thread recently about using text to speech set at a high rate to quickly get through books. Perhaps that would be useful for finding wrong words in the text speedily.

P.S. I made up the phrase "comprehension engine" because I have no idea what that part of the process is called.

Last edited by rkomar; 09-30-2014 at 10:08 AM.
Old 09-30-2014, 10:12 AM   #19
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
I don't think so. I suspect that unless they have a "photographic" memory, speed readers take in whole words without really looking at them. Good enough for reading many things, but not good enough for proofreading. There are a number of words that are very similar in appearance, but not the same in meaning.

There might be something to be said for recognition by different OCR engines and comparing them, but then again maybe not.
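As a rough illustration of the idea of comparing the output of two OCR engines, here is a minimal Python sketch using the standard library's difflib. The function name and the two sample strings are made up for illustration; they are not from any tool mentioned in this thread. It only lists the places where the two outputs disagree, so a human still has to decide which reading is right.

```python
import difflib

def ocr_disagreements(text_a, text_b):
    """Return (words_a, words_b) pairs where two OCR outputs differ."""
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False keeps long texts from having frequent words ignored.
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((words_a[i1:i2], words_b[j1:j2]))
    return diffs

# Hypothetical outputs of two different engines for the same line.
engine_a = "The dock in the comer struck nine."
engine_b = "The clock in the corner struck nine."

for left, right in ocr_disagreements(engine_a, engine_b):
    print(left, "<->", right)
```

The obvious limitation, and perhaps the reason for "maybe not": if both engines misread the same word in the same way, the comparison shows nothing at that spot.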
Old 09-30-2014, 06:52 PM   #20
Hitch
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by HarryT View Post
Precisely. Errors such as "dock" instead of "clock", "comer" instead of "corner", etc., are commonplace, and spell-checkers won't find them. The only way to find such errors (and I must politely disagree with Hitch's assertion that nobody does so) is to do a word-by-word manual comparison of the original document with the OCR'd text. This is extremely labour-intensive: I've had years of practice at it, and I reckon I can proof-read about 15 pages an hour with a typical novel, so that would be about 33h work for a 500-page book.
Harry, you wound me.

I didn't mean the REGULARS. (Or, in the case of this group, The Irregulars.) I meant the drop-ins. {sniffle, wounded feelings}. OF COURSE The MR Irregulars do it!!! I meant those that hit-and-run here, looking for the silver bullet.

Hitch
Old 10-01-2014, 02:16 AM   #21
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
I know, Hitch. Just pulling your leg.
Old 10-01-2014, 02:44 AM   #22
Hitch
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by HarryT View Post
I know, Hitch. Just pulling your leg.
Ditto.

Hitch
Old 10-01-2014, 03:36 AM   #23
Ghitulescu
Fanatic
Posts: 563
Karma: 403106
Join Date: Aug 2014
Device: PRS-T1
Quote:
Originally Posted by Hitch View Post
Take a high-quality scan, a good A/B, run it through Toxaris' program, and you have a very, very high quality starting place.
Again, my friend, you're talking about English and its 26 letters. Have you ever, to give an example, tried to scan Polish or Hungarian books? I am sure not. And even there, there are errors that need human proofing, like I and l (capital i and small L). I know there are programs that can learn the characters/glyphs but still have the English rules (there are languages where "i" is written as such, for instance).
Old 10-01-2014, 04:11 AM   #24
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
Quote:
Originally Posted by Ghitulescu View Post
Again, my friend, you're talking about English and its 26 letters. Have you ever, to give an example, tried to scan Polish or Hungarian books? I am sure not. And even there, there are errors that need human proofing, like I and l (capital i and small L). I know there are programs that can learn the characters/glyphs but still have the English rules (there are languages where "i" is written as such, for instance).
If you take the time to let the OCR program learn, the quality goes up big time. If I have a new OCR program, I usually put it in learn mode for at least 3-5 pages for each book. After about 10 books, it has learned enough and the OCR errors are few. For diacritics it is of course important that the scan is of reasonable quality, but the result is also very dependent on the source. That is why I scan at 400 dpi (we have diacritics, but not so many). My program that Hitch is talking about will help you catch a lot of OCR errors, regardless of language. You can easily add your own S/R actions for common OCR errors for that procedure. There are many more procedures and checks in the tool to help you further.
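To make the idea of search/replace actions for common OCR errors concrete, here is a small Python sketch. It is not the program discussed above and does not reflect how that tool works internally; the watch list, the function name, and the sample sentence are all hypothetical. Because "dock" and "comer" are real words that a spell-checker accepts, the sketch only flags them for a human to check against the scan instead of replacing them blindly.

```python
import re

# Hypothetical watch list: a word OCR often produces, and the word it may
# have been in the original. Both examples come from earlier in this thread.
CONFUSABLE = {"dock": "clock", "comer": "corner"}

def flag_suspects(text, confusable=CONFUSABLE):
    """Return (offset, found_word, possible_original) for each suspect word."""
    alternatives = "|".join(re.escape(word) for word in confusable)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    hits = []
    for match in pattern.finditer(text):
        word = match.group(0)
        hits.append((match.start(), word, confusable[word.lower()]))
    return hits

sample = "He stood in the comer and watched the dock."
for offset, word, suggestion in flag_suspects(sample):
    print(f"offset {offset}: '{word}' may be '{suggestion}'")
```

Flagging instead of replacing is a deliberate choice here: a book set in a harbour may really contain "dock", so the final call stays with the proofreader.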
Old 10-01-2014, 04:20 AM   #25
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Quote:
Originally Posted by Ghitulescu View Post
Again, my friend, you're talking about English and its 26 letters. Have you ever, to give an example, tried to scan Polish or Hungarian books? I am sure not. And even there, there are errors that need human proofing, like I and l (capital i and small L). I know there are programs that can learn the characters/glyphs but still have the English rules (there are languages where "i" is written as such, for instance).
Which is precisely why a good OCR program, like Abbyy FineReader, knows about different languages: you tell it what language the book you're scanning is written in, and it adapts its interpretation of the text accordingly.

You have to accept the fact that no OCR program is perfect, and that proof-reading is always going to be essential, but good OCR programs are very good indeed, and should certainly (for novels at least) get you to the "one error every few pages" level of accuracy, which is really all that you can hope for.
Old 10-01-2014, 04:28 AM   #26
Hitch
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by Ghitulescu View Post
Again, my friend, you're talking about English and its 26 letters. Have you ever, to give an example, tried to scan Polish or Hungarian books? I am sure not. And even there, there are errors that need human proofing, like I and l (capital i and small L). I know there are programs that can learn the characters/glyphs but still have the English rules (there are languages where "i" is written as such, for instance).
In fact, I have. Romanian, for one. No, no scan program is "perfect," but Abbyy FineReader comes as close as it gets, in my humble opinion. All that program "knows" is characters that it LEARNS. It's not perfect the first time you use it, not even in English. But...nothing is, as Harry says. And I know that Tox scans in languages other than English all the time.

Hitch
Old 10-02-2014, 01:23 AM   #27
adrenaline
Enthusiast
Posts: 43
Karma: 28554
Join Date: Mar 2013
Device: Kindle Keyboard, KPW2
Didn't expect this many responses. Thanks so much, everyone.

Just to confirm these:

1. Can I continue with 1dollarscan's 600 dpi scans for my books that are similar?

2. Anything beyond that dpi is overkill, right?

Thanks again for your time.
Old 10-02-2014, 07:23 AM   #28
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
BTW, FineReader's Tools, Options, Read allows you to set just what you want it to do in terms of learning.
Old 10-02-2014, 09:48 PM   #29
rkomar
Wizard
Posts: 3,054
Karma: 18821071
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
Quote:
Originally Posted by adrenaline View Post
Didn't expect this many responses. Thanks so much, everyone.

Just to confirm these:

1. Can I continue with 1dollarscan's 600 dpi scans for my books that are similar?

2. Anything beyond that dpi is overkill, right?

Thanks again for your time.
It depends on the text being scanned. A small font would benefit from a 1200 dpi scanning resolution, whereas a normal font probably wouldn't. If the text has been printed such that the letters are close together, then you need to scan at a resolution high enough to make sure that the letters remain separated in the images. Letters that end up touching in the scans lead to lots of errors when doing OCR. The majority of books will be fine when scanned at 600 dpi, but the ones with dense printing (to economize on the number of pages) might be better scanned at 1200 dpi. You may want to go through the books you want scanned, and sort the books into 600 and 1200 dpi piles. 1200 dpi for all books would certainly be overkill.
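For a feel for the numbers behind the 600 versus 1200 dpi choice, here is a back-of-the-envelope Python sketch. It relies on the fact that a typographic point is 1/72 of an inch, so a font's nominal (em) height in pixels is point size / 72 × dpi. The 60-pixel threshold and the function names are purely illustrative assumptions, not a recommendation from this thread, and font size is only a stand-in for the real concern, which is whether adjacent letters stay separated in the image.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 inch

def em_height_px(point_size, dpi):
    """Nominal height of the font's em square in pixels at a scan resolution."""
    return point_size / POINTS_PER_INCH * dpi

def pick_dpi(point_size, min_em_px=60, choices=(600, 1200)):
    """Pick the lowest resolution giving at least min_em_px pixels per em.

    min_em_px=60 is a hypothetical threshold chosen only for illustration.
    """
    for dpi in sorted(choices):
        if em_height_px(point_size, dpi) >= min_em_px:
            return dpi
    return max(choices)

for size in (6, 8, 10, 12):
    print(f"{size} pt -> {pick_dpi(size)} dpi "
          f"({em_height_px(size, pick_dpi(size)):.0f} px per em)")
```

Tightly set type can need the higher resolution even at a normal font size, so looking at a sample page is still the better test for sorting books into the two piles.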
Old 10-03-2014, 10:06 PM   #30
adrenaline
Enthusiast
Posts: 43
Karma: 28554
Join Date: Mar 2013
Device: Kindle Keyboard, KPW2
Quote:
Originally Posted by rkomar View Post
It depends on the text being scanned. A small font would benefit from a 1200 dpi scanning resolution, whereas a normal font probably wouldn't. If the text has been printed such that the letters are close together, then you need to scan at a resolution high enough to make sure that the letters remain separated in the images. Letters that end up touching in the scans lead to lots of errors when doing OCR. The majority of books will be fine when scanned at 600 dpi, but the ones with dense printing (to economize on the number of pages) might be better scanned at 1200 dpi. You may want to go through the books you want scanned, and sort the books into 600 and 1200 dpi piles. 1200 dpi for all books would certainly be overkill.
Thanks a ton, rkomar. You may have seen this in the OP, but just in case, here's a sample of the 600 dpi scan:

https://www.dropbox.com/s/j18r16ed7t...0Page.pdf?dl=0

I feel that this book isn't too dense. Would love to hear your thoughts on this.

Thanks again.