Old 02-01-2015, 07:58 AM   #1
rumpumpel1
Junior Member
Posts: 2
Karma: 10
Join Date: Nov 2011
Device: PRS-T1
best HW + SW to convert books to epub

Hi,

What's currently the best hardware and software to scan books and to convert them to epub with little effort?

The books have no fancy layout and no images, just plain text, and they may be taken apart.
Old 02-01-2015, 08:09 AM   #2
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Moved to the "Workshop" forum.
Old 02-01-2015, 08:12 AM   #3
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
If you really mean "best" as in "price is not a consideration", then you should be looking at a device such as this:

http://www.imageaccess.de/?page=Scan...V2Professional

This is the type of device that professional scanning bureaux use.

If price is a consideration, and you can destroy the books, then a scanner with an automated sheet feeder is probably what you want to be looking at. Something like this:

http://www.amazon.co.uk/Fujitsu-Scan.../dp/B001VGJ7JM

Last edited by HarryT; 02-01-2015 at 08:18 AM.
Old 02-01-2015, 10:02 AM   #4
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
Quote:
Originally Posted by rumpumpel1 View Post
to convert them to epub with little effort?
No can do. It is either fast and sloppy or slow and thorough. (A lot of) Effort is required.
Old 02-01-2015, 10:55 AM   #5
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Quote:
Originally Posted by Toxaris View Post
No can do. It is either fast and sloppy or slow and thorough. (A lot of) Effort is required.
A decent OCR program like ABBYY FineReader will give you pretty good results (perhaps one error per page), which may be acceptable for casual reading, but you are of course right that proof-reading is essential if you want an error-free book.
Old 02-01-2015, 11:16 AM   #6
rumpumpel1
Junior Member
Posts: 2
Karma: 10
Join Date: Nov 2011
Device: PRS-T1
That's amazing: even for simple layouts and plain text there is a remaining error rate of one error per page? What kind of errors are these? Can they be corrected with the spell checker of a decent office program?
Old 02-01-2015, 11:35 AM   #7
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Quote:
Originally Posted by rumpumpel1 View Post
That's amazing: even for simple layouts and plain text there is a remaining error rate of one error per page? What kind of errors are these? Can they be corrected with the spell checker of a decent office program?
These are "normal" OCR errors, where the shapes of letters look similar, e.g. "clock" instead of "dock" ("cl" and "d" are very difficult for OCR to tell apart). A spell-checker won't help, because they are real words, just not the right words.

A decent OCR program has an accuracy rate of better than 99.9%, but a typical page has around 2000 characters on it, so even at 99.9% that still means about 2 character errors per page. Some of these the OCR program's spell-checker will fix for you, but some it will get wrong.
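
To make that arithmetic concrete, here is a minimal Python sketch. The function names are made up for illustration, and the second function assumes character errors are independent of each other, which real OCR errors are not (they tend to cluster on bad pages), so treat its result as a rough guide only.

```python
def expected_errors_per_page(accuracy, chars_per_page):
    """Expected number of wrong characters on a page at a given per-character accuracy."""
    return (1.0 - accuracy) * chars_per_page


def chance_of_clean_page(accuracy, chars_per_page):
    """Probability that every character on the page is right.

    Assumption: each character is recognised independently of the others.
    """
    return accuracy ** chars_per_page


# Figures from the post above: 99.9% accuracy, about 2000 characters per page.
print(expected_errors_per_page(0.999, 2000))
print(chance_of_clean_page(0.999, 2000))
```

At 99.9% accuracy and 2000 characters this gives about 2 expected errors per page and, under the independence assumption, only around a 13.5% chance that a page comes out completely clean.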
Old 02-01-2015, 11:37 AM   #8
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
It really depends on the source. Some errors can be found by a simple spell checker, but some not of course. That is one of the reasons I started creating my own tools to remove a lot of OCR errors. It can be spelling that goes wrong, but also punctuation, not to mention styling and other things. A lot cannot be found with the standard tools.
The better the source (and scan), the better the results of the OCR program. ABBYY does a good job if the scan is good.
Old 02-01-2015, 04:57 PM   #9
dwig
Wizard
Posts: 1,613
Karma: 6718541
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ...
Quote:
Originally Posted by Toxaris View Post
It really depends on the source. Some errors can be found by a simple spell checker, but some not of course. ...
Also, a spell checker can't be relied on to auto-correct with more than moderately decent accuracy, which means manually correcting each error it finds.

Also, the accuracy of OCR software drops significantly when working from scans of books in poor condition (e.g. foxing, stains, yellowing, ...), printed poorly, and/or printed on poor quality paper.
Old 02-02-2015, 12:10 AM   #10
cromag
Surfin the alpha waves ~~
Posts: 26,281
Karma: 459765791
Join Date: Dec 2010
Location: New Jersey
Device: Jetbook Lite & Mini, Nook STR, Kobo, Hanvon N516, Kindle 2, Androids
Also, slight imperfections in the paper -- dark spots, a stray fiber, etc. -- can be mistaken for punctuation marks like periods and commas.
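
One common mitigation is to despeckle the scan before OCR. The sketch below only illustrates the idea in plain Python and is not what any particular OCR package does: it treats the page as a grid of 0 (paper) and 1 (ink) and clears any connected blob of ink at or below a hypothetical `max_size` threshold. The catch is that real full stops and comma tails are tiny too, so the threshold has to stay well below the size of genuine punctuation at your scan resolution.

```python
def remove_specks(image, max_size=2):
    """Return a copy of a 0/1 image with small isolated ink blobs cleared.

    A blob is a group of 1-pixels connected horizontally, vertically or
    diagonally. Blobs with at most max_size pixels are set to 0.
    """
    height, width = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Flood fill to collect one connected blob of ink pixels.
            blob, stack = [], [(y, x)]
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                blob.append((cy, cx))
                for ny in range(max(cy - 1, 0), min(cy + 2, height)):
                    for nx in range(max(cx - 1, 0), min(cx + 2, width)):
                        if image[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            if len(blob) <= max_size:
                for by, bx in blob:
                    cleaned[by][bx] = 0
    return cleaned


# Toy page: a 2x2 "letter" and one stray speck at the right edge.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
for row in remove_specks(page, max_size=1):
    print(row)
```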
Old 02-05-2015, 07:32 PM   #11
Hitch
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by HarryT View Post
These are "normal" OCR errors, where the shapes of letters look similar, e.g. "clock" instead of "dock" ("cl" and "d" are very difficult for OCR to tell apart). A spell-checker won't help, because they are real words, just not the right words.

A decent OCR program has an accuracy rate of better than 99.9%, but a typical page has around 2000 characters on it, so even at 99.9% that still means about 2 character errors per page. Some of these the OCR program's spell-checker will fix for you, but some it will get wrong.
Exactly. Given how well-used ABBYY is, it's almost ALWAYS things like "hat" for "fiat", and the like. No spellchecker will find those. And formatting errors? A BOATLOAD more than 1-2 per page. Almost all the "work" is in fixing the formatting, cleaning up the text, removing spans, and all that.

This part, from rumpumpel1:

Quote:
...and to convert them to epub with little effort
Simply doesn't exist. There's no such thing. The only material that can be "convert[ed] to epub with little effort" is files that are clean to begin with, and that depends on what you've done between the scan/OCR and the time you actually start to make the ePUB.

Hitch
Old 02-06-2015, 03:10 AM   #12
GrannyGrump
Obsessively Dedicated...
Posts: 3,221
Karma: 35037583
Join Date: May 2011
Location: PA {back in the usa!}
Device: Sony PRS-T2, ADE on PC
Unless, of course, you forgo OCR entirely and simply go with images of the printed pages. That way, though, you lose reflow, search, annotation, all the things you can do with text that are not possible with images. A mighty gloomy result, I think. (There are quite a number of epubs out in the wild that are made like this, recognizable before reading by their HUGE file size. They are really like a pdf in disguise.)

Last edited by GrannyGrump; 02-06-2015 at 03:12 AM.
Old 02-06-2015, 03:34 AM   #13
Ghitulescu
Fanatic
Posts: 563
Karma: 403106
Join Date: Aug 2014
Device: PRS-T1
Quote:
Originally Posted by rumpumpel1 View Post
That's amazing: even for simple layouts and plain text there is a remaining error rate of one error per page? What kind of errors are these? Can they be corrected with the spell checker of a decent office program?
Unfortunately, you HAVE TO WORK, unless your quality standards are low.
People differ in their needs and perceptions, which is why there are zillions of TVs, computers, cars, etc.: everyone values one feature more than another (like the colour red for cars). An automatic procedure designed to fit the statistical average of all customers therefore won't satisfy all of them.
Old 02-06-2015, 04:08 AM   #14
GrannyGrump
Obsessively Dedicated...
Posts: 3,221
Karma: 35037583
Join Date: May 2011
Location: PA {back in the usa!}
Device: Sony PRS-T2, ADE on PC
rumpumpel1 queried:
Quote:
What kind of errors are these? Can they be corrected with the spell checker of a decent office program?
A spell-checker can't help when the OCR interprets a dirt-spot as a colon. It cannot help when the OCR gives a legitimately spelled word that is the *wrong* word, as in Harry's example of "clock" versus "dock". Other frequent OCR errors include "die" for "the", digit 1 instead of lower-case L or upper-case I (or any other mismatched combination of these three characters), the letters "rn" (that is lower-case R N ) turn out as "m" (that is lower-case M), etc, etc. Italics add more spice to the stew.

A *grammar-checker*, such as Microsoft Word provides, can help to some extent when it recognizes that a word is blatantly wrong for the containing sentence. Unfortunately, it is still far from perfect, and it mostly concentrates on punctuation errors.

To get good results, you will have to physically proof-read the scan results, comparing against the printed page.
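
To narrow down where to look while proof-reading, the confusions mentioned in this thread can be turned into a crude checker. This is a sketch only: `CONFUSABLE_PAIRS`, `variants` and `flag_suspects` are names invented here, the pair list covers just a few of the examples above, and the vocabulary would have to come from a real word list. It flags words that are correctly spelt but turn into another known word when one confusable fragment is swapped, so a human can check them against the printed page.

```python
import re

# A few letter-shape confusions taken from examples in this thread.
# Each pair is tried in both directions.
CONFUSABLE_PAIRS = [("cl", "d"), ("rn", "m"), ("fi", "h")]


def variants(word):
    """Return every string made by swapping one confusable fragment in word."""
    found = set()
    for a, b in CONFUSABLE_PAIRS:
        for src, dst in ((a, b), (b, a)):
            start = word.find(src)
            while start != -1:
                found.add(word[:start] + dst + word[start + len(src):])
                start = word.find(src, start + 1)
    found.discard(word)
    return found


def flag_suspects(text, vocabulary):
    """List (word, alternatives) for words with a confusable twin in vocabulary."""
    suspects = []
    for word in re.findall(r"[A-Za-z]+", text):
        alternatives = sorted(v for v in variants(word.lower()) if v in vocabulary)
        if alternatives:
            suspects.append((word, alternatives))
    return suspects


# Toy vocabulary for illustration; a real run would load a full word list.
vocabulary = {"he", "left", "the", "clock", "dock", "at", "noon",
              "and", "said", "um", "urn", "hat", "fiat"}
print(flag_suspects("He left the clock at noon and said urn", vocabulary))
```

With this toy vocabulary it flags "clock" (could be "dock") and "urn" (could be "um"). It cannot tell which reading is right; only the printed page can.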

Last edited by GrannyGrump; 02-06-2015 at 04:15 AM.
Old 02-06-2015, 04:18 AM   #15
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Quote:
Originally Posted by GrannyGrump View Post
the letters "rn" (that is lower-case R N ) turn out as "m" (that is lower-case M), etc, etc.
An example of this: I'm currently proof-reading the "J G Reeder" detective stories of Edgar Wallace for the MR library, and one of Reeder's speech affectations is to use the word "um" a lot to indicate pauses. On at least half the occasions in the PG text I'm proofing from, "um" is spelt "urn".