MobileRead Forums > E-Book General > News

08-12-2009, 03:57 AM #61
Sparrow
Wizard
Posts: 4,395
Karma: 1358132
Join Date: Nov 2007
Location: UK
Device: Palm TX, CyBook Gen3
Quote:
Originally Posted by Ea
Sometimes the OCR software mangles things so much it's incomprehensible, or you just can't guess the right word.
If it's a public domain book, you could do a search for the text you have in Google Books - the search hits will return text snippets from their books and hopefully they'll contain the missing word.
It'll return text from Snippet view and No Preview books - which you can't ordinarily access (afaik).
I'm having to do this quite a lot for the book I'm proofing at the moment.

E.g.
The PDF I'm using has "and the dresses of the ladies, ....ped about the piano"

Searching Google Books for the text and book name:
"and the dresses of the ladies" Diana Trelawny
I can see the missing text is "as they stood grouped".
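The same search can be scripted when there are many gaps to fill. This is a minimal sketch, assuming the public Google Books API "volumes" endpoint (https://www.googleapis.com/books/v1/volumes with a `q` parameter) rather than the Google Books web page used above; the function names are hypothetical. It only builds the query URL and reads snippets out of a response you have already fetched and decoded.

```python
from urllib.parse import urlencode

# Assumption: the public Google Books API "volumes" search endpoint.
GOOGLE_BOOKS_API = "https://www.googleapis.com/books/v1/volumes"


def snippet_search_url(known_phrase: str, title: str) -> str:
    """Build a search URL for an exact phrase plus the book's title.

    The phrase is wrapped in double quotes so it is matched as a phrase,
    the same way as the manual search described above.
    """
    query = f'"{known_phrase}" {title}'
    return f"{GOOGLE_BOOKS_API}?{urlencode({'q': query})}"


def extract_snippets(response: dict) -> list:
    """Collect the text snippets from an already-decoded JSON response."""
    snippets = []
    for item in response.get("items", []):
        snippet = item.get("searchInfo", {}).get("textSnippet")
        if snippet:
            snippets.append(snippet)
    return snippets


url = snippet_search_url("and the dresses of the ladies", "Diana Trelawny")
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen` and `json.load`) is left out. Snippets are short and may contain HTML markup, so treat them as hints to check against the book, not as a source text.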
08-12-2009, 05:28 AM #62
Ea
Wizard
Posts: 3,490
Karma: 5239563
Join Date: Jan 2008
Location: Denmark
Device: Kindle 3|iPad air|iPhone 4S
Quote:
Originally Posted by Sparrow
If it's a public domain book, you could do a search for the text you have in Google Books - the search hits will return text snippets from their books and hopefully they'll contain the missing word.
It'll return text from Snippet view and No Preview books - which you can't ordinarily access (afaik).
No, it's stuff that I own - but then I have the scan and can display it beside the text I'm working on.
08-12-2009, 05:40 AM #63
AlexBell
Wizard
Posts: 3,413
Karma: 13369310
Join Date: May 2008
Location: Launceston, Tasmania
Device: Sony PRS T3, Kobo Glo, Kindle Touch, iPad, Samsung SB 2 tablet
Quote:
Originally Posted by Ea

On Mac I can do a search and replace of line breaks, so I can remove them - but I haven't been able to find a method that works in Windows. I remember an article on Lifehacker I read years ago, about reformatting Project Gutenberg books, but I couldn't get any of the suggested methods to work.
I use the inexpensive Atlantis word processor. All one needs to do is select the text from which one wants to remove line breaks, press Ctrl-Shift-U, and the line breaks are gone in seconds. Highly recommended.

Regards, Alex
08-12-2009, 05:49 AM #64
HarryT
eBook Enthusiast
Posts: 85,556
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Quote:
Originally Posted by Ea
Sometimes the OCR software mangles things so much it's incomprehensible, or you just can't guess the right word.
What's even more insidious is when OCR is combined with a spell checker, and a "mangled" word has been replaced by another word, which "fits in" to the sentence, and yet is totally wrong.

A good example is the Erskine Childers classic spy story "The Riddle of the Sands". Pretty much all the "free" versions of it, PG included, in a paragraph describing the appearance of the cabin of a boat, include a mysterious reference to "banks of yam". I defy anyone to guess what that really should be, without recourse to an original page scan or a printed copy of the book.

The correct text, in case you're wondering, is "hanks of yarn".
08-12-2009, 06:30 AM #65
sony_fox
Zealot
Posts: 109
Karma: 84
Join Date: Jun 2009
Location: Manchester
Device: Kobo Aura H2O
Couple of points:

Authors' proofing. I follow the blogs of some big-name professional genre fiction authors, the kind who make enough money to live off it. They spend days manually proofing the 'correct' galleys returned to them by the publishers for final checks. It is time they budget for in the writing of any book: so many days for the writing, plus so many extra for the proofing afterwards. What's worse is that by the time they get the galleys they are already mid-flow in a different story, and have to break mindsets to focus on the 'old' story properly. Hard work indeed, but part of the job. One is now re-editing corrupted files of their back catalogue to sell as ebooks from their own web portal.


What should I do when I find an error? One recently purchased, DRMed ebook was mostly OK until about page 500, where for the next 50 pages wordswerefrequently runaltogether. It was really quite extremely annoying. Should I contact the author, the seller or the publisher? Or all of them?
08-12-2009, 06:32 AM #66
HarryT
eBook Enthusiast
Quote:
Originally Posted by sony_fox
What should I do when I find an error? One recently purchased, DRMed ebook was mostly OK until about page 500, where for the next 50 pages wordswerefrequently runaltogether. It was really quite extremely annoying. Should I contact the author, the seller or the publisher? Or all of them?
Contact the seller initially. Reputable sellers will pass the problem report back to the publisher (who are generally the ones who actually create the eBook).
08-12-2009, 06:38 AM #67
Sparrow
Wizard
Quote:
Originally Posted by HarryT
What's even more insidious is when OCR is combined with a spell checker, and a "mangled" word has been replaced by another word, which "fits in" to the sentence, and yet is totally wrong.

A good example is the Erskine Childers classic spy story "The Riddle of the Sands". Pretty much all the "free" versions of it, PG included, in a paragraph describing the appearance of the cabin of a boat, include a mysterious reference to "banks of yam". I defy anyone to guess what that really should be, without recourse to an original page scan or a printed copy of the book.

The correct text, in case you're wondering, is "hanks of yarn".
I reckon that would be quite easy to figure out.
Given the context, 'yam' is obviously 'yarn'.
'banks' is more of a puzzler - but 'b' for 'h' is a very common OCR error, and about the only correction candidate that comes to mind for 'banks'.
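That reasoning can be partly mechanised: take each word, undo one known OCR confusion at a time, and keep the results that are real words. A minimal sketch, assuming a small hand-written confusion table and a word list held in a Python set (both are illustrative stand-ins, not a standard resource). It only proposes candidates; since "banks" and "yam" are themselves real words, picking the right reading still needs the context, which is exactly why the spell checker let them through.

```python
# Illustrative table of OCR confusions: (what the OCR produced, what the
# printed page may actually have). Real tables are longer and font-specific.
CONFUSIONS = [
    ("b", "h"),
    ("h", "b"),
    ("m", "rn"),
    ("rn", "m"),
    ("d", "cl"),
    ("cl", "d"),
]


def correction_candidates(word: str, dictionary: set) -> set:
    """Return dictionary words reachable by undoing one OCR confusion."""
    found = set()
    for seen, actual in CONFUSIONS:
        start = word.find(seen)
        while start != -1:
            candidate = word[:start] + actual + word[start + len(seen):]
            if candidate in dictionary:
                found.add(candidate)
            start = word.find(seen, start + 1)
    return found


# A tiny stand-in for a real word list.
WORDS = {"banks", "hanks", "of", "yam", "yarn"}

for w in ["banks", "of", "yam"]:
    print(w, sorted(correction_candidates(w, WORDS)))
# banks ['hanks']
# of []
# yam ['yarn']
```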
08-12-2009, 07:17 AM #68
Ea
Wizard
Quote:
Originally Posted by HarryT
What's even more insidious is when OCR is combined with a spell checker, and a "mangled" word has been replaced by another word, which "fits in" to the sentence, and yet is totally wrong.
...
Yikes! That is bad. It's hard to spot words that the spell checker doesn't catch.

Quote:
Originally Posted by AlexBell
I use the inexpensive Atlantis word processor. All one needs to do is select the text from which one wants to remove line breaks, press Ctrl-Shift-U, and the line breaks are gone in seconds. Highly recommended.

Regards, Alex
But if you had a text where you want to remove single line breaks and replace double empty lines with true paragraph breaks - can you do that?
(I know there's some difference between line break and paragraph break, but I haven't looked into it, so I might be missing some information)
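For anyone who would rather not depend on a particular word processor, here is a minimal Python sketch of what Ea asks for: single line breaks become spaces, and blank lines stay as paragraph breaks. It assumes a plain-text file whose paragraphs are separated by at least one empty line (as in many Project Gutenberg texts); indented paragraphs with no blank line between them, or poetry, would need different handling.

```python
import re


def unwrap(text: str) -> str:
    """Join hard-wrapped lines but keep blank-line paragraph breaks.

    Assumes paragraphs are separated by at least one empty line; lines
    ending in a hyphen are joined with a space like any other line.
    """
    # Normalise Windows (\r\n) and old Mac (\r) line endings to \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # One or more empty (or whitespace-only) lines mark a paragraph break.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Inside each paragraph, every remaining line break becomes a space.
    unwrapped = [
        " ".join(line.strip() for line in para.split("\n"))
        for para in paragraphs
    ]
    return "\n\n".join(unwrapped)


sample = "It was a dark\nand stormy night.\n\n\nThe rain fell\nin torrents."
print(unwrap(sample))
```

The result keeps exactly one blank line between paragraphs, which most ebook converters treat as a paragraph break.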
08-12-2009, 07:31 AM #69
Lemurion
eReader
Posts: 2,750
Karma: 4968470
Join Date: Aug 2007
Device: Note 5; PW3; Nook HD+; ChuWi Hi12; iPad
I do a lot of freelance editing, proofing, and light rewriting. Almost everything I've seen people complain about (and I've complained about the same things myself) is stuff that's my job, not the author's.

As has been said before, a lot of these are things the author literally cannot see; their brain fills in what's supposed to be there. So when someone tells the author to correct an ebook, they're often asking them to do something that's not their job, that they're uniquely ill-suited for, and that they may not even know about, because authors often have very little, if anything, to do with ebook releases.

Yes, many authors will do what they can, but by this point it's out of their hands.
08-12-2009, 10:27 AM #70
slayda
Retired & reading more!
Posts: 2,764
Karma: 1884247
Join Date: Sep 2006
Location: North Alabama, USA
Device: Kindle 1, iPad Air 2, iPhone 6S+, Kobo Aura One
Quote:
Originally Posted by Ea
Quote:
Originally Posted by HarryT
What's even more insidious is when OCR is combined with a spell checker, and a "mangled" word has been replaced by another word, which "fits in" to the sentence, and yet is totally wrong.
...
Yikes! That is bad. It's hard to spot words that the spell checker doesn't catch.


But if you had a text where you want to remove single line breaks and replace double empty lines with true paragraph breaks - can you do that?
(I know there's some difference between line break and paragraph break, but I haven't looked into it, so I might be missing some information)
In MS Word they call them "Manual line break" and "Paragraph mark". Look in the "Find and Replace" window (from the "Edit" pulldown menu), click "More", then click "Special" to see what you can "find" and "replace".

There is also a "Show/Hide" button to allow you to see the various formatting symbols. You can then see the difference between these two marks.

Maybe this will help or maybe I totally missed the problem.
08-12-2009, 10:34 AM #71
Ea
Wizard
Quote:
Originally Posted by slayda
...

In MS Word they call them "Manual line break" and "Paragraph mark". Look in the "Find and Replace" window (from the "Edit" pulldown menu), click "More", then click "Special" to see what you can "find" and "replace".

There is also a "Show/Hide" button to allow you to see the various formatting symbols. You can then see the difference between these two marks.

Maybe this will help or maybe I totally missed the problem.
This is great
It's not a current problem on my Mac, but it's been bothering me how to handle it, and I may well get a Windows machine next time.
08-12-2009, 11:35 AM #72
corroonb
Addict
Posts: 317
Karma: 1232685
Join Date: Nov 2008
Location: Ireland
Device: Kindle Voyage, Kobo Aura, Nexus 9
A trick I've found with OCR errors is to identify the consistent errors and look for other words that might not be picked up with a spell check. Obviously this only works well if the error occurs all the time, as you would expect of an automated process.

For example, I had an OCR text that had replaced every cl at the start of a word with d. It was easy to find words like dothes and doset with a spell checker and do a global replace, but I had to use a dictionary to search for every word that makes sense with both a cl and a d in front of it. And you can't use a global replace with dean/clean or dosed/closed, as the context has to be checked.

Apologies if this is obvious.
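The dictionary step can be partly automated. A minimal sketch, assuming a plain-text English file and a word list loaded into a Python set (the tiny set below is only a stand-in for a real one): words where only the cl spelling is a real word are collected as safe global replacements, and words where both spellings are real (dean/clean, dosed/closed) are listed for checking by eye.

```python
import re


def audit_cl_to_d(text: str, dictionary: set):
    """Sort d-initial words in an OCR text by whether 'cl' might be meant.

    Returns (safe, review):
      safe   maps words like 'dothes' -> 'clothes', where only the cl
             spelling is a real word, so a global replace is reasonable;
      review holds words like 'dean', where both spellings are real
             words and each occurrence has to be checked in context.
    """
    safe, review = {}, set()
    for word in set(re.findall(r"[a-z]+", text.lower())):
        if not word.startswith("d"):
            continue
        cl_form = "cl" + word[1:]
        if cl_form not in dictionary:
            continue  # e.g. 'did' -> 'clid': no cl reading, leave it alone
        if word in dictionary:
            review.add(word)
        else:
            safe[word] = cl_form
    return safe, review


# A tiny stand-in for a real word list.
WORDS = {"clothes", "closet", "clean", "dean", "closed", "dosed", "the"}
sample = "The dothes were in the doset. The dean had dosed the door."
safe, review = audit_cl_to_d(sample, WORDS)
print(safe)
print(sorted(review))
```

This only covers the one confusion at the start of a word; any other consistent error found in a text would need its own pattern.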
08-12-2009, 11:43 AM #73
Ea
Wizard
Quote:
Originally Posted by corroonb
A trick I've found with OCR errors is to identify the consistent errors and look for other words that might not be picked up with a spell check. Obviously this only works well if the error occurs all the time, as you would expect of an automated process.

For example, I had an OCR text that had replaced every cl at the start of a word with d. It was easy to find words like dothes and doset with a spell checker and do a global replace, but I had to use a dictionary to search for every word that makes sense with both a cl and a d in front of it. And you can't use a global replace with dean/clean or dosed/closed, as the context has to be checked.

Apologies if this is obvious.
It's a good idea. I've sort of been doing this already, but it's not the same as being completely conscious about it, and I've never thought to use a dictionary to help.
08-12-2009, 11:50 AM #74
starrigger
Jeffrey A. Carver
Posts: 1,355
Karma: 1107383
Join Date: Aug 2008
Location: Massachusetts, USA
Device: Lenovo Yoga Tab Plus, Droid phone, Nook HD+
Quote:
Originally Posted by HarryT
Contact the seller initially. Reputable sellers will pass the problem report back to the publisher (who are generally the ones who actually create the eBook).
Actually, it is often the seller, not the publisher, who generates the ebook. Contact the seller. FW, Amazon, whoever.

The error might come from the original file. More likely it came from the conversion.
08-12-2009, 12:10 PM #75
tomsem
Grand Sorcerer
Posts: 6,950
Karma: 27060153
Join Date: Apr 2009
Location: USA
Device: iPhone 15PM, Kindle Scribe, iPad mini 6, PocketBook InkPad Color 3
It seems there should be a way to exploit the fact that ebooks are in electronic form, and that we are all connected by the internet. Every reader is a potential proof-reader.

Windows and OS X have a 'crash reporting' mechanism that allows users who experience crashes to send a report to the software publisher in question. Something analogous could be developed for ebooks and built into the reader software (at least for those devices which support annotation). One might even institute a microcredit scheme so that people who report errors are rewarded in some tangible way. (Hmm, publishers could intentionally introduce errors and give credits to the first 100 readers who find them, to encourage this proofing activity...)

So the idea is that users who encounter an error would invoke their reader's 'report an error' function, which would flag the location and allow the user to type a short note as to the nature of the error, the ebook version, the reader's contact info (if they opt in) etc. These error reports would be collected and forwarded or sent directly to the publisher when the device is 'connected' to the internet or tethered to a host computer. The publisher would then resolve the errors, publish a new edition and make it available for download to anyone who owns that title or is purchasing anew. The reader's librarian software could periodically check for and download updates (Amazon already is set up to update titles automatically - maybe a little too automatically in some cases).
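To make the proposal a little more concrete, here is a minimal sketch of the pieces it describes: the record a 'report an error' command might capture, an outbox that keeps reports on the device until it is connected, and a publisher-side tally of where reports cluster. Every name here (ErrorReport, ReportOutbox, hot_spots, the send callback) is hypothetical; no existing reader or store API is implied.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ErrorReport:
    """One reader-submitted error report. All fields are hypothetical."""
    book_id: str        # e.g. an ISBN or the store's product ID
    book_version: str   # the edition/revision the reader actually has
    location: str       # reader-specific position of the error
    note: str           # the reader's short description of the problem
    contact: Optional[str] = None  # only set if the reader opts in


@dataclass
class ReportOutbox:
    """Holds reports on the device until a connection is available."""
    pending: List[ErrorReport] = field(default_factory=list)

    def report(self, r: ErrorReport) -> None:
        self.pending.append(r)

    def flush(self, send: Callable[[ErrorReport], bool]) -> int:
        """Try to send each queued report; keep the ones that fail."""
        kept = []
        sent = 0
        for r in self.pending:
            if send(r):
                sent += 1
            else:
                kept.append(r)
        self.pending = kept
        return sent


def hot_spots(reports: List[ErrorReport]) -> Counter:
    """Publisher side: count reports per (book, version, location)
    so that places many readers flagged can be fixed first."""
    return Counter((r.book_id, r.book_version, r.location) for r in reports)


received: List[ErrorReport] = []


def upload(r: ErrorReport) -> bool:
    """Stand-in for a real upload; here it just records the report."""
    received.append(r)
    return True


outbox = ReportOutbox()
outbox.report(ErrorReport("book-123", "v1", "loc 4410", "words run together"))
outbox.report(ErrorReport("book-123", "v1", "loc 4410", "no spaces between words"))
print(outbox.flush(upload))  # 2
print(hot_spots(received).most_common(1))
```

Keying reports on the book version as well as the location means that once a corrected edition is published, reports against the old version can be closed off rather than re-triaged.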

eBook marketplaces that institute such a self-correcting system would become preferred to those that do not.
All times are GMT -4.