Typos in ebooks


raac
04-03-2010, 04:30 PM
Hi,

I don't know if this has already been discussed, but aren't others also frustrated at the numerous typos in retail e-books? I have seen numerous mistakes in books from both Sony and Amazon. This doesn't happen with paper books and so I feel somewhat ripped off when it happens with ebooks.

How do others feel? Any tips on getting the publishers or sellers to take notice? Maybe it's time to mount a mass complaint?

mr ploppy
04-03-2010, 05:05 PM
I always have a bit of paper next to the bed to write down any mistakes I find, then fix them in Sigil later for the next time I read them. I did ask one of the publishers once if they wanted the fixed copy, but they didn't reply.

raac
04-03-2010, 06:40 PM
I've done something similar before. However, it shouldn't be the customer's job to proof the book. Furthermore, I don't re-read everything I buy so it still ruins the reading experience.

SensualPoet
04-03-2010, 09:08 PM
Just for my own edification ... are we talking about "real" books here -- Harper Collins, Penguin, Random House et al -- as opposed to "Google Books" and public domain material?

Occasionally some gaffe turns up even in commercial paper product but these are rare. Is that frequency we are talking about? Or are commercial e-books simply shoddier than their direct hard cover and paper back counterparts?

LDBoblo
04-03-2010, 09:35 PM
Just for my own edification ... are we talking about "real" books here -- Harper Collins, Penguin, Random House et al -- as opposed to "Google Books" and public domain material?

Occasionally some gaffe turns up even in commercial paper product but these are rare. Is that frequency we are talking about? Or are commercial e-books simply shoddier than their direct hard cover and paper back counterparts?
From my experience with both free and commercial ebooks, they're simply shoddier. I not too long ago went through a file that was around 600 pages on my reader, and every time I came across a typo, I bookmarked it so I could fix it up when stripping it and converting to PDF. By the time I got to the end of the book, I checked the bookmark count: 126.

The overall rate is lower, but still makes paperbacks look great by comparison.

riemann42
04-03-2010, 10:40 PM
Some commercial eBooks seem to have OCR mistakes, but this is rare with new titles. The most common mistake I see is a word broken where a hyphen should be, or a hyphen inserted where one shouldn't be. I see this even in new books. I think this happens when a human goes back and manually corrects the H&J on the layout for the physical book, and the script they use to convert to OEBPS doesn't handle it correctly.

I imagine there's at least a couple of employees at every big publishing house who go back through and touch up a book "incorrectly," thinking they are making the final product have better quality.

I'm also betting that very little proof-reading is done on most eBooks, with the thinking that the proof reading for the physical books was enough.

AlexBell
04-04-2010, 03:12 AM
I too have been angry and frustrated by the shoddy formatting found in many commercial ebooks. I had a frank exchange of views a week or so ago with HarperCollins Australia about 'Flashman' by George MacDonald Fraser. I won't go into the details now, but I told them that I would be ashamed to put out something as poorly produced over my signature, and that I certainly would not have bought the book if I had known how badly produced it was. I did get a response, but only that the book was produced by HarperCollins UK, and they would send the email to them.

I have a suggestion, but it depends on whether commercial publishers read MobileRead Forums, and whether they give a damn anyway.

Suppose each time any of us finds a particularly badly produced book we notify each other through this forum to warn others not to buy that book - something like a 'Hall of Shame' - and copy the details to the publisher, and mention that the comment has been posted here.

We might even need a special subforum for the Hall of Shame. How do people feel about the idea? Would the moderators permit this?

Regards, Alex

JSWolf
04-04-2010, 08:29 AM
I read Transfer of Power by Vince Flynn recently and I found a lot of errors that should have been avoided. It's like the book was converted from a PDF copy.

SensualPoet
04-04-2010, 09:45 AM
I believe the comments here about shoddy commercial e-book production; I just don't understand it -- that is, I'm shocked and surprised.

Wouldn't all "recent" titles exist as some sort of text-based electronic file? Surely anything that's been in the catalogue consistently since the latter 1980s exists digitally -- and not as pdf images that have to be OCR'd to create an e-book.

Michael J Hunt
04-04-2010, 09:49 AM
I've just posted the following on the 'smell of books' thread and I think it would be just as appropriate for it to be here as well:

I'm an ignoramus about e-book readers as I don't own one. But I do have three e-books 'out there' and I'm curious to know how they appear on (or in) an e-reader. When I look at the pdf versions on my computer screen there's absolutely nothing about them that is different to their paper counterparts. Am I being naive, but aren't all books published in both forms identical? Could this only be true for books that are published with the intention of their appearing in both forms? Also, is it possible for unwanted errors to creep in during conversion from one format to another? If this is so, surely the publisher should subject the final pdf version to a proof-read before he sells it.

As an editor I'm probably more aware than most people of typos in professionally produced manuscripts. If there's only one, or perhaps two, in a book, I consider that to be not sufficient to put me off a book. If there are more than two, I find I continue to read with half my mind fixed on when I'm going to find the next. So, for me, errors can be distracting.

HarryT
04-04-2010, 09:49 AM
Wouldn't all "recent" titles exist as some sort of text-based electronic file? Surely anything that's been in the catalogue consistently since the latter 1980s exists digitally -- and not as pdf images that have to be OCR'd to create an e-book.

You might have thought so, but you'd be surprised. Publishers are an ultra-conservative lot. Although manuscripts have been submitted electronically for some years, many publishers did not retain electronic versions of the books after typesetting them, and hence have to OCR even fairly recent books in order to create an eBook. Hence lots of OCR errors.

Newly-published books should really not have this problem.

ATimson
04-04-2010, 10:00 AM
I read Transfer of Power by Vince Flynn recently and I found a lot of errors that should have been avoided. It's like the book was converted from a PDF copy.
Well... the books are probably converted from the laid-out-in-InDesign version, rather than the manuscript version (otherwise they'd be missing any revisions made from the page proofs, etc.).

If they're not doing the layout in InDesign "right", so that its ePub export knows how to handle the line-break/hyphenation issues that crop up in a PDF, I'd expect to see similar errors. (And the resulting PDF ebook probably fails at reflowing properly on mobile devices, too.)
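
To illustrate why line-break hyphens are hard to undo automatically, here is a minimal Python sketch. It assumes the text extracted from the PDF still carries one newline per printed line, and the function name is made up for the example. It is not any publisher's actual conversion script.

```python
import re

def join_line_break_hyphens(text):
    """Naively rejoin words that were hyphenated at the end of a printed line.

    Assumption: a hyphen directly followed by a newline was inserted by the
    typesetter, so both the hyphen and the newline are dropped.
    """
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)

# Works for a word that was only split to fit the line:
print(join_line_break_hyphens("poorly pro-\nduced ebook"))

# Goes wrong for a word that genuinely contains a hyphen:
print(join_line_break_hyphens("a well-\nknown author"))
```

The first call gives "poorly produced ebook", but the second gives "a wellknown author". A script cannot tell the two cases apart without a dictionary or a human proofreader, which may be why both kinds of hyphen error turn up in ebooks.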

mr ploppy
04-04-2010, 10:01 AM
Just for my own edification ... are we talking about "real" books here -- Harper Collins, Penguin, Random House et al -- as opposed to "Google Books" and public domain material?

Occasionally some gaffe turns up even in commercial paper product but these are rare. Is that frequency we are talking about? Or are commercial e-books simply shoddier than their direct hard cover and paper back counterparts?

The last paid for ebook I read had mistakes on average every 4 or 5 pages. Ironically enough, the fan-made ebooks I read hardly ever seem to have any mistakes at all (though I never go for the initial "release").

A few examples from the last book I read that I haven't got around to fixing yet (there was lots and lots like this that I have already fixed and thrown the bits of paper away):

one of science team = one of the science team
I kind have to go fight = I kinda (or kind of) ...
but there dozens of people sleeping = but there was ...
Youre sure theyre there but you can hear anything = can't hear
could she haven fallen back in love = could she have
A lot them are true believers = a lot of them are ...
or maybe ever the top dog in Homeland = maybe even ...
not wasting to make a scene = not wanting to ...
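
A list in this "wrong = right" form could in principle be applied by a script instead of by hand. A rough Python sketch, assuming each line gives the complete wrong phrase and the complete corrected phrase (the function names are invented for the example, and the second sample correction is written out in full, which the notes above do not do):

```python
def parse_corrections(lines):
    """Turn 'wrong = right' lines into (wrong, right) pairs, skipping anything else."""
    pairs = []
    for line in lines:
        if "=" not in line:
            continue
        wrong, right = (part.strip() for part in line.split("=", 1))
        if wrong and right:
            pairs.append((wrong, right))
    return pairs

def apply_corrections(text, pairs):
    """Replace every occurrence of each wrong phrase with its correction."""
    for wrong, right in pairs:
        text = text.replace(wrong, right)
    return text

corrections = parse_corrections([
    "one of science team = one of the science team",
    "not wasting to make a scene = not wanting to make a scene",
])
print(apply_corrections("He was one of science team.", corrections))
```

Plain string replacement is only safe when the wrong phrase is long enough to be unique in the book, so short corrections would still need checking by eye.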

HarryT
04-04-2010, 10:02 AM
Well... the books are probably converted from the laid-out-in-InDesign version, rather than the manuscript version (otherwise they'd be missing any revisions made from the page proofs, etc.).

If they're not doing the layout in InDesign "right", so that its ePub export knows how to handle the line-break/hyphenation issues that crop up in a PDF, I'd expect to see similar errors. (And the resulting PDF ebook probably fails at reflowing properly on mobile devices, too.)

Vince Flynn's books do seem to have a particularly horrible problem. I bought one not so long ago for my Kindle, and it was missing all the quotation marks, dashes, and full stops. I kept it, because it was a good story, but the almost complete lack of punctuation made it a "challenge" to read, especially the dialogue parts of it.

ficbot
04-04-2010, 10:09 AM
I am reading 'Ceremony in Death' by J.D. Robb right now and there are random spaces in the middle of words. I am halfway through and this has happened a dozen or so times. I read other books in the series and they did not have this issue. It is a mobipocket file so I am not sure how I could edit it. I purchased the book from Fictionwise. This is one reason the at-paper prices are imho unfair. You are not getting a product of the same quality.

HarryT
04-04-2010, 10:11 AM
I am reading 'Ceremony in Death' by J.D. Robb right now and there are random spaces in the middle of words.

That sounds like an issue with "soft hyphens". Unfortunately it's all too common.
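
For anyone who can get at the text of such a book, a minimal Python sketch of the clean-up follows. It assumes the stray gaps really are soft hyphen characters (U+00AD) that the reading software displays badly, which is only a guess without seeing the file. The function names are made up for the example.

```python
SOFT_HYPHEN = "\u00ad"  # invisible hint marking where a word may be broken

def count_soft_hyphens(text):
    """Report how many soft hyphens the text contains."""
    return text.count(SOFT_HYPHEN)

def strip_soft_hyphens(text):
    """Remove every soft hyphen, leaving the words whole."""
    return text.replace(SOFT_HYPHEN, "")

sample = "a cere\u00admony in the mid\u00addle of the night"
print(count_soft_hyphens(sample))
print(strip_soft_hyphens(sample))
```

Stripping them loses the hyphenation hints, but on a device that shows them as gaps that is probably the better trade.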

ATimson
04-04-2010, 10:11 AM
Vince Flynn's books do seem to have a particularly horrible problem. I bought one not so long ago for my Kindle, and it was missing all the quotation marks, dashes, and full stops. I kept it, because it was a good story, but the almost complete lack of punctuation made it a "challenge" to read, especially the dialogue parts of it.
Wow. That's... special. I could understand the problems with the quotes or dashes, if the book's doing something strange like requesting high-Unicode versions of those characters that aren't supported by your device (haven't looked at the source file for your book, naturally, so I don't know if that applies :)).

But missing periods? That takes talent. :smack:

raac
04-04-2010, 10:13 AM
Yes, I'm talking about commercial books including new titles.
I'm currently reading "A User's Guide to the Universe: Surviving the Perils of Black Holes, Time Paradoxes, and Quantum Uncertainty." from Amazon. This book has just come out in hardback. Yesterday I was reading a section involving coin flips and randomness. Halfway through the page, I find this:
"... When you Hip a coin a million times, you'll very likely Hip heads... ...If Louie and Dave Hip..." (location 1106)

And earlier in the book:
"Much like a pot of water, a particular can bubble into the vacuum, with the caveat that it doesn't last very long." (location 845) I assume they mean a "particle"

There are other mistakes, I've just not made notes of them. The first set (flips => Hips) looks like an OCR error. WTF are they doing with OCR on a new book?

I read the whole Narnia series from Amazon and found minor mistakes in most books, particularly in The Last Battle. In Dr. Golem from the Sony book store there were several typos and the text was badly formatted.

How do we go about setting up a Hall of Shame?

HarryT
04-04-2010, 10:14 AM
Wow. That's... special. I could understand the problems with the quotes or dashes, if the book's doing something strange like requesting high-Unicode versions of those characters that aren't supported by your device (haven't looked at the source file for your book, naturally, so I don't know if that applies :)).

But missing periods? That takes talent. :smack:

It was one of his first books, so had almost certainly been OCR'd. The strange thing was, though, that it wasn't just some of the quotation marks that were missing - it was every single one, and the same with the dashes and full stops. As you say, something had evidently gone very seriously wrong during the process of eBook creation.

rhadin
04-04-2010, 11:01 AM
There are lots of shades of gray that raise their head when it comes to typos. Some are a result of shoddy conversion; most, I think, are a result of lack of editing. I particularly find this to be true in self-published ebooks and in ebooks from very small publishers. Some examples of what I have seen in ebooks I have bought are found in On Words & eBooks: Give Me a Brake! (http://americaneditor.wordpress.com/2010/03/04/on-words-and-ebooks-give-me-a-brake/) The problem is likely to increase as anyone with a computer, a word processing program, and a yen to write publishes their work through outfits like Smashwords and Manybooks. I suspect that this is the curse of the Internet Age and a symptom of the decline in education standards, along with publishers OCRing but not proofreading.

Strether
04-04-2010, 12:01 PM
Having heard about Steven Saylor's mysteries set in ancient Rome, I recently bought the first in the series, Roman Blood, from Amazon. Alas, it turned out to be a Topaz-formatted book. Very ugly and with slow-turning pages, but obviously had been scanned and the errors were legion. Especially hyphenated words that had come at the end of a line in the printed text, but were inappropriate if they came anywhere else, which they did in the ebook. I could only read it for 3 or 4 chapters, then stopped and asked Customer Service to refund my money, which they obligingly did.

Normally, I'd be on the lookout for Topaz books, but this one snuck in on me. Looking at the rest of the series, I see they're all formatted the same way. I wonder if the author knows/cares that they're so poorly produced?

Jim

DJHARKAVY
04-04-2010, 02:38 PM
My wife is a freelance proofreader, and she gets pretty much all books as pdf's these days and makes corrections to them.

If they have the book as a digital copy, and they proof it as a digital copy, why should there be major errors in the digital version?

raac
04-04-2010, 03:12 PM
I have contacted Amazon. They accepted details of the errors and say that they will get the publisher to correct them. They offered to either give a $5 voucher and I keep the book, or allow me to return the book for a full refund.

Harryplopper
04-04-2010, 06:44 PM
The first ebook I bought for my new Sony 600 was Hitchhiker's Guide to the Galaxy. It had many typos. The most annoying was that every word that had a "ff" in it, the first f was replaced with a space. E fective, e fort, e ficient throughout the whole book. Great story, lousy ebook. :angry:
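
A pattern this regular could probably be repaired by script. Here is a rough Python sketch, assuming the damage is always a missing first "f" replaced by a space. The tiny word list stands in for a real dictionary, and all the names are invented for the example.

```python
import re

# Hypothetical word list; a real pass would load a full dictionary.
KNOWN_WORDS = {"effective", "effort", "efficient", "off", "different"}

# A run of letters, one space, then a run of letters starting with "f".
PATTERN = re.compile(r"\b([A-Za-z]+) (f[A-Za-z]*)\b")

def restore_ff(text, known_words=KNOWN_WORDS):
    """Rejoin 'e fort' as 'effort' only when the joined word is a known word."""
    def fix(match):
        left, right = match.group(1), match.group(2)
        candidate = left + "f" + right
        if candidate.lower() in known_words:
            return candidate
        return match.group(0)  # leave genuine word pairs untouched
    return PATTERN.sub(fix, text)

print(restore_ff("An e fective and e ficient e fort"))
print(restore_ff("They built a fort"))
```

The dictionary check is what keeps "a fort" from becoming "affort". Without it, a blind search and replace would add new errors of its own.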

On the other side, I want to praise George R R Martin's Song of Ice and Fire ebooks. They had almost no errors. :2thumbsup

JSWolf
04-04-2010, 06:50 PM
Vince Flynn's books do seem to have a particularly horrible problem. I bought one not so long ago for my Kindle, and it was missing all the quotation marks, dashes, and full stops. I kept it, because it was a good story, but the almost complete lack of punctuation made it a "challenge" to read, especially the dialogue parts of it.

Yes, I did end up having to return that book to Kobo because of all the errors. I did manage to get a Mobipocket edition that seems to be OK enough that I can convert it to ePub. But it should never have been released with all those errors. The problem I had was trying to decide if I should have a go at buying it again in ePub someplace else or trying a different format.

JSWolf
04-04-2010, 06:58 PM
My wife is a freelance proofreader, and she gets pretty much all books as pdf's these days and makes corrections to them.

If they have the book as a digital copy, and they proof it as a digital copy, why should there be major errors in the digital version?

I hate to say this, but the PDF is sometimes used for the print version and then the PDF is given out to the eBook department to create the various versions from. And we all know that a novel length PDF cannot be converted to any other format without errors. So, then we get shoddy proofreading of this PDF conversion and it goes out with errors that need not be there.

What needs to be done with new eBooks is to keep the book in electronic format and keep that up-to-date with all edits/changes/etc. but in a format that can easily be used to make different eBooks formats out of. That way, we get the same text that goes to the print edition.

JSWolf
04-04-2010, 06:59 PM
On the other side, I want to praise George R R Martin's Song of Ice and Fire ebooks. They had almost no errors. :2thumbsup

But, the errors in the eBook, are they also in the paperback? If they are not, then you cannot commend it at all.

Stitchawl
04-04-2010, 07:26 PM
I'm beginning to feel as if I'm the only one who doesn't mind the typos!
I wonder if there is something wrong with me?!? :eek:

When I read a book on my reader, it's for the pleasure of reading a story, :book2: not to correct student's term papers. If there is a mistake I just read past it and pay it no mind.

Stitchawl

jes1
04-04-2010, 07:56 PM
I don't mind the occasional error, but some eBooks get ridiculous. One of the characters in a book I read was named $Acute;ibhear everywhere in at least 2 books. Not sure how to pronounce that one. :) And The Hobbit was horrible when it first came out. There were multiple errors on every page. (That was an iPod Touch sized page not a hardcover sized one.) Fortunately Fictionwise had a corrected version later.

Ravensknight
04-04-2010, 08:19 PM
I bought my first ebook from Waterstones UK, Gardens of the Moon by Steven Erikson. It came across as something someone had QUICKLY scanned in on their computer and tossed out for free. Only what I paid wasn't free. My complaint email was never answered.


I'm beginning to feel as if I'm the only one who doesn't mind the typos!
I wonder if there is something wrong with me?!? :eek:

When I read a book on my reader, it's for the pleasure of reading a story, :book2: not to correct student's term papers. If there is a mistake I just read past it and pay it no mind.

Stitchawl

That is like saying you can enjoy playing tennis with a racket with some broken strings and just "imagining" the boundary lines.
A smooth flow, a correct flow, is a VERY important part of the reading experience. You can survive without it, but you can survive without deodorant too ;)

mrkarl
04-04-2010, 08:38 PM
typos..................one of the reasons I'm cool with file sharing.....for the time being.
But I fail to see the need for the publishers when you don't need the equipment of the printing industry.

raac
04-04-2010, 09:00 PM
File-sharing isn't really the way to go because it gives ammunition to the DRM idiots. I think a better solution is to demand a refund every time you get a book with typos. You could, of course, read it and list all the typos before demanding your refund. If everyone did this, the publishers would have to sit up and listen. If it isn't the same quality as the paper version then they shouldn't be charging money for it.

Even for recent books they appear to be converting PDF to ebook. This, beyond anything, indicates to me that many publishers are clueless about the digital age.

riemann42
04-04-2010, 09:51 PM
On more than one occasion I have used Google Books to see if an error is in the print version. Unless it is an H&J error, it typically is.

Stitchawl
04-04-2010, 09:53 PM
That is like saying you can enjoy playing tennis with a racket with some broken strings and just "imagining" the boundary lines.

Did... for many years. Enjoyed it too. :2thumbsup
Somehow managed to enjoy skiing on wooden skis and fishing with a cane pole and piece of string too.

A smooth flow, a correct flow, is a VERY important part of the reading experience.

I agree... if one is reading poetry or classical literature. But not necessary for the average mystery, thriller, sci-fi. To me, anyway. YMMV.

I think the terrain is more interesting than the map. :)

Stitchawl

riemann42
04-04-2010, 10:00 PM
File-sharing isn't really the way to go because it gives ammunition to the DRM idiots. I think a better solution is to demand a refund every time you get a book with typos. You could, of course, read it and list all the typos before demanding your refund. If everyone did this, the publishers would have to sit up and listen. If it isn't the same quality as the paper version then they shouldn't be charging money for it.

A refund? Come-on. If it is over the top, maybe. I think a letter to the publisher is a good response in most cases. Also, care should be taken to make sure that the error is indeed ebook specific.

Even for recent books they appear to be converting PDF to ebook. This, beyond anything, indicates to me that many publishers are clueless about the digital age.

Most ebooks are made using a script to convert the InDesign file (which amounts to the same thing as converting the PDF). If a soft return is added to fix H&J, it can make it into the ebook and cause weird line breaks. It's not that publishers are clueless about the digital age, it's that typesetters are lazy, and in many cases, ignorant and incompetent. I have met many "Graphic Designers" who think if it looks good on the final page, it is OK, not realizing that the text may need to get reflowed some day.

Hiring someone to proof the ebook once would be a good start.

DJHARKAVY
04-04-2010, 10:18 PM
I hate to say this, but the PDF is sometimes used for the print version and then the PDF is given out to the eBook department to create the various versions from. And we all know that a novel length PDF cannot be converted to any other format without errors.

If the pdf is based on the manuscript and not on a scan of a paper version (as is the case in the books and articles that my wife proofs), there should be no additional errors in conversion.

It 'should' only have problems if it needs to be OCRed...

What needs to be done with new eBooks is to keep the book in electronic format and keep that up-to-date with all edits/changes/etc. but in a format that can easily be used to make different eBooks formats out of. That way, we get the same text that goes to the print edition.

Agreed.

AlexBell
04-05-2010, 06:00 AM
Hi,

How do others feel? Any tips on getting the publishers or sellers to take notice? Maybe it's time to mount a mass complaint?

I don't know how mounting a mass complaint would be implemented, but I am going to start a thread entitled Hall of Shame, and start with some details of the faults in one of the books I bought recently published by HarperCollins UK. I have told them that I am going to do this, and that I will recommend that people do not buy the book because of the shoddy production.

Perhaps other people would be willing to add to the thread about books they have bought, and also tell the publisher that the book has been reviewed in this thread.

Regards, Alex

mr ploppy
04-05-2010, 09:09 AM
File-sharing isn't really the way to go

If the intention is to ultimately end up with a library full of ebooks without spelling mistakes, then file sharing is the way to get it. Anyone who finds a mistake in a fan-made ebook can either fix it themself and re-upload it, or post a list of corrections for someone else to act on.

Perhaps the publishers should offer some sort of bounty for proof reading, or at least have forums on their websites for people to report the mistakes they find. But they would need to act on those reports, otherwise people would soon lose interest in reporting them.

JSWolf
04-05-2010, 09:47 AM
If the pdf is based on the manuscript and not on a scan of a paper version (as is the case in the books and articles that my wife proofs), there should be no additional errors in conversion.

It 'should' only have problems if it needs to be OCRed...

When you convert a novel length PDF to any other format, there WILL be errors. Unless you have people go through the PDF comparing it to the conversion, not all of the errors will be found and corrected. So yes, there will be errors and if not caught then we get them in the eBooks we buy.

raac
04-05-2010, 11:06 AM
@mr ploppy
If the intention is to ultimately end up with a library full of ebooks without spelling mistakes, then file sharing is the way to get it.

I do see where you're coming from: what I'm saying is that we shouldn't have to resort to file-sharing for this purpose. Over-reliance on file-sharing can have a negative impact on the consumer because rights holders use it as justification for DRM. But let's not turn this into a piracy thread, yes? There's nothing stopping you from reporting errors to the publisher in an e-mail. A forum isn't really necessary.

ATimson
04-05-2010, 11:10 AM
There's nothing stopping you from reporting errors to the publisher in an e-mail. A forum isn't really necessary.
There ought to be some way to alert others that a book is poorly formatted. Print books can be flipped through in the bookstore, or previewed via Amazon's Look Inside feature, where such will be obvious; there's no equivalent for ebooks if you don't have a Kindle or Nook.

mr ploppy
04-05-2010, 11:31 AM
@mr ploppy
If the intention is to ultimately end up with a library full of ebooks without spelling mistakes, then file sharing is the way to get it.

I do see where you're coming from: what I'm saying is that we shouldn't have to resort to file-sharing for this purpose. Over-reliance on file-sharing can have a negative impact on the consumer because rights holders use it as justification for DRM. But let's not turn this into a piracy thread, yes? There's nothing stopping you from reporting errors to the publisher in an e-mail. A forum isn't really necessary.

I've reported a few, but got no answer to any of them so I don't bother anymore. It's not really piracy related, it's just the way things are done over there. There's no real reason why the same model couldn't be adopted by publishers, but there would need to be some sort of incentive to do so. Maybe a bounty where they refund you some of the purchase price for each mistake you find, or give you a discount off your next purchase.

raac
04-05-2010, 07:05 PM
Maybe a bounty where they refund you some of the purchase price for each mistake you find, or give you a discount off your next purchase.

This is more or less exactly what Amazon did when I reported the mistakes to them.

rakulos
04-06-2010, 03:09 AM
I bought my first ebook from Waterstones UK, Gardens of the Moon by Steven Erikson. It came across as something someone had QUICKLY scanned in on their computer and tossed out for free. Only what I paid wasn't free. My complaint email was never answered.


Glad I'm not the only one that was hacked off by this! My OCD nature made me go through and fix the mistakes. Randomly applied italics do not make for an enjoyable read :angry:

I was going to replace my entire pbook collection with electronic versions - I love the Erikson books - but I'm not going to bother now so it's their loss really.

Michael J Hunt
04-06-2010, 07:40 AM
I'm trying to understand all the ins and outs of this discussion, but the use of initials confuses me - perhaps it's obvious to everyone else here, but, what do OCR and OCD mean? Thanks in advance.

MJ

Logseman
04-06-2010, 07:56 AM
OCR = Optical Character Recognition, i.e. we're talking about how they convert printed text into an electronic document. In an OCR process errors are bound to occur; what people are angry about is that ebook versions are being released without regard for what the final result is.

OCD = Obsessive Compulsive Disorder. rakulos was just stating s/he can't bear reading badly formatted books and, due to that syndrome, can't get any enjoyment out of the book until s/he corrects them.

JSWolf
04-06-2010, 08:22 AM
What we could do is start a thread that lists the books and what errors there are along with the corrections.

Tonycole
04-07-2010, 04:13 AM
I too have been angry and frustrated by the shoddy formatting found in many commercial ebooks. I had a frank exchange of views a week or so ago with HarperCollins Australia about 'Flashman' by George MacDonald Fraser. I won't go into the details now, but I told them that I would be ashamed to put out something as poorly produced over my signature, and that I certainly would not have bought the book if I had known how badly produced it was. I did get a response, but only that the book was produced by HarperCollins UK, and they would send the email to them.

I have a suggestion, but it depends on whether commercial publishers read MobileRead Forums, and whether they give a damn anyway.

Suppose each time any of us finds a particularly badly produced book we notify each other through this forum to warn others not to buy that book - something like a 'Hall of Shame' - and copy the details to the publisher, and mention that the comment has been posted here.

We might even need a special subforum for the Hall of Shame. How do people feel about the idea? Would the moderators permit this?

Regards, Alex
Hi, I think your idea of a sort of "hall of shame" is a very good one. I would be very happy to have a sort of continually growing post on my eReader blog with this, as like so many others, I have been palmed off with full price books from apparently reputable publishers, that were a mass of typos.

If anyone feels like helping with this, please send me your "Black list" (name of book and Publisher) at tony@ebookanoid.com, and I shall start the post. I shall also hand on all such emails to a friend who runs a blog for Apple owners, and who is equally angry about this matter, and will certainly join in such a campaign.

sassanik
04-07-2010, 04:24 AM
I'm trying to understand all the ins and outs of this discussion, but the use of initials confuses me - perhaps it's obvious to everyone else here, but, what do OCR and OCD mean? Thanks in advance.

MJ

OCR refers to the process of scanning a paper book, or pages of a book, into a computer. I believe that OCR stands for Optical Character Recognition?

OCD= obsessive compulsive disorder



I have noted that a lot of times books have trouble with quotation marks. I find that they are missing very frequently when reading resulting in me reading the line a couple of times to figure out where the character is stopping talking. I am guessing some of this is caused by reflow?

I also have just had some where things are spelled wrong, and have funky punctuation marks from major publishers as well as some of the smaller ones.

Personally I find that I catch different errors in my writing when I proofread a hardcopy instead of a digital one. Which could account for why some of the errors are getting through the proofreading stage.

Amy

Logseman
04-07-2010, 04:32 AM
The Hall of Shame has been created, in case you'd like to start bashing the books which deserve it ^^

http://www.mobileread.com/forums/showthread.php?p=860710

Sunspark
04-07-2010, 09:54 PM
What a publisher needs to do, is after putting out a book, advertise that they will pay $1 for every error/typo/problem spotted by a reader who reports it to them. And the corrected versions to be free downloads for people who have paid for the crappy versions.

If just 1 publisher does this, the others will be forced to follow suit because then they will have the rep as the publishers that are no good.

Goshzilla
04-08-2010, 12:06 AM
Hi,

I don't know if this has already been discussed, but aren't others also frustrated at the numerous typos in retail e-books? I have seen numerous mistakes in books from both Sony and Amazon. This doesn't happen with paper books and so I feel somewhat ripped off when it happens with ebooks.

How do others feel? Any tips on getting the publishers or sellers to take notice? Maybe it's time to mount a mass complaint?


These typos happen all the time with paper books. The math and sciences are filled with them. I'm just saying this isn't unique to electronic books.

WillAdams
04-08-2010, 07:47 AM
Well, Dr. Donald Knuth used to issue physical checks for errors in his books and programs (I got one for an error and an improvement in his _Digital Typography_) --- now he does virtual monies in the ``The Bank of San Serriffe'':

http://www-cs-faculty.stanford.edu/~uno/boss.html

It would be great if other authors or publishing houses would adopt similar practices.

William

Hamlet53
04-08-2010, 08:21 AM
Errors are not limited to e-books. I recall way back in college a Professor required use of a text book he had written for a course in Catalytic Chemistry; first edition published by Wiley. So the first day of class he hands out, I kid you not, 30 pages typed of errata. Some minor stuff yes, but also chemical equations that did not balance and significant missing content; as in “as illustrated in example 5.1 above.” No, not even there. I was forewarned by students who had taken the class in previous years about this so it was not like it had just been discovered in a newly published work. I am certain that the publisher was waiting to sell out the first run before republishing a corrected edition.

Anyway, to get back to the subject: if the original book was only available in paper form and scanning with OCR was used to get the electronic version, errors are bound to happen. I have been a volunteer proofreader for Project Gutenberg and I know how much proofing and review goes on there before books are released as final. Even so, errors can still be found in texts downloaded from there. I don't know how much proofing Google Books does for the books they have scanned, the public domain ones free for complete download, but these tend to have a lot of errors.

alecE
04-08-2010, 08:27 AM
FWIW, a quick summary from the book I am reading at the moment - Meta Maths by Gregory Chaitin, published by Atlantic Books.

I'll draw a veil over the difficulty the epub format has with mathematical equations. However I've encountered about ten instances of words which are hard hyphenated so that the hyphen appears in the middle of a word, in the middle of a line. The typo which amused me most though was the failure to distinguish between minus two and root two, so that -2 was displayed instead of √2 (just in case the symbol does not display properly here it represents the square root symbol) - just a little bit critical :-)

Needless to say there are also a couple of ordinary misspellings as well.

In my own not very humble opinion, I cannot see any reason whatsoever why such shoddy work should be permitted. I will be notifying the publisher, who I am sure will embrace me with tears of gratitude for showing him/her/it these errors.

Now this is a title that, because of its maths content, demands closer scrutiny than a 'normal' text. Thankfully I've only found one instance of a maths error, but this surely implies that the production/proofing processes are sub-standard?

Ankh
04-08-2010, 10:24 AM
That sounds like an issue with "soft hyphens". Unfortunately it's all too common.
And easy to fix (if the book is DRM-free). Use Calibre to convert the book to, say, Microsoft ".lit", delete original epub, then convert it from lit to epub.
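
For anyone who would rather fix the text directly than round-trip it through a converter, here is a minimal sketch in Python. It assumes you already have the book's text or HTML as a string from a DRM-free file, and it simply deletes the soft-hyphen character (U+00AD) and its HTML entity. The function name is made up for illustration; it is not part of Calibre.

```python
SOFT_HYPHEN = "\u00ad"  # invisible "break here if needed" hint

def strip_soft_hyphens(text: str) -> str:
    """Remove soft hyphens (raw character and &shy; entity).

    Ordinary hyphens (U+002D) are left untouched.
    """
    return text.replace(SOFT_HYPHEN, "").replace("&shy;", "")

sample = "The hy\u00adphen\u00adation hints are gone, but well-known stays."
print(strip_soft_hyphens(sample))
```

This only helps with true soft hyphens. A hard hyphen typed into the middle of a word looks exactly like the one in "well-known", so it cannot be removed blindly.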

calvin-c
04-08-2010, 12:32 PM
Hi,

I don't know if this has already been discussed, but aren't others also frustrated at the numerous typos in retail e-books? I have seen numerous mistakes in books from both Sony and Amazon. This doesn't happen with paper books and so I feel somewhat ripped off when it happens with ebooks.

How do others feel? Any tips on getting the publishers or sellers to take notice? Maybe it's time to mount a mass complaint?

Strange - I see quite a few typos in paper books. Probably not as many as I see in ebooks, but I do see quite a few. Maybe it's the books I read. (I don't keep track so this is more an impression than a statement of fact. Maybe I just notice them more in ebooks? I don't think so, but I guess it's possible.)

WillAdams
04-08-2010, 12:54 PM
The problem is, ebooks are normally derived from the files which were used to create the paper book, which often have formatting applied to them that is specific to the physical book's text block size, so things which were correct (manually inserted hyphens, forced line breaks) become incorrect (hyphen in the middle of a word, spurious paragraph break), in addition to glitches from poor conversion methodologies.

The answer of course is to tag the book in a rich, unambiguous markup format up-front such as TEI, then convert to other formats (and when errors are found, always correcting the source), but this requires an extra step which is hard to justify financially.

William

WarnerYoung
04-08-2010, 05:43 PM
I'm beginning to feel as if I'm the only one who doesn't mind the typos!
I wonder if there is something wrong with me?!? :eek:

When I read a book on my reader, it's for the pleasure of reading a story, :book2: not to correct students' term papers. If there is a mistake I just read past it and pay it no mind.

Stitchawl

It depends on the ebook, too, of course. If the errors are minor (e.g. a few missing periods, maybe an extra space somewhere), you're not as likely to notice them.

When the errors are literally legion, it's hard not to notice. I recently compiled the list of ebook errors (obviously OCR typos) in By the Sword by Mercedes Lackey, and I ended up with a list of 250 typos. And some of the items in my list were "search for this word, it's wrong everywhere".

I sent the list into the publisher, who were actually fairly nice. They replied and said they would look into it. Of course, now that the ebook has been removed from ereader.com, who knows if a corrected version will ever appear again?

Worldwalker
04-08-2010, 06:30 PM
Having heard about Steven Saylor's mysteries set in ancient Rome, I recently bought the first in the series, Roman Blood, from Amazon. Alas, it turned out to be a Topaz-formatted book. Very ugly and with slow-turning pages, but obviously had been scanned and the errors were legion. Especially hyphenated words that had come at the end of a line in the printed text, but were inappropriate if they came anywhere else, which they did in the ebook. I could only read it for 3 or 4 chapters, then stopped and asked Customer Service to refund my money, which they obligingly did.

Normally, I'd be on the lookout for Topaz books, but this one snuck in on me. Looking at the rest of the series, I see they're all formatted the same way. I wonder if the author knows/cares that they're so poorly produced?

Steven Saylor seems to be happy to exchange email with his fans. He's the one who told me about the UK mass market edition of "The Triumph of Caesar", which is the only way to buy it in anything close to US MM format now that his US publisher is going for maximum revenue per unit and releasing the books only as trade paperbacks. I doubt if he'll know what a botch they've made of his books unless someone tells him. His contact info is on his website. Also, he's a really cool guy.

As for the comparison between free public-domain ebooks (Project Gutenberg, etc.) and commercially produced ebooks: I only buy ebooks from Baen, and I haven't found significant problems, but that's probably because Baen, being a SF specialty house founded by one forward-looking man, has probably had good electronic copies of those books from the get-go, and no need to scan/OCR them. From what I'm reading in this thread, some of what's turned out by the major publishing houses is barely spell-checked and not proofread at all. Project Gutenberg (and sites like Feedbooks and Manybooks, which scrape PG) have the benefit of the Distributed Proofreading project (have you proofread a page today?). Via PGDP, several separate human beings proofread every page of every scan, in several rounds of proofreading, and while some errors undoubtedly creep through (and there are more of them in books created prior to the formation of PGDP), by and large the quality is much higher than the "OCR it and ship it" books from the major publishers.

Y'know, if publishers were smart, they'd do something like that themselves: a system like PGDP, where you could register as a proofreader and get credit based on your accuracy in proofreading (as determined by a comparison of some significant number of people proofing each page) and number of pages you did, which would be applied towards the purchase price of that book once it is finished and released. There's probably some complicated problem involving minimum wage laws that prohibits it, but it still seems like a good idea. At least that way, ebooks would suck less.
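
To make the "accuracy as determined by comparison" idea concrete, here is a rough sketch in Python. It is not a description of how PGDP itself works. It assumes every proofreader submits the same number of corrected lines for a page, takes the majority reading of each line as correct, and scores each person by how often they matched it. All names and sample data are invented for illustration.

```python
from collections import Counter

def consensus_line(versions):
    """Pick the reading most proofreaders agree on for one line.

    Assumption: on a tie, the first reading submitted wins.
    """
    return Counter(versions).most_common(1)[0][0]

def score_proofreaders(submissions):
    """submissions maps proofreader name -> list of corrected lines.

    Returns (consensus_lines, accuracy), where accuracy is the share of
    lines on which each proofreader matched the majority reading.
    """
    per_line = list(zip(*submissions.values()))
    consensus = [consensus_line(line) for line in per_line]
    accuracy = {
        name: sum(a == b for a, b in zip(lines, consensus)) / len(consensus)
        for name, lines in submissions.items()
    }
    return consensus, accuracy

# Invented sample data: three people proofing the same two OCR'd lines.
submissions = {
    "ann": ["It was a dark night.", "The modem hummed."],
    "bob": ["It was a dark night.", "The modern hummed."],
    "cat": ["It was a clark night.", "The modem hummed."],
}
consensus, accuracy = score_proofreaders(submissions)
print(consensus)  # ['It was a dark night.', 'The modem hummed.']
print(accuracy)   # {'ann': 1.0, 'bob': 0.5, 'cat': 0.5}
```

A majority vote fails when most proofreaders miss the same error, which is one reason to want "some significant number" of people per page.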

Or, if they paid a bunch of college students (or even random people on the Web) ten bucks an hour to proofread OCR'd scans, they could proofread a 200-page book for under $500. It wouldn't take much of an increase in sales once word got around (hey, did you hear BigPubCo's ebooks are better than everyone else's?) to cover that.

Michael J Hunt
04-09-2010, 03:14 AM
Thanks to Sassanik and Logseman for the explanation of OCR and OCD. I get the allusion now.

MJ

rleguillow
04-09-2010, 09:38 AM
And easy to fix (if the book is DRM-free). Use Calibre to convert the book to, say, Microsoft ".lit", delete original epub, then convert it from lit to epub.

Well, that is a question - is it possible that a conversion from one format to another is inserting such errors?

HarryT
04-09-2010, 02:17 PM
Well, that is a question - is it possible that a conversion from one format to another is inserting such errors?

The only format conversion that tends to introduce textual errors is converting from PDF to something else, because a PDF file doesn't contain "text" at all, and the conversion tool has to "reconstruct" the page from what are basically graphical components.

Conversion between "text" formats - ePub, Lit, Mobi, etc - should be "lossless" as far as the actual text is concerned. You may, however, lose formatting in the process.

kad032000
04-09-2010, 02:36 PM
Well, that is a question - is it possible that a conversion from one format to another is inserting such errors?

Converting one format to another ALWAYS carries the possibility of error. (Not just for ebooks. For anything.)

TallMomof2
04-09-2010, 05:29 PM
I've had a number of eReader books that convert poorly: soft hyphens, curly quotes that don't translate, and all punctuation except "." and "," failing to translate.

WarnerYoung
04-09-2010, 08:32 PM
The only format conversion that tends to introduce textual errors is converting from PDF to something else, because a PDF file doesn't contain "text" at all, and the conversion tool has to "reconstruct" the page from what are basically graphical components.

Conversion between "text" formats - ePub, Lit, Mobi, etc - should be "lossless" as far as the actual text is concerned. You may, however, lose formatting in the process.

I'm not sure that's strictly true. It depends on how the PDF file was generated. Otherwise, a standard PDF reader wouldn't be able to let you select and copy its text, or search through the text in its files. Or am I missing something here?

WarnerYoung
04-09-2010, 08:38 PM
Part of the reason for these typos is that, until recently (I would guess), many publishers didn't put that much effort into it. I talked to someone from one of the publishers at San Diego ComicCon last year (I want to say it was Penguin, but I don't remember for sure). The woman I talked to seemed pretty enthusiastic about ebooks, but she admitted that they were way behind on producing them because they had ONE person in the entire company who handled converting things into ebooks.

Another problem is the format. Not all publishers natively support all ebook formats. I emailed Harper Collins about some typos I found in the Septimus Heap ebooks from eReader.com. Their reply said they would look into it, but they would have to get eReader.com to take care of it, because they only supported ePub natively. The actual conversion to PDB format is apparently handled by eReader.com.

Jellby
04-10-2010, 04:15 AM
I'm not sure that's strictly true. It depends on how the PDF file was generated. Otherwise, a standard PDF reader wouldn't be able to let you select and copy its text, or search through the text in its files. Or am I missing something here?

Even with text-based PDFs, the PDF does not (necessarily) contain information about words, paragraphs, etc. The characters are easy to extract (unless there are funny fonts involved) but joining hyphenated words at the end of line, putting spaces where they belong, removing page numbers and headers, dealing with footnotes, putting columns in the right order, detecting paragraphs, etc. is a different matter.
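
As an illustration of just two of those steps (joining words hyphenated at the end of a line, and detecting paragraphs), here is a minimal sketch in Python. It assumes the raw text has already been pulled out of the PDF by some other tool, that a blank line marks a paragraph break, and that a hyphen between two lowercase letters at a line end is a split word. Those are guesses about the input, not properties of PDF.

```python
import re

def unwrap_extracted_text(raw: str) -> str:
    """Join hyphen-split words and merge wrapped lines into paragraphs."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", raw.strip()):
        # "conver-\nsion" -> "conversion" (lowercase letter on both sides)
        block = re.sub(r"(?<=[a-z])-\n(?=[a-z])", "", block)
        # Any remaining line break inside a paragraph becomes one space.
        block = re.sub(r"\s*\n\s*", " ", block)
        paragraphs.append(block)
    return "\n\n".join(paragraphs)

raw = "The conver-\nsion tool has to\nreconstruct the page.\n\nA second para-\ngraph."
print(unwrap_extracted_text(raw))
```

The weak spot is obvious: a genuinely hyphenated word that happens to break at a line end ("well-" / "known") comes out as "wellknown". The code has no way to know which hyphens are real.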

kdawnbyrd
04-10-2010, 08:08 AM
I'm reading "Fresh Kills," an Amazon Breakthrough Novel winner. I'm only about 25% and I've already found about six places where they scrunched words together without a space.

mr ploppy
04-12-2010, 12:35 PM
Even with text-based PDFs, the PDF does not (necessarily) contain information about words, paragraphs, etc. The characters are easy to extract (unless there are funny fonts involved) but joining hyphenated words at the end of line, putting spaces where they belong, removing page numbers and headers, dealing with footnotes, putting columns in the right order, detecting paragraphs, etc. is a different matter.

Mobipocket's reader/converter seems to do a better job of converting from PDF than Calibre, though it is still not perfect. I don't see how conversion would be responsible for all the spelling mistakes in ebooks though.

Solitaire1
04-12-2010, 03:04 PM
I think that a factor contributing to typos in ebooks is that ebooks are still relatively new and the publishing industry is still adapting to the change. As it adapts, I think errors in ebooks will become less common.

One way errors could be reduced is to maintain a book's electronic source text in a form that can be easily formatted for many uses. It could be something like a plain text file with plain text markup to indicate intended formatting (such as [Bold text starts here]). The markup could be similar to HTML, but intended to be read by a human, rather than interpreted by a computer. A human takes the source text file and formats it in accordance with the instructions for a specific ebook format.

I think that ebooks are in the same statues as CDs and digital audio recording were in the early days. Due to the differences between analog and digital recording, it took the recording industry time to adjust to the change and I think that it will be the same with ebooks.

WillAdams
04-12-2010, 03:18 PM
Solitaire1 --- having a human involved goes counter to reducing errors.

The solution already exists (rich markup using XML schemas like docbook or TEI), the problem is that the discipline to use such is contra-indicated by publishers' obsessions w/ short-term profits, editorial usage of Microsoft Word and lack of discipline / technical competence in typical design-school graduates.

William

Worldwalker
04-12-2010, 03:45 PM
The markup could be similar to HTML, but intended to be read by a human, rather than interpreted by a computer.

Um, might I point out that HTML is, in fact, meant to be read by a human? Or that the Web existed long before FrontPage, let alone Dreamweaver? There are still plenty of us hand-coding HTML, and a whole lot more of us squinting at lousy auto-created code to fix it.

Is reading <b> or <strong> really that much harder than reading [boldface starts here]?

A human takes the source text file and formats it in accordance with the instructions for a specific ebook format.

And thereby inserts errors.

You're talking about having a human act as a dumb processing system -- something which a computer can do much more efficiently. Having a human go along looking for [boldface starts here] and doing something with it isn't nearly as efficient as having a computer do the same, in terms of either accuracy or time.

On the other hand, you've provided a perfect example right here:

I think that ebooks are in the same statues as CDs and digital audio recording were in the early days.

You wrote "statues" where you meant "status". Since we can assume that you know the difference between sculptures and condition, it probably happened because your fingers, running half on automatic as you thought a line ahead of where you were actually typing, inserted that extra 'e' and, since it made a legitimate word, no little red line appeared under it on your screen. That's the part that we need humans for. To a computer, since pearls can be in statues (The Adventure of the Six Napoleons), and since tourists can be in statues (the Statue of Liberty), why can't CDs and ebooks be in statues? That's where we need a human who can understand what it was you were trying to say, as distinct from what you actually wrote, and spot the typo.

And that's what the problem is with the ebooks: not that computers can't read the formatting, or that a human could read it better, but that computers can't spot when something has gone wrong. They can read the formatting just fine; they can't understand the content. That's particularly true of OCR'd text, but it also comes up with things like soft hyphens, hard returns, and other things meant to format text ... for humans.
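
A tiny sketch of why a dictionary-based spell check passes that sentence, in Python. The word list here is a hypothetical miniature one, and real spell checkers are more elaborate, but the basic lookup has the same blind spot: any string that is a legitimate word gets through.

```python
import re

# Hypothetical miniature dictionary; a real checker ships a far larger one.
DICTIONARY = {
    "i", "think", "that", "ebooks", "are", "in", "the", "same",
    "status", "statues", "as", "cds", "were", "early", "days",
}

def unknown_words(sentence: str) -> list[str]:
    """Return the words a plain dictionary lookup would flag."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if w not in DICTIONARY]

print(unknown_words("I think that ebooks are in the same statues as CDs"))  # []
print(unknown_words("I think that ebooks are in the same statsu as CDs"))   # ['statsu']
```

The non-word "statsu" is caught; the real word "statues" is not. Catching the second kind takes someone (or something) that understands what the sentence was trying to say.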

Due to the differences between analog and digital recording, it took the recording industry time to adjust to the change and I think that it will be the same with ebooks.

Correct me if I'm wrong, but wasn't the recording industry using digital recording for masters long before digital formats became available on a consumer level? I don't think there was that big an adjustment.

WarnerYoung
04-12-2010, 05:48 PM
You're talking about having a human act as a dumb processing system -- something which a computer can do much more efficiently. Having a human go along looking for [boldface starts here] and doing something with it isn't nearly as efficient as having a computer do the same, in terms of either accuracy or time.



But Solitaire1 does have a point. Precisely because computers can't always spot errors, you probably DO need a human involved somewhere in the process. For cases where the book is already in an electronic form and just needs to be converted to an ebook, maybe little to no human intervention is needed. But for other cases, such as old books that have to be scanned, it seems to me you'd pretty much have to have a human help proofread it.

Worldwalker
04-12-2010, 11:54 PM
I'm not saying you don't; quite the opposite, in fact.

He was suggesting that books that are already in electronic format be tagged with some quasi-HTML (the main difference, apparently, being using phrases instead of abbreviations, like [bold text starts here] instead of <b>) and humans do something when they see that tag, presumably selecting the relevant text and clicking a "bold" button, when converting it to some particular ebook format. Precisely why a human pushing buttons would be faster or more accurate than an HTML renderer eludes my comprehension.

That's not where the problems are. The problems are in bad scans that aren't even spell-checked, or spell-checked but not proofread. Those are what need to be gone over with a fine-toothed comb by a real human. And those, unfortunately, are also what is sold by publishers putting greed ahead of all else, including their own long-term profitability.

You'll notice that recent Project Gutenberg books -- basically, any that have been through the Distributed Proofreading Project -- are much superior in quality to most backlist commercial ebooks. And they are at times working with books hundreds of years old, victims of age and worn type. They're proofread by humans -- why not go do a page? That makes all the difference. That's where the human eye is needed: checking the scan against the original. Not in reading through a computer file and clicking "bold" every time you see [bold text starts here].

Solitaire1
04-13-2010, 01:33 AM
Quote from Solitaire1:
The markup could be similar to HTML, but intended to be read by a human, rather than interpreted by a computer.


Quote from Worldwalker:
Um, might I point out that HTML is, in fact, meant to be read by a human? Or that the Web existed long before FrontPage, let alone Dreamweaver? There are still plenty of us hand-coding HTML, and a whole lot more of us squinting at lousy auto-created code to fix it.

Is reading <b> or <strong> really that much harder than reading [boldface starts here]?

You are right that it is likely as easy to read standard HTML tags as it is to read more verbose tags, especially if the HTML tags are fairly simple. I was just thinking of clarity and to make it easier to see them with the eye when I mentioned the longer tags.

Quote from Solitaire1:
A human takes the source text file and formats it in accordance with the instructions for a specific ebook format.

Quote from Worldwalker:
And thereby inserts errors.

You're talking about having a human act as a dumb processing system -- something which a computer can do much more efficiently. Having a human go along looking for [boldface starts here] and doing something with it isn't nearly as efficient as having a computer do the same, in terms of either accuracy or time.

On the other hand, you've provided a perfect example right here:

Quote from Solitaire1:
I think that ebooks are in the same statues as CDs and digital audio recording were in the early days.

Quote from Worldwalker:
You wrote "statues" where you meant "status". Since we can assume that you know the difference between sculptures and condition, it probably happened because your fingers, running half on automatic as you thought a line ahead of where you were actually typing, inserted that extra 'e' and, since it made a legitimate word, no little red line appeared under it on your screen. That's the part that we need humans for. To a computer, since pearls can be in statues (The Adventure of the Six Napoleons), and since tourists can be in statues (the Statue of Liberty), why can't CDs and ebooks be in statues? That's where we need a human who can understand what it was you were trying to say, as distinct from what you actually wrote, and spot the typo.

And that's what the problem is with the ebooks: not that computers can't read the formatting, or that a human could read it better, but that computers can't spot when something has gone wrong. They can read the formatting just fine; they can't understand the content. That's particularly true of OCR'd text, but it also comes up with things like soft hyphens, hard returns, and other things meant to format text ... for humans.

Thanks for the catch. I completely missed that when I posted it. This clearly illustrates the problems with humans reviewing text, especially if someone writes a bit too quickly.

Quote from Solitaire1:
Due to the differences between analog and digital recording, it took the recording industry time to adjust to the change and I think that it will be the same with ebooks.

Quote from Worldwalker:
Correct me if I'm wrong, but wasn't the recording industry using digital recording for masters long before digital formats became available on a consumer level? I don't think there was that big an adjustment.

If I remember correctly, in the early days of CDs there were problems with getting CDs to sound good. The following article from Stereophile covers the issue: http://www.stereophile.com/news/10790/

neilmarr
04-13-2010, 02:32 AM
***the PDF is sometimes used for the print version and then the PDF is given out to the eBook department to create the various versions from. And we all know that a novel-length PDF cannot be converted to any other format without errors. So, then we get shoddy proofreading of this PDF conversion and it goes out with errors.***

Right you are, Jon. In my technical innocence when we decided to run conversions through Smashwords and elsewhere and also produce our own ePubs, I thought our print-ready PDFs would do the job admirably. I was soon put right by my technical partner. It's actually a matter of going back to the (Word) source material from which the PDF was produced and starting over.

Even then, we've had reports of odd errors here and there (from kind readers), which will mean re-proofing all formats of 100 titles when we have time. Meantime, we immediately correct a specific conversion-generated slip when it's pointed out. Not ideal, I know, but it's the best we can do short-term with such a small team.

This is an interesting thread, by the way. Thanks. Cheers. Neil

Worldwalker
04-13-2010, 02:55 AM
Solitaire1, please edit your post and correct your repeated mis-capitalization of my name.

Thank you.

sassanik
04-13-2010, 05:09 AM
I'm not saying you don't; quite the opposite, in fact.

You'll notice that recent Project Gutenberg books -- basically, any that have been through the Distributed Proofreading Project -- are much superior in quality to most backlist commercial ebooks. And they are at times working with books hundreds of years old, victims of age and worn type. They're proofread by humans -- why not go do a page? That makes all the difference. That's where the human eye is needed: checking the scan against the original. Not in reading through a computer file and clicking "bold" every time you see [bold text starts here].

A good point to note is that at Project Gutenberg most documents go through several rounds of proofreaders to ensure that as few errors as possible get through the system. That does not seem to be the case for many ebooks; some of those errors seem like they should have been caught on a second or third reading.

Amy

Drybonz
04-13-2010, 06:10 PM
I have noticed some of the bargain eBooks on Amazon have reviews that say there are a lot of typos in the books. It makes me very wary about buying these books because I worry about whether what I'm reading is even authentic.

Here's an example... Henry Miller's Rosy Crucifixion has a Kindle edition that is bundled as a trilogy for $10. Reviews complain about multiple typos. Now, while for this example there are other digital editions to choose from, there are quite a few authors that don't have public domain books and the only digital versions are these bargain editions (See Jean Genet in the Kindle store for an example). So if I wanted to read a digital book that only has one of these bargain editions to choose from I would be concerned.

So, I guess my question is... does anyone have any experience with these bargain Kindle editions that they can share? Should I be concerned with the accuracy of these books? If so, what is the best source for these contemporary classics that digital publishers have, so far, ignored?

Thanks.

Solitaire1
04-13-2010, 06:26 PM
Solitaire1, please edit your post and correct your repeated mis-capitalization of my name.

Thank you.

Corrected. I apologize for that.

MerLock
04-13-2010, 07:25 PM
I think the quality of ebooks will get better since they know the end product will be digitalized. I think the same goes for music: when people tried converting their cassette tapes to MP3s, the quality was pretty horrific unless you had pretty good software.

I'm guessing the source for most books in the past was just a typed manuscript, so the only way to digitalize them is to scan them. I'm guessing that nowadays the source is typed on a computer, so it is already in digital format.

Just a guess though.

ATimson
04-13-2010, 07:35 PM
I think the quality of ebooks will get better since they know the end product will be digitalized.
They've known that for over a decade at this point, though. If they haven't figured it out yet, I don't have much hope for their learning.

WarnerYoung
04-13-2010, 09:55 PM
A good point to note is that at Project Gutenberg most documents go through several rounds of proofreaders to ensure that as few errors as possible get through the system. That does not seem to be the case for many ebooks; some of those errors seem like they should have been caught on a second or third reading.

Amy

And as I mentioned in an earlier post, at least one major publisher I talked to last year admitted they have one, and only one, man who handles all their ebooks. The implication (though I'll admit it wasn't said outright) was that this one guy handles converting ebooks and everything else involved, like proofreading. That's pretty sad.

On the other hand, as the ebook market starts to gain more momentum, hopefully publishers will realize that ebooks actually need the same attention and processing (editor, proofreader, etc.) as regular print books. One hopes, anyway.

AlexBell
04-14-2010, 03:09 AM
On the other hand, as the ebook market starts to gain more momentum, hopefully publishers will realize that ebooks actually need the same attention and processing (editor, proofreader, etc.) as regular print books. One hopes, anyway.

But what can we do to bring on that happy day? I haven't seen many practical suggestions apart from reviewing particularly bad ones in a Hall of Shame thread and telling the publisher that one has done this.

I know it won't always work - I've had no response whatever from HarperCollins UK, but did get a gracious response from Clarity Press who are working on the problems I raised.

Regards, Alex

JSWolf
04-14-2010, 08:31 AM
***the PDF is sometimes used for the print version and then the PDF is given out to the eBook department to create the various versions from. And we all know that a novel-length PDF cannot be converted to any other format without errors. So, then we get shoddy proofreading of this PDF conversion and it goes out with errors.***

Right you are, Jon. In my technical innocence when we decided to run conversions through Smashwords and elsewhere and also produce our own ePubs, I thought our print-ready PDFs would do the job admirably. I was soon put right by my technical partner. It's actually a matter of going back to the (Word) source material from which the PDF was produced and starting over.

Even then, we've had reports of odd errors here and there (from kind readers), which will mean re-proofing all formats of 100 titles when we have time. Meantime, we immediately correct a specific conversion-generated slip when it's pointed out. Not ideal, I know, but it's the best we can do short-term with such a small team.

This is an interesting thread, by the way. Thanks. Cheers. Neil

Starting from a Word document has its own set of issues. The CSS used by Word when you save as HTML is a mess. The best way to do it actually is to do it as a plain text file like Project Gutenberg does, with markup for bold and italics. That way, you just slap <p></p> around the paragraphs and add in your classes as needed. That makes it very easy to then do the CSS and convert to various formats. It's not hard and it would work just fine.
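
A minimal sketch of that workflow in Python, to show how little is involved. It assumes a plain text source where blank lines separate paragraphs, *asterisks* mark bold and _underscores_ mark italics. That marker convention is an assumption chosen for this example, and the function name is made up.

```python
import html
import re

def text_to_html(source: str) -> str:
    """Wrap blank-line-separated paragraphs in <p> and convert inline markers."""
    out = []
    for para in re.split(r"\n\s*\n", source.strip()):
        # Collapse hard line wraps, then escape &, < and > for HTML.
        para = html.escape(" ".join(para.split()), quote=False)
        para = re.sub(r"\*(.+?)\*", r"<b>\1</b>", para)  # *bold*
        para = re.sub(r"_(.+?)_", r"<i>\1</i>", para)    # _italic_
        out.append(f"<p>{para}</p>")
    return "\n".join(out)

print(text_to_html("It was _very_ late.\n\nShe said *no* & left."))
# <p>It was <i>very</i> late.</p>
# <p>She said <b>no</b> &amp; left.</p>
```

Because the plain text file stays the single source, a typo gets fixed once there and every output format is regenerated from it.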

WarnerYoung
04-20-2010, 10:00 PM
But what can we do to bring on that happy day? I haven't seen many practical suggestions apart from reviewing particularly bad ones in a Hall of Shame thread and telling the publisher that one has done this.

I know it won't always work - I've had no response whatever from HarperCollins UK, but did get a gracious response from Clarity Press who are working on the problems I raised.

Regards, Alex

Honestly? I don't know. The only thing I can think of is simply to vote with my pocketbook, and let the publishers know it. This applies not only to bad ebooks with errors, but also to ebooks that are priced ridiculously high.

The only other idea I can think of is to find some way to raise awareness of this issue. Given how much mainstream press has been talking about ebooks and ereaders lately, it might help if the same press started talking about the problems with ebooks, too.

If anyone else has better ideas, I'd love to hear them, too.

Tonycole
04-23-2010, 02:20 AM
Whilst reading the comments on the topic of typos and so on in eBooks, one of the commentators, Alex, suggested creating a sort of Hall of Shame, and gathering all the typos and lousy proof reading examples into one place on a number of blogs, in an effort to shame the publishers into taking eBooks seriously and delivering quality.

Whilst personally I have not had much of this problem, it is obviously serious enough and widespread enough to infuriate a hell of a lot of people, so it seemed a good idea to me.

So I thought that if I, and any other bloggers who were interested, were to create a page called "Lousy proof reading - Publishers take note!", or something similar, and spread the word that people could send us their examples of lousy proof reading, we could all place them on this page on our blogs and exchange examples, so that we all carried the complete list. This would surely make the publishers sit up and take notice of us all.

I shall be contacting other bloggers to suggest this to them, but I thought that if you were willing, we could make a start on this.......

If you think that Alex's idea is worth pursuing, contact me at www.ebookanoid.com or tony@ebookanoid.com, and we shall see what we can manage

NickSpalding
04-23-2010, 07:20 AM
Typos and spelling mistakes keep me awake at night...

I went through my book over and over and over again, but I bet some of the buggers have still crept through.

It really takes you out of a book if you come across a mistake like this, especially in a tense bit.

I remember reading a copy of a Stephen King book as it got near to the climax and there was a howler of a spelling mistake. King's not known for his strong endings anyway and that really didn't help.

kceb10
04-23-2010, 11:00 AM
I've heard a lot of you guys complaining about typos in ebooks but have not really seen very many, maybe 1 per book, until I read Moreta: Dragonlady of Pern by Anne McCaffrey. This book has so many typos I almost stopped reading it. There had to be close to one every other page. I was very disappointed in this book.

WarnerYoung
04-23-2010, 10:07 PM
I've heard a lot of you guys complaining about typos in ebooks but have not really seen very many, maybe 1 per book, until I read Moreta: Dragonlady of Pern by Anne McCaffrey. This book has so many typos I almost stopped reading it. There had to be close to one every other page. I was very disappointed in this book.

Which format of ebook was it? That might make a difference. I found a number of errors in The Lord of the Rings (eReader.com's PDB format), but when I looked at an excerpt of it in B&N's EPUB version, those errors weren't there.

And one every other page is good, compared to Mercedes Lackey's By The Sword (again, PDB format).

In any case, I suspect that any time it's an older book and there are a lot of errors, it's likely to be scanning and OCR errors.

Tonycole
05-05-2010, 03:27 AM
Hi all, I have had agreement from Alex here to use material from this thread as part of a new page on my Blog (www.ebookanoid.com) which I shall call something like Typos - Hall of Shame.

The idea is to build up a whole page of reports of the worst typos in the eBooks we buy, in an attempt to shame the publishers into doing their proof reading properly.

But I also need your individual agreement to use your comments in this page, so if you are OK with me doing this, could you let me know by means of an email to:

tony@ebookanoid.com

Best wishes and thanks,

Tony

aagstn
05-05-2010, 09:13 AM
Just bought League of Frightened Men by Rex Stout this week in eReader format. It is a Random House book but has several OCR errors. I've found one or two each chapter. Random House is charging a $15 suggested price for this book and can't be bothered to proofread it for errors.

Patricia29
05-17-2010, 11:36 PM
I have found all kinds of errors in all of the books I have downloaded from Sony's Reader Store. The worst one so far is James Michener's book "Caravans": he'll becomes hell, we'll becomes well, and names of characters and places are spelled differently on the same page. Hyphens and capital letters are added all over the place. It takes all the pleasure away from reading and is an insult to the authors.
It should not be difficult to edit ebooks properly, unless this is another example of outsourcing to third world countries.:angry:


I've just posted the following on the 'smell of books' thread and I think it would be just as appropriate for it to be here as well:

I'm an ignoramus about e-book readers as I don't own one. But I do have three e-books 'out there' and I'm curious to know how they appear on (or in) an e-reader. When I look at the pdf versions on my computer screen there's absolutely nothing about them that is different to their paper counterparts. Am I being naive, but aren't all books published in both forms identical? Could this only be true for books that are published with the intention of their appearing in both forms? Also, is it possible for unwanted errors to creep in during conversion from one format to another? If this is so, surely the publisher should subject the final pdf version to a proof-read before he sells it.

As an editor I'm probably more aware than most people of typos in professionally produced manuscripts. If there's only one, or perhaps two, in a book, I consider that to be not sufficient to put me off a book. If there are more than two, I find I continue to read with half my mind fixed on when I'm going to find the next. So, for me, errors can be distracting.

AlexBell
05-18-2010, 04:25 AM
I have found all kinds of errors in all of the books I have downloaded from Sony's Reader Store. The worst one so far is James Michener's book "Caravans": he'll becomes hell, we'll becomes well, and names of characters and places are spelled differently on the same page. Hyphens and capital letters are added all over the place. It takes all the pleasure away from reading and is an insult to the authors.
It should not be difficult to edit ebooks properly, unless this is another example of outsourcing to third world countries.:angry:

I suggest that you send the details of the errors to Tony Cole at tony@ebookanoid.com. He is a member of MobileRead Forum, and also has a blog where he is trying to publicise typos in ebooks - as part of a Hall of Shame enterprise.

Regards, Alex

mcgriff
05-18-2010, 01:10 PM
Last book I read, Last Argument of Kings by Joe Abercrombie, had a typo near the end of the book. Kinda ruined the mood as the story line was resolving. I had to pause and read the word a few times in shock that the error was in print. I can't direct you to the other errors I have recently seen as I read so much. Any more I expect to find an error in any book I buy.

DJHARKAVY
05-18-2010, 01:39 PM
My wife is a professional proofreader/editor. She cannot read books, paper or e, without finding tons of errors that she thinks a proofer SHOULD have caught.

There are more errors in eBooks, as a function of OCR error (although that should not be the case in books printed more recently) but I rarely find enough to damage my reading experience. I can usually figure out what is meant from context and you would be amazed at how much we tend to do so automatically.

Maggie Leung
05-18-2010, 01:54 PM
My wife is a professional proofreader/editor. She cannot read books, paper or e, without finding tons of errors that she thinks a proofer SHOULD have caught.

There are more errors in eBooks, as a function of OCR error (although that should not be the case in books printed more recently) but I rarely find enough to damage my reading experience. I can usually figure out what is meant from context and you would be amazed at how much we tend to do so automatically.

I also edit for a living. Unless I'm working, I overlook errors. Otherwise, they'd take all the fun out of reading.

DJHARKAVY
05-18-2010, 02:04 PM
I also edit for a living. Unless I'm working, I overlook errors. Otherwise, they'd take all the fun out of reading.

Is your experience the same as hers?

Maggie Leung
05-18-2010, 02:06 PM
Is your experience the same as hers?

Yes, it is.

Michael J Hunt
05-19-2010, 07:51 AM
I, too, am an editor and my family objects intensely when I 'edit' our conversations. Is there something I could take to prevent this from happening?

MJ

Maggie Leung
05-19-2010, 01:14 PM
I, too, am an editor and my family objects intensely when I 'edit' our conversations. Is there something I could take to prevent this from happening?

MJ

Maybe aversion therapy? You could wear a rubberband around your wrist and they could snap it as needed, lol.

BooksForABuck
05-19-2010, 01:20 PM
At BooksForABuck.com, we edit all of our books, send the edited MS back to the author for review, then go through the MS again once they're back from the author. Yet, still, some typographical errors sneak through. Unfortunately, I've been seeing more and more typos and errors in books put out by the big publishers as well. Not sure if that's because there are more or because I've become hypersensitized to errors as a result of my job.

Simply put, books are like software programs...it's impossible to eliminate all bugs, but we spend a lot of time and money testing to minimize the number and severity.

Rob Preece
Publisher, BooksForABuck.com

calvin-c
05-19-2010, 01:57 PM
Conversion between "text" formats - ePub, Lit, Mobi, etc - should be "lossless" as far as the actual text is concerned. You may, however, lose formatting in the process.
At least partially this depends on your definition of text. I've seen many cases where conversion between formats has 'lost' characters it didn't recognize, usually punctuation or other symbols. (Curly quotation marks & apostrophes are particularly common; foreign characters also, although those are less common.)

Another artifact I've often seen from conversions is 'added' text. Usually this is because the original used some sort of markup language that the conversion didn't recognize and, reasonably, anything it doesn't recognize it treats as text. The result is sometimes things like "Joe said <emp>&quotHold on!&quot</emp>" easily recognizable as 1) a non-HTML markup tag and 2) an erroneously typed HTML symbol code.

I'll note that neither of the above *should* be a problem in a conversion program, but they are. Whether or not it's a problem with the conversion program is something else that depends on your definition. It's been said that the best documentation for a program is the source code-that describes *exactly* what the program will handle and, by omission, what it won't.

In that case, using the program to convert files containing 'text' it won't handle is operator error rather than a problem with the program. That's not a practical definition, but I think it points out why there's a gray area when defining what constitutes a problem in a conversion program.

But, regardless of whether it's a problem in the conversion program or in how it's used, the fact remains that conversion programs can introduce errors that don't exist in the original document, and not just in formatting.
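
As a rough illustration of how leftovers like the "<emp>&quotHold on!&quot</emp>" example could be caught after a conversion, here is a small Python sketch. The list of known tags and the function name find_conversion_debris are assumptions made for the example; a real check would use whatever tag set the target format actually supports.

```python
import re

# Tags a reading system is assumed to understand (illustrative subset only).
KNOWN_TAGS = {"p", "i", "b", "em", "strong", "br", "div", "span", "h1", "h2", "h3"}

TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^<>]*>")
# An entity name with no closing semicolon, e.g. "&quot" in "&quotHold".
BAD_ENTITY_RE = re.compile(r"&(quot|amp|lt|gt|apos|nbsp)(?!;)")


def find_conversion_debris(text):
    """Return (kind, snippet) pairs for unknown tags and unterminated entities."""
    problems = []
    for m in TAG_RE.finditer(text):
        if m.group(1).lower() not in KNOWN_TAGS:
            problems.append(("unknown tag", m.group(0)))
    for m in BAD_ENTITY_RE.finditer(text):
        problems.append(("unterminated entity", m.group(0)))
    return problems


line = "Joe said <emp>&quotHold on!&quot</emp>"
for kind, snippet in find_conversion_debris(line):
    print(kind, snippet)
```

This only reports suspects. Deciding whether <emp> should become <em> still needs a person, or at least a rule written for that particular source.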

susan_cassidy
05-19-2010, 05:03 PM
I recently read "Jane and the Man of the Cloth", by Stephanie Barron on my Kindle, and it had hundreds of OCR errors. Many were 'gentle' rendered as 'gende', including 'gendeman' for 'gentleman', etc. It was horrific, since the book was set in Regency times, and words like gentleman are used constantly. I highlighted every one I could, and sent the list to Amazon, with a complaint. They refunded my money, and I see that the book is not currently available for Kindle. The first book in the series had some typos, too, but not quite as bad. I enjoyed the story, but the typos were a constant irritant.

Many errors could be caught with spell-check, and a couple of little programs that could catch things like improper spacing around hyphens, and maybe validating proper names against a list for the book in question.

When they are too lazy to run spell-check, it really makes me mad.
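
The "little programs" mentioned above could look something like this Python sketch. The name list, the similarity cutoff of 0.7 and both function names are assumptions chosen for illustration. It only flags suspects for a human to review.

```python
import difflib
import re

# A space on exactly one side of a hyphen: "well -known" or "well- known".
HYPHEN_SPACING_RE = re.compile(r"\w+ -\w+|\w+- \w+")


def check_hyphen_spacing(text):
    """Return every fragment with lopsided spacing around a hyphen."""
    return HYPHEN_SPACING_RE.findall(text)


def check_names(text, known_names, cutoff=0.7):
    """Flag capitalised words that resemble, but do not match, a known name."""
    suspects = {}
    for word in set(re.findall(r"\b[A-Z][a-z]+\b", text)):
        if word in known_names:
            continue
        close = difflib.get_close_matches(word, known_names, n=1, cutoff=cutoff)
        if close:
            suspects[word] = close[0]
    return suspects


names = ["Moreta", "Alessan"]  # per-book list, supplied by whoever proofs the book
page = "Moreta rode out. Later Moreia met Alessan, a well -known rider."
print(check_hyphen_spacing(page))
print(check_names(page, names))
```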

Michael J Hunt
05-23-2010, 02:50 PM
Susan,
I discovered, through trial and error, that strange changes to the middles, beginnings and endings of certain words follow after you use the 'Find' and 'Change to' button (or equivalent) in your programme. Particularly if the word you change and replace is a short one. A good for instance is if you change the name of a character from, say 'Tom' to 'Fred', any references to tomatoes will appear as 'Fredatoes', or, if you're writing a jungle story, 'tom-tom' to 'fred-fred', or sci-fi, 'atom' to 'afred', or a medical, from 'anatomical' to 'anafredical'. This can be both irritating and funny.

When I first saw it, it took me some time to realise what was happening, then it dawned; now I'm very cautious when I use the facility.

MJ

susan_cassidy
05-23-2010, 02:59 PM
Susan,
I discovered, through trial and error, that strange changes to the middles, beginnings and endings of certain words follow after you use the 'Find' and 'Change to' button (or equivalent) in your programme.
MJ
I don't know what you are referring to. I'm talking about just reading a book on my Kindle. There is no "change to" function on a Kindle. If I disassemble the book back into HTML, I can see the words as originally included, not a function of the software. OCR software, when used to generate an ebook, is just a first step. You still have to proofread and correct. The publisher did not, in the book I was reading.

JSWolf
05-23-2010, 03:05 PM
Susan,
I discovered, through trial and error, that strange changes to the middles, beginnings and endings of certain words follow after you use the 'Find' and 'Change to' button (or equivalent) in your programme. Particularly if the word you change and replace is a short one. A good for instance is if you change the name of a character from, say 'Tom' to 'Fred', any references to tomatoes will appear as 'Fredatoes', or, if you're writing a jungle story, 'tom-tom' to 'fred-fred', or sci-fi, 'atom' to 'afred', or a medical, from 'anatomical' to 'anafredical'. This can be both irritating and funny.

When I first saw it, it took me some time to realise what was happening, then it dawned; now I'm very cautious when I use the facility.

MJ

In your example of Tom to Fred, there is a way around it. Use a case-specific search for Tom (note the capital T) and also search for a space around the word. That will catch most instances of Tom and correctly replace them with Fred. Then just do a search/manual replace for Tom (case specific again) without any spaces. That will get them all without mangling other words that contain tom or Tom.

Also, there is regex that would get it right in a search/replace.
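
For anyone curious what such a regex might look like, here is a short Python sketch using the standard re module. The sample sentence is made up for the example. \b matches at a word boundary, so only the standalone, capitalised name is replaced.

```python
import re

text = "Tom picked a tomato. The tom-tom beat as Tom's atom model fell. TOM!"

# Naive replacement, ignoring case: mangles words that merely contain "tom".
naive = re.sub("tom", "Fred", text, flags=re.IGNORECASE)

# Case-sensitive, whole-word replacement: \b matches only at a word boundary.
careful = re.sub(r"\bTom\b", "Fred", text)

print(naive)
print(careful)
```

One caveat: a hyphen counts as a word boundary, so a "Tom-tom" capitalised at the start of a sentence would still be changed. A final read-through remains a good idea.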

Patricia29
05-23-2010, 05:43 PM
I have read three ebooks since getting my Sony ereader, it seems the older the book, the more errors there are.
The worst one so far is James Michener's book "Caravans": he'll becomes hell, she'll becomes shell and we'll becomes well; capital letters and hyphens are added to words where none should exist, and a lot of words are misspelled or misplaced. There is barely a page without an error and it greatly affects the joy of reading. I even had to go back several pages on more than one occasion just to make sense of a sentence.
I have tried to complain to Sony, both by phone and email, so far no satisfaction. How can someone who speaks English as a second language, understand a complaint involving grammatical errors??? (All he could suggest was that I download the book again, but what would that have achieved?)
I wonder if the authors (or their heirs) and/or the publishers know what Sony et al. are doing to their work. It is shoddy work at best and criminal at worst. There is no excuse for bad editing and there is no way of getting your money back. Shame on Sony!!!

JSWolf
05-23-2010, 11:04 PM
I purchased a Star Trek eBook from Kobo and it had no quotes or apostrophes. The publisher is Simon&Schuster.

All Kobo was able to do is refund my money.

L.J. Sellers
05-23-2010, 11:32 PM
I haven't come across the lack of punctuation yet. That's unacceptable. Publishers need to hire professionals to format their books correctly before uploading. It's not that complicated. Some of the errors are the fault of the translation technology though.

I predict it will all improve in time. But the greater level of attention might drive e-book prices up too.
L.J.

pdurrant
05-24-2010, 04:17 AM
When they are too lazy to run spell-check, it really makes me mad.

It's worse when they run a lazy spell-check.

Michael J Hunt
05-26-2010, 06:55 PM
Hi Susan,

Sorry about the misunderstanding. I was explaining how things can go awry in the production process and how strange changes can occur in the text. There's no excuse for a professional publisher of e-books to allow such errors to occur. When they do occur, it's simply the result of poor quality control and you should be entitled to a refund.

Jon, thanks for that explanation. I had no idea the problem could be avoided so easily. Pardon my ignorance, but what is 'regex'?

MJ

bizzybody
05-27-2010, 04:49 AM
Regex = REGular EXpression. http://en.wikipedia.org/wiki/Regular_expression

Also known as "the stuff that Yahoo, Google and other search engines do their best to ignore in order to maximize profits from sponsored hits and prevent the users from finding what they're looking for". ;P

If I had the $$$ to do it, I'd start a web search engine called lookstupid.com. Why that? "Look, you stupid computer! Stop trying to out-think me and just search for EXACTLY what I entered!"

Search engines used to obey things like "quoted strings", where only hits were returned if the page contained EXACTLY what was between the quotes. Boolean operators like AND, NOT, NEAR were obeyed. These days they're typically completely ignored.

lookstupid.com would strictly obey them. I'd even have an IN-ORDER operator so if you know some of the words, perhaps from a book title or song lyric, but not the exact and complete thing, you could enter what you know with IN-ORDER and only hits with those words in that order, first to last, would be returned.

Search engine programmers all need tied to a chair and made to read a few dozen books on this stuff. :P

bizzybody
05-27-2010, 05:15 AM
Typos. BAEN has them a-plenty.

I've read all of the free BAEN e-books they've released on the CD-ROMs included with some of their hardcovers. I'm pretty certain they've all had typos.

BAEN does eARCs (electronic Advance Reader Copies) and charges money for them. ARC buyers are supposed to do things like proofread and provide feedback to the author. Sometimes the eARC and the final version have significant changes from reader feedback, but usually not.

But still the typos persist.

One place where punctuation issues can creep in is when mixing Unicode and non-Unicode text encodings. For English text books, the Extended ASCII character set works just fine. It has all the punctuation required. The problem is some platforms (like Palm OS) don't natively support Unicode, and conversion programs don't have settings for converting from Unicode to E-ASCII punctuation characters.

Load an ASCII file into a program on a Unicode compliant platform and it'll display fine. Go the other way (if it's possible without conversion) and depending on the reading software you may find all the Unicode characters replaced with boxes or empty spaces or *nothing*, with the text on either side jammed together, or the program will replace the Unicode character with the same numbered character from the system's native character set.

Do a conversion from a Unicode source to a format which doesn't support Unicode at all on any platform, and the foul-ups will appear on all platforms that file can be opened on.
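
The failure modes described above can be reproduced in a few lines of Python. This is only a demonstration using Python's built-in codecs, not a model of any particular reader or conversion program. The choice of latin-1 as the "native" single-byte encoding is an assumption for the example.

```python
text = "\u201cDon\u2019t\u201d"  # curly double quotes and a curly apostrophe

# Unsupported characters silently dropped: the neighbours end up jammed together.
dropped = text.encode("ascii", errors="ignore").decode("ascii")

# Unsupported characters replaced with a placeholder character.
replaced = text.encode("ascii", errors="replace").decode("ascii")

# UTF-8 bytes misread as a single-byte encoding: each curly character
# turns into three unrelated characters.
misread = text.encode("utf-8").decode("latin-1")

print(dropped)    # Dont
print(replaced)   # ?Don?t?
print(len(text), len(misread))
```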

One fix that usually takes care of this in HTML is saving as Filtered HTML in MS Word. That's the cleanest HTML one can get from MS Word.

Another is this little program. http://ratzmandious.110mb.com/files/UTFStripper.zip
Does exactly what it says on the tin. It takes a text file, looks for the UTF-8 codes that start with &# followed by three or four digits (the codes are all 4 digit but leading zeroes can be dropped) and replaces them with the ASCII equivalents.

What you're left with is technically still a UTF-8 Unicode compliant HTML file, but there's no Unicode in it and it will convert cleanly for platforms that don't have Unicode support. The file size will also shrink a bit due to replacing the 5 or 6 characters required to encode a single character with a real single character.
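
The idea can be sketched in Python as follows. This is not the UTFStripper program itself, just an illustration of the same technique. The &#nnnn; codes are HTML numeric character references, and the mapping table below is a small assumed subset that a real tool would extend.

```python
import re

# Illustrative mapping only: code point -> plain ASCII stand-in.
ASCII_EQUIVALENTS = {
    8216: "'",    # left single quote
    8217: "'",    # right single quote / apostrophe
    8220: '"',    # left double quote
    8221: '"',    # right double quote
    8211: "-",    # en dash
    8212: "--",   # em dash
    8230: "...",  # ellipsis
    160: " ",     # non-breaking space
}

NUMERIC_REF_RE = re.compile(r"&#(\d{2,5});")


def strip_numeric_refs(html_text):
    """Replace known &#nnnn; references with ASCII; leave unknown ones untouched."""
    def repl(match):
        return ASCII_EQUIVALENTS.get(int(match.group(1)), match.group(0))
    return NUMERIC_REF_RE.sub(repl, html_text)


print(strip_numeric_refs("&#8220;Don&#8217;t&#8221; &#8212; she said&#8230; &#233;"))
```

References with no entry in the table (such as &#233; for an accented e) are deliberately left alone rather than guessed at.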

Were one so inclined, all the text of a UTF-8 encoded HTML file could be entered in those &#nnnn codes, but the file size would be about 5 times larger. Hmmm, that'd be somewhat simple to do a conversion program for, but it'd have to ignore HTML tags and whatever in the header must remain ASCII.
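
A rough Python sketch of that reverse conversion is below. The function name encode_text_as_refs and the ascii_too switch are made up for the example. It copies anything that looks like a tag through unchanged, but it does not treat the header, comments, scripts or existing entities specially, so it is a starting point at most.

```python
import re

# Splitting on a capturing group keeps the tags: they land at odd indexes.
TAG_SPLIT_RE = re.compile(r"(<[^>]*>)")


def encode_text_as_refs(html_text, ascii_too=False):
    """Rewrite text outside tags as &#nnnn; numeric character references.

    By default only non-ASCII characters are encoded; ascii_too=True encodes
    every character of the text, which makes the file several times larger.
    """
    out = []
    for i, part in enumerate(TAG_SPLIT_RE.split(html_text)):
        if i % 2 == 1:
            out.append(part)  # a tag: copy through unchanged
        else:
            out.append("".join(
                f"&#{ord(ch)};" if (ascii_too or ord(ch) > 127) else ch
                for ch in part
            ))
    return "".join(out)


print(encode_text_as_refs("<p>\u201cHi\u201d</p>"))
print(encode_text_as_refs("<p>\u201cHi\u201d</p>", ascii_too=True))
```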

The problem is all the USA's fault for being the main originator of computer technology way back when, when early computer technical people never gave a thought to anyone using non-English languages on computers. ;) That, and the original non-extended ASCII character set only using 7 bits, that resulted in the need for the BinHex file encoding format for Macintosh files due to early internet router computers being programmed to assume all the data passing through was plain English text, so we'll just set that first bit to zero on every outgoing byte, m'kay... saves 1/8th on bandwidth at 110 or 300 characters per second.

On behalf of The USA, I humbly apologize for that. ;)

bizzybody
05-27-2010, 05:30 AM
One more thing. If you're typo-phobic and grammatical and spelling errors make you want to scream... avoid craigslist.org. That place is a 'showcase' for the worst you'll ever see in abuse of the English language. Cn't blm txtng fr it either. Craigslist was that bad before text messaging on cellphones, but it may have become worse since.

I'm a compulsive reader. Put text in front of my eyes, I read it. I don't just recognize a STOP sign, I *read* the word STOP, every time. That comes from being dyslexic. I learned to overcome it for reading and can read quite quickly, when I'm not tired. I waited to start the Harry Potter books until #7 was out, then read them all in two weeks. Not uncommon for me to blast through a 400+ page book in a single Friday night. "WTF? It's Six AM!" But writing always was a huge PITA, or pain in the hand, until personal computers. Hooray for the backspace key! I'm not a fast typist but unless I'm tired I'm mostly accurate. I do know instantly when I hit the wrong key, I know I'm tired when I hit the wrong key several times in a row and have to *think hard* to push down that @#%@% dyslexia and force my finger to the correct key. Then I go to bed. Same for when I'm reading and have to back up a bit and have read the same bit three or four times and still don't know what I've read and the book's fallen on the floor...

pdurrant
05-27-2010, 07:22 AM
Typos. BAEN has them a-plenty.

I've read all of the free BAEN e-books they've released on the CD-ROMs included with some of their hardcovers. I'm pretty certain they've all had typos.

BAEN does eARCs (electronic Advance Reader Copies) and charges money for them. ARC buyers are supposed to do things like proofread and provide feedback to the author. Sometimes the eARC and the final version have significant changes from reader feedback, but usually not.

But still the typos persist.


Everyone has typos. But if you point them out (be specific!) to Arnold Bailey of Webscriptions, THEY GET FIXED!

eARCs are not for people to proofread and report typos. They're to get extra money out of people addicted to particular authors or series. You're not expected to provide feedback. Baen has professional editors who do that. The point of eARCs is that there hasn't been time yet between the author handing in the manuscript and the eARC appearing for the final copy editing to have been done.



As for unicode, the great tragedy of unicode is that they didn't work out the utf8 encoding first (& that we'd need more than 65535 characters). If they had done that, we'd never have had utf16 and utf32, and the horror of BOMs and surrogate pairs.

rhadin
05-27-2010, 09:28 AM
I wonder if the authors (or their heirs) and/or the publishers know what Sony et al. are doing to their work. It is shoddy work at best and criminal at worst. There is no excuse for bad editing and there is no way of getting your money back. Shame on Sony!!!

It's not Sony, it's the publisher of the ebook. Sony is no more responsible for the errors than Barnes & Noble is for errors in pbooks that you buy at their stores. Unlike Amazon, which converts ebook files to its own formats and thus can introduce errors, the files Sony sells are publisher provided.

aagstn
05-27-2010, 10:43 AM
I'm really disappointed in how many scan errors are in the recent Rex Stout books from Random House. I just finished If Death Ever Slept and I think every time in the book the word him or his was used it came up as Ms and Mm. Very annoying. I have read most of the ones released so far and all have dozens of scan errors. They show as version 3.0 at the end so I guess they didn't do a very good job looking for errors on versions 1.0 and 2.0.

Dellaster
05-27-2010, 11:14 AM
eARCs are not for people to proofread and report typos. They're to get extra money out of people addicted to particular authors or series. You're not expected to provide feedback. Baen has professional editors who do that. The point of eARCs is that there hasn't been time yet between the author handing in the manuscript and the eARC appearing for the final copy editing to have been done.

Exactly. In fact their info page on eARCs states:
Please do not send in typos and errors, we know the eARC is unproofed and the author doesn't need to be inundated with "corrections".

Patricia29
06-20-2010, 06:11 PM
It's not Sony, it's the publisher of the ebook. Sony is no more responsible for the errors than Barnes & Noble is for errors in pbooks that you buy at their stores. Unlike Amazon, which converts ebook files to its own formats and thus can introduce errors, the files Sony sells are publisher provided.

If I buy something from COSTCO, books included, and I find them to be substandard, I return them to COSTCO. When I buy books from Sony, there is no recourse.
If you are a reseller, it is your responsibility to ensure you are selling a quality product.
I have now been waiting over a month for Sony to respond to my first complaint.
My latest problem with a Sony download is text size. The book is The Traitor by Stephen Coonts, the text is so small, even when I maximize the size, it is still difficult to read. So far Sony's only suggestion is to redownload it, which I have done, but it is still no better.

seajewel
08-11-2010, 06:29 PM
I was just looking at a sample copy of Midnight Tides by Steven Erikson, and realized how many typos there are, just by skimming a few pages. It really bothers me that I would have to pay $10 for an ebook that is just chock full of errors. This is especially so in the case of the Malazan series, where names are often mistyped: spelled Beru on one page and Bern on another, or Coll and then Coil (I know there are two separate characters with the name, but in the ebook they're definitely mistakenly interchanged). It really detracts from the reading experience because so many of the names are strange that I end up not knowing what the "correct" spelling of a character's name is. I literally would have to look it up online to figure out what the correct name is. Tiste Edur in the Midnight Tides sample shows up as TIFTE Edur in some places.

It really would bother me *slightly* less if we didn't have to pay so much, as much as or more than paperbacks cost, knowing that the paperbacks do not contain nearly as many errors. Just wanted to rant out of frustration. I love ebooks, but I feel publishers have a long way to go before they realize they are not promoting and supporting the ebook market as they should.

ETA: the sample copy of Midnight Tides I refer to is the excerpt from Amazon Kindle edition.

corona
08-11-2010, 11:57 PM
Vince Flynn's books do seem to have a particularly horrible problem. I bought one not so long ago for my Kindle, and it was missing all the quotation marks, dashes, and full stops. I kept it, because it was a good story, but the almost complete lack of punctuation made it a "challenge" to read, especially the dialogue parts of it.

You can move on to William Gaddis!

cmdahler
08-12-2010, 12:24 AM
If I buy something from COSTCO, books included, and I find them to be substandard, I return them to COSTCO. When I buy books from Sony, there is no recourse.

Yes, there is. You bought that book from Sony with a credit card. If you find the product that you bought flawed or otherwise not-as-advertised (basically, any reason that you would be justified in returning a print book for a refund), then you can ask Sony for a refund. If they refuse, simply dispute the charge with your credit card company. Chargebacks are really expensive, relatively speaking, for any company. If people started doing this when they found a number of typographical errors in the ebook or other formatting errors or issues that make the book substandard, Sony (or Amazon, or whoever) would quickly start to clean up their act. As long as people keep the mindset that "there is no recourse" and don't do the simple act of calling the credit card customer service line and disputing the charge (takes 5 minutes, and you've got your money back), the booksellers have no financial incentive to improve their product.

AlexBell
08-12-2010, 04:04 AM
Yes, there is. You bought that book from Sony with a credit card. If you find the product that you bought flawed or otherwise not-as-advertised (basically, any reason that you would be justified in returning a print book for a refund), then you can ask Sony for a refund. If they refuse, simply dispute the charge with your credit card company. Chargebacks are really expensive, relatively speaking, for any company. If people started doing this when they found a number of typographical errors in the ebook or other formatting errors or issues that make the book substandard, Sony (or Amazon, or whoever) would quickly start to clean up their act. As long as people keep the mindset that "there is no recourse" and don't do the simple act of calling the credit card customer service line and disputing the charge (takes 5 minutes, and you've got your money back), the booksellers have no financial incentive to improve their product.

Now that's an excellent idea. I'll certainly check it out.

Regards, Alex

Quexos
01-26-2011, 06:29 PM
I'm currently reading "IT" by Stephen King on my e-reader. I have read almost 150 pages, and there are so many typos and missing spaces that I am flummoxed.
There are countless missing spaces such as "shewas" for "she was", italics wrongly set, many instances of "comers" where the print edition has "corners", one case of "aubumhaired" for "auburn-haired" and a few cases of a wrong letter such as "be went" for "he went" ...
Do you guys think this is an OCR error issue or something else?

pdurrant
01-27-2011, 03:25 AM
I'm currently reading "IT" by Stephen King on my e-reader. I have read almost 150 pages, and there are so many typos and missing spaces that I am flummoxed.
There are countless missing spaces such as "shewas" for "she was", italics wrongly set, many instances of "comers" where the print edition has "corners", one case of "aubumhaired" for "auburn-haired" and a few cases of a wrong letter such as "be went" for "he went" ...
Do you guys think this is an OCR error issue or something else?

Classic OCR errors. Complain to the publisher. Complain to the retailer. Get a refund.

bizzybody
01-27-2011, 05:42 AM
Since comers is such a rarely used and archaic word in English, almost always with the word all before it, any time English OCR software thinks it sees "comers" it should be flagged, tagged and bagged as corners unless all is right before it. http://www.thefreedictionary.com/All+comers

OCR software needs a lot more work on discriminating between lowercase m and rn. I've seen books where nearly every instance of each was recognized as the other. Same thing for some other troublesome letter pairs. A "Does this word really exist in English?" sanity check would cure tons of OCR errors. Of course that would require OCR software to include spelling and grammar checking too.
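
A minimal sketch of that sanity check, in Python. The word list and the confusion pairs below are made-up stand-ins, not taken from any real OCR package; a real tool would load a full dictionary and a longer list of troublesome letter pairs.

```python
import re

# Hypothetical stand-ins: a tiny word list and the m/rn confusion pair.
KNOWN_WORDS = {"the", "street", "corners", "modern", "comers", "all", "she", "turned"}
CONFUSIONS = [("m", "rn"), ("rn", "m")]

def suggest(word, known=KNOWN_WORDS, confusions=CONFUSIONS):
    """Return dictionary words reachable by swapping one confusable pair."""
    found = set()
    for wrong, right in confusions:
        start = word.find(wrong)
        while start != -1:
            candidate = word[:start] + right + word[start + len(wrong):]
            if candidate in known:
                found.add(candidate)
            start = word.find(wrong, start + 1)
    return sorted(found)

def flag_suspects(text, known=KNOWN_WORDS):
    """Yield (word, suggestions) for each word that fails the dictionary check."""
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in known:
            yield word, suggest(word, known)
```

With this word list, `list(flag_suspects("The modem street"))` flags "modem" and offers "modern". A real word such as "comers" passes the dictionary check, so that case still needs a context rule or a human looking at the suggestion from `suggest("comers")`.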

GeoffC
01-27-2011, 05:45 AM
OCR software is still not good enough - nor, it seems, do the publishers care about the quality until there are complaints.

One reason why books at DP go through 3 levels of proofing check after OCR...

Quexos
01-27-2011, 01:45 PM
I see.
However this remains unforgivable. All that is needed after a text has been OCR'd is for someone to proof-read it. Now don't tell me publishers can't afford that or did not think of that.

pdurrant
01-27-2011, 01:59 PM
I see.
However this remains unforgivable. All that is needed after a text has been OCR'd is for someone to proof-read it. Now don't tell me publishers can't afford that or did not think of that.

Either the publisher decided not to do any proofing, or someone messed up and released the raw text instead of the proofed text.

In the first case, get a refund, in the second case, get a refund and wait for them to fix it before buying again.

DreamWriter
01-27-2011, 02:27 PM
Having to read around typos or mis-scans in a book can certainly detract from the story! I've seen a lot of problems in e-books as well. It is disappointing. I don't understand why publishers don't take more care to produce a high-quality product.

Particularly in the case of indie books, I suggest that you contact the author/publisher. In most cases, they do care and any errors will be corrected.

mr ploppy
01-27-2011, 03:25 PM
I see.
However this remains unforgivable. All that is needed after a text has been OCR'd is for someone to proof-read it. Now don't tell me publishers can't afford that or did not think of that.

Like most businesses, publishers are looking at ways of saving money. Proff-redding was one of the things they dropped several years ago. Instead of paying people to do it, they print advance reader copies which the author sends out to fans. The fans then report any mistakes they notice and those get fixed. The ones that don't get noticed stay in the book.

Caltsar
01-27-2011, 03:53 PM
Most of the eBooks I buy are from B&N, and while I tend to stick to sci-fi and fantasy (as well as some non-fiction), the books I've bought have no more errors than the paper books I read. I've personally been happy with their quality, though I'll be the first to complain when errors that can easily be handled by skimming through the book or running a spell checker show up.

It does seem that newer eBooks are getting to be higher quality these days. I remember when I bought a few ebooks for my Palm back in the day, they were filled with ridiculous OCR errors and looked like the publisher didn't even open the file before sending it out.

Under the Covers
01-27-2011, 06:48 PM
More than just typos, one recently downloaded book contained all sorts of usage errors (too for to, roll for role, parish for perish, etc. ad nauseam), along with flat-out grammatical and sentence construction errors -- too many fundamental errors to attribute them to just typos, scanning, or proofing errors. After that book, I'd be happy to see a well written book with only typos.

Patricia29
01-28-2011, 10:34 AM
I just finished 'Fall of Giants' by Ken Follett, in every instance in the ebook, the country house in Wales is referred to as T? Gwyn. I believe it should be Ty Gwyn.
This was a major irritation, since it appears countless times throughout the book and there is no excuse for it!

HarryT
01-28-2011, 10:36 AM
Since comers is such a rarely used and archaic word in English, almost always with the word all before it, any time English OCR software thinks it sees "comers" it should be flagged, tagged and bagged as corners unless all is right before it. http://www.thefreedictionary.com/All+comers


I don't know about you, but I'd really rather not have the word "newcomers" (a relatively common word) converted into "newcorners".

HarryT
01-28-2011, 10:45 AM
I just finished 'Fall of Giants' by Ken Follett, in every instance in the ebook, the country house in Wales is referred to as T? Gwyn. I believe it should be Ty Gwyn.
This was a major irritation, since it appears countless times throughout the book and there is no excuse for it!

There is actually a sensible reason for that. The word would almost certainly have been "Ty" where the "y" had a circumflex accent. This is a Unicode character which is absent from many fonts. The "?" is the Sony's way of saying "this is a character which isn't present in my font".

bizzybody
01-30-2011, 04:39 AM
Which is why e-books for platforms that don't do Unicode need to have the text converted to extended ASCII. That has all the characters to handle most languages which use 'english' style characters.

Even better would be for the reader software to implement its own Unicode support using its own fonts.

HarryT
01-30-2011, 10:54 AM
Which is why e-books for platforms that don't do Unicode need to have the text converted to extended ASCII. That has all the characters to handle most languages which use 'english' style characters.


Is the "y with a circumflex" character present in extended ASCII?

It's not really relevant to this case, though, given that ePub is a Unicode-based standard.

Jellby
01-30-2011, 11:23 AM
Is the "y with a circumflex" character present in extended ASCII?

I think he/she meant that it should be "degraded" to something in extended ASCII, i.e., Tŷ -> Ty

It's not really relevant to this case, though, given that ePub is a Unicode-based standard.

And the Mobipocket format should support it too, if properly encoded. The problem is only the default font in the device not having the required character, nothing related to the format.
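
For what it's worth, the "degrading" mentioned above (Tŷ -> Ty) can be sketched in a few lines of Python using the standard `unicodedata` module. The function name is my own, and this is only one way of doing it, not what any particular conversion tool does.

```python
import unicodedata

def strip_accents(text):
    """Decompose accented letters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# "T\u0177 Gwyn" is "Tŷ Gwyn" written with an escape for the accented letter.
house = strip_accents("T\u0177 Gwyn")
```

Letters that have no decomposition, such as "ø" or "ß", pass through unchanged, so this is a lossy fallback for accents only and not a general conversion to ASCII.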

HarryT
01-30-2011, 11:27 AM
And the Mobipocket format should support it too, if properly encoded. The problem is only the default font in the device not having the required character, nothing related to the format.

Yes, that was my meaning: that this isn't a typo in the book.

Andrew H.
01-30-2011, 11:58 AM
I downloaded a sample on K4PC and found "Ty Gwyn." FYI.

HarryT
01-30-2011, 12:05 PM
I downloaded a sample on K4PC and found "Ty Gwyn." FYI.

That in itself is really a "cop-out", because "Tŷ" is a Welsh word ("House": "Tŷ Gwyn" means "White House"), whereas "Ty" isn't a word at all. Given, though, that "ŷ" isn't an ASCII character, most Welsh speakers are probably used to seeing "ŷ" written as "y".

bizzybody
01-30-2011, 09:26 PM
All these characters are in the extended ASCII set, or Windows 1252 which is pretty much the same thing. The extended ASCII set with line drawing characters is a creation of IBM.

I had to leave the semicolons off the numeric character codes because the forum software is not set up to leave *everything* between the code commands 100% exactly as entered. With the semicolons after the numbers the bleeping forum "helpfully" converts the codes to the characters.

Any e-book conversion software that can convert to formats for which there is a reader for non-unicode platforms should have an option to use extended ASCII or Windows 1252 encoding, including converting all these numeric character codes (with the semicolon of course) to their single-byte equivalents instead of to their Unicode equivalents.

The result looks exactly the same, but the file size can be significantly smaller.


[Table: the printable ASCII characters ! through ~, then the Windows-1252 characters ƒ ˆ Š Œ Ž ˜ š œ ž Ÿ and ¡ through ÿ, each followed by its numeric character code written without the trailing semicolon.]

Like I said earlier, the *best* thing would be for e-book reader software to include its own Unicode support on platforms without native support, but unless someone else does it, that will never ever happen for Mobipocket since Amazon bought it for use on Kindle. Failing that, the only thing one can do when converting to formats for any Palm reader app or other non-unicode platform is to pre-convert the source to de-Unicode it, unless you like common punctuation replaced by spaces, blank boxes, 'weird' characters or simply removed and replaced with nothing at all.
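
A rough sketch of that pre-conversion in Python, assuming the source text carries numeric character references and the target is Windows-1252. The function name and the sample string are made up for illustration; `html.unescape` and the `cp1252` codec are part of the standard library.

```python
import html

def to_cp1252(source):
    """Resolve character references, then encode as Windows-1252 bytes.

    Characters that Windows-1252 lacks are replaced with "?".
    """
    return html.unescape(source).encode("cp1252", errors="replace")

# Made-up sample: an accented letter and a pair of curly quotes.
sample = "caf&#233; &#8220;quoted&#8221;"
converted = to_cp1252(sample)
```

Every character in the sample becomes a single byte. A character outside Windows-1252, such as the Welsh y with circumflex (`&#375;`), comes out as "?" with this error setting, so the conversion is only safe for text that stays within that character set.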

If there's anyone here who knows the C# programming language, I posted a program with source code on the forum. It's a text string replacer that works if the list of text strings to swap is kept short enough. It still has some bugs: it can't handle the full list of character codes, and it corrupts the list by replacing much of it with the unknown character box. Once it's fixed to handle long enough lists, it'd be very useful for doing very fast replacing of any strings of text with any other strings of text.
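
For comparison, here is a sketch of the same idea in Python. It is not a port of the C# program (whose source isn't shown here); the function name and the sample table are my own. It builds one regular expression from the whole table and replaces everything in a single pass, so the length of the list is not a problem.

```python
import re

def replace_all(text, table):
    """Replace every key of `table` found in `text` with its value, in one pass."""
    if not table:
        return text
    # Longest keys first, so a long key wins over a shorter key that is its prefix.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: table[match.group(0)], text)

# Made-up sample table mapping character references to plain replacements.
swaps = {"&#8212;": "--", "&#233;": "e"}
cleaned = replace_all("a&#8212;b &#233;", swaps)
```

Because the text is scanned once, a replacement is never itself replaced again, which avoids the chain reactions you get from running many separate search-and-replace passes.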

JSWolf
01-30-2011, 09:33 PM
Given the ePub version, there is no reason not to embed a font that supports proper unicode and thus, the word would be correct.

Jellby
01-31-2011, 07:59 AM
All these characters are in the extended ASCII set, or Windows 1252 which is pretty much the same thing.

You are assuming there's one "extended ASCII", but that's not true, there are many. Which one should be chosen? The one you like best? Windows 1252 is comparable to ISO-8859-1, which is fine for Western European languages, but why not any other variant?
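
To make that concrete, here is a small Python sketch that decodes the same single byte under several 8-bit encodings. The function name is made up; the codec names are ones that ship with Python.

```python
def decode_with(raw, codecs):
    """Decode the same bytes under each codec and return {codec: text}."""
    return {codec: raw.decode(codec) for codec in codecs}

# Byte 0xE9 is "é" in Windows-1252 and ISO-8859-1, but not in every 8-bit set.
variants = decode_with(bytes([0xE9]), ["cp1252", "latin-1", "cp437", "iso8859_5"])

# Byte 0x80 shows that even Windows-1252 and ISO-8859-1 are not identical.
euro_check = decode_with(bytes([0x80]), ["cp1252", "latin-1"])
```

Printing `variants` shows one byte turning into different characters depending on which "extended ASCII" the reader assumes, which is exactly why a file needs to declare its encoding.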

Most current devices are perfectly capable of showing a wide range of Unicode characters, there's no need to "downgrade" to some limited 8-bit encoding. If some characters are not properly displayed it's because the device is lacking a good font (and the possibility of using a custom one). Should ebooks be created with typos just so they show sort-of-OK in defective devices? I don't think so. What if next month a software upgrade fixes the devices? Now suddenly all those books would be of reduced quality for no reason (and don't hope a "corrected" version of every book will be released).

HarryT
01-31-2011, 08:07 AM
Most current devices are perfectly capable of showing a wide range of Unicode characters, there's no need to "downgrade" to some limited 8-bit encoding. If some characters are not properly displayed it's because the device is lacking a good font (and the possibility of using a custom one). Should ebooks be created with typos just so they show sort-of-OK in defective devices? I don't think so. What if next month a software upgrade fixes the devices? Now suddenly all those books would be of reduced quality for no reason (and don't hope a "corrected" version of every book will be released).

The book I'm currently proof-reading, John Buchan's "The Courts of the Morning", is about a revolution in a fictional South American country. Unfortunately, the eBook source is an ASCII version, so a lot of the work of proof-reading consists of restoring the accents to all the Spanish words that should have them.

elcreative
01-31-2011, 10:12 PM
Proofreading and proofreaders were one of the first things to bite the dust when we went electronic with books being produced on computers from author to printer... they weren't eliminated exactly but the process was made lazier by the assumption (from people who didn't know better) that spellchecking (and grammar checking!!!) could automate most of the process so reducing the need for proper proofing... :bookworm:


I see.
However this remains unforgivable. All that is needed after a text has been OCR'd is for someone to proof-read it. Now don't tell me publishers can't afford that or did not think of that.

jrlewis
02-01-2011, 08:20 PM
As a technical exercise, the typos in a commercially published ebook do make sense. Publishers usually follow a long and linear process of converting text from one format to another until the *final* version only exists in a special formatting program like InDesign or FrameMaker, not a standard text file like Word or Rich Text.

So when they export to a non-print-ready format, like Epub or Mobi, they're going to get all sorts of administrative formatting garbage in the file. I would hope that they would have a proofer check specifically for this issue, but apparently they aren't doing a great job of it right now.

bizzybody
02-02-2011, 03:47 AM
Yet another common OCR bugaboo is reading the pair cl as a lowercase d. If OCR software just had a list of such common goofs and lists of words that often have recognition problems, then present them to the user with the surrounding words for correction, that would help a lot.

Another one I've seen a lot of is turning italic sans-serif uppercase I and lowercase l into forward slashes.

Much of it depends on the quality of the paper, which affects how much the ink spreads, but fer cripes sake, there shouldn't be a mixup between rn and m if the software simply compared the width of rn VS m. I don't see how it should be possible for it to see a lowercase m as rn, especially not with 100% of the lowercase m's in a book as I've seen a few times, and in the same ones every instance of rn was rendered as an m.

As for any device or OS that doesn't support unicode, it's not "defective", it just doesn't support unicode. It's also highly unlikely any such will ever get unicode support. Therefore e-book creation software should have the OPTION to create non-unicode output when making an e-book for reader software such as Mobipocket or TealDoc which has a version for those platforms.

HarryT
02-02-2011, 03:55 AM
As for any device or OS that doesn't support unicode, it's not "defective", it just doesn't support unicode. It's also highly unlikely any such will ever get unicode support.

Why do you say that? A fair number of devices in the last few years have switched from Mobi to ePub support, via a firmware upgrade, and hence have acquired Unicode support in the process. One example is the reading devices made by Bookeen, where the user can choose to install either a Mobipocket or an ePub firmware. It's really not unlikely at all!

ATimson
02-02-2011, 09:12 AM
As for any device or OS that doesn't support unicode, it's not "defective", it just doesn't support unicode. It's also highly unlikely any such will ever get unicode support. Therefore e-book creation software should have the OPTION to create non-unicode output when making an e-book for reader software such as Mobipocket or TealDoc which has a version for those platforms.
And then, what, retailers should start selling "Unicode" and "non-Unicode" versions of each book? That'll work real well.

Jellby
02-02-2011, 10:38 AM
Fortunately, Ernest Vincent Wright wrote Gadsby (http://en.wikipedia.org/wiki/Gadsby_(novel)) in 1939 for readers that don't support the letter "e" :rolleyes:

JSWolf
02-06-2011, 06:07 PM
And then, what, retailers should start selling "Unicode" and "non-Unicode" versions of each book? That'll work real well.

Just wait until I post my list of errors in Seize the Fire here and at the Trek BBS. It'll be a pretty good list.

As for Unicode, there is a solution. If ADE does not support the characters needed, just embed a font that works. Simple solution to the problem. Why publishers don't get how to do things like that, I have no idea.

rePete
04-08-2011, 05:43 PM
This doesn't happen with paper books

Perhaps it's because I'm a slow, plodding reader (I don't skim!!) and chew the dialog - I'm not in a hurry. Consequently, I find typos in MANY pb/hb that I annotate in the front of the book. When finished, I email the publisher with my list of typos so they can fix them next time around.

Got the next revision to one of O'Reilly's technical books by doing that :xmas:

Tome Keeper
04-15-2011, 10:10 AM
I'm not sure all issues are to do with OCR. At the moment I am very annoyed that the last 14 books I have read from Amazon have had typos! Mainly "the" being "te" or just "e", and several words in a row without spaces. I can cope with "te", especially as the books in hard copy were rare and by my favourite author, but so many missing spaces is impossible to ignore!

GeoffC
04-15-2011, 10:20 AM
intheendofdays :hatsoff:

Welcome to Mobileread ....

Those examples you have listed can be OCR errors, as well as formatting problems. Did you tell Amazon ?

Giggleton
04-15-2011, 10:36 AM
Perhaps it's because I'm a slow, plodding reader (I don't skim!!) and chew the dialog - I'm not in a hurry. Consequently, I find typos in MANY pb/hb that I annotate in the front of the book. When finished, I email the publisher with my list of typos so they can fix them next time around.

Got the next revision to one of O'Reilly's technical books by doing that :xmas:

There's quite a few people that do this, I sent an email to Amazon asking about automating this process, with prerelease books sent to kindles as a reward for finding typos. Sounds like a fun game to me. One day we'll get there, not sure if Amazon will be taking us though.

:book2:

Tome Keeper
04-15-2011, 01:37 PM
intheendofdays :hatsoff:

Welcome to Mobileread ....

Those examples you have listed can be OCR errors, as well as formatting problems. Did you tell Amazon ?

Heard too many stories about people getting banned from Amazon for returns and complaints to let them know. As I said only a few affect my ability to read them.

sourcejedi
04-15-2011, 04:28 PM
Heard too many stories about people getting banned from Amazon for returns and complaints to let them know. As I said only a few affect my ability to read them.

The cases I heard were for multiple returns of physical goods, where Amazon was taking a loss. I'm not saying they were all treated fairly, but a _complaint_ about an ebook shouldn't entail any risk. And they have a relatively good reputation for ebook customer service. [Kobo took something like 5 round-trips to realize their free sample downloads on the website were the crap automated OCR from the Internet Archive]. Amazon are supposed to be relatively good at passing on a simple "this ebook is not up to scratch" to publishers, as opposed to others who expect readers to provide full lists of typos.

Personally I'd report anything I found in a paid book; publishers need to know there's a problem. In the few cases where you've found it an annoyance, it seems a pity just to sit on that information :).

sourcejedi
04-15-2011, 04:46 PM
Since comers is such a rarely used and archaic word in English, almost always with the word all before it, any time English OCR software thinks it sees "comers" it should be flagged, tagged and bagged as corners

<grin>

I found one instance of exactly the same error in #1 of Diane Duane's 'Young Wizards' series. That particular one got past my eye completely. I found it after running a scanno-finding pass.

The guys at Project Gutenberg Distributed Proofing have accumulated a database of the common corrections they have to apply, and automated the process of scanning for them with a tool called GuiGuts. It was beautiful watching it find things like that.

It doesn't substitute for proofreading -- and it's still a manual process -- but it'd work great as a backstop, or a verification pass. (If it picks up a lot of errors, it's probably also missed a similar number of errors, but it serves as a good suggestion that the book hasn't been proofed as well as the Distributed Proofers would have managed).
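
A toy version of such a scanning pass, in Python, to show the idea. The scanno list here is a made-up handful of entries, not the actual Guiguts data, and the function only reports suspects with their surrounding text; it changes nothing on its own.

```python
import re

# Hypothetical scanno list: suspect word -> what it usually should have been.
SCANNOS = {"comers": "corners", "modem": "modern", "tbe": "the", "arid": "and"}

def find_scannos(text, scannos=SCANNOS, width=20):
    """Return (position, word, suggestion, context) for each suspect word."""
    hits = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group(0)
        suggestion = scannos.get(word.lower())
        if suggestion is not None:
            start = max(0, match.start() - width)
            end = min(len(text), match.end() + width)
            hits.append((match.start(), word, suggestion, text[start:end]))
    return hits

hits = find_scannos("He stood at the comers of the room.")
```

Because whole words are matched, "newcomers" is left alone, and because "comers" and "modem" are both real words, each hit still needs a person to look at the context and decide.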

Hellmark
04-15-2011, 05:11 PM
I remember reading one book, where they apparently forgot to edit the last page, and so the final page of the book was so littered with OCR errors that it was unintelligible. Luckily my girlfriend had a copy of it on the book shelf, so was able to finish the story.

SeaBookGuy
04-15-2011, 05:49 PM
In a recent library book, "t" often became "l" so that the word "to" would appear as "lo", etc.

1611mac
04-15-2011, 06:03 PM
I mostly read classic works from the late 1700's to 1800's so my "books" are mostly ocr scans from places like google books, etc. These are free works so they have not been cleaned up.

Add to that that some are in "old english" which tends to drive OCR software nuts.

But thru it all... I am grateful to be able to read the works and most are surprisingly pretty easy to read.

.

John the Miner
04-15-2011, 08:50 PM
What a joy it is to find some kindred spirits in this discussion re typos in ebooks!
I've been a proofreader since 1957 (spending 20 years of that time as the boss of a newspaper reading room); now I'm "retired" and running my own editing/proofreading business.
Late last century, newspaper publishers decided it was more economical to dispense with reading rooms and rely on their journalists to write and correct their own material, with some amusing consequences.
In my home town, the [statewide] daily instituted a box on its letters pages declaiming that it is "committed to accurate, fair and fearless publication of news and commentary". This subsequently became known as the "OOPS!" column or the "BAD JOKE" column among the staff when it first appeared due to the few and very selective occasions that it admitted that a mistake had appeared in the pages.
Now, in this electronic world, when anybody can write and "publish" anything in cyberspace without any form of editorial control (and certainly no proofreading input), it seems the chickens have come home to roost.
The use of OCR technology shows the limitations of its own usefulness in transcribing the printed word to its electronic form.
Unfortunately, the standard of education today means that more and more "readers" are seeing fewer and fewer literals when they scan a page, with less appreciation of any mistakes they encounter.
It also means that these same "readers" present the words they put out there in a barely readable form as far as intelligible discourse is concerned. They might know what they meant to say, but it's obvious they haven't proofread their own entries, thereby losing any impact their statements might have had.

1611mac
04-15-2011, 08:57 PM
What a joy it is to find some kindred spirits in this discussion re typos in ebooks!
-edit-
The use of OCR technology shows the limitations of its own usefulness in transcribing the printed word to its electronic form.
-edit-.

Funny how things have come "full circle" for me personally. In the late 70's my first real job was working for a yellow page company in the "computer room." (a 16 bit nova clone)

Input for the yellow page ads came this way: The copy was typed on IBM Selectric typewriters and then my job was to "scan" (OCR) the typed sheets to convert to "electronic format."

Soon after that "terminals" came in and there was no more scanning.

Now here we are in 2011, 35 years later and OCR is a major reason I can enjoy old classic books!

Weird :thanks:

JSWolf
04-15-2011, 09:08 PM
intheendofdays :hatsoff:

Welcome to Mobileread ....

Those examples you have listed can be OCR errors, as well as formatting problems. Did you tell Amazon ?

A lot of those errors are also because the publisher uses a PDF file as the source. That generates all kinds of errors.

John the Miner
04-15-2011, 09:29 PM
Ah, 1611mac! A man after my own heart.
My first introduction to electronic typesetting was late last century when the company I worked for (a Murdoch enterprise) put in paper tape readers to produce its papers in a cold type format. PAPER TAPE!
This technology had already been outdated for a couple of decades when they decided to resurrect the system from another of their newspaper concerns (we believed it had come from a bankrupt South Sea Island newspaper they owned) and they had to keep it going for a while until something better (and cheaper!) came along.
The paper I worked for was touted as the state's most successful tabloid, outselling its nearest rival (a broadsheet) by thousands and thousands. They also made a point in saying, in the broadsheet, that it would never go tabloid.
However, when el momento de la verdad finally arrived, they had no qualms about closing the tabloid down and transforming the broadsheet into a tabloid.
I'll always remember that they had had a competition which had been running for weeks at the time, the main prize of which was a round-the-world trip. Of course, the suckers had to buy the tabloid to get an entry token to include with their entry so they had a chance to win it.
When we got the word on a Saturday afternoon that the paper (a Sunday publication) was going belly up, their last issue had the plea: "Don't forget to keep those cards and letters coming in, folks, if you want to win!"
Needless to say (although I'm saying it), nobody got to enjoy the promised trip, but the entries sure swelled the company's coffers by a goodly sum. Funny, that.
They informed the gullible public in a seven-line paragraph buried away on about page 36 that the paper was closing down.

AlexBell
04-16-2011, 04:18 AM
Welcome, John. It's a pleasure to see another user of the ECO Reader, presumably someone from Australia, and another person who values good proof reading.

Regards, Alex

John the Miner
04-16-2011, 05:36 AM
G'day, Alex!
Yes, an Australian.
Sorry. I had to dice the ECO reader ... when I got my first one, it went well for two charges (about 10 days' use), then it refused to start up the next time. I replaced that one with a new one, which refused to shut down after its initial first charge and first use.
Now I use a Laser 7, a media player/ebook reader which is going perfectly. As well, I have a couple of Cobolt media player/ebook readers. None of them use e-ink displays, which, when you think of it, are pretty limited in that they depend on outside light to read the screen, or else you have to buy a light to shine on the screen while it's being read.
Each type of reader has its own good/bad points, I suppose. Different strokes for different folks.
I came across the Mobile Read forum when I was searching for free downloadable ebooks for the e-readers. I entered the discussion on the merits or otherwise of the Kindles and the rights or wrongs of stripping DRM content off downloadable books (which I'm not in the least interested in). I'll stick to the classics and out-of-copyright texts.
I put my five cents' worth in and introduced the subject of the vast number of mistakes present in downloadable books and I gently suggested that some of the correspondents put their messages across in a barely understandable form, but was shot down in flames and told to get off and start up a new "tread", one called it.
So I'm sitting back and watching how this discussion goes (particularly the suggestion that the publishers should be shown just how shoddy some of their products are).
That will be interesting for all concerned on this side of the screen, but I think it will be absolutely ignored by the ones it will be directed to.

AlexBell
04-16-2011, 06:07 AM
Hello, John

I'm sorry you've had trouble with your ECO Reader - mine is still working, though my Sony is my favourite.

And I'm puzzled that you had the reaction about proof reading. There have been many threads about typos in ebooks. I started one entitled 'Hall of Shame' but it petered out. Still, I'm sure many people share our views.

Regards, Alex

John the Miner
04-16-2011, 06:16 AM
... and did you happen to take notice of the number of spelling mistakes put in their submissions by the ones who contacted you?
It amazes me that people can sit down at a keyboard and bash away at it, put their thoughts down on screen and hit the "send" button without reading what they wrote.
It's a DFW, to use an Australian acronym ... maybe you've heard it?

crich70
04-16-2011, 06:29 AM
Ah the IBM Selectric. Probably the best typewriter ever made. Some 20 yrs ago I took a college course in office machine repair and we had to strip the IBM Selectric down and put it back together in working order. Not an easy thing to do since each part worked with the others around it so that a mistake in a setting could throw a lot of things off. No doubt some errors do creep in because the person who is supposed to scan for typos isn't paying proper attention to what he/she is doing, but I have a feeling that as long as humans write there will also still be some typos that creep in.

I remember a couple that the (former) editor of a paper caught once (they were mentioned when he was being roasted). One said, "hens for sale, all dressed and ready for the rooster." It was supposed to read roaster instead of rooster. lol. The other was a missing comma in an ad to sell a car. Part of the ad advertised "nine person roof rack," because a comma was left out between person and roof. *snicker*

Funny how things have come "full circle" for me personally. In the late 70's my first real job was working for a yellow page company in the "computer room." (a 16 bit nova clone)

Input for the yellow page ads came this way: The copy was typed on IBM Selectric typewriters and then my job was to "scan" (OCR) the typed sheets to convert to "electronic format."

Soon after that "terminals" came in and there was no more scanning.

Now here we are in 2011, 35 years later and OCR is a major reason I can enjoy old classic books!

Weird :thanks:

rogerVA
04-16-2011, 11:10 AM
Typos are rampant in the ebooks, especially in the various cheapo ones I might buy. I notice the same thing on various professional websites, too. There's something about typing for a digital audience that seems to make publishers care less about making sure every word is right. It suggests they think they're more ephemeral in nature than the printed (on paper) word, which is, I suppose, true. This annoys me greatly and is another reason to hold onto "real" books of titles/authors you really enjoy.

1611mac
04-16-2011, 11:40 AM
I'm not a speed reader (wish I were) but I wonder how typo's affect a good speed reader.

HarryT
04-16-2011, 12:55 PM
I mostly read classic works from the late 1700's to 1800's so my "books" are mostly ocr scans from places like google books, etc. These are free works so they have not been cleaned up.

Add to that that some are in "old english" which tends to drive OCR software nuts.

But thru it all... I am grateful to be able to read the works and most are surprisingly pretty easy to read.

.

You know that many of the classics are available here at MR in carefully proof-read versions?

Books written in the 18th century are not "Old English", by the way. "Old English" died out around the 12th century, and refers to the language of such works as "Beowulf", the opening lines of which are:


Hwæt! wē Gār-Dena in ġeār-dagum,
þēod-cyninga, þrym ġefrūnon,
hū ðā æþelingas ellen fremedon.
Oft Scyld Scēfing sceaþena þrēatum,
monegum mǣġþum, meodosetla oftēah,
egsode eorlas. Syððan ǣrest wearð
fēasceaft funden, hē þæs frōfre ġebād,
...


which I'm sure you'll agree is not very like 18th century English.

pidgeon92
04-16-2011, 12:55 PM
I purchased Say You're One of Them from Fictionwise. In the second story of the book, there are several instances where something - I am assuming it is some sort of odd foreign character - is being replaced by "do02D9," and worse, sometimes by "zoke02D9ke02D9."
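
One possible reading, offered as a guess rather than a diagnosis: the "02D9" in those strings looks like a hexadecimal Unicode code point that a conversion step wrote out as digits instead of as the character itself. A minimal Python sketch of what that code point would be, assuming that interpretation is right:

```python
import unicodedata

# Assumption: "02D9" in the garbled text is a hexadecimal Unicode code point.
code_point = int("02D9", 16)
char = chr(code_point)
name = unicodedata.name(char)

print(f"U+{code_point:04X} is {name}")
```

If that is what happened, the original word probably carried a dot-above mark that the conversion could not handle.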

HarryT
04-16-2011, 12:56 PM
I'm not a speed reader (wish I were) but I wonder how typo's affect a good speed reader.

I take it that was deliberate? :)

1611mac
04-16-2011, 01:15 PM
You know that many of the classics are available here at MR in carefully proof-read versions?

Sorry, when I say "classics" I'm referring to classics in my field which is church and bible history. When I say "old english" I'm referring to "u's" being "v's" and "s" being "f", and also the early spelling differences. Such as "The Prouerbes of Solomon the sonne of Dauid, King of Israel, To knowe wisedome and instruction" as found in the first printing of the King James Bible.

1611mac
04-16-2011, 01:16 PM
I take it that was deliberate? :)

good catch!

John the Miner
04-16-2011, 01:41 PM
Not a typo in the usual cases we are talking about, but I used to work for the state education department.
Every year they'd stack on a slap-up dinner for all the department's bigwigs. One year I got to proofread the menu at the last minute. Among the gustatory delights on offer for the night was "roast prostitute".
I guess the spellchecker couldn't find "prosciutto" in its dictionary.
Another time, in my early days at a newspaper, I proofread a full-page broadsheet ad for a clothing store. The ad was for men's shirts and featured a diagonal line from bottom left to top right saying MEN'S SHIRTS in hand-picked three-inch poster type (you know the size and font that used to decorate a wire frame outside newsagents' shops blatting the day's headlines -- another thing that's gone the way of spats, or the dodo).
Well, the comp who set the ad up on the stone, the galley boy who pulled the proof, the proofreader who read it (me), the copyholder who read the ad to me, the linotype operator who corrected the typos in it, the comp who inserted the correx, the proofreader who revised it (me, again), the stereo bloke who put the stone under the press for flonging, the press tech who paginated the pages, the press room guy who ran his eye over the machine proof -- all overlooked the missing R in the big line of type.
Think of the legendary image of the editor running down the steps to the press room screaming "Stop the presses! Stop the presses!"? Well ...
I believe they ran about 20,000 copies of the paper through until they could stop the presses. The editor gave strict instructions that not one was to be allowed to leave the building and ordered that every last one had to be pulped, under pain of instant dismissal.
I fronted him and he said, "I'm not blaming you; you couldn't be so stupid as to let that one go deliberately. Or are you?"
Somehow or other, I survived.

Tonycole
04-22-2011, 01:31 AM
Superb! I wish all the typos I am confronted with in almost every ebook I read these days were as funny as that one!
I mostly get my ebooks from sources such as Smashwords, and invariably they are full of typos.
Given that the prices are generally in the $2 area, I should perhaps not be too critical, but I feel strongly that any author who has any pride in his or her work should take the trouble to proof read their masterpiece carefully.
Am I wrong in this idea?
www.ebookanoid.com

Michael J Hunt
04-22-2011, 05:42 AM
Because publishers these days expect high standards of self-editing from new writers, I shall be running a workshop on the subject at a literary festival next week. I've never tried such a workshop before although I've been running novel writing support groups for the last five years. While there are obvious overlaps, it wasn't until I was preparing suitable exercises that I became aware of the differences between the two subjects. I'd be really interested to hear comments about this and I wonder if others have thought of bringing would-be novel writers from their home towns together to try something similar.

Here's my introductory quote:

Self-editing isn't so much correcting grammar but making it more effective; it doesn't require a deep knowledge of grammar; it's more instinctual. You're aware that something isn't quite right, so you fix it. It doesn't mean that what you've written is ungrammatical, it may just be a bit clunky.

sourcejedi
04-22-2011, 06:21 AM
I mostly get my ebooks from sources such as Smashwords, and invariably they are full of typos.
Given that the prices are generally in the $2 area, I should perhaps not be too critical, but I feel strongly that any author who has any pride in his or her work should take the trouble to proof read their masterpiece carefully.
Am I wrong in this idea?

Please don't dump unrelated URLs into conversation :). That's what people use signatures for.

I think there's actually a reasonable argument in favour of typos or other simple mistakes that produce non-dictionary words. Often the eye will skip over them without noticing. And naively running a spellchecker over something can produce even worse results. If there's a tradeoff between a laborious proof-reading of spelling and typing, v.s. honing word choices, grammar, and how well sentences flow, etc., I'd rather a budding author spent their finite time and enthusiasm on the latter.

I'd much prefer a brainfart like "we we" for "we were"... even something like "struggle to breath[e]"... than common incorrect usage, like "should of" (should've), or multiple long sentences in sore need of a comma or two.

Or to put it another way -- there's a fantasy with adult elements that's written as a serial online (read: blog), as a full-time job, where after several years the author still relies on commenters to correct multiple errors per post. Perhaps it's an unfair comparison, but it does work really well. I don't see why Smashwords doesn't encourage and support fixing of typos (or scannos in backlist material) by readers.

raac
05-19-2011, 06:34 PM
There is no question that e-books contain errors not present in the original. Here is an excerpt of an e-mail I sent to Penguin today.

<quoting>
I recently bought the e-book version of Collapse, by Jared Diamond. The book itself is excellent, but the e-book has a number of deficiencies not present in the paper version, despite the latter being cheaper in some stores.

Firstly, the e-book version contains none of the images (apart from maps) present in the original. Other e-books contain images and the devices are able to display them, so why have they been removed from this book? Secondly, the text contains references to the missing images. Obviously little care has been taken in editing when converting from paper to e-book format. Thirdly, the index is useless: it is merely an alphabetical list which contains no page numbers or hyper-links to the listed terms. Fourthly, there seems to be some problem with the font embedding because on my Sony PRS-950 some characters are rendered as question marks. This occurs for all the symbols separating terms in the chapter summary lists, and for some characters which have accents.

. . .

E-books have a number of disadvantages for the reader, such as an inability to lend or sell them. If, despite this, you wish to charge the same price as the physical book then, please, at least create them with same care and do not remove content present in the original.
</quoting>


I always write to publishers when stuff like this happens. They must be told that it's not acceptable to send out half-baked jobs of this sort.

wodin
05-19-2011, 07:01 PM
I understand seeing typos in books derived from the darknet, where pbooks have been scanned and OCRed. OCR is not an exact science and errors inevitably crop up, but I don't understand why there are typos in ebooks from publishers.

After all, they already have electronic versions of the books that feed the typesetting software. Even the most entry level programmer should be able to produce a script to convert that to any ebook format necessary. So why do we still get typos??

:blink::blink:

JSWolf
05-19-2011, 07:16 PM
I understand seeing typos in books derived from the darknet, where pbooks have been scanned and OCRed. OCR is not an exact science and errors inevitably crop up, but I don't understand why there are typos in ebooks from publishers.

After all, they already have electronic versions of the books that feed the typesetting software. Even the most entry level programmer should be able to produce a script to convert that to any ebook format necessary. So why do we still get typos??

:blink::blink:

Part of the problem is that a lot of publishers use a PDF file as the source for the eBooks and that always brings its own set of problems.

raac
05-19-2011, 07:39 PM
I just don't buy that as a suitable excuse. A publisher should be proofing what they sell. Why is it the case that ebooks are exempt from that? The example I cite above has nothing to do with OCR errors, it's just a shoddy job.

I suspect what's happened is that publishers are falling over themselves to enter the e-book market, but that the production pipeline for books is still in its infancy. The market was originally too small for them to devote many resources to it. That's changing and I hope the quality will follow suit.

DiapDealer
05-19-2011, 08:05 PM
I can honestly say that after several years and purchasing hundreds of ebooks, I've only had one that I would consider absolutely shoddy--with regard to formatting. I returned it for a refund and contacted the author, who put me in touch with someone from the publisher who wanted location numbers and examples of the errors.

All of the rest have had the occasional typo that I'm willing to overlook since the same typos occur in most of the Dead Trees I buy as well.

wodin
05-19-2011, 08:24 PM
Part of the problem is that a lot of publishers use a PDF file as the source for the eBooks and that always brings its own set of problems.


I doubt if the authors are writing in Adobe Acrobat.

elcreative
05-19-2011, 08:30 PM
The publishers make PRINT books... one of the primary standards for final format (to send to printers) is PDF... since it is the format actually used , it often is the only format retained by the publisher and this then has to be converted to the relevant eReader formats... Nobody suggested that authors write in Acrobat but equally the publishers would have had no reason to maintain anything other than the final format used...


I doubt if the authors are writing in Adobe Acrobat.

paola
05-20-2011, 02:52 AM
There is no question that e-books contain errors not present in the original. Here is an excerpt of an e-mail I sent to Penguin today.

<quoting>
I recently bought the e-book version of Collapse, by Jared Diamond. The book itself is excellent, but the e-book has a number of deficiencies not present in the paper version, despite the latter being cheaper in some stores.

Firstly, the e-book version contains none of the images (apart from maps) present in the original. Other e-books contain images and the devices are able to display them, so why have they been removed from this book? Secondly, the text contains references to the missing images. Obviously little care has been taken in editing when converting from paper to e-book format. Thirdly, the index is useless: it is merely an alphabetical list which contains no page numbers or hyper-links to the listed terms. Fourthly, there seems to be some problem with the font embedding because on my Sony PRS-950 some characters are rendered as question marks. This occurs for all the symbols separating terms in the chapter summary lists, and for some characters which have accents.

. . .

E-books have a number of disadvantages for the reader, such as an inability to lend or sell them. If, despite this, you wish to charge the same price as the physical book then, please, at least create them with same care and do not remove content present in the original.
</quoting>


I always write to publishers when stuff like this happens. They must be told that it's not acceptable to send out half-baked jobs of this sort.
I find it outrageous that Penguin produced the ebook version without all of the images. I am very curious to know what they replied to you (hope they did!)

bizzybody
05-20-2011, 03:17 AM
There is no question that e-books contain errors not present in the original. Here is an excerpt of an e-mail I sent to Penguin today.

<quoting>
Fourthly, there seems to be some problem with the font embedding because on my Sony PRS-950 some characters are rendered as question marks. This occurs for all the symbols separating terms in the chapter summary lists, and for some characters which have accents.
</quoting>


Check to see if your Sony PRS-950 supports Unicode. If not, you'll get those character issues. The only way to fix them on a device that has no Unicode support is to edit the source to replace Unicode characters with their ASCII or Windows-1252 equivalents - if the book and your device are English.
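
As a rough illustration of what has an equivalent in a single-byte code page and what does not, here is a minimal Python sketch. It assumes the target is Windows-1252 (the Western European code page, `cp1252` in Python); the helper name `fits` and the sample strings are made up for the example.

```python
def fits(text: str, codec: str = "cp1252") -> bool:
    """Return True if every character in text exists in the given code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Curly quotes, n-tilde, em dash and e-acute all have Windows-1252 bytes.
sample = "\u201cPe\u00f1a\u201d \u2014 caf\u00e9"
print(fits(sample))               # True
print(sample.encode("cp1252"))

# Characters outside the code page have no single-byte equivalent.
print(fits("\u01e3"))             # False (ae with macron)

# Windows-1251 is the Cyrillic code page and has no n-tilde at all.
print(fits("\u00f1", "cp1251"))   # False
```

Anything that fails this check would have to be substituted by hand or dropped on a device limited to that code page.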

Another possibility on non-Unicode devices is for the reader software to provide its own support for it, but that's something I've yet to see in action.

Most devices, like Palm OS PDAs and phones, without native Unicode support are never going to get a software update to add it.

Here I reiterate my pitch for the OUR or One Ultimate Reader app. Cross platform for iOS, Palm, WebOS, Android, Windows, Linux, Symbian etc. and able to open Mobi/Kindle, ePub, Rocket, TealDoc, PalmDoc, Plucker and more - while providing its own Unicode support for platforms that don't have it.

Just needs some coders who really love to read and are fed up with the multitude of e-book formats and having to run multiple apps to read them. ;)

Jellby
05-20-2011, 04:01 AM
Fourthly, there seems to be some problem with the font embedding because on my Sony PRS-950 some characters are rendered as question marks.

Is there really an embedded font? It could be that there's nothing wrong in the book, just that the default font in the PRS-950 (and others) does not have those particular characters. Should a book contain workarounds for the case some reader does not provide a reasonably complete font (or the possibility of selecting a customized font)? I don't think so.

After all, they already have electronic versions of the books that feed the typesetting software.

I'm afraid that's where the reasoning fails. It seems most publishers don't have electronic versions of the books they publish, at most they have the print-ready PDF versions they send to the printer, as others have said.

raac
05-20-2011, 09:02 AM
Yes, I think it does have something to do with Sony not supporting the unicode used in the book. It translates some symbols correctly but not all of them. The book looks ok on my computer's screen. It is for these reasons that I listed that point as number four. The lack of images and the issues with their references is more serious. Still, if there is a unicode compatibility issue I don't see why publishers can't use the ASCII character codes so we don't have this problem.

EDIT:
I should say that the images definitely aren't there because I've exploded the e-pub and looked for them. In the past I've discovered missing images this way. I too am interested in what Penguin will say...
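
For anyone who wants to repeat that check: an EPUB is a ZIP archive, so its contents can be listed without unpacking it. This is a minimal sketch; the function name, the extension list and the in-memory demo book are assumptions for illustration, not anything taken from the Penguin file.

```python
import io
import zipfile

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".svg")

def list_epub_images(epub):
    """Return the names of image files inside an EPUB (a path or a file object)."""
    with zipfile.ZipFile(epub) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith(IMAGE_EXTENSIONS)]

# Tiny in-memory stand-in for a real book, so the sketch runs on its own.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("OEBPS/chapter1.xhtml", "<html/>")
    archive.writestr("OEBPS/images/map1.png", b"")

print(list_epub_images(demo))
```

With a real book, pass the path to the .epub file instead of the in-memory object. An empty list, or one holding only the maps, would confirm that the photographs were never packaged.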

Jellby
05-20-2011, 01:26 PM
Still, if there is a unicode compatibility issue I don't see why publishers can't use the ASCII character codes so we don't have this problem.

The problem is not the encoding of the book, it is the lack of an appropriate glyph in whatever font is used.

What do you mean with "ASCII character codes"? I guess it's one of these:

1) Use named or numerical entities instead of the Unicode character. For instance, instead of "Peña" write "Pe&ntilde;a". This does not solve anything: whether you use "ñ" or "&ntilde;", the font does not have the character, and it shows a box or a question mark instead.

2) "Downgrade" to some similar character that is in the ASCII set, removing diacritics etc. For instance, instead of "Peña" write "Pena". This risks being completely wrong and misleading: "peña" means rock, boulder, while "pena" means pain, sorrow. There's a reason why diacritics exist in most languages.
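
Both options can be tried out in a few lines of Python. This is only a sketch using the standard library; `strip_diacritics` is a hypothetical helper written for the example.

```python
import html
import unicodedata

def strip_diacritics(text):
    """Option 2: decompose characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Option 1: the entity and the raw character are the same thing once parsed,
# so the renderer still needs a glyph for it either way.
from_entity = html.unescape("Pe&ntilde;a")
direct = "Pe\u00f1a"
print(from_entity == direct)

# Option 2: always displayable, but it is now a different word.
print(strip_diacritics(direct))
```

The first comparison holds because an HTML parser resolves the entity before any font is consulted, which is why switching to entities cannot fix a missing glyph.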

raac
05-20-2011, 02:35 PM
1) Use named or numerical entities instead of the Unicode character. For instance, instead of "Peña" write "Pe&ntilde;a". This does not solve anything: whether you use "ñ" or "&ntilde;", the font does not have the character, and it shows a box or a question mark instead.


That's what I meant, yes. I suppose you're right it wouldn't help. If the book is packaged with a suitable font that contains that character then shouldn't everything be good?

Jellby
05-20-2011, 02:58 PM
If the book is packaged with a suitable font that contains that character then shouldn't everything be good?

Yes, assuming that the reading software supports embedded fonts. But that would force a particular font on the user on readers that would otherwise support selecting a custom font (e.g., the Cybooks).

Really, the culprit here is Sony (for not allowing custom fonts on their readers) and Adobe (for not providing a better Unicode coverage in the default font). There's no excuse for any of them.

bizzybody
05-20-2011, 05:49 PM
Here's a program that can read in a UTF-8 encoded HTML file and replace the UTF-8 HTML codes with the exact extended ASCII equivalent. http://www.mobileread.com/forums/showthread.php?t=109996

It's not just for that, it can be used to process any text file and swap any specific string(s) with other text string(s). It's written in C# and needs a bit more debugging because if the replacement list is too long it does things it should not do.

As is, it can handle enough to swap the most common accented characters used in English, as well as the punctuation characters. Debugged to handle any length swap list, it could be a very useful text file manipulation tool. It's already faster than any word processor or text editor for doing huge numbers of replacements.
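
The general idea of a swap list can be sketched in Python as below. This is not the linked C# program, just an illustration of the technique; the function name and the sample swap table are assumptions. Doing all replacements in one pass means the output of one swap is never re-scanned by another, and the length of the list does not change the approach.

```python
import re

def swap_all(text, swaps):
    """Replace every key of swaps found in text with its value, in one pass."""
    if not swaps:
        return text
    # Longest keys first, so a short key cannot pre-empt a longer one.
    keys = sorted(swaps, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: swaps[m.group(0)], text)

# Hypothetical swap list: numeric entities to the characters they stand for.
swaps = {
    "&#8220;": "\u201c",   # left double quote
    "&#8221;": "\u201d",   # right double quote
    "&#8212;": "\u2014",   # em dash
    "&amp;": "&",
}

print(swap_all("&#8220;Fish &amp; chips&#8221;", swaps))
```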

With a full character set swap file (which it currently can't handle) one could use it for one time pad cipher codes. ;) Could even run a file through several swaps to swap words for code words then totally scramble all the letters. The receiving person would need correctly formatted swap lists, used in the right order, to unscramble and decode.

WTH use UTF-8 for punctuation when ASCII and ordinary character encodings for Windows and other systems have characters like left and right quotes that produce exactly the same visible result? Unicode for standard characters when there's no need is text-bloat.

Replacing a couple thousand left and right unicode double quote marks with the left and right ASCII versions can reduce the file size quite a bit! A UTF-8 code is up to 7 characters, if leading zeroes are used. &#nnnn; One could write a whole text file that way but it'd be six times larger than using plain characters.

Another method that mostly works on HTML source files is to Save As Filtered HTML from Microsoft Word, but that can introduce its own issues with Microsoft's 'additions'.

Jellby
05-21-2011, 03:31 AM
Here's a program that can read in a UTF-8 encoded HTML file and replace the UTF-8 HTML codes with the exact extended ASCII equivalent. http://www.mobileread.com/forums/showthread.php?t=109996

I use recode (http://recode.progiciels-bpi.ca/index.html), which is very easy:

recode utf8..html file.html
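
For those without recode installed, a rough Python approximation of that conversion is sketched below. It is an assumption about the general behaviour, not a reimplementation of recode: non-ASCII characters become named entities where HTML 4 defines one and numeric entities otherwise, and markup characters are left alone.

```python
from html.entities import codepoint2name

def to_html_entities(text):
    """Rewrite non-ASCII characters as HTML entities, leaving ASCII untouched."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")   # named entity, e.g. &ntilde;
        else:
            out.append(f"&#{cp};")                   # numeric fallback
    return "".join(out)

converted = to_html_entities("Pe\u00f1a \u2014 \u01e3")
print(converted)
```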

WTH use UTF-8 for punctuation when ASCII and ordinary character encodings for Windows and other systems have characters like left and right quotes that produce exactly the same visible result? Unicode for standard characters when there's no need is text-bloat.

Replacing a couple thousand left and right unicode double quote marks with the left and right ASCII versions can reduce the file size quite a bit! A UTF-8 code is up to 7 characters, if leading zeroes are used. &#nnnn; One could write a whole text file that way but it'd be six times larger than using plain characters.

I think you are inverting the terms. Real ASCII has only 128 characters, everything else must be represented through (named or numerical) entities. &rsquo; and &#8217; are "ASCII representations" in this discussion, as they use ASCII characters to represent another character that is not in the ASCII set. This is where text bloat is possible.

Using Unicode characters means using some Unicode encoding to represent the character directly, not through entities like above, so I can just write "" or "". These, in UTF-8, take at most 4 bytes, and typically 2 bytes (for Latin, Cyrillic or Greek scripts) or 3 bytes (for some punctuation).

But anyway, in ePUB all files are compressed, so the "bloat" introduced by the entities will be largely cancelled (since they are repetitive sequences, they can be more efficiently compressed).
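
The byte counts are easy to check in Python, and zlib (the same DEFLATE compression that ZIP containers typically use) gives a feel for the compression point. The sample sentence is made up, and a text this repetitive overstates the effect, so treat the compressed figures as an illustration only.

```python
import zlib

# Size of one character written directly in UTF-8 versus as an entity.
print(len("\u00f1".encode("utf-8")))    # 2  (Latin letter with diacritic)
print(len("\u2019".encode("utf-8")))    # 3  (right single quote)
print(len("&#8217;".encode("utf-8")))   # 7  (the same quote as an entity)

# The same sentence, once with raw characters and once with entities.
sentence_raw = "\u201cIt\u2019s fine,\u201d she said. "
sentence_ent = "&#8220;It&#8217;s fine,&#8221; she said. "
raw = (sentence_raw * 2000).encode("utf-8")
ent = (sentence_ent * 2000).encode("utf-8")

bloat_before = len(ent) - len(raw)
bloat_after = len(zlib.compress(ent)) - len(zlib.compress(raw))
print(bloat_before, bloat_after)
```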

raac
05-23-2011, 11:07 AM
Penguin have so far only sent me a stock reply, saying that they have forwarded my message on to the appropriate department and may contact me again. We'll see what happens...

Michael J Hunt
05-24-2011, 09:11 AM
I'm not a Kindle user, but I was surprised (shocked, dismayed) to see a full-page Kindle advert on the back cover of the Radio Times (a high profile weekly magazine in the UK) that displayed a page from 'Ordinary Thunderstorms', where the em-dash, or even the shorter en-dash, has been superseded by a hyphen. At first I thought 'river-all' was some obscure feature of a river, until, in the same sentence, I came to 'no doubt-but let's wait'. The next paragraph starts with, 'There he is-look-stepping hesitantly down from a taxi'.

I found this so distracting, I couldn't read on - even though it was only a single-page advert. There is no way that I would buy a Kindle if all their books are edited in this way.

Am I alone in finding this disturbing? Or is it common practice in e-readers, which regular customers accept without complaint?

DreamWriter
05-24-2011, 10:54 AM
I just downloaded a sample of Ordinary Thunderstorms so I could see what you are talking about. Actually, those aren't hyphens. There are en dashes where there should be the longer em dashes. (If you still have the advertisement, compare the en dashes you referred to with the hyphens in "pale-faced" and "even-featured" if they show there.)
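
When the eye cannot tell the marks apart, the text itself can. Here is a small Python sketch that names each dash-like character in a string; the sample line is a made-up stand-in in the style of the advert, not the actual ebook text.

```python
import unicodedata
from collections import Counter

DASH_LIKE = "-\u2013\u2014"  # hyphen-minus, en dash, em dash

def dash_kinds(text):
    """Count the dash-like characters in text by their Unicode names."""
    return Counter(unicodedata.name(ch) for ch in text if ch in DASH_LIKE)

# Hypothetical line: an en dash used as a break, plus an ordinary hyphen
# inside a compound word.
sample = "no doubt\u2013but let's wait, said the pale-faced man"
print(dash_kinds(sample))
```

Pasting a paragraph copied from the sample into something like this would settle the hyphen-or-en-dash question.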

I find it very difficult to read that way too. I'm not sure why the publisher did that. It's very easy to code in em dashes. It was certainly a very poor example for Amazon to use in their Kindle advert.

I have to say that I have not seen that in a Kindle ebook before. I've usually seen the proper em dash used, two hyphens together, or space-hyphen-space.

Edited to add: When I created my husband's ebook, I did use the proper em dash. But there is a drawback, on the Kindle anyway. Kindle attempts to justify text, but it cannot hyphenate. Text is reflowable, so a publisher cannot control this either. If a line break occurs at an em dash (or an en dash), the Kindle cannot break it right after the dash, as you would see in print. Instead, it treats the word-em dash-word as a block and carries it all to the next line. This can leave a very unsightly space at the end, where the line broke. There's nothing that can be done about that. That's one reason why some people use space-hyphen-space instead of em dash in ebooks. (And others probably don't know how to create the em dash.)

This doesn't explain why the publisher used the en dash instead of the em dash in the book you cited, but I wanted to point out that there are some related difficulties with ebook formatting.

SeaBookGuy
05-24-2011, 11:25 AM
Speaking of em dashes -- my last ebook had those instead of a final ess-apostrophe, so I was faced with things like " ... my parents -- car, the neighbors -- children" etc.

DreamWriter
05-24-2011, 11:31 AM
Speaking of em dashes -- my last ebook had those instead of a final ess-apostrophe, so I was faced with things like " ... my parents -- car, the neighbors -- children" etc.

Ew, that's awful! I can't think of any reason why they did that.

bizzybody
05-25-2011, 12:15 AM
The attached file is a text file with the UTF-8 codes and their extended ASCII or Windows-1252 equivalents. (Or ISO 8859-1.) Note that the non-breaking space has the HTML "friendly" code because that's a non-printable character, also non-type-able without using the Alt+nnn code. The HTML code works with any book conversion software I've used.

Any Unicode supporting system should *not* need any of these characters' Unicode versions or UTF-8 codes in order to properly display them.

In fonts like Terminal, or the ANSI set (which Terminal is a monospaced TrueType clone of), some of the characters are different, but you won't encounter that on PDAs or book readers.

If you want your book to reach the widest possible audience, without getting questions about why there's all those weird characters or boxes or why the punctuation is all missing or replaced with nothing and the words jammed together... use the normal characters on this list instead of their Unicode versions, or in HTML their UTF-8 codes.

If the language you're using in your book has characters not in this list, then it's extremely likely the people reading it will have a device that supports Unicode or some other method of displaying those characters.

The main reason for all these issues with character encoding is America's fault. Since the vast majority of personal computers are still based on Ye Olde IBM PC, which was originally designed by Americans for English speakers, support for "foreign" characters was pretty much an afterthought for MS-DOS and PC-DOS. A similar problem was built into the early Internet (which is *not* the World Wide Web), which in its early years was all American. All the characters required for English could be encoded using 7-bit words, so that's how it was done, leaving the one bit always assumed to be zero unless commands were sent to specifically initiate a binary file transfer.

Remember that even mainframe computers 30+ years ago had memory measured in kilobytes. A system with a whole megabyte of RAM had a gigantic amount of memory to play with.

That's why the BinHex encoding format was created for sending Macintosh files across the internet. Many of the early routing systems were set to ignore the leftmost bit so that all outgoing traffic had that bit set to zero, no matter what it had been when it came in. BinHex uses only 7-bit text characters, thus it would survive transits through 7-bit routers. The MacBinary format used 8-bit text characters and was up to 1/8th more compact, which was a big savings when a 3600 baud modem was "screaming fast" and there was no such thing as unlimited data accounts.
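
What a 7-bit link does to 8-bit text can be simulated in Python. This is a sketch under the assumption that the relay simply clears the top bit of every byte, as described above; the Latin-1 sample is made up.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Simulate a 7-bit channel by clearing the top bit of every byte."""
    return bytes(b & 0x7F for b in data)

original = "Pe\u00f1a".encode("latin-1")   # n-tilde is byte 0xF1 in Latin-1
mangled = strip_high_bit(original)

print(original)
print(mangled)                       # 0xF1 becomes 0x71, the letter q

# Pure 7-bit ASCII passes through unchanged.
print(strip_high_bit(b"Pena"))
```

Plain ASCII survives, anything above 127 silently turns into some other character, and that is why binary files needed a 7-bit-safe wrapper.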

So when you see weird junk in your books, first blame the English-centric American pioneers of the micro computer and the Internet, then blame the people at the company who made your reading device for not getting on the Unicode bandwagon from the start. ;)

In other words, there's really no excuse for Palm OS (or any other PDA or book reader) to not have Unicode support, since the first standard for it was completed circa 1990~91 and the first Palm didn't go on sale until 1996!

Michael J Hunt
05-25-2011, 11:05 AM
Hi Dream Writer. So it isn't just me - I'm relieved to hear it. What I find unbelievable, is that a company like Amazon didn't spot that for themselves when they agreed the advert. Talk about compounding an error.

One thing you mentioned that I'd like to pick up on, is where you state 'Some people probably don't know how to create the em-dash'. You can count me in on that - I assume you're referring to MicroSoft Word. How I do it, is copy an em-dash from the text and then paste it where I want it. Alternatively, I pick one out of 'Symbols', then copy it for further use. Cumbersome, I know, but it's far better than having hundreds of en-dashes to convert during editing.

If you know how to activate a consistent em-dash in Word, I'd be delighted if you could let me in on the trick.

pdurrant
05-25-2011, 01:07 PM
If you know how to activate a consistent em-dash in Word, I'd be delighted if you could let me in on the trick.

On Macintosh, en-dash is alt/- (–) and em-dash is alt/shift/- (—)

On Windows you probably need to do something complicated with the numeric keypad. (Checked: Probably Alt+0150 for en-dash and Alt+0151 for em-dash)

http://en.wikipedia.org/wiki/Dash
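
Those two numbers are not arbitrary. With a leading zero, the Alt code is looked up in the system's ANSI code page, which is assumed here to be Windows-1252 as on most Western European and US installs. A short Python sketch of the lookup:

```python
# Assumption: the Windows ANSI code page is Windows-1252 (cp1252).
alt_codes = {code: bytes([code]).decode("cp1252") for code in (150, 151)}

for code, ch in alt_codes.items():
    print(f"Alt+0{code} -> U+{ord(ch):04X}")
```

On a system with a different ANSI code page the same keystrokes may give different characters.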

DiapDealer
05-25-2011, 01:39 PM
Those are the correct alt codes for the different dashes on Windows.

If you don't feel like memorizing alt codes (or writing them down) just bring up the Windows Character Map utility (Programs->Accessories->System Tools). It will allow you to select and copy any of the special (or unicode) characters so you can paste them into documents.

DreamWriter
05-25-2011, 01:49 PM
Hi Dream Writer. So it isn't just me - I'm relieved to hear it.
No, it isn't only you. Those formatting irregularities bother me too. Amazon shouldn't have used that example in their advert, and it's amazing that they didn't scrutinize it more carefully before publishing.

One thing you mentioned that I'd like to pick up on, is where you state 'Some people probably don't know how to create the em-dash'. You can count me in on that - I assume you're referring to MicroSoft Word.
It's super-easy to create an em-dash in MS Word. There's a feature called AutoCorrect. I use MS Word 2007, so these instructions apply to that version. To check how your AutoCorrect is set up, click on the circle-thingy in the upper-left corner in Word (called "Office Button"). Click on "Word Options," then "Proofing," "AutoCorrect Options," and "AutoFormat." If you check the box that says "hyphens (--) with dash (—)," then every time you type a word followed by two hyphens in a row and another word, the two hyphens turn into an em-dash automatically.

When I'm working with HTML to create an e-book, I insert this code in the HTML wherever the em-dash appears:

&mdash;

I'm not sure that's always necessary, but I also make similar HTML changes for curly quotes, apostrophes, etc.
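
For a manuscript that was typed with double hyphens, the same substitution can be scripted. A minimal Python sketch, assuming the only rule wanted is "two hyphens directly between two word characters become &mdash;"; the function name is made up for the example.

```python
import html
import re

def double_hyphen_to_mdash(text):
    """Turn word--word into word&mdash;word, leaving other hyphens alone."""
    return re.sub(r"(?<=\w)--(?=\w)", "&mdash;", text)

marked_up = double_hyphen_to_mdash("no doubt--but let's wait")
print(marked_up)

# A browser or reader resolves the entity to the real em dash (U+2014).
print(html.unescape(marked_up) == "no doubt\u2014but let's wait")

# HTML comments and single hyphens are not touched.
print(double_hyphen_to_mdash("<!-- pale-faced -->"))
```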

WillAdams
05-25-2011, 02:38 PM
Windows users can also use the nifty (free!) ``All Chars'' program which emulates the ``COMPOSE'' key on old DEC word processors. Highly recommended:

http://allchars.zwolnet.com/

William

JSWolf
05-25-2011, 04:37 PM
Hi Dream Writer. So it isn't just me - I'm relieved to hear it. What I find unbelievable, is that a company like Amazon didn't spot that for themselves when they agreed the advert. Talk about compounding an error.

One thing you mentioned that I'd like to pick up on, is where you state 'Some people probably don't know how to create the em-dash'. You can count me in on that - I assume you're referring to MicroSoft Word. How I do it, is copy an em-dash from the text and then paste it where I want it. Alternatively, I pick one out of 'Symbols', then copy it for further use. Cumbersome, I know, but it's far better than having hundreds of en-dashes to convert during editing.

If you know how to activate a consistent em-dash in Word, I'd be delighted if you could let me in on the trick.

Sony has the very same em dash bug in their BBeB parser. Very annoying to be sure. But ADE does not have that bug. To type an em dash in Windows, type alt 0151 (keypad) and you get an em dash. In fact, in ePub that have em dash with spaces, I fix that as it bothers me.

JSWolf
05-25-2011, 04:38 PM
No, it isn't only you. Those formatting irregularities bother me too. Amazon shouldn't have used that example in their advert, and it's amazing that they didn't scrutinize it more carefully before publishing.


It's super-easy to create an em-dash in MS Word. There's a feature called AutoCorrect. I use MS Word 2007, so these instructions apply to that version. To check how your AutoCorrect is set up, click on the circle-thingy in the upper-left corner in Word (called "Office Button"). Click on "Word Options," then "Proofing," "AutoCorrect Options," and "AutoFormat As You Type." If you check the box that says "hyphens (--) with dash (—)," then every time you type a word followed by two hyphens in a row and another word, the two hyphens turn into an em-dash automatically.

When I'm working with HTML to create an e-book, I insert this code in the HTML wherever the em-dash appears:

&mdash;

I'm not sure that's always necessary, but I also make similar HTML changes for curly quotes, apostrophes, etc.

When you use Word for the source document, do you fix the garbage Word inserts when saved as filtered HTML?

DreamWriter
05-25-2011, 08:03 PM
When you use Word for the source document, do you fix the garbage Word inserts when saved as filtered HTML?

I have created only one ebook from an MS Word document (my husband's book), but yes, I did fix the HTML code. I also have a separate CSS file. The MS Word-generated HTML is certainly a mess, even when you choose the filtered option!

I created another ebook by adding all the HTML code by hand. It was very time-consuming but rather fun, actually.

neilmarr
05-26-2011, 04:50 AM
Amazon/Kindle do no editing at all, Michael (nor do the other ebook retailers). It's up to the publisher to prepare flawless files. That's why your own books with us, for instance, are -- after the full editorial process and thorough proofing by many sets of eyes -- painstakingly prepared at the technical end four times over: for print, and then individually created for all three of the most popular digital formats, including Mobi for Kindle. Files are then checked for flaws on several in-house reading platforms BEFORE uploading to retail. When digital formatting is automated or when a for-print PDF (often containing occult errors that don't show in print but do in ebook form) is used for auto-conversion, anomalies will invariably creep in. Bestests. Neil

Michael J Hunt
05-26-2011, 12:48 PM
Thanks to everyone for all your helpful advice. The em-dash problem sounds so minor, yet, when you're writing books, the time consumed in overcoming it is considerable.

Thanks for the information about preparing an e-book, Neil.

djgreedo
05-28-2011, 07:35 AM
What bothers me is that publishers are using the cost of converting to ebook as justification for not relaxing ebook prices, yet many of them are clearly not doing anything other than OCR (i.e. no humans are proofreading these books).

I'm reading an excellent ebook at the moment that has lots of silly typos. One person in the book has the surname 'Horne' and it alternates between 'Horne' and 'Home' constantly. This is obviously an OCR error.

And like others noted, there are hyphenation issues on most Kindle books I've read to date, all of which should be easily picked up by a human proofreader.
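
The Horne/Home kind of mix-up is at least easy to flag mechanically. A minimal sketch in Python, assuming the only confusion of interest is "rn" being read as "m" (the function name find_rn_m_confusions is hypothetical, and a real check would need more confusion pairs than this one):

```python
import re


def find_rn_m_confusions(text: str) -> list[tuple[str, str]]:
    """List (word, variant) pairs where both spellings occur in text and the
    variant is the word with every 'rn' replaced by 'm'."""
    words = set(re.findall(r"[A-Za-z]+", text))
    pairs = []
    for word in sorted(words):
        if "rn" in word:
            variant = word.replace("rn", "m")
            if variant in words:
                pairs.append((word, variant))
    return pairs


if __name__ == "__main__":
    sample = "Mr Horne left early. Home said nothing. Later Horne returned."
    print(find_rn_m_confusions(sample))
```

This only produces candidates. Words like "corner" and "comer" can both be legitimate, so a human still has to look at each hit in context.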

raac
05-28-2011, 12:06 PM
If there are large numbers of ludicrous typos, I have a policy of returning the book and asking for my money back. Actions speak. If enough people did this, it would send a pretty clear message.

JSWolf
05-28-2011, 12:47 PM
I have created only one ebook from an MS Word document (my husband's book), but yes, I did fix the HTML code. I also have a separate CSS file. The MS Word-generated HTML is certainly a mess, even when you choose the filtered option!

I created another ebook by adding all the HTML code by hand. It was very time-consuming but rather fun, actually.

One thing I have done in the past is to create the ePub from the Word-sourced HTML and then take the HTML and load it into Bookdesigner. Bookdesigner basically filters out the garbage, and then you save it again as HTML. Take the HTML from BD, copy the chapters, and replace the ePub chapters with them. Then clean up as needed. Sometimes it's a lot less work.

JSWolf
05-28-2011, 01:22 PM
The way I would create eBooks is to first create the ePub and get it looking just how I want it. After it's been looked at and proofed and is ready to go, convert that to Mobipocket. You won't have to worry about typos or whatnot creeping in as all that's already been proofed. The next step would be to see how it looks and if it looks good, then that's done.

One problem we have with publishers can be worse than OCR. They sometimes use a PDF source.

DreamWriter
05-28-2011, 02:04 PM
One thing I have done in the past is to create the ePub from the Word-sourced HTML and then take the HTML and load it into Bookdesigner. Bookdesigner basically filters out the garbage, and then you save it again as HTML. Take the HTML from BD, copy the chapters, and replace the ePub chapters with them. Then clean up as needed. Sometimes it's a lot less work.
Thanks for that tip! I've never worked with Bookdesigner. I may try that next time I create an ebook. Cleaning up the MS Word-generated HTML by hand was very time-consuming, but using search and replace did help!

JSWolf
05-28-2011, 02:11 PM
Thanks for that tip! I've never worked with Bookdesigner. I may try that next time I create an ebook. Cleaning up the MS Word-generated HTML by hand was very time-consuming, but using search and replace did help!

If you want to do it by hand, Notepad++ is very nice. You can load the CSS and all the XML files in separate tabs and use search/replace across all the tabs in one go. It also supports regex search/replace. It's what I used to do a lot of ePub editing.
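
The same regex search/replace idea can also be scripted across a folder of files. This is only a sketch in Python: the two rules target attributes that Word-generated HTML commonly carries (Mso* class names and inline style attributes), the names RULES, clean_markup and clean_files are invented for the example, and it assumes the files are UTF-8 (Word often saves as Windows-1252, so check first):

```python
import re
from pathlib import Path

# Illustrative cleanup rules, not a complete filter for Word output.
RULES = [
    (re.compile(r'\s+class="?Mso\w+"?'), ""),       # class=MsoNormal and friends
    (re.compile(r"""\s+style=(['"]).*?\1"""), ""),  # inline style attributes
]


def clean_markup(markup: str) -> str:
    """Apply every rule in RULES to one string of HTML."""
    for pattern, replacement in RULES:
        markup = pattern.sub(replacement, markup)
    return markup


def clean_files(folder: str, pattern: str = "*.html") -> int:
    """Rewrite matching files in place; return how many actually changed.

    Assumption: files are UTF-8 encoded. Work on a copy of the book.
    """
    changed = 0
    for path in Path(folder).glob(pattern):
        original = path.read_text(encoding="utf-8")
        cleaned = clean_markup(original)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed


if __name__ == "__main__":
    print(clean_markup("<p class=MsoNormal style='margin:0in'>Hi</p>"))
```

Stripping every style attribute is heavy-handed; it suits the case where the formatting lives in a separate CSS file.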

SeaBookGuy
05-28-2011, 02:12 PM
If there are large numbers of ludicrous typos, I have a policy of returning the book and asking for my money back. Actions speak. If enough people did this, it would send a pretty clear message.

I bought a Kindle book that had atrocious formatting -- truly unreadable. Amazon refunded the money, and I posted my experience as a review. The (self published) author responded with, "Gee, I had been wondering why there were so many returns!"