04-09-2010, 03:14 AM | #61 |
Enthusiast
Posts: 38
Karma: 50000
Join Date: Mar 2010
Location: Lancashire, England
Device: none
|
Thanks to Sassanik and Logesman for the explanation of OCR and OCD. I get the allusion now.
MJ |
04-09-2010, 09:38 AM | #62 |
Always Reading
Posts: 110
Karma: 1002645
Join Date: Dec 2008
Location: North Carolina
Device: FW 1192, PRS-700, kobo, Rocket, PRS-650, Nook Glow, Kobo Glo, others
|
Well, that is a question - is it possible that a conversion from one format to another is inserting such errors?
|
|
04-09-2010, 02:17 PM | #63 | |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
Quote:
Conversion between "text" formats - ePub, Lit, Mobi, etc - should be "lossless" as far as the actual text is concerned. You may, however, lose formatting in the process. |
|
04-09-2010, 02:36 PM | #64 |
Connoisseur
Posts: 82
Karma: 184
Join Date: Jun 2008
Device: Sony PRS-505
|
|
04-09-2010, 05:29 PM | #65 |
Kindlephilia
Posts: 2,017
Karma: 1139255
Join Date: Nov 2007
Location: Snowpacolypse 2010
Device: Too many to count
|
I've had a number of eReader books that convert poorly: soft hyphens, curly quotes that don't translate, and all punctuation except "." and "," failing to translate.
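Suppose the target device simply lacks glyphs for typographic punctuation. In that case, a converter (or a reader cleaning up a file by hand) might strip soft hyphens and fall back to plain ASCII. The sketch below does that. The function name and the mapping table are illustrative only and are not taken from any particular conversion tool:

```python
# Hypothetical cleanup pass for text headed to a reader that lacks these glyphs.
SOFT_HYPHEN = "\u00ad"

# Map "smart" punctuation to plain ASCII fallbacks (values may be multi-character).
PUNCT_FALLBACKS = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # single curly quotes
    "\u201c": '"', "\u201d": '"',   # double curly quotes
    "\u2013": "-", "\u2014": "--",  # en and em dashes
    "\u2026": "...",                # horizontal ellipsis
})


def normalize_punctuation(text: str) -> str:
    """Drop soft hyphens, then replace curly punctuation with ASCII equivalents."""
    return text.replace(SOFT_HYPHEN, "").translate(PUNCT_FALLBACKS)
```

Stripping soft hyphens outright loses the hint about where a word may break. That is harmless on a device that can't hyphenate anyway, but it is a trade-off.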
|
|
04-09-2010, 08:32 PM | #66 | |
Junior Member
Posts: 8
Karma: 10
Join Date: Apr 2010
Device: Dell Axim x50v
|
Quote:
|
|
04-09-2010, 08:38 PM | #67 |
Junior Member
Posts: 8
Karma: 10
Join Date: Apr 2010
Device: Dell Axim x50v
|
Part of the reason for these typos is that, until recently (I would guess), many publishers didn't put much effort into ebooks. I talked to someone from one of the publishers at San Diego Comic-Con last year (I want to say it was Penguin, but I don't remember for sure). The woman I talked to seemed pretty enthusiastic about ebooks, but she admitted that they were way behind on producing them because they had ONE person in the entire company who handled converting titles into ebooks.
Another problem is the format. Not all publishers natively support all ebook formats. I emailed Harper Collins about some typos I found in the Septimus Heap ebooks from eReader.com. Their reply said they would look into it, but they would have to get eReader.com to take care of it, because they only supported ePub natively. The actual conversion to PDB format is apparently handled by eReader.com. |
04-10-2010, 04:15 AM | #68 |
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Even with text-based PDFs, the PDF does not (necessarily) contain information about words, paragraphs, etc. The characters are easy to extract (unless there are funny fonts involved), but joining hyphenated words at the end of a line, putting spaces where they belong, removing page numbers and headers, dealing with footnotes, putting columns in the right order, detecting paragraphs, etc. is a different matter.
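Here is a rough Python sketch of a few of those heuristics. It assumes the PDF text has already been extracted as a list of lines. The function name is hypothetical, and the rules are deliberately naive:

- a digits-only line is a page number;
- a blank line separates paragraphs;
- a trailing hyphen followed by a lowercase letter is a split word.

```python
def reflow_pdf_lines(lines: list[str]) -> list[str]:
    """Rebuild paragraphs from extracted PDF lines using very rough heuristics."""
    paragraphs: list[str] = []
    current = ""
    for raw in lines:
        line = raw.strip()
        # Heuristic: a line holding only digits is treated as a page number.
        if line.isdigit():
            continue
        # Heuristic: a blank line marks a paragraph break.
        if not line:
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if current.endswith("-") and line[:1].islower():
            # Join a word hyphenated across the line break ("conver-" + "sion").
            current = current[:-1] + line
        elif current:
            # Otherwise the line break was just a wrap: replace it with a space.
            current += " " + line
        else:
            current = line
    if current:
        paragraphs.append(current)
    return paragraphs
```

The hyphen rule alone shows why this is hard. It would wrongly merge a genuine compound such as "self-" / "evident" split across two lines. The sketch also ignores running headers, footnotes and multi-column layouts entirely.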
|
04-10-2010, 08:08 AM | #69 |
K. Dawn Byrd, Author
Posts: 17
Karma: 20000
Join Date: Apr 2010
Device: I read ebooks on my Blackberry
|
Typos in books
I'm reading "Fresh Kills," an Amazon Breakthrough Novel winner. I'm only about 25% of the way through, and I've already found about six places where words were scrunched together without a space.
|
04-12-2010, 12:35 PM | #70 | |
Feral Underclass
Posts: 3,622
Karma: 26821535
Join Date: Jan 2010
Location: Yorkshire, tha noz
Device: 2nd hand paperback
|
Quote:
|
|
04-12-2010, 03:04 PM | #71 |
Samurai Lizard
Posts: 14,247
Karma: 66666666
Join Date: Nov 2009
Device: NookColor
|
I think that a factor contributing to typos in ebooks is that ebooks are still relatively new and the publishing industry is adapting to the change. I think that as the industry adapts, errors in ebooks will become less common.
One way errors could be reduced is to maintain a book's electronic source text in a form that can be easily formatted for many uses. It could be something like a plain text file with plain text markup to indicate intended formatting (such as [Bold text starts here]). The markup could be similar to HTML, but intended to be read by a human rather than interpreted by a computer. A human takes the source text file and formats it in accordance with the instructions for a specific ebook format. I think that ebooks are in the same position that CDs and digital audio recording were in during the early days. Due to the differences between analog and digital recording, it took the recording industry time to adjust to the change, and I think that it will be the same with ebooks. |
04-12-2010, 03:18 PM | #72 |
Wizard
Posts: 1,234
Karma: 3350652
Join Date: Feb 2008
Device: Amazon Kindle Paperwhite (300ppi), Samsung Galaxy Book 12
|
Solitaire1: having a human involved runs counter to reducing errors.
The solution already exists (rich markup using XML schemas like DocBook or TEI). The problem is that the discipline to use it is undermined by publishers' obsession with short-term profits, editors' reliance on Microsoft Word, and a lack of discipline and technical competence among typical design-school graduates. William |
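This is a minimal Python sketch of what rich semantic markup buys. It maps a simplified DocBook-style fragment onto HTML using the standard-library `xml.etree.ElementTree`. The fragment uses `<para>` and `<emphasis>`, with `role="bold"` for bold. Some caveats:

- The fragment is not validated against the real DocBook schema.
- `to_html` is a hypothetical helper that handles only these few elements.
- A real pipeline would more likely use XSLT, such as the DocBook XSL stylesheets.

```python
import xml.etree.ElementTree as ET
from html import escape

# A simplified DocBook-style fragment (not validated against the real schema).
SOURCE = '<para>This is <emphasis role="bold">important</emphasis> and <emphasis>stressed</emphasis>.</para>'


def to_html(elem: ET.Element) -> str:
    """Recursively map a few DocBook-like elements onto HTML."""
    if elem.tag == "para":
        open_tag, close_tag = "<p>", "</p>"
    elif elem.tag == "emphasis" and elem.get("role") == "bold":
        open_tag, close_tag = "<b>", "</b>"
    elif elem.tag == "emphasis":
        open_tag, close_tag = "<i>", "</i>"
    else:
        open_tag, close_tag = "", ""  # unknown elements: keep their text only
    parts = [open_tag, escape(elem.text or "")]
    for child in elem:
        parts.append(to_html(child))
        # Text that follows a child element is stored on the child as .tail.
        parts.append(escape(child.tail or ""))
    parts.append(close_tag)
    return "".join(parts)


print(to_html(ET.fromstring(SOURCE)))
```

The source says what the text *is* (emphasis, a paragraph), not how to draw it. The same file could therefore be mapped onto ePub, Mobi or print without anyone retyping it.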
04-12-2010, 03:45 PM | #73 | ||||
Curmudgeon
Posts: 3,085
Karma: 722357
Join Date: Feb 2010
Device: PRS-505
|
Quote:
Is reading <b> or <strong> really that much harder than reading [boldface starts here]? Quote:
You're talking about having a human act as a dumb processing system -- something which a computer can do much more efficiently. Having a human go along looking for [boldface starts here] and doing something with it isn't nearly as efficient as having a computer do the same, in terms of either accuracy or time. On the other hand, you've provided a perfect example right here: Quote:
And that's what the problem is with the ebooks: not that computers can't read the formatting, or that a human could read it better, but that computers can't spot when something has gone wrong. They can read the formatting just fine; they can't understand the content. That's particularly true of OCR'd text, but it also comes up with things like soft hyphens, hard returns, and other things meant to format text ... for humans. Quote:
|
||||
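A program can expand bracketed, human-readable tags like "[boldface starts here]" mechanically, and do it well. The minimal Python sketch below illustrates this by rendering one source into two hypothetical output targets, HTML and plain text. The tag spellings, the target names and `render` are assumptions for illustration. They are not any real tool's markup:

```python
from html import escape

# Hypothetical human-readable tags and what each expands to per output target.
TAGS = {
    "[boldface starts here]": {"html": "<b>", "text": "*"},
    "[boldface ends here]": {"html": "</b>", "text": "*"},
    "[italics start here]": {"html": "<i>", "text": "_"},
    "[italics end here]": {"html": "</i>", "text": "_"},
}


def render(source: str, target: str) -> str:
    """Expand every tag in `source` into the markup for `target` ("html" or "text")."""
    # Escape &, <, > and quotes first so the prose itself cannot break the HTML.
    out = escape(source) if target == "html" else source
    for tag, expansions in TAGS.items():
        out = out.replace(tag, expansions[target])
    return out
```

The program applies the formatting without understanding the prose at all. That is also why it cannot notice a typo or a stray hard return that a human proofreader would catch.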
04-12-2010, 05:48 PM | #74 | |
Junior Member
Posts: 8
Karma: 10
Join Date: Apr 2010
Device: Dell Axim x50v
|
Quote:
|
|
04-12-2010, 11:54 PM | #75 |
Curmudgeon
Posts: 3,085
Karma: 722357
Join Date: Feb 2010
Device: PRS-505
|
I'm not saying you don't; quite the opposite, in fact.
He was suggesting that books that are already in electronic format be tagged with some quasi-HTML (the main difference, apparently, being using phrases instead of abbreviations, like [bold text starts here] instead of <b>) and humans do something when they see that tag, presumably selecting the relevant text and clicking a "bold" button, when converting it to some particular ebook format. Precisely why a human pushing buttons would be faster or more accurate than an HTML renderer eludes my comprehension.

That's not where the problems are. The problems are in bad scans that aren't even spell-checked, or spell-checked but not proofread. Those are what need to be gone over with a fine-toothed comb by a real human. And those, unfortunately, are also what is sold by publishers putting greed ahead of all else, including their own long-term profitability.

You'll notice that recent Project Gutenberg books -- basically, any that have been through the Distributed Proofreading Project -- are much superior in quality to most backlist commercial ebooks. And they are at times working with books hundreds of years old, victims of age and worn type. They're proofread by humans -- why not go do a page? That makes all the difference.

That's where the human eye is needed: checking the scan against the original. Not in reading through a computer file and clicking "bold" every time you see [bold text starts here]. |
|