Old 04-09-2010, 03:14 AM   #61
Michael J Hunt
Enthusiast
Posts: 38
Karma: 50000
Join Date: Mar 2010
Location: Lancashire, England
Device: none
Thanks to Sassanik and Logesman for the explanation of OCR and OCD. I get the allusion now.

MJ
Old 04-09-2010, 09:38 AM   #62
rleguillow
Always Reading
Posts: 110
Karma: 1002645
Join Date: Dec 2008
Location: North Carolina
Device: FW 1192, PRS-700, kobo, Rocket, PRS-650, Nook Glow, Kobo Glo, others
Quote:
Originally Posted by Ankh View Post
And easy to fix (if the book is DRM-free). Use Calibre to convert the book to, say, Microsoft ".lit", delete original epub, then convert it from lit to epub.
Well, that is a question - is it possible that a conversion from one format to another is inserting such errors?
Old 04-09-2010, 02:17 PM   #63
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Quote:
Originally Posted by rleguillow View Post
Well, that is a question - is it possible that a conversion from one format to another is inserting such errors?
The only format conversion that tends to introduce textual errors is converting from PDF to something else, because a PDF file doesn't contain "text" at all, and the conversion tool has to "reconstruct" the page from what are basically graphical components.

Conversion between "text" formats - ePub, Lit, Mobi, etc - should be "lossless" as far as the actual text is concerned. You may, however, lose formatting in the process.
Old 04-09-2010, 02:36 PM   #64
kad032000
Connoisseur
Posts: 82
Karma: 184
Join Date: Jun 2008
Device: Sony PRS-505
Quote:
Originally Posted by rleguillow View Post
Well, that is a question - is it possible that a conversion from one format to another is inserting such errors?
Converting one format to another ALWAYS carries the possibility of error. (Not just for ebooks. For anything.)
Old 04-09-2010, 05:29 PM   #65
TallMomof2
Kindlephilia
Posts: 2,017
Karma: 1139255
Join Date: Nov 2007
Location: Snowpacolypse 2010
Device: Too many to count
I've had a number of eReader books that convert poorly: soft hyphens, curly quotes that don't translate, and all punctuation except "." and "," failing to translate.
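A minimal sketch of the kind of clean-up pass that can be run before converting, assuming the target format only copes with ASCII punctuation. The mapping table and the function name `clean_text` are made up for illustration and are not taken from any particular converter.

```python
SOFT_HYPHEN = "\u00ad"  # invisible hyphenation hint that some converters mishandle

# Hypothetical mapping: typographic punctuation -> plain ASCII stand-ins.
PUNCT_MAP = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2026": "...",  # ellipsis
}

def clean_text(text):
    """Drop soft hyphens and replace typographic punctuation with ASCII."""
    text = text.replace(SOFT_HYPHEN, "")
    return text.translate(str.maketrans(PUNCT_MAP))

print(clean_text("\u201cIt\u2019s con\u00adverted\u2026\u201d"))
```

This throws away typographic detail on purpose, so it only makes sense when the destination really cannot display the original characters.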
Old 04-09-2010, 08:32 PM   #66
WarnerYoung
Junior Member
Posts: 8
Karma: 10
Join Date: Apr 2010
Device: Dell Axim x50v
Quote:
Originally Posted by HarryT View Post
The only format conversion that tends to introduce textual errors is converting from PDF to something else, because a PDF file doesn't contain "text" at all, and the conversion tool has to "reconstruct" the page from what are basically graphical components.

Conversion between "text" formats - ePub, Lit, Mobi, etc - should be "lossless" as far as the actual text is concerned. You may, however, lose formatting in the process.
I'm not sure that's strictly true. It depends on how the PDF file was generated. Otherwise, a standard PDF reader wouldn't be able to let you select and copy its text, or search through the text in its files. Or am I missing something here?
Old 04-09-2010, 08:38 PM   #67
WarnerYoung
Junior Member
Posts: 8
Karma: 10
Join Date: Apr 2010
Device: Dell Axim x50v
Part of the reason for these typos is that, until recently (I would guess), many publishers didn't put that much effort into it. I talked to someone from one of the publishers at San Diego ComicCon last year (I want to say it was Penguin, but I don't remember for sure). The woman I talked to seemed pretty enthusiastic about ebooks, but she admitted that they were way behind on producing them because they had ONE person in the entire company who handled converting things into ebooks.

Another problem is the format. Not all publishers natively support all ebook formats. I emailed HarperCollins about some typos I found in the Septimus Heap ebooks from eReader.com. Their reply said they would look into it, but they would have to get eReader.com to take care of it, because they only supported ePub natively. The actual conversion to PDB format is apparently handled by eReader.com.
Old 04-10-2010, 04:15 AM   #68
Jellby
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Quote:
Originally Posted by WarnerYoung View Post
I'm not sure that's strictly true. It depends on how the PDF file was generated. Otherwise, a standard PDF reader wouldn't be able to let you select and copy its text, or search through the text in its files. Or am I missing something here?
Even with text-based PDFs, the PDF does not (necessarily) contain information about words, paragraphs, etc. The characters are easy to extract (unless there are funny fonts involved) but joining hyphenated words at the end of a line, putting spaces where they belong, removing page numbers and headers, dealing with footnotes, putting columns in the right order, detecting paragraphs, etc. is a different matter.
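To show why this is guesswork, here is a rough sketch of two of those steps (rejoining words split at a line end and dropping bare page numbers), assuming the text has already been extracted as a list of lines. The function name `reflow` and both heuristics are illustrative assumptions, not how any real converter works.

```python
import re

def reflow(lines):
    """Join raw extracted lines into paragraphs using simple heuristics."""
    paragraphs, current = [], ""
    for raw in lines:
        line = raw.strip()
        if re.fullmatch(r"\d+", line):
            continue  # a line holding only digits is assumed to be a page number
        if not line:
            # A blank line is assumed to end the current paragraph.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if current.endswith("-"):
            current = current[:-1] + line  # assume the hyphen split one word
        elif current:
            current += " " + line
        else:
            current = line
    if current:
        paragraphs.append(current)
    return paragraphs

sample = ["The conver-", "sion tool has to", "guess.", "", "42", "", "Next para-", "graph."]
print(reflow(sample))
```

Both heuristics can misfire: a genuine compound such as "well-" followed by "known" comes out as "wellknown", and a line of real text that happens to be just a number is silently dropped. That is one way run-together words can end up in a converted book.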
Old 04-10-2010, 08:08 AM   #69
kdawnbyrd
K. Dawn Byrd, Author
Posts: 17
Karma: 20000
Join Date: Apr 2010
Device: I read ebooks on my Blackberry
Typos in books

I'm reading "Fresh Kills," an Amazon Breakthrough Novel winner. I'm only about 25% in and I've already found about six places where they scrunched words together without a space.
Old 04-12-2010, 12:35 PM   #70
mr ploppy
Feral Underclass
Posts: 3,622
Karma: 26821535
Join Date: Jan 2010
Location: Yorkshire, tha noz
Device: 2nd hand paperback
Quote:
Originally Posted by Jellby View Post
Even with text-based PDFs, the PDF does not (necessarily) contain information about words, paragraphs, etc. The characters are easy to extract (unless there are funny fonts involved) but joining hyphenated words at the end of a line, putting spaces where they belong, removing page numbers and headers, dealing with footnotes, putting columns in the right order, detecting paragraphs, etc. is a different matter.
Mobipocket's reader/converter seems to do a better job of converting from PDF than Calibre, though it is still not perfect. I don't see how conversion would be responsible for all the spelling mistakes in ebooks though.
Old 04-12-2010, 03:04 PM   #71
Solitaire1
Samurai Lizard
Posts: 14,247
Karma: 66666666
Join Date: Nov 2009
Device: NookColor
I think that a factor contributing to typos in ebooks is that ebooks are still relatively new and the publishing industry is still adapting to the change. As it adapts, I think errors in ebooks will become less common.

One way errors could be reduced is to maintain a book's electronic source text in a form that can be easily formatted for many uses. It could be something like a plain text file with plain text markup to indicate intended formatting (such as [Bold text starts here]). The markup could be similar to HTML, but intended to be read by a human, rather than interpreted by a computer. A human takes the source text file and formats it in accordance with the instructions for a specific ebook format.

I think that ebooks are in the same statues as CDs and digital audio recording were in the early days. Due to the differences between analog and digital recording, it took the recording industry time to adjust to the change and I think that it will be the same with ebooks.
Old 04-12-2010, 03:18 PM   #72
WillAdams
Wizard
Posts: 1,234
Karma: 3350652
Join Date: Feb 2008
Device: Amazon Kindle Paperwhite (300ppi), Samsung Galaxy Book 12
Solitaire1 --- having a human involved goes counter to reducing errors.

The solution already exists (rich markup using XML schemas like DocBook or TEI); the problem is that the discipline to use such is contra-indicated by publishers' obsessions with short-term profits, editorial usage of Microsoft Word, and a lack of discipline / technical competence in typical design-school graduates.
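As a toy illustration of the single-source idea, the sketch below renders one marked-up sentence to both HTML and plain text. The two element names are borrowed loosely from DocBook, but the vocabulary, the tag mapping, and the function names `to_html` and `to_plain` are assumptions for this example; this is not a DocBook or TEI toolchain.

```python
import xml.etree.ElementTree as ET
from html import escape

# One source string, marked up once.
SOURCE = "<para>It was a <emphasis>very</emphasis> long night.</para>"

# Hypothetical mapping from source elements to HTML tags.
HTML_TAGS = {"para": "p", "emphasis": "em"}

def to_html(elem):
    """Render the element and its children as HTML."""
    tag = HTML_TAGS[elem.tag]
    inner = escape(elem.text or "") + "".join(
        to_html(child) + escape(child.tail or "") for child in elem
    )
    return f"<{tag}>{inner}</{tag}>"

def to_plain(elem):
    """Render the same source as plain text, dropping the markup."""
    return "".join(elem.itertext())

root = ET.fromstring(SOURCE)
print(to_html(root))
print(to_plain(root))
```

The words are typed once and every output is generated from them, so a correction made in the source reaches every format.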

William
Old 04-12-2010, 03:45 PM   #73
Worldwalker
Curmudgeon
Posts: 3,085
Karma: 722357
Join Date: Feb 2010
Device: PRS-505
Quote:
Originally Posted by Solitaire1 View Post
The markup could be similar to HTML, but intended to be read by a human, rather than interpreted by a computer.
Um, might I point out that HTML is, in fact, meant to be read by a human? Or that the Web existed long before FrontPage, let alone Dreamweaver? There are still plenty of us hand-coding HTML, and a whole lot more of us squinting at lousy auto-created code to fix it.

Is reading <b> or <strong> really that much harder than reading [boldface starts here]?

Quote:
A human takes the source text file and formats it in accordance with the instructions for a specific ebook format.
And thereby inserts errors.

You're talking about having a human act as a dumb processing system -- something which a computer can do much more efficiently. Having a human go along looking for [boldface starts here] and doing something with it isn't nearly as efficient as having a computer do the same, in terms of either accuracy or time.
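For what it's worth, the mechanical step takes only a few lines. This is a sketch that assumes a closing marker, `[boldface ends here]`, which nobody in the thread actually specified; the marker table and the function name `brackets_to_html` are hypothetical.

```python
# Hypothetical marker table: only the opening phrase appears in the thread;
# the closing phrase is assumed for illustration.
MARKERS = {
    "[boldface starts here]": "<b>",
    "[boldface ends here]": "</b>",
}

def brackets_to_html(text):
    """Replace each human-readable marker with its HTML tag."""
    for marker, tag in MARKERS.items():
        text = text.replace(marker, tag)
    return text

print(brackets_to_html("A [boldface starts here]loud[boldface ends here] noise."))
```

The program applies the same substitution to every marker in the book, which is exactly the repetitive work a person is likely to slip up on.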

On the other hand, you've provided a perfect example right here:

Quote:
I think that ebooks are in the same statues as CDs and digital audio recording were in the early days.
You wrote "statues" where you meant "status". Since we can assume that you know the difference between sculptures and condition, it probably happened because your fingers, running half on automatic as you thought a line ahead of where you were actually typing, inserted that extra 'e' and, since it made a legitimate word, no little red line appeared under it on your screen. That's the part that we need humans for. To a computer, since pearls can be in statues (The Adventure of the Six Napoleons), and since tourists can be in statues (the Statue of Liberty), why can't CDs and ebooks be in statues? That's where we need a human who can understand what it was you were trying to say, as distinct from what you actually wrote, and spot the typo.
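A toy sketch of why the red line never appears, assuming a spell checker that does nothing more than look each word up in a list. The word list and the function name `unknown_words` are stand-ins for illustration; real checkers are more elaborate, but a plain lookup behaves like this.

```python
# Toy word list standing in for a real dictionary (an assumption for illustration).
WORDS = {"ebooks", "are", "in", "the", "same", "status", "statues", "as", "cds"}

def unknown_words(sentence):
    """Return the words a naive dictionary lookup would flag."""
    tokens = sentence.lower().replace(".", "").split()
    return [t for t in tokens if t not in WORDS]

print(unknown_words("ebooks are in the same statues as CDs"))  # nothing flagged
print(unknown_words("ebooks are in the same statsu as CDs"))   # non-word is flagged
```

"statues" passes because it is a legitimate word; only a non-word like "statsu" gets caught.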

And that's what the problem is with the ebooks: not that computers can't read the formatting, or that a human could read it better, but that computers can't spot when something has gone wrong. They can read the formatting just fine; they can't understand the content. That's particularly true of OCR'd text, but it also comes up with things like soft hyphens, hard returns, and other things meant to format text ... for humans.

Quote:
Due to the differences between analog and digital recording, it took the recording industry time to adjust to the change and I think that it will be the same with ebooks.
Correct me if I'm wrong, but wasn't the recording industry using digital recording for masters long before digital formats became available on a consumer level? I don't think there was that big an adjustment.
Old 04-12-2010, 05:48 PM   #74
WarnerYoung
Junior Member
Posts: 8
Karma: 10
Join Date: Apr 2010
Device: Dell Axim x50v
Quote:
Originally Posted by Worldwalker View Post
You're talking about having a human act as a dumb processing system -- something which a computer can do much more efficiently. Having a human go along looking for [boldface starts here] and doing something with it isn't nearly as efficient as having a computer do the same, in terms of either accuracy or time.
But Solitaire1 does have a point. Precisely because computers can't always spot errors, you probably DO need a human involved somewhere in the process. For cases where the book is already in an electronic form and just needs to be converted to an ebook, maybe little to no human intervention is needed. But for other cases, such as old books that have to be scanned, it seems to me you'd pretty much have to have a human help proofread it.
Old 04-12-2010, 11:54 PM   #75
Worldwalker
Curmudgeon
Posts: 3,085
Karma: 722357
Join Date: Feb 2010
Device: PRS-505
I'm not saying you don't; quite the opposite, in fact.

He was suggesting that books that are already in electronic format be tagged with some quasi-HTML (the main difference, apparently, being using phrases instead of abbreviations, like [bold text starts here] instead of <b>) and humans do something when they see that tag, presumably selecting the relevant text and clicking a "bold" button, when converting it to some particular ebook format. Precisely why a human pushing buttons would be faster or more accurate than an HTML renderer eludes my comprehension.

That's not where the problems are. The problems are in bad scans that aren't even spell-checked, or spell-checked but not proofread. Those are what need to be gone over with a fine-toothed comb by a real human. And those, unfortunately, are also what is sold by publishers putting greed ahead of all else, including their own long-term profitability.

You'll notice that recent Project Gutenberg books -- basically, any that have been through Distributed Proofreaders -- are much superior in quality to most backlist commercial ebooks. And they are at times working with books hundreds of years old, victims of age and worn type. They're proofread by humans -- why not go do a page? That makes all the difference. That's where the human eye is needed: checking the scan against the original. Not in reading through a computer file and clicking "bold" every time you see [bold text starts here].