Old 12-20-2007, 03:28 PM   #1
rozie123
Enthusiast
Posts: 32
Karma: 10
Join Date: Dec 2007
Device: Sony PRS505
Question about Formatting Downloaded e-Books

I am new to this forum and have not even received my new reader yet (it's under the tree!). But I have a question about formatting e-books that are downloaded from various sites. Some of the posts suggest that there may be numerous errors. Does this mean that I will have to go through the digitized version line by line, page by page, looking for errors? I am getting the Sony PRS 505. Is there a search feature that will help me check for errors?

Sorry if these questions have been asked before. But I want to get as much of a "head start" as I can creating my collection.
Old 12-20-2007, 04:05 PM   #2
Sparrow
Wizard
Posts: 4,395
Karma: 1358132
Join Date: Nov 2007
Location: UK
Device: Palm TX, CyBook Gen3
Quote:
Originally Posted by rozie123 View Post
Does this mean that I will have to go through the digitized version line by line, page by page, looking for errors? I am getting the Sony PRS 505. Is there a search feature that will help me check for errors?
I'm quite new to this too, but it seems most people source from Project Gutenberg, where the texts may have some inaccuracies resulting from the OCR process.
There is also the Distributed Proofreaders site which produces checked text - I'd visit there first, as the texts are likely to be cleaner.

Another method is to download separate public domain texts of the same work, and run a text comparison to show where mismatches occur. You could then use your own judgement to select the correct version (usually the error is obvious), or refer to another version (if possible) to arbitrate.
The tricky part is to find two genuinely different sources (rather than two that have the same origin, and the same inaccuracies).
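
The comparison described above can be sketched in a few lines of Python using the standard-library `difflib` module. This is only an illustration under stated assumptions: the function name `find_mismatches` and the two sample strings are made up for the example, and a real run would load two full text files from genuinely different sources.

```python
import difflib

def find_mismatches(text_a, text_b):
    """Return (words_from_a, words_from_b) pairs wherever the two texts differ."""
    # Compare word by word so that differences in line wrapping are ignored.
    words_a = text_a.split()
    words_b = text_b.split()
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
    mismatches = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag != "equal":
            mismatches.append((" ".join(words_a[a1:a2]), " ".join(words_b[b1:b2])))
    return mismatches

# Hypothetical sample data: source A contains a typical OCR slip ("rn" for "m").
source_a = "It was the best of tirnes, it was the worst of times"
source_b = "It was the best of times, it was the worst of times"

for from_a, from_b in find_mismatches(source_a, source_b):
    print(f"A: {from_a!r}  B: {from_b!r}")
```

Each reported pair still needs a human decision about which reading is right, and if both files descend from the same transcription they will agree on the same mistakes and nothing will be flagged.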
Old 12-20-2007, 04:40 PM   #3
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Originally Posted by Sparrow View Post
I'm quite new to this too, but it seems most people source from Project Gutenberg, where the texts may have some inaccuracies resulting from the OCR process.
There is also the Distributed Proofreaders site which produces checked text - I'd visit there first, as the texts are likely to be cleaner.

Another method is to download separate public domain texts of the same work, and run a text comparison to show where mismatches occur. You could then use your own judgement to select the correct version (usually the error is obvious), or refer to another version (if possible) to arbitrate.
The tricky part is to find two genuinely different sources (rather than two that have the same origin, and the same inaccuracies).
The best source for these kinds of eBooks is right here at MobileRead. They are generally hand checked, prettied up, and already formatted for your Reader.

Dale
Old 12-21-2007, 01:15 PM   #4
cmbs
non-believer
Posts: 384
Karma: 713
Join Date: Dec 2007
Location: USA
Device: Cybook Gen 3, JetBook Lite
Quote:
Originally Posted by DaleDe View Post
The best source for these kinds of eBooks is right here at MobileRead. They are generally hand checked, prettied up, and already formatted for your Reader.

Dale
I most sincerely doubt that people who are posting 5 or more ebooks daily are actually comparing them word for word to a real book to correct mistakes before finalizing and uploading. Or maybe I'm misunderstanding what you mean by "hand checked."

I believe most of their books come from Gutenberg. In fact, I believe most of the free public domain books available on the internet come from Gutenberg.

It's my understanding that the Distributed Proofreaders works with Project Gutenberg. I can't say that I've ever sat down and compared a real book line by line with a Gutenberg text, but the Gutenberg books I've read seem fine to me. Very good in fact.

Last edited by cmbs; 12-21-2007 at 01:20 PM.
Old 12-21-2007, 03:42 PM   #5
rozie123
Enthusiast
Posts: 32
Karma: 10
Join Date: Dec 2007
Device: Sony PRS505
It's a little disconcerting to think that e-books I download to my Reader may not be "clean," even from the pay sites! With e-book downloads sometimes running to hundreds or maybe thousands of pages (I like "big books"!), checking them will require a lot of work!

I guess I thought that, although this technology is in its infancy, it was pretty much the same as printed books (although I have seen errors in them as well).
Old 12-21-2007, 04:04 PM   #6
Sparrow
Wizard
Posts: 4,395
Karma: 1358132
Join Date: Nov 2007
Location: UK
Device: Palm TX, CyBook Gen3
Quote:
Originally Posted by cmbs View Post
... I can't say that I've ever sat down and compared a real book line by line with a Gutenberg text, but the Gutenberg books I've read seem fine to me. Very good in fact.

I recently did a book comparison. The biggest problem with the Gutenberg text, compared to the pbook, was loads of missing commas. This made some of the sentences in the PG version pretty cumbersome, and rather detracted from the reading experience (imho).

There were other issues such as missing quotes, unpaired brackets and the occasional missing word. I find these types of errors a bit of an irritant, but there weren't enough of them to make the PG text unreadable.
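
Missing quotes and unpaired brackets are the kind of error a small script can flag automatically. The following is a rough sketch, not an established tool: the function name `find_unbalanced` is invented for the example, it assumes paragraphs are separated by blank lines, and it only looks at straight double quotes and the three common bracket pairs.

```python
def find_unbalanced(text):
    """Return (paragraph_number, problem) pairs for unpaired brackets or quotes."""
    closers = {")": "(", "]": "[", "}": "{"}
    problems = []
    # Assumption: paragraphs are separated by one blank line.
    for number, paragraph in enumerate(text.split("\n\n"), start=1):
        stack = []
        for char in paragraph:
            if char in closers.values():
                stack.append(char)
            elif char in closers:
                if stack and stack[-1] == closers[char]:
                    stack.pop()
                else:
                    problems.append((number, f"unmatched {char}"))
        for char in stack:
            problems.append((number, f"unclosed {char}"))
        # An odd count of straight double quotes suggests a missing quote mark.
        if paragraph.count('"') % 2:
            problems.append((number, "odd number of double quotes"))
    return problems

# Hypothetical sample: the first paragraph lacks a closing bracket and quote.
sample = 'He said, "Come in (and sit down.\n\n"All is well," she replied (quietly).'
for number, problem in find_unbalanced(sample):
    print(f"paragraph {number}: {problem}")
```

Expect some false alarms: speech that runs over several paragraphs is conventionally printed without a closing quote on each paragraph, so every flagged paragraph still needs a look by eye.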

The good news, regarding books posted here, is that readers can report any issues they spot to the posters, so the versions here should get even better over time.
Old 12-21-2007, 04:14 PM   #7
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Originally Posted by cmbs View Post
I most sincerely doubt that people who are posting 5 or more ebooks daily are actually comparing them word for word to a real book to correct mistakes before finalizing and uploading. Or maybe I'm misunderstanding what you mean by "hand checked."

I believe most of their books come from Gutenberg. In fact, I believe most of the free public domain books available on the internet come from Gutenberg.

It's my understanding that the Distributed Proofreaders works with Project Gutenberg. I can't say that I've ever sat down and compared a real book line by line with a Gutenberg text, but the Gutenberg books I've read seem fine to me. Very good in fact.
What they are doing is reading the book themselves and fixing the problems they find as they read. Where a problem can't be resolved that way, a comparison with the original may be called for, but if the text reads OK then it is probably fine for everyone except the scholar who is studying the book rather than just reading it. Note that in some cases a book is built first so that the poster can read it, and once it has been read you will find a new version on the site. They also take input from readers.


Dale
Old 12-21-2007, 05:03 PM   #8
Patricia
Reader
Posts: 11,504
Karma: 8720163
Join Date: May 2007
Location: South Wales, UK
Device: Sony PRS-500, PRS-505, Asus EEEpc 4G
I have posted a fair number of books on this site. When I notice errors, I correct them. Some of my sources have been from very poor scans and have needed checking line by line. Others have looked good. With these I just run them through a spell-checker. I usually check against a paper copy and have been known to restore censored passages and missing foreign accents. I indicate when I have done so.
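
A spell-checker pass of the kind mentioned above boils down to flagging every word that is missing from a word list. Here is a minimal sketch, assuming a plain list of known words is available; the function name `flag_unknown_words` and the tiny word list are placeholders for the example, not part of any real spell-checker.

```python
import re
from collections import Counter

def flag_unknown_words(text, known_words):
    """Count the words in text that do not appear in known_words (case-insensitive)."""
    known = {word.lower() for word in known_words}
    # Words are runs of letters, optionally joined by apostrophes (e.g. "don't").
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    return Counter(w.lower() for w in words if w.lower() not in known)

# Hypothetical word list and scanned sentence with two kinds of OCR slip.
known_words = ["the", "modern", "world", "and", "arid"]
scanned = "The rnodern world arid the rnodern age"

for word, count in flag_unknown_words(scanned, known_words).most_common():
    print(f"{word}: {count}")
```

Note that "arid" (a misread "and") passes unflagged because it is a real word, which is one reason a spell-checker alone is not enough for poor scans and a check against a paper copy is still needed.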

Inevitably, there are some errors. However, people have been kind enough to PM me. When this happens, I put up a corrected version.

I believe that my versions have fewer errors than, say, the Sony classics series, which is full of them. This is a problem with automatic conversions.
Old 12-21-2007, 11:55 PM   #9
RWood
Technogeezer
Posts: 7,233
Karma: 1601464
Join Date: Nov 2006
Location: Virginia, USA
Device: Sony PRS-500
The flow is actually from Distributed Proofreaders to Project Gutenberg. The texts at DP are in the process of being corrected. When they are finished, they are moved to PG and released. The newer texts contain far fewer errors than the older texts. PG does issue corrections to the older texts from time to time.

As for the posting of many books in one day, Patricia and I (among others) are reviewing what we have posted before and making those texts available in eBookwise and Mobipocket editions where before we had posted only the Sony LRF and perhaps the Mobipocket.

Just because we may post a large number of books in one day does not mean that we have not taken the time to work on the text prior to posting. I will often gather a group of what I plan to post, review and correct as needed, and convert to the final formats before I even start to post. What I have done at other times is to get the material ready and then post as I create the final formats. It may take me several days to post an entire series, such as Andrew Lang's Fairy Stories of Many Colors (12 books, each in 3 formats, posted from 27 Sept 2007 to 01 Oct 2007).

Others like the Harvard Classics took months of research to find the original texts, work with image copies, convert through OCR, correct OCR, look at versions, etc. It was worth the effort. This series was posted over a period of months as each volume was completed.

Patricia has corrected and produced a wonderful collection of Voltaire. The range and depth of her ebooks is staggering. Harry has made some of the best ebooks I have ever read. He spends a great deal of time on each volume making it just right. His Sherlock Holmes Omnibus has stayed on my Reader since I first loaded it the day he posted it. In other volumes he has added material that you would not expect. For example, in his version of She in the Haggard Anthology Volume 2, the original PG text identified by name the Greek letters on a piece of pottery. Harry inserted the correct Ionic Greek letters, which made the story that much more alive.

Many of the classics from the Sony store are poor-quality texts that are machine converted to BBeB format, often with large white-space margins and type so small that it is hard to read even at the "L" setting. (This is also true of many of the current bestsellers offered by Sony.)

As Patricia said, if we have errors in our books, tell us and we will do what we can to correct them.
Old 12-22-2007, 01:41 AM   #10
Stanart
Addict
Posts: 223
Karma: 7385
Join Date: Aug 2007
Location: Central California
Device: Handspring Visor, Sony PRS-500 Reader
Formatting the books for uploading to this site is very time consuming. I can't begin to express how appreciative I am of the work done by Patricia, HarryT, RWood, Dr. Drib, Nate the Great, tsgreer, JSWolf, BenG and all the others. The quality of the books posted on this site is outstanding compared to others I've seen. Are they perfect? No, but then neither are some of the paper books that I've read. Most of the errors I've seen in ebooks from this site are easily passed by and quickly forgotten.
Old 12-22-2007, 03:49 AM   #11
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
The modern PG texts are pretty good. The problem exists mainly with the older ones, which pre-date scanners, and were typed in by hand. Unfortunately this includes many of the most popular classics, which were obviously the books that people did first.

E.g., I'm currently reading my version of "Oliver Twist", which comes from the (very old) PG version. It's full of errors, on average 1 or 2 per page. I'm marking them all as I read (and comparing to a printed version in cases of doubt). When I've finished reading the book, I'll correct all the errors I've found, and post a new edition of the book. It won't be perfect, but it'll be a lot better than it was. If subsequent readers inform me about the errors that they in turn find, the eBook will carry on getting better and better.
Old 12-22-2007, 08:05 AM   #12
rozie123
Enthusiast
Posts: 32
Karma: 10
Join Date: Dec 2007
Device: Sony PRS505
Wow! I commend all of you for your diligence and commitment. I'm not sure I will have the patience to do what you do, but I thank you all in advance! I suppose it will depend on how "annoying" the errors are. I used to teach high school English (in the Bronx no less!), so I am used to seeing a lot of spelling and grammatical errors (LOL).

I am practically jumping out of my skin with anticipation of opening my Reader and working on it. I guess it will be a constant "work in progress."