12-20-2007, 03:28 PM | #1 |
Enthusiast
Posts: 32
Karma: 10
Join Date: Dec 2007
Device: Sony PRS505
|
Question about Formatting Downloaded e-Books
I am new to this forum and have not even received my new reader yet (it's under the tree!). But I have a question about formatting e-books that are downloaded from various sites. Some of the posts suggest that there may be numerous errors. Does this mean that I will have to go through the digitized version line by line, page by page, looking for errors? I am getting the Sony PRS 505. Is there a search feature that will help me check for errors?
Sorry if these questions have been asked before. But I want to get as much of a "head start" as I can creating my collection. |
12-20-2007, 04:05 PM | #2 | |
Wizard
Posts: 4,395
Karma: 1358132
Join Date: Nov 2007
Location: UK
Device: Palm TX, CyBook Gen3
|
Quote:
There is also the Distributed Proofreaders site which produces checked text - I'd visit there first, as the texts are likely to be cleaner. Another method is to download separate public domain texts of the same work, and run a text comparison to show where mismatches occur. You could then use your own judgement to select the correct version (usually the error is obvious), or refer to another version (if possible) to arbitrate. The tricky part is to find two genuinely different sources (rather than two that have the same origin, and the same inaccuracies). |
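For anyone who wants to try the comparison approach described above, here is a minimal sketch in Python using the standard `difflib` module. It assumes you already have two plain-text versions of the same work loaded as strings; the function name `find_mismatches` and the sample sentences are made up for illustration. It compares word by word so that different line wrapping in the two files does not show up as a mismatch.

```python
import difflib


def find_mismatches(text_a: str, text_b: str) -> list[tuple[str, str]]:
    """Return (version_a, version_b) pairs for every place the two texts differ."""
    # Split on whitespace so differences in line wrapping are ignored.
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False stops difflib from treating very common words as noise.
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    mismatches = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag != "equal":
            mismatches.append((" ".join(words_a[a1:a2]), " ".join(words_b[b1:b2])))
    return mismatches


if __name__ == "__main__":
    # Hypothetical sample text; in practice, read each version from a file.
    version_a = "It was the best of times, it was the worst of tirnes."
    version_b = "It was the best of times it was the worst of times."
    for a, b in find_mismatches(version_a, version_b):
        print(f"A: {a!r}  B: {b!r}")
```

The script only shows where the two versions disagree. Deciding which one is right is still a judgement call, and on a full-length book the comparison may take a while to run.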
|
|
12-20-2007, 04:40 PM | #3 | |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
Quote:
Dale |
|
12-21-2007, 01:15 PM | #4 | |
non-believer
Posts: 384
Karma: 713
Join Date: Dec 2007
Location: USA
Device: Cybook Gen 3, JetBook Lite
|
Quote:
I believe most of their books come from Gutenberg. In fact, I believe most of the free public domain books available on the internet come from Gutenberg. It's my understanding that Distributed Proofreaders works with Project Gutenberg. I can't say that I've ever sat down and compared a real book line by line with a Gutenberg text, but the Gutenberg books I've read seem fine to me. Very good, in fact. Last edited by cmbs; 12-21-2007 at 01:20 PM. |
|
12-21-2007, 03:42 PM | #5 |
Enthusiast
Posts: 32
Karma: 10
Join Date: Dec 2007
Device: Sony PRS505
|
It's a little disconcerting to think that e-books I download to my Reader may not be "clean," even from the pay sites! With e-book downloads sometimes running to hundreds or maybe thousands of pages (I like "big books"!), that will require a lot of work!
I guess I thought that, although this technology is in its infancy, it was pretty much the same as printed books (although I have seen errors in them as well). |
|
12-21-2007, 04:04 PM | #6 | |
Wizard
Posts: 4,395
Karma: 1358132
Join Date: Nov 2007
Location: UK
Device: Palm TX, CyBook Gen3
|
Quote:
I recently did a book comparison. The biggest problem with the Gutenberg text, compared to the pbook, was loads of missing commas. This made some of the sentences in the PG version pretty cumbersome, and rather detracted from the reading experience (imho). There were other issues such as missing quotes, unpaired brackets and the occasional missing word. I find these types of errors a bit of an irritant, but there weren't enough of them to make the PG text unreadable. The good news, regarding books posted here, is that readers can feed back any issues they spot to posters, so the versions here should get even better over time. |
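Missing quotes and unpaired brackets are among the few errors a script can spot without a second copy to compare against. Below is a rough sketch in Python, assuming a plain-text file with straight double quotes and blank lines between paragraphs; the function names are invented for this example. Treat every hit as a candidate only, since dialogue that runs over several paragraphs legitimately leaves the closing quote off until the end.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def check_paragraph(paragraph: str) -> list[str]:
    """Return a list of human-readable problems found in one paragraph."""
    problems = []
    stack = []  # opening brackets still waiting for their partner
    for ch in paragraph:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            if stack and stack[-1] == PAIRS[ch]:
                stack.pop()
            else:
                problems.append(f"unmatched closing {ch!r}")
    problems.extend(f"unclosed {ch!r}" for ch in stack)
    # Straight double quotes normally come in pairs within a paragraph.
    if paragraph.count('"') % 2 != 0:
        problems.append("odd number of double quotes")
    return problems


def check_text(text: str) -> list[tuple[int, list[str]]]:
    """Check each blank-line-separated paragraph; return (paragraph_number, problems)."""
    report = []
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        problems = check_paragraph(paragraph)
        if problems:
            report.append((number, problems))
    return report


if __name__ == "__main__":
    sample = 'He paused (briefly.\n\n"All is well," she said.'
    for number, problems in check_text(sample):
        print(f"Paragraph {number}: {', '.join(problems)}")
```

Missing commas and missing words cannot be caught this way; those still need a comparison against another copy.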
|
12-21-2007, 04:14 PM | #7 | |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
Quote:
Dale |
|
12-21-2007, 05:03 PM | #8 |
Reader
Posts: 11,504
Karma: 8720163
Join Date: May 2007
Location: South Wales, UK
Device: Sony PRS-500, PRS-505, Asus EEEpc 4G
|
I have posted a fair number of books on this site. When I notice errors, I correct them. Some of my sources have been from very poor scans and have needed checking line by line. Others have looked good. With these I just run them through a spell-checker. I usually check against a paper copy and have been known to restore censored passages and missing foreign accents. I indicate when I have done so.
Inevitably, there are some errors. However, people have been kind enough to PM me. When this happens, I put up a corrected version. I believe that my versions have fewer errors than, say, the Sony classics series, which is full of them. This is a problem with automatic conversions. |
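For those without a word processor to hand, the spell-checker pass can be roughly approximated with a short Python script. This is only a sketch, not what any poster here actually uses: `known_words` is a hypothetical set of lowercase words you would load from whatever word list you have, and `suspicious_words` is a name made up for the example. It lists unknown words rarest first, on the assumption that a word appearing once is more likely a scan error than a character name that appears on every page.

```python
import re
from collections import Counter

# Runs of letters (accented ones included), allowing internal apostrophes.
WORD_PATTERN = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def suspicious_words(text: str, known_words: set[str]) -> list[tuple[str, int]]:
    """Return (word, count) for words not in known_words, rarest first."""
    counts = Counter(WORD_PATTERN.findall(text.lower()))
    unknown = [(word, n) for word, n in counts.items() if word not in known_words]
    # Sort by count, then alphabetically, so one-off oddities come first.
    return sorted(unknown, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical tiny word list; a real one would be loaded from a file.
    known = {"asked", "for", "wanted", "more"}
    sample = "Oliver asked for rnore. Oliver wanted more."
    for word, count in suspicious_words(sample, known):
        print(f"{word}: {count}")
```

Like any spell-checker, this misses errors that happen to produce a real word, so it does not replace checking against a paper copy.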
12-21-2007, 11:55 PM | #9 |
Technogeezer
Posts: 7,233
Karma: 1601464
Join Date: Nov 2006
Location: Virginia, USA
Device: Sony PRS-500
|
The flow is actually from Distributed Proofreaders to Project Gutenberg. The texts at DP are in the process of being corrected. When they are finished they are moved to PG and released. The newer texts contain far fewer errors than the older texts. They do issue corrections from time to time on the older texts.
As for the posting of many books in one day, Patricia and I (among others) are reviewing what we have posted before and making those texts available in eBookwise and Mobipocket editions where before we had posted only the Sony LRF and perhaps the Mobipocket. Just because we may post a large number of books in one day does not mean that we have not taken the time to work on the text prior to posting. I will often gather a group of what I plan to post, review and correct as needed, and convert to the final formats before I even start to post. What I have done at other times is to get the material ready and then post as I create the final formats. It may take me several days to post an entire series such as Andrew Lang's Fairy Stories of Many Colors (12 books, each in 3 formats, posted from 27 Sept 2007 to 01 Oct 2007). Others, like the Harvard Classics, took months of research to find the original texts, work with image copies, convert through OCR, correct the OCR, look at versions, etc. It was worth the effort. This series was posted over a period of months as each volume was completed.
Patricia has corrected and produced a wonderful collection of Voltaire. The range and depth of her ebooks is staggering. Harry has made some of the best ebooks I have ever read. He spends a great deal of time on each volume making it just right. His Sherlock Holmes Omnibus has stayed on my Reader since I first loaded it the day he posted it. In other volumes he has added material that you would not expect. For example, in his version of She in the Haggard Anthology Volume 2, the original PG text identified by name the Greek letters on a piece of pottery. Harry inserted the correct Ionic Greek letters, which made the story that much more alive.
Many of the classics from the Sony store are poor quality texts that are machine converted to BBeB format, often with large white space margins and type so small that it is hard to read even at the "L" setting. (This is also true for many of the current bestsellers offered by Sony.)
As Patricia said, if we have errors in our books, tell us and we will do what we can to correct them. |
12-22-2007, 01:41 AM | #10 |
Addict
Posts: 223
Karma: 7385
Join Date: Aug 2007
Location: Central California
Device: Handspring Visor, Sony PRS-500 Reader
|
Formatting the books for uploading to this site is very time consuming. I can't begin to express how appreciative I am of the work done by Patricia, HarryT, RWood, Dr. Drib, Nate the Great, tsgreer, JSWolf, BenG and all the others. The quality of the books posted on this site is outstanding compared to others I've seen. Are they perfect? No, but then neither are some of the paper books that I've read. Most of the errors I've seen in ebooks from this site are easily passed by and quickly forgotten.
|
12-22-2007, 03:49 AM | #11 |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
The modern PG texts are pretty good. The problem exists mainly with the older ones, which pre-date scanners, and were typed in by hand. Unfortunately this includes many of the most popular classics, which were obviously the books that people did first.
E.g., I'm currently reading my version of "Oliver Twist", which comes from the (very old) PG version. It's full of errors - on average 1 or 2 per page. I'm marking them all as I read (and comparing to a printed version in cases of doubt). When I've finished reading the book, I'll correct all the errors I've found, and post a new edition of the book. It won't be perfect, but it'll be a lot better than it was. If subsequent readers inform me about the errors that they in turn find, the eBook will carry on getting better and better. |
12-22-2007, 08:05 AM | #12 |
Enthusiast
Posts: 32
Karma: 10
Join Date: Dec 2007
Device: Sony PRS505
|
Wow! I commend all of you for your diligence and commitment. I'm not sure I will have the patience to do what you do, but I thank you all in advance! I suppose it will depend on how "annoying" the errors are. I used to teach high school English (in the Bronx no less!), so I am used to seeing a lot of spelling and grammatical errors (LOL).
I am practically jumping out of my skin with anticipation of opening my Reader and working on it. I guess it will be a constant "work in progress." |
|