MobileRead Forums > E-Book General > General Discussions
Old 12-01-2012, 01:07 AM   #1
automa
Connoisseur
Posts: 93
Karma: 972092
Join Date: Jan 2012
Device: iPhone
Has anyone tried reading books from Archive.org or Google Books on a mobile device?

The free ePub and mobi books available for download from Google Books and Archive.org have spelling errors and spacing errors, and no table of contents.

How do you cope with that? Any solutions?
Old 12-01-2012, 09:32 AM   #2
BWinmill
Wizard
Posts: 1,757
Karma: 14842230
Join Date: Sep 2011
Device: Sony PRS-T1
Most of those books are scans with OCR, so errors are bound to pop up. Alternatives are places like gutenberg.org, which has a smaller selection but proofread texts, or formats that provide the scanned page images (PDF, DjVu).
Old 12-01-2012, 09:37 AM   #3
HarryT
eBook Enthusiast
Posts: 64,140
Karma: 42575773
Join Date: Nov 2006
Location: UK
Device: PW2, iPad Retina Mini, iPhone 4, MS Surface Pro, Kobo H2O, N7
The real value of "archive.org" (and Google Books, too) is its PDF page scans. They are a wonderful resource for scanned copies of out-of-copyright books for proofing an eBook against.
Old 12-01-2012, 08:05 PM   #4
automa
Connoisseur
Posts: 93
Karma: 972092
Join Date: Jan 2012
Device: iPhone
I like the PDF page scans too, but the problem I have with them is that you cannot read them just anywhere; you have to be in front of a computer.
Old 12-01-2012, 08:09 PM   #5
SteveEisenberg
Wizard
Posts: 1,737
Karma: 12261824
Join Date: Jun 2008
Location: Philadelphia USA
Device: Kindle Keyboard 3G
I read one such nineteenth-century eBook that's of local historical interest. I'd rather the book had been proofread, but I did finish it.
Old 12-01-2012, 11:43 PM   #6
Joykins
Wizard
Posts: 1,507
Karma: 6570458
Join Date: Jan 2010
Device: nook, kindle 4 NT, kindle PW2, iPhone
Of interest largely to proofreaders or scholars who cannot find the source materials elsewhere. Not suited to ereader devices.
Old 12-02-2012, 10:37 AM   #7
automa
Connoisseur
Posts: 93
Karma: 972092
Join Date: Jan 2012
Device: iPhone
Has anyone attempted to make a table of contents for one of these books? I think archive.org or Google Books should implement software that automatically generates a table of contents for all the files, including the ePub and mobi files.

What would be the best way to make a petition?
Old 12-02-2012, 10:41 AM   #8
HarryT
eBook Enthusiast
Posts: 64,140
Karma: 42575773
Join Date: Nov 2006
Location: UK
Device: PW2, iPad Retina Mini, iPhone 4, MS Surface Pro, Kobo H2O, N7
What do you need a petition for? It's easy enough to add your own TOC. But I think you misunderstand the purpose of these sites; the "raw" OCR that they do is never going to give you eBooks suitable for reading without manual cleaning up. That's not what they're there for. They're better regarded as resources for creating nice eBooks from, rather than giving you the finished article.

Why not create a few nicely-formatted and proof-read eBooks yourself and upload them to our library for everyone to enjoy?
Old 12-02-2012, 11:20 AM   #9
automa
Connoisseur
Posts: 93
Karma: 972092
Join Date: Jan 2012
Device: iPhone
I think it is time consuming for people to add a table of contents to each book one by one. It would be a lot faster if a bot generated the table of contents automatically.

The same goes for the random spaces between words: there are so many of them that deleting them by hand would be time consuming.

Editing the spelling errors would be understandable because only humans can do that while the other two I mentioned probably can be automated via programming.

Also, I think even the PDF books that you download from Google Books and Archive are hard to read because they don't give you a table of contents, which could probably be generated easily by a program instead of having the reader put one in manually.
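
A minimal sketch of the kind of automation being described, in Python. Everything in it is an assumption for illustration: the function names, the idea that the book is available as plain text, and the heading pattern (a line beginning with "CHAPTER" followed by a number or Roman numeral). It is not how Archive.org or Google Books process their files, and real OCR output would need far more care.

```python
import re

# Assumed heading pattern: "CHAPTER" plus an Arabic or Roman numeral,
# optionally followed by a period and a title. Real scans vary widely.
HEADING = re.compile(r"^(CHAPTER\s+[0-9IVXLC]+\b\.?.*)$", re.IGNORECASE)


def collapse_spaces(line: str) -> str:
    """Replace runs of spaces and tabs with one space and trim both ends."""
    return re.sub(r"[ \t]+", " ", line).strip()


def build_toc(text: str) -> list[tuple[int, str]]:
    """Return (line_number, heading) pairs for lines that look like chapter headings."""
    toc = []
    for number, raw in enumerate(text.splitlines(), start=1):
        match = HEADING.match(collapse_spaces(raw))
        if match:
            toc.append((number, match.group(1)))
    return toc


sample = "CHAPTER   I.  The  Arrival\nIt was  a dark night.\n\nCHAPTER II.\nMorning came."
for line_number, heading in build_toc(sample):
    print(line_number, heading)
```

Note that collapsing runs of spaces only handles extra spacing between words. A space that OCR inserted inside a word cannot be removed this way without a dictionary, and a book whose chapters are not labelled "CHAPTER" would produce an empty table of contents.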

Last edited by automa; 12-02-2012 at 11:22 AM.
Old 12-02-2012, 11:40 AM   #10
elcreative
Wizard
Posts: 2,884
Karma: 5875940
Join Date: Dec 2007
Device: PRS505, 600, 350, 650, Nexus 7, Note III, iPad 4 etc
Quote:
Originally Posted by automa View Post
I think it is time consuming for people to add a table of contents to each book one by one. It would be a lot faster if a bot generated the table of contents automatically.

The same goes for the random spaces between words: there are so many of them that deleting them by hand would be time consuming.

Editing the spelling errors would be understandable because only humans can do that while the other two I mentioned probably can be automated via programming.

Also, I think even the PDF books that you download from Google Books and Archive are hard to read because they don't give you a table of contents, which could probably be generated easily by a program instead of having the reader put one in manually.
You've just given a reason why there aren't ToCs and why there are errors - time consuming - the material on these sites is produced and uploaded for free, no charge from people using their own free time for the benefit of others and thus providing the basis for others to fine-tune/add ToCs etc using their free time... if you were paying for the content then sure, grounds for complaint but free and provided out of others' goodwill then if you want better then you provide it yourself from some of your time...
Old 12-02-2012, 03:35 PM   #11
automa
Connoisseur
Posts: 93
Karma: 972092
Join Date: Jan 2012
Device: iPhone
It will be less time consuming for everyone else if a few developers working for archive.org and Google Books implement the feature.

In other words it takes much less time for developers to do some programming to save a ton of time for all the readers.

They must have programmers working for them already, so the only reason I can think of for not implementing this feature is that they don't realize how useful it would be. All we have to do is tell them, and it takes numbers to get noticed. So we need a petition.
Old 12-02-2012, 04:17 PM   #12
elcreative
Wizard
Posts: 2,884
Karma: 5875940
Join Date: Dec 2007
Device: PRS505, 600, 350, 650, Nexus 7, Note III, iPad 4 etc
Quote:
Originally Posted by automa View Post
It will be less time consuming for everyone else if a few developers working for archive.org and Google Books implement the feature.

In other words it takes much less time for developers to do some programming to save a ton of time for all the readers.

They must have programmers working for them already, so the only reason I can think of for not implementing this feature is that they don't realize how useful it would be. All we have to do is tell them, and it takes numbers to get noticed. So we need a petition.
So true... but a few developers are not going to be the people who currently put stuff up for FREE... they'll want paying to do such stuff on a real time basis... and I doubt that those involved are ignorant of the state of things but these people are VOLUNTEERS doing it for love in spare time... if you don't like what they do then you can ignore it and go elsewhere or improve it yourself as a volunteer, I'm sure your donation of your free time would be much appreciated... If you want more professional output then go to professional outlets and pay for the work...

Also why should people giving freely of their time be expected to give even more just so other people don't have to invest some of their time... it's not as though you have a right to a shiny professional product because you paid for it... I suppose it's symptomatic that if you get something for nothing then some just expect even more for the same price...

Last edited by elcreative; 12-02-2012 at 04:21 PM.
Old 12-02-2012, 05:12 PM   #13
SteveEisenberg
Wizard
Posts: 1,737
Karma: 12261824
Join Date: Jun 2008
Location: Philadelphia USA
Device: Kindle Keyboard 3G
Quote:
Originally Posted by automa View Post
Editing the spelling errors would be understandable because only humans can do that while the other two I mentioned probably can be automated via programming.
Here is a popular program used to pre-process long optical character recognition (OCR) text output before proofreading:

http://home.comcast.net/~thundergnat/guiprep.html

As you can see from my link, an enormous amount of work has already gone into this.

You may have good ideas for additional features. However, if it were easy, or even middling difficult, I think it would already have been done. People are undoubtedly working on some of the hard stuff.

I'm not sure, but a lot of non-proofread texts at archive.org have perhaps already gone through this sort of software.
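
To give a flavour of what pre-processing OCR text before proofreading can involve, here is a small Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the two checks were picked as typical examples, and none of it is taken from the program linked above.

```python
import re


def rejoin_hyphenated(text: str) -> str:
    """Join words that were split by a hyphen at the end of a line."""
    # Assumption: every end-of-line hyphen is a line-break hyphen.
    # A genuine compound split at that point would lose its hyphen.
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def suspicious_tokens(text: str) -> list[str]:
    """Return tokens that mix letters and digits, a common OCR slip."""
    return [
        token
        for token in re.findall(r"\w+", text)
        if re.search(r"[A-Za-z]", token) and re.search(r"\d", token)
    ]


raw = "The scanned c0py was proof-\nread by volunteers in 1899."
cleaned = rejoin_hyphenated(raw)
print(cleaned)
print(suspicious_tokens(cleaned))
```

The second function only flags candidates for a human to look at. Deciding what the word should have been is still the proofreader's job.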
Old 12-02-2012, 10:47 PM   #14
cromag
Surfing the alpha waves ~
Posts: 4,964
Karma: 14177040
Join Date: Dec 2010
Location: New Jersey
Device: Jetbook Lite, Jetbook Mini, Netbook, and two Androids
Quote:
Originally Posted by automa View Post
It will be less time consuming for everyone else if a few developers working for archive.org and Google Books implement the feature.

In other words it takes much less time for developers to do some programming to save a ton of time for all the readers.

They must have programmers working for them already, so the only reason I can think of for not implementing this feature is that they don't realize how useful it would be. All we have to do is tell them, and it takes numbers to get noticed. So we need a petition.
There are already people who clean up these raw scans and OCRs -- they're at Project Gutenberg. Archive provides the source and they do it as quickly as they can, to keep these books from disappearing. PG provides a quality final product, often from these very sources.

No problem with letting them know what you'd like, of course, but I don't think Archive is interested in doing that.
Old 12-03-2012, 01:54 AM   #15
Billi
Wizard
Posts: 3,242
Karma: 13235489
Join Date: Jun 2009
Location: Berlin
Device: Cybook, iRex, PB, Onyx
Quote:
Originally Posted by automa View Post
I like the PDF page scans too, but the problem I have with them is that you cannot read them just anywhere; you have to be in front of a computer.
Why do you think so? There is a download option for the PDF scans, too.

All times are GMT -4.