Old 10-29-2008, 10:21 AM   #16
maggotb0y
Connoisseur
Posts: 84
Karma: 1166
Join Date: Apr 2007
Location: New Jersey, Outside of Philadelphia
Device: Sony Reader
Keep in mind that another of Google's arms is working hard on the OCRopus project, which is designed to improve the scanning of books. Most commercial OCR systems available today are designed for scanning business documents. They make no secret that the driver for this project is the Google Books initiative.

I'm sure there will still be flaws in the scans (they'll never hit 100%), but the improvements will be very welcome, and if they release non-DRM'ed scans, we'll be able to fix the scanning artifacts and re-convert for a reading device. This is a big improvement over the amazing number of flaws I find in DRMed eBooks, which I cannot correct and which will continue to drive me crazy each time I read a book (I'm a chronic re-reader).
Old 10-29-2008, 10:27 AM   #17
Charbax
Addict
Posts: 203
Karma: 550683
Join Date: Mar 2007
Yup, Google should provide collaborative manual correction of OCR, basically using an E-Ink reader that has an annotation feature like the Kindle or, even better, one that takes a USB keyboard or a full-sized foldable Bluetooth keyboard and can be propped up on a stand on the table.

Google could also use this same collaborative manual corrections system for translations.

When millions of users get to participate in an automatic collaborative way, you can quickly get the full OCR and translations done.

In the same way as a Wiki, you get to log exactly which user corrected which words; this way you can hold users accountable and automatically block any attempt at vandalising texts. You can also automatically compensate people for the work they do correcting OCR and machine translation errors.
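
To make the wiki-style logging idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical (the `Correction` and `CorrectionLog` names, the revert limit used to block a user); it is not anything Google has described, just one way the per-word log, the automatic block and the per-user credit could fit together.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    """One logged edit: who changed which word on which page (hypothetical schema)."""
    user: str
    page: int
    word_index: int
    old: str
    new: str


class CorrectionLog:
    """Log of word-level corrections with a simple vandalism block."""

    def __init__(self, revert_limit=3):
        self.entries = []
        self.reverts = defaultdict(int)  # user -> how many of their edits were reverted
        self.revert_limit = revert_limit  # assumed policy: block after this many reverts

    def is_blocked(self, user):
        return self.reverts[user] >= self.revert_limit

    def record(self, user, page, word_index, old, new):
        """Store a correction unless the user is blocked; return True if stored."""
        if self.is_blocked(user):
            return False
        self.entries.append(Correction(user, page, word_index, old, new))
        return True

    def revert(self, entry):
        """Drop a bad edit and count it against its author."""
        self.entries.remove(entry)
        self.reverts[entry.user] += 1

    def credits(self):
        """Surviving edits per user, a possible basis for compensation."""
        totals = defaultdict(int)
        for entry in self.entries:
            totals[entry.user] += 1
        return dict(totals)
```

Because every entry carries the user name, both the block and the payout fall out of the same log; a real system would also need review of who is allowed to revert.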

Last edited by Charbax; 10-29-2008 at 10:30 AM.
Old 10-29-2008, 10:31 AM   #18
jharker
Developer
Posts: 345
Karma: 3473
Join Date: Apr 2007
Location: Brooklyn, NY, USA
Device: iRex iLiad v1, Blackberry Tour, Kindle DX, iPad.
Quote:
Originally Posted by TallMomof2 View Post
A scanned page image is essentially a photograph or picture of the page. Like a picture it is not seen as text (characters) by the ebook program. What you have to do is run the scanned pages through an OCR program to convert the images to text so that it is treated as text instead of an image. The "gotcha" is that conversion usually results in many errors that require a human to edit the text. I can't tell you how many ebooks I've read that are poorly converted scanned pages. And these are from legitimate publishers.
This is the reason I don't buy ebooks any more. Now I mostly read free books or books out of copyright. I once spent about $30 on ebooks and they all had major errors or flaws that were clearly OCR-related. Many would have been fixed by a spellcheck program or easily noticed by a human proofreader. One book was missing all of its quotation marks. It's going to be a while before I trust e-publishers enough to give them money again.

Hopefully Google can get OCRopus running well enough to make decent ebooks available...
Old 10-29-2008, 10:35 AM   #19
DMcCunney
New York Editor
Posts: 6,384
Karma: 16540415
Join Date: Aug 2007
Device: PalmTX, Pocket eDGe, Alcatel Fierce 4, RCA Viking Pro 10, Nexus 7
Quote:
Originally Posted by zerospinboson View Post
an english translation?
Yes. As it happens, I found a much more usable HTML rendition here.
______
Dennis
Old 10-29-2008, 10:40 AM   #20
DMcCunney
New York Editor
Posts: 6,384
Karma: 16540415
Join Date: Aug 2007
Device: PalmTX, Pocket eDGe, Alcatel Fierce 4, RCA Viking Pro 10, Nexus 7
Quote:
Originally Posted by TallMomof2 View Post
A scanned page image is essentially a photograph or picture of the page. Like a picture it is not seen as text (characters) by the ebook program. What you have to do is run the scanned pages through an OCR program to convert the images to text so that it is treated as text instead of an image. The "gotcha" is that conversion usually results in many errors that require a human to edit the text. I can't tell you how many ebooks I've read that are poorly converted scanned pages. And these are from legitimate publishers.
Precisely. No OCR program is perfect. Ligatures are a special problem, and multi-column formats can throw the OCR software included with things like home scanners. Higher end professional gear does better, but it costs, and there will still be editing and proofreading to get good copy.
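
As a small illustration of the ligature problem, here is a sketch in Python that expands the Latin ligature code points (U+FB00 to U+FB04) back into plain letters. It only covers the case where the OCR output actually contains the single ligature character; when the engine misreads the glyph as some other letter, this does nothing and a human or a dictionary check is still needed. The function name is made up for the example.

```python
# Latin typographic ligatures and their plain-letter expansions.
LIGATURES = str.maketrans({
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
})


def expand_ligatures(text):
    """Replace ligature code points with the letters they stand for."""
    return text.translate(LIGATURES)
```

After this pass, words such as "office" or "find" become searchable and spell-checkable again.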

The publishers whose lacking work you read skimped on or eliminated the editing step to cut costs.

(And that's just for texts in the Roman alphabet. If the original book was in something else, all bets are off.)
______
Dennis
Old 10-29-2008, 10:51 AM   #21
Daithi
Publishers are evil!
Posts: 2,418
Karma: 36205264
Join Date: Mar 2008
Location: Rhode Island
Device: Various Kindles
After reading the Authors Guild FAQ there were a few things I found interesting.

1) Books that are in print will NOT be available to even preview, unless the rightsholder decides to participate in the program.

2) Books that are not in print will be available for preview, unless the rightsholder decides not to participate.

3) Only the Preview is available to us for FREE. If we want to view the whole book we need to pay for it, unless it is out of copyright or we are accessing the books through a library that is subscribed to the Google service. At this point in time there is no monthly subscription fee available to the general public.

4) How much will we need to pay to access a full view of the books? They are not saying.

Personally, I'm happy to see that more books are being made available to us. I also like the idea a previous poster made about Google offering their own eReader. Something like the Plastic Logic device, which can display full page PDFs, would be pretty cool. I really don't want to read PDF books on a backlit computer screen.

It may also be possible for Google to provide just the text of the book at some point. As others on this forum have already pointed out, the Google books are PDF books, and a PDF is generally just a scanned page. However, a PDF can also contain the machine readable text within the PDF. And most (or all) the Google PDFs already contain this text. This is how they know which books, and book pages, to display to you when you do a search. It is also how they know which words to highlight when they display the pages. At least that is my understanding.
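
A rough sketch in Python of how a hidden text layer could drive search and highlighting. This is an assumption about the general idea, not Google's actual implementation: `pages` is a hypothetical list holding the OCR text of each page, and highlighting is shown by wrapping matches in square brackets rather than by drawing on the page image.

```python
import re


def find_pages(pages, query):
    """Return (page_number, highlighted_text) for every page containing the query word."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    hits = []
    for number, text in enumerate(pages, start=1):
        if pattern.search(text):
            highlighted = pattern.sub(lambda m: "[" + m.group(0) + "]", text)
            hits.append((number, highlighted))
    return hits
```

The same text layer that makes the search work is also what would be exported as "just the text", OCR errors included.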

Last edited by Daithi; 10-29-2008 at 10:54 AM.
Old 10-29-2008, 11:45 AM   #22
athlonkmf
Guru
Posts: 714
Karma: 1014039
Join Date: May 2007
Device: Sony PRS-500, Sony PRS-505, Kindle 3, Sony PRS350, iPad 64GB
Quote:
Originally Posted by TallMomof2 View Post
A scanned page image is essentially a photograph or picture of the page. Like a picture it is not seen as text (characters) by the ebook program. What you have to do is run the scanned pages through an OCR program to convert the images to text so that it is treated as text instead of an image. The "gotcha" is that conversion usually results in many errors that require a human to edit the text. I can't tell you how many ebooks I've read that are poorly converted scanned pages. And these are from legitimate publishers.

Quote:
Originally Posted by Dave Berk View Post
Google should turn to a community-based collaborative approach, where anyone who contributes over a certain quota gets time-limited access to the whole archive.
Quote:
Originally Posted by Charbax View Post

Google could also use this same collaborative manual corrections system for translations.

When millions of users get to participate in an automatic collaborative way, you can quickly get the full OCR and translations done.


That's why we've got a project like recaptcha

http://recaptcha.net/learnmore.html
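
The general idea behind that kind of project is that the same unreadable word is shown to several people and a reading is only accepted once enough of them agree. A minimal sketch in Python; the function name and the thresholds (`min_votes`, `min_share`) are assumptions for illustration, not reCAPTCHA's real parameters.

```python
from collections import Counter


def accept_transcription(answers, min_votes=3, min_share=0.75):
    """Return the agreed reading of a word, or None if there is no clear consensus."""
    # Normalise case and whitespace so "Morning " and "morning" count as the same vote.
    cleaned = [a.strip().lower() for a in answers if a.strip()]
    if len(cleaned) < min_votes:
        return None
    word, count = Counter(cleaned).most_common(1)[0]
    return word if count / len(cleaned) >= min_share else None
```

Lower-casing the votes makes agreement easier to reach but loses capitalisation, so a real pipeline would have to restore that separately.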
Old 10-29-2008, 12:31 PM   #23
DMcCunney
New York Editor
Posts: 6,384
Karma: 16540415
Join Date: Aug 2007
Device: PalmTX, Pocket eDGe, Alcatel Fierce 4, RCA Viking Pro 10, Nexus 7
Quote:
Originally Posted by Daithi View Post
After reading the Authors Guild FAQ there were a few things I found interesting.

1) Books that are in print will NOT be available to even preview, unless the rightsholder decides to participate in the program.

2) Books that are not in print will be available for preview, unless the rightsholder decides not to participate.
No surprise. The lawsuit was in part brought by authors stung because Google hadn't specifically secured their permission to make their texts available.

Google was using an "opt out" model. The Authors Guild wanted it "opt in".

Quote:
3) Only the Preview is available to us for FREE. If we want to view the whole book we need to pay for it, unless it is out of copyright or we are accessing the books through a library that is subscribed to the Google service. At this point in time there is no monthly subscription fee available to the general public.
I don't expect to see a general public subscription fee. Subscriptions for libraries are one thing. Offering it to the general public is more complex.

Quote:
4) How much will we need to pay to access a full view of the books? They are not saying.
Probably because it hasn't been decided, and may vary by book, depending on the terms required by the rights holder. (I can see some rights holders having a wildly optimistic view of how much anyone will pay to view their book...)

Quote:
Personally, I'm happy to see that more books are being made available to us. I also like the idea a previous poster made about Google offering their own eReader. Something like the Plastic Logic device, which can display full page PDFs, would be pretty cool. I really don't want to read PDF books on a backlit computer screen.
I can't see Google getting into the consumer hardware business. I can see Google offering the required software under an open source license, like the Android cell phone OS, that someone else can put into hardware to offer a reader.

Quote:
It may also be possible for Google to provide just the text of the book at some point. As others on this forum have already pointed out, the Google books are PDF books, and a PDF is generally just a scanned page. However, a PDF can also contain the machine readable text within the PDF. And most (or all) the Google PDFs already contain this text. This is how they know which books, and book pages, to display to you when you do a search. It is also how they know which words to highlight when they display the pages. At least that is my understanding.
You can get the text of the book now, but it's not terribly useful without a lot of editing and cleanup.
______
Dennis
Old 10-29-2008, 01:17 PM   #24
cjp
Zealot
Posts: 106
Karma: 10
Join Date: Oct 2008
Location: Saint Louis, Missouri USA
Device: HP Slate 500, Sony PRS-505, Kindle 3G+Wi-Fi, Droid
Quote:
Originally Posted by DMcCunney View Post
Even the ones you can read all of will be problematic for download/convert. They have PDFs which are essentially scanned page images, and a "View as plain text" option, but the plain text would require a lot of editing and cleanup. (I grabbed a copy of Max Weber's _The Protestant Ethic and the Spirit of Capitalism_ with a view to converting, and gave up when I saw the work required.)
______
Dennis

The workaround I've been using for some time now is to take the PDF scanned images and convert them using Adobe Pro. Although it produces a large file, you can then easily (and I mean easily and quickly) convert or copy it into another document in the output format of your choice. (Since I already used Adobe Pro extensively to generate PDFs, this was not an added expense for me.)
Old 10-29-2008, 01:31 PM   #25
nekokami
fruminous edugeek
Posts: 6,745
Karma: 551260
Join Date: Oct 2006
Location: Northeast US
Device: iPad, eBw 1150
Quote:
Originally Posted by DMcCunney View Post
I can't see Google getting into the consumer hardware business. I can see Google offering the required software under an open source license, like the Android cell phone OS, that someone else can put into hardware to offer a reader.
They could do this, especially if they wanted to offer some interactive services, e.g. the discussion features in DotReader. (In fact, Google might want to look at acquiring the DotReader project-- I think they could do some very interesting things with it. Anyone know anybody at Google to pass that suggestion along to?)

Or Google could just go with ePub, as someone has already suggested.

BBC coverage of the deal here: http://news.bbc.co.uk/2/hi/technology/7695507.stm
Old 10-30-2008, 12:12 AM   #26
Charbax
Addict
Posts: 203
Karma: 550683
Join Date: Mar 2007
Consumer subscription is mentioned in the FAQ as one of the features that will be provided.

The question is how many books can Google get to be part of that subscription plan.

I expect for like $5 a month to start with, Google could provide unlimited access to books of which rights holders have opted in to be part of that full access subscription plan.

Quickly, millions of users will participate, and this will encourage more and more authors to be part of it, until it kind of becomes an industry standard for all authors to automatically make all of their works part of that full access subscription plan.

Later, a new law will provide that subscription plan to every citizen through taxes, where everyone pays in proportion to their income and wealth.

How fast that transition happens, until everyone gets full unlimited access to the equivalent of the Amazon Kindle Store, Project Gutenberg and Google Books put together, will depend on how fast and how well Google integrates this new plan, and on whether Google wants to promote that subscription plan rather than on-demand pay-for-download plans. Perhaps Google, like Apple, thinks it can make more money selling each work by itself at an expensive price instead of providing the whole catalog for a low, affordable subscription price.

The new Rights Registry system that Google is setting up should provide a quick way for online bloggers and independent publishers to get their blogs, feeds and Google Docs publications registered and quickly made part of that same global access subscription plan.

Sure, Google is not a hardware manufacturer yet. They think it's better to create a free open-source OS for other manufacturers to use. The thing is, I think that if Google wanted to, they could be the best manufacturer in the world. Perhaps the best solution would be for Google not only to open their software, release it and give it away for free, but also to create the reference designs and in fact mass-manufacture them as well, and to have the hardware reference designs be totally open source too, if possible.

Google could put some free, open, mass-manufactured reference designs out there, and even sell them at cost, skipping all intermediaries, directly from manufacturing to consumers. This, I think, would enable even more manufacturers to take those designs and sell them.

Hopefully, Obama will take Eric Schmidt as country CTO, and this will let someone new come in and be the new CEO at Google, and change things a little, so Google becomes the USA's official tech company that not only provides the world's best free open source OS, but also mass manufactures $100 laptops, $150 E-Ink readers, $100 Android pocket devices and $5000 Electric Cars.
Old 10-30-2008, 02:09 AM   #27
Studio717
Addict
Posts: 208
Karma: 575
Join Date: Oct 2006
Location: California
Device: Various Kindles, iPhone, iPad, Galaxy 10.1
I'm thrilled with this decision. Google Books has already provided me with hundreds of public domain books that I could not have found without them. And now with this agreement, the frustration of knowing a book is there, yet unavailable will be a thing of the past. No longer will I sigh in defeat when a book that's still in copyright even though it's out of print is denied me.

The used book business will suffer from this, though, and I am sad about that. I have been able to purchase quite a few OOP books that I first learned about through Google Books, some at fairly high cost, but now that book buying will be reduced considerably.

While I agree about wanting readers and text and all of that, to be honest, I'm so happy to just have the contents of the book available to me, it seems horribly ungrateful of me to complain about the format. I'll take PDFs over nothing any day!
Old 10-30-2008, 09:53 AM   #28
Daithi
Publishers are evil!
Posts: 2,418
Karma: 36205264
Join Date: Mar 2008
Location: Rhode Island
Device: Various Kindles
Charbax,

I think you might be jumping the gun a bit. Their FAQ says, "The agreement allows for other services and uses, such as Print-On-Demand, Consumer Subscription and others, to be agreed in the future." As the agreement is currently written, it only provides for library and corporate subscriptions. I kind of doubt that the rightsholders will agree to a $5-a-month subscription plan for the general public.
Old 10-30-2008, 10:03 AM   #29
Taylor514ce
Actively passive.
Posts: 2,042
Karma: 478376
Join Date: Feb 2008
Location: US
Device: Sony PRS-505/LC
This is a settlement. Google was forced into this because of immense pressure from the publishing industry. Google's Book project was a massive, systematic raping of copyright, and everyone knew it. Everyone seems to be praising Google for this, but this is a settlement from a copyright-infringement lawsuit brought by the Authors Guild. Praise them, instead (there was also a separate suit by the Association of American Publishers on behalf of five big publishers). Google aren't the good guys, here. They violated copyright, and are side-stepping a judgment by this settlement.
Old 10-30-2008, 11:18 AM   #30
Daithi
Publishers are evil!
Posts: 2,418
Karma: 36205264
Join Date: Mar 2008
Location: Rhode Island
Device: Various Kindles
Systematic raping of copyright?
All times are GMT -4.