Old 04-01-2009, 06:29 AM   #76
AlexBell
Wizard
Posts: 3,413
Karma: 13369310
Join Date: May 2008
Location: Launceston, Tasmania
Device: Sony PRS T3, Kobo Glo, Kindle Touch, iPad, Samsung SB 2 tablet
Quote:
Originally Posted by nrapallo
Try some of the Top 100 EBooks using this link.
Thanks, Nick. I've found three to check out.

Regards, Alex
Old 04-01-2009, 07:09 AM   #77
Hadrien
Feedbooks.com Co-Founder
Posts: 2,263
Karma: 145123
Join Date: Nov 2006
Location: Paris, France
Device: Sony PRS-t-1/350/300/500/505/600/700, Nexus S, iPad
Quote:
Originally Posted by mikecook
Seems like I've missed out on all the fun here! I won't jump in on the argument regarding UIDs, although I am tempted ;-)

I just want to discuss the reason for the poor quality of Gutenberg's auto-formatted EPUB/MOBI books.

If we're truthful with ourselves, Marcello's work on these conversions is actually a waste of time! That doesn't mean I think he or PG aren't doing a great job, but I believe they are focusing their efforts on the wrong thing.

The current books are somewhat ugly because the source files they have don't use a standard format. Automation needs a standard source format; once you have that, Marcello's job of creating EPUB, MOBI or whatever other format they desire will be so much easier.

Now it seems that for several/many years there have been discussions within the PG community about a 'Master Format', but the powers-that-be kept refusing. I guess they are now suffering the consequences of that decision.

Once you have a standard source format, such as an XML-based one, it is relatively easy to convert to almost any other format, automatically and with all the correct markup that your new reading system needs. (A 'very strict' ASCII formatting and layout would be okay, but still not as good as XML; the current PG .TXT files are a real hodge-podge.)

You may lose out on some hand-coded 'uniqueness' between books, but all the hard work the proofreaders have done can really start to shine.
Agreed, and this is exactly what I do on Feedbooks (and what you're working on with your TEI subset).
Old 04-01-2009, 02:19 PM   #78
mikecook
Enthusiast
Posts: 35
Karma: 10
Join Date: Jun 2007
Location: United Kingdom
Device: iPad Mini, Nexus 7, Sony Reader, Kindle, and others.
Quote:
Originally Posted by Hadrien
Agreed, and this is exactly what I do on Feedbooks (and what you're working on with your TEI subset).
Yep... if they won't do it themselves, then I guess it's up to us.
Old 04-01-2009, 02:46 PM   #79
kovidgoyal
creator of calibre
Posts: 45,293
Karma: 27111240
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I totally agree that, from the perspective of doing automatic conversions, having a strict source format is a great thing (not strictly necessary, but great). The problem is that the perspective of doing automatic conversion is not the only one, nor even the most important one.

The most important perspective is to grow the adoption of ebooks. And to do that you have to encourage content producers as well as consumers. Content producers in general like flexibility: they like a format that lets them do what they want to do, instead of telling them that they can do only a very limited subset of things. So there's a tradeoff between making life easy for automatic converters and giving content producers what they want.
Old 04-01-2009, 05:31 PM   #80
mikecook
Enthusiast
Posts: 35
Karma: 10
Join Date: Jun 2007
Location: United Kingdom
Device: iPad Mini, Nexus 7, Sony Reader, Kindle, and others.
Quote:
Originally Posted by kovidgoyal
I totally agree that, from the perspective of doing automatic conversions, having a strict source format is a great thing (not strictly necessary, but great). The problem is that the perspective of doing automatic conversion is not the only one, nor even the most important one.

The most important perspective is to grow the adoption of ebooks. And to do that you have to encourage content producers as well as consumers. Content producers in general like flexibility: they like a format that lets them do what they want to do, instead of telling them that they can do only a very limited subset of things. So there's a tradeoff between making life easy for automatic converters and giving content producers what they want.
But wouldn't the adoption of eBooks be expedited if the books actually looked bloody nice? Let's face it, the core PG format, being plain vanilla ASCII, does not give nice-looking books. Readable, yes; nice... not a chance.

Please note, the content of my post was about Project Gutenberg, not the publishing industry.

Old 04-01-2009, 05:35 PM   #81
cerement
Groupie
Posts: 170
Karma: 2000
Join Date: Apr 2008
Location: San José, CA
Device: Amazon Kindle 1, Sony PRS-300, Amazon Kindle 3
Quote:
Originally Posted by mikecook
If we're truthful with ourselves, Marcello's work on these conversions is actually a waste of time! That doesn't mean I think he or PG aren't doing a great job, but I believe they are focusing their efforts on the wrong thing.

Now it seems that for several/many years there have been discussions within the PG community about a 'Master Format', but the powers-that-be kept refusing ...
With the addition of ePub and Mobi to existing HTML, does this mean Marcello has effectively abandoned PGTEI?
Old 04-01-2009, 05:36 PM   #82
tompe
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköping, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
Originally Posted by mikecook
But wouldn't the adoption of eBooks be expedited if the books actually looked bloody nice? Let's face it, the core PG format, being plain vanilla ASCII, does not give nice-looking books. Readable, yes; nice... not a chance.

Please note, the content of my post was about Project Gutenberg, not the publishing industry.
But the point made by kovidgoyal was actually the reasoning that Michael Hart used when he decided to use plain text format for Project Gutenberg.
Old 04-01-2009, 06:57 PM   #83
mikecook
Enthusiast
Posts: 35
Karma: 10
Join Date: Jun 2007
Location: United Kingdom
Device: iPad Mini, Nexus 7, Sony Reader, Kindle, and others.
Quote:
Originally Posted by tompe
But the point made by kovidgoyal was actually the reasoning that Michael Hart used when he decided to use plain text format for Project Gutenberg.
Sure, and for the 30 years from 1971, fair enough. The problem is, it's now 2009, not 2001!

@cerement
Some time back Marcello was working on a new pg2tei version, but I've not heard anything since. Maybe he realised full automation to TEI was impossible; there will always be some manual work.
Old 04-01-2009, 11:16 PM   #84
cerement
Groupie
Posts: 170
Karma: 2000
Join Date: Apr 2008
Location: San José, CA
Device: Amazon Kindle 1, Sony PRS-300, Amazon Kindle 3
Quote:
Originally Posted by mikecook
Some time back Marcello was working on a new pg2tei version, but I've not heard anything since. Maybe he realised full automation to TEI was impossible; there will always be some manual work.
I wasn't talking about the "pg2tei" conversion, I was talking about the PGTEI format. With the emphasis on ePub and Mobi, I was wondering if Marcello and/or Project Gutenberg had effectively abandoned TEI as a format (or even as a "master format")?
Old 04-01-2009, 11:28 PM   #85
kovidgoyal
creator of calibre
Posts: 45,293
Karma: 27111240
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by cerement
I wasn't talking about the "pg2tei" conversion, I was talking about the PGTEI format. With the emphasis on ePub and Mobi, I was wondering if Marcello and/or Project Gutenberg had effectively abandoned TEI as a format (or even as a "master format")?
What would be the advantage for PG in adopting a highly structured master format? Remember that the main driver for the growth of PG is contributions from volunteers. Without specialised tools to generate the master format, expecting volunteers to produce it would be, to put it mildly, overly optimistic. And remember that a one-time contribution isn't the end of it either: texts are often updated and new versions uploaded, so the master format has to remain easy to edit.
Old 04-02-2009, 03:20 AM   #86
cerement
Groupie
Posts: 170
Karma: 2000
Join Date: Apr 2008
Location: San José, CA
Device: Amazon Kindle 1, Sony PRS-300, Amazon Kindle 3
Quote:
Originally Posted by kovidgoyal
What would be the advantage for PG in adopting a highly structured master format?
A "master format" simplifies and speeds up automated conversion processes drastically. Adobe Photoshop doesn't have a separate process for converting from each colorspace to every other colorspace. Photoshop's "master format" is Lab: all color conversions go from the source colorspace to Lab, and then from Lab to the target colorspace.
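The hub-and-spoke idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Photoshop, Calibre or PG actually work: the "master" here is a hypothetical list of paragraph strings, and the `readers`/`writers` registries and the toy `txt`/`html` handlers are invented for the example.

```python
from typing import Callable, Dict, List

# Hypothetical master representation: just a list of paragraph strings.
Master = List[str]

# Each format registers one reader (format -> master) ...
readers: Dict[str, Callable[[str], Master]] = {
    "txt": lambda s: [p.strip() for p in s.split("\n\n") if p.strip()],
    "html": lambda s: [p.split("</p>")[0] for p in s.split("<p>")[1:]],
}

# ... and one writer (master -> format).
writers: Dict[str, Callable[[Master], str]] = {
    "txt": lambda paras: "\n\n".join(paras),
    "html": lambda paras: "".join(f"<p>{p}</p>" for p in paras),
}


def convert(data: str, src: str, dst: str) -> str:
    """Route every conversion through the master: src -> master -> dst."""
    return writers[dst](readers[src](data))


print(convert("Hello\n\nWorld", "txt", "html"))  # <p>Hello</p><p>World</p>
```

Supporting another format means adding one reader and one writer; `convert()` itself never changes, which is the point of the Lab analogy.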

Generating plaintext from XML is trivial, and generating pretty-printed plaintext from XML is just as easy. Generating an XML variant from plaintext is harder and relies on the volunteers, but when you already have volunteers churning out half a dozen formats, some automated and some not, choosing a secondary "master format" (the primary being plaintext) would focus the volunteer work and allow easier automatic generation of multiple output formats (à la Feedbooks).
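To illustrate why the XML-to-plaintext direction is the easy one, here is a rough Python sketch using the standard-library `xml.etree.ElementTree` and `textwrap` modules. It assumes a hypothetical, heavily simplified TEI-like input with only `<head>`, `<p>` and inline `<hi>` elements; real PGTEI files carry a header, namespaces and nested divisions that this ignores.

```python
import textwrap
import xml.etree.ElementTree as ET

# Hypothetical, simplified TEI-like input (not a real PGTEI document).
SAMPLE = """<text><body>
<head>Chapter I</head>
<p>This is a <hi rend="italic">short</hi> paragraph that is long enough to need wrapping.</p>
<p>A second paragraph.</p>
</body></text>"""


def xml_to_plaintext(xml: str, width: int = 40) -> str:
    """Turn <head> and <p> elements into plain-text blocks separated by blank lines."""
    root = ET.fromstring(xml)
    blocks = []
    for el in root.iter():
        if el.tag not in ("head", "p"):
            continue
        # itertext() also collects text inside inline markup such as <hi>;
        # split/join collapses source line breaks and repeated spaces.
        text = " ".join("".join(el.itertext()).split())
        if el.tag == "head":
            blocks.append(text.upper())  # make headings stand out in plain text
        else:
            blocks.append(textwrap.fill(text, width=width))  # "pretty-print" by wrapping
    return "\n\n".join(blocks)


print(xml_to_plaintext(SAMPLE))
```

Going the other way would need something to decide which plaintext lines are headings, quotations or verse, which is exactly the judgement the volunteers supply.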

As an example: imagine if Calibre supported 6 formats, both as input and as generated output. With an internal "master format", you need 12 conversion templates: 6 from input format to master format, and 6 from master format to output format. Without a master format, that would be 30 conversion templates (6 × 5, excluding self-to-self conversions). With a master format, adding a 7th format would mean adding only 2 new conversion templates; without one, it would mean adding 12 (for 7 × 6 = 42 in total).
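The arithmetic generalises: with n formats, each usable as both input and output, a master format needs 2n templates (one in, one out per format), while direct conversion needs one template per ordered pair of distinct formats, n(n - 1). A small Python check of the figures above, under that assumption:

```python
def templates_with_master(n: int) -> int:
    """One converter into the master plus one out of it, per format."""
    return 2 * n


def templates_without_master(n: int) -> int:
    """One direct converter per ordered (source, target) pair, excluding self-to-self."""
    return n * (n - 1)


for n in (6, 7):
    print(n, templates_with_master(n), templates_without_master(n))
# 6 12 30
# 7 14 42
```

The gap grows quadratically: each new format costs 2 templates with a master, but 2n without one.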

Quote:
Originally Posted by kovidgoyal
Without specialised tools to generate the master format, expecting volunteers to produce it would be, to put it mildly, overly optimistic.
That had been the initial idea behind Marcello creating the PGTEI DTD based on TEI-Lite. PGTEI was close enough to HTML to be familiar to most PG users, and could either be hand-coded easily or converted from (X)HTML with minimal search-and-replace.
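A rough idea of what "minimal search-and-replace" from XHTML might look like, as a Python sketch with the standard `re` module. The tag mapping is an assumption for illustration: `<head>`, `<hi rend="...">` and `<lb/>` are TEI elements, but the actual PGTEI DTD may name or nest things differently, and this is not Marcello's converter.

```python
import re

# Hypothetical XHTML -> TEI-Lite-style mapping, applied in order.
# <p> is left untouched because TEI also uses <p> for paragraphs.
REPLACEMENTS = [
    (r"<h([1-6])>(.*?)</h\1>", r"<head>\2</head>"),
    (r"<i>(.*?)</i>", r'<hi rend="italic">\1</hi>'),
    (r"<em>(.*?)</em>", r'<hi rend="italic">\1</hi>'),
    (r"<b>(.*?)</b>", r'<hi rend="bold">\1</hi>'),
    (r"<br\s*/?>", r"<lb/>"),
]


def xhtml_to_tei(fragment: str) -> str:
    """Apply each regex substitution in turn; DOTALL lets matches span lines."""
    for pattern, repl in REPLACEMENTS:
        fragment = re.sub(pattern, repl, fragment, flags=re.DOTALL)
    return fragment


print(xhtml_to_tei("<h2>Chapter I</h2><p>A <i>very</i> short line.<br/>Next.</p>"))
# <head>Chapter I</head><p>A <hi rend="italic">very</hi> short line.<lb/>Next.</p>
```

This only holds up for clean, attribute-free markup; tags carrying `class` attributes or nested inline elements of the same kind would need a real parser, which is where PG's uneven HTML becomes a problem.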

And as mikecook mentioned, there have certainly been plenty of arguments already, pro and con, for PG adopting a master format. Currently, it looks like that master format is HTML, used for generating ePub, Mobi, and Plucker, but the quality of PG's HTML files varies even more widely than that of their plaintext files.
Old 04-02-2009, 09:16 AM   #87
mikecook
Enthusiast
Posts: 35
Karma: 10
Join Date: Jun 2007
Location: United Kingdom
Device: iPad Mini, Nexus 7, Sony Reader, Kindle, and others.
What you must realise, Kovid, is that over 50% of the current gutenberg.org titles have come from the various Distributed Proofreaders channels, where they use their own software to let people proofread, with no technical knowledge required.

Although I don't know the exact numbers, I don't believe many new PG titles have come from outside the DP websites within the last few years. DP have had various discussions on using TEI as a master, but said that it was not good enough to record the books properly... like .TXT is!

Saying that, they have produced quite a few TEI versions over the last couple of years, I'm just not sure whether they are from individuals or their own system.

@cerement
I think the idea was to use pg2tei to convert the back catalogue. I presume the DTD was for newer books.

I don't know if he ever had plans for this to be used as a master, or whether it was just another format to add to the archives, like they now have with EPUB and MOBI.
Old 04-02-2009, 01:15 PM   #88
kovidgoyal
creator of calibre
Posts: 45,293
Karma: 27111240
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by mikecook
Although I don't know the exact numbers, I don't believe many new PG titles have come from outside the DP websites within the last few years. DP have had various discussions on using TEI as a master, but said that it was not good enough to record the books properly... like .TXT is!
By which I assume that they mean TEI is not expressive enough?
Old 04-02-2009, 02:58 PM   #89
Highlander
Junior Member
Posts: 1
Karma: 28
Join Date: Apr 2009
Device: none
Quote:
Originally Posted by cerement
With the addition of ePub and Mobi to existing HTML, does this mean Marcello has effectively abandoned PGTEI?
PGTEI is at one end of the production chain, while EPUB is at the other end: PGTEI -> XHTML -> (EPUB | Mobi | Plucker). PGTEI 0.5 will be released soon after the EPUB / Mobi / Plucker conversion is working.

--
Marcello
Old 04-02-2009, 04:33 PM   #90
cerement
Groupie
Posts: 170
Karma: 2000
Join Date: Apr 2008
Location: San José, CA
Device: Amazon Kindle 1, Sony PRS-300, Amazon Kindle 3
Quote:
Originally Posted by Highlander
PGTEI is at one end of the production chain, while EPUB is at the other end: PGTEI -> XHTML -> (EPUB | Mobi | Plucker). PGTEI 0.5 will be released soon after the EPUB / Mobi / Plucker conversion is working.
Sweet! Thanks for the info!
All times are GMT -4.