09-30-2007, 06:21 PM | #31 |
Feedbooks.com Co-Founder
Posts: 2,263
Karma: 145123
Join Date: Nov 2006
Location: Paris, France
Device: Sony PRS-t-1/350/300/500/505/600/700, Nexus S, iPad
Here's a screenshot in FBReader. CSS and TOC are not yet supported in FBReader but overall, it works fine (I love the fact that hyphenation in FBReader is software-based).
For those of you using an iLiad, this should be sweet: you'll be able to download our ePub files directly using our iLiad software and open them thanks to the next port of FBReader.
09-30-2007, 08:35 PM | #32 |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Dale
10-01-2007, 12:07 PM | #33 |
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
hadrien, your books look very nice! congratulations!
on average, how long does it take you to work up a book, say from project gutenberg, to put into your database? 5-10 minutes, 15-30 minutes, 1-2 hours, 2-4 hours?

-bowerbird

Last edited by bowerbird; 10-01-2007 at 12:09 PM.
10-01-2007, 01:30 PM | #34 |
Feedbooks.com Co-Founder
Posts: 2,263
Karma: 145123
Join Date: Nov 2006
Location: Paris, France
Device: Sony PRS-t-1/350/300/500/505/600/700, Nexus S, iPad
Quote:
The good thing is that, unlike fully manually created books, as soon as we add a new output it's available on ALL of our books (and we still get a full TOC, footnotes, etc.). We also make advanced use of the metadata: you can browse the website in many different ways, we've got an API that makes it possible for any application or website to interact with Feedbooks (our iLiad application, for example), and a personal recommendation system. Anyone can contribute to adding books on Feedbooks: making the process easier will be one of our goals in the upcoming months.

The next output will be something totally different, not e-paper related, and it should appeal to another crowd too.
10-01-2007, 02:30 PM | #35 |
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
hadrien-
thanks... i did notice that, on the older project gutenberg e-texts which used all-upper-case to indicate italics, you haven't fixed that...

where can i get information on your a.p.i. for external apps?

-bowerbird
10-01-2007, 02:38 PM | #36 |
Groupie
Posts: 189
Karma: 793
Join Date: Oct 2006
Out of interest (I've just been spending way too much time restoring the accents in the PG text of Nostromo): do you have dictionary software that will restore accents automatically?
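[Editor's note: a minimal sketch of the dictionary-based approach andym is asking about. This is a hypothetical illustration, not Feedbooks' actual tooling; the tiny lexicon is invented, and a real one would come from a spell-check word list.]

```python
# Sketch of dictionary-based accent restoration: map de-accented words
# back to their accented forms via a lookup table built from a word list.
import re
import unicodedata

def strip_accents(word):
    """Remove combining accents: 'Gould' stays, 'señor' -> 'senor'."""
    nfd = unicodedata.normalize("NFD", word)
    return "".join(c for c in nfd if not unicodedata.combining(c))

# A tiny sample lexicon; a real one would come from a spell-check dictionary.
LEXICON = ["señor", "café", "Gould", "cordillera"]
RESTORE = {strip_accents(w).lower(): w for w in LEXICON if strip_accents(w) != w}

def restore_accents(text):
    def fix(match):
        word = match.group(0)
        accented = RESTORE.get(word.lower())
        if accented is None:
            return word
        # Preserve capitalisation of the original token.
        return accented.capitalize() if word[0].isupper() else accented
    return re.sub(r"[A-Za-z]+", fix, text)

print(restore_accents("The senor drank his cafe."))
```

The obvious limitation, and presumably why manual work is still needed: words whose de-accented form is itself a valid word (French "ou" vs "où") are ambiguous and can't be restored by lookup alone.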
10-01-2007, 04:08 PM | #37 |
Feedbooks.com Co-Founder
Posts: 2,263
Karma: 145123
Join Date: Nov 2006
Location: Paris, France
Device: Sony PRS-t-1/350/300/500/505/600/700, Nexus S, iPad
Quote:
bowerbird: On Project Gutenberg, italics are indicated with _, not all caps. I'll take a look at what all caps is used for exactly; I guess that's another thing we could add to our preprocessing.
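[Editor's note: a sketch of the sort of preprocessing pass Hadrien alludes to, turning PG's conventions into HTML emphasis. The rules here are assumptions for illustration, not Feedbooks' actual pipeline, and the all-caps rule is deliberately naive: acronyms like "USA" would be caught too, which is exactly the ambiguity discussed later in this thread.]

```python
# Plausible preprocessing pass: convert PG's _underscore_ italics and
# ALL-CAPS emphasis into HTML <em> tags.
import re

def pg_to_html(text):
    # _word or phrase_ -> <em>word or phrase</em>
    text = re.sub(r"_([^_]+)_", r"<em>\1</em>", text)
    # Runs of one or more all-caps words (2+ letters each) -> lower-cased <em>.
    # Naive: acronyms and small-caps usage are swept up as well.
    def demote(match):
        return "<em>" + match.group(0).lower() + "</em>"
    return re.sub(r"\b[A-Z]{2,}(?:\s+[A-Z]{2,})*\b", demote, text)

print(pg_to_html("He was _very_ sure it was NOT SO."))
```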
10-01-2007, 05:52 PM | #38 |
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
actually, hadrien, i am extremely familiar with project gutenberg e-texts.
and the one thing i can tell you is that they're _consistently_ inconsistent. so yes, some early books used all-caps for italics, rather than underscores. and along the way, a variety of characters were used besides underscores... and up until 2003 or so, when i became a severe pain-in-the-neck to them on these issues, they didn't even feel any need to mark italics consistently... even worse, they used all-caps for bold as well, and likewise felt no need to be consistent with that either. (sometimes they didn't mark bold at all.)

i know all this because i have been working for some time now on means of interpreting the p.g. e-texts in a way that restores the structural information. the same type of work you do when you put texts into your database, except i leave them as text. (so ordinary humans can continue to work with them...)

i've invented a form of non-markup markup -- i call it "zen markup language", or z.m.l. (it's two steps more advanced than x.m.l.) -- where such structural information is represented by a simple set of unobtrusive light-markup rules. for instance, a regular chapter-header is preceded by 4 blank lines and followed by 2 blank lines, thus allowing a viewer-application (which i've also programmed) to automatically form a table of contents that is auto-hot-linked to the chapters... other simple rules -- easy enough to be understood by a fourth-grader -- underlie all of the other structures that are commonly found in books...

you can see work that i've done, in action, by visiting this web-page:
> http://z-m-l.com/go/vl3.pl
you'll be particularly interested in the "test-suite" and "rules" examples...

i believe intelligent viewer-programs interpreting plain-ascii input e-texts and presenting them in typographically-sophisticated ways is _the_ future.
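[Editor's note: the one concrete z.m.l. rule stated in this post (a chapter header is preceded by 4 blank lines and followed by 2) can be sketched as code. This is a guess at a single rule based only on this post, not real z.m.l. tooling.]

```python
# Detect chapter headers by bowerbird's stated blank-line rule:
# a non-blank line preceded by 4 blank lines and followed by 2.
# A viewer could build an auto-linked TOC from the result.
def find_headers(text):
    lines = text.split("\n")
    headers = []
    for i, line in enumerate(lines):
        if not line.strip():
            continue
        before = lines[i - 4:i]
        after = lines[i + 1:i + 3]
        if (i >= 4 and all(not l.strip() for l in before)
                and len(after) == 2 and all(not l.strip() for l in after)):
            headers.append((i, line.strip()))  # (line number, TOC entry)
    return headers

sample = "title\n" + "\n" * 4 + "CHAPTER ONE\n\n\nIt was a dark night.\n"
for lineno, heading in find_headers(sample):
    print(lineno, heading)
```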
the publishing companies, of course, in an attempt to raise the cost of entry, will try to force e-books into the complexity of heavy-markup, but i believe the revolution into self-publishing will push back with light-markup systems. authors don't want to battle steep learning curves. they just want to write...

-bowerbird
10-01-2007, 07:33 PM | #39 |
Connoisseur
Posts: 66
Karma: 614
Join Date: Jul 2007
Location: New York
Device: Sony PRS-505, iLiad Book Edition
There may actually be some consistency, at least, in PG's inconsistency. In some texts, they seem to distinguish between italics used in the original for emphasis, represented in the PG text by all caps, and italics used for other purposes (setting off foreign words and phrases, titles, etc), represented in the PG text by fore-and-aft underscores.
PG texts also use all caps to represent original small caps and caps-and-small-caps.
10-01-2007, 09:36 PM | #40 |
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
could be. it's hard to know without looking at the scans.
and even if you have the scans, the fact that p.g. has rewrapped the text makes it hard to do the comparison. it ends up it's easier to re-o.c.r., and use the p.g. e-text to do corrections. thank goodness google is scanning...

and it ends up that leaving the all-upper-case words is not all that bad. it accomplishes the emphasis purpose. but there are a raft of problems like this, such as the failure to indicate the lines that shouldn't be wrapped (e.g., in address-blocks, tables, signature-blocks, etc.)

oh well, it's been a puzzle to occupy my mind... :+)

-bowerbird
10-02-2007, 01:09 AM | #41 |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Many of the problems are due to the idea that you can exchange data in plain-text format. This is a fallacy for books, particularly novels where dialog is involved. Almost every book I post takes extensive review and modification to fix things that were supposed to be OK already.
Dale
10-02-2007, 04:40 AM | #42 |
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
dale, i'm not sure i understand your point. got any examples?
-bowerbird
10-02-2007, 04:57 AM | #43 |
Groupie
Posts: 189
Karma: 793
Join Date: Oct 2006
Last edited by andym; 10-02-2007 at 04:59 AM.
10-02-2007, 05:41 AM | #44 |
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
andy said:
> Though be grateful for the fact that > the text is out there at all and > you don't have to OCR it yourself! well heck, i'm _extremely_ grateful for project gutenberg. as the forerunner of _all_ the net collaboration projects, including wikipedia, it has _tremendous_ value to me... so that's first and foremost. having said that, however, o.c.r. ain't difficult these days. scanning (and all that it entails, including rounding up a hard-copy to scan) is the hardest part of the equation, and google (and others) are taking care of all that hassle. but yeah, as i said, correcting that o.c.r. is where all the p.g. e-texts will come in handy, in the next cyberlibrary. > Also you can see the issue from the point of view > of the original transcribers as well. For example > I've just been restoring the italics in the PG text of Nostromo, > and very often the transcriber users initial caps for a word > that was originally in italics - probably a more elegant and > reader-friendly solution than using forward slashes for italicized words. well, maybe. the problem is, though, that it's an ambiguous coding, so it becomes impossible to restore things to their original state... a forward-slashes method -- while maybe not "reader-friendly" -- would have at least been unambiguous enough to easily un-do... > I don't understand why you would need a new mark-up, > correctly used, html mark-up [eg h1 for the book title > h2 for the part or section title and h3 for the chapter] > gives you all the semantic information you need. well, the problem with .html is that its obtrusive markup makes it hard to maintain (e.g., correct, edit, compare, update, re-mix, etc.), as well as to read in the underlying "master" format. 
do a view-source on this page:
> http://z-m-l.com/go/test-suite.html
then compare that source-html to this page:
> http://z-m-l.com/go/test-suite.zml

particularly since the .zml file actually _generated_ the .html one, i think it's pretty easy to tell which file would be easier to maintain, especially with a library of thousands of e-texts (let alone millions).

and then of course when you ratchet up the difficulty to the level of the .epub format, where each e-text file needs accompanying files, you're just asking for trouble.

in my view, complex formats like that are simply the old-guard dinosaur publishing-houses attempting to raise the cost-of-entry for us "amateur" newbies, whose new capacity for self-publishing will totally and completely subvert their business. they're attempting to find a way to maintain their status as middlemen, so they can continue to siphon off a good percentage of the revenue...

> Personally I believe that plain vanilla html
> (or its baby siblings markdown, textile etc) is the new ascii.

markdown and textile are both light-markup systems, and thus of the same type as my zen markup language. (except my z.m.l. is even less obtrusive than they are.) but yes, this is the way of the future. authors want to write, not be caught up in unnecessary complexities of file-formats.

-bowerbird
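[Editor's note: the "plain-text master generates the html" idea can be illustrated with a toy converter. The two rules below are invented for illustration; they are not actual z.m.l., markdown, or textile rules.]

```python
# Toy light-markup-to-HTML pass: the plain-text file stays the master,
# and HTML is a generated output, never hand-edited.
import html

def light_to_html(text):
    out = []
    for block in text.strip().split("\n\n"):      # blank line separates blocks
        block = html.escape(block.strip())
        if block.isupper():                        # all-caps block -> heading
            out.append("<h2>%s</h2>" % block.title())
        else:                                      # anything else -> paragraph
            out.append("<p>%s</p>" % block.replace("\n", " "))
    return "\n".join(out)

print(light_to_html("CHAPTER ONE\n\nIt was a dark\nand stormy night."))
```

The design point being debated in the thread: the plain-text source stays readable and diffable by "ordinary humans", while the generated HTML carries the h2/p semantics andym wants.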
10-02-2007, 10:41 AM | #45 |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Other dialog problems include accent marks and trying to show dialects in the text. These are tough even with a full font collection, and are made much more difficult using only ASCII characters. Bold, italics and special symbols get lost in translation to ASCII; surely you have noticed this. Many period books use unusual spellings and other specialized constructions with foreign words that can fool spell checkers, requiring intervention that often does not get done in the process.

Dale