MobileRead Forums > E-Book Formats > ePub

09-30-2007, 06:21 PM   #31
Hadrien
Feedbooks.com Co-Founder
Posts: 2,265
Karma: 145123
Join Date: Nov 2006
Location: Paris, France
Device: Sony PRS-t-1/350/300/500/505/600/700, Nexus S, iPad
Here's a screenshot in FBReader. FBReader doesn't support CSS or the TOC yet, but overall it works fine (I love the fact that hyphenation in FBReader is done in software).

For those of you using an iLiad, this should be sweet: you'll be able to download our epub files directly with our iLiad software and open them with the upcoming port of FBReader.
[Attached screenshot: FB.jpg]
09-30-2007, 08:35 PM   #32
DaleDe
Grand Sorcerer
Posts: 9,616
Karma: 5071748
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2
Quote:
Originally Posted by Hadrien
OK, I'll try it with IE too, see if there's any error in the xml file, but it should be 100% valid xhtml now that I've fixed everything and run it through tidy.
You might try renaming the file with a .xhtml extension to see if IE likes it better. Raw XML needs more than just a stylesheet to be read correctly.
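Before renaming anything, it may help to rule out the file itself. A minimal Python sketch, using only the standard library's `xml.etree.ElementTree`, that reports whether a file is well-formed XML; the function name `check_well_formed` is hypothetical. It checks well-formedness only, not validity against the XHTML DTD, and says nothing about how IE will treat the file. Named entities such as `&nbsp;` that are defined only in the XHTML DTD may be reported as errors by this parser.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Optional, Union


def check_well_formed(xml_text: Union[str, bytes]) -> Optional[str]:
    """Return None if xml_text is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The message includes the line and column of the first error found.
        return str(err)
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_xml.py chapter1.xhtml
    # Read bytes so the parser honours the file's own encoding declaration.
    with open(sys.argv[1], "rb") as f:
        problem = check_well_formed(f.read())
    print(problem or "well-formed")
```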

Dale
10-01-2007, 12:07 PM   #33
bowerbird
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
hadrien, your books look very nice! congratulations!

on average, how long does it take you to work up a book,
say from project gutenberg, to put into your database?
5-10 minutes, 15-30 minutes, 1-2 hours, 2-4 hours?

-bowerbird

Last edited by bowerbird; 10-01-2007 at 12:09 PM.
10-01-2007, 01:30 PM   #34
Hadrien
Feedbooks.com Co-Founder
Quote:
Originally Posted by bowerbird
hadrien, your books look very nice! congratulations!

on average, how long does it take you to work up a book,
say from project gutenberg, to put into your database?
5-10 minutes, 15-30 minutes, 1-2 hours, 2-4 hours?

-bowerbird
It's a 5-15 minute thing... unless you're adding War & Peace or Les Misérables, of course ^_^

The good thing is that, unlike with fully hand-made books, as soon as we add a new output format it becomes available for ALL of our books (and we still get a full TOC, footnotes, etc.). We also make extensive use of the metadata: you can browse the website in many different ways, we have an API that lets any application or website interact with Feedbooks (our iLiad application, for example), and we have a personal recommendation system.

Anyone can contribute to adding books on Feedbooks: making the process easier will be one of our goals in the upcoming months.

The next output will be something totally different, not e-paper related, and it should appeal to another crowd too.
10-01-2007, 02:30 PM   #35
bowerbird
Banned
hadrien-

thanks...

i did notice that, on the older project gutenberg e-texts, which
used all-upper-case to indicate italics, you haven't fixed that...

where can i get information on your a.p.i. for external apps?

-bowerbird
10-01-2007, 02:38 PM   #36
andym
Groupie
Posts: 189
Karma: 793
Join Date: Oct 2006
Quote:
Originally Posted by Hadrien
It's a 5-15 minute thing... unless you're adding War & Peace or Les Misérables, of course ^_^.
Out of interest (I've just been spending way too much time restoring the accents in the PG text of Nostromo), do you have dictionary software that will restore accents automatically?
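The simplest form of what Andy asks about is a word-level lookup table. A minimal Python sketch, assuming a hypothetical hand-built table (`ACCENTS`) that maps unaccented spellings to accented ones; it is not something Feedbooks describes doing. Because some words are valid both with and without accents, a real table would still need human review of its entries.

```python
import re
from typing import Dict

# Hypothetical hand-made table: unaccented spelling -> accented spelling.
# A real one would be built from a dictionary for the book's languages.
ACCENTS: Dict[str, str] = {
    "senor": "señor",
    "senora": "señora",
    "cafe": "café",
}

# Runs of letters only (Unicode-aware); digits and PG's _underscores_ are
# excluded so that "_senor_" is still seen as the word "senor".
WORD = re.compile(r"[^\W\d_]+")


def restore_accents(text: str, table: Dict[str, str] = ACCENTS) -> str:
    """Replace each word found in `table`, keeping its original capitalization."""

    def fix(match):
        word = match.group(0)
        accented = table.get(word.lower())
        if accented is None:
            return word
        if word.isupper():
            return accented.upper()
        if word[0].isupper():
            return accented[0].upper() + accented[1:]
        return accented

    return WORD.sub(fix, text)
```

Words that already carry accents are matched whole and left untouched, since their lowercase form is not a key in the table.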
10-01-2007, 04:08 PM   #37
Hadrien
Feedbooks.com Co-Founder
Quote:
Originally Posted by andym
Out of interest (I've just been spending way too much time restoring the accents in the PG text of Nostromo), do you have dictionary software that will restore accents automatically?
Well... we're using a dictionary for hyphenation in PDF files. We're not restoring any accents yet, but I guess it could go on our to-do list for preprocessing, along with curly quotes.
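Curly quotes are a common preprocessing step of this kind. A naive Python sketch, assuming straight ASCII quotes in the source: a quote at the start of the text, or after whitespace or an opening bracket, becomes an opening quote, and every other quote becomes a closing one. The function name is hypothetical, and leading apostrophes such as 'Tis or 'em come out as opening quotes, so the result still needs review.

```python
import re


def curl_quotes(text: str) -> str:
    """Naively convert straight quotes to curly ones.

    A quote at the very start, or after whitespace or an opening bracket, is
    treated as opening; every other quote is treated as closing. Apostrophes
    inside words therefore become right single quotes (the usual typographic
    form), but leading apostrophes ('Tis, 'em) are converted incorrectly.
    """
    opening = r"(^|[\s(\[{])"
    text = re.sub(opening + '"', "\\1\u201c", text)   # left double quote
    text = text.replace('"', "\u201d")                 # right double quote
    text = re.sub(opening + "'", "\\1\u2018", text)   # left single quote
    text = text.replace("'", "\u2019")                 # right single quote / apostrophe
    return text
```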

bowerbird: On Project Gutenberg, italics are indicated with underscores (_), not all caps. I'll take a look at what exactly all caps is used for; I guess that's another thing we could add to our preprocessing.
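Converting the underscore convention to XHTML is mostly one regular expression. A minimal sketch, assuming italics are consistently marked with paired underscores and that an italic run never crosses an empty line; the function name is hypothetical. The text is HTML-escaped first so that characters such as & stay valid in XHTML.

```python
import html
import re

# _text_ -> <i>text</i>. The italic run may wrap across lines (PG texts are
# hard-wrapped) but may not contain an underscore or an empty line.
ITALIC = re.compile(r"_([^_\n]+(?:\n[^_\n]+)*)_")


def underscores_to_xhtml(text: str) -> str:
    """Escape text for XHTML, then turn _underscored_ runs into <i> elements."""
    escaped = html.escape(text, quote=False)
    return ITALIC.sub(r"<i>\1</i>", escaped)
```

An unmatched underscore is simply left in place, which makes leftovers easy to find when a text does not follow the convention.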
10-01-2007, 05:52 PM   #38
bowerbird
Banned
actually, hadrien, i am extremely familiar with project gutenberg e-texts.
and the one thing i can tell you is that they're _consistently_ inconsistent.

so yes, some early books used all-caps for italics, rather than underscores.
and along the way, a variety of characters were used besides underscores...
and up until 2003 or so, when i became a severe pain-in-the-neck to them
on these issues, they didn't even feel any need to mark italics consistently...

even worse, they used all-caps for bold as well, and likewise felt no need
to be consistent with that either. (sometimes they didn't mark bold at all.)

i know all this because i have been working for some time now on means of
interpreting the p.g. e-texts in a way that restores the structural information.
the same type of work you do when you put texts into your database, except
i leave them as text. (so ordinary humans can continue to work with them...)

i've invented a form of non-markup markup -- i call it "zen markup language",
or z.m.l. (it's two steps more advanced than x.m.l.) -- where such structural
information is represented by a simple set of unobtrusive light-markup rules.

for instance, a regular chapter-header is preceded by 4 blank lines and followed
by 2 blank lines, thus allowing a viewer-application (which i've also programmed)
to automatically form a table of contents that is auto-hot-linked to the chapters...
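The chapter-header rule described above can be sketched in a few lines of Python. This is our reading of the description, not bowerbird's viewer code or the official z.m.l. rules: it treats "exactly 4 blank lines before and exactly 2 after" as the test and counts whitespace-only lines as blank; `find_chapter_headers` is a hypothetical name.

```python
from typing import List, Tuple


def find_chapter_headers(text: str) -> List[Tuple[int, str]]:
    """Return (line_index, header) for each non-blank line preceded by exactly
    4 blank lines and followed by exactly 2 blank lines."""
    lines = text.split("\n")
    blank = [not line.strip() for line in lines]

    def run_length(start: int, step: int) -> int:
        # Count consecutive blank lines from `start`, moving by `step`.
        count = 0
        i = start
        while 0 <= i < len(lines) and blank[i]:
            count += 1
            i += step
        return count

    headers = []
    for i, line in enumerate(lines):
        if not blank[i] and run_length(i - 1, -1) == 4 and run_length(i + 1, 1) == 2:
            headers.append((i, line.strip()))
    return headers
```

A viewer could build its table of contents as `[header for _, header in find_chapter_headers(book_text)]` and use the line indexes as link targets.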

other simple rules -- easy enough to be understood by a fourth-grader --
underlie all of the other structures that are commonly found in books...
you can see work that i've done, in action, by visiting this web-page:
> http://z-m-l.com/go/vl3.pl
you'll be particularly interested in the "test-suite" and "rules" examples...

i believe intelligent viewer-programs interpreting plain-ascii input e-texts
and presenting them in typographically-sophisticated ways is _the_ future.

the publishing companies, of course, in an attempt to raise the cost of entry,
will try to force e-books into the complexity of heavy-markup, but i believe
the revolution into self-publishing will push back with light-markup systems.
authors don't want to battle steep learning curves. they just want to write...

-bowerbird
10-01-2007, 07:33 PM   #39
akiburis
Connoisseur
Posts: 67
Karma: 614
Join Date: Jul 2007
Location: New York
Device: Sony PRS-505, iLiad Book Edition
There may actually be some consistency, at least, in PG's inconsistency. In some texts, they seem to distinguish between italics used in the original for emphasis, represented in the PG text by all caps, and italics used for other purposes (setting off foreign words and phrases, titles, etc), represented in the PG text by fore-and-aft underscores.

PG texts also use all caps to represent original small caps and caps-and-small.
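Since all caps in a PG text may stand for emphasis, small caps, or bold, a converter cannot safely decide which was meant. One hedged approach is to list the candidates for a human to review; the function name below is hypothetical, and it will also flag headings and acronyms.

```python
import re
from typing import List, Tuple

# Whole words of two or more capital ASCII letters; single letters such as
# "I" or "A" are skipped because they are almost never markup.
ALL_CAPS = re.compile(r"\b[A-Z]{2,}\b")


def all_caps_candidates(text: str) -> List[Tuple[int, str]]:
    """Return (offset, word) for every all-caps word, for manual review."""
    return [(m.start(), m.group(0)) for m in ALL_CAPS.finditer(text)]
```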
10-01-2007, 09:36 PM   #40
bowerbird
Banned
could be. it's hard to know without looking at the scans.
and even if you have the scans, the fact that p.g. has
rewrapped the text makes it hard to do the comparison.
it ends up it's easier to re-o.c.r., and use the p.g. e-text
to do corrections. thank goodness google is scanning...

and it ends up that leaving the all-upper-case words is
not all that bad. it accomplishes the emphasis purpose.

but there are a raft of problems like this, such as the
failure to indicate the lines that shouldn't be wrapped
(e.g., in address-blocks, tables, signature-blocks, etc.)

oh well, it's been a puzzle to occupy my mind... :+)

-bowerbird
10-02-2007, 01:09 AM   #41
DaleDe
Grand Sorcerer
Many of the problems are due to the idea that you can exchange books in plain-text format. This is a fallacy for books, particularly novels where dialog is involved. Almost every book I post takes extensive review and modification to fix things that were already supposed to be OK.

Dale
10-02-2007, 04:40 AM   #42
bowerbird
Banned
dale, i'm not sure i understand your point. got any examples?

-bowerbird
10-02-2007, 04:57 AM   #43
andym
Groupie
Quote:
Originally Posted by bowerbird
actually, hadrien, i am extremely familiar with project gutenberg e-texts.
and the one thing i can tell you is that they're _consistently_ inconsistent.

so yes, some early books used all-caps for italics, rather than underscores.
and along the way, a variety of characters were used besides underscores...
and up until 2003 or so, when i became a severe pain-in-the-neck to them
on these issues, they didn't even feel any need to mark italics consistently...

even worse, they used all-caps for bold as well, and likewise felt no need
to be consistent with that either. (sometimes they didn't mark bold at all.)
Amen to all of that. Though be grateful that the text is out there at all and you don't have to OCR it yourself! You can also see the issue from the point of view of the original transcribers. For example, I've just been restoring the italics in the PG text of Nostromo, and very often the transcriber uses initial caps for a word that was originally in italics, probably a more elegant and reader-friendly solution than using forward slashes for italicized words.

Quote:
i've invented a form of non-markup markup -- i call it "zen markup language",
or z.m.l. (it's two steps more advanced than x.m.l.) -- where such structural
information is represented by a simple set of unobtrusive light-markup rules.

for instance, a regular chapter-header is preceded by 4 blank lines and followed
by 2 blank lines, thus allowing a viewer-application (which i've also programmed)
to automatically form a table of contents that is auto-hot-linked to the chapters...

other simple rules -- easy enough to be understood by a fourth-grader --
underlie all of the other structures that are commonly found in books...
you can see work that i've done, in action, by visiting this web-page:
> http://z-m-l.com/go/vl3.pl
you'll be particularly interested in the "test-suite" and "rules" examples...

i believe intelligent viewer-programs interpreting plain-ascii input e-texts
and presenting them in typographically-sophisticated ways is _the_ future.

the publishing companies, of course, in an attempt to raise the cost of entry,
will try to force e-books into the complexity of heavy-markup, but i believe
the revolution into self-publishing will push back with light-markup systems.
authors don't want to battle steep learning curves. they just want to write...

-bowerbird
I don't understand why you would need a new markup. Used correctly, HTML markup (e.g. h1 for the book title, h2 for a part or section title, and h3 for a chapter) gives you all the semantic information you need. (Poetry is another story.) Personally, I believe that plain vanilla HTML (or its baby siblings Markdown, Textile, etc.) is the new ASCII.
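Andy's point can be illustrated with Python's standard `html.parser`: if headings are used as he describes, a table of contents falls straight out of the markup. A minimal sketch; the class and function names are hypothetical, and it assumes headings are not nested inside one another.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class HeadingCollector(HTMLParser):
    """Collect (level, text) for every h1, h2 and h3 element, in order."""

    LEVELS = {"h1": 1, "h2": 2, "h3": 3}

    def __init__(self) -> None:
        super().__init__()
        self.headings: List[Tuple[int, str]] = []
        self._level: Optional[int] = None  # level of the open heading, if any
        self._parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.LEVELS:
            self._level = self.LEVELS[tag]
            self._parts = []

    def handle_endtag(self, tag):
        if tag in self.LEVELS and self._level is not None:
            # Collapse internal whitespace, including line breaks.
            text = " ".join("".join(self._parts).split())
            self.headings.append((self._level, text))
            self._level = None

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)


def outline(html_text: str) -> str:
    """Return an indented plain-text table of contents built from h1-h3."""
    collector = HeadingCollector()
    collector.feed(html_text)
    collector.close()
    return "\n".join("  " * (level - 1) + text for level, text in collector.headings)
```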

Last edited by andym; 10-02-2007 at 04:59 AM.
10-02-2007, 05:41 AM   #44
bowerbird
Banned
andy said:
> Though be grateful for the fact that
> the text is out there at all and
> you don't have to OCR it yourself!

well heck, i'm _extremely_ grateful for project gutenberg.
as the forerunner of _all_ the net collaboration projects,
including wikipedia, it has _tremendous_ value to me...

so that's first and foremost.

having said that, however, o.c.r. ain't difficult these days.
scanning (and all that it entails, including rounding up
a hard-copy to scan) is the hardest part of the equation,
and google (and others) are taking care of all that hassle.

but yeah, as i said, correcting that o.c.r. is where all the
p.g. e-texts will come in handy, in the next cyberlibrary.


> Also you can see the issue from the point of view
> of the original transcribers as well. For example
> I've just been restoring the italics in the PG text of Nostromo,
> and very often the transcriber users initial caps for a word
> that was originally in italics - probably a more elegant and
> reader-friendly solution than using forward slashes for italicized words.

well, maybe. the problem is, though, that it's an ambiguous coding,
so it becomes impossible to restore things to their original state...

a forward-slashes method -- while maybe not "reader-friendly" --
would have at least been unambiguous enough to easily un-do...


> I don't understand why you would need a new mark-up,
> correctly used, html mark-up [eg h1 for the book title
> h2 for the part or section title and h3 for the chapter]
> gives you all the semantic information you need.

well, the problem with .html is that its obtrusive markup makes it
hard to maintain (e.g., correct, edit, compare, update, re-mix, etc.),
as well as to read in the underlying "master" format.

do a view-source on this page:
> http://z-m-l.com/go/test-suite.html

then compare that source-html to this page:
> http://z-m-l.com/go/test-suite.zml

particularly since the .zml file actually _generated_ the .html one,
i think it's pretty easy to tell which file would be easier to maintain,
especially with a library of thousands of e-texts (let alone millions).

and then of course when you ratchet up the difficulty to the level of
the .epub format, where each e-text file needs accompanying files,
you're just asking for trouble. in my view, complex formats like that
are simply the old-guard dinosaur publishing-houses attempting to
raise the cost-of-entry for us "amateur" newbies, whose new capacity
for self-publishing will totally and completely subvert their business.
they're attempting to find a way to maintain their status as middlemen,
so they can continue to siphon off a good percentage of the revenue...


> Personally I believe that plain vanilla html
> (or its baby siblings markdown, textile etc) is the new ascii.

markdown and textile are both light-markup systems,
and thus of the same type as my zen markup language.
(except my z.m.l. is even less obtrusive than they are.)

but yes, this is the way of the future. authors want to write,
not be caught up in unnecessary complexities of file-formats.

-bowerbird
10-02-2007, 10:41 AM   #45
DaleDe
Grand Sorcerer
Quote:
Originally Posted by bowerbird
dale, i'm not sure i understand your point. got any examples?

-bowerbird
For example, dialog often tries to indicate pauses and interruptions. In some cases this is done with a dash. In the recent biography of Buffalo Bill that I just posted, the source used hyphens for everything. I have seen some PG books with double hyphens, which are easy to deal with, but this book used only single hyphens everywhere. The book was a mess of real hyphens needed for compound words and hyphens used where a dash was needed. I had to find every instance manually and make a decision in each case. In formatting, a line may break after a hyphen but typically not after a dash, so hyphens standing in for dashes cause ugly breaks in the text flow.
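Part of this cleanup can be automated, though not the part Dale had to do by hand. A minimal sketch, assuming the target style uses em dashes: double hyphens are unambiguous and get converted, while a single hyphen with whitespace on either side is only reported for a human decision. Single hyphens with no surrounding spaces, as in the Buffalo Bill text, look exactly like compound-word hyphens and are left alone, so they still need the manual pass. The function name is hypothetical.

```python
import re
from typing import List, Tuple

EM_DASH = "\u2014"

# A hyphen with whitespace immediately before or after it: probably a dash.
SPACED_HYPHEN = re.compile(r"(?<=\s)-|-(?=\s)")


def convert_dashes(text: str) -> Tuple[str, List[int]]:
    """Turn '--' into em dashes and report offsets of suspicious single hyphens.

    Offsets refer to the converted text, not the input.
    """
    converted = text.replace("--", EM_DASH)
    suspicious = [m.start() for m in SPACED_HYPHEN.finditer(converted)]
    return converted, suspicious
```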

Other dialog problems include accent marks and attempts to show dialect in the text. These are tough even with a full font collection and much more difficult using only ASCII characters. Bold, italics and special symbols get lost in translation to ASCII. Surely you have noticed this.

Many period books use unusual spellings and other specialized constructions, including foreign words, that can fool spell checkers and require manual intervention that doesn't seem to happen in the process.

Dale
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.