View Full Version : Feedbooks to support ePub format


Bob Russell
09-18-2007, 10:59 AM
Fans of Feedbooks (http://www.feedbooks.com/) will be happy to learn that the new ePub standard (http://www.mobileread.com/forums/showthread.php?t=13729&highlight=epub) will soon be supported in addition to PDF A4, Sony Reader, iLiad and custom PDF.

TeleBlog (http://www.teleread.org/blog/?p=7121) is reporting the following tidbit... "'Feedbooks will have on-the-fly .epub files this month,' reports Hadrien. 'We’ll enable this feature on books first and then on RSS feeds too. In the future, we’ll work on improving the overall look of these .epub files and would also like to add a ‘custom .epub’ feature on the website, where anyone will be able to easily customize the CSS and layout of the book.'"

From the Feedbooks site, "Feedbooks is a project in development in Paris, France. Our main focus is to create a complete experience for e-ink readers and other portable reading devices... Made with e-paper devices in mind (Sony PRS-500 & iRex iLiad), Feedbooks is a complete experience for mobile reading. Anyone can easily publish content, and customize the formatting."

Hadrien
09-26-2007, 08:45 AM
Still a work in progress, but here's a sample file that our development website can currently generate. It might have a few bugs, and the metadata and footnotes are missing, but overall it works quite well already and gives you a pretty good idea of what our .epub files will look like.

Expect the epub output to be available as a public beta this week.

PS: Could Alexander or any admin add epub support for the uploads?

ricdiogo
09-26-2007, 11:56 AM
Can't open it using Digital Editions, Hadrien. I see the TOC, but when I click the chapters I can't read them.

Hadrien
09-26-2007, 01:00 PM
Can't open it using Digital Editions, Hadrien. I see the TOC, but when I click the chapters I can't read them.

Oh, I've tried it and it works, but Adobe DE is very slow and inefficient software, and War & Peace is 350+ chapters. I'll replace it with something lighter.

wallcraft
09-26-2007, 10:21 PM
Having .epub extension files available for download would help with web-based epub readers like OpenBerg Lector (http://www.mobileread.com/forums/showthread.php?t=14212) (and Digital Editions for that matter).

There is a real need for an ePub compliance checker. I'm not sure Dostoyevsky-The_Gambler is a valid .epub; it fails in Lector, for example. FBReader can read it, but it currently lacks TOC support, which is a bit limiting.

Hadrien
09-27-2007, 01:26 AM
Having .epub extension files available for download would help with web-based epub readers like OpenBerg Lector (http://www.mobileread.com/forums/showthread.php?t=14212) (and Digital Editions for that matter).

There is a real need for an ePub compliance checker. I'm not sure Dostoyevsky-The_Gambler is a valid .epub; it fails in Lector, for example. FBReader can read it, but it currently lacks TOC support, which is a bit limiting.

Well, we don't have any validation tool yet, so we created our files mostly based on the examples and docs provided by the IDPF.
They work in DE, although DE is kinda slow if your .xml file is too big. Haven't tried with FBReader yet, but FBReader currently lacks both TOC and CSS support, which is quite limiting.
We're correcting a few things and we'll add this output to the website for beta testing (I guess today or tomorrow).

Keep in mind that this is pure beta testing for the moment; if you find any problems with the files, we'll correct them.

PS: Just tried OpenBerg and found a small bug in our file. Corrected the whole thing manually and now it opens perfectly in OpenBerg.
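
In the absence of an official validator, here is a rough Python sketch of the kind of structural checks a compliance checker could start with. It is only an illustration (not a Feedbooks or IDPF tool) and it only looks at the OCF container basics, not at the OPF/NCX contents:

# minimal_epub_check.py -- illustrative only, not a real ePub validator
import sys
import zipfile

def check_epub(path):
    problems = []
    with zipfile.ZipFile(path) as z:
        infos = z.infolist()
        # OCF rule: the very first entry must be "mimetype", stored uncompressed
        if not infos or infos[0].filename != "mimetype":
            problems.append("first zip entry is not 'mimetype'")
        elif infos[0].compress_type != zipfile.ZIP_STORED:
            problems.append("'mimetype' entry is compressed; it must be stored")
        elif z.read("mimetype").rstrip() != b"application/epub+zip":
            problems.append("'mimetype' content is not 'application/epub+zip'")
        # OCF rule: container.xml tells readers where the OPF package file lives
        if "META-INF/container.xml" not in z.namelist():
            problems.append("missing META-INF/container.xml")
    return problems

if __name__ == "__main__":
    for line in check_epub(sys.argv[1]) or ["container structure looks OK"]:
        print(line)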

Hadrien
09-27-2007, 12:23 PM
Ok. Here's another screenshot, in OpenBerg this time.

I can already see a problem with DE: I had to put my paragraphs in <p></p> but also add a <br /> tag between them for DE to separate them. OpenBerg renders that <br /> as written, which means I get blank lines (that I don't want in my file).

DaleDe
09-27-2007, 12:31 PM
Well, we don't have any validation tool yet, so we created our files mostly based on the examples and docs provided by the IDPF.
They work in DE, although DE is kinda slow if your .xml file is too big. Haven't tried with FBReader yet, but FBReader currently lacks both TOC and CSS support, which is quite limiting.
We're correcting a few things and we'll add this output to the website for beta testing (I guess today or tomorrow).

Keep in mind that this is pure beta testing for the moment; if you find any problems with the files, we'll correct them.

PS: Just tried OpenBerg and found a small bug in our file. Corrected the whole thing manually and now it opens perfectly in OpenBerg.

I just tried The Gambler (I don't know where OpenBerg is). I looked at it in Digital Editions and converted it to my eb1150 as a test. Do you mind if I finish the conversion to eb1150 and then post it at MobileRead? Here are some comments.

1. Digital Editions is really slow to bring up the document.
2. I was surprised the zip container had no compression. Was this done to try to increase speed? Normally you should be able to publish with the .epub extension using full zip compression.
3. The file extensions were .xml instead of .xhtml. Is there a reason for this? It might slow down the processing, but I am not sure.
4. The note.xml file was included, but I do not think it was referenced anywhere.
5. The final Feedbooks page has a very light graphic. Did it come across OK on the readers you are targeting? It looked too light when converted to 16-level grayscale on my device. I had to darken it a bit.
6. The TOC did not come across to my eb1150 version, but I can regenerate it manually. It is probably a limitation of my system. It worked OK in Digital Editions.
7. There are a lot of " marks with spaces on both sides in the document. This is one of the nits that I hate. I corrected it in my copy of the XML file and can send it to you if you wish.
8. Are you planning to convert the quotes to curly quotes?
9. There are some characters in the file that my reader did not recognize, but that may just be my reader.

I hope this helps.
Dale

Hadrien
09-27-2007, 12:45 PM
I just tried The Gambler (I don't know where OpenBerg is). I looked at it in Digital Editions and converted it to my eb1150 as a test. Do you mind if I finish the conversion to eb1150 and then post it at MobileRead? Here are some comments.

1. Digital Editions is really slow to bring up the document.
2. I was surprised the zip container had no compression. Was this done to try to increase speed? Normally you should be able to publish with the .epub extension using full zip compression.
3. The file extensions were .xml instead of .xhtml. Is there a reason for this? It might slow down the processing, but I am not sure.
4. The note.xml file was included, but I do not think it was referenced anywhere.
5. The final Feedbooks page has a very light graphic. Did it come across OK on the readers you are targeting? It looked too light when converted to 16-level grayscale on my device. I had to darken it a bit.
6. The TOC did not come across to my eb1150 version, but I can regenerate it manually. It is probably a limitation of my system. It worked OK in Digital Editions.
7. There are a lot of " marks with spaces on both sides in the document. This is one of the nits that I hate. I corrected it in my copy of the XML file and can send it to you if you wish.
8. Are you planning to convert the quotes to curly quotes?
9. There are some characters in the file that my reader did not recognize, but that may just be my reader.

I hope this helps.
Dale

Hello Dale,

1) I noticed that we had a few tags that were not correctly closed; that's why it wasn't working in OpenBerg, but only 1-2 were affected and none in the main text. DE's parser seems to be really bad; there are no speed issues with OpenBerg or FBReader.
2) You're not supposed to compress the files; zip is just a container here. Some software might support compression, but it might slow the whole thing down, and we have to make those files fully compliant (that's what a beta test is for...).
3) I don't think that .xml or .xhtml changes anything.
4) We left a few files from the example we used to create our files (myantonia); that's why there are a few leftover metadata entries and files. We're removing all the references now and cleaning the whole thing up (I took this file while the output was still in development).
5) The graphic looked OK on my Sony Reader. Might darken it a bit though...
6) The TOC worked perfectly with all the files I've tested, but only DE seems to support the TOC.
7) This is how the text itself was uploaded. I guess we could add another layer of formatting, though; this is a constant work in progress.
8) Same here: if we could easily identify where the curly quotes belong, we could add this. In some situations it's hard to fix this automatically.
9) Haven't seen any unusual characters using DE or OpenBerg. Might be linked to the fact that ePub is fully UTF-8?

Post any conversion that you want to. Did you convert the file manually or automatically? Which output are you using?

Thanks for the feedback,
I'll post an updated file soon.

DaleDe
09-27-2007, 02:11 PM
Hello Dale,
Post any conversion that you want to. Did you convert the file manually or automatically? Which output are you using?

Thanks for the feedback,
I'll post an updated file soon.

I manually converted the file, but it wasn't much effort. I have been exploring the need to support ePub, and in spite of what the technical releases and publications put out on ePub, eBook Technologies does not directly support ePub on any existing products. They probably will in some new version that they have in their labs, but not in anything they have released.

Here is what I did to convert the file:
1. Unzipped the container.
2. Created my own OPF file, as eBook Publisher 2.2.5 won't read an ePub one.
3. Renamed all the extensions from .xml to .odf (.html would also have worked).
4. Imported the documents and arranged them. It then compiled OK with some warnings.
5. Corrected the CSS files that import other files. My tools will not accept quotes around the imported filenames.

It compiled OK with a few known warnings on unsupported structures. eBook Publisher is back in the dark ages when it comes to CSS support.

I will need to build a TOC to make it releasable, but it looks good so far. I also need to correct some spacing problems on the title page, but the data is OK.

My chapter titles do not have the box around them, but otherwise my book looks similar to the one you produced. My paragraphs are not indented, but I can fix this.

Dale

wallcraft
09-27-2007, 03:27 PM
You're not supposed to compress the files, zip is just a container here.

Everything EXCEPT the first file (mimetype) can be, and will typically be, compressed. Any reader that supports ePub will support compressed content. The requirement that mimetype not be compressed is so that programs like Linux's file command can recognize the type by looking at a file's first few bytes. See the .epub format (http://www.mobileread.com/forums/showthread.php?t=11784) thread for more useful links.
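
For anyone writing their own generator, here is a small Python sketch of that layout: the mimetype entry is written first and stored, and everything else is deflated. It is only an illustration of the container rule (the directory name and file list are made up), not Feedbooks' actual code:

# build_epub.py -- illustrative sketch of writing an OCF (.epub) container
import os
import zipfile

def build_epub(src_dir, out_path):
    with zipfile.ZipFile(out_path, "w") as z:
        # first entry: "mimetype", stored, so the first bytes of the .epub
        # identify the file type to tools like the Unix "file" command
        z.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                   compress_type=zipfile.ZIP_STORED)
        # everything else (OPF, NCX, XHTML, CSS, images) can be deflated
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, src_dir)
                if arcname == "mimetype":
                    continue  # already written above
                z.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)

build_epub("book_src", "book.epub")  # "book_src" is a placeholder directory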

Hadrien
09-27-2007, 04:59 PM
Everything EXCEPT the first file (mimetype) can be, and will typically be, compressed. Any reader that supports ePub will support compressed content. The requirement that mimetype not be compressed is so that programs like Linux's file command can recognize the type by looking at a file's first few bytes. See the .epub format (http://www.mobileread.com/forums/showthread.php?t=11784) thread for more useful links.

OK, I'll switch this to compression for everything except the mimetype then. Files will be much smaller this way!

JSWolf
09-27-2007, 05:44 PM
Ok. Here's another screenshot, in OpenBerg this time.

I can already see a problem with DE: I had to put my paragraphs in <p></p> but also add a <br /> tag between them for DE to separate them. OpenBerg renders that <br /> as written, which means I get blank lines (that I don't want in my file).
Do .epub books need those blank lines between paragraphs? If so, I really won't be reading them, as I find them rather annoying on a large enough screen. Can the quotes and apostrophes be converted to the curly kind?

Hadrien
09-27-2007, 05:53 PM
Do .epub books need those blank lines between paragraphs? If so, I really won't be reading them, as I find them rather annoying on a large enough screen. Can the quotes and apostrophes be converted to the curly kind?

I don't like blank lines either. Take a look at the DE screenshots that I posted.

I had to do this to get this result on DE: <p>text</p><br /><p>text2</p>

Of course, because of this, I get blank lines in OpenBerg...

The curly kind, once again, can be done two different ways: people uploading the text can already include curly quotes, or we could add some extra pre-processing on the text.

andym
09-27-2007, 06:41 PM
It all seems to be working fine in DE for me (I'm on OS X, though that shouldn't make a difference).

It's late, and no doubt I'm missing something, but I don't understand the spaces-between-paragraphs problem. Most of the ePubs I have use the normal HTML syntax (i.e. no break tags) and display as they should - i.e. with the paragraphs flowing one after another without white space. The one exception is White Fang, which uses CSS styles to give an extra margin at the beginning of paragraphs. If you need to have spaces between paragraphs, that seems the better way to do it.

If you have any problems or issues with DE it would be well worth posting to the Adobe DE forum - in my experience the project engineers do seem to read and respond to posts, and I'm sure they'd be happy to help.

Dale - OpenBerg is here (on SourceForge):

http://openberg.sourceforge.net/

DaleDe
09-27-2007, 06:54 PM
Do .epub books need those blank lines between paragraphs? If so, I really won't be reading them, as I find them rather annoying on a large enough screen. Can the quotes and apostrophes be converted to the curly kind?

The blank lines are put in by CSS processing and are under the control of the author (publisher). I just posted The Gambler in eb1150 format for anyone who is interested. I added the curly quotes and apostrophes and cleaned up some other nits: http://www.mobileread.com/forums/showthread.php?t=14235

Hadrien
09-27-2007, 07:29 PM
I've updated the file once again and I think I'll put the ePub output on the website now. With a larger number of books, additional bugs could pop up, and we're hunting them right now ^^

Edit: It's online: http://www.feedbooks.com/discover
No cache for the ePub files for the moment, so it might take a few seconds the first time you generate a file.

DaleDe
09-27-2007, 10:44 PM
I've updated the file once again and I think I'll put the ePub output on the website now. With a larger number of books, additional bugs could pop up, and we're hunting them right now ^^

Edit: It's online: http://www.feedbooks.com/discover
No cache for the ePub files for the moment, so it might take a few seconds the first time you generate a file.

Great idea. It looks like anyone can now retrieve all of the books at the site in this format. Wonderful news.

Dale

Hadrien
09-28-2007, 09:42 AM
Great idea. It looks like anyone can now retrieve all of the books at the site in this format. Wonderful news.

Dale

All of the public domain books, indeed. We'll add it for user-generated content and RSS feeds once we're through with this first ePub beta.

LOL2005
09-29-2007, 05:31 AM
I have tried to make several custom PDFs and always got the following error message:

File does not begin with '%PDF-'


I like the project :book2:
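
(For what it's worth, that error usually means the downloaded file isn't a PDF at all, e.g. an HTML error page saved under a .pdf name. A quick way to check, sketched in Python purely as an illustration; "custom.pdf" is a placeholder filename:)

# pdf_sanity.py -- illustrative check of the PDF signature
def looks_like_pdf(path):
    # every PDF starts with the magic bytes "%PDF-" followed by a version number
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"

print(looks_like_pdf("custom.pdf"))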

Hadrien
09-29-2007, 10:02 AM
I have tried to make several custom PDFs and always got the following error message:

File does not begin with '%PDF-'


I like the project :book2:

OK, I'll fix this problem; we must have removed or replaced something while updating the website to support ePub...

Edit: It should be working now.

LOL2005
09-29-2007, 11:12 AM
Works fine now, thanks for the very quick fix
Cheers

:clap::clap::clap:

Dan23
09-29-2007, 07:24 PM
I posted this as a comment under The Great Gatsby on Feedbooks, but a number of .epub files are being generated incorrectly. The Great Gatsby and The Art of War are two. When I open those two books in Digital Editions, the links to every chapter past a certain point all lead to the same incorrect location. In The Great Gatsby, chapters 5 and up link to the same page as page 2 of the About section. I was not able to find the rest of the book within Digital Editions. The problem is not present in the PDFs.

Dan23
09-29-2007, 08:36 PM
Other books with this problem (not a comprehensive list) are:
Woolf - Mrs. Dalloway.epub
Thoreau - On the Duty of Civil Disobediance.epub
Plato - Gorgias.epub
Orwell - Animal Farm.epub
Joyce - Ulysses.epub
When I open fb.xml in Opera after unzipping, it gives errors of the following nature: "XML parsing failed: syntax error (Line: 1369, Character: 29)" (this is for The Great Gatsby).

Hadrien
09-29-2007, 11:21 PM
Other books with this problem (not a comprehensive list) are:
Woolf - Mrs. Dalloway.epub
Thoreau - On the Duty of Civil Disobediance.epub
Plato - Gorgias.epub
Orwell - Animal Farm.epub
Joyce - Ulysses.epub
When I open fb.xml in Opera after unzipping, it gives errors of the following nature: "XML parsing failed: syntax error (Line: 1369, Character: 29)" (this is for The Great Gatsby).

Tried some of these: the whole text is present in the fb.xml, but there are XHTML errors at some point (usually a problem with a <p> and <b> tag or a <p> and <i> tag) that explain the errors when clicking on a chapter. I'll update the flex or run the fb.xml through tidy to get rid of these problems...
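
A plain well-formedness check is enough to catch this kind of error before packaging, and it reports the same line/column position Opera does. A minimal Python sketch (just an illustration, not the Feedbooks pipeline):

# wellformed.py -- report XML well-formedness errors with their position
import sys
import xml.dom.minidom

try:
    xml.dom.minidom.parse(sys.argv[1])   # e.g. the fb.xml extracted from the .epub
    print("well-formed")
except Exception as err:                 # expat errors include "line X, column Y"
    print("parse error:", err)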

Hadrien
09-30-2007, 01:53 PM
Fixed. All the books that you listed are now correctly working in OpenBerg.

Dan23
09-30-2007, 04:37 PM
This is a totally separate issue that I just stumbled upon by accident, but when I try opening the fb.xml file from any ePub in IE or Maxthon (IE-based) I get a message of the following nature. It does not affect Digital Editions in any way, and no such errors appear in Opera:

The XML page cannot be displayed
Cannot view XML input using style sheet. Please correct the error and then click the Refresh button, or try again later.


--------------------------------------------------------------------------------

Parameter entity must be defined before it is used. Error processing resource 'http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd...

%xhtml-prefw-redecl.mod;
-^

Hadrien
09-30-2007, 04:46 PM
This is a totally separate issue that I just stumbled upon by accident, but when I try opening the fb.xml file from any ePub in IE or Maxthon (IE-based) I get a message of the following nature. It does not affect Digital Editions in any way, and no such errors appear in Opera:

The XML page cannot be displayed
Cannot view XML input using style sheet. Please correct the error and then click the Refresh button, or try again later.


--------------------------------------------------------------------------------

Parameter entity must be defined before it is used. Error processing resource 'http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd...

%xhtml-prefw-redecl.mod;
-^

Well, first of all, these are not supposed to be opened in IE, but in DE/OpenBerg/FBReader instead ^_^

A style sheet problem? Maybe you just unzipped fb.xml without the stylesheet directory?

Dan23
09-30-2007, 05:23 PM
Nope, I extracted the whole ePub first and then tried opening fb.xml. I don't plan on opening it with IE in the future, but in case it was a problem I thought you should know. It is present in every ePub I tried. For all of them, fb.xml opens fine in Opera and Firefox - so there may be some sort of error in the XML that Opera and Firefox overlook but IE doesn't.

Hadrien
09-30-2007, 05:53 PM
Nope, I extracted the whole ePub first and then tried opening fb.xml. I don't plan on opening it with IE in the future, but in case it was a problem I thought you should know. It is present in every ePub I tried. For all of them, fb.xml opens fine in Opera and Firefox - so there may be some sort of error in the XML that Opera and Firefox overlook but IE doesn't.

OK, I'll try it with IE too and see if there's any error in the XML file, but it should be 100% valid XHTML now that I've fixed everything and run it through tidy.

Hadrien
09-30-2007, 06:21 PM
Here's a screenshot in FBReader. CSS and TOC are not yet supported in FBReader, but overall it works fine (I love the fact that hyphenation is done in software in FBReader).

For those of you using an iLiad, this should be sweet: you'll be able to directly download our ePub files using our iLiad software and open them thanks to the upcoming iLiad port of FBReader.

DaleDe
09-30-2007, 08:35 PM
OK, I'll try it with IE too and see if there's any error in the XML file, but it should be 100% valid XHTML now that I've fixed everything and run it through tidy.

You might try renaming the file to a .xhtml extension and see if IE likes it better. Raw XML needs more than just a stylesheet to be read correctly.

Dale

bowerbird
10-01-2007, 12:07 PM
hadrien, your books look very nice! congratulations!

on average, how long does it take you to work up a book,
say from project gutenberg, to put into your database?
5-10 minutes, 15-30 minutes, 1-2 hours, 2-4 hours?

-bowerbird

Hadrien
10-01-2007, 01:30 PM
hadrien, your books look very nice! congratulations!

on average, how long does it take you to work up a book,
say from project gutenberg, to put into your database?
5-10 minutes, 15-30 minutes, 1-2 hours, 2-4 hours?

-bowerbird

It's a 5-15 minute thing... unless you're adding War & Peace or Les Misérables, of course ^_^

The good thing is that, unlike with fully manually created books, as soon as we add a new output it's available for ALL of our books (and we still get a full TOC, footnotes, etc.). We also make advanced use of the metadata: you can browse the website in many different ways, we've got an API that makes it possible for any application or website to interact with Feedbooks (our iLiad application, for example), and there's a personal recommendation system.

Anyone can contribute to adding books on Feedbooks: making the process easier will be one of our goals in the upcoming months.

The next output will be something totally different, not e-paper related, and it should appeal to another crowd too.

bowerbird
10-01-2007, 02:30 PM
hadrien-

thanks...

i did notice that, on the older project gutenberg e-texts, which
used all-upper-case to indicate italics, you haven't fixed that...

where can i get information on your a.p.i. for external apps?

-bowerbird

andym
10-01-2007, 02:38 PM
It's a 5-15 minute thing... unless you're adding War & Peace or Les Misérables, of course ^_^.

Out of interest (I've just been spending way too much time restoring the accents in the PG text of Nostromo): do you have dictionary software that will restore accents automatically?

Hadrien
10-01-2007, 04:08 PM
Out of interest (I've just been spending way too much time restoring the accents in the PG text of Nostromo): do you have dictionary software that will restore accents automatically?

Well... we're using a dictionary for hyphenation in PDF files. We're not changing any accents yet; I guess it could be added to our preprocessing to-do list, along with curly quotes.

bowerbird: On Project Gutenberg, italics are indicated with _, not all caps. I'll take a look at what all caps is used for exactly; I guess that's another thing we could add to our preprocessing.
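
For the record, here is the kind of naive curly-quote pre-processing pass being discussed, sketched in Python. It is only an illustration: nested or unbalanced quotes will still fool it, which is exactly the "hard to fix automatically" case mentioned above.

# smarten.py -- naive straight-to-curly quote conversion (illustration only)
import re

def smarten(text):
    # apostrophes inside words: don't -> don’t
    text = re.sub(r"(?<=\w)'(?=\w)", "\u2019", text)
    # quotes at the start of the text or after whitespace/brackets are "opening"
    text = re.sub(r'(^|[\s(\[])"', "\\1\u201c", text)
    text = re.sub(r"(^|[\s(\[])'", "\\1\u2018", text)
    # anything left over is treated as a "closing" quote
    return text.replace('"', "\u201d").replace("'", "\u2019")

print(smarten('"Can\'t open it," she said.'))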

bowerbird
10-01-2007, 05:52 PM
actually, hadrien, i am extremely familiar with project gutenberg e-texts.
and the one thing i can tell you is that they're _consistently_ inconsistent.

so yes, some early books used all-caps for italics, rather than underscores.
and along the way, a variety of characters were used besides underscores...
and up until 2003 or so, when i became a severe pain-in-the-neck to them
on these issues, they didn't even feel any need to mark italics consistently...

even worse, they used all-caps for bold as well, and likewise felt no need
to be consistent with that either. (sometimes they didn't mark bold at all.)

i know all this because i have been working for some time now on means of
interpreting the p.g. e-texts in a way that restores the structural information.
the same type of work you do when you put texts into your database, except
i leave them as text. (so ordinary humans can continue to work with them...)

i've invented a form of non-markup markup -- i call it "zen markup language",
or z.m.l. (it's two steps more advanced than x.m.l.) -- where such structural
information is represented by a simple set of unobtrusive light-markup rules.

for instance, a regular chapter-header is preceded by 4 blank lines and followed
by 2 blank lines, thus allowing a viewer-application (which i've also programmed)
to automatically form a table of contents that is auto-hot-linked to the chapters...
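
just to make that rule concrete, here's a rough python sketch of the blank-line heuristic -- it's only an illustration of the idea, not the actual z.m.l. viewer code:

# chapter-header heuristic, illustration only (not the real z.m.l. viewer code)
def find_chapter_headers(text):
    lines = text.split("\n")
    headers = []
    for i, line in enumerate(lines):
        if not line.strip():
            continue
        before = lines[max(0, i - 4):i]
        after = lines[i + 1:i + 3]
        # a regular chapter header: 4 blank lines before it, 2 blank lines after it
        if (len(before) == 4 and all(not b.strip() for b in before)
                and len(after) == 2 and all(not a.strip() for a in after)):
            headers.append((i, line.strip()))
    return headers  # enough to auto-build a hot-linked table of contents

sample = "some front matter\n" + "\n" * 4 + "CHAPTER I\n" + "\n" * 2 + "call me ishmael..."
print(find_chapter_headers(sample))  # -> [(5, 'CHAPTER I')]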

other simple rules -- easy enough to be understood by a fourth-grader --
underlie all of the other structures that are commonly found in books...
you can see work that i've done, in action, by visiting this web-page:
> http://z-m-l.com/go/vl3.pl
you'll be particularly interested in the "test-suite" and "rules" examples...

i believe intelligent viewer-programs interpreting plain-ascii input e-texts
and presenting them in typographically-sophisticated ways is _the_ future.

the publishing companies, of course, in an attempt to raise the cost of entry,
will try to force e-books into the complexity of heavy-markup, but i believe
the revolution into self-publishing will push back with light-markup systems.
authors don't want to battle steep learning curves. they just want to write...

-bowerbird

akiburis
10-01-2007, 07:33 PM
There may actually be some consistency, at least, in PG's inconsistency. In some texts, they seem to distinguish between italics used in the original for emphasis, represented in the PG text by all caps, and italics used for other purposes (setting off foreign words and phrases, titles, etc), represented in the PG text by fore-and-aft underscores.

PG texts also use all caps to represent original small caps and caps-and-small-caps.

bowerbird
10-01-2007, 09:36 PM
could be. it's hard to know without looking at the scans.
and even if you have the scans, the fact that p.g. has
rewrapped the text makes it hard to do the comparison.
it ends up it's easier to re-o.c.r., and use the p.g. e-text
to do corrections. thank goodness google is scanning...

and it ends up that leaving the all-upper-case words is
not all that bad. it accomplishes the emphasis purpose.

but there are a raft of problems like this, such as the
failure to indicate the lines that shouldn't be wrapped
(e.g., in address-blocks, tables, signature-blocks, etc.)

oh well, it's been a puzzle to occupy my mind... :+)

-bowerbird

DaleDe
10-02-2007, 01:09 AM
Many of the problems are due to the idea that you can exchange data in plain-text format. This is fallacious for books, particularly novels where dialog is involved. Almost every book I post takes extensive review and modification to fix things that were already supposed to be OK.

Dale

bowerbird
10-02-2007, 04:40 AM
dale, i'm not sure i understand your point. got any examples?

-bowerbird

andym
10-02-2007, 04:57 AM
actually, hadrien, i am extremely familiar with project gutenberg e-texts.
and the one thing i can tell you is that they're _consistently_ inconsistent.

so yes, some early books used all-caps for italics, rather than underscores.
and along the way, a variety of characters were used besides underscores...
and up until 2003 or so, when i became a severe pain-in-the-neck to them
on these issues, they didn't even feel any need to mark italics consistently...

even worse, they used all-caps for bold as well, and likewise felt no need
to be consistent with that either. (sometimes they didn't mark bold at all.)

Amen to all of that. Though be grateful for the fact that the text is out there at all and you don't have to OCR it yourself! Also, you can see the issue from the point of view of the original transcribers as well. For example, I've just been restoring the italics in the PG text of Nostromo, and very often the transcriber uses initial caps for a word that was originally in italics - probably a more elegant and reader-friendly solution than using forward slashes for italicized words.

i've invented a form of non-markup markup -- i call it "zen markup language",
or z.m.l. (it's two steps more advanced than x.m.l.) -- where such structural
information is represented by a simple set of unobtrusive light-markup rules.

for instance, a regular chapter-header is preceded by 4 blank lines and followed
by 2 blank lines, thus allowing a viewer-application (which i've also programmed)
to automatically form a table of contents that is auto-hot-linked to the chapters...

other simple rules -- easy enough to be understood by a fourth-grader --
underlie all of the other structures that are commonly found in books...
you can see work that i've done, in action, by visiting this web-page:
> http://z-m-l.com/go/vl3.pl
you'll be particularly interested in the "test-suite" and "rules" examples...

i believe intelligent viewer-programs interpreting plain-ascii input e-texts
and presenting them in typographically-sophisticated ways is _the_ future.

the publishing companies, of course, in an attempt to raise the cost of entry,
will try to force e-books into the complexity of heavy-markup, but i believe
the revolution into self-publishing will push back with light-markup systems.
authors don't want to battle steep learning curves. they just want to write...

-bowerbird

I don't understand why you would need a new markup. Correctly used, HTML markup [e.g. h1 for the book title, h2 for the part or section title, and h3 for the chapter] gives you all the semantic information you need. (Poetry is another story.) Personally, I believe that plain vanilla HTML (or its baby siblings Markdown, Textile, etc.) is the new ASCII.

bowerbird
10-02-2007, 05:41 AM
andy said:
> Though be grateful for the fact that
> the text is out there at all and
> you don't have to OCR it yourself!

well heck, i'm _extremely_ grateful for project gutenberg.
as the forerunner of _all_ the net collaboration projects,
including wikipedia, it has _tremendous_ value to me...

so that's first and foremost.

having said that, however, o.c.r. ain't difficult these days.
scanning (and all that it entails, including rounding up
a hard-copy to scan) is the hardest part of the equation,
and google (and others) are taking care of all that hassle.

but yeah, as i said, correcting that o.c.r. is where all the
p.g. e-texts will come in handy, in the next cyberlibrary.


> Also you can see the issue from the point of view
> of the original transcribers as well. For example
> I've just been restoring the italics in the PG text of Nostromo,
> and very often the transcriber uses initial caps for a word
> that was originally in italics - probably a more elegant and
> reader-friendly solution than using forward slashes for italicized words.

well, maybe. the problem is, though, that it's an ambiguous coding,
so it becomes impossible to restore things to their original state...

a forward-slashes method -- while maybe not "reader-friendly" --
would have at least been unambiguous enough to easily un-do...


> I don't understand why you would need a new mark-up,
> correctly used, html mark-up [eg h1 for the book title
> h2 for the part or section title and h3 for the chapter]
> gives you all the semantic information you need.

well, the problem with .html is that its obtrusive markup makes it
hard to maintain (e.g., correct, edit, compare, update, re-mix, etc.),
as well as to read in the underlying "master" format.

do a view-source on this page:
> http://z-m-l.com/go/test-suite.html

then compare that source-html to this page:
> http://z-m-l.com/go/test-suite.zml

particularly since the .zml file actually _generated_ the .html one,
i think it's pretty easy to tell which file would be easier to maintain,
especially with a library of thousands of e-texts (let alone millions).

and then of course when you ratchet up the difficulty to the level of
the .epub format, where each e-text file needs accompanying files,
you're just asking for trouble. in my view, complex formats like that
are simply the old-guard dinosaur publishing-houses attempting to
raise the cost-of-entry for us "amateur" newbies, whose new capacity
for self-publishing will totally and completely subvert their business.
they're attempting to find a way to maintain their status as middlemen,
so they can continue to siphon off a good percentage of the revenue...


> Personally I believe that plain vanilla html
> (or its baby siblings markdown, textile etc) is the new ascii.

markdown and textile are both light-markup systems,
and thus of the same type as my zen markup language.
(except my z.m.l. is even less obtrusive than they are.)

but yes, this is the way of the future. authors want to write,
not be caught up in unnecessary complexities of file-formats.

-bowerbird

DaleDe
10-02-2007, 10:41 AM
dale, i'm not sure i understand your point. got any examples?

-bowerbird

For example, dialog often attempts to indicate pauses and interruptions. In some cases this is done with a dash symbol. In the recent biography of Buffalo Bill that I just posted, the source used hyphens for everything. In some cases I have seen PG books with double hyphens, which are easy to deal with, but in this book only single hyphens were used everywhere. The book was a mess of real hyphens needed for compound words and hyphens used when a dash was needed. I had to manually find every instance and make a decision in each case. In formatting, a hyphen will allow a break at the end of a line but a dash typically will not, so the wrong character causes ugly breaks in the text flow.
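
The unambiguous cases can be handled mechanically; roughly, in Python, something like the sketch below (an illustration of the idea only; I did mine by hand, and the genuinely ambiguous single hyphens still need a human):

# dash_cleanup.py -- handles only the unambiguous dash patterns (illustration)
import re

def fix_dashes(text):
    text = text.replace("--", "\u2014")                  # doubled hyphens -> em dash
    text = re.sub(r"(?<=\w) - (?=\w)", "\u2014", text)   # spaced single hyphen -> em dash
    return text                                          # compound words like "e-book" are left alone

print(fix_dashes("It was - I think - a well-known e-book."))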

Other dialog problems include accent marks and trying to show dialects in the text. These are tough even with a full font collection and are made much more difficult using only ASCII characters. Bold, italics and special symbols get lost in translation to ASCII. Surely you have noticed this.

Many period books use unusual spelling and other specialized but unusual constructions with foreign words that can fool spell checkers, requiring intervention that seems not to get done in the process.

Dale

bowerbird
10-02-2007, 02:46 PM
dale said:
> The book was a mess of real hyphens needed for compound words
> and hyphens used when a dash was needed.

yes, that's the type of ambiguous coding that needs to be avoided...


> Bold, italics and special symbols get lost in translation to ascii.
> Surely you have noticed this.

i sure have. italics and bold (except on headers) must be marked.
and for my own mirror of the p.g. library, i will probably use utf8,
so special symbols won't be a problem.


> period books use unusual spelling and other specialized
> but unusual constructions with foreign words that can
> fool spell checkers requiring intervention that seems
> not to get done in the process.

i've built a spell-checker designed specifically for this task.
when the time is right, i'll release it to the public...

-bowerbird

bowerbird
10-07-2007, 02:25 AM
hadrien said:
> we've got an API that makes it possible for any
> application or website to interact with Feedbooks

so, hadrien, i've built an app that will let people
download books from your site, even en masse.

do you discourage indiscriminate downloading?

would you like for me to distribute the program?

-bowerbird

Hadrien
10-07-2007, 01:36 PM
hadrien said:
> we've got an API that makes it possible for any
> application or website to interact with Feedbooks

so, hadrien, i've built an app that will let people
download books from your site, even en masse.

do you discourage indiscriminate downloading?

would you like for me to distribute the program?

-bowerbird

I really don't mind much. I've noticed that since we added ePub support, at least 2 IPs have downloaded all of our books as ePub.

Our API wasn't created for such a purpose, though, and keep in mind that there are still a few things missing, like footnotes and hyphenation, from our current ePub output. The API is here for those who'd like to integrate Feedbooks into another application, website, etc.

I'll post a full page explaining how things work tomorrow.

The only thing that would be better for us is having people logged in on the website before they download anything (through us or a third-party application). This way we can improve our recommendation system (and with 20k+ books already available on PG, I believe we REALLY need a recommendation system on a public domain website).

Hadrien
10-09-2007, 10:07 AM
where can i get information on your a.p.i. for external apps?


OK, the help page for the API is almost ready; it'll be online today. In the meantime, here's a list of the available actions:
initializing: test the login/password, get the list of the formats available
subscription: display the subscriptions for a user (RSS feeds, newspapers etc...)
search: display the list of books available for any given keyword
similar: display a list of books that are considered similar
recommendation: display a random list of featured books if you're not logged in, otherwise, it'll display personal recommendations
favorites: display the list of favorite books for a user
history: display previously downloaded books for a user

Hadrien
10-10-2007, 08:32 AM
Here's the help page for the API: http://www.feedbooks.com/help/newsstand_api

I'll add a few examples of Ruby code and of C and C# applications that use our API in the upcoming weeks.

Here are a few examples of the things you could do with this API:

A downloader for your newspapers: auto-sync your files to your device
A Flash widget displaying your favorite books
A book search engine and downloader for your favorite e-book reader
A book recommendation system for your website
etc...
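
As a rough illustration of what such a client could look like, here is a Python skeleton for searching and downloading books. The endpoint path and XML field names below are placeholders of my own invention (the real ones are documented on the help page above), so treat this as a sketch rather than working code:

# feedbooks_client.py -- skeleton only; the URL path and XML fields below
# are PLACEHOLDERS, see the API help page for the real ones
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

API_BASE = "http://www.feedbooks.com"   # real site; the path below is hypothetical

def search(keyword):
    url = API_BASE + "/api/search?query=" + urllib.parse.quote(keyword)
    with urllib.request.urlopen(url) as resp:
        tree = ET.parse(resp)
    # hypothetical response structure: <book><title>...</title><epub>...</epub></book>
    return [(b.findtext("title"), b.findtext("epub")) for b in tree.iter("book")]

def download(url, out_path):
    with urllib.request.urlopen(url) as resp, open(out_path, "wb") as out:
        out.write(resp.read())

for title, epub_url in search("tolstoy"):
    print("found:", title, epub_url)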

bowerbird
10-11-2007, 04:52 PM
thanks hadrien, i'll look it over sometime soon...

-bowerbird

JSWolf
10-11-2007, 05:34 PM
bowerbird, why not just use html to do the markup?

bowerbird
10-11-2007, 07:50 PM
ok, hadrien, i took a look. but i guess i'd already figured it all out on my own... :+)

jswolf said:
> why not just use html to do the markup?

i don't understand the question. so i'll just make up an answer. ;+)

i don't like .html books, because the browser is a lousy reader-app.
if you want to see what i think a reader-app needs to do, go here:
> http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2004&post=2004-01-08,3

that post is from way back in 2004, but the list hasn't changed much.
unfortunately, browser capabilities haven't improved that much either,
at least from the standpoint of how well a browser meets my criteria.

so i'd reject a browser/html workflow entirely, if not for the fact that
putting material on the web is a very handy way of making it _public_.

as far as _applying_ .html markup, i think it's a royal pain in the rear.
that's why i invented my own form of light-markup, so i could obtain
the main functionality i want from documents without doing markup...

however, it's not like i've become resigned to a browser/html world.
i'm now creating web-aware offline applications that can _get_stuff_
from the web to display it themselves, so you don't need a browser.

if i understand hadrien correctly, that's why feedbooks has an a.p.i.,
so developers can give users an alternative to using a web-browser.

if you can save the user from being forced to make an inconvenient
trip to the browser, why not? convenience is the name of the game.

of course, this approach of an app going directly to a webpage and
grabbing data isn't unusual nowadays; rss-readers do it all the time.
the difference is, they expect the documents they receive to be .html.
my applications expect received documents to be in my .zml format.

i still think it's important to have the ability to convert .zml into .html,
so the documents can be put on the web for users who _cannot_run_
offline apps (e.g., because they can't install them on a work machine);
but for the vast majority of people, i think the enhanced functionality
of my offline viewer-program will win them over, especially when it's
combined with the ease of not having to hassle with applying markup.

for my part, i'm converting the e-texts in project gutenberg into .zml,
and will be mounting them on my own mirror as a .zml demonstration.

i fully expect the long-term maintenance of .zml files will be low-cost,
so i think i'll be able to maintain the entire library all by my lonesome,
even as project gutenberg continues to pump new titles into the world.

-bowerbird

JSWolf
10-11-2007, 10:18 PM
But if you do the markup in HTML, you can then have a program convert it to whatever format you want. What does your zml format support that HTML does not?

bowerbird
10-11-2007, 10:43 PM
jswolf said:
> But if you do the markup in html, you can have a program
> then convert to whatever format you want.

if _you_ want to use z.m.l. to create .html, and then convert it into something else,
that's totally fine with me. but i have no desire to do all that conversion busy-work.
or apply any markup in the first place. i want to focus on the writing, and that's all.
i want the _machine_ to format it nicely for me, and apply the markup if i need any.


> What does your zml format support that html does not?

z.m.l. doesn't support anything that .html won't do. in fact, it does a whole lot less.
but the things it _does_ do are all the things that are commonly required by books...

my overall perspective is that _formats_ are highly overrated. the _real_magic_ is
in the _applications_, not the format. my focus is on putting the smarts in the apps,
not in the formats. i want to make the format as dirt-simple to create as possible,
and then have the program do all the grunt-work of providing excellent functionality.

remember how openreader was gonna be this magical format that was going to end
david rothman's "tower of ebabel"? it never had an application that implemented it,
so it went nowhere. until you have an application for a format, the format is useless.

so if .html does the job for you, and you don't mind doing .html markup, go for it...

but me, i'm looking for something simpler. even if i have to program the thing myself.

-bowerbird

bowerbird
10-11-2007, 10:48 PM
so the question should be, "what does a zml-viewer do that a browser won't do?"

and for the answer to that question, go visit that web-page i pointed you to earlier.

a web-browser will do maybe 1/4 of my requirements, half if it's lucky.

my zml-viewer will do 95% of them, 100% eventually...

so i think it's gonna be a better e-book viewer than a web-browser.
which ain't saying much, since a browser is such a lousy e-book app.

i think my viewer will be better than most other e-book viewer-apps,
but there's not much use saying that until you can try it out for yourself.
and even then, what will matter to you is not _my_ opinion, but yours...

-bowerbird

igorsk
10-12-2007, 07:10 AM
I wish you would start using capital letters and leave the line wrapping to the forum. Then maybe more people would read your posts instead of skipping over them.

bowerbird
10-12-2007, 02:53 PM
do you mind if people "skip over" my posts? because i don't... :+)

-bowerbird

bowerbird
10-13-2007, 03:27 PM
hadrien, next week i'll release my app that facilitates downloading feedbooks books,
and i wanna make sure you're fully ok with it before i do, so i'll ask one more time...

-bowerbird

Hadrien
10-13-2007, 03:45 PM
hadrien, next week i'll release my app that facilitates downloading feedbooks books,
and i wanna make sure you're fully ok with it before i do, so i'll ask one more time...

-bowerbird

How does it facilitate downloading?

bowerbird
10-13-2007, 07:37 PM
hadrien-

click the appropriate radio-button to select the canned format you want.
click another button to download every book on your site. you're done...

-bowerbird

Hadrien
10-14-2007, 03:52 PM
Well... I don't really see the point of downloading the whole site... Here are a few things I need to say about this:
- depending on how your software is coded, it could be considered hammering and get the IP address banned: send me your code and I'll check the software's behavior
- custom PDFs are generated on the fly, and some of them can be 1,000+ pages. Imagine what it would mean to generate thousands of these at the same time...
- a program that would download every book for an author or a list would make much more sense than just downloading the whole website
- our e-book generation system is frequently updated and improved

ricdiogo
10-14-2007, 04:05 PM
Well... I don't really see the point of downloading the whole site (...) a program that would download every book for an author or a list would make much more sense than just downloading the whole website
The main point of downloading the whole website is this: if it goes offline for some reason (e.g. you decide to stop the project or you run into copyright issues), people would still have the whole collection on their machines. Maybe you should periodically create several DVD images that people could download from the website and from P2P networks. You could also make such DVDs for given authors or special hits.

bowerbird
10-14-2007, 04:13 PM
hadrien said:
> I don't really see the point of downloading the whole site...

oh. i thought you had expressed the wish to make downloading easier.
i got the impression that the fbreader people are working on it for you.
wasn't that one of the points of releasing your a.p.i.?


> Here's a few things I need to say about this:
> - depending on how your software is coded, it could be
> considered as hammering and get the IP address banned:

well, obviously, if you're going to _ban_ someone for using it,
then you don't want me to release the program. just say so...


> send me your code, I'll check the software behavior

i can program the "behavior" to be whatever you want it to be.
so just tell me what you want it to be. that's why i asked you.


> - custom PDF are generated on the fly

i specifically said this would only apply to canned formats...


> - a program that would download every book for an author
> or a list would make much more sense than strictly
> downloading the whole website

well, that might or might not be a future route for the program,
but in terms of "facilitating downloads", fewer clicks are better.

and if an end-user steps through every author, it's the same thing.

(and i won't do the work of making the program _less_useful_ --
by requiring more work from users to do the exact same thing --
until i find out if there's even any demand for it in the first place.)


> - our e-book generation system is frequently updated and improved

if end-users think they need the updated improved versions,
they'd just delete their local files and run the program again.
(the app checks for a local copy before it downloads a book,
so the person can re-run the program later to get new books.)

***

again, i don't have to release this program. i'm doing it _for_you_,
because you expressed desires to make downloading "more ipodish".

i had the skeleton of the program written already, for p.g. e-texts,
so i modified it... but if you'd rather have people come to your site
and download the files manually, that's perfectly acceptable to me...

-bowerbird

Hadrien
10-14-2007, 04:35 PM
The main point of downloading the whole website is this: if it goes offline for some reason (e.g. you decide to stop the project or you run into copyright issues), people would still have the whole collection on their machines. Maybe you should periodically create several DVD images that people could download from the website and from P2P networks. You could also make such DVDs for given authors or special hits.

I agree, we could dump the cache into a .torrent file once in a while.

oh. i thought you had expressed the wish to make downloading easier.
i got the impression that the fbreader people are working on it for you.
wasn't that one of the points of releasing your a.p.i.?

Well, the first program using the API is the one we've created for the iLiad.
The API is here to help those who'd like to integrate Feedbooks into existing applications or websites; mass downloading is closer to a "wget -r" than an API.

well, obviously, if you're going to _ban_ someone for using it,
then you don't want me to release the program. just say so...

I'm not gonna ban anyone because they're downloading all of our files. The main problem here is how someone downloads all these files. If it's with a queue, it's not a problem at all. But if dozens of people are running such a program, and the program downloads multiple files at the same time, it would cannibalize the resources, making the service much slower for our users.
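
Concretely, the difference is between firing off parallel requests and working through a queue one file at a time with a pause between downloads. A rough Python sketch of the polite version (the URLs are placeholders, not real Feedbooks paths, and the delay is just an example value):

# polite_downloader.py -- one request at a time, with a pause, so a bulk
# download doesn't starve other users of server resources (illustration only)
import time
import urllib.request

def download_queue(urls, delay_seconds=5):
    for url in urls:                          # strictly sequential, never parallel
        name = url.rsplit("/", 1)[-1] or "download.epub"
        with urllib.request.urlopen(url) as resp, open(name, "wb") as out:
            out.write(resp.read())
        print("saved", name)
        time.sleep(delay_seconds)             # give the server room to breathe

# placeholder URLs, not real Feedbooks paths
download_queue(["http://www.feedbooks.com/book/1.epub",
                "http://www.feedbooks.com/book/2.epub"])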

bowerbird
10-14-2007, 04:45 PM
hadrien-

i am not getting the clarity i need, so i'll shelve the plan.
think about it for a couple months, and if you decide you
want me to proceed with the program, let me know then.

-bowerbird

Hadrien
10-14-2007, 05:00 PM
hadrien-

i am not getting the clarity i need, so i'll shelve the plan.
think about it for a couple months, and if you decide you
want me to proceed with the program, let me know then.

-bowerbird

Well, for full-site downloading I'd rather create a zip available on the website.

But for selective downloads (let's say a list, or all of the books available for an author or a genre, etc.), I'm OK with a downloader as long as it's using some kind of queue.

The only thing that gets me worried with such software is the fact that it can cannibalize resources that other users might need. All I want is the best experience for our users...

For the moment we have a single dedicated server, not multiple dedicated servers yet, though I hope we'll have dedicated resources for the e-book generation side of the service soon.

ricdiogo: You're part of PG, right? I remember reading on TeleRead that PG was looking for a way to support ePub files. A link to Feedbooks when the book is available on our side might be a solution to this problem.