Upload ebooks in HTML format


ProDigit
02-19-2009, 11:44 AM
Hi,
I was just pondering the file formats of the ebook readers, and realized that a lot of my work has been to create or modify an HTML file, only to convert it later to LRF for the Sony reader.

Here's what troubles me: imagine if next year Sony brought a new ebook reader to market (with, e.g., a larger screen or color or whatever) that no longer displayed LRF files!
Then a lot of the books on the forum would only serve the few of us who managed to snatch a PRS reader or the like. And that number would decrease over time.

Most of the books on the forum (apart from PDF books) come from html (sources); and get reconverted to .prc, .lit, .lrf, .... files.

Also, imagine that soon there will be a new reader on the market with yet another file format.
I think it would be better to have a standard library in HTML or .doc (perhaps every book as html/doc, compressed to .zip or .rar) and upload those!

.doc files are relatively easy to convert to HTML, but LRF files are not easy to convert to other file formats. I mean, I suspect there'll be some layout issues that one would want to be able to edit in, e.g., HTML format before converting to the appropriate ebook format.

Besides, once a book is in HTML, it can very easily be converted to other formats!
It would literally just need a user who has a device to snatch the HTML, convert it, and upload the book.

What do you think? Would it benefit to do so or not?

AnemicOak
02-19-2009, 11:59 AM
For me I try to keep copies of my stuff in Lit as it gives you the HTML, but also any images all stored in one file that can easily be exploded at a later date as needed. For me it's the perfect storage format.

DixieGal
02-19-2009, 12:20 PM
I try to backup my books in a bunch of formats and save to a flash drive. I especially like to have a copy of everything that I can possibly get in Word, pdf, and rtf. I figure that in a worst case scenario, I can always read them on a PC, should I ever want to re-read them.

Ea
02-19-2009, 12:21 PM
I think it would be a great idea. I've actually downloaded a couple of mobi books from here, only to open the file in Mobipocket Creator to get the HTML file. This is the main reason I don't go here for books (except that there haven't been many I've been interested in in the first place).

Jellby
02-19-2009, 12:38 PM
Besides, once a book is in HTML, it can very easily be converted to other formats!
It would literally just need a user who has a device to snatch the HTML, convert it, and upload the book.

What do you think? Would it benefit to do so or not?

Better than HTML or doc, upload them in ePUB format. It's standard, it's the future and you can get the text and formatting just as easily (it's already a zipped set of XHTML+CSS files).

Steven Lyle Jordan
02-19-2009, 12:51 PM
This is why ePub is a good e-book format: It is XML (based on HTML), and therefore easily converted to other formats... or easily displayed out-of-the-box. Many new readers can read ePub, and more are adopting ePub along with their proprietary format. And an html-based format will, as you suggest, have better longevity than most proprietary formats.

In terms of long-term document storage, the more documents we store in ePub, the better.

ProDigit
02-19-2009, 03:10 PM
.. It's standard, it's the future....

No it's not!
So far there's no guarantee manufacturers will go for this or another format once the new devices are on the market(larger screen/resolution, perhaps colors...);
there still is NO ebook standard to date (at least, last I knew of )

besides, which program formats a book in epub format?(just out of lack of knowledge I ask)...

I know Book designer can create LRF files, and I heard that the Sony reader can read epub, but I don't think I've seen it so far.
Is it a format, when I have it on my sony reader, I can copy and read it straight from a kindle, cybook, iliad or Jinke/Bebook system?

AnemicOak
02-19-2009, 03:17 PM
No it's not!
So far there's no guarantee manufacturers will go for this or another format once the new devices are out there;
there still is NO ebook standard to date (at least, last I knew of )

besides, which program formats a book in epub format?(just out of lack of knowledge I ask)...

I know Book designer can create LRF files, and I heard that the Sony reader can read epub, but I don't think I've seen it so far.
Is it a format, when I have it on my sony reader, I can copy and read it straight from a kindle, cybook, iliad or Jinke/Bebook system?


Yes, the Sony (505 & 700) can read ePub. No, the Kindle, Cybook et al. can't (at least not yet, it's supposed to be coming for at least some of them). You can put the ePub into Calibre & in a few seconds have a Mobi or LRF file (maybe Lit too, I'm not sure) and a lot of devices, besides the Sony, will read Mobi.

ePub certainly seems to have a good chance for the broadest amount of support so far.

pilotbob
02-19-2009, 03:24 PM
No it's not!

It certainly is a "standard" in the technical sense of the word. But it is by no means ubiquitous yet, which is what is needed.

BOb

zelda_pinwheel
02-19-2009, 03:25 PM
Better than HTML or doc, upload them in ePUB format. It's standard, it's the future and you can get the text and formatting just as easily (it's already a zipped set of XHTML+CSS files).
yes, exactly. :)
No it's not!
So far there's no guarantee manufacturers will go for this or another format once the new devices are on the market(larger screen/resolution, perhaps colors...);
there still is NO ebook standard to date (at least, last I knew of )
hm, the International Digital Publishing Forum (http://www.idpf.org/), which is the International Trade and Standards Organization for the Digital Publishing Industry and whose mission is defining epub, the new industry standard format, might disagree with you on that. and more and more publishers are publishing in epub, sometimes exclusively in epub.

besides, which program formats a book in epub format?(just out of lack of knowledge I ask)...

I know Book designer can create LRF files, and I heard that the Sony reader can read epub, but I don't think I've seen it so far.
Is it a format, when I have it on my sony reader, I can copy and read it straight from a kindle, cybook, iliad or Jinke/Bebook system?
as others have said, Sony 505 and 700 series can read epub natively, no conversion necessary. iphone and ipod touch can read epub natively using an app like stanza. epub can also be easily converted to other formats like mobipocket for cybooks or kindles, etc., particularly now that epub drm has been cracked.

please read at least the first page of this thread (http://www.mobileread.com/forums/showthread.php?t=36489) to find out more about epub.

ProDigit
02-19-2009, 03:33 PM
It certainly is a "standard" in the technical sense of the word. But it is by no means ubiquitous yet, which is what is needed.

BOb

I stand corrected...
You understood what I meant.. LRF and PRC and LIT are standards too. But it's that 'ubiquitous' word I didn't have in my dictionary yet :o

zelda_pinwheel
02-19-2009, 03:35 PM
I stand corrected...
You understood what I meant.. LRF and PRC and LIT are standards too. But it's that 'ubiquitous' word I didn't have in my dictionary yet :o

i think you are confusing "standard" with "common". lrf, prc and lit are common formats, but they are by no means standard.

please do read the first page of the thread i linked to in my last post. it is a very good introduction to epub.

pilotbob
02-19-2009, 03:35 PM
But it's that 'ubiquitous' word I didn't have in my dictionary yet :o

I like the word ubiquitous, I am trying to make its usage ubiquitous.

BOb

zelda_pinwheel
02-19-2009, 03:36 PM
I like the word ubiquitous, I am trying to make its usage ubiquitous.

BOb
i'll join that mission. the ubiquitous use of precise words makes me happy. ;)

ProDigit
02-19-2009, 03:46 PM
I like the word ubiquitous, I am trying to make its usage ubiquitous.

BOb
replace [CTRL]+[V] with "ubiquitous"

To me it seems [CTRL]+[V],that you would use the word [CTRL]+[V] to make it more [CTRL]+[V]! At this moment my brains are still processing [CTRL]+[V],and they only got the "[CTRL]" part right. The "+[V]"part will soon fall in place!
It's such a hard word I had to store it in my cache using [CTRL]+[C],in order to write it down([CTRL]+[V])!
I even had to look it up in a dictionary!

Now I feel much more comfortable, and even can say it without errors!:)
[CTRL]+[V] !
uhm,I meant, "ubiquitous" yes "ubiquitous"(Ctrl+v)!

Sweetpea
02-20-2009, 03:22 AM
Better than HTML or doc, upload them in ePUB format. It's standard, it's the future and you can get the text and formatting just as easily (it's already a zipped set of XHTML+CSS files).

I'd say if that were a standard, there would have been readers for all platforms. So far, I've not found one reader that can read epub on my PDA (Windows Mobile).

For my own books, I use HTML as my base, and transform it to Mobi to read.

DaleDe
02-20-2009, 12:16 PM
I'd say if that were a standard, there would have been readers for all platforms. So far, I've not found one reader that can read epub on my PDA (Windows Mobile).

For my own books, I use HTML as my base, and transform it to Mobi to read.

Well the point is that a base can easily be ePUB since you can simply unzip it to retrieve the internal files. The problem with HTML is it is often not self contained. No images, CSS in a separate file etc. It needs to be zipped to contain it. As a matter of fact some readers such as Hanlin can actually read an eBook inside a zip file, depending on the format inside the zip file.

Dale

zelda_pinwheel
02-20-2009, 12:23 PM
Well the point is that a base can easily be ePUB since you can simply unzip it to retrieve the internal files. The problem with HTML is it is often not self contained. No images, CSS in a separate file etc. It needs to be zipped to contain it. As a matter of fact some readers such as Hanlin can actually read an eBook inside a zip file, depending on the format inside the zip file.

Dale
yes, and in fact epub is technically just an xhtml file in a zip container ; it can be opened using winzip or winrar, just like a zip file.
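
To illustrate the point that an epub opens like any zip file, here is a minimal Python sketch using only the standard library. The archive is built in memory so the example runs on its own; the member names (OEBPS/chapter1.xhtml, OEBPS/style.css) are made up for illustration and are not taken from any real book.

```python
import io
import zipfile


def build_sample_epub():
    """Build a tiny epub-like zip in memory (hypothetical contents)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("OEBPS/chapter1.xhtml",
                    "<html><body><p>Hello</p></body></html>")
        zf.writestr("OEBPS/style.css", "p { text-indent: 1em; }")
    return buf.getvalue()


def list_markup_files(epub_bytes):
    """Return the names of the XHTML/HTML/CSS members inside the archive."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        return [name for name in zf.namelist()
                if name.endswith((".xhtml", ".html", ".css"))]


sample = build_sample_epub()
print(list_markup_files(sample))
```

With a real file on disk, passing its path to zipfile.ZipFile instead of the in-memory buffer should work the same way, assuming the file is not DRM-protected.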

zelda_pinwheel
02-20-2009, 12:24 PM
replace [CTRL]+[V] with "ubiquitous"

To me it seems [CTRL]+[V],that you would use the word [CTRL]+[V] to make it more [CTRL]+[V]! At this moment my brains are still processing [CTRL]+[V],and they only got the "[CTRL]" part right. The "+[V]"part will soon fall in place!
It's such a hard word I had to store it in my cache using [CTRL]+[C],in order to write it down([CTRL]+[V])!
I even had to look it up in a dictionary!

Now I feel much more comfortable, and even can say it without errors!:)
[CTRL]+[V] !
uhm,I meant, "ubiquitous" yes "ubiquitous"(Ctrl+v)!
:pandalol:

Ea
02-20-2009, 12:27 PM
yes, and in fact epub is technically just an xhtml file in a zip container ; it can be opened using winzip or winrar, just like a zip file.
That sounds great. I was a little worried at first that it would be yet another format to unwrap (like mobi) if I only wanted the HTML, but now I feel okay with it. And it is better to wrap HTML files - many books would come with multiple files.

zelda_pinwheel
02-20-2009, 01:03 PM
That sounds great. I was a little worried at first that it would be yet another format to unwrap (like mobi) if I only wanted the HTML, but now I feel okay with it. And it is better to wrap HTML files - many books would come with multiple files.
yep. :)

you might want to take a look at this thread (http://www.mobileread.com/forums/showthread.php?t=36489) to learn more about epub.

HarryT
02-20-2009, 02:01 PM
It certainly is a "standard" in the technical sense of the word. But it is by no means ubiquitous yet, which is what is needed.

BOb

As the saying goes, the nice things about "standards" is that there are so many different ones to choose from :).

Jellby
02-20-2009, 02:19 PM
When I said it's a "standard" what I really meant was that its specifications are defined and open (no reverse-engineering or hidden features) and that it does not depend on a single corporation. Perhaps support is not widely available yet, but I thought the thread was about formats that could be used for archive and as a source for conversions...

zelda_pinwheel
02-20-2009, 03:19 PM
When I said it's a "standard" what I really meant was that its specifications are defined and open (no reverse-engineering or hidden features) and that it does not depend on a single corporation.
that's what i mean when i say standard as well, with the added detail that it has also been adopted officially by the ebook industry as a whole in an attempt to gain some unity of file formats.
Perhaps support is not widely available yet, but I thought the thread was about formats that could be used for archive and as a source for conversions...
epub is already excellent for archive and conversion purposes and more and more devices are supporting it natively as well, so conversion is not always necessary and will be so less and less.

ProDigit
02-20-2009, 11:45 PM
so what's the file extension?

Sweetpea
02-21-2009, 03:36 AM
yes, and in fact epub is technically just an xhtml file in a zip container ; it can be opened using winzip or winrar, just like a zip file.

Hmm, I must go look into that then...

Jellby
02-21-2009, 05:22 AM
so what's the file extension?

.epub usually.

(Actually, the ePub reader widget for Opera (http://widgets.opera.com/widget/10312/) needs to have ePub files renamed as .zip, which shows they are just zip files with a particular content).

busybee
03-01-2009, 04:59 AM
I agree with ProDigit's initial post and support HTML for archival and source purposes. EPub is a close second, but after converting an illustrated HTML book to EPub, adding ".zip" and unzipping it, I noticed that important elements of my original formatting had been lost, so I am not interested in collecting ebooks in EPub format (in effect converting common HTML into obscure XML). Since some 9,000 books in Project Gutenberg's collection (in English) are already in HTML, and since HTML IS ubiquitous/universal (as distinct from EPub's putative should be, can be, will be), I don't see a better alternative at present to zipped HTML.

I'm serious about collecting books in HTML and hope MR is the place this can be done. The only support needed would be to add, to the E-Books Upload section, a forum called, "HTML Books." Is this too much to ask? Along this line, almost all the books posted to the "Other Books" forum are LIT, so it looks like a "LIT Books" section is needed as well.

I'm not interested in collecting books in multiple uneditable formats. I've uploaded some MOBI books, but HTML was the source, and I suspect that the prolific uploaders who post four versions of each book are also working from HTML source. I'd rather upload and download editable HTML source. Who wouldn't?

ProDigit
03-01-2009, 03:18 PM
Thanks busybee for your affirmation.
From HTML perspective, once a book is in the library, it can easily be snatched by eg: me, who can convert it to LRF for my Sony reader; and by someone else who perhaps has a Kindle or Iliad, to convert it to their desired format if necessary.

One of the issues we're facing is that, up to now, almost no electronic book reader can read HTML and/or compressed HTML (unless it is running some version of Windows CE).

Then there's the issue of different versions of zip and rar, where later versions are incompatible with previous ones.
The last time I used WinZip they were on version 8, and that was around the millennium, nearly 9 years ago. Every version had an improved algorithm, compressing better than the version before. My only fear is that once a reader can read zip or rar archives, later archives (zip or rar files) will not be compatible with the reader's internal decoder version.

WinZip today is on version 12, so that's about a new version every 2 years, with 1 update and 1 new version in 2008.
WinRAR is about the same, currently at v3.80, with a new version about every year.

The issue is that newer versions are compatible with older archives, but older WinZip/WinRAR decompressors are not always compatible with newer compressed archives.

Another issue "might" be that WinZip or WinRAR may not be a good format for electronic reading devices, though I am not 100% sure about that.
Meaning, in some cases the device will need to decompress the file fully before it's able to display the book compressed in it, or may need more computing power. WinZip/WinRAR are much more intensive than LRF or other formats (though file sizes can become smaller with WinZip/WinRAR).

I have yet to see a device that is capable of reading HTML within a ZIP file directly. That would require hardware capable of on-the-fly decoding. It'd probably take more than a 200 MHz ARM processor to be able to do that!

There is no DRM on HTML and archives (YET), but there can be password-protected archives.
A simple method could be incorporated: whenever a book is downloaded from a store, a password is provided with it, generating a different encryption on every download.
I hope DRM will not come to ZIP or RAR!

So those are the negative points; but on a positive note, HTML compressed in an archive, is the best open format to reconvert!

I've found it best to convert .doc and .rtf to HTML using OpenOffice Writer.
Then take 5 minutes with Notepad++ to trim the unnecessary HTML codes (e.g. language, div style and some span class/font/...).
Within 10 minutes you can have a reasonably clean HTML version of a book!
A lot of devices running Windows CE will be able to read it, though only when uncompressed.
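
The manual trim described above could be roughly automated. The following Python sketch is only an approximation under stated assumptions: the choice of tags to unwrap (span, font) and attributes to drop (lang, style, class) is a guess based on the post, and regular expressions are a blunt tool that can misfire on unusual markup, so the result still deserves a look by eye.

```python
import re

# Tags removed entirely; the text inside them is kept.
UNWRAP_TAGS = re.compile(r"</?(?:span|font)\b[^>]*>", re.IGNORECASE)

# Attributes removed from whatever tags remain.
# Caveat: this would also hit body text that happens to look like
# ` style="..."`, which is rare in prose but possible.
NOISY_ATTRS = re.compile(
    r"""\s+(?:lang|style|class)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)


def tidy_html(html):
    """Strip span/font wrappers and lang/style/class attributes."""
    html = UNWRAP_TAGS.sub("", html)
    html = NOISY_ATTRS.sub("", html)
    return html


messy = ('<p lang="en-US" style="margin: 0"><font face="Arial">'
         '<span class="T1">Call me Ishmael.</span></font></p>')
print(tidy_html(messy))  # -> <p>Call me Ishmael.</p>
```

Dropping every class attribute also discards any styling that a separate stylesheet hangs on those classes, so this only suits books where plain markup is the goal.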

Then again, some people will need to take a basic course in HTML coding, unlike, e.g., converting an MS Word document via calibre, which takes no knowledge of HTML or codes.

Ea
03-01-2009, 03:40 PM
Actually, off the top of my head I can think of three readers that support HTML: Hanlin (+ clones), CyBook and iRex iLiad - and I'm sure there are more - but, and this is a big one, they only support pretty basic HTML. That would, however, not be a big problem if you could convert easily into clean HTML. And that is not my own experience. I've been looking far and wide for a tool for this - Mac or Windows, it's okay - but I have yet to find something that does not require clean-up, with the connected risk of messing up. So far, I've found Mobipocket Creator to be the best, but the resulting HTML still requires removal of unnecessary markup, just as you suggest. Not the best solution.

I understand the problems with zipping files - if we could get around that and still keep the HTML files, I think it would be best.

netseeker
03-01-2009, 03:49 PM
ProDigit, your first issue doesn't seem to be a "real" issue:
"Zip" isn't a compression format itself; it is a container that allows different compression methods to be used. That's why incompatibilities can sometimes occur. If we used a zip container with Deflate as the compression method, we wouldn't see any incompatibilities, because Deflate is specified in RFC 1951 and is widely used and accepted.

The other issue you mentioned - the hardware - seems to be another non-issue too:
ePub isn't much more than (X)HTML, images and some meta information within a zip container. If ePub works on mobile reading devices, then other (X)HTML renderers using compressed (X)HTML would work too. And as Ea already pointed out, there are some current reading devices that already offer some kind of HTML support out of the box.
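
Both points can be tried out in a small Python sketch using the standard zipfile module. It packs HTML with Deflate, reports the compression method recorded for each member, and then reads only the start of one member without unpacking the rest of the archive. The file names and filler text are invented for the example, and nothing here says how any particular reading device implements this.

```python
import io
import zipfile


def zip_html_deflated(files):
    """Pack {name: text} into a zip archive, using Deflate for every member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, text in files.items():
            zf.writestr(name, text)
    return buf.getvalue()


def compression_methods(zip_bytes):
    """Map each member name to its method code (8 = Deflate, 0 = stored)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {info.filename: info.compress_type for info in zf.infolist()}


def read_member_prefix(zip_bytes, member, max_bytes):
    """Stream the first max_bytes of one member; other members are not read."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        with zf.open(member) as stream:
            return stream.read(max_bytes)


archive = zip_html_deflated({
    "book.html": "<html><body>" + "<p>filler</p>" * 1000 + "</body></html>",
    "notes.txt": "not needed for this read",
})
print(compression_methods(archive))
print(read_member_prefix(archive, "book.html", 12))
```

The zip format keeps a directory of its members at the end of the file, which is what lets a reader jump to one member and decompress it as a stream instead of expanding everything first.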

RWood
03-09-2009, 11:35 AM
The intent of the download section has always been that you could directly load the file into your reader. If Sony changes from LRF to XYZ, then we will produce XYZ files. As ePub gains footing and more and more readers support ePub directly we will host more and more ePub files at MR. Already there is a section for ePub. Some of the book creators (like me for example) have yet to make this addition to their format presentations. Others, like mtravellerh, have been offering ePub for some time.

Sweetpea
03-09-2009, 12:28 PM
I agree with ProDigit's initial post and support HTML for archival and source purposes. EPub is a close second, but after converting an illustrated HTML book to EPub, adding ".zip" and unzipping it, I noticed that important elements of my original formatting had been lost, so I am not interested in collecting ebooks in EPub format (in effect converting common HTML into obscure XML). Since some 9,000 books in Project Gutenberg's collection (in English) are already in HTML, and since HTML IS ubiquitous/universal (as distinct from EPub's putative should be, can be, will be), I don't see a better alternative at present to zipped HTML.

I've been busy with my library last weekend. I wanted to see if I could make an epub out of a HTML book (which is my source to format them to mobi which I use on my reader).

I finally ended up with a directory, which I can zip to create an epub, or I can use the .opf to create the mobi books. All the content I had in my original book I have in my epub book. I did have to change my HTML, as I used <a name="x"></a> for my TOC bookmarks. And epub doesn't allow that. So, I had to add those name="x" attributes to my header element. But, that's all I had to change. I didn't use XML, and I use only 1 HTML page for my content (not 1 page per chapter). The only thing epub wants is the directory structure and some meta files.
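
For readers wondering what "the directory structure and some meta files" amounts to, here is a minimal Python sketch that zips up such a layout in memory. The names OEBPS, content.opf and book.html are assumptions chosen for illustration (the post does not say which names were used), and the package file is deliberately bare: it leaves out the NCX table of contents that ePub 2 expects, so the result should not be assumed to pass a validator.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Tells a reader where to find the package (.opf) file.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Bare-bones package file: metadata, a manifest of files, and reading order.
CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">{identifier}</dc:identifier>
  </metadata>
  <manifest>
    <item id="content" href="book.html" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="content"/>
  </spine>
</package>
"""


def build_epub(title, identifier, book_html):
    """Return the bytes of a single-page epub-style archive."""
    opf = CONTENT_OPF.format(title=escape(title), identifier=escape(identifier))
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry goes first and is stored without compression.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/content.opf", opf,
                    compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/book.html", book_html,
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


epub_bytes = build_epub("My Book", "urn:uuid:hypothetical-0001",
                        "<html><body><h1 id='ch1'>Chapter 1</h1></body></html>")
with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
    print(zf.namelist())
```

Writing epub_bytes to a file ending in .epub gives something to try in a reader; a complete book would also list images and stylesheets in the manifest.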

Now, I will need plenty of time to pour all my HTML files into the epub directory structure...

ProDigit
03-09-2009, 10:47 PM
I've been busy with my library last weekend. I wanted to see if I could make an epub out of a HTML book (which is my source to format them to mobi which I use on my reader).

I finally ended up with a directory, which I can zip to create an epub, or I can use the .opf to create the mobi books. All the content I had in my original book I have in my epub book. I did have to change my HTML, as I used <a name="x"></a> for my TOC bookmarks. And epub doesn't allow that. So, I had to add those name="x" attributes to my header element. But, that's all I had to change. I didn't use XML, and I use only 1 HTML page for my content (not 1 page per chapter). The only thing epub wants is the directory structure and some meta files.

Now, I will need plenty of time to pour all my HTML files into the epub directory structure...

How does Epub handle bookmarks and hyperlinks? (not <a name=x>x</a> or <a href='x'>x</a>?)

Sweetpea
03-10-2009, 04:31 AM
How does Epub handle bookmarks and hyperlinks? (not <a name=x>x</a> or <a href='x'>x</a>?)

You can still use <a href="#x">y</a> to point to <span id="x">x</span>. So, instead of putting a <a name="x"> around it, use a span with an id.
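
For a book with many bookmarks, the swap described above could be scripted. This Python sketch is one possible approach, not a tested converter: it assumes the anchors are written with double-quoted name attributes and nothing else inside the tag, and anchors written differently would pass through unchanged.

```python
import re

# Matches an anchor that only carries a name, e.g. <a name="ch1"></a>.
# Anchors with href (ordinary links) are left alone.
NAMED_ANCHOR = re.compile(
    r'<a\s+name\s*=\s*"([^"]+)"\s*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def anchors_to_spans(html):
    """Rewrite <a name="x">...</a> bookmarks as <span id="x">...</span>."""
    return NAMED_ANCHOR.sub(r'<span id="\1">\2</span>', html)


before = '<a href="#ch1">Chapter 1</a> ... <h2><a name="ch1"></a>Chapter 1</h2>'
print(anchors_to_spans(before))
```

The link itself, <a href="#ch1">, keeps working because it now resolves to the element whose id is ch1.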