LRF output - Page 16

bkilian · 07-09-2007, 03:38 PM

I have a suggestion that I think should be fairly easy to implement, and would increase the usefulness of html2lrf significantly (to me, at least).

I would like a way to essentially "store" the command line options I want to use on a particular book, so that the next time I convert it, I don't have to remember exactly what I used.

Now HTML already has a perfect way to store metadata, so why not store it in the HTML file itself?

I suggest a set of <meta/> variables that you could stick in the <head> of the html file, and html2lrf would use them to determine what its settings should be.
For example:
<meta name="publisher" content="Baen Books" />
<meta name="title" content="Wind Rider's Oath" />
<meta name="author" content="David Weber" />
<meta name="cover" content="0743488210_Cover.jpg" />
<meta name="font-delta" content="-0.5" />

There's no reason I can think of that any of your command line settings wouldn't work in the <meta> section, so a person could essentially store not only the content, but how to replicate the content correctly in the same file.

Whether you decide this is something you want to implement or not, I'm probably going to start annotating my HTML files in this way. I figure as long as I keep to the same text you use in your command line variables, I have a better than even chance of it just working if you at some point decide to implement it. (Or I could do it myself, if I could be bothered to learn Python, but I don't really have a weekend free in the near future.)
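A minimal sketch of how the proposal above could work, in Python. It collects the <meta> names from the example in the post and turns each one into a `--name=value` option. The option spelling is an assumption: nothing in this thread confirms that html2lrf accepts flags with exactly these names. `KNOWN_SETTINGS`, `MetaCollector` and `meta_to_args` are hypothetical names, not part of html2lrf.

```python
from html.parser import HTMLParser

# Setting names taken from the example in the post; whether html2lrf has
# command line options with these exact names is an assumption.
KNOWN_SETTINGS = {"publisher", "title", "author", "cover", "font-delta"}


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags with a known name."""

    def __init__(self):
        super().__init__()
        self.settings = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if name in KNOWN_SETTINGS and content is not None:
            self.settings[name] = content


def meta_to_args(html_text):
    """Return a list of hypothetical --name=value options, in document order."""
    collector = MetaCollector()
    collector.feed(html_text)
    return ["--%s=%s" % (name, value) for name, value in collector.settings.items()]
```

Filtering on a fixed set of names keeps unrelated tags (a generator or keywords entry, say) from being passed through as options.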

kovidgoyal · 07-09-2007, 03:53 PM

An interesting idea, can you tell me why you need this feature?

bkilian · 07-09-2007, 05:12 PM

Quote:

Originally Posted by kovidgoyal

An interesting idea, can you tell me why you need this feature?

Well, I have a vast (Read: in the hundreds) library of Baen E-books, and I like to keep at least one human readable version around at all times. It used to be the RTF, and I'd read it into Book Designer, spend half an hour or more tweaking it, save it as Book Designer's html0 format (so I didn't have to do the tweaking each time) and then make the LRF.
But Book Designer has its own set of problems, involving weird character conversions, its insistence on reformatting my content, and its insistence on using Windows.

I have converted all my books multiple times as bugs get fixed in Book Designer.

So when I found out that html2lrf does a quite passable conversion on the .LIT html, (as long as I remove the useless table of contents html first) without me doing a huge amount of searching through the book to make sure it was doing the right thing, I jumped on the chance, only to be stumped by the fact that there's no way my human readable archive can contain all the information needed to perform the conversion correctly.

I essentially want to be able to automate the conversion of a number of titles in one go, and it's impossible to do with your current command line driven method of specifying metadata. Essentially, if you add a feature at some point that would benefit me, I'd like to be able to reconvert all my books without having to do it all manually.

(One other feature that would be helpful in this endeavour would be the ability to specify the Sony ebook ID, so I can ensure that it stays constant across multiple conversions.)

In a more general sense, the feature could be useful in an automated website that, for instance, adds a dedication to an ebook for the user before packing it up, although this seems a bit contrived.

kovidgoyal · 07-09-2007, 05:22 PM

Hmm, why not just use a shell script? Save it in the archive with the html files, and save the metadata in an opf file. html2lrf will read the metadata from the opf automatically when you convert, and the command line settings will be stored in the script file.

bkilian · 07-09-2007, 05:39 PM

That just increases the number of files I have to keep track of from 1+images to 3+images per book, with the added disadvantage that I then have a script file that is probably not portable across systems, and I have to learn the format of an OPF file.

As opposed to adding 5 or 6 <meta> lines to an html, it seems quite unwieldy.

Oh, it looks like html2lrf converts &#333; (ō) to a space. I'm assuming the Sony font doesn't have this character. Is there some way to define a conversion list for special characters?

kovidgoyal · 07-09-2007, 05:57 PM

I keep my human readable html rar'ed up, that way I don't really care about how many files the archive contains. If you write the script in a cross platform language (like Python) you don't have to worry about portability either. And opf is just a simple XML file. Not really anything to learn, and it's likely to be around for a while as well.

No, you can't specify custom character conversion without editing the source code, but you can embed a font that can handle the special characters into the LRF.

bkilian · 07-09-2007, 06:18 PM

Quote:

Originally Posted by kovidgoyal

I keep my human readable html rar'ed up, that way I don't really care about how many files the archive contains. If you write the script in a cross platform language (like Python) you don't have to worry about portability either. And opf is just a simple XML file. Not really anything to learn, and it's likely to be around for a while as well.

No, you can't specify custom character conversion without editing the source code, but you can embed a font that can handle the special characters into the LRF.

That's a bummer. Looks like I'll have to be doing a bunch of find/replace in my files before conversion. Does anyone have a list of the actual characters the Sony font understands?

If I was going to bother learning Python, I'd just modify html2lrf myself instead of writing script files.

In fact I might still do that if I feel energetic in the next few weeks.

Does the OPF file have to be named anything special? If not, do I have to have a different directory for every book? I believe the program can read through zip files, but I haven't been able to make that work. (And could the cover image also be in the zip file?)

C:\EBooks\SRC\Weber\Bahzell\t\Wind_Riders_Oath\t>html2lrf Wind_Riders_Oath.zip
Traceback (most recent call last):
  File "convert_from.py", line 1406, in <module>
  File "convert_from.py", line 1341, in main
  File "convert_from.py", line 1141, in process_file
  File "convert_from.py", line 1380, in get_path
  File "libprs500\__init__.pyo", line 52, in extract
  File "libprs500\libunzip.pyo", line 45, in extract
  File "os.pyo", line 172, in makedirs
WindowsError: [Error 3] The system cannot find the path specified: ''

kovidgoyal · 07-09-2007, 06:31 PM

I would recommend against changing the html2lrf source code as you'd have to maintain the change through new versions of libprs500. What would make more sense is to write a wrapper around html2lrf that makes the changes to the html file before calling html2lrf.

Yeah html2lrf supports both zip and rar archives. The opf file doesn't need to be named anything special (as long as it has a .opf extension) and the archive can contain the cover.

That error looks like another Windows incompatibility bug. Open a bug report and attach the zip file and I'll fix it.
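The wrapper idea above could start with a preprocessing step like the following sketch, which swaps characters the reader font may lack for plain fallbacks before the file ever reaches html2lrf. The fallback table is an assumption with a single entry (the ō from earlier in the thread); which characters the Sony font actually supports is not established here. `FALLBACKS` and `replace_unsupported` are hypothetical names.

```python
import re

# Hypothetical fallback table: Unicode code point -> plain replacement text.
FALLBACKS = {
    0x014D: "o",  # ō, written as &#333; in the source HTML
}

# Matches decimal (&#333;) and hexadecimal (&#x14D;) character references.
CHAR_REF = re.compile(r"&#([xX][0-9a-fA-F]+|[0-9]+);")


def replace_unsupported(html_text, fallbacks=FALLBACKS):
    """Replace listed characters, whether literal or written as references."""

    def from_ref(match):
        ref = match.group(1)
        code = int(ref[1:], 16) if ref[0] in "xX" else int(ref)
        # References that are not in the table are left untouched.
        return fallbacks.get(code, match.group(0))

    text = CHAR_REF.sub(from_ref, html_text)
    # Handle the same characters when they appear literally in the text.
    return text.translate(dict(fallbacks))
```

A wrapper would write the cleaned text to a temporary file and pass that file to html2lrf, so the archived original stays unmodified.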

bkilian · 07-09-2007, 06:57 PM

Quote:

Originally Posted by kovidgoyal

I would recommend against changing the html2lrf source code as you'd have to maintain the change through new versions of libprs500. What would make more sense is to write a wrapper around html2lrf that makes the changes to the html file before calling html2lrf.

Yeah html2lrf supports both zip and rar archives. The opf file doesn't need to be named anything special (as long as it has a .opf extension) and the archive can contain the cover.

That error looks like another Windows incompatibility bug. Open a bug report and attach the zip file and I'll fix it.

I maintained my modifications to the ircII client through 8 years of changes, libprs500 would be a breeze.

I do, however, take your point. So you don't feel it's valuable to have some way to maintain the same settings for a file across multiple conversions. That's fair, I suspect I'm an extreme case. (I did, however, just do it again: I reconverted to test something, and forgot to specify --disable-autorotation or --header.)

I'll enter a bug for the zipfile problem. Does the html file inside the zip file need to be named anything special?

bkilian · 07-09-2007, 07:16 PM

Also, do you have any plans on adding definition list parsing?
<dl>
<dt> blah </dt>
<dd> blah 2 </dd>
</dl>

I can convert them into other types of lists, but I'd prefer not to, and if you do plan on adding it, then I'll just wait.

(You can see an example of them used in the html file in the zip I included with that bug report)
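As a stopgap until definition lists are handled, the conversion to another list type mentioned above could be sketched like this. It simply renames the tags, so the distinction between term and definition is lost; that trade-off, and the name `definition_lists_to_ul`, are choices made for this illustration only.

```python
import re

# Each definition-list tag and the ordinary list tag that stands in for it.
TAG_MAP = {"dl": "ul", "dt": "li", "dd": "li"}

DL_TAG = re.compile(r"<(/?)(dl|dt|dd)\b([^>]*)>", re.IGNORECASE)


def definition_lists_to_ul(html_text):
    """Rename <dl>/<dt>/<dd> tags to <ul>/<li>/<li>, keeping any attributes."""

    def swap(match):
        slash, name, rest = match.group(1), match.group(2).lower(), match.group(3)
        return "<%s%s%s>" % (slash, TAG_MAP[name], rest)

    return DL_TAG.sub(swap, html_text)
```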

kovidgoyal · 07-09-2007, 07:35 PM

As far as maintaining settings like --header (which I suspect you need over all conversions not on a per file basis) the new GUI will take care of that as it will remember conversion defaults.

Adding support for definition lists is trivial and I'll do it in the next release.

Yeah I don't think it makes sense to add support for per file defaults to html2lrf.

theswede · 07-09-2007, 07:52 PM

I agree; metatags in the HTML are certainly the way to store metadata. Manual synchronization and scripts are not only a waste of time, but error prone, and it means that when I zip up a few files to take along on my work laptop, the metadata is gone. It should be embedded.

I might code this into html2lrf myself just to be able to do things the right way.

kovidgoyal · 07-09-2007, 08:06 PM

Ah but opf files are likely to become the standard for ebook metadata.

theswede · 07-09-2007, 08:10 PM

Then I'll write a tool which extracts them from embedded metadata in the unlikely event that is ever needed. Or embeds them in the header, as meta tags. Which is pretty much what is proposed here.

I will not mess with scripts and extra files. An ebook should, as far as possible, contain all its required metadata. I know me; I will never maintain an external file. It will rot, and I'll end up having to reorganize my books after a year or so. If I embed the metadata, it's *done*, for the lifetime of the ebook.
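The extraction tool described above could look roughly like this sketch, which takes settings already read from the embedded <meta> tags and writes them out as Dublin Core elements in an OPF-style XML document. The mapping in `DC_FIELDS` and the function name `build_opf` are assumptions, and the output is only a metadata skeleton: a complete OPF package also carries an identifier, a manifest and a spine, and this thread does not say which layout html2lrf expects.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumed mapping from <meta> names to Dublin Core element names.
DC_FIELDS = {"title": "title", "author": "creator", "publisher": "publisher"}


def build_opf(settings):
    """Return OPF-style XML text holding the bibliographic settings only."""
    ET.register_namespace("", OPF_NS)
    ET.register_namespace("dc", DC_NS)
    package = ET.Element("{%s}package" % OPF_NS, {"version": "2.0"})
    metadata = ET.SubElement(package, "{%s}metadata" % OPF_NS)
    for name, dc_name in DC_FIELDS.items():
        if name in settings:
            element = ET.SubElement(metadata, "{%s}%s" % (DC_NS, dc_name))
            element.text = settings[name]
    return ET.tostring(package, encoding="unicode")
```

Conversion settings such as font-delta have no Dublin Core equivalent, so the sketch leaves them out; they would still need a home in the HTML or in a script.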

kovidgoyal · 07-09-2007, 08:20 PM

Erm, the epub standard is basically an xhtml file with opf and image files in a zipped container, so it is a single file, and since zip is universally understood, you can regard it as pretty much human readable as well.
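To illustrate the single-file point, here is a small sketch that packs a book's html, opf and cover into one zip archive of the kind html2lrf is said earlier in the thread to accept. It produces a plain zip, not a valid EPUB (an EPUB container additionally requires a mimetype entry and META-INF/container.xml). `bundle_book` and the file names in the usage comment are hypothetical.

```python
import zipfile
from pathlib import Path


def bundle_book(archive_path, files):
    """Write the given files flat into one zip archive; return its member names."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Store each file under its bare name, without directories.
            archive.write(path, arcname=Path(path).name)
    with zipfile.ZipFile(archive_path) as archive:
        return archive.namelist()


# Example call (hypothetical file names):
# bundle_book("book.zip", ["book.html", "book.opf", "cover.jpg"])
```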
