Old 07-09-2007, 08:28 PM   #241
theswede
Junior Member
Posts: 4
Karma: 10
Join Date: Jul 2007
Device: Sony Reader
That's three or more files in a zip archive. I will make the change to the code for my own use, as I do not intend to mess with several files, but keep everything in one file as opposed to an archive. I may also add .gz pipe support, if I can be bothered. And a ~/.html2lrf file which can store defaults, since it's a pain to retype them all the time.
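
A defaults file like the one described here could work roughly as follows. This is only a sketch: the file format (shell-style options in plain text, `#` for comments) and the helper name `merge_defaults` are assumptions made for illustration, not something html2lrf is known to implement.

```python
import shlex
from pathlib import Path


def merge_defaults(argv, defaults_path):
    """Return argv with options from a defaults file placed in front.

    Assumed (hypothetical) file format: shell-style options, possibly
    spread over several lines, with '#' starting a comment.
    """
    path = Path(defaults_path)
    if not path.is_file():
        # No defaults file: use the command line unchanged.
        return list(argv)
    defaults = shlex.split(path.read_text(encoding="utf-8"), comments=True)
    # Defaults go first so that options typed on the command line come
    # later; many option parsers let the later value win.
    return defaults + list(argv)
```

A wrapper script would call something like `merge_defaults(sys.argv[1:], Path.home() / ".html2lrf")` and pass the result on to the converter.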

Besides, html2lrf doesn't accept such a zip file either, last I checked, so that doesn't help, really.

That said, I want to thank you for making html2lrf. It's saved me a lot of time and is an excellent tool. If it wasn't, I wouldn't be bugging you about it.

txt2lrf however doesn't work for me. It never finishes. =(
Old 07-09-2007, 08:35 PM   #242
kovidgoyal
creator of calibre
Posts: 45,356
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Well to each his own. If you're having problems with any of the tools, please open bug reports so I can fix them.
Old 07-09-2007, 08:40 PM   #243
theswede
Yes, to each his own, and it's nice with tools which accept data in all manner of forms.

Not sure what to write in a bug report. txt2lrf has never worked for me; it just runs forever at 99% CPU. lit2lrf and html2lrf work excellently, though. I haven't found a manual, so I'm not sure what I'm doing wrong or what might be wrong with the input files; I just invoke txt2lrf with no parameters except the txt file, and it sits there running.
Old 07-09-2007, 08:50 PM   #244
kovidgoyal
Well, attach the txt file.
Old 07-09-2007, 10:24 PM   #245
bkilian
Zealot
Posts: 131
Karma: 24870
Join Date: Oct 2006
Device: Sony PRS/505
I was just reading one of the files I had converted using html2lrf, and I noticed that it leaves a blank line between paragraphs.
This is ok for some books, but the book I converted from only used indent to signify a new paragraph, and I'd prefer it that way (yours uses indent and a blank line). Is there some way to change this behaviour? (Note, I'd still want an empty paragraph to appear as a blank line, but currently it appears as two blank lines)

Now you see why I'm interested in some way to store per-book settings? Some books need a font embedded, others don't, some need autorotate off, some need this new paragraph format, some do better with a different border width, etc.
As the number of options and ways to convert increase, a way to remember the options used on a particular book will become more important.

If there's some way for me to add this into the OPF, I'd be happy to go that route, it works really well for the normal metadata, so we could essentially extend the <x-metadata> structure to include html2lrf settings.
Old 07-09-2007, 10:34 PM   #246
bkilian
Also, I noticed that while you read the OPF, you ignore the <spine> directive, which tells you in which order to render the html files. Now this is probably a good thing for books that use links to link to other files, but in the case of Baen books, it would be a good idea to use it, since then I wouldn't have to extract the LIT, remove the _top htm (because, seriously, who needs a rendered table of contents when the reader provides one for you) and then convert. I could just feed it the LIT. (Or, in my perfect case, extract the LIT, add the correct metadata to the html or OPF file, and then repack the LIT.)

Note that this is definitely not a high priority for me. I'm perfectly willing to go on unpacking and modifying; it's just something I noticed when I was looking at the OPF format.
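
For anyone wondering what honouring <spine> would involve: in an OPF file the <manifest> maps ids to hrefs, and <spine> lists <itemref idref="..."> entries in reading order. The sketch below resolves that order with Python's standard library. It is an illustration only, not how html2lrf reads the OPF; the function names are made up here, and real OPF files extracted from a LIT may need extra handling (DOCTYPE declarations, entities) that this ignores.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace, if any: '{ns}spine' -> 'spine'."""
    return tag.rsplit("}", 1)[-1]


def spine_order(opf_text):
    """Return the content hrefs in the order given by <spine>."""
    root = ET.fromstring(opf_text)
    hrefs = {}  # manifest id -> href
    order = []  # idrefs in spine order
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "item" and "id" in elem.attrib and "href" in elem.attrib:
            hrefs[elem.attrib["id"]] = elem.attrib["href"]
        elif name == "itemref" and "idref" in elem.attrib:
            order.append(elem.attrib["idref"])
    # Skip spine entries that point at ids missing from the manifest.
    return [hrefs[i] for i in order if i in hrefs]
```

Files that appear in the manifest but not in the spine (such as a separate table of contents page) are simply left out of the result.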
Old 07-09-2007, 10:50 PM   #247
kovidgoyal
It shouldn't leave a blank line between paragraphs unless there's an extra <br /> or </p> tag. For example, there won't be a blank line when converting the following HTML:
<p>para one</p>
<p>para two</p>

Adding per book settings to the OPF file seems like a good way to go. Open a bug report and I'll get around to it eventually. It's going to have to wait till I write a proper OPF parser, which in turn is probably not going to happen till after 0.4.0.

Also open a bug report for the <spine> and I'll add a --use-spine option.

And finally, open a bug report for the definition lists so I don't forget.
Old 07-10-2007, 01:16 AM   #248
bkilian
Quote:
Originally Posted by kovidgoyal
It shouldn't leave a blank line between paragraphs unless there's an extra <br /> or </p> tag. For example, there won't be a blank line when converting the following HTML:
<p>para one</p>
<p>para two</p>
Ooh, you're right, it's the crazy baen html. They add a <p> between every other <p> like this:

Code:
  <p>
   <a id="p9" name="p9">
   </a>
  </p>
  <p onmouseover="PNo(9)">"Don't just stand there like a whore at a wedding, Master Holderman! Trim that foresheet! It's slacker than those idlers you call seamen!" </p>
  <p>
   <a id="p10" name="p10">
   </a>
  </p>
What exactly does the --baen option do? Will it strip out this silliness? (For some reason, Microsoft reader completely ignores the extra <p> and renders with no gaps)

Quote:
Originally Posted by kovidgoyal
Adding per book settings to the OPF file seems like a good way to go. Open a bug report and I'll get around to it eventually. It's going to have to wait till I write a proper OPF parser, which in turn is probably not going to happen till after 0.4.0.

Also open a bug report for the <spine> and I'll add a --use-spine option.

And finally, open a bug report for the definition lists so I don't forget.
Will do.
Old 07-10-2007, 02:36 AM   #249
kovidgoyal
No, it won't. It basically removes some extra page breaks by running a couple of regexps over the HTML before processing it. If you can come up with a regexp that matches this case and doesn't affect anything else, I could add it.
Old 07-10-2007, 09:29 AM   #250
Xenophon
curmudgeon
Posts: 1,487
Karma: 5748190
Join Date: Jun 2006
Location: Redwood City, CA USA
Device: Kobo Aura HD, (ex)nook, (ex)PRS-700, (ex)PRS-500
Quote:
Originally Posted by bkilian
Well, I have a vast (Read: in the hundreds) library of Baen E-books, and I like to keep at least one human readable version around at all times.
[SNIP]
So when I found out that html2lrf does a quite passable conversion on the .LIT html, (as long as I remove the useless table of contents html first) without me doing a huge amount of searching through the book to make sure it was doing the right thing, I jumped on the chance, only to be stumped by the fact that there's no way my human readable archive can contain all the information needed to perform the conversion correctly.

I essentially want to be able to automate the conversion of a number of titles in one go, and it's impossible to do with your current command line driven method of specifying metadata. Essentially, if you add a feature at some point that would benefit me, I'd like to be able to reconvert all my books without having to do it all manually.
[SNIP]
I also have hundreds of Baen books. And I convert them automatically, using libprs500's command line tools.

What I've done is write a few shell scripts which I then run in a terminal window on my Mac OS X box -- although they should work fine (with small edits) on any Unix-ish system.

The first script takes a single input directory holding one Baen html eBook, and converts it into lrf, placing the output into the specified output directory. This script knows how to find the cover JPG, how to find the _toc file (which is what I use as the master input for the conversion), and also knows my favorite settings.

The second script takes a directory containing N subdirs (each as above) plus a single 'mapping' file. The mapping file holds one line per book specifying the mapping from input-dir to output-dir (this is how I manage storing books in directories by author, rather than by Baen's release date). It invokes the first script for each book.

The final script simply takes a list of directories suitable for input to the second, and does the obvious invocation.

The upshot of all this is that when Kovid releases a new html2lrf that has a feature I care about, it's a single command line to re-convert all my Baen eBooks.

I've been quick-and-dirty with the script building, so my file paths are built in rather than read from a config file (or whatever) and there's minimal error checking. I'm happy to share if anyone is interested.
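
To give an idea of the second script's logic, here is a rough Python equivalent. It is a sketch under assumptions, not the actual shell scripts: the mapping file layout (one `input-dir<TAB>output-dir` pair per line) is a guess, and `convert` is a placeholder for whatever converts a single book (the job of the first script), so no particular html2lrf options are implied.

```python
from pathlib import Path


def read_mapping(mapping_text):
    """Parse lines of 'input-dir<TAB>output-dir' into (input, output) pairs.

    The tab-separated layout is an assumption; blank lines and lines
    starting with '#' are skipped.
    """
    pairs = []
    for line in mapping_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split("\t", 1)
        pairs.append((src.strip(), dst.strip()))
    return pairs


def convert_all(root, mapping_text, convert):
    """Call convert(input_dir, output_dir) once per mapped book.

    Input directories are resolved relative to `root`; output
    directories are used as written in the mapping.
    """
    jobs = [(Path(root) / src, Path(dst)) for src, dst in read_mapping(mapping_text)]
    for input_dir, output_dir in jobs:
        convert(input_dir, output_dir)
    return jobs
```

Like the originals, this has minimal error checking: a line without a tab raises a `ValueError` instead of being reported nicely.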
Old 07-10-2007, 02:20 PM   #251
kovidgoyal
Released 0.3.67 with support for definition lists and a fix for the handling of zip files.
Old 07-10-2007, 04:53 PM   #252
bkilian
Quote:
Originally Posted by kovidgoyal
Released 0.3.67 with support for definition lists and a fix for the handling of zip files.
Dude, you are one of the most responsive devs I've ever seen, and I've seen a lot of devs.

As to the Python regular expression, one I've found that matches only paragraphs containing nothing but an <a id...></a> seems to work on the Baen books I've tried it on:
<p>\s*<a id.*?>\s*</a>\s*</p>
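
Applied in Python, that pattern might be used as below. This is a sketch rather than the code html2lrf ends up with. It makes one deliberate change to the posted pattern: `.*?` is replaced by `[^>]*`, so the attribute match cannot run past the end of the opening `<a>` tag and swallow a later `<a></a>` on the same line. The case-insensitive flag and the function name are also assumptions.

```python
import re

# Posted pattern with [^>]* in place of .*? (a tightening made in this
# sketch), compiled case-insensitively so <P> and <A ID=...> also match.
EMPTY_ANCHOR_PARA = re.compile(r"<p>\s*<a id[^>]*>\s*</a>\s*</p>", re.IGNORECASE)


def strip_empty_anchor_paragraphs(html):
    """Delete <p> elements that contain nothing but an empty <a id=...>."""
    return EMPTY_ANCHOR_PARA.sub("", html)
```

Paragraphs that carry attributes, such as `<p onmouseover="PNo(9)">`, never match because the pattern requires a bare `<p>`.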
Old 07-10-2007, 05:17 PM   #253
kovidgoyal
That's coz I use libprs500 a lot myself and I want it to be as bug free as possible :-) I essentially use all you guys as free bug hunters as bug-hunting is something I'm extremely lazy about. And those two fixes were about 10 lines of code.

But aren't those id elements referred to by some links in the rest of the file?
Old 07-10-2007, 06:07 PM   #254
bkilian
Quote:
Originally Posted by kovidgoyal
That's coz I use libprs500 a lot myself and I want it to be as bug free as possible :-) I essentially use all you guys as free bug hunters as bug-hunting is something I'm extremely lazy about. And those two fixes were about 10 lines of code.

But aren't those id elements referred to by some links in the rest of the file?
Not when it's the only element inside a <p>. The regex grabs every <p> that _only_ has a single <a> in it and only if the <a> starts with "id" and has no text (only whitespace between <a> and </a>). I can't think of, nor did I see, any useful use of that particular combination. We can always make it even more picky by requiring the <a> to have no "href", but I don't think it's necessary.

Edit: Oh, you're asking if the paragraph indicators (which is what these are) are used by anything else? No. They're used in the "web" reading version to update the silly "index" box. (Check http://www.webscription.net/10.1125/...0671318470.htm and move your mouse down the page. You can type a number into the box and it'll jump to that paragraph.) The html in the LIT doesn't have the javascript to enable this.

Note that the pure html versions don't have <p> surrounding the <a> elements, so they don't render, it's really only an issue with the files they include in their LIT versions (I suspect the OEB DTD requires the surrounding <p>).

Last edited by bkilian; 07-10-2007 at 06:19 PM.
Old 07-10-2007, 06:14 PM   #255
kovidgoyal
I meant: aren't there <a href> elements that refer to that id? Removing the id would make those links stop working. Though I suppose I could just remove the <p> and keep the <a>.
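
Removing the <p> while keeping the <a> could look something like this in Python. It is only a sketch of the idea, not the eventual html2lrf code: the pattern is based on the one posted earlier in the thread, with `[^>]*` instead of `.*?` and a capture group around the opening tag, and both the function name and the case-insensitive flag are assumptions.

```python
import re

# Capture the opening <a id=...> tag so it can be written back out
# without the surrounding <p>...</p>.
WRAPPED_ANCHOR = re.compile(r"<p>\s*(<a id[^>]*>)\s*</a>\s*</p>", re.IGNORECASE)


def unwrap_empty_anchor_paragraphs(html):
    """Replace '<p><a id=...></a></p>' with the bare '<a id=...></a>'."""
    return WRAPPED_ANCHOR.sub(r"\1</a>", html)
```

The id stays in the document, so any link that points at it still has a target, while the empty paragraph that caused the extra blank line is gone.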