Thread: LRF output
08-16-2008, 03:27 AM #725
pepak
Thanks to both of you.

Another question: I bought a book scanner and have been using it to convert my paper books into ebooks in HTML format (because I consider it the best format with regard to both current and future functionality). I noticed several strange things when converting them to LRF with HTML2LRF on Windows and reading the resulting LRF on my Sony Reader PRS-505. Please note: I do not use the GUI; I convert from the command line and then copy the LRF to the Reader with a file management utility.

1) "author-sort" doesn't seem to have any effect. I use a command line such as
Code:
--author="Steve Perry" --author-sort="PERRY STEVE"
but in the books-by-author list the book is sorted under "S", not under "P".
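For comparison, here is the ordering one would expect if the Reader grouped books by the author-sort value instead of the display name. This is only a minimal Python sketch: the `Book` record, the second author "Samuel Smith", and the rule that the first letter of the sort key picks the group are illustrative assumptions, not documented PRS-505 behavior.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Optional


@dataclass
class Book:
    title: str
    author: str                 # value passed as --author
    author_sort: Optional[str]  # value passed as --author-sort (may be absent)


def sort_key(book: Book) -> str:
    # Assumption: use author-sort when present, else fall back to the display name.
    return (book.author_sort or book.author).upper()


def group_by_initial(books):
    """Return {initial letter: [titles]} keyed on the sort value, not the display name."""
    ordered = sorted(books, key=sort_key)
    return {
        initial: [b.title for b in group]
        for initial, group in groupby(ordered, key=lambda b: sort_key(b)[0])
    }


# Hypothetical sample data.
books = [
    Book("Book A", "Steve Perry", "PERRY STEVE"),
    Book("Book B", "Samuel Smith", None),
]
print(group_by_initial(books))
```

Under that assumption "Steve Perry" lands under "P", which is what the --author-sort value was meant to achieve.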

2) I just can't understand chapter detection and TOC generation. I use the <h2> tag to mark chapters, as in
Code:
<h2 id="contents">Table of Contents</h2>
<h2 id="chapter-10">The Attack</h2>
(Note: The id="contents" in the example refers to a hand-crafted TOC inside the HTML file, which I will call html-toc from here on. My problem relates to the TOC displayed by the Reader, which I will call lrf-toc.)

The command line is:
Code:
--chapter-regex=^
(this is a literal ^; in batch files I had to escape it with another ^, i.e. write ^^)

I took this to mean that ANY h[1-6] tag would be treated as a new chapter. Curiously, in my example above <h2 id="chapter-10"> is detected as a chapter but <h2 id="contents"> is not. I thought that perhaps the regex wasn't being used at all, so as an experiment I renamed chapter-10 to xxxpter-10, expecting it to disappear from the lrf-toc. Strangely enough, it WAS still detected. Only <h2 id="contents"> seems to be ignored.
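The reasoning behind using ^ can be checked outside HTML2LRF: an anchor-only pattern matches every string, even an empty one, so if the regex were searched for in the text of each heading, every h1-h6 tag should qualify. The sketch below assumes (this is not confirmed here) that --chapter-regex is searched for in the heading text; `HeadingCollector` and `chapter_candidates` are hypothetical helpers built on Python's standard `html.parser` and `re` modules.

```python
import re
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect (tag, id, text) for every h1-h6 element in a document."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None  # [tag, id, text parts] while inside a heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._current = [tag, dict(attrs).get("id"), []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._current[0]:
            tag_name, ident, parts = self._current
            self.headings.append((tag_name, ident, "".join(parts).strip()))
            self._current = None


def chapter_candidates(html, pattern):
    collector = HeadingCollector()
    collector.feed(html)
    regex = re.compile(pattern)
    # re.search with "^" succeeds on any string, so every heading is kept.
    return [h for h in collector.headings if regex.search(h[2])]


sample = """
<h2 id="contents">Table of Contents</h2>
<h2 id="chapter-10">The Attack</h2>
"""
for heading in chapter_candidates(sample, "^"):
    print(heading)
```

Under that assumption both headings qualify, which is why it is surprising that only the "contents" heading is dropped.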

3) Another problem with chapter detection: I have a book with 10 chapters and a whole lot of footnotes. I used an <ol> list at the end of the document to hold all the notes:
Code:
<ol id="notes">
  <li>
    <p id="note-1">Footnote 1</p>
  </li>
  <li>
    <p id="note-2">Footnote 2</p>
  </li>
</ol>
and the command line:
Code:
--force-page-break-before-tag="h2|p id="
(because without page breaks the links just don't work correctly in the LRF. In case you are wondering why I used <p id="..."> instead of the semantically better <li id="...">: with the latter, the links don't work correctly even with page breaks.)
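It may help to compare what the pattern h2|p id= matches depending on what it is tested against. The sketch below is hypothetical and does not claim to reproduce HTML2LRF's matching rules; the two input lists are assumptions. It only shows that the alternative "p id=" can never match a bare tag name such as "p", while it does match the raw start tag <p id="note-1">.

```python
import re

PATTERN = re.compile(r"h2|p id=")

# Hypothetical inputs: two things the option might be tested against.
tag_names = ["h2", "ol", "li", "p"]
raw_start_tags = ['<h2 id="chapter-10">', '<ol id="notes">', "<li>", '<p id="note-1">']


def matches(items, use_fullmatch):
    """Return the items the pattern accepts, either as a whole or anywhere inside."""
    test = PATTERN.fullmatch if use_fullmatch else PATTERN.search
    return [item for item in items if test(item)]


print("whole tag name:", matches(tag_names, use_fullmatch=True))
print("anywhere in start tag:", matches(raw_start_tags, use_fullmatch=False))
```

If the option were tested only against tag names, the "p id=" part would have no effect at all, which would be consistent with the footnotes not getting their own pages.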

Two strange things happen:
(i) All footnotes are recognized as chapters (!), so I get some 90 chapters instead of 10 in the lrf-toc.
(ii) Despite the forced page break, each page holds as many footnotes as will fit (!), and the links still work correctly in the LRF (!!!). I'm not complaining, since this result is actually very useful, but I find it strange that <h2> chapters each have to start on their own page for the links to work, while several <li><p> notes can share a page and still work.
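Since the question hinges on whether every footnote link reaches its target, a quick consistency check of the source HTML can rule out problems in the markup itself before suspecting the converter. This is a minimal sketch using only Python's standard library; it assumes the links are written as <a href="#note-N">, which the post does not show, and `LinkChecker`/`dangling_links` are hypothetical names.

```python
from html.parser import HTMLParser


class LinkChecker(HTMLParser):
    """Record every id attribute and every same-document link (href="#...")."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.fragment_links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id"):
            self.ids.add(attributes["id"])
        href = attributes.get("href") or ""
        if tag == "a" and href.startswith("#"):
            self.fragment_links.append(href[1:])


def dangling_links(html):
    """Return link targets that have no matching id anywhere in the document."""
    checker = LinkChecker()
    checker.feed(html)
    return [target for target in checker.fragment_links if target not in checker.ids]


# Hypothetical sample: note-3 is linked but never defined.
sample = """
<p>Text<a href="#note-1">1</a> more text<a href="#note-3">3</a></p>
<ol id="notes">
  <li><p id="note-1">Footnote 1</p></li>
  <li><p id="note-2">Footnote 2</p></li>
</ol>
"""
print(dangling_links(sample))
```

An empty result means every link has a target in the HTML, so any broken link in the LRF would come from the conversion rather than the source.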

Are these expected behaviors due to some property of LRF that I am not familiar with, or are they bugs for which I should open a new ticket? (In the latter case, is it possible to send the demo file privately? I do not want to infringe anyone's copyright by posting a book in a public section.)