MobileRead Forums - View Single Post

pepak · 08-16-2008, 03:27 AM

Thanks to both of you.

Another question: I bought a book scanner and have been using it to convert my paper books into ebooks in HTML format (because I consider it the best in terms of current and future functionality). I noticed several strange things when converting them into LRF using HTML2LRF on Windows and reading that LRF on my Sony Reader PRS-505. Please note: I do not use the GUI. I convert from the command line and then copy the LRF to the Reader using a file management utility.

1) "author-sort" doesn't seem to have any effect. I use a command line such as

Code:

--author="Steve Perry" --author-sort="PERRY STEVE"

but in the books-by-author list the book gets sorted under "S", not under "P".

2) I just can't understand chapter detection and TOC generation: I use <h2> tag for marking chapters, as in

Code:

<h2 id="contents">Table of Contents</h2>
<h2 id="chapter-10">The Attack</h2>

(Note: The id="contents" in the example refers to a hand-crafted TOC for the HTML file, which I will call html-toc further on. My problem relates to the TOC as displayed by the reader, which I will call lrf-toc.)

The command line is:

Code:

--chapter-regex=^

(this is a literal ^; I had to prefix it with another ^ for use in batch files)

I took it to mean that ANY h[1-6] tag would be considered a new chapter. Curiously enough, in my example above <h2 id="chapter-10"> gets detected as a chapter but <h2 id="contents"> does not. I thought maybe the regexp didn't get used so as an experiment, I renamed that chapter-10 to xxxpter-10, expecting it not to appear in lrf-toc. Strangely enough, it DID get detected. Only that <h2 id="contents"> seems to be ignored.
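For reference, a bare ^ is a zero-width anchor that matches at the start of any string, so on its own it cannot tell one heading from another. The following is only a minimal sketch in Python's re module; that html2lrf applies --chapter-regex this way is an assumption, and the lists below are just the strings from my example, not anything the converter exposes.

```python
import re

# Strings taken from the example above; which of them html2lrf actually
# tests the pattern against is an assumption, so both are tried.
headings = ["Table of Contents", "The Attack"]
ids = ["contents", "chapter-10", "xxxpter-10"]

# The bare anchor from the command line.
pattern = re.compile("^")

def matches(regex, items):
    """Return the items in which the regex is found anywhere."""
    return [s for s in items if regex.search(s)]

matched_headings = matches(pattern, headings)
matched_ids = matches(pattern, ids)

print(matched_headings)
print(matched_ids)
```

Under that assumption every string matches, which would be consistent with xxxpter-10 still being detected, but it does not explain why <h2 id="contents"> is skipped.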

3) Another problem with chapter detection: I have a book which has 10 chapters and a whole lot of footnotes. I used an <ol> list at the end of the document to store all notes:

Code:

<ol id="notes">
  <li>
    <p id="note-1">Footnote 1</p>
  </li>
  <li>
    <p id="note-2">Footnote 2</p>
  </li>
</ol>

and the command line:

Code:

--force-page-break-before-tag="h2|p id="

(because if I don't use page breaks, the links just won't work correctly in LRF; in case you wonder why I used <p id="..."> instead of the semantically better <li id="...">, it's because in the latter case the links won't work correctly even with page breaks).

Two strange things happen:
(i) All footnotes get recognized as chapters (!), so I get some 90 chapters instead of 10 in the lrf-toc.
(ii) Despite the force-page-break, there are as many footnotes per page as can fit (!) and still the links work correctly in the LRF (!!!). I'm not complaining, since this result is actually very useful, but I find it strange that with <h2> chapters I need to keep each one at the start of its own page for the links to work, while with <li><p> I can have many on the same page and they still work.
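One possible explanation for (ii), offered purely as a guess: if the page-break pattern is tested against the bare tag name rather than the full opening tag, the "p id=" alternative can never match. The sketch below compares the two interpretations using Python's re module; both interpretations are assumptions about html2lrf, not documented behavior.

```python
import re

# The pattern from the command line above.
pattern = re.compile("h2|p id=")

# Interpretation A (assumed): the pattern is matched against the tag name only.
tag_names = ["h2", "p", "li"]
by_name = {name: bool(pattern.match(name)) for name in tag_names}

# Interpretation B (assumed): the pattern is searched in the opening tag markup.
opening_tags = ['<h2 id="chapter-10">', '<p id="note-1">', "<li>"]
by_markup = {tag: bool(pattern.search(tag)) for tag in opening_tags}

print(by_name)
print(by_markup)
```

Under interpretation A only h2 would trigger a page break, which would fit the footnotes sharing a page; it says nothing about why they show up as chapters in the lrf-toc.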

Are these expected behaviors due to some property of LRF which I am not familiar with or are these bugs and I should create a new ticket for them? (In that case, is it possible to send the demo file privately? I do not want to infringe on someone's copyright by posting a book into a public section)
