Thread: LRF output
08-16-2008, 08:15 AM   #727
llasram
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
Quote:
Originally Posted by pepak
Another question: I bought a book scanner and have been using it to convert my paper books into ebooks in HTML format (because I consider it the best with regard to current and future functionality).
HTML – excellent choice. I would actually recommend going the extra mile and saving your books as full EPUB books. Even if you don’t like the way ADE (Adobe Digital Editions) on the Reader renders EPUB, the additional metadata, external TOC, etc. in EPUB are arguably a better work-around for some of the issues below.

Quote:
Originally Posted by pepak
1) "author-sort" doesn't seem to have any effect. I use a command line such as
Code:
--author="Steve Perry" --author-sort="PERRY STEVE"
but in the books-by-author the book gets sorted among "S", not among "P".
Hmm. That’s weird. I just tested with my firmware-updated 505 and it totally ignores the ‘Author.reading’ metadata. I vaguely remember it working before I updated the firmware, but I use the “Sort by Author” view so infrequently that I can’t be sure. This one looks like an upstream problem with Sony, although if purchased DRMed BBeB books sort correctly, it may mean the community has misunderstood the file format.

Quote:
Originally Posted by pepak
2) I just can't understand chapter detection and TOC generation: I use <h2> tag for marking chapters, as in
Code:
<h2 id="contents">Table of Contents</h2>
<h2 id="chapter-10">The Attack</h2>
(Note: The id="contents" in the example refers to a hand-crafted TOC for the HTML file, which I will call html-toc further on. My problem relates to the TOC as displayed by the reader, which I will call lrf-toc.)

The command line is:
Code:
--chapter-regex=^
(this is a real ^; I had to prefix it with another ^ for use in batch files)

I took it to mean that ANY h[1-6] tag would be considered a new chapter. Curiously enough, in my example above <h2 id="chapter-10"> gets detected as a chapter but <h2 id="contents"> does not. I thought maybe the regexp didn't get used so as an experiment, I renamed that chapter-10 to xxxpter-10, expecting it not to appear in lrf-toc. Strangely enough, it DID get detected. Only that <h2 id="contents"> seems to be ignored.
I’m not able to reproduce this one with a minimal example. Could you open a ticket with a file reproducing the error?

Quote:
Originally Posted by pepak
3) Another problem with chapter detection: I have a book which has 10 chapters and a whole lot of footnotes. I used an <ol> list at the end of the document to store all notes:
Code:
<ol id="notes">
  <li>
    <p id="note-1">Footnote 1</p>
  </li>
  <li>
    <p id="note-2">Footnote 2</p>
  </li>
</ol>
and the command line:
Code:
--force-page-break-before-tag="h2|p id="
That regexp is only applied to the tag name, so the ‘p id=’ portion will never match.
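To illustrate why, here is a minimal Python sketch of name-only matching. It is not calibre's actual code: the function name `tags_with_forced_break` and the use of a case-insensitive `re.match` are assumptions made purely for illustration.

```python
import re

def tags_with_forced_break(pattern, tag_names):
    """Return the tag names that the pattern matches.

    Assumption: the pattern is tested against the bare tag name only
    (e.g. "p"), never against the tag's attributes.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return [name for name in tag_names if regex.match(name)]

tag_names = ["h1", "h2", "p", "li"]

# The "p id=" alternative needs the literal text "p id=", which a bare
# tag name such as "p" can never contain, so only "h2" is selected.
print(tags_with_forced_break("h2|p id=", tag_names))

# Dropping the attribute part makes the second alternative match "p".
print(tags_with_forced_break("h2|p", tag_names))
```

Under those assumptions, a pattern such as "h2|p" would force a break before every paragraph, which is probably not what you want for a whole book.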

Quote:
Originally Posted by pepak
(because if I don't use page breaks, the links just won't work correctly in LRF; in case you wonder why I used <p id="..." instead of the semantically better <li id="...">, it's because in the latter case the links won't work correctly even with page breaks).
That sounds like a bug. If you can create a fairly minimal file reproducing the error, could you submit a ticket for that one too?

Quote:
Originally Posted by pepak
Two strange things happen:
(i) All footnotes get recognized as chapters (!), so I get some 90 chapters instead of 10 in the lrf-toc.
Default behavior is to add all link-targets to the lrf-toc – see the option ‘--no-links-in-toc’.

Quote:
Originally Posted by pepak
(ii) Despite the force-page-break, there are as many footnotes per page as can fit (!) and still the links work correctly in the LRF (!!!). I don't complain about it, this result is actually very useful, but I find it strange that with <h2> chapters I need to keep each at the start of its own page to make it work but with <li><p> I can have many on the same page and still they work.
If I understand this correctly, there are two issues going on here. The first is that calibre’s chapter detection couples “add this to the lrf-toc as a chapter” with “put a page-break at this point.” As an alternative, you can create an OPF file specifying an external NCX TOC (or HTML TOC); calibre will generate an lrf-toc from that without inserting page-breaks. The second issue is the inconsistent way calibre finds link-targets, only paying attention to the ‘id’ attribute on a handful of tags. I’d be much obliged if you could open a ticket for that one too.
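As a rough sketch of the external-TOC route, the Python below builds a bare-bones NCX document from the <h2 id="..."> headings in an HTML file. The names `HeadingCollector` and `build_ncx`, the file name "book.html" and the title "Demo" are hypothetical. The NCX <head> block (with its dtb:uid entry) is left out for brevity, and the OPF file that points at the NCX is not shown, so treat this as a starting point rather than a file guaranteed to be accepted as-is.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"

class HeadingCollector(HTMLParser):
    """Collect (id, text) pairs for every <h2> that carries an id."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current_id = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._current_id = dict(attrs).get("id")
            self._text = []

    def handle_data(self, data):
        # Only gather text while inside an <h2> that has an id.
        if self._current_id is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._current_id is not None:
            label = "".join(self._text).strip()
            self.headings.append((self._current_id, label))
            self._current_id = None

def build_ncx(html_text, html_filename, title):
    """Return an NCX document (as a string) with one navPoint per heading."""
    collector = HeadingCollector()
    collector.feed(html_text)
    collector.close()

    ncx = ET.Element("ncx", {"xmlns": NCX_NS, "version": "2005-1"})
    doc_title = ET.SubElement(ncx, "docTitle")
    ET.SubElement(doc_title, "text").text = title
    nav_map = ET.SubElement(ncx, "navMap")
    for order, (anchor, label) in enumerate(collector.headings, start=1):
        point = ET.SubElement(
            nav_map, "navPoint",
            {"id": f"navPoint-{order}", "playOrder": str(order)},
        )
        nav_label = ET.SubElement(point, "navLabel")
        ET.SubElement(nav_label, "text").text = label
        # Each entry points at the heading's anchor inside the HTML file.
        ET.SubElement(point, "content", {"src": f"{html_filename}#{anchor}"})
    return ET.tostring(ncx, encoding="unicode")

sample = (
    '<h2 id="contents">Table of Contents</h2>'
    '<h2 id="chapter-10">The Attack</h2>'
)
print(build_ncx(sample, "book.html", "Demo"))
```

Because the TOC entries come from the NCX rather than from chapter detection, footnote targets would only show up in it if you chose to list them.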

Quote:
Originally Posted by pepak
Are these expected behaviors due to some property of LRF which I am not familiar with or are these bugs and I should create a new ticket for them? (In that case, is it possible to send the demo file privately? I do not want to infringe on someone's copyright by posting a book into a public section)
Well, ideally for each ticket you would create a minimal HTML input file which re-creates the described error. Failing that, could you (perhaps with a script) replace all the text in your HTML file with “lorem ipsum” text? If not, then... Actually, if you e-mail me the file at llasram@gmail.com, I’ll do the “lorem ipsum” replacement and send you back the resulting file for you to attach directly to the ticket(s).
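For the scripted replacement, something along these lines might do. It is a minimal sketch using only the Python standard library; the names `TextScrubber` and `scrub` are made up for this example. It keeps every tag and attribute exactly as written and swaps each word of visible text for a filler word. Note that it would also rewrite the contents of any <style> or <script> blocks, so it assumes the file has none that matter.

```python
from html.parser import HTMLParser
from itertools import cycle
import re

LOREM = "lorem ipsum dolor sit amet consectetur adipiscing elit".split()

class TextScrubber(HTMLParser):
    """Re-emit an HTML document with every text word replaced by filler."""

    def __init__(self):
        # Keep entity and character references as-is instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self._words = cycle(LOREM)

    def handle_starttag(self, tag, attrs):
        # Raw tag text preserves attributes such as id="note-1" verbatim.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Replace each run of word characters; spacing and punctuation stay.
        self.out.append(re.sub(r"\w+", lambda m: next(self._words), data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

def scrub(html_text):
    """Return html_text with the same markup but filler text."""
    scrubber = TextScrubber()
    scrubber.feed(html_text)
    scrubber.close()
    return "".join(scrubber.out)

print(scrub('<h2 id="chapter-10">The Attack</h2><p id="note-1">Footnote 1</p>'))
```

It is worth re-running the conversion on the scrubbed file before attaching it, to confirm that the problem still shows up without the original text.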

Quote:
Originally Posted by pepak
4) Paragraphs in <blockquote> have a much larger padding between them than normal paragraphs.

5) Paragraphs in <blockquote> can't be centered using class styles.
Those are known-but-annoying issues with calibre’s ad-hoc CSS parsing and rendering. With the Reader getting EPUB support, LRF formatting issues are downgraded a bit, but that one bugs me too, and if you open a ticket I’ll see if I can’t at least improve the situation.