I seldom use HTML to produce ebooks. I have used DB for a long time, and I have had more problems with HTML sources than with any other format. DOC or RTF seem to work better, and even a well-prepared TXT file seems better suited than HTML. I use the BD TOC functions rather than importing a TOC from outside.
That said, I reviewed the LRF output. There are several things I would have liked to see that I did not, and several things I saw that I would rather not have seen.
While I did see page breaks for the main stories, I did not see page breaks for the TOC, title, and other front matter. As a result the front matter runs together, with the title of the book split between the bottom of one page and the top of the next.
I saw all of the original page references. Many of the PG HTML sources place these in the margin, away from the body of the text, and that is fine in a browser. Given the narrow column width of most readers, though, a margin is not viable there. Some PG HTML sources put them inline (as this LRF output did). For long pages of text that is not too bad, but here, with very short pages, it is a major intrusion. Some PG texts even put the page number inside a word when that word is split over two pages.
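If the page references are to be dropped before conversion, something along these lines might do it. This is only a rough sketch, not a description of how any existing tool works. It assumes the source marks each page reference as a span with class "pagenum", which many PG HTML files appear to do but not all; the function name strip_page_numbers and the sample sentence are made up for illustration.

```python
import re

# Assumption: each page reference is wrapped as <span class="pagenum">...</span>,
# with no further <span> nested inside it. Other markups need another pattern.
PAGENUM_RE = re.compile(
    r'<span\b[^>]*\bclass="[^"]*\bpagenum\b[^"]*"[^>]*>.*?</span>',
    re.IGNORECASE | re.DOTALL,
)


def strip_page_numbers(html: str) -> str:
    """Remove page-reference spans (hypothetical helper, not part of any tool)."""
    # Replacing with an empty string also rejoins a word that was
    # split by a page number placed in the middle of it.
    return PAGENUM_RE.sub("", html)


sample = (
    'He jumped over some<span class="pagenum">'
    '<a id="Page_12">[Pg 12]</a></span>thing lazy.'
)
print(strip_page_numbers(sample))
```

A regular expression is a blunt tool for HTML, so a source with unusual markup would want a real parser, but for a quick cleanup pass before conversion it may be enough.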
I wish you every possible success with the project, Nick. If anyone can pull it off, I believe you can.