OK, first strange results.
I have attached two files:
ttt.tar.gz - contains a directory 'ttt' with the file hierarchy (html and other files) from which an epub is built,
ttt.epub - the epub that was built from the directory 'ttt'
In the directory 'ttt', in the subdirectory 'feed_0', only the first two subdirectories ('article_1' and 'article_2') contain index.html files that came from the Globe and Mail feed. All other 'article_*' subdirectories contain dummy index.html files with the simple html text 'Empty Article'. (In all these directories, the original index.html files from the G&M feed are saved under the name 'index.html.saved', so that you can restore the real feed data. I don't think that files with the extension '.saved', which are not referenced anywhere in the .html files, have any influence on the contents of the resulting epub file.)
The _expected_ result would be that the epub file, when viewed, shows the first two articles as they came from the real G&M feed, and the third and subsequent articles with the text "Empty Article".
The _observed_ result is that when viewed, the epub file shows the first two articles as expected, with the text from the real G&M feed, but the third and all subsequent articles are shown with the text of the first article (maybe there is a recursion somewhere in the feed?). This happens both in the Calibre internal e-book viewer and after downloading to the Sony PRS-600. But the Sony does not freeze in this case.
This is a strange result - the epub should not contain multiple copies of the first article in place of the articles with the "Empty Article" text. Maybe this is a manifestation of the same error that causes the Sony to freeze on the bigger epub?
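One way to narrow this down would be to check whether the epub really contains copies of the first article, or whether the files are fine and only the links between them are wrong. Since an epub is a zip archive, a rough sketch like the following could group the html files inside it by content hash. The function name and the '.html' suffix filter are my own assumptions; the file names inside the epub may differ from those in the 'ttt' directory.

```python
import hashlib
import zipfile
from collections import defaultdict


def find_duplicate_articles(epub_file, suffix=".html"):
    """Return groups of archive members whose contents are byte-identical.

    epub_file may be a path or a file-like object; suffix is an assumed
    filter for the article files and may need adjusting.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(epub_file) as epub:
        for name in epub.namelist():
            if name.endswith(suffix):
                # Hash the raw bytes so identical files share one digest.
                digest = hashlib.sha256(epub.read(name)).hexdigest()
                groups[digest].append(name)
    # Keep only digests shared by more than one file, in a stable order.
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)
```

Calling find_duplicate_articles("ttt.epub") and printing the result should show which files are identical. The dummy articles are expected to form one group of their own, since they all hold the same 'Empty Article' text. If the first article's file shows up in a group with later articles, the epub itself holds the copies; if not, the content is probably correct and the problem is more likely in the table of contents or the links.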