Thanks, Kovid. When I override get_article_url(), it is never called.
Regarding Vanity Fair, you're right that it's a split-page problem. They do have a print version. However, downloading the print version causes an error.
Code:
Auto cleanup of URL: u'http://www.vanityfair.com/culture/features/2009/03/godfather200903.print' failed
Full output below
Code:
[C:\Program Files (x86)\Calibre2]ebook-convert longform.recipe .epub --test --debug-pipeline debug
1% Converting input to HTML...
InputFormatPlugin: Recipe Input running
1% Fetching feeds...
1% Got feeds from index page
1% Trying to download cover...
34% Downloading cover from http://longform.org/wp-content/themes/grid_focus_april2011/images/longform_flag.jpg
1% Generating masthead...
Synthesizing mastheadImage
1% Starting download [4 thread(s)]...
9% Article downloaded: u'The End of Borders and the Future of Books'
17% Article downloaded: u'The Sicario: A Ju\xe1rez Hit Man Speaks'
25% Article downloaded: u'The Assassination: The Reporters\u2019 Story'
WARNING: Encoding detection confidence 76%
Auto cleanup of URL: u'http://www.vanityfair.com/culture/features/2009/03/godfather200903.print' failed
34% Article downloaded: u'The Godfather Wars'
34% Feeds downloaded to c:\temp\calibre_0.8.27_tmp_eln8ut\doddxu_plumber\index.html
34% Download finished
Input debug saved to: C:\Program Files (x86)\Calibre2\debug\input
Parsing all content...
Forcing index.html into XHTML namespace
Forcing feed_0/article_0/index.html into XHTML namespace
Parsing file 'feed_0/index.html' as HTML
Forcing feed_0/index.html into XHTML namespace
Parsing file 'feed_1/index.html' as HTML
Forcing feed_1/index.html into XHTML namespace
Forcing feed_1/article_0/index.html into XHTML namespace
Found microsoft markup, cleaning...
Parsing file 'feed_0/article_1/index.html' as HTML
Forcing feed_0/article_1/index.html into XHTML namespace
Stripping comments and meta tags from feed_0/article_1/index.html
File 'feed_0/article_1/index.html' missing <head/> element
File 'feed_0/article_1/index.html' missing <body/> element
Failed to parse content in feed_0/article_1/index.html
Forcing feed_1/article_1/index.html into XHTML namespace
Referenced file 'feed_0/article_1/index.html' not in manifest
Referenced file 'feed_2/index.html' not found
Found microsoft markup, cleaning...
Parsing file 'feed_0/article_1/index.html' as HTML
Forcing feed_0/article_1/index.html into XHTML namespace
Stripping comments and meta tags from feed_0/article_1/index.html
File 'feed_0/article_1/index.html' missing <head/> element
File 'feed_0/article_1/index.html' missing <body/> element
Python function terminated unexpectedly
list index out of range (Error Code: 1)
Traceback (most recent call last):
File "site.py", line 132, in main
File "site.py", line 109, in run_entry_point
File "site-packages\calibre\ebooks\conversion\cli.py", line 287, in main
File "site-packages\calibre\ebooks\conversion\plumber.py", line 968, in run
File "site-packages\calibre\ebooks\conversion\plumber.py", line 1114, in create_oebbook
File "site-packages\calibre\ebooks\oeb\reader.py", line 71, in __call__
File "site-packages\calibre\ebooks\oeb\reader.py", line 611, in _all_from_opf
File "site-packages\calibre\ebooks\oeb\reader.py", line 261, in _manifest_from_opf
File "site-packages\calibre\ebooks\oeb\reader.py", line 185, in _manifest_add_missing
File "site-packages\calibre\ebooks\oeb\base.py", line 1161, in fget
File "site-packages\calibre\ebooks\oeb\base.py", line 1032, in _parse_xhtml
IndexError: list index out of range
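For reference, here is a minimal sketch of how the print URL in the log above is derived. It assumes the print page is simply the article URL with '.print' appended, which is what the failing URL suggests. The helper name to_print_url and the query-string handling are my own illustration; in a recipe the same logic would go in the print_version(self, url) method.

```python
def to_print_url(url):
    """Return the assumed print-version URL for an article URL.

    Assumption: the print page is the article URL with '.print'
    appended, as in the URL shown in the log output.
    """
    # Drop any query string (e.g. a paging parameter) and trailing slash.
    base = url.split('?', 1)[0].rstrip('/')
    # Leave the URL alone if it already points at the print page.
    if base.endswith('.print'):
        return base
    return base + '.print'
```

Calling it on the article URL without the suffix yields the same '.print' URL that appears in the "Auto cleanup of URL" message.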
I have another question: the feed recipe gives me only 15 or so articles, even though my limit is set much higher than that. In my RSS reader I can see many more articles going back many months, and I can use "load more articles" to get even more. Can I force the recipe to fetch more articles?
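One thing I may be overlooking: a recipe has two separate limits, max_articles_per_feed and oldest_article (an age cutoff in days), so a high article limit alone may not help if the age cutoff is low. The sketch below is only my illustration of how two such limits could interact. It is not calibre's actual code, and select_articles and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta


def select_articles(articles, oldest_article_days, max_articles, now):
    """Keep articles newer than the age cutoff, then cap the count.

    Hypothetical helper: each article is a dict with a 'published'
    datetime, and the list is ordered newest first.
    """
    cutoff = now - timedelta(days=oldest_article_days)
    # The age cutoff applies first, whatever the count limit is.
    recent = [a for a in articles if a['published'] >= cutoff]
    # The count limit only trims what survived the age cutoff.
    return recent[:max_articles]
```

With one article per day over 30 days, a 15-day cutoff and a limit of 100 would still keep only the articles from the last 15 days. Raising the limit changes nothing until the cutoff is raised too. It also cannot return anything the feed itself no longer lists.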