11-22-2011, 07:01 PM   #3
Barty
Thanks, Kovid. When I override get_article_url(), it is never called.

Regarding Vanity Fair, you're right that it's a split-page problem. They do have a print version. However, downloading the print version causes an error:

Code:
Auto cleanup of URL: u'http://www.vanityfair.com/culture/features/2009/03/godfather200903.print' failed
Full output below:

Code:
[C:\Program Files (x86)\Calibre2]ebook-convert longform.recipe .epub --test  --debug-pipeline debug
1% Converting input to HTML...
InputFormatPlugin: Recipe Input running
1% Fetching feeds...
1% Got feeds from index page
1% Trying to download cover...
34% Downloading cover from http://longform.org/wp-content/themes/grid_focus_april2011/images/longform_flag.jpg
1% Generating masthead...
Synthesizing mastheadImage
1% Starting download [4 thread(s)]...
9% Article downloaded: u'The End of Borders and the Future of Books'
17% Article downloaded: u'The Sicario: A Ju\xe1rez Hit Man Speaks'
25% Article downloaded: u'The Assassination: The Reporters\u2019 Story'
WARNING: Encoding detection confidence 76%
Auto cleanup of URL: u'http://www.vanityfair.com/culture/features/2009/03/godfather200903.print' failed
34% Article downloaded: u'The Godfather Wars'
34% Feeds downloaded to c:\temp\calibre_0.8.27_tmp_eln8ut\doddxu_plumber\index.html
34% Download finished
Input debug saved to: C:\Program Files (x86)\Calibre2\debug\input
Parsing all content...
Forcing index.html into XHTML namespace
Forcing feed_0/article_0/index.html into XHTML namespace
Parsing file 'feed_0/index.html' as HTML
Forcing feed_0/index.html into XHTML namespace
Parsing file 'feed_1/index.html' as HTML
Forcing feed_1/index.html into XHTML namespace
Forcing feed_1/article_0/index.html into XHTML namespace
Found microsoft markup, cleaning...
Parsing file 'feed_0/article_1/index.html' as HTML
Forcing feed_0/article_1/index.html into XHTML namespace
Stripping comments and meta tags from feed_0/article_1/index.html
File 'feed_0/article_1/index.html' missing <head/> element
File 'feed_0/article_1/index.html' missing <body/> element
Failed to parse content in feed_0/article_1/index.html
Forcing feed_1/article_1/index.html into XHTML namespace
Referenced file 'feed_0/article_1/index.html' not in manifest
Referenced file 'feed_2/index.html' not found
Found microsoft markup, cleaning...
Parsing file 'feed_0/article_1/index.html' as HTML
Forcing feed_0/article_1/index.html into XHTML namespace
Stripping comments and meta tags from feed_0/article_1/index.html
File 'feed_0/article_1/index.html' missing <head/> element
File 'feed_0/article_1/index.html' missing <body/> element
Python function terminated unexpectedly
  list index out of range (Error Code: 1)
Traceback (most recent call last):
  File "site.py", line 132, in main
  File "site.py", line 109, in run_entry_point
  File "site-packages\calibre\ebooks\conversion\cli.py", line 287, in main
  File "site-packages\calibre\ebooks\conversion\plumber.py", line 968, in run
  File "site-packages\calibre\ebooks\conversion\plumber.py", line 1114, in create_oebbook
  File "site-packages\calibre\ebooks\oeb\reader.py", line 71, in __call__
  File "site-packages\calibre\ebooks\oeb\reader.py", line 611, in _all_from_opf
  File "site-packages\calibre\ebooks\oeb\reader.py", line 261, in _manifest_from_opf
  File "site-packages\calibre\ebooks\oeb\reader.py", line 185, in _manifest_add_missing
  File "site-packages\calibre\ebooks\oeb\base.py", line 1161, in fget
  File "site-packages\calibre\ebooks\oeb\base.py", line 1032, in _parse_xhtml
IndexError: list index out of range
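For reference, the print URL in the log is the article address with a `.print` suffix. A minimal sketch of that mapping follows, assuming the same rule holds for other Vanity Fair articles. `to_print_url` is a hypothetical helper name, not part of calibre or of the recipe; in a recipe, logic like this would normally sit inside `print_version()`.

```python
def to_print_url(url):
    """Return the assumed print-version URL for a Vanity Fair article.

    Assumption: the print version is the article URL plus a '.print'
    suffix, as in the URL shown in the log above. Query strings are
    not handled in this sketch.
    """
    base = url.rstrip('/')
    if base.endswith('.print'):
        return base  # already a print URL, leave it alone
    return base + '.print'


print(to_print_url('http://www.vanityfair.com/culture/features/2009/03/godfather200903'))
```

This only builds the URL; it does not address the auto cleanup failure reported for the downloaded page.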
I have another question: the feed recipe gives me only 15 or so articles even though my limit is set much higher than that. When I use my RSS reader, I can see many more articles going back many months, and I can use "load more articles" to get even more. Can I force it to get more articles?
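One possible direction, sketched under assumptions: a feed typically carries only its most recent items, so a higher limit in the recipe cannot return more than the feed itself contains. If the site also lists older articles on numbered index pages, they could be gathered page by page, for example from `parse_index()`. In the sketch below, `collect_articles` and `fetch_page` are hypothetical names, and how a page is actually fetched and parsed is left out because it depends on the site.

```python
def collect_articles(fetch_page, limit):
    """Gather up to `limit` articles by walking numbered pages.

    `fetch_page(n)` is a hypothetical callable that returns the list of
    articles found on page n, or an empty list when no pages remain.
    """
    articles = []
    page = 1
    while len(articles) < limit:
        batch = fetch_page(page)
        if not batch:
            break  # ran out of pages before reaching the limit
        articles.extend(batch)
        page += 1
    return articles[:limit]


# Stand-in for a real site: three pages of 15 articles each.
def fake_fetch_page(n):
    if n > 3:
        return []
    return [{'title': 'Article %d' % i} for i in range((n - 1) * 15, n * 15)]


print(len(collect_articles(fake_fetch_page, 40)))
```

Passing the fetcher in as an argument keeps the paging loop separate from any site-specific scraping, so the loop can be tried out with a fake fetcher like the one above.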

Last edited by Barty; 11-28-2011 at 11:35 AM.