Hi.
I attempted to grab a wiki page with the following command:
Code:
wget --limit-rate=20k --force-directories --html-extension --random-wait --adjust-extension --convert-links --page-requisites -e robots=off --user-agent=Mozilla --span-hosts http://en.wikipedia.org/wiki/Wget
Running ebook-convert on the downloaded page then bails with these errors:
Code:
1% Converting input to HTML...
InputFormatPlugin: HTML Input running
on /home/lazy/en.wikipedia.org/wiki/Wget.html
Language not specified
Creator not specified
Building file list...
Normalizing filename cases
Rewriting HTML links
Initial parse failed:
Parsing file 'Wget.html' as HTML
Forcing Wget.html into XHTML namespace
Property: No CSS priority value: u'ie'.
Property: Invalid value for "CSS Level 2.1" property: -moz-inline-box [2:14343: display]
Property: Unknown Property name. [2:14413: zoom]
CSSStyleDeclaration: Unexpected token, ignoring upto u'*display:inline'. [2:14420: *]
Property: Unknown Property name. [2:14574: word-wrap]
Property: Unknown Property name. [2:14732: word-wrap]
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: Invalid value for "CSS Level 2.1" property: -moz-persian [2:35244: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: persian [2:35273: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: -moz-arabic-indic [2:35313: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: arabic-indic [2:35347: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: -moz-bengali [2:35391: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: bengali [2:35420: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: -moz-oriya [2:35459: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: oriya [2:35486: list-style-type]
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: Unknown Property name. [2:574: word-wrap]
Property: Unknown Property name. [2:13172: max-device-width]
Property: Unknown Property name. [2:13201: -webkit-text-size-adjust]
Property: Unknown Property name. [2:13313: filter]
Traceback (most recent call last):
  File "/usr/bin/ebook-convert", line 19, in <module>
    sys.exit(main())
  File "/usr/lib64/calibre/calibre/ebooks/conversion/cli.py", line 283, in main
    plumber.run()
  File "/usr/lib64/calibre/calibre/ebooks/conversion/plumber.py", line 920, in run
    accelerators, tdir)
  File "/usr/lib64/calibre/calibre/customize/conversion.py", line 204, in __call__
    log, accelerators)
  File "/usr/lib64/calibre/calibre/ebooks/html/input.py", line 294, in convert
    oeb = self.create_oebbook(stream.name, basedir, opts, log, mi)
  File "/usr/lib64/calibre/calibre/ebooks/html/input.py", line 375, in create_oebbook
    rewrite_links(item.data, partial(self.resource_adder, base=dpath))
  File "/usr/lib64/calibre/calibre/ebooks/oeb/base.py", line 183, in rewrite_links
    new_link = link_repl_func(link.strip())
  File "/usr/lib64/calibre/calibre/ebooks/html/input.py", line 470, in resource_adder
    item.data
  File "/usr/lib64/calibre/calibre/ebooks/oeb/base.py", line 1150, in fget
    self.href))
  File "/usr/lib64/calibre/calibre/ebooks/oeb/base.py", line 471, in read
    with open(urlunquote(path), 'rb') as f:
IOError: [Errno 2] No such file or directory: u'/home/lazy/en.wikipedia.org/w/index.php?title=Special:BannerController&cache=/cn.js&301-2'
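I believe this is caused by a mismatch in how escaped characters are handled: wget keeps the %2F escape literally in the name of the file it saves, while ebook-convert unquotes the link (the urlunquote(path) call at the bottom of the traceback) before trying to open it, so it looks for a path containing a real "/".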
Actual filename:
index.php?title=Special:BannerController&cache=%2F cn.js&301-2
What ebook-convert is asking for:
index.php?title=Special:BannerController&cache=/cn.js&301-2
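To illustrate what I think is happening, here is a minimal sketch in plain Python. It uses the standard library's urllib.parse.unquote as a stand-in for calibre's own urlunquote (I have not checked that the two behave identically), and it assumes the saved name contains the literal escape %2F directly followed by cn.js.

```python
from urllib.parse import unquote

# Name of the file as wget wrote it to disk (assumed: the escape %2F is kept literally).
saved_name = "index.php?title=Special:BannerController&cache=%2Fcn.js&301-2"

# Stand-in for the unquoting step ebook-convert applies before open().
requested_name = unquote(saved_name)

print(requested_name)
# The two names differ, so an open() on the unquoted name cannot find the saved file.
print(saved_name == requested_name)
```

If that is right, the unquoted name has a "/" where the file on disk has "%2F", which matches the path in the IOError.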
Any thoughts?