Posted by TheLazy1 (Junior Member, joined May 2011, device: Kindle 3 Wifi) on 05-25-2011, 09:40 PM
Grabbing pages with wget and using ebook-convert

Hi.
I attempted to grab a wiki page with the following command:

Code:
wget --limit-rate=20k --force-directories --adjust-extension --random-wait \
     --convert-links --page-requisites -e robots=off --user-agent=Mozilla \
     --span-hosts http://en.wikipedia.org/wiki/Wget

ebook-convert then bails out on the saved Wget.html with these errors:

Code:
1% Converting input to HTML...
InputFormatPlugin: HTML Input running
on /home/lazy/en.wikipedia.org/wiki/Wget.html
Language not specified
Creator not specified
Building file list...
Normalizing filename cases
Rewriting HTML links
Initial parse failed:
Parsing file 'Wget.html' as HTML
Forcing Wget.html into XHTML namespace
Property: No CSS priority value: u'ie'.
Property: Invalid value for "CSS Level 2.1" property: -moz-inline-box [2:14343: display]
Property: Unknown Property name. [2:14413: zoom]
CSSStyleDeclaration: Unexpected token, ignoring upto u'*display:inline'. [2:14420: *]
Property: Unknown Property name. [2:14574: word-wrap]
Property: Unknown Property name. [2:14732: word-wrap]
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: Invalid value for "CSS Level 2.1" property: -moz-persian [2:35244: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: persian [2:35273: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: -moz-arabic-indic [2:35313: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: arabic-indic [2:35347: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: -moz-bengali [2:35391: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: bengali [2:35420: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: -moz-oriya [2:35459: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: oriya [2:35486: list-style-type]
Property: No CSS priority value: u'ie'.
[... the same warning repeated 35 more times ...]
Property: Unknown Property name. [2:574: word-wrap]
Property: Unknown Property name. [2:13172: max-device-width]
Property: Unknown Property name. [2:13201: -webkit-text-size-adjust]
Property: Unknown Property name. [2:13313: filter]

Traceback (most recent call last):
  File "/usr/bin/ebook-convert", line 19, in <module>
    sys.exit(main())
  File "/usr/lib64/calibre/calibre/ebooks/conversion/cli.py", line 283, in main
    plumber.run()
  File "/usr/lib64/calibre/calibre/ebooks/conversion/plumber.py", line 920, in run
    accelerators, tdir)
  File "/usr/lib64/calibre/calibre/customize/conversion.py", line 204, in __call__
    log, accelerators)
  File "/usr/lib64/calibre/calibre/ebooks/html/input.py", line 294, in convert
    oeb = self.create_oebbook(stream.name, basedir, opts, log, mi)
  File "/usr/lib64/calibre/calibre/ebooks/html/input.py", line 375, in create_oebbook
    rewrite_links(item.data, partial(self.resource_adder, base=dpath))
  File "/usr/lib64/calibre/calibre/ebooks/oeb/base.py", line 183, in rewrite_links
    new_link = link_repl_func(link.strip())
  File "/usr/lib64/calibre/calibre/ebooks/html/input.py", line 470, in resource_adder
    item.data
  File "/usr/lib64/calibre/calibre/ebooks/oeb/base.py", line 1150, in fget
    self.href))
  File "/usr/lib64/calibre/calibre/ebooks/oeb/base.py", line 471, in read
    with open(urlunquote(path), 'rb') as f:
IOError: [Errno 2] No such file or directory: u'/home/lazy/en.wikipedia.org/w/index.php?title=Special:BannerController&cache=/cn.js&301-2'
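
I believe this is caused by wget's handling of escaped characters. wget saved the / in the query string as %2F, but ebook-convert unquotes the path before opening it (see the urlunquote call in the last traceback frame), so it looks for a literal / instead:
Actual filename: index.php?title=Special:BannerController&cache=%2Fcn.js&301-2
What ebook-convert is asking for: index.php?title=Special:BannerController&cache=/cn.js&301-2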
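The mismatch between those two names can be reproduced with a short Python sketch. It uses the standard library's urllib.parse.unquote as a stand-in for calibre's own urlunquote helper, on the assumption that both decode %2F the same way (the path in the IOError suggests they do). escaped_filenames is a hypothetical helper, not part of wget or calibre, that walks a mirror and lists saved names containing %XX escapes, i.e. candidates for the same failure.

```python
import os
import posixpath
import re
from urllib.parse import unquote

# Filename as wget saved it: the "/" inside the query string was stored
# as the escape "%2F", since "/" cannot appear inside a Unix file name.
saved = "index.php?title=Special:BannerController&cache=%2Fcn.js&301-2"

# Stand-in for calibre's urlunquote (assumed to behave like unquote here):
# decoding turns %2F back into a path separator.
requested = unquote(saved)
print(requested)
# index.php?title=Special:BannerController&cache=/cn.js&301-2

# The decoded name now splits into a directory part and a file part, so
# open() looks for 'cn.js&301-2' inside a directory that does not exist.
print(posixpath.split(requested))
# ('index.php?title=Special:BannerController&cache=', 'cn.js&301-2')

# Any saved name containing a %XX escape decodes to a different string,
# so these files are candidates for the same "No such file" error.
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def escaped_filenames(root="."):
    """Yield paths under root whose file names contain a %XX escape."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if PERCENT_ESCAPE.search(name):
                yield os.path.join(dirpath, name)
```

Called from the directory wget was run in, e.g. `for p in escaped_filenames("."): print(p)`, it would print each such file under en.wikipedia.org/ and any other host directories created by --span-hosts.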

Any thoughts?