05-25-2011, 09:40 PM | #1 |
Junior Member
Posts: 5
Karma: 10
Join Date: May 2011
Device: Kindle 3 Wifi
|
Grabbing pages with wget and using ebook-convert
Hi.
I attempted to grab a wiki page with the following command:
Code:
wget --limit-rate=20k --force-directories --html-extension --random-wait --adjust-extension --convert-links --page-requisites -e robots=off --user-agent=Mozilla --span-hosts http://en.wikipedia.org/wiki/Wget
Running ebook-convert on the downloaded Wget.html then failed with this output:
Code:
1% Converting input to HTML...
InputFormatPlugin: HTML Input running on /home/lazy/en.wikipedia.org/wiki/Wget.html
Language not specified
Creator not specified
Building file list...
Normalizing filename cases
Rewriting HTML links
Initial parse failed:
Parsing file 'Wget.html' as HTML
Forcing Wget.html into XHTML namespace
Property: No CSS priority value: u'ie'.
Property: Invalid value for "CSS Level 2.1" property: -moz-inline-box [2:14343: display]
Property: Unknown Property name. [2:14413: zoom]
CSSStyleDeclaration: Unexpected token, ignoring upto u'*display:inline'. [2:14420: *]
Property: Unknown Property name. [2:14574: word-wrap]
Property: Unknown Property name. [2:14732: word-wrap]
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: Invalid value for "CSS Level 2.1" property: -moz-persian [2:35244: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: persian [2:35273: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: -moz-arabic-indic [2:35313: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: arabic-indic [2:35347: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: -moz-bengali [2:35391: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: bengali [2:35420: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: -moz-oriya [2:35459: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: oriya [2:35486: list-style-type]
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: Unknown Property name. [2:574: word-wrap]
Property: Unknown Property name. [2:13172: max-device-width]
Property: Unknown Property name. [2:13201: -webkit-text-size-adjust]
Property: Unknown Property name. [2:13313: filter]
Traceback (most recent call last):
  File "/usr/bin/ebook-convert", line 19, in <module>
    sys.exit(main())
  File "/usr/lib64/calibre/calibre/ebooks/conversion/cli.py", line 283, in main
    plumber.run()
  File "/usr/lib64/calibre/calibre/ebooks/conversion/plumber.py", line 920, in run
    accelerators, tdir)
  File "/usr/lib64/calibre/calibre/customize/conversion.py", line 204, in __call__
    log, accelerators)
  File "/usr/lib64/calibre/calibre/ebooks/html/input.py", line 294, in convert
    oeb = self.create_oebbook(stream.name, basedir, opts, log, mi)
  File "/usr/lib64/calibre/calibre/ebooks/html/input.py", line 375, in create_oebbook
    rewrite_links(item.data, partial(self.resource_adder, base=dpath))
  File "/usr/lib64/calibre/calibre/ebooks/oeb/base.py", line 183, in rewrite_links
    new_link = link_repl_func(link.strip())
  File "/usr/lib64/calibre/calibre/ebooks/html/input.py", line 470, in resource_adder
    item.data
  File "/usr/lib64/calibre/calibre/ebooks/oeb/base.py", line 1150, in fget
    self.href))
  File "/usr/lib64/calibre/calibre/ebooks/oeb/base.py", line 471, in read
    with open(urlunquote(path), 'rb') as f:
IOError: [Errno 2] No such file or directory: u'/home/lazy/en.wikipedia.org/w/index.php?title=Special:BannerController&cache=/cn.js&301-2'

Actual filename:
index.php?title=Special:BannerController&cache=%2Fcn.js&301-2
What ebook-convert is asking for:
index.php?title=Special:BannerController&cache=/cn.js&301-2
Any thoughts?
05-26-2011, 09:58 AM | #2 |
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
That's the issue. The file referenced in the document doesn't exist because it was named differently when saved.
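To make the mismatch concrete, here is a minimal sketch of what seems to be happening. The traceback shows calibre calling `urlunquote(path)` before opening the file. The sketch uses Python's standard `urllib.parse.unquote` as a stand-in for that call, which is an assumption, not calibre's actual code. The file name is copied from the first post, assuming the space after `%2F` there is a forum line-wrap artifact. The helper names `decoded_name` and `names_disagree` are hypothetical and exist only for this illustration.

```python
from urllib.parse import unquote

# File name as reported on disk in the first post: the "/" from the URL's
# query string appears as "%2F" in the saved name.
SAVED_NAME = "index.php?title=Special:BannerController&cache=%2Fcn.js&301-2"


def decoded_name(name: str) -> str:
    """Percent-decode a name, as a converter might do before opening it."""
    return unquote(name)


def names_disagree(name: str) -> bool:
    """True when decoding changes the name, so opening the decoded name fails."""
    return decoded_name(name) != name


if __name__ == "__main__":
    print("on disk :", SAVED_NAME)
    print("decoded :", decoded_name(SAVED_NAME))
    print("mismatch:", names_disagree(SAVED_NAME))
```

Decoding turns `%2F` back into `/`, so the decoded name is the one from the IOError rather than the one that exists on disk.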
05-26-2011, 10:40 AM | #3 |
Junior Member
Posts: 5
Karma: 10
Join Date: May 2011
Device: Kindle 3 Wifi
|
Is there a workaround, maybe?
Or an option that keeps calibre from bailing out when a referenced file cannot be found?