#1
Junior Member
Grabbing pages with wget and using ebook-convert
Hi.
I attempted to grab a wiki page with the following command:
Code:
wget --limit-rate=20k --force-directories --html-extension --random-wait --adjust-extension --convert-links --page-requisites -e robots=off --user-agent=Mozilla --span-hosts http://en.wikipedia.org/wiki/Wget
Running ebook-convert on the saved Wget.html then gave this output:
Code:
1% Converting input to HTML...
InputFormatPlugin: HTML Input running
on /home/lazy/en.wikipedia.org/wiki/Wget.html
Language not specified
Creator not specified
Building file list...
Normalizing filename cases
Rewriting HTML links
Initial parse failed:
Parsing file 'Wget.html' as HTML
Forcing Wget.html into XHTML namespace
Property: No CSS priority value: u'ie'.
Property: Invalid value for "CSS Level 2.1" property: -moz-inline-box [2:14343: display]
Property: Unknown Property name. [2:14413: zoom]
CSSStyleDeclaration: Unexpected token, ignoring upto u'*display:inline'. [2:14420: *]
Property: Unknown Property name. [2:14574: word-wrap]
Property: Unknown Property name. [2:14732: word-wrap]
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: Invalid value for "CSS Level 2.1" property: -moz-persian [2:35244: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: persian [2:35273: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: -moz-arabic-indic [2:35313: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: arabic-indic [2:35347: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: -moz-bengali [2:35391: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: bengali [2:35420: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: -moz-oriya [2:35459: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: oriya [2:35486: list-style-type]
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: Unknown Property name. [2:574: word-wrap]
Property: Unknown Property name. [2:13172: max-device-width]
Property: Unknown Property name. [2:13201: -webkit-text-size-adjust]
Property: Unknown Property name. [2:13313: filter]
Traceback (most recent call last):
  File "/usr/bin/ebook-convert", line 19, in <module>
    sys.exit(main())
  File "/usr/lib64/calibre/calibre/ebooks/conversion/cli.py", line 283, in main
    plumber.run()
  File "/usr/lib64/calibre/calibre/ebooks/conversion/plumber.py", line 920, in run
    accelerators, tdir)
  File "/usr/lib64/calibre/calibre/customize/conversion.py", line 204, in __call__
    log, accelerators)
  File "/usr/lib64/calibre/calibre/ebooks/html/input.py", line 294, in convert
    oeb = self.create_oebbook(stream.name, basedir, opts, log, mi)
  File "/usr/lib64/calibre/calibre/ebooks/html/input.py", line 375, in create_oebbook
    rewrite_links(item.data, partial(self.resource_adder, base=dpath))
  File "/usr/lib64/calibre/calibre/ebooks/oeb/base.py", line 183, in rewrite_links
    new_link = link_repl_func(link.strip())
  File "/usr/lib64/calibre/calibre/ebooks/html/input.py", line 470, in resource_adder
    item.data
  File "/usr/lib64/calibre/calibre/ebooks/oeb/base.py", line 1150, in fget
    self.href))
  File "/usr/lib64/calibre/calibre/ebooks/oeb/base.py", line 471, in read
    with open(urlunquote(path), 'rb') as f:
IOError: [Errno 2] No such file or directory: u'/home/lazy/en.wikipedia.org/w/index.php?title=Special:BannerController&cache=/cn.js&301-2'
Actual filename: index.php?title=Special:BannerController&cache=%2F cn.js&301-2
What ebook-convert is asking for: index.php?title=Special:BannerController&cache=/cn.js&301-2
Any thoughts?
#2
Sigil & calibre developer
That's the issue. The file referenced in the document doesn't exist because it was named differently when saved.
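To illustrate the mismatch, here is a minimal Python 3 sketch. The traceback shows calibre opening `urlunquote(path)`; the standard library's `urllib.parse.unquote` is used here as a stand-in, on the assumption that it decodes percent-escapes the same way. The file name is a shortened, hypothetical version of the one in the first post.

```python
from urllib.parse import unquote

# Hypothetical, shortened version of the name wget wrote to disk:
# the "/" in the query string is stored escaped as %2F.
saved_name = "index.php?title=Special:BannerController&cache=%2Fcn.js"

# Unquoting the link turns %2F back into a literal "/".
requested_name = unquote(saved_name)

print(requested_name)                # the name the converter would try to open
print(requested_name == saved_name)  # False: the two names no longer match
```

Because a file name on disk cannot contain a literal "/", the unquoted name cannot match the saved file, which would explain the "No such file or directory" error.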
#3
Junior Member
Is there a workaround, maybe?
Or maybe an option so that Calibre doesn't bail out if a file cannot be found?
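One possible workaround, sketched in Python 3 under assumptions the thread does not confirm: that the missing file is referenced from a script element, and that scripts are not needed in the converted book. The helper `strip_scripts` and the sample markup are hypothetical; the idea is to remove script elements from the saved HTML before handing it to ebook-convert, so the converter never tries to open the mismatched file.

```python
import re

# Matches both <script .../> and <script ...>...</script>, case-insensitively.
SCRIPT_RE = re.compile(
    r"<script\b[^>]*?(?:/>|>.*?</script\s*>)",
    re.IGNORECASE | re.DOTALL,
)


def strip_scripts(html: str) -> str:
    """Return html with all script elements removed."""
    return SCRIPT_RE.sub("", html)


# Hypothetical sample resembling a saved page that links to the banner script.
sample = (
    "<html><head>"
    '<script src="../w/index.php?title=Special:BannerController'
    '&amp;cache=%2Fcn.js"></script>'
    "</head><body><p>Wget</p></body></html>"
)

cleaned = strip_scripts(sample)
print(cleaned)
```

To try it on the real download, one would read the saved Wget.html, pass its text through `strip_scripts`, write the result to a new file, and convert that file instead. Other references with escaped characters (images, stylesheets) could still trigger the same error, so this is only a partial fix.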