
MobileRead Forums > E-Book Software > Calibre > Conversion

05-25-2011, 09:40 PM   #1
TheLazy1
Junior Member
TheLazy1 began at the beginning.
Posts: 5
Karma: 10
Join Date: May 2011
Device: Kindle 3 Wifi
Grabbing pages with wget and using ebook-convert

Hi.
I attempted to grab a wiki page with the following command:

Code:
wget --limit-rate=20k --force-directories --adjust-extension --random-wait --convert-links --page-requisites -e robots=off --user-agent=Mozilla --span-hosts http://en.wikipedia.org/wiki/Wget
When I run ebook-convert on the resulting Wget.html, it bails with these errors:

Code:
1% Converting input to HTML...
InputFormatPlugin: HTML Input running
on /home/lazy/en.wikipedia.org/wiki/Wget.html
Language not specified
Creator not specified
Building file list...
Normalizing filename cases
Rewriting HTML links
Initial parse failed:
Parsing file 'Wget.html' as HTML
Forcing Wget.html into XHTML namespace
Property: No CSS priority value: u'ie'.
Property: Invalid value for "CSS Level 2.1" property: -moz-inline-box [2:14343: display]
Property: Unknown Property name. [2:14413: zoom]
CSSStyleDeclaration: Unexpected token, ignoring upto u'*display:inline'. [2:14420: *]
Property: Unknown Property name. [2:14574: word-wrap]
Property: Unknown Property name. [2:14732: word-wrap]
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: Invalid value for "CSS Level 2.1" property: -moz-persian [2:35244: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: persian [2:35273: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: -moz-arabic-indic [2:35313: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: arabic-indic [2:35347: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: -moz-bengali [2:35391: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: bengali [2:35420: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: -moz-oriya [2:35459: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: oriya [2:35486: list-style-type]
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: Unknown Property name. [2:574: word-wrap]
Property: Unknown Property name. [2:13172: max-device-width]
Property: Unknown Property name. [2:13201: -webkit-text-size-adjust]
Property: Unknown Property name. [2:13313: filter]

Traceback (most recent call last):
  File "/usr/bin/ebook-convert", line 19, in <module>
    sys.exit(main())
  File "/usr/lib64/calibre/calibre/ebooks/conversion/cli.py", line 283, in main
    plumber.run()
  File "/usr/lib64/calibre/calibre/ebooks/conversion/plumber.py", line 920, in run
    accelerators, tdir)
  File "/usr/lib64/calibre/calibre/customize/conversion.py", line 204, in __call__
    log, accelerators)
  File "/usr/lib64/calibre/calibre/ebooks/html/input.py", line 294, in convert
    oeb = self.create_oebbook(stream.name, basedir, opts, log, mi)
  File "/usr/lib64/calibre/calibre/ebooks/html/input.py", line 375, in create_oebbook
    rewrite_links(item.data, partial(self.resource_adder, base=dpath))
  File "/usr/lib64/calibre/calibre/ebooks/oeb/base.py", line 183, in rewrite_links
    new_link = link_repl_func(link.strip())
  File "/usr/lib64/calibre/calibre/ebooks/html/input.py", line 470, in resource_adder
    item.data
  File "/usr/lib64/calibre/calibre/ebooks/oeb/base.py", line 1150, in fget
    self.href))
  File "/usr/lib64/calibre/calibre/ebooks/oeb/base.py", line 471, in read
    with open(urlunquote(path), 'rb') as f:
IOError: [Errno 2] No such file or directory: u'/home/lazy/en.wikipedia.org/w/index.php?title=Special:BannerController&cache=/cn.js&301-2'
I believe this is caused by wget's handling of escaped characters.
Actual filename: index.php?title=Special:BannerController&cache=%2Fcn.js&301-2
What ebook-convert is asking for: index.php?title=Special:BannerController&cache=/cn.js&301-2
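A minimal Python sketch of that mismatch, assuming calibre's percent-decoding behaves like the standard library's `urllib.parse.unquote` (the traceback only shows that calibre calls its own `urlunquote` before `open`). The helper names `decoded_target` and `check` are hypothetical, made up for this illustration.

```python
import os
import tempfile
from urllib.parse import unquote

# Filename from the post: wget kept the "/" inside the query string
# percent-encoded as %2F, because "/" cannot appear in a filename.
SAVED_NAME = "index.php?title=Special:BannerController&cache=%2Fcn.js&301-2"


def decoded_target(href: str) -> str:
    """Percent-decode a link the way the converter is assumed to
    (modelled with urllib.parse.unquote, not calibre's own code)."""
    return unquote(href)


def check(root: str) -> tuple[bool, bool]:
    """Create the file exactly as wget named it, then test whether the
    decoded link points at it.
    Returns (saved_file_exists, decoded_path_exists)."""
    w_dir = os.path.join(root, "en.wikipedia.org", "w")
    os.makedirs(w_dir, exist_ok=True)
    saved_path = os.path.join(w_dir, SAVED_NAME)
    open(saved_path, "wb").close()
    # After decoding, "cache=/cn.js" is read as a sub-directory "cache="
    # containing "cn.js&301-2", which was never created.
    wanted_path = os.path.join(w_dir, decoded_target(SAVED_NAME))
    return os.path.exists(saved_path), os.path.exists(wanted_path)


if __name__ == "__main__":
    print(decoded_target(SAVED_NAME))
    with tempfile.TemporaryDirectory() as root:
        print(check(root))  # (True, False)
```

The decoded name matches the path in the IOError, and the only thing that changed is `%2F` turning into a directory separator.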

Any thoughts?
05-26-2011, 09:58 AM   #2
user_none
Sigil & calibre developer
user_none ought to be getting tired of karma fortunes by now.
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Originally Posted by TheLazy1
I believe this is caused by wget's handling of escaped characters.
Actual filename: index.php?title=Special:BannerController&cache=%2Fcn.js&301-2
What ebook-convert is asking for: index.php?title=Special:BannerController&cache=/cn.js&301-2
Any thoughts?
That's the issue. The file referenced in the document doesn't exist because it was named differently when saved.
05-26-2011, 10:40 AM   #3
TheLazy1
Junior Member
TheLazy1 began at the beginning.
Posts: 5
Karma: 10
Join Date: May 2011
Device: Kindle 3 Wifi
Is there a workaround, maybe?
Or an option that keeps calibre from bailing when a referenced file cannot be found?
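One possible workaround, sketched in Python and not tested against calibre itself: since the converter decodes `%2F` to `/` before opening the file, give it a file at the decoded path by copying every percent-escaped file in the wget download tree to its decoded name. The function `add_decoded_copies` and the script name used below are hypothetical; they are not part of calibre or wget.

```python
import os
import shutil
import sys
from urllib.parse import unquote


def add_decoded_copies(root: str) -> list[str]:
    """For every file under `root` whose name contains percent-escapes,
    place a copy at the path those escapes decode to (so "%2F" becomes a
    sub-directory). Existing files are never overwritten.
    Returns the list of copies created."""
    root = os.path.abspath(root)
    created = []
    # Materialise the walk first so the new copies are not revisited.
    for dirpath, _dirnames, filenames in list(os.walk(root)):
        for name in filenames:
            decoded = unquote(name)
            if decoded == name:
                continue  # nothing percent-encoded in this name
            target = os.path.abspath(os.path.join(dirpath, decoded))
            # Refuse anything that would land outside the download tree.
            if os.path.commonpath([root, target]) != root:
                continue
            if os.path.exists(target):
                continue
            try:
                os.makedirs(os.path.dirname(target), exist_ok=True)
                shutil.copyfile(os.path.join(dirpath, name), target)
            except (OSError, ValueError):
                continue  # e.g. a plain file already occupies the dir name
            created.append(target)
    return created


if __name__ == "__main__":
    for path in add_decoded_copies(sys.argv[1] if len(sys.argv) > 1 else "."):
        print("created", path)
```

Saved as, say, `add_decoded_copies.py`, it would be run on the directory wget saved into (the one holding en.wikipedia.org and any other host folders created by --span-hosts) before re-running ebook-convert on Wget.html. Copying rather than renaming leaves wget's own converted links working; whether ebook-convert then gets through the rest of the page is not verified here.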



Similar Threads
Thread Thread Starter Forum Replies Last Post
How to use wget to download an online HTML book amoroso Lounge 11 04-25-2011 05:10 AM
use moibpocket to convert web pages? ignatz Kindle Formats 3 01-15-2010 11:52 PM
Best way to convert linked .html pages? VulcanRidr Calibre 1 10-04-2009 11:37 AM
html tree via wget -> epub (or other format) maynard Workshop 4 05-13-2009 06:05 PM
Scanning pages: how many dpi to convert to PDF? Ammon Workshop 4 12-28-2008 03:16 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.