MobileRead Forums > E-Book Software > Calibre > Conversion
05-25-2011, 09:40 PM   #1
TheLazy1
Device: Kindle 3 Wifi
Grabbing pages with wget and using ebook-convert

Hi.
I attempted to grab a wiki page with the following command:

Code:
wget --limit-rate=20k  --force-directories --html-extension --random-wait --adjust-extension --convert-links --page-requisites -e robots=off --user-agent=Mozilla --span-hosts  http://en.wikipedia.org/wiki/Wget
ebook-convert then bails with these errors:

Code:
1% Converting input to HTML...
InputFormatPlugin: HTML Input running
on /home/lazy/en.wikipedia.org/wiki/Wget.html
Language not specified
Creator not specified
Building file list...
Normalizing filename cases
Rewriting HTML links
Initial parse failed:
Parsing file 'Wget.html' as HTML
Forcing Wget.html into XHTML namespace
Property: No CSS priority value: u'ie'.
Property: Invalid value for "CSS Level 2.1" property: -moz-inline-box [2:14343: display]
Property: Unknown Property name. [2:14413: zoom]
CSSStyleDeclaration: Unexpected token, ignoring upto u'*display:inline'. [2:14420: *]
Property: Unknown Property name. [2:14574: word-wrap]
Property: Unknown Property name. [2:14732: word-wrap]
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: Invalid value for "CSS Level 2.1" property: -moz-persian [2:35244: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: persian [2:35273: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: -moz-arabic-indic [2:35313: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: arabic-indic [2:35347: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: -moz-bengali [2:35391: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: bengali [2:35420: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: -moz-oriya [2:35459: list-style-type]
Property: Invalid value for "CSS Level 2.1" property: oriya [2:35486: list-style-type]
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: No CSS priority value: u'ie'.
Property: Unknown Property name. [2:574: word-wrap]
Property: Unknown Property name. [2:13172: max-device-width]
Property: Unknown Property name. [2:13201: -webkit-text-size-adjust]
Property: Unknown Property name. [2:13313: filter]

Traceback (most recent call last):
  File "/usr/bin/ebook-convert", line 19, in <module>
    sys.exit(main())
  File "/usr/lib64/calibre/calibre/ebooks/conversion/cli.py", line 283, in main
    plumber.run()
  File "/usr/lib64/calibre/calibre/ebooks/conversion/plumber.py", line 920, in run
    accelerators, tdir)
  File "/usr/lib64/calibre/calibre/customize/conversion.py", line 204, in __call__
    log, accelerators)
  File "/usr/lib64/calibre/calibre/ebooks/html/input.py", line 294, in convert
    oeb = self.create_oebbook(stream.name, basedir, opts, log, mi)
  File "/usr/lib64/calibre/calibre/ebooks/html/input.py", line 375, in create_oebbook
    rewrite_links(item.data, partial(self.resource_adder, base=dpath))
  File "/usr/lib64/calibre/calibre/ebooks/oeb/base.py", line 183, in rewrite_links
    new_link = link_repl_func(link.strip())
  File "/usr/lib64/calibre/calibre/ebooks/html/input.py", line 470, in resource_adder
    item.data
  File "/usr/lib64/calibre/calibre/ebooks/oeb/base.py", line 1150, in fget
    self.href))
  File "/usr/lib64/calibre/calibre/ebooks/oeb/base.py", line 471, in read
    with open(urlunquote(path), 'rb') as f:
IOError: [Errno 2] No such file or directory: u'/home/lazy/en.wikipedia.org/w/index.php?title=Special:BannerController&cache=/cn.js&301-2'
I believe this is caused by wget's handling of escaped characters.
Actual filename: index.php?title=Special:BannerController&cache=%2F cn.js&301-2
What ebook-convert is asking for: index.php?title=Special:BannerController&cache=/cn.js&301-2

Any thoughts?

05-26-2011, 09:58 AM   #2
user_none
Sigil & calibre developer
Quote:
Originally Posted by TheLazy1
I believe this is caused by wget's handling of escaped characters.
Actual filename: index.php?title=Special:BannerController&cache=%2F cn.js&301-2
What ebook-convert is asking for: index.php?title=Special:BannerController&cache=/cn.js&301-2
Any thoughts?
That's the issue. The file referenced in the document doesn't exist because it was named differently when saved.
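
A minimal sketch of how the mismatch could be checked, not anything calibre itself provides. The traceback shows calibre opening `urlunquote(path)`, so the sketch assumes the same kind of percent-decoding and uses Python's standard `urllib.parse.unquote` as a stand-in. The function name `find_escaped_files` is hypothetical.

```python
import os
from urllib.parse import unquote


def find_escaped_files(root):
    """Return (saved_path, decoded_path) pairs for files under `root`
    whose names change when percent-escapes such as %2F are decoded."""
    mismatches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            decoded = unquote(name)
            if decoded == name:
                continue  # no escapes in this name, nothing to report
            saved = os.path.join(dirpath, name)
            # A decoded "/" turns part of the file name into a directory,
            # so the decoded path usually points at something that was
            # never written to disk.
            wanted = os.path.normpath(dirpath + os.sep + decoded)
            mismatches.append((saved, wanted))
    return mismatches
```

Running it over the directory wget created and checking each second element with `os.path.exists` should show which references are likely to fail during conversion.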

05-26-2011, 10:40 AM   #3
TheLazy1
Is there a workaround?
Or perhaps an option so that Calibre doesn't bail when a referenced file cannot be found?
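
One possible workaround, sketched here under assumptions and not confirmed anywhere in this thread: before running ebook-convert, copy every file whose saved name contains percent-escapes to the decoded path the converter appears to ask for. The helper name `mirror_decoded_names` is hypothetical, the decoding again uses `urllib.parse.unquote` as a stand-in for calibre's `urlunquote`, and whether the conversion then succeeds is untested.

```python
import os
import shutil
from urllib.parse import unquote


def mirror_decoded_names(root):
    """Copy files with percent-escaped names to their decoded paths.

    Returns the list of paths created. Existing files are never
    overwritten, and decoded paths that would leave `root` are skipped.
    """
    root = os.path.abspath(root)
    # Collect candidates first so copies made below are not re-visited.
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if unquote(name) != name:
                candidates.append((dirpath, name))

    created = []
    for dirpath, name in candidates:
        target = os.path.normpath(dirpath + os.sep + unquote(name))
        # Guard against escapes such as %2E%2E%2F climbing out of the tree.
        if os.path.commonpath([root, target]) != root:
            continue
        if os.path.exists(target):
            continue
        try:
            # A decoded "/" means intermediate directories are needed.
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copyfile(os.path.join(dirpath, name), target)
        except (OSError, ValueError):
            continue  # e.g. a path component already exists as a file
        created.append(target)
    return created
```

Copying instead of renaming leaves the original download untouched, so the step can be undone by deleting the returned paths.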
All times are GMT -4.