05-28-2008, 10:39 PM | #46 |
Addict
Posts: 208
Karma: 1523
Join Date: Jul 2007
Location: Houston,TX
Device: PRS-T1
|
1) To be honest, I'm not 100% clear either; it has to do with how deep and how many links are followed to create the final file.
2) I see the same thing when executing web2lrf from the command line. I didn't see anything obviously unusual about the links. |
05-28-2008, 11:53 PM | #47 |
creator of calibre
Posts: 43,857
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
max-recursions controls the number of levels of recursion when downloading, and link-levels controls the number of levels of recursion when converting from HTML to LRF.
There are two options because there are actually two separate components under the hood that do the downloading and converting respectively. The Wikipedia links show up fine for me. Last edited by kovidgoyal; 05-29-2008 at 12:05 AM. |
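Editor's sketch: the idea of "levels of recursion" described in the post above can be illustrated with a toy depth-limited link walk. This is only a minimal model under stated assumptions, not calibre's or web2lrf's actual code; the `collect_pages` function and the `toy_site` link map are hypothetical names invented for the illustration.

```python
from collections import deque


def collect_pages(links, start, max_depth):
    """Return pages reachable from `start` by following at most `max_depth` links.

    `links` maps a page name to the pages it links to (a toy stand-in for a website).
    """
    seen = {start}
    queue = deque([(start, 0)])
    order = []
    while queue:
        page, depth = queue.popleft()
        order.append(page)
        if depth == max_depth:
            # Recursion limit reached: keep the page, but do not follow its links.
            continue
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical site layout, used only for illustration.
toy_site = {
    "index": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["e"],
}

for depth in range(3):
    print(depth, collect_pages(toy_site, "index", depth))
```

Even in this small model the number of collected pages grows with each extra level, which may be why a setting of 1 instead of 0 can produce a much larger output file.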
05-29-2008, 12:39 AM | #48 |
creator of calibre
Posts: 43,857
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|
05-29-2008, 08:13 AM | #49 | ||
Addict
Posts: 223
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony 505; Kindle DX ...
|
Quote:
Unfortunately the test did not start correctly the first time, since the preferences still held the values "max recursions=1, link levels=1" that I used yesterday for testing... the result was an LRF file of more than 6 MB! Speaking of which, you said: Quote:
Last point: one option I always find useful in web scraper programs is the ability to follow only links local to the starting website. This is still not possible in web2lrf, correct? Thanks for the help, alessandro |
||
05-29-2008, 08:55 AM | #50 |
creator of calibre
Posts: 43,857
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
For links just one level away, -r 1 should do the trick. You can easily ask the scraper to follow only links of a certain type using the --match-regexp option.
|
05-30-2008, 03:08 AM | #51 | |
Addict
Posts: 223
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony 505; Kindle DX ...
|
Quote:
I tried what you said and set "Max recursions=1" in the Bookit options, but I'm having trouble with regexps. From the site http://www.cityguide.travel-guides.c...rope/Lyon.html I wanted to follow all the internal links that have "72" in the address. Under Meta-data>Additional parameters I tried "--match-regexp 72", "--match-regexp=72", "--match-regexp *72*" and "--match-regexp=*72*", but none of them worked: it just saves the original page and nothing else. Any hints? Thanks... alessandro |
|
05-30-2008, 03:12 AM | #52 |
creator of calibre
Posts: 43,857
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
--match-regexp ".*72.*"
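Editor's sketch: the difference between the patterns tried in this thread can be checked with Python's `re` module. The thread does not say whether web2lrf anchors the pattern at the start of the URL, so the sketch reports both `search` (match anywhere) and `match` (anchored at the start). The URL and the `try_pattern` helper are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical URL for illustration; not the truncated address from the thread.
URL = "http://example.com/guide/72/lyon.html"


def try_pattern(pattern, url):
    """Return (search_hit, match_hit), or None if the pattern is not a valid regexp."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        return None
    # search() looks anywhere in the string; match() only at the beginning.
    return (compiled.search(url) is not None, compiled.match(url) is not None)


for pattern in ["72", "*72*", ".*72.*", ".+72.+"]:
    print(repr(pattern), try_pattern(pattern, URL))
```

A bare `*72*` is shell glob syntax rather than a regular expression, so it fails to compile at all, while `72` only succeeds when the tool searches anywhere in the URL instead of matching from its start.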
|
05-30-2008, 04:11 AM | #53 |
Addict
Posts: 223
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony 505; Kindle DX ...
|
Thanks, but no, it is still not working.
The executed line is:
/usr/bin/python /usr/bin/web2lrf -u http://www.cityguide.travel-guides.c...rope/Lyon.html -o /usr/src/bookit/Lyon City Guide _ Lyon City Break.lrf -t Lyon City Guide | Lyon City Break -a Bookit -r 1 --link-levels=0 --left-margin=0 --right-margin=0 --top-margin=0 --bottom-margin=0 --match-regexp .*72.* default
and it still does not follow any links (all those visible on the right of the page). alessandro |
05-30-2008, 05:00 AM | #54 |
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
|
I think you need to escape the asterisks on the command line.
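Editor's sketch: one way to see which arguments need quoting or escaping is to let Python's `shlex` module build the command line. This assumes the command passes through a POSIX shell, which the thread does not confirm for Bookit. The argument values are modeled on the command quoted above, and `shlex.join` requires Python 3.8 or later.

```python
import shlex

# Argument list modeled on the command quoted in the thread (values are illustrative).
args = [
    "web2lrf",
    "-t", "Lyon City Guide | Lyon City Break",
    "-r", "1",
    "--match-regexp", ".*72.*",
]

# shlex.join quotes every argument containing characters the shell treats specially,
# such as spaces, the pipe, and the asterisk.
command_line = shlex.join(args)
print(command_line)

# Splitting the quoted line the way a POSIX shell would gives back the same arguments.
assert shlex.split(command_line) == args
```

If the program is launched from Python, passing the list directly to `subprocess.run(args)` avoids the shell, and with it the globbing problem, entirely.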
|
05-30-2008, 11:53 AM | #55 |
creator of calibre
Posts: 43,857
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Try .+72.+ to avoid the asterisk problem.
|
01-03-2009, 11:22 AM | #56 |
Connoisseur
Posts: 73
Karma: 120
Join Date: Apr 2008
Device: Sony Reader
|
Beowulf, thanks again for Bookit! I'm trying it in the newest Firefox, Beta 3.1b2, and it looks as though something has changed that is disallowing the plugin.
BTW, Beta 3.1b2 is very cool...it allows mousepad gestures for navigation on Macbooks! Dane |
|