MobileRead Forums > E-Book Readers > Sony Reader

05-28-2008, 10:39 PM   #46
beowulf573
Addict
Posts: 208
Karma: 1523
Join Date: Jul 2007
Location: Houston,TX
Device: PRS-T1
1) To be honest, I'm not 100% clear either. It has to do with how deep the link-following goes and how many links are followed to create the final file.

2) I see the same thing when executing web2lrf from the command line. I didn't see anything obviously unusual about the links.

05-28-2008, 11:53 PM   #47
kovidgoyal
creator of calibre
Posts: 25,246
Karma: 4961457
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
max-recursions controls the number of levels of recursion when downloading, and link-levels controls the number of levels of recursion when converting from HTML to LRF.

There are two options because there are actually two separate components under the hood, one that does the downloading and one that does the converting.
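To make the distinction concrete, here is a minimal Python sketch of a depth-limited download, the general idea behind a max-recursions style limit. It is not web2lrf's actual code: the `LINKS` table, the page names, and `pages_within` are hypothetical, and the sketch assumes depth 0 means "the starting page only" while each extra level follows one more hop of links.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
LINKS = {
    "index.html": ["a.html", "b.html"],
    "a.html": ["c.html"],
    "b.html": ["c.html", "d.html"],
    "c.html": ["e.html"],
    "d.html": [],
    "e.html": [],
}

def pages_within(start, max_depth, links=LINKS):
    """Breadth-first walk returning pages reachable in at most max_depth hops."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order

for depth in range(3):
    print(depth, pages_within("index.html", depth))
```

Under these assumptions a limit of 1 yields the start page plus every page it links to directly, and each extra level adds all pages linked from the previous level, so the output can grow quickly.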

The Wikipedia links show up fine for me.
Attached Files
File Type: lrf Default Profile [Wed 28 May 2008].lrf (123.4 KB)

Last edited by kovidgoyal; 05-29-2008 at 12:05 AM.

05-29-2008, 12:39 AM   #48
kovidgoyal
creator of calibre
Posts: 25,246
Karma: 4961457
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by beowulf573
2) I see the same thing when executing web2lrf from the command line. I didn't see anything obviously unusual about the links.
Actually, there was a regression introduced in 0.4.61 that caused this. It is fixed in 0.4.63.

05-29-2008, 08:13 AM   #49
alexxxm
Addict
Posts: 205
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony 505; Kindle DX ...
Quote:
Originally Posted by kovidgoyal
Actually, there was a regression introduced in 0.4.61 causing this. Fixed in 0.4.63
Perfect, I just checked and the link text now appears, as it should.

Unfortunately, the test did not go right the first time, because my preferences still had the values "max recursions=1, link levels=1" that I used yesterday for testing... the result was an LRF file over 6 MB in size!

Speaking of which, you said:
Quote:
Originally Posted by kovidgoyal
max-recursions controls the number of levels of recursion when downloading and link-levels controls the number of levels of recursion when converting from HTML to LRF
I'm still not sure I understand it, so let's try a typical test case: if I wanted to get an HTML page plus all the pages just one link away and no further, which values should I set?

Last point: one option I always find useful in web scraper programs is the ability to make them follow only links local to the starting website. This is still not possible in web2lrf, correct?


Thanks for the help

alessandro

05-29-2008, 08:55 AM   #50
kovidgoyal
creator of calibre
Posts: 25,246
Karma: 4961457
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
For links just one level away, -r 1 should do the trick. You can easily make the scraper follow only links of a certain type using the --match-regexp option.
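A minimal Python sketch of regexp-based link filtering, assuming each link is first resolved to an absolute URL and then tested with Python's `re.match` (an assumption: the thread does not describe web2lrf's internals). It also shows one way to approximate the "local links only" request from the previous post: a pattern anchored to the starting page's host. The host `www.example.com`, the sample links, and `follow_links` are hypothetical.

```python
import re
from urllib.parse import urljoin, urlparse

def follow_links(base_url, hrefs, pattern):
    """Hypothetical helper: resolve hrefs against base_url, keep URLs the pattern matches."""
    regex = re.compile(pattern)
    kept = []
    for href in hrefs:
        url = urljoin(base_url, href)  # relative links become absolute URLs
        if regex.match(url):           # re.match anchors at the start of the string
            kept.append(url)
    return kept

base = "http://www.example.com/europe/Lyon.html"  # hypothetical starting page
hrefs = ["hotels72.html", "/asia/Tokyo.html", "http://ads.example.net/banner72"]

# Escape the host so its dots are literal, then require it right after the scheme.
same_host = re.escape(urlparse(base).netloc)

# Only links on the same host as the starting page:
print(follow_links(base, hrefs, r"https?://" + same_host + r"/"))
# Same host AND "72" somewhere in the path:
print(follow_links(base, hrefs, r"https?://" + same_host + r"/.*72"))
```

The off-site `banner72` link is rejected by both patterns even though it contains "72", because the host is pinned at the start of the pattern.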

05-30-2008, 03:08 AM   #51
alexxxm
Addict
Posts: 205
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony 505; Kindle DX ...
Quote:
Originally Posted by kovidgoyal
For links just one level away -r 1 should do the trick. You can easily ask the scraper to follow only links of a certain type using the --match-regexp option
I'm still asking here even though I just discovered the other thread on "content"; I'll move there once this is cleared up:

I'm trying what you said and put "Max recursions=1" in the Bookit options, but I'm having trouble with regexps.
On the site http://www.cityguide.travel-guides.c...rope/Lyon.html I wanted to follow all the internal links that have "72" in the address.

I tried each of the following in Meta-data > Additional parameters:
"--match-regexp 72", "--match-regexp=72", "--match-regexp *72*", and "--match-regexp=*72*", but none of them worked: it just saves the original page and nothing else.

Any hints?

Thanks...

alessandro

05-30-2008, 03:12 AM   #52
kovidgoyal
creator of calibre
Posts: 25,246
Karma: 4961457
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
--match-regexp ".*72.*"
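If web2lrf tests links with Python's `re.match`, which only succeeds when the pattern matches at the very start of the string (an assumption; the thread does not say which function is used), a bare `72` would never match a URL that begins with `http`, which could explain why `.*72.*` is suggested. The shell-style `*72*` is not a valid regular expression at all. A quick check against a made-up URL:

```python
import re

url = "http://www.example.com/europe/lyon-72.html"  # hypothetical URL

print(re.match("72", url))            # None: re.match only tries position 0
print(bool(re.match(".*72.*", url)))  # True: ".*" consumes the "http://..." prefix
print(bool(re.search("72", url)))     # True: re.search scans the whole string

try:
    re.compile("*72*")  # glob syntax: a leading "*" has nothing to repeat
except re.error as exc:
    print("invalid pattern:", exc)
```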

05-30-2008, 04:11 AM   #53
alexxxm
Addict
Posts: 205
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony 505; Kindle DX ...
Quote:
Originally Posted by kovidgoyal
--match-regexp ".*72.*"
Thanks, but no, it's still not working.
The executed line is:

/usr/bin/python /usr/bin/web2lrf -u http://www.cityguide.travel-guides.c...rope/Lyon.html -o /usr/src/bookit/Lyon City Guide _ Lyon City Break.lrf -t Lyon City Guide | Lyon City Break -a Bookit -r 1 --link-levels=0 --left-margin=0 --right-margin=0 --top-margin=0 --bottom-margin=0 --match-regexp .*72.* default

and it still does not follow any of the links (all those visible on the right side of the page).

alessandro

05-30-2008, 05:00 AM   #54
igorsk
Wizard
Posts: 3,443
Karma: 52235
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
I think you need to escape the asterisks on the command line.

05-30-2008, 11:53 AM   #55
kovidgoyal
creator of calibre
Posts: 25,246
Karma: 4961457
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Try .+72.+ instead, which avoids the asterisk problem.
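When a command is typed into a shell, an unquoted `.*72.*` is a filename glob: the shell may replace it with matching dot-files in the current directory, and if nothing matches, bash passes it through unchanged while zsh reports an error by default. Whether this affects Bookit depends on how it launches web2lrf, which the thread does not say. The sketch below uses Python's `fnmatch` and `shlex` to show why quoting matters and why `.+72.+` needs none; the URL and output file name are placeholders, and only options that appear in this thread are used.

```python
import fnmatch
import shlex

# Unquoted, .*72.* is a glob pattern: it matches dot-files containing "72".
print(fnmatch.fnmatch(".cache72.tmp", ".*72.*"))  # True

# shlex.quote adds quotes only when the shell would otherwise interpret the text.
print(shlex.quote(".*72.*"))  # '.*72.*'
print(shlex.quote(".+72.+"))  # .+72.+  ("+" and "." are not shell metacharacters)

# Building the command as a list avoids hand-quoting mistakes.
# Placeholder URL and file name; options are those used in this thread.
argv = [
    "web2lrf",
    "-u", "http://www.example.com/europe/Lyon.html",
    "-o", "Lyon City Guide.lrf",
    "-t", "Lyon City Guide | Lyon City Break",
    "-r", "1",
    "--match-regexp", ".*72.*",
]
print(shlex.join(argv))  # shell-safe rendering of the whole command line
```

Passing such a list to `subprocess.run` without `shell=True` hands each element to the program unchanged, so no quoting is needed in that case; the title containing `|` is another string that would have to be quoted if typed into a shell.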

01-03-2009, 11:22 AM   #56
dsuden
Connoisseur
Posts: 73
Karma: 120
Join Date: Apr 2008
Device: Sony Reader
Beowulf, thanks again for Bookit! I'm trying it in the newest Firefox, Beta 3.1b2, and it looks as though something has changed that prevents the plugin from working.

BTW, Beta 3.1b2 is very cool... it allows touchpad gestures for navigation on MacBooks!

Dane

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.