Old 03-12-2008, 11:19 PM   #3
soilwork
useR!
Posts: 299
Karma: 651
Join Date: Nov 2007
Location: NY
Device: Onyx Boox Max 2, Kobo Libra H2O, iRiver Story HD
Quote:
Originally Posted by kovidgoyal View Post
You should be able to use recursion and --match-regexps with web2lrf to follow the links and convert the entire thread.
Hi, Kovid

First of all, thanks for your excellent program. It makes using the Sony Reader better than I ever expected.

BTW, I tried the method, but I could not get a satisfactory result.
For example, starting from this link
https://www.mobileread.com/forums/pri...?t=19142&pp=40

I would like to include only the following links in addition to the original link.
<a class="smallfont" href="printthread.php?t=19142&amp;pp=40&amp;page=2 " title="Show results 41 to 80 of 193">2</a>
<a class="smallfont" href="printthread.php?t=19142&amp;pp=40&amp;page=3 " title="Show results 81 to 120 of 193">3</a>
<a class="smallfont" href="printthread.php?t=19142&amp;pp=40&amp;page=4 " title="Show results 121 to 160 of 193">4</a>
<a class="smallfont" href="printthread.php?t=19142&amp;pp=40&amp;page=5 " title="Show results 161 to 193 of 193">5</a>
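To make the selection above concrete, here is a minimal Python sketch of a pattern that accepts only the numbered page links. It is an illustration, not web2lrf's own code: the names PAGE_LINK and is_page_link are made up for this example, and it assumes the href is HTML-unescaped and stripped of the trailing space before matching.

```python
import html
import re

# Hypothetical pattern: only the paged variants of thread 19142 (page=2, 3, ...).
PAGE_LINK = re.compile(r"printthread\.php\?t=19142&pp=40&page=\d+$")

def is_page_link(href: str) -> bool:
    """Return True if a raw href attribute points at one of the numbered pages."""
    # Decode &amp; to & and drop the trailing space seen in the page source.
    url = html.unescape(href).strip()
    return PAGE_LINK.search(url) is not None

hrefs = [
    "printthread.php?t=19142&amp;pp=40&amp;page=2 ",
    "printthread.php?t=19142&amp;pp=40&amp;page=5 ",
    "printthread.php?t=19142&amp;pp=40",  # the self-referencing link
]

# Keeps the two paged links and drops the self-referencing one.
selected = [h for h in hrefs if is_page_link(h)]
```

Whether web2lrf sees the links in this decoded form is an assumption; the sketch only shows that a pattern ending in page=\d+ separates the two kinds of link.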

However, the following self-referencing link always appears in the printable version, so the first page ends up being included twice in the resulting LRF.
<a href="printthread.php?t=19142&amp;pp=40">Show 40 post(s) from this thread on one page</a>
Is there a way to include this page only once in the LRF?

I tried this,
Code:
web2lrf -u "https://www.mobileread.com/forums/printthread.php?t=19142&pp=40" default -r 1 -t "Reading" -a "Mobileread" --link-levels=1 --ignore-tables --match-regexp="printthread"
and this
Code:
web2lrf -u "https://www.mobileread.com/forums/printthread.php?t=19142&pp=40" default -r 1 -t "Reading" -a "Mobileread" --link-levels=1 --ignore-tables --match-regexp="printthread" --link-exclude="printthread.php?t=19142&amp;pp=40$"
However, both commands give me identical results. I would appreciate any pointers on how to improve this.
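One possible reason the exclude pattern has no effect can be checked with Python's re module. This is a sketch under two assumptions that I have not verified against web2lrf itself: that the pattern is applied as a regular expression search, and that it is tested against the resolved URL (with & rather than &amp;). In the pattern as typed, "." and "?" are regex metacharacters, so "php?" means "ph" followed by an optional "p" rather than a literal "php?".

```python
import re

url = "https://www.mobileread.com/forums/printthread.php?t=19142&pp=40"
page2 = url + "&page=2"

# Pattern as typed on the command line: unescaped "." and "?", HTML entity "&amp;".
as_typed = r"printthread.php?t=19142&amp;pp=40$"

# Hypothetical corrected pattern: metacharacters escaped, plain "&".
escaped = r"printthread\.php\?t=19142&pp=40$"

# The typed pattern never matches the self-referencing URL.
typed_hits_self = re.search(as_typed, url) is not None

# The escaped pattern matches the self-referencing URL...
escaped_hits_self = re.search(escaped, url) is not None

# ...but not the paged links, because "$" anchors the match at "pp=40".
escaped_hits_page2 = re.search(escaped, page2) is not None

print(typed_hits_self, escaped_hits_self, escaped_hits_page2)
```

If those assumptions hold, the escaped form would exclude only the self-referencing link while leaving the page=2 to page=5 links alone. Note that the start URL is the same address, so whether excluding it also affects the first page is something I cannot tell from this check.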