View Full Version : More than 40 post(s) in one printable version?


soilwork
03-12-2008, 03:12 PM
Hello,

I wonder whether Mobileread can introduce an option to show one thread in ONE long printable version. Currently, the maximum number of posts displayed in one printable page is limited to 40. If the thread has more than 40 posts, we will have multiple printable pages and it is harder to convert the entire thread into an ebook format.

For example, when I find an interesting thread, I switch to its printable version (Thread Tools - Show Printable Version) and select 'Show 40 post(s) from this thread on one page'. Then I convert the page to LRF using web2lrf. However, this becomes a bit more difficult when a thread has more than 40 posts. For instance, we need 5 printable pages to display all the posts in the following thread.
http://www.mobileread.com/forums/printthread.php?t=19142&pp=40
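As a rough check of that page count, here is a minimal Python sketch. It assumes the thread has 193 posts (the total reported by the thread's own page links) and that later pages use the `&page=N` pattern those links show. The helper name `printable_page_urls` is made up for illustration and is not part of web2lrf or the forum software.

```python
import math

# Hypothetical helper, for illustration only: list the printable-page URLs
# for a thread, assuming pages 2..N are addressed with "&page=N".
def printable_page_urls(thread_id, total_posts, per_page=40):
    base = "http://www.mobileread.com/forums/printthread.php"
    pages = math.ceil(total_posts / per_page)  # posts are split into chunks of per_page
    first = f"{base}?t={thread_id}&pp={per_page}"
    rest = [f"{first}&page={n}" for n in range(2, pages + 1)]
    return [first] + rest

urls = printable_page_urls(19142, 193)
for url in urls:
    print(url)
```

With 193 posts and 40 posts per page, the list contains five URLs, which matches the five printable pages mentioned above.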

Since this site has many ebook users, I guess some of them use the printable pages to make an ebook file rather than to print on paper. If that is the case, I think it would be beneficial to have an option to display all the posts in one printable page.

If it is technically difficult, it is fine. However, if it can be implemented without much problem, such an option can be really helpful.

kovidgoyal
03-12-2008, 04:08 PM
You should be able to use recursion and --match-regexps with web2lrf to follow the links and convert the entire thread.

soilwork
03-13-2008, 12:19 AM
Hi, Kovid

First of all, thanks for your excellent program. It makes using Sony Reader better than I ever expected.

BTW, I tried the method but I could not get a satisfactory result.
For example, starting from this link
http://www.mobileread.com/forums/printthread.php?t=19142&pp=40

I would like to include only the following links in addition to the original link.
<a class="smallfont" href="printthread.php?t=19142&amp;pp=40&amp;page=2" title="Show results 41 to 80 of 193">2</a>
<a class="smallfont" href="printthread.php?t=19142&amp;pp=40&amp;page=3" title="Show results 81 to 120 of 193">3</a>
<a class="smallfont" href="printthread.php?t=19142&amp;pp=40&amp;page=4" title="Show results 121 to 160 of 193">4</a>
<a class="smallfont" href="printthread.php?t=19142&amp;pp=40&amp;page=5" title="Show results 161 to 193 of 193">5</a>

However, the following self-referencing link is always included in the printable page, so the first page ends up being included twice in the resulting LRF.
<a href="printthread.php?t=19142&amp;pp=40">Show 40 post(s) from this thread on one page</a>
Is there a way to include this link only once in LRF?

I tried this,
web2lrf -u "http://www.mobileread.com/forums/printthread.php?t=19142&pp=40" default -r 1 -t "Reading" -a "Mobileread" --link-levels=1 --ignore-tables --match-regexp="printthread"

and this
web2lrf -u "http://www.mobileread.com/forums/printthread.php?t=19142&pp=40" default -r 1 -t "Reading" -a "Mobileread" --link-levels=1 --ignore-tables --match-regexp="printthread" --link-exclude="printthread.php?t=19142&amp;pp=40$"

However, both of them give me the identical result. I would appreciate any pointer to improve this.

kovidgoyal
03-13-2008, 12:27 AM
--match-regexp printthread\S+page=\d+
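To see what this pattern selects, here is a small Python sketch that applies it to the five links found on the first printable page. It assumes the pattern is used with Python `re` search semantics, which may not be exactly how web2lrf applies it internally. The variable names are illustrative only.

```python
import re

# The pattern suggested above, compiled with Python's re module.
PAGE_LINK = re.compile(r"printthread\S+page=\d+")

# The links on the first printable page of thread 19142.
hrefs = [
    "printthread.php?t=19142&pp=40&page=2",
    "printthread.php?t=19142&pp=40&page=3",
    "printthread.php?t=19142&pp=40&page=4",
    "printthread.php?t=19142&pp=40&page=5",
    "printthread.php?t=19142&pp=40",  # self-referencing link, no page= part
]

# Keep only the links that contain "page=<number>" after "printthread".
matching = [h for h in hrefs if PAGE_LINK.search(h)]
for h in matching:
    print(h)
```

Under that assumption, the pattern keeps the four numbered page links and skips the self-referencing link, because that link has no `page=` parameter.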

soilwork
03-13-2008, 11:24 PM
Hi, Kovid,

Thanks for the tip. However, with the given regular expression, I got the following error message.

==============
D:\My_Documents\Download - Files\00>web2lrf -u "http://www.mobileread.com/forums/printthread.php?t=19142&pp=40" default -r 1 -t "Reading" -a "Mobileread" --link-levels=1 --ignore-tables --match-regexp printthread\S+page=\d+
Downloading
.
http://www.mobileread.com/forums/printthread.php?t=19142&pp=40 saved to
Traceback (most recent call last):
File "convert_from.py", line 194, in <module>
File "convert_from.py", line 188, in main
File "convert_from.py", line 165, in process_profile
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: ''
===================

I am not familiar with HTML, but the problem seems to occur because one of the links on the original page is identical to the URL of the page itself.
Using the example above, let me denote the original URL as A and the links found on that page as B~F.

A. original URL: http://www.mobileread.com/forums/printthread.php?t=19142&pp=40
B. href="printthread.php?t=19142&amp;pp=40&amp;page=2"
C. href="printthread.php?t=19142&amp;pp=40&amp;page=3"
D. href="printthread.php?t=19142&amp;pp=40&amp;page=4"
E. href="printthread.php?t=19142&amp;pp=40&amp;page=5"
F. href="printthread.php?t=19142&amp;pp=40"

The problem is that F is identical to A. The regular expression seems to exclude both A and F, which leads to the error message.
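The claim that F is the same page as A can be checked with a short Python sketch that resolves each relative href against the original URL and skips anything already seen. This is only an illustration of the duplicate, not a description of how web2lrf follows links. The names `resolve`, `seen`, and `unique` are made up for this example.

```python
from html import unescape
from urllib.parse import urljoin

# A: the original URL passed to web2lrf.
BASE = "http://www.mobileread.com/forums/printthread.php?t=19142&pp=40"

# B to F: hrefs as they appear in the page source (with &amp; entities).
raw_hrefs = [
    "printthread.php?t=19142&amp;pp=40&amp;page=2",
    "printthread.php?t=19142&amp;pp=40&amp;page=3",
    "printthread.php?t=19142&amp;pp=40&amp;page=4",
    "printthread.php?t=19142&amp;pp=40&amp;page=5",
    "printthread.php?t=19142&amp;pp=40",
]

def resolve(href, base=BASE):
    """Decode HTML entities, then turn the relative href into an absolute URL."""
    return urljoin(base, unescape(href))

seen = {BASE}   # the start page counts as already fetched
unique = []     # pages that still need to be fetched, in order
for href in raw_hrefs:
    url = resolve(href)
    if url not in seen:
        seen.add(url)
        unique.append(url)

print(resolve(raw_hrefs[-1]) == BASE)  # does F resolve to A?
print(len(unique))                     # how many distinct follow-up pages remain
```

Once the entities are decoded and the href is resolved, F and A are the same string, so a crawler that tracks visited URLs would fetch A only once and follow just B to E.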

For now, I decided to use the following command.
web2lrf -u "http://www.mobileread.com/forums/printthread.php?t=19142&pp=40" default -r 1 -t "Reading" -a "Mobileread" --link-levels=1 --ignore-tables --match-regexp="printthread"
It gives me A-B-C-D-E-F(=A) rather than A-B-C-D-E, but I can read up to E and stop there. Since the file is an electronic one, there is no wasted paper anyway. :)

Again, thanks for your help and for providing such a wonderful program to users.

Alexander Turcic
03-14-2008, 10:31 AM
Hi soilwork,

The limit could be raised, but we still need to define a limit for performance reasons. So even if we raised the limit to a maximum of 80 posts per page, you'd still need to browse to the next page with larger threads.

On the positive side, once our mobile edition is available, it'll be a lot easier to define regular expressions and parse content through mobile devices (including e-readers).

soilwork
03-14-2008, 07:41 PM
Thank you very much for your detailed explanation. Since I now have a reasonably satisfactory solution, as I posted above, I will use it until the mobile edition comes out. :)

soilwork
03-19-2008, 04:22 AM
Hi, Alexander,

This is a slightly revised idea to address the same problem.

I wonder whether it is possible to remove the link 'Show 40 post(s) from this thread on one page' when the printable version is already showing 40 posts per page. For example,
http://www.mobileread.com/forums/printthread.php?t=22004
has a link to show 40 posts per page. After I click it, the URL becomes
http://www.mobileread.com/forums/printthread.php?t=22004&pp=40
However, this page still has the same link 'Show 40 post(s) from this thread on one page'.

If it is possible to remove this link, the problem I described above will disappear. Then I can use the web2lrf command to convert the printable pages to LRF without any duplicate page.