Rss2Book - Page 10

geekraver · 02-28-2007, 03:43 AM

Sorry, this is a known bug that will be fixed in the next release. For the time being you can replace the writehtmldoc.dll with the one attached to this message.

Fubrite · 02-28-2007, 12:35 PM

Geekraver,

Thanks for the response. I tried that (I renamed the old file to writeHtmlDoc.dll.old and copied the new file into the folder), and the option for HtmlDoc PDF disappeared from the options list completely.....

geekraver · 03-01-2007, 12:12 PM

Aargh - sorry, you're right, I made a small change to the plugin interface which broke compatibility.

Anyway, I have a proper fix in; I will release an update tonight with this fix and the new XWord/Crossword Compiler plugin.

Fubrite · 03-01-2007, 12:26 PM

Thanks for your efforts, geekraver!

I'm looking forward to the new release later on tonight!

geekraver · 03-02-2007, 02:42 AM

Okay, rel 22 is up (if all is well, your rel 21 should prompt you for an update automatically when you start it up).

I have added the new XWord plugin. However, it doesn't work with LRF output (not that much does with the existing Sony DLLs), nor does it seem to work with htmldoc. As far as I can tell this is a bug with htmldoc (at least the version I have); htmldoc seems to break on images that have local file:// URLs (and in fact if I invoke htmldoc with the --no-localfiles option, which is supposed to make it reject such URLs, I just get a usage error, so htmldoc seems to have multiple issues here). The plugin will work with the built-in PDF and RTF converters, although you lose the two-column layout of clues (until such time as I finally implement table support in these plugins).

Note that the URLs for sites that have crosswords generated by Crossword Compiler must have Grid.class applet elements or they won't work (confirm this by viewing the web page source from your browser). I've published one source so far that you can look at as an example; the URL is http://www.sundaytimes.co.za//Entert...crossword.html

LEE YONG HOON · 03-12-2007, 04:00 AM

Can't I use Rss2Book with Korean or Japanese?? T-T (That is, support for Extended Unix Code (EUC), the 8-bit character encoding used primarily for Korean and Japanese.)
Please... develop Rss2Book support for Korean or Japanese.

drgnbear · 03-15-2007, 06:47 PM

This has to be one of the coolest tools I have found. Are there any writers out there writing ebook serials? It seems like it would be fun to set up a community based site doing just that. Publish it to RSS or something.

Hadrien · 03-15-2007, 07:07 PM

Quote:

Originally Posted by drgnbear

This has to be one of the coolest tools I have found. Are there any writers out there writing ebook serials? It seems like it would be fun to set up a community based site doing just that. Publish it to RSS or something.

Ebook serials would be nice, yes. Maybe I could do something like this for Feedbooks too? We already provide an easy way for authors to publish their works; I could add an RSS feed for each author as well.
It could work quite well: write the text directly inside the RSS feed, then generate the whole thing with tools such as rss2book/web2book or what we're currently working on in our news section.
The text version of a podcast... although podcasts work with embedded files in the RSS feed instead of providing the content inside the RSS itself.

adinb · 03-16-2007, 02:20 AM

Has anyone else had problems getting the current version of web2book (v23, I believe) to "Apply extractor to linked content instead of link text"?

Here's the deets:

Code:

     URL: http://www.abqtrib.com/feeds/headlines/
     Link Element: link
     (apply extractor to linked content)
     Link Reformatter: {0}?printer=1/

So, I'm just appending "?printer=1/" to the original link found in the link element to try and make it go to the printer friendly page. Even though the log shows the link formatter coming up with the correct "printer friendly" links, the pdf output is the linked page. (example: the content of http://abqtrib.com/news/2007/mar/15/...th-dwi-charge/ is ending up in the pdf instead of http://abqtrib.com/news/2007/mar/15/...ge/?printer=1/)

This is all using the test function, so I haven't *absolutely* verified what will be put on my reader. But this really looks like it's not following the reformatted link. If there's a different preferred way of doing this (maybe something with the link extractor pattern?) I'd love to hear it. (I can probably extract text using the content reformatter, but then I miss small graphics accompanying the stories in print mode)
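For reference, the behaviour being expected here can be sketched in a few lines of Python. This is only an illustration of `{0}`-style substitution, not web2book's actual code: the function name `reformat_link` and the example.com URL are made up for the example (the real article URLs above are truncated, so they are not reused).

```python
def reformat_link(template: str, *groups: str) -> str:
    """Fill the {0}, {1}, ... placeholders in the template with the given values."""
    return template.format(*groups)


# Hypothetical link, standing in for the value of the RSS <link> element.
rss_link = "http://example.com/news/2007/mar/15/some-story/"

# Link Reformatter from the post: append the printer-friendly suffix.
print_link = reformat_link("{0}?printer=1/", rss_link)
print(print_link)

# The expectation in the post is that content is then fetched from
# print_link, not from rss_link.
```

If the tool fetches `rss_link` after computing `print_link`, you get exactly the symptom described: the right URL in the log and the wrong page in the PDF.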

Log output (abbreviated):

Quote:

Processing Albuquerque Tribune Today
Got link from RSS: http://abqtrib.com/news/2007/mar/15/...th-dwi-charge/
Thu, 15 Mar 2007 22:05:00 -0000 is in range

Done link extraction{0} = http://abqtrib.com/news/2007/mar/15/...th-dwi-charge/
Reformatted link is http://abqtrib.com/news/2007/mar/15/...ge/?printer=1/

HTML of "normal" page follows (vice printer friendly page)

EDIT: Same problem reproduced multiple times, e.g. on The Reg.

Also, there seems to be some problem using the "link" element on RSS 0.91 and Atom feeds.

EDIT2: There also seems to be something funky going on when evaluating regexes with a logical "OR", as in the ( this | that) form.
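The post doesn't say what exactly goes wrong with alternation, so the following is only a guess at two common pitfalls, sketched in Python (whose syntax for these particular patterns matches .NET's under default options). The sample strings are invented for the illustration.

```python
import re

text = "read this, not that"

# Pitfall 1: with default options, spaces inside the group are literal,
# so "( this | that)" means " this " (with both spaces) or " that".
spaced = re.findall(r"( this | that)", text)
tight = re.findall(r"(this|that)", text)
print(spaced)  # only " that" matches; "this" is followed by a comma
print(tight)   # both words match

# Pitfall 2: without a group, "|" splits the whole pattern in two.
links = "story=this story=that"
ungrouped = re.findall(r"story=this|that", links)
grouped = re.findall(r"story=(?:this|that)", links)
print(ungrouped)  # second match is the bare word "that"
print(grouped)    # both matches include the "story=" prefix
```

Whether either of these is what is happening in web2book is an assumption; they are just the first things worth ruling out.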

jezlyn · 03-27-2007, 12:47 PM

Hi, All. I can't seem to download the Web2book application from Geekraver's site. It's timing out. Is there a mirror for the application, or can it be uploaded directly to the forum? I'd love to try it out, considering I'm such an RSS feed junkie.

Thanks in advance for any help with this.

jezlyn · 03-27-2007, 06:21 PM

Anybody? I'd at least like to know if other people have been able to download the Web2Book app in the last couple days, so I know that the problem might just be with my network connection somehow. I've tried downloading the program at home and work and so far haven't been able to get a proper copy.

geekraver · 03-28-2007, 03:26 AM

Sorry, I've been upgrading my systems at home to Vista, and downloading lots of updated versions of software, etc., so the network has been up and down, and when it's been up it's been busy.

Anyway, you can get it from CNet now:

http://www.download.com/Web2book/300...ml?tag=lst-0-1

adinb · 03-29-2007, 03:57 AM

Now that I'm getting better with more complex .NET regexes, I can also articulate potential bug #2 a little more clearly to you:

- When "apply extractor pattern to linked content" is set, the Link Reformatter field is still using the groupings from the link element (i.e. guid, link, etc.) and not the link extractor pattern.

I'll use "The Raw Story" as an example. It's a pretty basic RSS feed with the link element = 'link'. There's a printable version of each story, but you have to follow the link element and use the link extractor pattern on the followed link. (For this example I'll say that we grabbed 'http://rawstory.com/news/2007/Colbert_invites_Rom_Emanuel_on_show_0327.html')

On the followed link, I'll apply the regex "action='(http://rawstory.com/printstory.php\?story=\d+)'>" to snag the proper url for the printable version. With this regex, I should be able to make the link reformatter just {0} since I was able to pull the entire link. (yeah, I could optimize the regex, but I like 'em a little more readable, vice using backreferences, etc)

Looking in the log, the reformatted link ends up as "http://rawstory.com/news/2007/Colbert_invites_Rom_Emanuel_on_show_0327.html" instead of "http://rawstory.com/printstory.php?story=5513".

Doing a little more testing, if I move the parens around to make the regex "action='http://rawstory.com/printstory.php\?story=(\d+)'>" (which makes {0}=5513) and set the link reformatter field to "http://rawstory.com/printstory.php?story={0}" (which should again result in "http://rawstory.com/printstory.php?story=5513"), I get the following reformatted link (copied from the log):
"http://rawstory.com/printstory.php?story=http://rawstory.com/news/2007/Colbert_invites_Rom_Emanuel_on_show_0327.html"

Which is why it initially looks like the extractor isn't being applied to linked content.

If there's just some sort of undocumented selector to force the link reformatter field to use the link extractor pattern when following the link element, I'd ***love*** to see it.
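To make the expected behaviour concrete, here is a rough Python sketch of "apply the extractor to the linked page, then feed its capture groups to the reformatter". It is not web2book's implementation: the function name `reformat_from_linked_page` and the one-line `page_html` sample are assumptions made for the illustration. The two regexes and templates are the ones from the post.

```python
import re
from typing import Optional


def reformat_from_linked_page(page_html: str, extractor: str, template: str) -> Optional[str]:
    """Run the extractor on the fetched page and fill {0}, {1}, ... from its groups."""
    match = re.search(extractor, page_html)
    if match is None:
        return None  # extractor did not match the linked page
    return template.format(*match.groups())


# Hypothetical fragment of the followed page (the real markup may differ).
page_html = "<form method='post' action='http://rawstory.com/printstory.php?story=5513'>"

# Variant 1: capture the whole printable URL, reformatter is just {0}.
whole = reformat_from_linked_page(
    page_html,
    r"action='(http://rawstory.com/printstory.php\?story=\d+)'>",
    "{0}",
)

# Variant 2: capture only the story id and rebuild the URL in the reformatter.
by_id = reformat_from_linked_page(
    page_html,
    r"action='http://rawstory.com/printstory.php\?story=(\d+)'>",
    "http://rawstory.com/printstory.php?story={0}",
)

print(whole)
print(by_id)
```

Both variants should give the same printable URL. The log output quoted above is what you would get if `{0}` were filled from the RSS link element instead of from `match.groups()`.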

shawn · 03-29-2007, 04:13 AM

Can someone please give me some advice on formatting a particular page?
This is the page:
http://www.econlib.org/library/Mises/msStoc.html

I don't know how to use the regex filter to properly format it. I think it's getting stuck on one link that creates a javascript popup; if I could tell it to just ignore those links, I think it would be fine.

This is the log text

Processing

System.ApplicationException: Getting web page http://www.econlib.org/library/Mises/javascript:shownotepad('/notepad.html#top');notepadwindow.focus(); returned error Got web exception The remote server returned an error: (404) Not Found.

at web2book.Utils.GetContent(String link, String html, String linkProcessor, String contentExtractor, String contentFormatter, Int32 depth, StringBuilder log)
at web2book.Utils.ExtractContent(String contentExtractor, String contentFormatter, String url, String html, String linkProcessor, Int32 depth, StringBuilder log)
at web2book.Utils.GetContent(String link, String html, String linkProcessor, String contentExtractor, String contentFormatter, Int32 depth, StringBuilder log)
at web2book.Utils.GetHtml(String url, Int32 numberOfDays, String linkProcessor, String contentExtractor, String contentFormatter, Int32 depth, StringBuilder log)
at web2book.WebPage.GetHtml(ISource mySourceGroup, Int32 displayWidth, Int32 displayHeight, Int32 displayDepth, StringBuilder log)
at web2book.MainForm.AddSource(ContentSourceList sourceClass, ContentSource source, Boolean isAutoUpdate)
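The URL in the exception looks like a `javascript:` href that was appended to the page's base path and then fetched. One way to skip such links is a negative lookahead in the link pattern, sketched here in Python. The pattern, the `html` sample and the chapter file names are hypothetical; whether web2book's link extractor field accepts this pattern as-is is an assumption, although .NET regexes do support `(?!...)`.

```python
import re
from urllib.parse import urljoin

# Match href="..." unless the value starts with "javascript:".
LINK_PATTERN = re.compile(r'href="(?!javascript:)([^"]+)"', re.IGNORECASE)

# Hypothetical table-of-contents fragment with one popup link in the middle.
html = (
    '<a href="msStoc1.html">Chapter 1</a> '
    '<a href="javascript:shownotepad(\'/notepad.html#top\');notepadwindow.focus();">Notes</a> '
    '<a href="msStoc2.html">Chapter 2</a>'
)

base = "http://www.econlib.org/library/Mises/msStoc.html"

# Resolve the remaining relative links against the page URL.
links = [urljoin(base, href) for href in LINK_PATTERN.findall(html)]
for link in links:
    print(link)
```

Only the two chapter links survive; the popup link never reaches the fetch step.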

adinb · 03-29-2007, 05:58 PM

@Shawn:
Are you trying to capture this as a web page and are you trying to follow all the links in the TOC?

The error that you're getting usually indicates an invalid regex. Give me a few more details and I'll pop out a regex for you.
