11-02-2006, 10:38 AM | #31 |
Zealot
Posts: 126
Karma: 1352743
Join Date: Oct 2002
|
Any way I can use this now to get Slate?
http://www.slate.com/rss/ has every link in the form http://www.slate.com/id/2152452/fr/rss/, and http://www.slate.com/toolbar.aspx?ac...ead&id=2152452 is the print version of each of them. Or maybe the program could also parse HTML files and just get every link down to a particular link depth, with matching URLs? That would be an amazing feature too. It is impressive software, and I am looking forward to what the next version will offer |
11-03-2006, 06:07 PM | #32 |
Member
Posts: 23
Karma: 47
Join Date: Oct 2006
Device: Sony Reader/Treo 600
|
Absolutely fantastic, Geekraver! At the rate you're pumping out revisions, I'm unsure whether I should start posting my feed profiles or wait for the XML imports.
|
11-03-2006, 06:13 PM | #33 |
Addict
Posts: 364
Karma: 1035291
Join Date: Jul 2006
Location: Redmond, WA
Device: iPad Mini,Kindle Paperwhite
|
I should have the XML version done tonight, so you may as well wait.
|
11-03-2006, 06:37 PM | #34 | |
Member
Posts: 23
Karma: 47
Join Date: Oct 2006
Device: Sony Reader/Treo 600
|
Quote:
For those who are confused about regular expressions, check out: http://www.amk.ca/python/howto/regex/ You need only concern yourself with the first couple of pages of this tutorial to pick up what you need to use geekraver's powerful app. It's easy and actually pretty fun. |
|
11-04-2006, 12:20 PM | #35 |
Enthusiast
Posts: 35
Karma: 12
Join Date: Oct 2006
Device: Amazon Kindle, Sony Reader
|
O.K., now I'm falling behind on my books because my Reader is becoming an all-purpose book-blog-newspaper-magazine thing.
|
11-04-2006, 02:30 PM | #36 |
Addict
Posts: 285
Karma: 10
Join Date: Apr 2006
Location: Vancouver, Canada
Device: Proud Iliad owner
|
I have a question for geekraver.
I want to make a PDF of the articles from www.damninteresting.com. I put 'link' for the 'link element' field, and it works, but I get EVERYTHING on that page: links (menu), article, comments. Is there a way to set the program to get only the article? I read all your instructions but I didn't understand all of it, so if you can, please enlighten me. I attached an example PDF to better show what I mean. |
11-04-2006, 02:59 PM | #37 |
Addict
Posts: 364
Karma: 1035291
Join Date: Jul 2006
Location: Redmond, WA
Device: iPad Mini,Kindle Paperwhite
|
You need to filter the article content, which is done by the 'Content Extraction Pattern'. This will work:
(<div id="post.*)<div class="postMetaData">

Alternatively, import the attached xml file.

The stuff that gets included is the stuff in parentheses, so this pattern says: include everything starting from the first occurrence of '<div id="post' up to, but not including, the last occurrence of '<div class="postMetaData">'. The .* matches any text of zero or more characters. The match is 'greedy', i.e. as much text as possible gets matched, which is why we start with the FIRST occurrence of '<div id="post' and end with the LAST occurrence of '<div class="postMetaData">'. There's probably only one occurrence of each anyway, but it's worth mentioning the greedy aspect as it can cause confusion.

When experimenting with patterns, use the RegExp Helper under the Tools menu. You can paste the web page's HTML source into the Input box, then enter different patterns in the RegExp textbox. Click Test and you will be shown the text that matches the whole pattern and the text that matches the parenthesized part of the pattern (i.e. the ultimately important stuff).

Last edited by geekraver; 11-04-2006 at 03:04 PM. |
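As a sketch of how that greedy capture behaves, here is roughly equivalent Python; the `re` module's `.*` is greedy in the same way. The HTML snippet below is made up for illustration, not the real damninteresting.com markup:

```python
import re

# Hypothetical page: nav, the post, its metadata, then comments.
html = (
    '<div id="nav">menu</div>'
    '<div id="post-123"><p>Article body</p>'
    '<div class="postMetaData">posted ...</div>'
    '<div class="comments">...</div>'
)

# geekraver's pattern: the parenthesized group captures from the first
# '<div id="post' up to (not including) the LAST '<div class="postMetaData">'.
pattern = r'(<div id="post.*)<div class="postMetaData">'

# DOTALL lets .* span newlines, as real page source would contain them.
m = re.search(pattern, html, re.DOTALL)
article = m.group(1)  # '<div id="post-123"><p>Article body</p>'
```

The nav and comments divs are excluded: the match cannot start before `<div id="post`, and the greedy `.*` stops just before the metadata div that precedes the comments.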
11-04-2006, 03:09 PM | #38 | |
Addict
Posts: 364
Karma: 1035291
Join Date: Jul 2006
Location: Redmond, WA
Device: iPad Mini,Kindle Paperwhite
|
Quote:
|
|
11-04-2006, 05:10 PM | #39 |
Addict
Posts: 285
Karma: 10
Join Date: Apr 2006
Location: Vancouver, Canada
Device: Proud Iliad owner
|
Wow! Works great geekraver! Thanks a lot!
One other question: how far back can your program pull articles? I tried getting articles as far back as 100 days, but I only got 30 days' worth. Is there a one-month limit? |
11-04-2006, 05:42 PM | #40 | |
Addict
Posts: 364
Karma: 1035291
Join Date: Jul 2006
Location: Redmond, WA
Device: iPad Mini,Kindle Paperwhite
|
Quote:
|
|
11-04-2006, 06:05 PM | #41 |
Addict
Posts: 285
Karma: 10
Join Date: Apr 2006
Location: Vancouver, Canada
Device: Proud Iliad owner
|
Oh! I understand... so it's not the program, it's the feed that's the limiting factor.
|
11-04-2006, 06:49 PM | #42 |
Addict
Posts: 364
Karma: 1035291
Join Date: Jul 2006
Location: Redmond, WA
Device: iPad Mini,Kindle Paperwhite
|
Release 8...
Okay, so I ended up making another release again already. I wanted to fix a couple of issues (for example, web exceptions on a single entry preventing a whole feed from being handled). I also found that I often ended up running in the debugger to understand why a feed didn't work; to make it easier for others, I added a detailed test log window. If you click on Test, this window will pop up once the program is done testing the feed, with lots of info about what happened.
|
11-04-2006, 08:54 PM | #43 |
Addict
Posts: 285
Karma: 10
Join Date: Apr 2006
Location: Vancouver, Canada
Device: Proud Iliad owner
|
I wish iRex would release software at least 10 times slower than you do.
Keep up the great work! |
11-04-2006, 10:00 PM | #44 |
Member
Posts: 23
Karma: 47
Join Date: Oct 2006
Device: Sony Reader/Treo 600
|
I sync with rss2book every morning, pour a cup of coffee, and read in natural sunlight the articles your app makes possible. Thanks for taking the time to not only make this, but improve on it so quickly. No small effort!
I think we should start a thread for nothing but XML imports. Once I get all mine together, if it hasn't happened yet, I'll start one. |
11-05-2006, 12:41 AM | #45 |
Addict
Posts: 364
Karma: 1035291
Join Date: Jul 2006
Location: Redmond, WA
Device: iPad Mini,Kindle Paperwhite
|
I've been thinking about what the best approach is for collecting them. There are various options:
- I could collect them and put them on my website
- I could keep adding them as attachments in the initial post; that may become unwieldy
- I could keep adding them to a single big XML file that is kept with the initial post
- we could use the wiki
- we could just keep them on a thread

The main drawback to the last approach seems to be the haphazard organization that would result. Right now it seems like the wiki might be the best approach, and I can occasionally roll up the submissions into a single file and attach that to the first post. So I've started a page at https://wiki.mobileread.com/wiki/Xml_feed_files
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
rss2book release 20 now available | geekraver | Sony Reader | 4 | 01-26-2007 01:36 PM |
rss2book release 19 | geekraver | Sony Reader | 2 | 12-30-2006 10:51 AM |
rss2book release 18 | geekraver | Sony Reader | 0 | 12-22-2006 03:57 AM |
rss2book release 16 | geekraver | Sony Reader | 1 | 12-13-2006 05:56 AM |
rss2book release 13 | geekraver | Sony Reader | 0 | 11-13-2006 02:41 AM |