#2551
Junior Member
Posts: 6
Karma: 10
Join Date: Aug 2010
Device: Kindle 3
Ok, so it appears that fixing the duplication issue that I received help with in this thread has also resolved the issue of some items not showing up. All the relevant items now appear, so if anyone is having issues, please see this thread:
https://www.mobileread.com/forums/showthread.php?t=96351
Last edited by gk_jam; 08-29-2010 at 02:32 PM.
#2552
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: Kindle 2, Kindle 3, Kindle Fire
Find URL where text =
If there is documentation on this, I wasn't able to find it, so could someone help me out, please? I want to parse a website that doesn't have an RSS feed, but it has a link under each article to read the full article.
Code:
<a href="/blogs/hunting/2010/08/guest-blog-5-reasons-plant-food-plots-now">Read Full Post</a>
My thoughts were something along the lines of: Spoiler:
So if I have links:
Code:
1. <a href="/blogs/test1">Read Full Post </a>
2. <a href="/blogs/test2">Read Full Post </a>
Thanks for the help.
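One way to do this with calibre's bundled BeautifulSoup is to match on the visible link text. A minimal sketch (the base URL prefix is an assumption about how the relative links should be resolved):
Code:
from calibre.ebooks.BeautifulSoup import BeautifulSoup

html = '''<a href="/blogs/test1">Read Full Post </a>
<a href="/blogs/test2">Read Full Post </a>'''

soup = BeautifulSoup(html)
links = []
for a in soup.findAll('a', href=True):
    # match on the link text, ignoring the stray trailing space
    if a.string and a.string.strip() == 'Read Full Post':
        links.append('http://www.fieldandstream.com' + a['href'])

print(links)  # ['http://www.fieldandstream.com/blogs/test1', 'http://www.fieldandstream.com/blogs/test2']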
#2553
US Navy, Retired
Posts: 9,897
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
I think I see what you're looking at now. You might want to start with the RSS feed.
Last edited by DoctorOhh; 08-29-2010 at 02:48 AM.
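For comparison, if a working RSS feed does exist, the recipe only needs a feeds list and no parse_index at all. A minimal sketch (the recipe name and the feedburner address below are placeholders, not confirmed values):
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class FieldAndStreamFeeds(BasicNewsRecipe):  # hypothetical recipe name
    title = u'Field and Stream (RSS)'
    oldest_article = 7
    max_articles_per_feed = 25

    # (feed title, feed URL) pairs; the URL here is a placeholder
    feeds = [
        (u'The Wild Chef', u'http://feeds.feedburner.com/placeholder-wild-chef'),
    ]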
#2554
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: Kindle 2, Kindle 3, Kindle Fire
Quote:
The reason I'd like to know is that even on this page not all of the blogs have feeds. More specifically, have a look at http://www.fieldandstream.com/blogs and notice that "The Wild Chef" takes you to feeds.feedburner.com and nothing else. And the recipe blog was one of the main ones I wanted, haha, because a man's gotta eat.
Last edited by TonytheBookworm; 08-29-2010 at 01:10 PM. Reason: added more info
#2555
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java: Gravity T
Quote:
If all links were to RSS feeds, you would use this page to manually get the feed links for your recipe, then the recipe would do all the work thereafter.

Let's assume there are no RSS feeds. Then you would normally manually get all the other links from that page (and the title of the feed), and store them in a manually created dictionary of feed title and URL in your recipe. Each URL would be fed into parse_index. Each time one of those URLs was fed into parse_index, it would parse the page, find all article links, and build a feed structure for the matching feed title/URL that would then be appended to the feed list and passed back into the recipe.

How you build the feed structure depends on the pages, but basically you need:
'title' : article title,
'url' : URL of the article,
'date' : the publication date of the article as a string,
'description' : a summary of the article

I suggest you search the recipes for "parse_index". There are dozens of examples of how this is done.
Last edited by Starson17; 08-29-2010 at 02:49 PM.
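A minimal sketch of what that return structure looks like inside parse_index (the titles and URLs here are placeholders; only the dictionary keys and the list-of-tuples shape are what calibre expects):
Code:
def parse_index(self):
    feeds = []
    # ... for each (feed title, page URL) pair you collected manually ...
    articles = [{
        'title': u'Some article headline',              # article title
        'url': u'http://www.example.com/some-article',  # URL of the article (placeholder)
        'date': u'',                                    # publication date as a string
        'description': u'',                             # a summary of the article
    }]
    feeds.append((u'Some feed title', articles))
    return feeds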
#2556
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: Kindle 2, Kindle 3, Kindle Fire
Quote:
Spoiler:
In the example they used Code:
sectit = soup.find('h1', attrs={'class':'sectionTitle'})
but in my case I only have an href inside the h2 tags.
Last edited by TonytheBookworm; 08-29-2010 at 04:37 PM.
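If the headline link really does sit inside each h2, something along these lines should pull it out (an untested sketch; it assumes it runs inside the recipe's parse_index, and the base URL prepend is an assumption):
Code:
# 'soup' is assumed to come from self.index_to_soup(page_url)
for h2 in soup.findAll('h2'):
    a = h2.find('a', href=True)
    if a is None:
        continue
    article_title = self.tag_to_string(a)                       # the text inside the <a>
    article_url = 'http://www.fieldandstream.com' + a['href']   # href is site-relative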
#2557
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java: Gravity T
Quote:
Let me refer to my GoComics recipe, as I'm more familiar with it. Spoiler:
Above are the pairs of a title for a feed and a URL to scrape for articles. You would stick this in: Code:
(u"Wild Chef", u"http://www.fieldandstream.com/blogs/wild-chef"), You'd start with getting a soup for the url: soup = self.index_to_soup(url) then start scraping out the article urls and titles, etc. As you said, you have "href inside the h2 tags" the article title is really the string (NavigableString) inside the <a> tag. The url is the href atribute of the <a> tag (with a base URL stuck in front), and the summary is there too. All of those are easily obtained using Beautiful Soup from the soup of the url given above. Scrape the url, build your article list for that feed, then it gets returned to parse_index and the next feed gets processed, etc. I'm glad to see you working on a recipe (calibre-type) of recipes (food-type) - they're my favorite ![]() Last edited by Starson17; 08-29-2010 at 09:02 PM. |
#2558
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java: Gravity T
Quote:
#2559
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: Kindle 2, Kindle 3, Kindle Fire
Quote:
Thanks for the information you posted. I will try and read some more. Oh, one more thing: when testing this thing to make sure it works, without having to load it into calibre and run and wait, do I use the same test command string you provided me with previously? Again, I can't thank you and the others enough for helping me out on this. It is actually kind of fun. Frustrating at times when I run into something that I don't understand, but other than that it is pretty fun to do. And I will more than likely be asking more questions, but once I get it I hope to help instead of ask.
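For reference, the usual way to test a recipe from the command line without loading it into the calibre GUI is roughly this (the recipe filename is whatever you saved yours as; --test limits the download to a couple of articles per feed):
Code:
ebook-convert my_recipe.recipe .epub --test -vv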
#2560
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java: Gravity T
Quote:
Quote:
Quote:
Quote:
#2561
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: Kindle 2, Kindle 3, Kindle Fire
hmm
Alright, I looked at some samples and I also saw what you had done. I went with the second method that you mentioned, though, about making my own links. Well, I thought I had it, but it's obviously not working. Here is what I came up with. If you have the time, could you look at this and kind of shed some more light on it for me? Thanks.
Spoiler:
#2562
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java: Gravity T
Quote:
soup = self.index_to_soup(url)
where "url" was the url being passed to that function. In your code, you've taken the code that should have been in the called function and put it as the first line, but "url" isn't defined, so you never get a soup to work with.

To write recipes effectively, you need to use print statements to see what's happening. Put
print 'the soup is: ', soup
after that line to see what the soup is, and you'll see that url is not yet defined and there is no soup.

If you're not going to do it the way GoComics did it, I suspect you want:
soup = self.index_to_soup("http://www.fieldandstream.com/blogs/wild-chef")
However, doing it this way will only give you one feed - the one for Wild Chef. Doing it the way GoComics does will let you set up multiple feeds.
#2563
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java: Gravity T
Quote:
Spoiler:
Edit: Start with the above. It will give you the basic structure, since your code didn't appear to get to the page you needed to parse. The code above should get you there (check the printed soup to confirm in your output file). Once you have the soup being printed, we can work on the pseudocode. You should be able to adapt your own parsing code (as you posted) to replace the pseudocode above. Note that you can leave description and date blank for testing. You only need to parse a title (and you can even set that to a constant) and just parse out the article URL.
Last edited by Starson17; 08-30-2010 at 10:57 AM.
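A rough sketch of that kind of bare-bones testing skeleton: a constant title, blank date and description, and only the article URL actually parsed (the "Read Full Post" link-text test is borrowed from the earlier posts; everything else is a placeholder):
Code:
def parse_index(self):
    soup = self.index_to_soup('http://www.fieldandstream.com/blogs/wild-chef')
    print('the soup is: ', soup)   # confirm the page downloaded before parsing anything
    articles = []
    for a in soup.findAll('a', href=True):
        if self.tag_to_string(a).strip() == 'Read Full Post':
            articles.append({'title': u'Wild Chef post',  # a constant title is fine for testing
                             'url': u'http://www.fieldandstream.com' + a['href'],
                             'date': u'',                  # blank is fine for testing
                             'description': u''})
    return [(u'Wild Chef', articles)]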
#2564
Junior Member
Posts: 2
Karma: 10
Join Date: Aug 2010
Device: Kindle DX
Help with recipe
Hello, I tried to write a recipe for this site but failed miserably. (I'm learning Python.)
http://clipping.radiobras.gov.br/cli...psesDetail.php
However, it seems it should be easy since it is a simple page. Can anyone help me?
#2565
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: Kindle 2, Kindle 3, Kindle Fire
Starson17,
I was thinking the second method that you showed me was the one best suited for this situation. Actually, I wanted to learn both methods and have more tools/skills to work with in the future. Thanks for your continued support in this. I will work on what you have provided me with and get back to you when I have more questions. Once again, I appreciate your time.