#2581 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
One is the ability to search defined folders and files, including subdirs, for certain text, then open one or more of the located files. I often search *.recipe files in the resource directory for "keep_only" or "parse_index," etc., to see how other working recipes used those commands. The second feature is having multiple files open for editing. I keep my recipe, my batch file for executing my recipe, and my output error file all open. The last feature is the ability to execute a batch file with a single keystroke. I have the batch file for executing the recipe connected to that key. Modify recipe, save it, hit execute, read errors in error file, rinse and repeat. I believe Notepad++ is free and will do some of the above.
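For anyone without an editor that does the folder search, here is a minimal Python sketch of the same idea. It is only an illustration: the function name `search_files` is made up for the example, and it does plain substring matching, nothing smarter.

```python
from pathlib import Path


def search_files(root, pattern, needle):
    """Yield (path, line_number, line) for every line that contains needle.

    root    -- folder to search, including all of its subdirectories
    pattern -- file name pattern, e.g. "*.recipe"
    needle  -- the text to look for, e.g. "keep_only"
    """
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        # errors="replace" keeps the search going on oddly encoded files
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    yield path, number, line.rstrip()
```

Looping over `search_files(folder, "*.recipe", "keep_only")` and printing each hit gives a list of recipes to open and read, where `folder` is wherever your copy of the recipes lives.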
|
#2582 |
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
Hey Starson17, I'm trying to apply what you showed me on Field and Stream, yet I'm still a little confused.
I'm trying to play around with http://www.laineygossip.com/ for the other user. I can get the other articles just fine using the methods you showed me, but I'm having trouble getting the ones that are not inside the <h2> tags. More specifically, look at http://www.laineygossip.com/. Notice how it has the date, then goes "Dear Gossipers, blah blah blah". My thought was to get those articles first and append them to the array, then do another for loop to get the other articles that follow a different criteria. Here is what I'm having an issue with: Spoiler:
The articles are contained in the div class=leftcontent, and the title is inside an h1 tag there. I figured that since I was already inside leftcontent due to the for loop, I would do another findall for the artIntroShort, then parse it for the url and the article text that is in the <p> tag. Here is the whole code I have thus far: Spoiler:
I know I'm close to getting this, yet it seems so far away.
#2583 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Here is what I gave you last time. Why doesn't that work?
Spoiler:
In line 1 it finds all the <h2> tags. In line 2 it looks at each one to decide if there is an <a> tag inside. In line 3, if there was an <a> tag found, it proceeds to do what needs to be done (look at the code I gave you again). I looked at the http://www.laineygossip.com/ page and it seems to have the same structure, with <a> tags (having the link you want) inside <h2> tags. |
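The three steps above can be sketched in plain Python. This is only an illustration, not the code from the spoiler: it uses the standard library's HTMLParser instead of BeautifulSoup so that it runs outside calibre, and the class name, function name, and sample HTML are made up for the example.

```python
from html.parser import HTMLParser


class HeadlineLinkParser(HTMLParser):
    """Collect title/url pairs for <a> tags that sit inside <h2> tags."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.current_href = None  # href of the <a> being read, if any
        self.current_text = []
        self.articles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            # Step 1: we are now inside an <h2>
            self.in_h2 = True
        elif tag == "a" and self.in_h2:
            # Step 2: is there an <a> with a link inside this <h2>?
            href = dict(attrs).get("href")
            if href:
                self.current_href = href
                self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            # Step 3: an <a> was found, so record the article
            title = "".join(self.current_text).strip()
            self.articles.append({"title": title, "url": self.current_href})
            self.current_href = None
        elif tag == "h2":
            self.in_h2 = False


def find_headline_links(html):
    parser = HeadlineLinkParser()
    parser.feed(html)
    return parser.articles


# Made-up sample page: two linked headings, one heading with no link,
# and one link that is not inside a heading at all.
sample = """
<h2><a href="/article-one">First story</a></h2>
<h2>No link in this heading</h2>
<p><a href="/ad">Not in a heading</a></p>
<h2><a href="/article-two">Second story</a></h2>
"""
articles = find_headline_links(sample)
```

Headings without a link, and links outside a heading, are skipped, which is the same filtering the `<h2>` loop does.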
#2584 | |
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
Quote:
This is the part that is throwing me. Note that it is not in the <h2> and <a> like the rest of the page is. I hope that explains what I mean. I hope I'm not bugging you on this. If so, just say so and I'll chill. Spoiler:
|
|
#2585 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
I did.
Quote:
Quote:
Edit: looking back at your code, I see that's sort of what you did, but you have an extra for loop layer at the leftcontent that I don't think you need. Last edited by Starson17; 08-31-2010 at 09:11 PM. |
||
#2586 | |
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
Quote:
Basically, this works fine to get the non-<h2> stuff, with the exception of the title: Spoiler:
My question is, how would I get something like this to work? Spoiler:
Of course I have the return statements and all, but this is the block that I'm concerned about. Thanks also. I'm noticing that there are <span> tags inside the <p> tags, so when I search for the <a> inside the <p>, I get the links for the ads instead of the last <a> tag. This one is really working the brain; it will be interesting to see how it works out. I looked at the output log and noticed that, like I said, it keeps setting the "url is:" value to the ad.doubleclick link that is inside the <span>. I tried doing a remove_tags on that tag, but apparently it doesn't remove the tag until after it goes through the parsing. Last edited by TonytheBookworm; 08-31-2010 at 11:29 PM. Reason: added more info
|
#2587 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
I'll leave you to play with that. I'm sure a closer look at your code and the page you're scraping would let me make better comments, but I'm short on time today. Good Luck! |
||
#2588 |
Junior Member
Posts: 6
Karma: 10
Join Date: Sep 2010
Device: entourage edge
|
Request for Milenio recipe
Hi!
I'd like to ask if someone has a recipe for Milenio Diario (Mexican newspaper, http://impreso.milenio.com/Nacional/). Opinion articles are not included in the RSS feeds, but I'd like them in the recipe. Thanks a lot! Cheers
#2589 | |
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
Quote:
Issues I'm having:
1) For whatever reason, I always get a full run of the whole page as an article. I'm not sure why, unless it searches for artIntroShort and then the <a> tags and doesn't find any (the webmaster isn't consistent). My guess is that somehow (I can't seem to find it in my output log) link['href'] ends up being None, so the url ends up just being the index.
2) This is the one that is puzzling me the most. I also see that the person who asked for help on this recipe faced a similar problem with the XML (that is why I didn't use the feed; I was trying this method to get the thumbnails). For some reason the thumbnails don't come through. I looked in Firebug and they appear to be wrapped inside the mainContent tag. I even went as far as commenting out the keep_only tags and got the same results.
Anyway, whenever you get some free time, have a look at this if you don't mind. Thanks! Attached: Code that gets articles but has issues
|
#2590 |
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
Forgive me for asking so many questions; it's pretty much the only way I know to learn. With that being said, I was wondering how one would parse a website that does something like this:
Article content, then pagination to go to the rest of the article, and so on, while keeping it all in one article. Basically, let's say you had:
page 1: blah blah blah test blah blah, next page
page 2: more stuff for same article, next page
How would you do that? My first guess would be to use parse_index(), then somehow call the article up and get the article content, then do a find to get the <a> inside that article, then get the content and append it to that article. To get a better idea of what I'm talking about, have a look at http://auto.howstuffworks.com/under-...-insurance.htm, which is part of the http://feeds.feedburner.com/Howstuff...ffDailyRssFeed feed. Notice how it shows a kind of description, then next page, then shows more, then next page, and so forth. I think once I get some general templates on how this stuff works (that I can understand), I'll be fine.
#2591 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Briefly, in multipage you use BeautifulSoup to grab each subsequent page by following the "next page" links, and you append them all into the soup for the first page to make one large BS object. Search this thread for "multipage." Look at the discussion I had with "rty" to see some examples. Search the builtin recipes for "append_page", or search here for that, and you will find many how-to examples.
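As a rough sketch of the multipage idea (not the append_page code from any builtin recipe): a dict stands in for the website, regular expressions stand in for BeautifulSoup, and the `content` and `next` class names, the URLs, and `collect_article` are all hypothetical.

```python
import re

# Hypothetical stand-in for the site: URL -> HTML.
# A real recipe would fetch each page over the network instead.
PAGES = {
    "/story": '<div class="content">Part one.</div>'
              '<a class="next" href="/story/2">next page</a>',
    "/story/2": '<div class="content">Part two.</div>'
                '<a class="next" href="/story/3">next page</a>',
    "/story/3": '<div class="content">Part three.</div>',
}

CONTENT_RE = re.compile(r'<div class="content">(.*?)</div>', re.S)
NEXT_RE = re.compile(r'<a class="next" href="([^"]+)"')


def collect_article(start_url, fetch, max_pages=20):
    """Follow "next page" links and join every page's content into one article."""
    parts = []
    seen = set()  # guards against a "next page" link that loops back
    url = start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        match = CONTENT_RE.search(html)
        if match:
            parts.append(match.group(1))
        next_link = NEXT_RE.search(html)
        url = next_link.group(1) if next_link else None
    return "\n".join(parts)


article = collect_article("/story", PAGES.__getitem__)
```

The loop stops when a page has no "next page" link, when a link points back to a page already visited, or when `max_pages` is reached, so a badly formed site can't keep it running forever.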
|
#2592 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
1) If something isn't appearing, make sure your own keep_only or remove_tags aren't stripping it. Try to get it to appear with all the other junk.
2) Maybe it's being removed with removal of scripting. Look at the page source to see. Try leaving scripts on in your test recipe.
3) If it still looks like the item should be picked up, sometimes the site is protecting the image from scraping. You may need to have the correct useragent, the correct cookie, the correct referer header, etc. Firefox and TamperData help here. There are techniques for simulating each of these. I try to get Firefox to act like Calibre (or vice-versa) to verify.
The bottom line is that if Firefox can see it, so can your recipe.
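Point 3 can be illustrated with the standard library. This is a sketch under assumptions, not calibre code: the function name, both URLs, and the user-agent string are placeholders, and nothing is actually sent over the network.

```python
from urllib.request import Request


def build_image_request(image_url, page_url, user_agent):
    """Build a request that looks like a browser asking for an image on page_url."""
    return Request(
        image_url,
        headers={
            # Values you would copy from what the browser really sends
            "User-Agent": user_agent,
            "Referer": page_url,
        },
    )


# Placeholder values for illustration only
request = build_image_request(
    "http://www.example.com/thumbs/1.jpg",
    "http://www.example.com/",
    "Mozilla/5.0 (Windows NT 6.1; rv:3.6) Gecko/20100101 Firefox/3.6",
)
```

Comparing these headers with what TamperData shows the browser sending is a quick way to spot what the site expects and the scraper is missing.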
||
#2593 |
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
I've been looking at the AdventureGamer code and I have a few questions.
Spoiler:
And here is my painful attempt: Spoiler:
|
#2594 | ||||
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Quote:
Quote:
Quote:
Quote:
|
||||
#2595 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
|