03-07-2011, 09:53 PM | #16 |
Member
Posts: 19
Karma: 10
Join Date: Feb 2011
Device: kindle
|
I'm getting worse at this, and I'm worn out from trying. I can't get this to do anything whatsoever. The command prompt won't print my variables and won't download the articles... it's just a failure. I'd still like to step through and see where it's failing, but I'm about done trying.
Code:
def preprocess_html(self,soup):
    for pix in soup.findAll('img'):
        next_tag=tag(soup, soup.body.nextSibling.name)
        new_tag=tag(soup,'p')
        new_tag.insert(0,pix)
        next_tag.insert(0,new_tag)
    return soup
|
03-08-2011, 09:17 AM | #17 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Try this:
Spoiler:
I used previousSibling to find the span tag that preceded the img tag. Since the span tag had useful text (the date) and was still in the soup, I used it as the marker: I wrapped the img tag in a p tag and then put that p tag into the span. I didn't look closely at your code, but I did see it used "tag" instead of "Tag". Note the imports and the print, which you can comment out with "#".
Last edited by Starson17; 03-08-2011 at 09:22 AM. |
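For anyone reading along without the spoiler contents, here is a rough, standalone sketch of the move described above. It is not the recipe code from the spoiler: it swaps in the standard library's xml.etree.ElementTree so it runs outside calibre, and the SAMPLE markup and the function name are made up for illustration. ElementTree has no previousSibling, so the sketch walks each parent's children in pairs instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical article fragment; the real page markup is not shown in the thread.
SAMPLE = (
    "<div>"
    "<span class='updated'>March 7, 2011</span>"
    "<img id='img-holder' src='photo.jpg'/>"
    "<p>Story text.</p>"
    "</div>"
)


def move_imgs_into_preceding_span(root):
    """Wrap each img in a new p and move it into the span right before it."""
    for parent in list(root.iter()):
        children = list(parent)  # snapshot, because parent is modified below
        # Pair every child with the sibling immediately before it.
        for prev, child in zip(children, children[1:]):
            if child.tag == "img" and prev.tag == "span":
                parent.remove(child)       # take the img out of its old spot
                wrapper = ET.Element("p")  # new p tag to hold the img
                wrapper.append(child)
                prev.append(wrapper)       # the span is the marker that receives it
    return root


root = move_imgs_into_preceding_span(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

In a recipe the same steps would go through the soup object and Tag rather than ElementTree, so treat this only as a way to see the wrap-and-move logic in isolation.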
03-09-2011, 05:51 PM | #18 |
Member
Posts: 19
Karma: 10
Join Date: Feb 2011
Device: kindle
|
Starson,
Thanks for helping with that. I was stuck, and I doubt I would have figured it out. This is why you're a Wizard and I'm a Jr. Member.
Another question, if you don't mind: how do I do a similar thing with other tags? I tried adding another for loop before the return soup, and it didn't want to take it. Can you call preprocess_html twice, or what do you do?
I know you didn't teach me to fish, but maybe I can help take it off the hook once you reel it in. |
03-09-2011, 08:55 PM | #19 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
|
03-12-2011, 02:00 PM | #20 |
Member
Posts: 19
Karma: 10
Join Date: Feb 2011
Device: kindle
|
Here are my tags. I'm working on the img and the fn.
Code:
keep_only_tags = [
    dict(name='h1'),
    dict(name='span', attrs={'class':'updated'}),
    dict(name='span', attrs={'class':'fn'}),
    dict(name='img', attrs={'id':'img-holder'}),
    dict(name='span', attrs={'id':'gallery-cutline'}),
    dict(name='div', attrs={'id':'blox-story-text'})
]
Code:
def preprocess_html(self,soup):
    # print 'the soup is: ', soup
    for fn_tag in soup.findAll("span", {"class" : "fn"}):
        previousSibling_tag = fn_tag.previousSibling
        if previousSibling_tag.name == 'span':
            new_tag = Tag(soup,'p')
            new_tag.insert(0,fn_tag)
            previousSibling_tag.insert(1,new_tag)
    for img_tag in soup.findAll('img'):
        previousSibling_tag = img_tag.previousSibling
        if previousSibling_tag.name == 'span':
            new_tag = Tag(soup,'p')
            new_tag.insert(0,img_tag)
            previousSibling_tag.insert(2,new_tag)
    return soup
|
03-12-2011, 08:51 PM | #21 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
|
||
03-13-2011, 01:10 AM | #22 |
Member
Posts: 19
Karma: 10
Join Date: Feb 2011
Device: kindle
|
Edit: actually what I had worked beautifully. Here's the final preprocess code.
Code:
def preprocess_html(self,soup):
    # print 'the soup is: ', soup
    for fn_tag in soup.findAll("span", {"class" : "fn"}):
        previousSibling_tag = fn_tag.previousSibling
        if previousSibling_tag.name == 'span':
            new_tag = Tag(soup,'p')
            new_tag.insert(0,fn_tag)
            previousSibling_tag.insert(1,new_tag)
    for img_tag in soup.findAll('img'):
        previousSibling_tag = img_tag.previousSibling
        # print 'img previoussibling is: ', previousSibling_tag
        # print 'previousSibling_tag.name is: ', previousSibling_tag.name
        if previousSibling_tag.name == 'span':
            new_tag = Tag(soup,'p')
            # print 'new_tag is: ', new_tag
            new_tag.insert(0,img_tag)
            # print 'new_tag is, after insert: ', new_tag
            previousSibling_tag.insert(2,new_tag)
            # print 'img previoussibling is after insert: ', previousSibling_tag
    return soup
Last edited by clintiepoo; 03-13-2011 at 01:15 AM. |
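Since the two loops above do the same thing for different tags, one possible tidy-up is a small helper called once per tag, with a single return at the end. The sketch below is only an illustration under assumptions: it uses the standard library's xml.etree.ElementTree in place of the recipe's soup so it can run on its own, and SAMPLE, move_into_preceding_span and preprocess are hypothetical names, not part of calibre. It appends the new p tag instead of inserting at index 1 or 2, because ElementTree does not count the span's text as a child.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup loosely following the keep_only_tags order in the thread.
SAMPLE = (
    "<div>"
    "<h1>Headline</h1>"
    "<span class='updated'>March 12, 2011</span>"
    "<span class='fn'>Staff Writer</span>"
    "<img id='img-holder' src='photo.jpg'/>"
    "</div>"
)


def move_into_preceding_span(root, tag, css_class=None):
    """Wrap matching elements in a p and move them into the span before them."""
    for parent in list(root.iter()):
        children = list(parent)  # snapshot, because parent is modified below
        for prev, child in zip(children, children[1:]):
            if child.tag != tag:
                continue
            if css_class is not None and child.get("class") != css_class:
                continue
            if prev.tag != "span":
                continue  # only move when the marker span comes right before
            parent.remove(child)
            wrapper = ET.Element("p")
            wrapper.append(child)
            prev.append(wrapper)


def preprocess(root):
    # Same order as the two loops in the post: byline first, then images.
    move_into_preceding_span(root, "span", css_class="fn")
    move_into_preceding_span(root, "img")
    return root


root = preprocess(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

In this made-up sample the order of the two calls matters: once the fn span has been moved, the date span becomes the element right before the img, so both end up inside the same span.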
03-13-2011, 11:01 AM | #23 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
|
|