#16
Member
Posts: 19
Karma: 10
Join Date: Feb 2011
Device: kindle

I'm getting worse at this, and I'm tired of trying. I can't get this to do anything at all. The command prompt won't print my variables and won't download the articles; it just fails. I'd still like to step through it and see where it's failing, but I'm about done trying.

Code:
    def preprocess_html(self,soup):
        for pix in soup.findAll('img'):
            next_tag=tag(soup, soup.body.nextSibling.name)
            new_tag=tag(soup,'p')
            new_tag.insert(0,pix)
            next_tag.insert(0,new_tag)
        return soup

#17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T

Try this:

Spoiler:
I used previousSibling to find the span tag that preceded the img tag. That span held useful text (the date) and was still in the soup, so I used it as the marker: I put the img tag into a new p tag, then put that p tag into the span. I didn't look closely at your code, but I did see that it used "tag" instead of "Tag". Python names are case-sensitive, so tag(...) is not the Tag class. Note the imports and the print, which you can comment out with "#".
Last edited by Starson17; 03-08-2011 at 10:22 AM.
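A minimal sketch of the idea described above, for trying it outside calibre. It swaps in Python's standard xml.etree.ElementTree for calibre's bundled BeautifulSoup, and the function names are hypothetical. ElementTree elements don't know their parent, and they aren't detached automatically when appended somewhere else. For that reason, the sketch builds a child-to-parent map and calls remove() explicitly. One difference from BeautifulSoup matters here: previousSibling there can be a text node or None, and depending on the version, asking such a node for .name can raise AttributeError. A check like getattr(prev, 'name', None) == 'span' avoids that.

```python
import xml.etree.ElementTree as ET


def preceding_sibling(parent, child):
    """Return the element immediately before `child` in `parent`, or None."""
    prev = None
    for node in parent:
        if node is child:
            return prev
        prev = node
    return None


def move_imgs_into_preceding_span(root):
    """Wrap each <img> in a new <p> and move it into the <span> just before it.

    Hypothetical helper; images whose previous sibling is not a <span> are left alone.
    """
    # ElementTree elements have no parent pointer, so map child -> parent up front.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for img in list(root.iter('img')):  # list() because the loop edits the tree
        parent = parent_of.get(img)
        if parent is None:
            continue
        prev = preceding_sibling(parent, img)
        if prev is None or prev.tag != 'span':
            continue
        # Text that followed the image should now follow the span instead.
        if img.tail:
            prev.tail = (prev.tail or '') + img.tail
            img.tail = None
        parent.remove(img)  # ElementTree does not detach on append, so remove first
        wrapper = ET.SubElement(prev, 'p')  # new <p> at the end of the span
        wrapper.append(img)
    return root


html = ('<body><h1>Headline</h1>'
        '<span class="updated">March 8, 2011</span><img src="a.jpg"/></body>')
root = move_imgs_into_preceding_span(ET.fromstring(html))
print(ET.tostring(root, encoding='unicode'))
```

The printed body should show the image, now wrapped in a p, sitting inside the span after the date text.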

#18
Member

Starson,

Thanks for helping with that. I was stuck, and I doubt I would have figured it out. This is why you're a Wizard and I'm a Jr. Member. Another question, if you don't mind: how do I do a similar thing with other tags? I tried adding another for loop before the return soup, and it wouldn't accept it. Can you call preprocess_html twice, or what do you do? I know you didn't teach me to fish, but maybe I can help take it off the hook once you reel it in.

#19
Wizard

#20
Member

Here are my tags. I'm working on the img and the fn tags.

Code:
    keep_only_tags = [
        dict(name='h1'),
        dict(name='span', attrs={'class':'updated'}),
        dict(name='span', attrs={'class':'fn'}),
        dict(name='img', attrs={'id':'img-holder'}),
        dict(name='span', attrs={'id':'gallery-cutline'}),
        dict(name='div', attrs={'id':'blox-story-text'})
    ]
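The following is a rough stand-in for seeing which parts of a page a spec list like the one above picks out. It is not calibre's own code. It treats each dict as "the tag name must match, and every listed attribute must equal the given value exactly," and it returns matches in document order. Exact matching and document order are simplifying assumptions. The helper names and the sample HTML are made up.

```python
import xml.etree.ElementTree as ET

# The recipe's spec list, copied as plain dicts.
keep_only_tags = [
    dict(name='h1'),
    dict(name='span', attrs={'class': 'updated'}),
    dict(name='span', attrs={'class': 'fn'}),
    dict(name='img', attrs={'id': 'img-holder'}),
    dict(name='span', attrs={'id': 'gallery-cutline'}),
    dict(name='div', attrs={'id': 'blox-story-text'}),
]


def matches(elem, spec):
    """True if elem has the spec's tag name and every listed attribute value.

    Assumes exact, whole-string attribute comparison (a simplification).
    """
    if elem.tag != spec['name']:
        return False
    return all(elem.get(key) == value for key, value in spec.get('attrs', {}).items())


def kept_elements(root, specs):
    """Return elements matching any spec, in document order (hypothetical helper)."""
    return [elem for elem in root.iter() if any(matches(elem, s) for s in specs)]


# Made-up sample page: the nav block and the ad span should not be kept.
sample = ('<html><body><div id="nav"><h2>Menu</h2></div><h1>Headline</h1>'
          '<span class="updated">March 8, 2011</span><span class="fn">Jane Doe</span>'
          '<span class="ad">Buy now</span><img id="img-holder" src="a.jpg"/>'
          '<div id="blox-story-text"><p>Story text</p></div></body></html>')
for elem in kept_elements(ET.fromstring(sample), keep_only_tags):
    print(elem.tag, elem.attrib)
```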
Code:
    def preprocess_html(self, soup):
        # print 'the soup is: ', soup
        for fn_tag in soup.findAll("span", {"class": "fn"}):
            previousSibling_tag = fn_tag.previousSibling
            if previousSibling_tag.name == 'span':
                new_tag = Tag(soup, 'p')
                new_tag.insert(0, fn_tag)
                previousSibling_tag.insert(1, new_tag)
        for img_tag in soup.findAll('img'):
            previousSibling_tag = img_tag.previousSibling
            if previousSibling_tag.name == 'span':
                new_tag = Tag(soup, 'p')
                new_tag.insert(0, img_tag)
                previousSibling_tag.insert(2, new_tag)
        return soup
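On the earlier question about handling more than one kind of tag: the two loops above make the same move with different targets, so they can live in one method, one after the other. Here is a minimal sketch of that pattern driven by a list, again using the standard library's ElementTree in place of BeautifulSoup. The target list and helper name are hypothetical. Targets are handled in list order, so the fn span is moved before the image. Assuming the span holds only its date text, appending each new p to the end of the span gives the same final order (date, then fn, then image) that the insert(1, ...) and insert(2, ...) calls aim for.

```python
import xml.etree.ElementTree as ET

# Hypothetical targets, processed in order: (tag name, attributes that must match).
TARGETS = [
    ('span', {'class': 'fn'}),
    ('img', {}),
]


def wrap_into_preceding_span(root, targets):
    """For each target, wrap matching elements in <p> and move them into the preceding <span>."""
    for name, attrs in targets:
        # Rebuild the child -> parent map each pass, since earlier passes move elements.
        parent_of = {child: parent for parent in root.iter() for child in parent}
        for elem in list(root.iter(name)):
            if any(elem.get(key) != value for key, value in attrs.items()):
                continue
            parent = parent_of.get(elem)
            if parent is None:
                continue
            siblings = list(parent)
            index = siblings.index(elem)
            prev = siblings[index - 1] if index > 0 else None
            if prev is None or prev.tag != 'span':
                continue
            if elem.tail:  # keep any trailing text where it was
                prev.tail = (prev.tail or '') + elem.tail
                elem.tail = None
            parent.remove(elem)
            ET.SubElement(prev, 'p').append(elem)  # new <p> at the end of the span
    return root


html = ('<body><h1>Headline</h1><span class="updated">March 8, 2011</span>'
        '<span class="fn">Jane Doe</span><img src="a.jpg"/></body>')
root = wrap_into_preceding_span(ET.fromstring(html), TARGETS)
print(ET.tostring(root, encoding='unicode'))
```

To handle another tag type, add a tuple to the list rather than writing another loop.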

#21
Wizard
Quote:
Quote:

#22
Member

Edit: actually, what I had worked beautifully. Here's the final preprocess code.

Code:
    # Tag comes from calibre's bundled BeautifulSoup; the recipe file needs
    # "from calibre.ebooks.BeautifulSoup import Tag" at the top.
    def preprocess_html(self, soup):
        # print 'the soup is: ', soup
        for fn_tag in soup.findAll("span", {"class": "fn"}):
            previousSibling_tag = fn_tag.previousSibling
            if previousSibling_tag.name == 'span':
                new_tag = Tag(soup, 'p')
                new_tag.insert(0, fn_tag)
                previousSibling_tag.insert(1, new_tag)
        for img_tag in soup.findAll('img'):
            previousSibling_tag = img_tag.previousSibling
            # print 'img previoussibling is: ', previousSibling_tag
            # print 'previousSibling_tag.name is: ', previousSibling_tag.name
            if previousSibling_tag.name == 'span':
                new_tag = Tag(soup, 'p')
                # print 'new_tag is: ', new_tag
                new_tag.insert(0, img_tag)
                # print 'new_tag is, after insert: ', new_tag
                previousSibling_tag.insert(2, new_tag)
                # print 'img previoussibling is after insert: ', previousSibling_tag
        return soup
Last edited by clintiepoo; 03-13-2011 at 03:15 AM.

#23
Wizard

Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post | 
| IQ Parse Error when downloading apps on IQ | tasha326 | PocketBook | 6 | 01-20-2011 01:09 AM | 
| Initial parse failed: | mburgoa | Calibre | 4 | 08-07-2010 09:50 AM | 
| I dont live in any of the subscription newspaper's cities... | kilofox | Amazon Kindle | 9 | 04-02-2008 05:33 PM | 
| from Italy...is PSR 505 good for newspaper's layout? | ionontelodico | Sony Reader | 5 | 12-20-2007 03:12 PM |