Quote:
Originally Posted by TonytheBookworm
I've been looking at the AdventureGamer code and I have a few questions.
|
Quote:
Originally Posted by TonytheBookworm
Code:
def preprocess_html(self, soup):
    mtag = '<meta http-equiv="Content-Language" content="en-US"/>\n<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>'
    soup.head.insert(0, mtag)
What is the reason for inserting the meta tag?
|
That was my early experiment with soup, but now it is not needed and I do not put it in new recipes. You can just ignore it.
Quote:
Originally Posted by TonytheBookworm
Code:
for item in soup.findAll(style=True):
    del item['style']
Why is the above used? It appears to remove all instances of style, but why is it needed?
|
This is needed to remove all inline style attributes, which usually specify text properties such as font or color. We need the text to be as raw as possible, without any styles whatsoever.
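To see the effect of that loop in isolation, here is a minimal stand-alone sketch. It is an assumption-laden illustration, not the recipe code: it uses the standard library's xml.etree.ElementTree instead of the BeautifulSoup object that calibre passes to preprocess_html, and the sample markup and the strip_styles helper are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (not taken from the real site).
SAMPLE = (
    '<div style="color:red">'
    '<p style="font-size:20px" class="intro">Hello</p>'
    '<span>world</span>'
    '</div>'
)


def strip_styles(markup):
    """Return the markup with every inline style attribute removed."""
    root = ET.fromstring(markup)
    # iter() walks the root element and all of its descendants.
    for element in root.iter():
        # pop() with a default is a no-op when the attribute is absent.
        element.attrib.pop('style', None)
    return ET.tostring(root, encoding='unicode')


cleaned = strip_styles(SAMPLE)
print(cleaned)
```

Other attributes, such as class, are left alone; only the inline styling goes away, which is the same result the findAll(style=True) loop aims for.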
Quote:
Code:
self.append_page(soup, soup.body, 3)
I'm not really clear on this. It appears to me that you are taking the whole soup and appending it to the body of the soup at position 3?
Code:
pager = soup.find('div', attrs={'class': 'toolbar_fat'})
if pager:
    pager.extract()
I looked in the code and didn't see why this extraction is needed, because the navigation appears to be inside toolbar_fat_next.
|
This would require a somewhat longer explanation, but in short, I'm merging multipage articles into one. The other code example removes the div with class toolbar_fat; I remove it because we do not need the navigation once everything is tied into one uniform article.
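The idea of merging pages can be sketched without calibre or BeautifulSoup at all. Everything below is hypothetical: the PAGES dictionary stands in for the website, and merge_pages is an illustrative helper that uses a simple loop, so it is not the recipe's append_page method. It only shows the general pattern of following the "next" link, keeping each page's body, and dropping the pager.

```python
# Hypothetical stand-in for the website: each URL maps to the page's
# paragraphs, its pager text, and the URL of the next page (None on the last).
PAGES = {
    '/article?p=1': {'body': ['Intro.'], 'pager': 'Page 1 of 3', 'next': '/article?p=2'},
    '/article?p=2': {'body': ['Middle.'], 'pager': 'Page 2 of 3', 'next': '/article?p=3'},
    '/article?p=3': {'body': ['End.'], 'pager': 'Page 3 of 3', 'next': None},
}


def merge_pages(url, pages):
    """Collect the body of `url` and of every page reachable via 'next'."""
    merged = []
    seen = set()  # guards against a pager that links back to an earlier page
    while url is not None and url not in seen:
        seen.add(url)
        page = pages[url]
        # Only the body is kept; the pager is dropped because the merged
        # article no longer needs navigation.
        merged.extend(page['body'])
        url = page['next']
    return merged


article = merge_pages('/article?p=1', PAGES)
print('\n'.join(article))
```

In a real recipe the "next" URL would come from the pager markup and each page would have to be downloaded and parsed, which is where the extra arguments in the append_page call come in.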