#16 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
#17 |
Enthusiast
Posts: 37
Karma: 10
Join Date: Jul 2011
Device: Kindle
#18 | |
Wizard
Quote:
I'll be nice and look at your page - hold on ... There aren't any div tags like that. You should probably be doing something like this:
Code:
    for section in soup.findAll('li'):
Code:
    for post in section.findAll('a', href=True):
#19 |
Enthusiast
I have an idea of what the <div> tag is; I just never understand any of the recipe code referring to it.
Thanks, I appreciate you looking at it. That makes a lot of sense, and that Firebug extension is a great help. Here is my next problem: I really don't understand the whole indent thing in Python. It always seems to give me errors. For example, when I add
Code:
    for post in section.findAll('a', href=True):
#20 |
Grand Sorcerer
Posts: 12,290
Karma: 74007256
Join Date: Nov 2007
Location: Toronto
Device: Nexus 7, Clara, Touch, Tolino EPOS
|
Python uses indentation to "nest" code: the lines that belong to a for, if, or def are the ones indented under it. Be consistent and use spaces rather than tabs.
I'm no expert, but I think you want something close to this:
Code:
    from calibre.web.feeds.news import BasicNewsRecipe
    from calibre.ebooks.BeautifulSoup import Tag, NavigableString

    class WSWS(BasicNewsRecipe):

        title = 'World Socialist Web Site'
        __author__ = 'International Committee of The Fourth International'
        description = 'WSWS'
        no_stylesheets = True
        remove_javascript = True

        def parse_index(self):
            articles = []
            soup = self.index_to_soup('http://wsws.org/mobile/')
            cover = None
            feeds = []
            # Outer loop: each <li> is treated as one section.
            for section in soup.findAll('li'):
                section_title = self.tag_to_string(section.find('b'))
                articles = []
                # Inner loop: every link inside the current section.
                for post in section.findAll('a', href=True):
                    url = post['href']
                    if url.startswith('/'):
                        url = 'http://www.wsws.org'+url
                    title = self.tag_to_string(post)
                    # Only links that carry a non-empty class are logged and kept.
                    if str(post).find('class=') > 0:
                        klass = post['class']
                        if klass != "":
                            self.log()
                            self.log('--> post: ', post)
                            self.log('--> url: ', url)
                            self.log('--> title: ', title)
                            self.log('--> class: ', klass)
                            articles.append({'title':title, 'url':url})
                # Back at section level: add the section only if it has articles.
                if articles:
                    feeds.append((section_title, articles))
            return feeds
Code:
    for post in section.findAll('a', href=True):
        url = post['href']
        if url.startswith('/'):
            url = 'http://www.wsws.org'+url
        title = self.tag_to_string(post)
        if str(post).find('class=') > 0:
            klass = post['class']
            if klass != "":
                self.log()
                self.log('--> post: ', post)
                self.log('--> url: ', url)
                self.log('--> title: ', title)
                self.log('--> class: ', klass)
                articles.append({'title':title, 'url':url})
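To make the nesting concrete, here is a minimal sketch that mirrors the loop structure of the recipe on plain Python data, with no calibre or BeautifulSoup involved. The function name `build_feeds`, the `BASE_URL` constant, and the sample sections are assumptions made up for illustration; only the loop shape and the `(section_title, articles)` result follow the posted code. It is written for Python 3.

```python
BASE_URL = 'http://www.wsws.org'  # same prefix as in the posted recipe


def build_feeds(sections):
    """Return [(section_title, [{'title': ..., 'url': ...}, ...]), ...].

    `sections` is hypothetical sample data: a list of
    (section_title, [(link_text, href), ...]) tuples.
    """
    feeds = []
    for section_title, links in sections:   # outer loop: one pass per section
        articles = []                        # reset for every section
        for text, href in links:             # inner loop: one pass per link
            url = href
            if url.startswith('/'):          # only relative links get the prefix
                url = BASE_URL + url
            # Indented under the inner "for", so this runs once per link.
            articles.append({'title': text, 'url': url})
        # Indented under the outer "for" only, so this runs once per section.
        if articles:
            feeds.append((section_title, articles))
    # Indented under "def" only, so this runs once, after both loops finish.
    return feeds


sample = [
    ('News', [('First story', '/articles/one.html'),
              ('Second story', 'http://www.wsws.org/articles/two.html')]),
    ('Empty section', []),
]
feeds = build_feeds(sample)
for section_title, articles in feeds:
    print(section_title, len(articles))
```

Moving the `if articles:` line four spaces to the right would put it inside the inner loop, so the section would be appended once per link instead of once per section. That is the kind of change that indentation alone controls.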
#21 |
Enthusiast
Thanks for the suggestion, Peter. After trying that out, it gets stuck at klass = post['class']. I'm not sure that I need those lines, because they are for getting rid of extra links and my links seem pretty straightforward. I also think that klass has something to do with the other specific page, but I'm not sure.
Ugh, I'm still having a hard time with indentation errors and I'm not sure what to do. Calibre will tell me that I have an error on line 29, so I look at it in Komodo Edit to match up the line numbers. I have gone as far as deleting line 29 but still get the error, and I don't know what the problem could be.
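One way to narrow down an indentation error like this is to let Python compile the recipe text without running it and, separately, list every line whose leading whitespace contains a tab. The sketch below is a hedged helper for Python 3, not part of calibre; the function names `tab_indented_lines` and `first_syntax_error` and the sample source are made up for illustration.

```python
def tab_indented_lines(source):
    """Return the 1-based numbers of lines whose leading whitespace contains a tab."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        indent = line[:len(line) - len(line.lstrip())]
        if '\t' in indent:
            hits.append(lineno)
    return hits


def first_syntax_error(source, filename='<recipe>'):
    """Compile without running; return (line number, message), or None if it compiles."""
    try:
        compile(source, filename, 'exec')
    except SyntaxError as err:  # IndentationError and TabError are subclasses
        return err.lineno, err.msg
    return None


# Hypothetical sample: line 3 is indented with a tab, lines 2 and 4 with spaces.
sample = (
    "for section in ['a', 'b']:\n"
    "    title = section\n"
    "\tprint(title)\n"
    "    print('done')\n"
)
print(tab_indented_lines(sample))
print(first_syntax_error(sample))
```

The line number in the error is where the parser noticed the inconsistency, which can be after the line that introduced it, so deleting the reported line may not make the error go away. Checking the lines just above it, and any line the first function flags, is usually a better starting point.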
#22 | |
Wizard
Quote:
Code:
    ebook-convert _Test_1.recipe _Test_1 --test -vv > _Test.txt