#16
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
#17
Enthusiast
Posts: 37
Karma: 10
Join Date: Jul 2011
Device: Kindle
#18
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
I'll be nice and look at your page - hold on .... There aren't any div tags like that. You should probably be doing something like this: Code:
for section in soup.findAll('li'):
Code:
for post in section.findAll('a', href=True):
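To see how those two loops fit together outside of calibre, here is a minimal sketch. It swaps calibre's BeautifulSoup for Python 3's standard-library `html.parser`, so it is not recipe code; the sample markup, `LinkCollector`, and `collect_links` are all made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample markup: each <li> is one section containing links.
SAMPLE = """
<ul>
  <li><b>News</b> <a href="/articles/one.html">First story</a>
      <a href="/articles/two.html">Second story</a></li>
  <li><b>Arts</b> <a href="/articles/three.html">A review</a>
      <a name="top">anchor without href</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Group every <a> that has an href under the <li> that contains it."""

    def __init__(self):
        super().__init__()
        self.sections = []    # one list of hrefs per <li>
        self.in_li = False

    def handle_starttag(self, tag, attrs):
        if tag == 'li':
            # Outer level: a new section starts.
            self.in_li = True
            self.sections.append([])
        elif tag == 'a' and self.in_li:
            # Inner level: keep only links that actually carry an href.
            href = dict(attrs).get('href')
            if href:
                self.sections[-1].append(href)

    def handle_endtag(self, tag):
        if tag == 'li':
            self.in_li = False


def collect_links(html):
    """Return a list of href lists, one per <li> in the markup."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.sections


for links in collect_links(SAMPLE):
    for href in links:
        print(href)
```

The outer loop corresponds to `soup.findAll('li')` and the inner one to `section.findAll('a', href=True)`: the second only ever looks inside the section the first is currently on.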
#19
Enthusiast
Posts: 37
Karma: 10
Join Date: Jul 2011
Device: Kindle
I have an idea of what the <div> tag is; I just never understand any of the recipe code that refers to it.
Thanks, I appreciate you looking at it. That makes a lot of sense, and that Firebug extension is a great help. Here is my next problem: I really don't understand the whole indent thing in Python. It always seems to give me errors. For example, when I add Code:
for post in section.findAll('a', href=True):
#20
Grand Sorcerer
Posts: 13,703
Karma: 79983758
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
Python uses indentation to "nest" code. Be consistent and use spaces rather than tabs.
Not being an expert, I think the idea is that you want something close to this: Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag, NavigableString

class WSWS(BasicNewsRecipe):
    title = 'World Socialist Web Site'
    __author__ = 'International Committee of The Fourth International'
    description = 'WSWS'
    no_stylesheets = True
    remove_javascript = True

    def parse_index(self):
        articles = []
        soup = self.index_to_soup('http://wsws.org/mobile/')
        cover = None
        feeds = []
        for section in soup.findAll('li'):
            section_title = self.tag_to_string(section.find('b'))
            articles = []
            for post in section.findAll('a', href=True):
                url = post['href']
                if url.startswith('/'):
                    url = 'http://www.wsws.org'+url
                title = self.tag_to_string(post)
                if str(post).find('class=') > 0:
                    klass = post['class']
                    if klass != "":
                        self.log()
                        self.log('--> post: ', post)
                        self.log('--> url: ', url)
                        self.log('--> title: ', title)
                        self.log('--> class: ', klass)
                        articles.append({'title':title, 'url':url})
            if articles:
                feeds.append((section_title, articles))
        return feeds
Code:
            for post in section.findAll('a', href=True):
                url = post['href']
                if url.startswith('/'):
                    url = 'http://www.wsws.org'+url
                title = self.tag_to_string(post)
                if str(post).find('class=') > 0:
                    klass = post['class']
                    if klass != "":
                        self.log()
                        self.log('--> post: ', post)
                        self.log('--> url: ', url)
                        self.log('--> title: ', title)
                        self.log('--> class: ', klass)
                        articles.append({'title':title, 'url':url})
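As a small standalone illustration of how indentation decides which block a statement belongs to, here is a sketch that runs without calibre. The function `group_links` and its input data are invented for this example; only the URL prefix comes from the recipe.

```python
def group_links(sections):
    """Turn (title, [urls]) pairs into feeds, skipping empty sections."""
    feeds = []
    for title, links in sections:
        # Indented under the outer `for`: runs once per section.
        articles = []
        for url in links:
            # Indented under the inner `for`: runs once per link.
            if url.startswith('/'):
                # Indented under the `if`: runs only for relative links.
                url = 'http://www.wsws.org' + url
            # Lined up with the `if`, so it runs for every link.
            articles.append(url)
        # Lined up with the inner `for`: runs after that loop has finished.
        if articles:
            feeds.append((title, articles))
    # Lined up with the outer `for`: runs once, after all sections.
    return feeds


print(group_links([('News', ['/a.html', 'http://example.org/b.html']),
                   ('Empty', [])]))
```

Moving a line one level left or right changes when it runs. If `articles.append(url)` were indented under the `if`, for instance, only the relative links would be collected.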
#21
Enthusiast
Posts: 37
Karma: 10
Join Date: Jul 2011
Device: Kindle
Thanks for the suggestion, Peter. After trying that out, it gets stuck at klass = post['class']. I'm not sure that I need those lines, because they are for getting rid of extra links and my links seem pretty straightforward. I also think that klass has something to do with the other specific page, but I'm not sure.
Ugh, I'm still having a hard time with indentation errors and I'm not sure what to do. Calibre will tell me that I have an error on line 29, so I look at it in Komodo Edit to match up the line numbers. I have gone as far as deleting line 29 but still get the error. I don't know what the problem could be.
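One common cause of an indentation error that survives deleting the reported line is a mix of tabs and spaces somewhere in the file. That is only a guess here, since the recipe file itself isn't shown. Below is a rough Python 3 sketch for checking a file's text without running it; the helper names `find_tab_indents` and `first_syntax_error` are made up for this example.

```python
import re


def find_tab_indents(source):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        leading = re.match(r'[ \t]*', line).group(0)
        if '\t' in leading:
            hits.append(number)
    return hits


def first_syntax_error(source, filename='recipe'):
    """Compile the text without executing it.

    Returns (line number, message) for the first error, or None if it compiles.
    IndentationError and TabError are subclasses of SyntaxError.
    """
    try:
        compile(source, filename, 'exec')
    except SyntaxError as err:
        return err.lineno, err.msg
    return None


# Line 2 is indented with four spaces, line 3 with a tab.
sample = "for x in [1]:\n    y = x\n\tz = x\n"
print(find_tab_indents(sample))    # → [3]
print(first_syntax_error(sample))
```

To check a real file, read it with `open(path).read()` and pass the text to both helpers. Any line numbers reported by `find_tab_indents` are worth re-indenting with spaces before looking further.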
#22
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Code:
ebook-convert _Test_1.recipe _Test_1 --test -vv > _Test.txt