08-07-2011, 06:04 PM | #1 |
Enthusiast
Posts: 37
Karma: 10
Join Date: Jul 2011
Device: Kindle
|
File name formatting
I've searched as much as I could but haven't had any luck with finding a solution.
I'm trying to make a batch file that runs a Perl script to grab a daily news site's stories, then converts the output file to .mobi and sends it to my Kindle. What I need help with is setting up the input and output file names. The input file name is something like example-yyyymmdd.html, and I don't know how to build that in the batch file. %DATE% gives me the name of the day as well. Any tips? |
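One way to sidestep %DATE% (whose output depends on the Windows regional settings) is to build the names in a small script and call that from the batch file. A minimal Python sketch, assuming the example- prefix and the .html/.mobi extensions mentioned in the post; the function name dated_names is made up for illustration:

```python
import datetime


def dated_names(day=None, prefix="example-"):
    """Return (input_name, output_name) for the given date (default: today)."""
    day = day or datetime.date.today()
    stamp = day.strftime("%Y%m%d")  # yyyymmdd, without the day name
    return f"{prefix}{stamp}.html", f"{prefix}{stamp}.mobi"


if __name__ == "__main__":
    html_name, mobi_name = dated_names()
    print(html_name, mobi_name)
```

The two names could then be passed on to whatever conversion command the batch file already runs.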
08-07-2011, 06:29 PM | #2 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
Why not just use a recipe?
|
08-07-2011, 07:49 PM | #3 |
Enthusiast
Posts: 37
Karma: 10
Join Date: Jul 2011
Device: Kindle
|
Good question, I should have clarified. The site has a really bad RSS feed, in my opinion. A recipe would be the ideal solution, but I don't know if there's a way around the feed, so I kind of gave up on that.
Instead, the Perl script uses the mobile version of the site and downloads all the articles into a single HTML file. |
08-08-2011, 09:47 AM | #4 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
|
|
08-10-2011, 12:41 AM | #5 |
Enthusiast
Posts: 37
Karma: 10
Join Date: Jul 2011
Device: Kindle
|
OK, awesome. I've tried to figure out how to use parse_index. There is a mobile version of the site that would work perfectly, but I can't really figure it out from the official guide.
Would you mind going over the basics of what I should do, or pointing me to another post that outlines it? Thanks |
08-10-2011, 09:55 AM | #6 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
|
|
08-10-2011, 11:33 AM | #7 |
Enthusiast
Posts: 37
Karma: 10
Join Date: Jul 2011
Device: Kindle
|
Well, here is what I tried to hack together with my limited knowledge and the help of other recipes. It keeps saying "ValueError: No articles found, aborting".
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag, NavigableString

class WSWS(BasicNewsRecipe):
    title = 'World Socialist Web Site'
    description = 'WSWS'
    no_stylesheets = True
    remove_javascript = True

    def parse_index(self):
        articles = []
        soup = self.index_to_soup('http://wsws.org/mobile/')
        cover = None
        feeds = []
        for section in soup.findAll('div', attrs={'class':'content'}):
            section_title = self.tag_to_string(section.find('b'))
            articles = []
            for post in section.findAll('a', href=True):
                url = post['href']
                if url.startswith('/'):
                    url = 'http://www.wsws.org'+url
                title = self.tag_to_string(post)
            if articles:
                feeds.append((section_title, articles))
        return feeds
|
08-10-2011, 11:47 AM | #8 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
That's pretty clear. I looked at your code. You're not appending anything to your list of articles. Look at the example you started from and see where they have the articles.append line. Notice how you've appended to the list of feeds with feeds.append but not to the article list.
|
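For reference, parse_index is expected to return a list of (section title, list of article dicts) pairs, so every link has to be appended to the article list before the section is appended to the feeds. A rough sketch of just that bookkeeping, without calibre or any page fetching; the helper name build_feeds and the sample data are hypothetical:

```python
def build_feeds(sections):
    """sections: list of (section_title, [(link_title, url), ...]) pairs."""
    feeds = []
    for section_title, links in sections:
        articles = []
        for title, url in links:
            if url.startswith('/'):
                url = 'http://www.wsws.org' + url
            # The step missing from the recipe above: record each article.
            articles.append({'title': title, 'url': url})
        # Only keep sections that actually produced articles.
        if articles:
            feeds.append((section_title, articles))
    return feeds


if __name__ == "__main__":
    sample = [('News', [('Story one', '/articles/one.shtml')]), ('Empty', [])]
    print(build_feeds(sample))
```

If every section ends up with an empty article list, the returned feeds list is empty too, which is consistent with the "No articles found" error.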
08-10-2011, 12:13 PM | #9 |
Enthusiast
Posts: 37
Karma: 10
Join Date: Jul 2011
Device: Kindle
|
OK, I removed that by accident, thinking it was a problem, but even after putting it back in I get the same error.
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag, NavigableString

class WSWS(BasicNewsRecipe):
    title = 'World Socialist Web Site'
    __author__ = 'International Committee of The Fourth International'
    description = 'WSWS'
    no_stylesheets = True
    remove_javascript = True

    def parse_index(self):
        articles = []
        soup = self.index_to_soup('http://wsws.org/mobile/')
        cover = None
        feeds = []
        for section in soup.findAll('div', attrs={'class':'content'}):
            section_title = self.tag_to_string(section.find('b'))
            articles = []
            for post in section.findAll('a', href=True):
                url = post['href']
                if url.startswith('/'):
                    url = 'http://www.wsws.org'+url
                title = self.tag_to_string(post)
                if str(post).find('class=') > 0:
                    klass = post['class']
                    if klass != "":
                        self.log()
                        self.log('--> post: ', post)
                        self.log('--> url: ', url)
                        self.log('--> title: ', title)
                        self.log('--> class: ', klass)
                        articles.append({'title':title, 'url':url})
            if articles:
                feeds.append((section_title, articles))
        return feeds
|
08-10-2011, 01:21 PM | #10 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Then the next step is to see where the recipe is failing. Try adding some print statements:
Quote:
Last edited by Starson17; 08-10-2011 at 02:41 PM. |
|
08-10-2011, 01:38 PM | #11 |
Enthusiast
Posts: 37
Karma: 10
Join Date: Jul 2011
Device: Kindle
|
I still seem to get the same error when using the code you posted. Sorry, I don't really have much experience with this stuff, so I'm not sure if the details changed.
|
08-10-2011, 02:16 PM | #12 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
My code didn't change the way your code works. It printed out the different stages of your code so you could see where it fails. What output did you get?
|
08-10-2011, 02:27 PM | #13 |
Enthusiast
Posts: 37
Karma: 10
Join Date: Jul 2011
Device: Kindle
|
I see, well this is what I get.
Spoiler:
|
08-10-2011, 02:43 PM | #14 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
That means you didn't hit even one of the print statements.
There are no div tags with class="content" on that page, so the body of this loop is never entered. Code:
for section in soup.findAll('div', attrs={'class':'content'}): |
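To find a selector that does match, it can help to list which class values the page's div tags actually carry. A rough sketch using only the Python standard library rather than calibre's BeautifulSoup; the sample HTML and the class name story are invented for illustration and are not taken from wsws.org:

```python
from collections import Counter
from html.parser import HTMLParser


class DivClassCounter(HTMLParser):
    """Count the class attribute values seen on <div> start tags."""

    def __init__(self):
        super().__init__()
        self.classes = Counter()

    def handle_starttag(self, tag, attrs):
        if tag == 'div':
            # attrs is a list of (name, value) pairs; a div without a
            # class attribute is counted under the key None.
            self.classes[dict(attrs).get('class')] += 1


def div_classes(html):
    """Return a Counter mapping div class values to how often they occur."""
    parser = DivClassCounter()
    parser.feed(html)
    return parser.classes


if __name__ == "__main__":
    sample = '<div class="story"><a href="/a">A</a></div><div><b>x</b></div>'
    print(div_classes(sample))
```

A count of zero for 'content' on the saved page source would confirm that the findAll call has nothing to iterate over.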
08-10-2011, 03:20 PM | #15 |
Enthusiast
Posts: 37
Karma: 10
Join Date: Jul 2011
Device: Kindle
|
I did, but I have no clue what that section means. Like I said, I don't have any experience with this stuff.
|