File name formatting

yoss15 · 08-07-2011, 07:04 PM

I've searched as much as I could but haven't had any luck with finding a solution.

I'm trying to make a batch file that runs a Perl script to grab a daily news site's stories, then converts the output file to .mobi and sends it to my Kindle.

What I need help with is setting up the input and output file names. The input file name is something like example-yyyymmdd.html, and I don't know how to build it. %DATE% also includes the name of the day, which I don't want.

Any tips?
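For the file-name part of the question, here is a minimal sketch in Python rather than batch. It assumes Python is available on the machine, which the thread does not confirm. The function name `dated_names` and the `prefix` parameter are made up for illustration. The point is that `strftime("%Y%m%d")` gives the yyyymmdd stamp without the weekday name that %DATE% adds.

```python
from datetime import date


def dated_names(prefix="example", day=None):
    """Return (input_name, output_name) for the given day (today by default)."""
    day = day or date.today()
    stamp = day.strftime("%Y%m%d")  # yyyymmdd, no weekday name
    return f"{prefix}-{stamp}.html", f"{prefix}-{stamp}.mobi"


if __name__ == "__main__":
    html_name, mobi_name = dated_names()
    print(html_name, mobi_name)
```

The two names could then be passed to whatever conversion step follows the download.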

Manichean · 08-07-2011, 07:29 PM

Why not just use a recipe?

yoss15 · 08-07-2011, 08:49 PM

Quote:

Originally Posted by Manichean

Why not just use a recipe?

Good question, I should have clarified. In my opinion the site has a really bad RSS feed. A recipe would be the ideal solution, but I don't know whether there is any way around the feed, so I kind of gave up on that.

Instead, the Perl script uses the mobile version of the site and downloads all the articles into a single HTML file.

Starson17 · 08-08-2011, 10:47 AM

Quote:

Originally Posted by yoss15

In my opinion the site has a really bad RSS feed. A recipe would be the ideal solution, but I don't know whether there is any way around the feed, so I kind of gave up on that.

Instead, the Perl script uses the mobile version of the site and downloads all the articles into a single HTML file.

There are two basic methods of grabbing the list of articles for a recipe-created ebook. The first is to use the RSS feed. If the RSS feed is bad, or there isn't one, you use parse_index. Parse_index will read a web page and process links to articles found on that page to create a virtual RSS feed, which is then passed into the recipe system.
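To make the "virtual RSS feed" idea concrete, here is a rough standalone sketch of the data shape parse_index hands back: a list of (section title, list of article dicts) tuples, where each dict has at least 'title' and 'url'. It deliberately does not use calibre. The class `LinkCollector`, the function `build_feeds`, and the default section title are invented for this illustration, and Python's standard html.parser stands in for the soup a real recipe would get from index_to_soup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (url, text) pairs for every <a href=...> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Turn relative links into absolute ones.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            if title:
                self.links.append((self._href, title))
            self._href = None


def build_feeds(html, base_url, section_title="Front page"):
    """Return a parse_index-style list: [(section_title, [article, ...])]."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    articles = [{"title": t, "url": u} for u, t in collector.links]
    return [(section_title, articles)] if articles else []
```

In an actual recipe the same list would be built inside parse_index and returned from it.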

yoss15 · 08-10-2011, 01:41 AM

OK, awesome. I've tried to figure out how to use parse_index. I have a mobile version of the site that would work perfectly, but I can't really figure it out from the official guide.

Would you mind going over the basics of what I should do, or pointing me to another post that outlines it?

Thanks

Starson17 · 08-10-2011, 10:55 AM

Quote:

Originally Posted by yoss15

OK, awesome. I've tried to figure out how to use parse_index. I have a mobile version of the site that would work perfectly, but I can't really figure it out from the official guide.

Would you mind going over the basics of what I should do, or pointing me to another post that outlines it?

Thanks

I can answer specific questions if you have any. The official guide is an excellent primer. Another good option is to look at any recipe that uses parse_index. Many are posted here, and many of the builtins use it.

yoss15 · 08-10-2011, 12:33 PM

Well, here is what I tried to hack together with my limited knowledge and the use of other recipes. It keeps saying "ValueError: No articles found, aborting".

Code:

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag, NavigableString

class WSWS(BasicNewsRecipe):

    title      = 'World Socialist Web Site'
    description = 'WSWS'

    no_stylesheets = True
    remove_javascript     = True

    def parse_index(self):
        articles = []
        soup = self.index_to_soup('http://wsws.org/mobile/')
        cover = None
        feeds = []
        for section in soup.findAll('div', attrs={'class':'content'}):
            section_title = self.tag_to_string(section.find('b'))
            articles = []
            for post in section.findAll('a', href=True):
                url = post['href']
                if url.startswith('/'):
                    url = 'http://www.wsws.org'+url
                    title = self.tag_to_string(post)
            if articles:
                feeds.append((section_title, articles))
        return feeds

Starson17 · 08-10-2011, 12:47 PM

Quote:

Originally Posted by yoss15

Well, here is what I tried to hack together with my limited knowledge and the use of other recipes. It keeps saying "ValueError: No articles found, aborting".

That's pretty clear. I looked at your code. You're not appending anything to your list of articles. Look at the example you started from and see where they have the articles.append line. Notice how you've appended to the list of feeds with feeds.append but not to the article list.
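A small standalone sketch of the point about appending, with no calibre involved. The function name `section_feeds` and the sample paths are made up; the base URL is the one from the posted recipe. The articles list only fills up if each link is actually appended to it, and an empty list means the section is dropped.

```python
def section_feeds(section_title, links, base="http://www.wsws.org"):
    """Build a one-section feed list from (href, title) pairs."""
    articles = []
    for href, title in links:
        if href.startswith("/"):
            url = base + href
            # Without this append, articles stays empty forever.
            articles.append({"title": title, "url": url})
    feeds = []
    if articles:  # an empty section is never added
        feeds.append((section_title, articles))
    return feeds
```

If every section ends up empty, the returned list is empty too, which is the situation behind "No articles found, aborting".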

yoss15 · 08-10-2011, 01:13 PM

OK, I removed that by accident, thinking it was a problem, but even after putting it back in I seem to get the same error.

Code:

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag, NavigableString

class WSWS(BasicNewsRecipe):

    title      = 'World Socialist Web Site'
    __author__ = 'International Committee of The Fourth International'
    description = 'WSWS'

    no_stylesheets = True
    remove_javascript     = True

    def parse_index(self):
        articles = []
        soup = self.index_to_soup('http://wsws.org/mobile/')
        cover = None
        feeds = []
        for section in soup.findAll('div', attrs={'class':'content'}):
            section_title = self.tag_to_string(section.find('b'))
            articles = []
            for post in section.findAll('a', href=True):
                url = post['href']
                if url.startswith('/'):
                    url = 'http://www.wsws.org'+url
                    title = self.tag_to_string(post)
                    if str(post).find('class=') > 0:
                        klass = post['class']
                        if klass != "":
                            self.log()
                            self.log('--> post:  ', post)
                            self.log('--> url:   ', url)
                            self.log('--> title: ', title)
                            self.log('--> class: ', klass)
                            articles.append({'title':title, 'url':url})
            if articles:
                feeds.append((section_title, articles))
        return feeds

Starson17 · 08-10-2011, 02:21 PM

Quote:

Originally Posted by yoss15

I seem to get the same error.

Then the next step is to see where the recipe is failing. Try adding some print statements:

Quote:

Spoiler:

yoss15 · 08-10-2011, 02:38 PM

It still seems to be the same error when using the code you posted. Sorry, I don't really have much experience with this stuff, so I'm not sure if the details changed.

Starson17 · 08-10-2011, 03:16 PM

Quote:

Originally Posted by yoss15

Still seems to be the same error when using the code you posted. Sorry I don't really have much experience with this stuff so I'm not sure if the details changed.

My code didn't change the way your code works. It printed out the different stages of your code so you could see where it fails. What output did you get?
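The general idea of stage-by-stage printing can be sketched like this. It is not the code from the spoiler above, which is not reproduced in this thread, just a hypothetical standalone function (`traced_feeds`, with a swappable `log` callable) that reports how far the loops get.

```python
def traced_feeds(sections, log=print):
    """Build feeds from (title, [(href, text), ...]) pairs, logging each stage."""
    log("sections found:", len(sections))
    feeds = []
    for title, links in sections:
        log("  section:", title, "links:", len(links))
        articles = []
        for href, text in links:
            if not href.startswith("/"):
                log("    skipped:", href)
                continue
            articles.append({"title": text, "url": href})
            log("    appended:", href)
        if articles:
            feeds.append((title, articles))
    log("feeds built:", len(feeds))
    return feeds
```

If the output shows zero sections and nothing from inside the loops, the outer loop never ran, and the selector that finds the sections is the thing to check.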

yoss15 · 08-10-2011, 03:27 PM

I see, well this is what I get.

Spoiler:

Starson17 · 08-10-2011, 03:43 PM

Quote:

Originally Posted by yoss15

I see, well this is what I get.

That means you didn't hit even one of the print statements.

There are no div tags having class="content" so this part is never entered.

Code:

for section in soup.findAll('div', attrs={'class':'content'}):

Have you looked at your source page?
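One way to "look at the source page" without reading raw HTML by eye is to count which tag and class combinations actually occur in it. The sketch below uses only Python's standard library. `ClassCounter` and `count_tag_classes` are names invented here, and the HTML you feed it would be the saved source of the index page, not anything shown in this thread.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count (tag, class) pairs so you can see which selectors can match."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes or ["(no class)"]:
            self.counts[(tag, name)] += 1


def count_tag_classes(html):
    """Return a Counter keyed by (tag, class name) for the given HTML text."""
    counter = ClassCounter()
    counter.feed(html)
    counter.close()
    return counter.counts
```

A count of zero for ('div', 'content') would confirm that findAll('div', attrs={'class':'content'}) has nothing to loop over, and the most common pairs suggest what to select instead.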

yoss15 · 08-10-2011, 04:20 PM

I did, but I have no clue what that section means. Like I said, I don't have any experience with this stuff.


yoss15 · 08-10-2011, 03:27 PM (#13, spoiler contents)

I see, well this is what I get.

Spoiler:

Fetch news from World Socialist Web Site
Resolved conversion options
calibre version: 0.8.13
{'asciiize': False, 'author_sort': None, 'authors': None, 'base_font_size': 0, 'book_producer': None, 'change_justification': 'original', 'chapter': None, 'chapter_mark': 'pagebreak', 'comments': None, 'cover': None, 'debug_pipeline': None, 'dehyphenate': True, 'delete_blank_paragraphs': True, 'disable_font_rescaling': False, 'dont_compress': False, 'dont_download_recipe': False, 'duplicate_links_in_toc': False, 'enable_heuristics': False, 'extra_css': None, 'extract_to': None, 'fix_indents': True, 'font_size_mapping': None, 'format_scene_breaks': True, 'html_unwrap_factor': 0.4, 'input_encoding': None, 'input_profile': <calibre.customize.profiles.InputProfile object at 0x1086d9510>, 'insert_blank_line': False, 'insert_blank_line_size': 0.5, 'insert_metadata': False, 'isbn': None, 'italicize_common_cases': True, 'keep_ligatures': False, 'kindlegen': False, 'language': None, 'level1_toc': None, 'level2_toc': None, 'level3_toc': None, 'line_height': 0, 'linearize_tables': False, 'lrf': False, 'margin_bottom': 5.0, 'margin_left': 5.0, 'margin_right': 5.0, 'margin_top': 5.0, 'markup_chapter_headings': True, 'max_toc_links': 50, 'minimum_line_height': 120.0, 'mobi_ignore_margins': False, 'mobi_toc_at_start': False, 'no_chapters_in_toc': False, 'no_inline_navbars': True, 'no_inline_toc': False, 'output_profile': <calibre.customize.profiles.KindleOutput object at 0x1086d99d0>, 'page_breaks_before': None, 'password': None, 'personal_doc': '[PDOC]', 'prefer_author_sort': False, 'prefer_metadata_cover': False, 'pretty_print': False, 'pubdate': None, 'publisher': None, 'rating': None, 'read_metadata_from_opf': None, 'remove_fake_margins': True, 'remove_first_image': False, 'remove_paragraph_spacing': False, 'remove_paragraph_spacing_indent_size': 1.5, 'renumber_headings': True, 'replace_scene_breaks': '', 'rescale_images': False, 'series': None, 'series_index': None, 'smarten_punctuation': False, 'sr1_replace': '', 'sr1_search': '', 'sr2_replace': '', 'sr2_search': '', 'sr3_replace': '', 'sr3_search': '', 'tags': None, 'test': False, 'timestamp': None, 'title': None, 'title_sort': None, 'toc_filter': None, 'toc_threshold': 6, 'toc_title': None, 'unwrap_lines': True, 'use_auto_toc': False, 'username': None, 'verbose': 2}
Python function terminated unexpectedly:
No articles found, aborting
InputFormatPlugin: Recipe Input running
Traceback (most recent call last):
  File "/Applications/calibre.app/Contents/Resources/Python/lib/python2.7/site.py", line 147, in main
    return run_entry_point()
  File "/Applications/calibre.app/Contents/Resources/Python/lib/python2.7/site.py", line 116, in run_entry_point
    return getattr(pmod, func)()
  File "site-packages/calibre/utils/ipc/worker.py", line 181, in main
  File "site-packages/calibre/gui2/convert/gui_conversion.py", line 25, in gui_convert
  File "site-packages/calibre/ebooks/conversion/plumber.py", line 937, in run
  File "site-packages/calibre/customize/conversion.py", line 204, in __call__
  File "site-packages/calibre/web/feeds/input.py", line 105, in convert
  File "site-packages/calibre/web/feeds/news.py", line 737, in download
  File "site-packages/calibre/web/feeds/news.py", line 882, in build_index
ValueError: No articles found, aborting
