Old 08-07-2011, 06:04 PM   #1
yoss15
Enthusiast
yoss15 began at the beginning.
 
Posts: 37
Karma: 10
Join Date: Jul 2011
Device: Kindle
File name formatting

I've searched as much as I could but haven't had any luck with finding a solution.

I'm trying to write a batch file that runs a Perl script to grab a daily news site's stories, converts the output file to .mobi, and sends it to my Kindle.

What I need help with is setting up the input and output file names. The input file name is something like example-yyyymmdd.html, and I don't know how to build it: %DATE% gives me the name of the day as well.

Any tips?
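
For reference, here is one possible way to build the date-stamped names, sketched in Python rather than batch because the text %DATE% produces depends on the Windows regional settings. The example- prefix comes from the post above; the .mobi output name and the helper name dated_names are assumptions for illustration only.

```python
import datetime


def dated_names(day=None, prefix='example-'):
    """Return (input_name, output_name) for the given date.

    `prefix` and the .mobi output name are illustrative assumptions.
    """
    day = day or datetime.date.today()
    stamp = day.strftime('%Y%m%d')  # yyyymmdd, with no weekday name
    return prefix + stamp + '.html', prefix + stamp + '.mobi'


if __name__ == '__main__':
    html_name, mobi_name = dated_names()
    print(html_name, mobi_name)
```

The two names could then be handed to calibre's ebook-convert command as its input and output arguments.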
Old 08-07-2011, 06:29 PM   #2
Manichean
Wizard
Manichean is the 'tall, dark, handsome stranger' all the fortune-tellers are referring to.

Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Why not just use a recipe?
Old 08-07-2011, 07:49 PM   #3
yoss15
Quote:
Originally Posted by Manichean
Why not just use a recipe?
Good question, I should have clarified. The site has a really bad RSS feed, in my opinion. A recipe would be the ideal solution, but I don't know if there is any way around that, so I kind of gave up on it.

Instead, the Perl script uses the mobile version of the site and downloads all the articles into a single HTML file.
Old 08-08-2011, 09:47 AM   #4
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by yoss15
The site has a really bad RSS feed, in my opinion. A recipe would be the ideal solution, but I don't know if there is any way around that, so I kind of gave up on it.

Instead, the Perl script uses the mobile version of the site and downloads all the articles into a single HTML file.
There are two basic methods of getting the list of articles for a recipe-created ebook. The first is to use the RSS feed. If the RSS feed is bad, or there isn't one, you use parse_index instead. parse_index reads a web page and processes the links to articles found on that page to create a virtual RSS feed, which is then passed into the recipe system.
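
To make the "virtual RSS feed" concrete, here is a rough sketch of the shape parse_index is expected to return, as described in the calibre recipe documentation: a list of (section title, list of article dictionaries) tuples. It is plain Python 3 that runs outside calibre; the section name, the URLs, and the helper count_articles are made-up placeholders, not part of any recipe.

```python
# Shape of the value parse_index is expected to return: a list of
# (section title, list of article dicts) tuples.
# The section name and URLs below are made-up placeholders.
SAMPLE_FEEDS = [
    ('News', [
        {'title': 'First story', 'url': 'http://example.com/a.html',
         'date': '', 'description': ''},
        {'title': 'Second story', 'url': 'http://example.com/b.html',
         'date': '', 'description': ''},
    ]),
]


def count_articles(feeds):
    """Return the total number of articles across all sections."""
    return sum(len(articles) for _title, articles in feeds)


if __name__ == '__main__':
    print(count_articles(SAMPLE_FEEDS))
```

If that count comes out as zero, there is nothing for the recipe system to download.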
Old 08-10-2011, 12:41 AM   #5
yoss15
OK, awesome. I've tried to figure out how to use parse_index. I have a mobile version of the site that would work perfectly, but I can't really figure it out from the official guide.

Would you mind going over the basics of what I should do, or pointing me to another post that outlines it?

Thanks
Old 08-10-2011, 09:55 AM   #6
Starson17
Quote:
Originally Posted by yoss15
OK, awesome. I've tried to figure out how to use parse_index. I have a mobile version of the site that would work perfectly, but I can't really figure it out from the official guide.

Would you mind going over the basics of what I should do, or pointing me to another post that outlines it?

Thanks
I can answer specific questions if you have any. The official guide is an excellent primer. Another good option is to look at any recipe that uses parse_index. Many are posted here, and many of the builtin recipes use it.
Old 08-10-2011, 11:33 AM   #7
yoss15
Well, here is what I tried to hack together with my limited knowledge and pieces of other recipes. It keeps saying "ValueError: No articles found, aborting".

Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag, NavigableString

class WSWS(BasicNewsRecipe):

    title      = 'World Socialist Web Site'
    description = 'WSWS'

    no_stylesheets = True
    remove_javascript     = True

    def parse_index(self):
        articles = []
        soup = self.index_to_soup('http://wsws.org/mobile/')
        cover = None
        feeds = []
        for section in soup.findAll('div', attrs={'class':'content'}):
            section_title = self.tag_to_string(section.find('b'))
            articles = []
            for post in section.findAll('a', href=True):
                url = post['href']
                if url.startswith('/'):
                  url = 'http://www.wsws.org'+url
                  title = self.tag_to_string(post)
            if articles:
                feeds.append((section_title, articles))
        return feeds
Old 08-10-2011, 11:47 AM   #8
Starson17
Quote:
Originally Posted by yoss15
Well, here is what I tried to hack together with my limited knowledge and pieces of other recipes. It keeps saying "ValueError: No articles found, aborting".
That's pretty clear. I looked at your code, and you're not appending anything to your list of articles. Look at the example you started from and see where it has the articles.append line. You've appended to the list of feeds with feeds.append, but never to the article list.
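
The accumulation pattern can be sketched in isolation like this. It is plain Python 3 with no calibre imports, so it is not the recipe itself: the sections argument stands in for whatever the recipe scrapes from the page, build_feeds is a hypothetical helper name, and the base URL is simply the one used in the posted recipe.

```python
def build_feeds(sections, base='http://www.wsws.org'):
    """Turn {section title: [(link text, href), ...]} into feed tuples.

    `sections` stands in for what the recipe would scrape from the page.
    """
    feeds = []
    for section_title, links in sections.items():
        articles = []
        for title, href in links:
            # Make site-relative links absolute, as the posted recipe does.
            url = base + href if href.startswith('/') else href
            # This is the step missing from the posted recipe: without it
            # `articles` stays empty and the section is never added.
            articles.append({'title': title, 'url': url})
        if articles:
            feeds.append((section_title, articles))
    return feeds


if __name__ == '__main__':
    # '/articles/1.html' is a placeholder path, not a real article.
    print(build_feeds({'News': [('Story', '/articles/1.html')]}))
```

A section with no links is skipped entirely, so an empty result means no article was ever appended.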
Old 08-10-2011, 12:13 PM   #9
yoss15
OK, I removed that by accident, thinking it was a problem, but even after putting it back in I seem to get the same error.

Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag, NavigableString

class WSWS(BasicNewsRecipe):

    title      = 'World Socialist Web Site'
    __author__ = 'International Committee of The Fourth International'
    description = 'WSWS'

    no_stylesheets = True
    remove_javascript     = True

    def parse_index(self):
        articles = []
        soup = self.index_to_soup('http://wsws.org/mobile/')
        cover = None
        feeds = []
        for section in soup.findAll('div', attrs={'class':'content'}):
            section_title = self.tag_to_string(section.find('b'))
            articles = []
            for post in section.findAll('a', href=True):
                url = post['href']
                if url.startswith('/'):
                  url = 'http://www.wsws.org'+url
                  title = self.tag_to_string(post)
                  if str(post).find('class=') > 0:
                    klass = post['class']
                    if klass != "":
                      self.log()
                      self.log('--> post:  ', post)
                      self.log('--> url:   ', url)
                      self.log('--> title: ', title)
                      self.log('--> class: ', klass)
                      articles.append({'title':title, 'url':url})
            if articles:
                feeds.append((section_title, articles))
        return feeds
Old 08-10-2011, 01:21 PM   #10
Starson17
Quote:
Originally Posted by yoss15
I seem to get the same error.
Then the next step is to find where the recipe is failing. Try adding some print statements:
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag, NavigableString

class WSWS(BasicNewsRecipe):

    title      = 'World Socialist Web Site'
    __author__ = 'International Committee of The Fourth International'
    description = 'WSWS'

    no_stylesheets = True
    remove_javascript     = True

    def parse_index(self):
        articles = []
        soup = self.index_to_soup('http://wsws.org/mobile/')
        cover = None
        feeds = []
        for section in soup.findAll('div', attrs={'class':'content'}):
            print 'A section was found!' 
            section_title = self.tag_to_string(section.find('b'))
            articles = []
            for post in section.findAll('a', href=True):
                print 'A post was found!'
                url = post['href']
                if url.startswith('/'):
                  url = 'http://www.wsws.org'+url
                  title = self.tag_to_string(post)
                  if str(post).find('class=') > 0:
                    print 'A class was found in the post!'    
                    klass = post['class']
                    if klass != "":
                      print 'A klass was found!'
                      self.log()
                      self.log('--> post:  ', post)
                      self.log('--> url:   ', url)
                      self.log('--> title: ', title)
                      self.log('--> class: ', klass)
                      articles.append({'title':title, 'url':url})
            if articles:
                feeds.append((section_title, articles))
        return feeds

Last edited by Starson17; 08-10-2011 at 02:41 PM.
Old 08-10-2011, 01:38 PM   #11
yoss15
It still seems to give the same error with the code you posted. Sorry, I don't really have much experience with this stuff, so I'm not sure whether the details changed.
Old 08-10-2011, 02:16 PM   #12
Starson17
Quote:
Originally Posted by yoss15
It still seems to give the same error with the code you posted. Sorry, I don't really have much experience with this stuff, so I'm not sure whether the details changed.
My code didn't change the way your code works. It prints a message at each stage of your code so you can see where it fails. What output did you get?
Old 08-10-2011, 02:27 PM   #13
yoss15
I see. Well, this is what I get:

Fetch news from World Socialist Web Site
Resolved conversion options
calibre version: 0.8.13
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0,
'book_producer': None,
'change_justification': 'original',
'chapter': None,
'chapter_mark': 'pagebreak',
'comments': None,
'cover': None,
'debug_pipeline': None,
'dehyphenate': True,
'delete_blank_paragraphs': True,
'disable_font_rescaling': False,
'dont_compress': False,
'dont_download_recipe': False,
'duplicate_links_in_toc': False,
'enable_heuristics': False,
'extra_css': None,
'extract_to': None,
'fix_indents': True,
'font_size_mapping': None,
'format_scene_breaks': True,
'html_unwrap_factor': 0.4,
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x1086d9510>,
'insert_blank_line': False,
'insert_blank_line_size': 0.5,
'insert_metadata': False,
'isbn': None,
'italicize_common_cases': True,
'keep_ligatures': False,
'kindlegen': False,
'language': None,
'level1_toc': None,
'level2_toc': None,
'level3_toc': None,
'line_height': 0,
'linearize_tables': False,
'lrf': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'markup_chapter_headings': True,
'max_toc_links': 50,
'minimum_line_height': 120.0,
'mobi_ignore_margins': False,
'mobi_toc_at_start': False,
'no_chapters_in_toc': False,
'no_inline_navbars': True,
'no_inline_toc': False,
'output_profile': <calibre.customize.profiles.KindleOutput object at 0x1086d99d0>,
'page_breaks_before': None,
'password': None,
'personal_doc': '[PDOC]',
'prefer_author_sort': False,
'prefer_metadata_cover': False,
'pretty_print': False,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': None,
'remove_fake_margins': True,
'remove_first_image': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'renumber_headings': True,
'replace_scene_breaks': '',
'rescale_images': False,
'series': None,
'series_index': None,
'smarten_punctuation': False,
'sr1_replace': '',
'sr1_search': '',
'sr2_replace': '',
'sr2_search': '',
'sr3_replace': '',
'sr3_search': '',
'tags': None,
'test': False,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'toc_title': None,
'unwrap_lines': True,
'use_auto_toc': False,
'username': None,
'verbose': 2}
Python function terminated unexpectedly: No articles found, aborting
InputFormatPlugin: Recipe Input running
Traceback (most recent call last):
  File "/Applications/calibre.app/Contents/Resources/Python/lib/python2.7/site.py", line 147, in main
    return run_entry_point()
  File "/Applications/calibre.app/Contents/Resources/Python/lib/python2.7/site.py", line 116, in run_entry_point
    return getattr(pmod, func)()
  File "site-packages/calibre/utils/ipc/worker.py", line 181, in main
  File "site-packages/calibre/gui2/convert/gui_conversion.py", line 25, in gui_convert
  File "site-packages/calibre/ebooks/conversion/plumber.py", line 937, in run
  File "site-packages/calibre/customize/conversion.py", line 204, in __call__
  File "site-packages/calibre/web/feeds/input.py", line 105, in convert
  File "site-packages/calibre/web/feeds/news.py", line 737, in download
  File "site-packages/calibre/web/feeds/news.py", line 882, in build_index
ValueError: No articles found, aborting
Old 08-10-2011, 02:43 PM   #14
Starson17
Quote:
Originally Posted by yoss15
I see. Well, this is what I get:
That means you didn't hit even one of the print statements.

There are no div tags with class="content" on the page, so this loop is never entered:
Code:
for section in soup.findAll('div', attrs={'class':'content'}):
Have you looked at your source page?
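
One rough way to check the source page is to list which class values its div tags actually carry. The sketch below uses only the Python 3 standard library and a made-up HTML snippet; the names DivClassCounter and div_classes are hypothetical, and the class names that the real mobile page uses are not shown here.

```python
from collections import Counter
from html.parser import HTMLParser


class DivClassCounter(HTMLParser):
    """Count the class attribute values seen on <div> tags."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag == 'div':
            # A div without a class attribute is recorded as '(none)'.
            value = dict(attrs).get('class') or '(none)'
            self.counts[value] += 1


def div_classes(html):
    """Return a Counter mapping div class values to occurrence counts."""
    parser = DivClassCounter()
    parser.feed(html)
    return parser.counts


if __name__ == '__main__':
    # Made-up sample markup, not the real page.
    sample = '<div class="story"><a href="/a.html">A</a></div><div></div>'
    print(div_classes(sample))
```

If the class the recipe searches for never appears in that tally, the findAll call returns nothing and the loop body never runs.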
Old 08-10-2011, 03:20 PM   #15
yoss15
I did, but I have no clue what that section means. Like I said, I don't have any experience with this stuff.