02-27-2023, 05:54 PM | #1
Zealot | Posts: 140 | Karma: 10 | Join Date: Sep 2010 | Device: Kindle, Android phone

The Mainichi fails on every download

The Mainichi fails on every download attempt for me. Is this happening for anyone else?
03-03-2023, 07:33 PM | #2
Zealot | Posts: 140 | Karma: 10 | Join Date: Sep 2010 | Device: Kindle, Android phone

What does this error code mean? Spoiler:
03-03-2023, 08:07 PM | #3
creator of calibre | Posts: 45,604 | Karma: 28548974 | Join Date: Oct 2006 | Location: Mumbai, India | Device: Various

It means the index page the recipe uses no longer exists.
03-04-2023, 02:57 AM | #4
Zealot | Posts: 140 | Karma: 10 | Join Date: Sep 2010 | Device: Kindle, Android phone
03-04-2023, 05:58 AM | #5
Zealot | Posts: 140 | Karma: 10 | Join Date: Sep 2010 | Device: Kindle, Android phone

I checked the URL listed as the "index" page, https://mainichi.jp/english/, and it does exist and is accessible. So maybe the failure is caused by a different error.
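A quick way to confirm what the server actually answers for that index page, using only the Python standard library (a sketch; `probe` is my own helper name, not part of calibre):

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def probe(url):
    """Return the HTTP status code for url, or the error code/reason on failure."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.status
    except HTTPError as e:
        # The server answered, but with an error status (e.g. 404).
        return e.code
    except URLError as e:
        # No HTTP answer at all (DNS failure, refused connection, ...).
        return e.reason
```

Note that a 200 here only shows the index page itself loads; the recipe can still fail if the page's structure changed underneath it.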
03-04-2023, 12:25 PM | #6
Guru | Posts: 644 | Karma: 85520 | Join Date: May 2021 | Device: kindle

Code:
"""
www.mainichi.jp/english
"""
from calibre.ptempfile import PersistentTemporaryFile
from calibre.web.feeds.news import BasicNewsRecipe


class MainichiEnglishNews(BasicNewsRecipe):
    title = u'The Mainichi'
    __author__ = 'unkn0wn'
    description = 'Japanese traditional newspaper Mainichi news in English'
    publisher = 'Mainichi News'
    publication_type = 'newspaper'
    category = 'news, japan'
    language = 'en_JP'
    index = 'http://mainichi.jp/english/'
    masthead_url = index + 'images/themainichi.png'
    no_stylesheets = True
    remove_javascript = True
    auto_cleanup = True

    ignore_duplicate_articles = {'title'}

    # The feed items are Google News links, not direct article URLs,
    # so each one must be resolved before download.
    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        br = self.get_browser()
        try:
            br.open(url)
        except Exception as e:
            # Follow the redirect Google News answers with.
            url = e.hdrs.get('location')
        soup = self.index_to_soup(url)
        link = soup.find('a', href=True)
        html = br.open(link['href']).read()
        # Hand calibre a temporary file holding the real article.
        pt = PersistentTemporaryFile('.html')
        pt.write(html)
        pt.close()
        return pt.name

    feeds = [
        ('Articles',
         'https://news.google.com/rss/search?q=when:48h+allinurl:mainichi.jp%2Fenglish%2Farticles%2F&hl=en-US&gl=US&ceid=US:en'),
    ]
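For anyone curious how that hard-coded feeds URL is put together: it is a Google News RSS search restricted to the last 48 hours (`when:48h`) and to links under mainichi.jp/english/articles/ (`allinurl:`). A sketch of an equivalent construction (my reconstruction; the recipe itself just embeds the final string):

```python
from urllib.parse import urlencode

# Rebuild the recipe's feed URL from its parts: a Google News RSS
# search limited to recent Mainichi English article links.
params = {
    'q': 'when:48h allinurl:mainichi.jp/english/articles/',
    'hl': 'en-US',    # interface language
    'gl': 'US',       # geographic region
    'ceid': 'US:en',  # country edition id
}
feed_url = 'https://news.google.com/rss/search?' + urlencode(params)
```

`urlencode` percent-encodes the colons that the hard-coded string leaves bare, but both forms resolve to the same search.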
03-04-2023, 11:12 PM | #7
Zealot | Posts: 140 | Karma: 10 | Join Date: Sep 2010 | Device: Kindle, Android phone

Quote:
Is the fix to add "www"?
03-05-2023, 12:41 AM | #8
Guru | Posts: 644 | Karma: 85520 | Join Date: May 2021 | Device: kindle

No, this is a completely new recipe for the website, based on Google News feeds. I didn't want to spend time figuring out all the issues with the old recipe.
03-10-2023, 01:11 AM | #9
Zealot | Posts: 140 | Karma: 10 | Join Date: Sep 2010 | Device: Kindle, Android phone

Thanks. This recipe is working again.
03-10-2023, 01:13 AM | #10
Zealot | Posts: 140 | Karma: 10 | Join Date: Sep 2010 | Device: Kindle, Android phone

Thanks. This recipe is working again. I guess I'm the only one who uses it, so I should figure out how to fix these things myself.