|  01-03-2016, 12:09 PM | #1 | 
| Connoisseur            Posts: 93 Karma: 32466 Join Date: Jul 2013 Location: Paris Device: Kobo Desktop, Kindle Desktop, Kobo Forma | 
				
				Mediapart recipe
			 
			
			I receive the message " File "site-packages\mechanize-0.2.5-py2.7.egg\mechanize\_urllib2_fork.py", line 1118, in do_open urllib2.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)>" when attempting to download from Mediapart. Username and password are ok. | 
|   |   | 
|  01-04-2016, 08:34 AM | #2 | 
| Guru            Posts: 800 Karma: 194644 Join Date: Dec 2007 Location: Argentina Device: Kindle Voyage | 
			
			Which operating system are you using? That error indicates that the Gandi intermediate certificate is not installed on your system (Mediapart uses an SSL certificate issued by gandi.net). https://wiki.gandi.net/en/ssl/intermediate I am sure Kovid can explain better where mechanize looks for certificates. Last edited by kiklop74; 01-05-2016 at 10:54 AM. | 
|   |   | 
|  01-04-2016, 08:39 AM | #3 | 
| creator of calibre            Posts: 45,600 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			It looks for certificates in whatever the default store is for the OS. However, I think the recipe is broken anyway. It fails with an HTTP 404 for me, because the login page https://www.mediapart.fr/user no longer exists.
		 | 
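For readers who want to see which default certificate locations a Python build reports, the standard `ssl` module can list them. This is a minimal sketch in plain Python 3, not calibre's or mechanize's own code; the helper name `describe_default_ca_store` is made up for illustration, and the values differ from machine to machine (some fields may be `None`).

```python
import ssl


def describe_default_ca_store():
    """Return the default CA file/directory locations reported by the ssl module."""
    paths = ssl.get_default_verify_paths()
    return {
        "cafile": paths.cafile,                  # bundle file that exists, or None
        "capath": paths.capath,                  # certificate directory that exists, or None
        "openssl_cafile": paths.openssl_cafile,  # compiled-in default bundle path
        "openssl_capath": paths.openssl_capath,  # compiled-in default directory path
    }


if __name__ == "__main__":
    for name, value in describe_default_ca_store().items():
        print(name, "=", value)
```

If `cafile` and `capath` are both `None`, the interpreter found no certificate bundle at the compiled-in locations, which is one possible cause of a CERTIFICATE_VERIFY_FAILED error.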
|   |   | 
|  01-06-2016, 12:34 PM | #4 | 
| Junior Member  Posts: 3 Karma: 10 Join Date: Jan 2015 Device: none | 
			
			Hello, and happy new year. I have another error message. Do you know of a way to receive the Mediapart PDF directly in calibre? Here is my error message:
Récupérer des informations de Mediapart
Resolved conversion options
calibre version: 2.47.0
{'asciiize': False, 'author_sort': None, 'authors': None, 'base_font_size': 0, 'book_producer': None, 'change_justification': 'original', 'chapter': None, 'chapter_mark': 'pagebreak', 'comments': None, 'cover': None, 'debug_pipeline': None, 'dehyphenate': True, 'delete_blank_paragraphs': True, 'disable_font_rescaling': False, 'dont_download_recipe': False, 'dont_split_on_page_breaks': True, 'duplicate_links_in_toc': False, 'embed_all_fonts': False, 'embed_font_family': None, 'enable_heuristics': False, 'epub_flatten': False, 'epub_inline_toc': False, 'epub_toc_at_end': False, 'expand_css': False, 'extra_css': None, 'extract_to': None, 'filter_css': None, 'fix_indents': True, 'flow_size': 260, 'font_size_mapping': None, 'format_scene_breaks': True, 'html_unwrap_factor': 0.4, 'input_encoding': None, 'input_profile': <calibre.customize.profiles.InputProfile object at 0x028B1C30>, 'insert_blank_line': False, 'insert_blank_line_size': 0.5, 'insert_metadata': False, 'isbn': None, 'italicize_common_cases': True, 'keep_ligatures': False, 'language': None, 'level1_toc': None, 'level2_toc': None, 'level3_toc': None, 'line_height': 0, 'linearize_tables': False, 'lrf': False, 'margin_bottom': 5.0, 'margin_left': 5.0, 'margin_right': 5.0, 'margin_top': 5.0, 'markup_chapter_headings': True, 'max_toc_links': 50, 'minimum_line_height': 120.0, 'no_chapters_in_toc': False, 'no_default_epub_cover': False, 'no_inline_navbars': False, 'no_svg_cover': False, 'output_profile': <calibre.customize.profiles.TabletOutput object at 0x028BE230>, 'page_breaks_before': None, 'prefer_metadata_cover': False, 'preserve_cover_aspect_ratio': False, 'pretty_print': True, 'pubdate': None, 'publisher': None, 'rating': None, 'read_metadata_from_opf': None, 'remove_fake_margins': True, 'remove_first_image': False, 'remove_paragraph_spacing': False, 'remove_paragraph_spacing_indent_size': 1.5, 'renumber_headings': True, 'replace_scene_breaks': '', 'search_replace': None, 'series': None, 'series_index': None, 'smarten_punctuation': False, 'sr1_replace': '', 'sr1_search': '', 'sr2_replace': '', 'sr2_search': '', 'sr3_replace': '', 'sr3_search': '', 'start_reading_at': None, 'subset_embedded_fonts': False, 'tags': None, 'test': False, 'timestamp': None, 'title': None, 'title_sort': None, 'toc_filter': None, 'toc_threshold': 6, 'toc_title': None, 'unsmarten_punctuation': False, 'unwrap_lines': True, 'use_auto_toc': False, 'verbose': 2}
InputFormatPlugin: Recipe Input running
Using custom recipe
Python function terminated unexpectedly
<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)> (Error Code: 1)
Traceback (most recent call last):
  File "site.py", line 132, in main
  File "site.py", line 109, in run_entry_point
  File "site-packages\calibre\utils\ipc\worker.py", line 190, in main
  File "site-packages\calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
  File "site-packages\calibre\ebooks\conversion\plumber.py", line 1051, in run
  File "site-packages\calibre\customize\conversion.py", line 241, in __call__
  File "site-packages\calibre\ebooks\conversion\plugins\recipe_input.py", line 116, in convert
  File "site-packages\calibre\web\feeds\news.py", line 918, in __init__
  File "<string>", line 148, in get_browser
  File "site-packages\mechanize-0.2.5-py2.7.egg\mechanize\_mechanize.py", line 203, in open
  File "site-packages\mechanize-0.2.5-py2.7.egg\mechanize\_mechanize.py", line 230, in _mech_open
  File "site-packages\mechanize-0.2.5-py2.7.egg\mechanize\_opener.py", line 193, in open
  File "site-packages\mechanize-0.2.5-py2.7.egg\mechanize\_urllib2_fork.py", line 344, in _open
  File "site-packages\mechanize-0.2.5-py2.7.egg\mechanize\_urllib2_fork.py", line 332, in _call_chain
  File "site-packages\calibre\utils\browser.py", line 25, in https_open
  File "site-packages\mechanize-0.2.5-py2.7.egg\mechanize\_urllib2_fork.py", line 1118, in do_open
urllib2.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)> | 
|   |   | 
|  01-06-2016, 03:38 PM | #5 | 
| Guru            Posts: 800 Karma: 194644 Join Date: Dec 2007 Location: Argentina Device: Kindle Voyage | 
			
			You have exactly the same error as the starter of this thread. See my earlier post in this thread. https://www.mobileread.com/forums/sho...99&postcount=2 | 
|   |   | 
|  01-08-2016, 05:09 PM | #6 | 
| Junior Member  Posts: 3 Karma: 10 Join Date: Jan 2015 Device: none | 
			
			Ah, OK, sorry, and thank you! :-) Can you tell me which "intermediate certificate" I have to choose? SHA1 or SHA2? And then the Standard certificate or the Pro certificate? And are the DER and PEM formats equivalent? Sorry, I'm not good with computers... rastapoil | 
|   |   | 
|  01-09-2016, 07:58 AM | #7 | 
| Guru            Posts: 800 Karma: 194644 Join Date: Dec 2007 Location: Argentina Device: Kindle Voyage | 
			
			You would need "Gandi Standard SSL CA 2", that is, the SHA2 Standard intermediate certificate. https://www.gandi.net/static/CAs/Gan...dardSSLCA2.pem | 
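To check outside calibre whether a downloaded intermediate certificate fixes verification, one option is to build a TLS context that trusts the system defaults plus one extra PEM file. This is only a sketch in plain Python 3 using the standard `ssl` module; it is not how calibre or mechanize load certificates, and the helper name `make_context` and the parameter `extra_ca_pem` are hypothetical.

```python
import ssl


def make_context(extra_ca_pem=None):
    """Build a verifying TLS context, optionally trusting one extra CA file.

    extra_ca_pem is a hypothetical path to a PEM file, for example an
    intermediate certificate saved to disk.
    """
    # create_default_context() enables certificate and hostname verification
    # and loads the platform's default trusted certificates.
    ctx = ssl.create_default_context()
    if extra_ca_pem is not None:
        # Add the extra certificate to the set used when building the chain.
        ctx.load_verify_locations(cafile=extra_ca_pem)
    return ctx
```

Passing the result as the `context` argument of `urllib.request.urlopen` for an HTTPS URL, with and without the extra file, shows whether the missing intermediate is the cause of the failure.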
|   |   | 
|  01-09-2016, 08:54 AM | #8 | 
| Grand Sorcerer            Posts: 13,685 Karma: 79983758 Join Date: Nov 2007 Location: Toronto Device: Libra H2O, Libra Colour | 
			
			For DER vs PEM see https://support.ssl.com/Knowledgebas...o-convert-them
		 | 
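On the DER versus PEM question: PEM is the base64 text form of the same DER bytes, wrapped in BEGIN/END CERTIFICATE lines, so the two carry the same certificate. The sketch below shows the round trip with Python's standard `ssl` helpers; the wrapper names `der_to_pem` and `pem_to_der` are made up here, and the sample bytes are a placeholder, not a real certificate (these helpers only re-encode, they do not validate the content).

```python
import ssl


def der_to_pem(der_bytes):
    """Encode binary DER data as PEM text (base64 plus header and footer)."""
    return ssl.DER_cert_to_PEM_cert(der_bytes)


def pem_to_der(pem_text):
    """Decode PEM text back into the original binary DER data."""
    return ssl.PEM_cert_to_DER_cert(pem_text)


if __name__ == "__main__":
    sample_der = bytes(range(48))  # placeholder bytes, NOT a real certificate
    pem = der_to_pem(sample_der)
    print(pem)
    print(pem_to_der(pem) == sample_der)
```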
|   |   | 
|  01-20-2016, 08:06 AM | #9 | 
| Connoisseur            Posts: 93 Karma: 32466 Join Date: Jul 2013 Location: Paris Device: Kobo Desktop, Kindle Desktop, Kobo Forma | 
			
			Yes, the Mediapart login page has changed. Thank you, Kovid.
		 | 
|   |   | 
|  01-21-2016, 12:34 PM | #10 | 
| Member  Posts: 11 Karma: 10 Join Date: Jan 2016 Device: no | 
			
			Hi, I am new on the forum. I have problems with the mediapart recipe written in 2013 by M. Godlewski and L. Gesbert, so I tried to edit it. What I did was simply 
 I get an error at the end: Code: AttributeError: 'NoneType' object has no attribute 'find' I do not know if I am the only one with this problem, but any help would be very welcome. Last edited by DanielBonnery; 03-03-2016 at 09:34 PM. Reason: I deleted the big part of the code that was useless. | 
|   |   | 
|  01-24-2016, 11:19 PM | #11 | 
| creator of calibre            Posts: 45,600 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			That indicates that the markup of the index page https://www.mediapart.fr/journal/fil-dactualites has changed. You will need to fix the my_parse_index() function to handle the new markup. | 
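The failure mode can be reproduced in isolation: a lookup that finds nothing returns `None`, and chaining another `.find` on it raises the AttributeError quoted above. The sketch below uses the standard library's ElementTree rather than calibre's BeautifulSoup, purely so it runs on its own; the class names in `OLD_MARKUP` are the ones the recipe searches for, while `NEW_MARKUP` and the helper `list_items` are invented to simulate a changed page.

```python
import xml.etree.ElementTree as ET

# Markup shaped like what the recipe expects.
OLD_MARKUP = (
    "<html><body><div class='page-content bust'>"
    "<ul class='post-list universe-journal'><li>a</li><li>b</li></ul>"
    "</div></body></html>"
)
# Hypothetical markup after a site redesign (class names invented).
NEW_MARKUP = (
    "<html><body><div class='content'>"
    "<ul class='articles'><li>a</li></ul>"
    "</div></body></html>"
)


def list_items(markup):
    """Return the list item texts, or an empty list if the markup changed."""
    root = ET.fromstring(markup)
    page = root.find(".//div[@class='page-content bust']")
    if page is None:  # without this check, page.find(...) raises AttributeError
        return []
    fils = page.find("ul[@class='post-list universe-journal']")
    if fils is None:
        return []
    return [li.text for li in fils.findall("li")]


if __name__ == "__main__":
    print(list_items(OLD_MARKUP))
    print(list_items(NEW_MARKUP))
```

Checking each lookup result before using it turns a crash into an empty section, which makes it easier to see which selector no longer matches.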
|   |   | 
|  03-03-2016, 06:46 AM | #12 | 
| Connoisseur            Posts: 93 Karma: 32466 Join Date: Jul 2013 Location: Paris Device: Kobo Desktop, Kindle Desktop, Kobo Forma | 
			
			Apparently this recipe is not supported. Time to prune it from Calibre maybe?
		 | 
|   |   | 
|  03-03-2016, 08:33 PM | #13 | 
| Member  Posts: 11 Karma: 10 Join Date: Jan 2016 Device: no | 
				
				New version of the recipe.
			 
			
			Hi, please find attached here a new version of the recipe. It can be improved, but at least it is working. Best, Daniel.
Code:
# -*- mode:python -*-
from __future__ import unicode_literals
__license__   = 'GPL v3'
__copyright__ = '2016, Daniel Bonnery ? (contact: DanielBonnery sur mobileread.com) 2009, Mathieu Godlewski <mathieu at godlewski.fr>; 2010-2012, Louis Gesbert <meta at antislash dot info>'
'''
Mediapart
'''
__author__ = '2016, Daniel Bonnery (contact: DanielBonnery sur mobileread.com), 2009, Mathieu Godlewski <mathieu at godlewski.fr>; 2010-2012, Louis Gesbert <meta at antislash dot info>'
import re
from calibre.ebooks.BeautifulSoup import BeautifulSoup
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.web.feeds import feeds_from_index
from datetime import date,timedelta
class Mediapart(BasicNewsRecipe):
    title = 'Mediapart'
    __author__ = 'Daniel Bonnery from a version by Mathieu Godlewski, Louis Gesbert'
    description = 'Global news in french from news site Mediapart'
    publication_type = 'newspaper'
    language = 'fr'
    needs_subscription = True
    oldest_article = 2
    use_embedded_content = False
    no_stylesheets = True
    cover_url = 'https://static.mediapart.fr/files/M%20Philips/logo-mediapart.png'
# --
    oldest_article_date = date.today() - timedelta(days=oldest_article)
# -- get the index (the feed at 'http://www.mediapart.fr/articles/feed' only has
#    the 10 last elements :/)
    feeds =  [
        ('La Une', 'http://www.mediapart.fr/articles/feed'),
    ]
    def parse_feeds(self):
        feeds = super(Mediapart, self).parse_feeds()
        feeds += feeds_from_index(self.my_parse_index(feeds))
        return feeds
    def my_parse_index(self, la_une):
        articles = []
        breves = []
        liens = []
        confidentiels = []
       
        soup = self.index_to_soup('https://www.mediapart.fr/journal/fil-dactualites')
        page = soup.find('div', {'class':'page-content bust'})
        fils = page.find('ul', {'class':'post-list universe-journal'})
        for article in fils.findAll('li'):
            try:
                title = article.find('h3',recursive=False)
                if title is None or title['class'] == 'title-specific':
                    continue
                # print "found fil ",title
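                # tag_to_string() returns unicode text, so the value can match the
                # unicode keys ("Brève", ...) of the section mapping used below
                article_type = self.tag_to_string(article.find('a', {'href': re.compile(r'.*\/type-darticles\/.*')})).strip()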
                # print "kind: ",article_type
                for s in title('span'):
                    s.replaceWith(s.renderContents() + "\n")
                url = title.find('a', href=True)['href']
                              
                #article_date = self.parse_french_date(article.find("span", "article-date").renderContents())
                #print("################################# 9")
                #print(article_date)
                #if article_date < self.oldest_article_date:
                    # print "too old"
                #    continue
                authors = article.findAll('a',{'class':re.compile(r'\bjournalist\b')})
                authors = [self.tag_to_string(a) for a in authors]
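                # The teaser lookup is commented out, which left `description`
                # undefined when the summary was built; the resulting NameError was
                # silently swallowed by the bare except. Default to an empty list.
                #description = article.find('div', {'class': lambda c: c != 'taxonomy-teaser'}, recursive=False).findAll('p')
                description = []
                # print "fil ",title," by ",authors," : ",description
                summary = {
                    'title': self.tag_to_string(title).strip(),
                    'author': ', '.join(authors),
                    'url': url,
                    #'date': u'' + article_date.strftime("%A %d %b %Y"),
                    'description': '\n'.join([self.tag_to_string(d) for d in description]),
                }
                section = {
                    "Brève": breves,
                    "Lien": liens,
                    "Confidentiel": confidentiels,
                }.get(article_type)
                # Ignore article types that have no section of their own
                if section is not None:
                    section.append(summary)
            except Exception:
                # Skip entries whose markup does not match what is expected above
                continue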
        # print 'La Une: ', len(la_une), ' articles'
        # for a in la_une: print a["title"]
        # print 'Brèves: ', len(breves), ' articles'
        # print 'Revue web: ', len(liens), ' articles'
        # print 'Confidentiel: ', len(confidentiels), ' articles'
        articles += [('Brèves', breves)] if breves else []
        articles += [('Revue du Web', liens)] if liens else []
        articles += [('Confidentiel', confidentiels)] if confidentiels else []
        return articles
# -- print-version
    conversion_options = {'smarten_punctuation' : True}
    remove_tags = [dict(name='div', attrs={'class':'print-source_url'})]
    # non-locale specific date parse (strptime("%d %b %Y",s) would work with french locale)
    def parse_french_date(self, date_str):
        date_arr = date_str.lower().split()
        return date(day=int(date_arr[0]),
                    year=int(date_arr[2]),
                    month=[None, 'janvier', 'février', 'mars', 'avril', 'mai', 'juin', 'juillet',
                       'août', 'septembre', 'octobre', 'novembre', 'décembre'].index(date_arr[1]))
    def print_version(self, url):
        raw = self.browser.open(url).read()
        soup = BeautifulSoup(raw.decode('utf8', 'replace'))
        # Filter old articles (disabled)
        # article_date = self.parse_french_date(self.tag_to_string(soup.find('span', 'article-date')))
        # if article_date < self.oldest_article_date:
        #     return None
        tools = soup.find('li', {'class':'print'})
        link = tools.find('a', {'href': re.compile(r'\/print\/.*')}) if tools is not None else None
        if link is None:
            # No print link on this page: fall back to the regular article URL
            print('Error: print link not found')
            return url
        print(link['href'])
        return 'https://mediapart.fr' + link['href']
  
# -- Handle login
    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        if self.username is not None and self.password is not None:
            br.open('https://www.mediapart.fr/login')
            br.select_form(nr=1)
            br['name'] = self.username
            br['password'] = self.password
            br.submit()
        return br
    # This is a workaround for articles with scribd content that include
    # <body></body> tags _within_ the body
    preprocess_regexps = [
        (re.compile(r'(<body.*?>)(.*)</body>', re.IGNORECASE|re.DOTALL),
         lambda match:
             match.group(1) + re.sub(
                 re.compile(r'</?body>', re.IGNORECASE|re.DOTALL),'', match.group(2)) + '</body>')
    ]
Last edited by DanielBonnery; 03-03-2016 at 09:37 PM. | 
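The recipe's `parse_french_date` avoids depending on a French locale by looking the month name up in a list. The same idea can be sketched as a standalone Python 3 function with a dictionary lookup. This is an illustration, not the recipe's code: the handling of "1er" (French for "1st") is an added assumption about the input, and whether Mediapart still publishes dates in this form is not known from the thread.

```python
from datetime import date

# French month names mapped to month numbers, independent of the system locale.
FRENCH_MONTHS = {
    name: number
    for number, name in enumerate(
        ['janvier', 'février', 'mars', 'avril', 'mai', 'juin', 'juillet',
         'août', 'septembre', 'octobre', 'novembre', 'décembre'],
        start=1,
    )
}


def parse_french_date(date_str):
    """Parse a date such as '3 mars 2016' without relying on the locale."""
    day_str, month_name, year_str = date_str.lower().split()[:3]
    if day_str == '1er':  # assumption: the first of the month may be written "1er"
        day_str = '1'
    return date(int(year_str), FRENCH_MONTHS[month_name], int(day_str))


if __name__ == "__main__":
    print(parse_french_date('3 mars 2016'))
```

An unknown month name raises `KeyError`, so a caller that wants to skip undated articles would need to catch it.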
|   |   | 
|  03-06-2016, 12:29 PM | #14 | 
| Connoisseur            Posts: 93 Karma: 32466 Join Date: Jul 2013 Location: Paris Device: Kobo Desktop, Kindle Desktop, Kobo Forma | |
|   |   | 
|  03-06-2016, 09:23 PM | #15 | |
| Member  Posts: 11 Karma: 10 Join Date: Jan 2016 Device: no | Quote: 
 Thank you for your answer. The Mediapart recipe may not have been updated in calibre yet. Have you tried the one posted above? On my PC it works fine; the log is posted below. Best, Daniel
Code:
Récupérer des informations de Mediapart
Resolved conversion options
calibre version: 2.49.0
{'asciiize': False,
 'author_sort': None,
 'authors': None,
 'base_font_size': 0,
 'book_producer': None,
 'change_justification': 'original',
 'chapter': None,
 'chapter_mark': 'pagebreak',
 'comments': None,
 'cover': None,
 'debug_pipeline': None,
 'dehyphenate': True,
 'delete_blank_paragraphs': True,
 'disable_font_rescaling': False,
 'dont_compress': False,
 'dont_download_recipe': False,
 'duplicate_links_in_toc': False,
 'embed_all_fonts': False,
 'embed_font_family': None,
 'enable_heuristics': False,
 'expand_css': False,
 'extra_css': None,
 'extract_to': None,
 'filter_css': None,
 'fix_indents': True,
 'font_size_mapping': None,
 'format_scene_breaks': True,
 'html_unwrap_factor': 0.4,
 'input_encoding': None,
 'input_profile': <calibre.customize.profiles.InputProfile object at 0x7f17f06a6090>,
 'insert_blank_line': False,
 'insert_blank_line_size': 0.5,
 'insert_metadata': False,
 'isbn': None,
 'italicize_common_cases': True,
 'keep_ligatures': False,
 'language': None,
 'level1_toc': None,
 'level2_toc': None,
 'level3_toc': None,
 'line_height': 0,
 'linearize_tables': False,
 'lrf': False,
 'margin_bottom': 5.0,
 'margin_left': 5.0,
 'margin_right': 5.0,
 'margin_top': 5.0,
 'markup_chapter_headings': True,
 'max_toc_links': 50,
 'minimum_line_height': 120.0,
 'mobi_file_type': 'old',
 'mobi_ignore_margins': False,
 'mobi_keep_original_images': False,
 'mobi_toc_at_start': False,
 'no_chapters_in_toc': False,
 'no_inline_navbars': True,
 'no_inline_toc': False,
 'output_profile': <calibre.customize.profiles.KindleDXOutput object at 0x7f17f06a6790>,
 'page_breaks_before': None,
 'personal_doc': '[PDOC]',
 'prefer_author_sort': False,
 'prefer_metadata_cover': False,
 'pretty_print': False,
 'pubdate': None,
 'publisher': None,
 'rating': None,
 'read_metadata_from_opf': None,
 'remove_fake_margins': True,
 'remove_first_image': False,
 'remove_paragraph_spacing': False,
 'remove_paragraph_spacing_indent_size': 1.5,
 'renumber_headings': True,
 'replace_scene_breaks': '',
 'search_replace': None,
 'series': None,
 'series_index': None,
 'share_not_sync': False,
 'smarten_punctuation': False,
 'sr1_replace': '',
 'sr1_search': '',
 'sr2_replace': '',
 'sr2_search': '',
 'sr3_replace': '',
 'sr3_search': '',
 'start_reading_at': None,
 'subset_embedded_fonts': False,
 'tags': None,
 'test': False,
 'timestamp': None,
 'title': None,
 'title_sort': None,
 'toc_filter': None,
 'toc_threshold': 6,
 'toc_title': None,
 'unsmarten_punctuation': False,
 'unwrap_lines': True,
 'use_auto_toc': False,
 'verbose': 2}
InputFormatPlugin: Recipe Input running
Using custom recipe
Synthesizing mastheadImage
[ part of the log output by print statements, deleted here to hide mediapart secret static urls  ]
Parsing all content...
Parsing feed_0/article_1/index.html ...
Parsing feed_0/index.html ...
Initial parse failed, using more forgiving parsers
Parsing feed_0/index.html as HTML
Parsing feed_0/article_9/index.html ...
Parsing feed_0/article_5/index.html ...
Parsing feed_0/article_7/index.html ...
Parsing feed_0/article_3/index.html ...
Parsing feed_0/article_4/index.html ...
Parsing feed_0/article_0/index.html ...
Parsing feed_0/article_8/index.html ...
Parsing feed_0/article_2/index.html ...
replaced 2 nbsp indents with inline styles
Parsing index.html ...
Forcing index.html into XHTML namespace
Parsing feed_0/article_6/index.html ...
Referenced file u'feed_1/index.html' not found
Referenced file u'/favicon.png' not found
Reading TOC from NCX...
Merging user specified metadata...
Detecting structure...
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Found 30 items of level: div_2
Found 33 items of level: div_1
Found 4 items of level: div_5
Found 51 items of level: div_4
Found 12 items of level: p_2
Found 220 items of level: p_3
Ignoring level p_2
Ignoring level div_5
div_2  left margin stats: Counter({u'': 23})
div_2  right margin stats: Counter({u'': 23})
div_1  left margin stats: Counter({u'': 11})
div_1  right margin stats: Counter({u'': 11})
div_4  left margin stats: Counter({u'': 51})
div_4  right margin stats: Counter({u'': 51})
p_3  left margin stats: Counter({u'0': 220})
p_3  right margin stats: Counter({u'0': 220})
Cleaning up manifest...
Trimming unused files from manifest...
Creating MOBI Output...
Serializing resources...
Converting TOC for MOBI periodical indexing...
Using mastheadImage supplied in manifest...
Creating MOBI 6 output
Generating in-line TOC...
Applying case-transforming CSS...
Parsing manglecase.css ...
Parsing tocstyle.css ...
Rasterizing SVG images...
Converting XHTML to Mobipocket markup...
Serializing markup content...
  Compressing markup content...
Generating MOBI index for a periodical
MOBI output written to /tmp/calibre_2.49.0_tmp_xJe3BH/FSqo3L_recipe_out.mobi
Last edited by DanielBonnery; 03-06-2016 at 09:26 PM. | |
|   |   | 