#1 |
Connoisseur
Posts: 93
Karma: 32466
Join Date: Jul 2013
Location: Paris
Device: Kobo Desktop, Kindle Desktop, Kobo Forma
|
Mediapart recipe
I receive the following error when attempting to download from Mediapart. Username and password are correct.
File "site-packages\mechanize-0.2.5-py2.7.egg\mechanize\_urllib2_fork.py", line 1118, in do_open
urllib2.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)> |
#2 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Which operating system do you use? That error indicates that the Gandi intermediate certificate is not installed (Mediapart uses an SSL certificate issued by gandi.net).
https://wiki.gandi.net/en/ssl/intermediate I am sure Kovid can explain better where mechanize looks for certificates. Last edited by kiklop74; 01-05-2016 at 10:54 AM. |
#3 |
creator of calibre
Posts: 45,355
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
It looks for certificates in whatever the default store is for the OS. However, I think the recipe is broken anyway. It fails with an HTTP 404 for me, because the login page https://www.mediapart.fr/user no longer exists.
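To see what "the default store" resolves to on a given machine, a small diagnostic along these lines may help. It is only a sketch: it uses the standard `ssl` module of whichever Python runs it, which is not necessarily the interpreter bundled with calibre, and `describe_default_store` is a hypothetical helper name. On Windows, Python can additionally read the system certificate store, which this listing does not show.

```python
import ssl


def describe_default_store():
    """Return the CA file and directory locations OpenSSL uses by default."""
    paths = ssl.get_default_verify_paths()
    return {
        'cafile': paths.cafile,                  # resolved CA bundle, or None
        'capath': paths.capath,                  # resolved CA directory, or None
        'openssl_cafile': paths.openssl_cafile,  # compiled-in bundle location
        'openssl_capath': paths.openssl_capath,  # compiled-in directory location
    }


if __name__ == '__main__':
    for name, value in sorted(describe_default_store().items()):
        print('%s: %s' % (name, value))
```

If both `cafile` and `capath` come back as `None`, the interpreter has no file-based CA bundle to consult, which is one possible cause of a CERTIFICATE_VERIFY_FAILED error.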
|
#4 |
Junior Member
Posts: 3
Karma: 10
Join Date: Jan 2015
Device: none
|
Hello, and happy new year...
I have another error message. Do you know of a way to get the Mediapart PDF directly in calibre? My error message:

    Fetch news from Mediapart
    Resolved conversion options
    calibre version: 2.47.0
    {'asciiize': False, 'author_sort': None, 'authors': None, 'base_font_size': 0, 'book_producer': None, 'change_justification': 'original', 'chapter': None, 'chapter_mark': 'pagebreak', 'comments': None, 'cover': None, 'debug_pipeline': None, 'dehyphenate': True, 'delete_blank_paragraphs': True, 'disable_font_rescaling': False, 'dont_download_recipe': False, 'dont_split_on_page_breaks': True, 'duplicate_links_in_toc': False, 'embed_all_fonts': False, 'embed_font_family': None, 'enable_heuristics': False, 'epub_flatten': False, 'epub_inline_toc': False, 'epub_toc_at_end': False, 'expand_css': False, 'extra_css': None, 'extract_to': None, 'filter_css': None, 'fix_indents': True, 'flow_size': 260, 'font_size_mapping': None, 'format_scene_breaks': True, 'html_unwrap_factor': 0.4, 'input_encoding': None, 'input_profile': <calibre.customize.profiles.InputProfile object at 0x028B1C30>, 'insert_blank_line': False, 'insert_blank_line_size': 0.5, 'insert_metadata': False, 'isbn': None, 'italicize_common_cases': True, 'keep_ligatures': False, 'language': None, 'level1_toc': None, 'level2_toc': None, 'level3_toc': None, 'line_height': 0, 'linearize_tables': False, 'lrf': False, 'margin_bottom': 5.0, 'margin_left': 5.0, 'margin_right': 5.0, 'margin_top': 5.0, 'markup_chapter_headings': True, 'max_toc_links': 50, 'minimum_line_height': 120.0, 'no_chapters_in_toc': False, 'no_default_epub_cover': False, 'no_inline_navbars': False, 'no_svg_cover': False, 'output_profile': <calibre.customize.profiles.TabletOutput object at 0x028BE230>, 'page_breaks_before': None, 'prefer_metadata_cover': False, 'preserve_cover_aspect_ratio': False, 'pretty_print': True, 'pubdate': None, 'publisher': None, 'rating': None, 'read_metadata_from_opf': None, 'remove_fake_margins': True, 'remove_first_image': False, 'remove_paragraph_spacing': False, 'remove_paragraph_spacing_indent_size': 1.5, 'renumber_headings': True, 'replace_scene_breaks': '', 'search_replace': None, 'series': None, 'series_index': None, 'smarten_punctuation': False, 'sr1_replace': '', 'sr1_search': '', 'sr2_replace': '', 'sr2_search': '', 'sr3_replace': '', 'sr3_search': '', 'start_reading_at': None, 'subset_embedded_fonts': False, 'tags': None, 'test': False, 'timestamp': None, 'title': None, 'title_sort': None, 'toc_filter': None, 'toc_threshold': 6, 'toc_title': None, 'unsmarten_punctuation': False, 'unwrap_lines': True, 'use_auto_toc': False, 'verbose': 2}
    InputFormatPlugin: Recipe Input running
    Using custom recipe
    Python function terminated unexpectedly
    <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)> (Error Code: 1)
    Traceback (most recent call last):
      File "site.py", line 132, in main
      File "site.py", line 109, in run_entry_point
      File "site-packages\calibre\utils\ipc\worker.py", line 190, in main
      File "site-packages\calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
      File "site-packages\calibre\ebooks\conversion\plumber.py", line 1051, in run
      File "site-packages\calibre\customize\conversion.py", line 241, in __call__
      File "site-packages\calibre\ebooks\conversion\plugins\recipe_input.py", line 116, in convert
      File "site-packages\calibre\web\feeds\news.py", line 918, in __init__
      File "<string>", line 148, in get_browser
      File "site-packages\mechanize-0.2.5-py2.7.egg\mechanize\_mechanize.py", line 203, in open
      File "site-packages\mechanize-0.2.5-py2.7.egg\mechanize\_mechanize.py", line 230, in _mech_open
      File "site-packages\mechanize-0.2.5-py2.7.egg\mechanize\_opener.py", line 193, in open
      File "site-packages\mechanize-0.2.5-py2.7.egg\mechanize\_urllib2_fork.py", line 344, in _open
      File "site-packages\mechanize-0.2.5-py2.7.egg\mechanize\_urllib2_fork.py", line 332, in _call_chain
      File "site-packages\calibre\utils\browser.py", line 25, in https_open
      File "site-packages\mechanize-0.2.5-py2.7.egg\mechanize\_urllib2_fork.py", line 1118, in do_open
    urllib2.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)>

|
#5 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
You have exactly the same error as the starter of this thread. See my earlier post in this thread.
https://www.mobileread.com/forums/sho...99&postcount=2 |
#6 |
Junior Member
Posts: 3
Karma: 10
Join Date: Jan 2015
Device: none
|
Ah, OK, sorry, and thank you! :-)
Can you tell me which "intermediate certificate" I have to choose? SHA1 or SHA2? And then the Standard certificate or the Pro certificate? And are the DER and PEM formats equivalent? Sorry, I'm not good with computers... rastapoil |
#7 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
You would need "Gandi Standard SSL CA 2", meaning the SHA2 Standard intermediate certificate.
https://www.gandi.net/static/CAs/Gan...dardSSLCA2.pem |
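Once the PEM file is downloaded, one way to check that it actually fixes verification, outside calibre, is to add it to a verifying SSL context. The sketch below assumes a Python with `ssl.create_default_context` (2.7.9 or later); `build_context` and the file name `gandi-intermediate.pem` are hypothetical, and this is not how calibre itself loads certificates.

```python
import ssl


def build_context(extra_ca_file=None):
    """Return a verifying SSL context, optionally trusting an extra PEM CA file."""
    # Starts from the operating system's default trust store.
    context = ssl.create_default_context()
    if extra_ca_file is not None:
        # Adds the certificates in the PEM file to the trusted set.
        context.load_verify_locations(cafile=extra_ca_file)
    return context
```

A context built with `build_context('gandi-intermediate.pem')` can then be passed to an HTTPS client that accepts one. If the handshake succeeds with the extra file and fails without it, the missing intermediate certificate is the likely culprit.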
#8 |
Grand Sorcerer
Posts: 13,524
Karma: 78910202
Join Date: Nov 2007
Location: Toronto
Device: Libra H2O, Libra Colour
|
For DER vs PEM see https://support.ssl.com/Knowledgebas...o-convert-them
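In short, DER is the binary encoding of a certificate and PEM is the same bytes in Base64 between BEGIN/END CERTIFICATE lines, so the two carry identical content. As a minimal illustration, Python's standard `ssl` module can convert in both directions; the wrapper names `der_to_pem` and `pem_to_der` are hypothetical.

```python
import ssl


def der_to_pem(der_bytes):
    """Wrap binary DER certificate bytes in a PEM text envelope."""
    return ssl.DER_cert_to_PEM_cert(der_bytes)


def pem_to_der(pem_text):
    """Decode a single PEM certificate back to its binary DER bytes."""
    return ssl.PEM_cert_to_DER_cert(pem_text)
```

Neither function validates the certificate; they only change the encoding, so a round trip returns the original bytes.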
|
#9 |
Connoisseur
Posts: 93
Karma: 32466
Join Date: Jul 2013
Location: Paris
Device: Kobo Desktop, Kindle Desktop, Kobo Forma
|
Yes, the Mediapart login page has changed. Thank you, Kovid.
|
#10 |
Member
Posts: 11
Karma: 10
Join Date: Jan 2016
Device: no
|
Hi,
I am new to the forum. I have problems with the Mediapart recipe written in 2013 by M. Godlewski and L. Gesbert, so I tried to edit it. What I did was simply
I get an error at the end: Code:
AttributeError: 'NoneType' object has no attribute 'find'
I do not know if I am the only one with this problem, but any help would be most welcome. Last edited by DanielBonnery; 03-03-2016 at 09:34 PM. Reason: I deleted the big part of the code that was useless. |
#11 |
creator of calibre
Posts: 45,355
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
That indicates the markup of the index page https://www.mediapart.fr/journal/fil-dactualites has changed;
you will need to fix the my_parse_index() function to handle the new markup. |
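The AttributeError arises when one `find()` returns `None` (the element is no longer in the page) and the next `find()` is called on that result. A rough sketch of a guard follows; `find_chain` is a hypothetical helper, not part of calibre or BeautifulSoup, and it only assumes that each node exposes a `find(name, attrs)` method the way BeautifulSoup tags do.

```python
def find_chain(node, *steps):
    """Apply successive find(name, attrs) calls.

    Returns None as soon as a lookup finds nothing, instead of raising
    AttributeError on the following call.
    """
    for name, attrs in steps:
        if node is None:
            return None
        node = node.find(name, attrs)
    return node
```

In a recipe this could be called as `find_chain(soup, ('div', {'class': 'page-content bust'}), ('ul', {'class': 'post-list universe-journal'}))`, with a check for `None` before iterating. That turns a markup change into an empty section rather than a crash, although the selectors still have to be updated to match the new page.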
#12 |
Connoisseur
Posts: 93
Karma: 32466
Join Date: Jul 2013
Location: Paris
Device: Kobo Desktop, Kindle Desktop, Kobo Forma
|
Apparently this recipe is not supported. Time to prune it from Calibre maybe?
|
#13 |
Member
Posts: 11
Karma: 10
Join Date: Jan 2016
Device: no
|
New version of the recipe.
Hi, please find attached here a new version of the recipe.
It can be improved, but at least it is working. Best, Daniel. Code:

    # -*- mode:python -*-
    from __future__ import unicode_literals

    __license__ = 'GPL v3'
    __copyright__ = '2016, Daniel Bonnery ? (contact: DanielBonnery sur mobileread.com) 2009, Mathieu Godlewski <mathieu at godlewski.fr>; 2010-2012, Louis Gesbert <meta at antislash dot info>'
    '''
    Mediapart
    '''

    __author__ = '2016, Daniel Bonnery (contact: DanielBonnery sur mobileread.com), 2009, Mathieu Godlewski <mathieu at godlewski.fr>; 2010-2012, Louis Gesbert <meta at antislash dot info>'

    import re
    from calibre.ebooks.BeautifulSoup import BeautifulSoup
    from calibre.web.feeds.news import BasicNewsRecipe
    from calibre.web.feeds import feeds_from_index
    from datetime import date, timedelta


    class Mediapart(BasicNewsRecipe):
        title = 'Mediapart'
        __author__ = 'Daniel Bonnery from a version by Mathieu Godlewski, Louis Gesbert'
        description = 'Global news in french from news site Mediapart'
        publication_type = 'newspaper'
        language = 'fr'
        needs_subscription = True
        oldest_article = 2

        use_embedded_content = False
        no_stylesheets = True

        cover_url = 'https://static.mediapart.fr/files/M%20Philips/logo-mediapart.png'

        # --

        oldest_article_date = date.today() - timedelta(days=oldest_article)

        # -- get the index (the feed at 'http://www.mediapart.fr/articles/feed' only has
        #    the 10 last elements :/)

        feeds = [
            ('La Une', 'http://www.mediapart.fr/articles/feed'),
        ]

        def parse_feeds(self):
            feeds = super(Mediapart, self).parse_feeds()
            feeds += feeds_from_index(self.my_parse_index(feeds))
            return feeds

        def my_parse_index(self, la_une):
            articles = []

            breves = []
            liens = []
            confidentiels = []

            soup = self.index_to_soup('https://www.mediapart.fr/journal/fil-dactualites')
            page = soup.find('div', {'class': 'page-content bust'})
            fils = page.find('ul', {'class': 'post-list universe-journal'})

            for article in fils.findAll('li'):
                try:
                    title = article.find('h3', recursive=False)
                    if title is None or title['class'] == 'title-specific':
                        continue

                    # print "found fil ",title
                    article_type = article.find('a', {'href': re.compile(r'.*\/type-darticles\/.*')}).renderContents()
                    # print "kind: ",article_type

                    for s in title('span'):
                        s.replaceWith(s.renderContents() + "\n")
                    url = title.find('a', href=True)['href']

                    #article_date = self.parse_french_date(article.find("span", "article-date").renderContents())
                    #print("################################# 9")
                    #print(article_date)

                    #if article_date < self.oldest_article_date:
                    #    print "too old"
                    #    continue

                    authors = article.findAll('a', {'class': re.compile(r'\bjournalist\b')})
                    authors = [self.tag_to_string(a) for a in authors]

                    #description = article.find('div', {'class': lambda c: c != 'taxonomy-teaser'}, recursive=False).findAll('p')

                    # print "fil ",title," by ",authors," : ",description

                    summary = {
                        'title': self.tag_to_string(title).strip(),
                        'author': ', '.join(authors),
                        'url': url,
                        #'date': u'' + article_date.strftime("%A %d %b %Y"),
                        # The description lookup above is commented out, so the
                        # original reference to `description` raised a NameError
                        # that the except clause below silently swallowed,
                        # skipping every entry. Left empty until a new selector
                        # is written.
                        'description': '',
                    }
                    {
                        "Brève": breves,
                        "Lien": liens,
                        "Confidentiel": confidentiels,
                    }.get(article_type).append(summary)
                except Exception:
                    # Skip entries that do not match the expected markup.
                    pass

            # print 'La Une: ', len(la_une), ' articles'
            # for a in la_une: print a["title"]
            # print 'Brèves: ', len(breves), ' articles'
            # print 'Revue web: ', len(liens), ' articles'
            # print 'Confidentiel: ', len(confidentiels), ' articles'

            articles += [('Brèves', breves)] if breves else []
            articles += [('Revue du Web', liens)] if liens else []
            articles += [('Confidentiel', confidentiels)] if confidentiels else []
            return articles

        # -- print-version

        conversion_options = {'smarten_punctuation': True}

        remove_tags = [dict(name='div', attrs={'class': 'print-source_url'})]

        # non-locale specific date parse (strptime("%d %b %Y",s) would work with french locale)
        def parse_french_date(self, date_str):
            date_arr = date_str.lower().split()
            return date(day=int(date_arr[0]),
                        year=int(date_arr[2]),
                        month=[None, 'janvier', 'février', 'mars', 'avril', 'mai', 'juin', 'juillet',
                               'août', 'septembre', 'octobre', 'novembre', 'décembre'].index(date_arr[1]))

        def print_version(self, url):
            raw = self.browser.open(url).read()
            soup = BeautifulSoup(raw.decode('utf8', 'replace'))

            # Filter old articles
            # article_date = self.parse_french_date(self.tag_to_string(soup.find('span', 'article-date')))

            # if article_date < self.oldest_article_date:
            #     return None

            tools = soup.find('li', {'class': 'print'})
            link = tools.find('a', {'href': re.compile(r'\/print\/.*')})
            print(link['href'])
            # if link is None:
            #     print 'Error: print link not found'
            #     return None
            return 'https://mediapart.fr' + link['href']
            # return url

        # -- Handle login

        def get_browser(self):
            br = BasicNewsRecipe.get_browser(self)
            if self.username is not None and self.password is not None:
                br.open('https://www.mediapart.fr/login')
                br.select_form(nr=1)
                br['name'] = self.username
                br['password'] = self.password
                br.submit()
            return br

        # This is a workaround articles with scribd content that include
        # <body></body> tags _within_ the body
        preprocess_regexps = [
            (re.compile(r'(<body.*?>)(.*)</body>', re.IGNORECASE | re.DOTALL),
             lambda match:
                 match.group(1) +
                 re.sub(re.compile(r'</?body>', re.IGNORECASE | re.DOTALL), '', match.group(2)) +
                 '</body>')
        ]

Last edited by DanielBonnery; 03-03-2016 at 09:37 PM. |
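The `preprocess_regexps` workaround at the end of the recipe can be tried on its own, outside calibre. The following is a standalone sketch of the same two-regex idea; `strip_nested_body` is a hypothetical name, and as in the recipe, only bare `<body>` and `</body>` tags inside the outer body are removed (a nested `<body ...>` carrying attributes would be left in place).

```python
import re

# Outer pattern: the opening <body ...> tag, then everything up to the LAST </body>.
BODY_RE = re.compile(r'(<body.*?>)(.*)</body>', re.IGNORECASE | re.DOTALL)
# Inner pattern: stray bare <body> or </body> tags found inside the real body.
INNER_BODY_RE = re.compile(r'</?body>', re.IGNORECASE)


def strip_nested_body(html):
    """Remove <body>/</body> tags nested inside the document's own body."""
    def repair(match):
        return match.group(1) + INNER_BODY_RE.sub('', match.group(2)) + '</body>'
    return BODY_RE.sub(repair, html)
```

The greedy `(.*)` is what makes the outer pattern span to the final `</body>`, so every nested pair falls inside group 2 and is stripped in one pass.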
#14 |
Connoisseur
Posts: 93
Karma: 32466
Join Date: Jul 2013
Location: Paris
Device: Kobo Desktop, Kindle Desktop, Kobo Forma
|
|
#15 | |
Member
Posts: 11
Karma: 10
Join Date: Jan 2016
Device: no
|
Quote:
Thank you for your answer. The Mediapart recipe may not have been updated in calibre yet; have you tried the one posted above? On my PC it works fine; the log is posted below. Best, Daniel Code:

    Fetch news from Mediapart
    Resolved conversion options
    calibre version: 2.49.0
    {'asciiize': False, 'author_sort': None, 'authors': None, 'base_font_size': 0, 'book_producer': None, 'change_justification': 'original', 'chapter': None, 'chapter_mark': 'pagebreak', 'comments': None, 'cover': None, 'debug_pipeline': None, 'dehyphenate': True, 'delete_blank_paragraphs': True, 'disable_font_rescaling': False, 'dont_compress': False, 'dont_download_recipe': False, 'duplicate_links_in_toc': False, 'embed_all_fonts': False, 'embed_font_family': None, 'enable_heuristics': False, 'expand_css': False, 'extra_css': None, 'extract_to': None, 'filter_css': None, 'fix_indents': True, 'font_size_mapping': None, 'format_scene_breaks': True, 'html_unwrap_factor': 0.4, 'input_encoding': None, 'input_profile': <calibre.customize.profiles.InputProfile object at 0x7f17f06a6090>, 'insert_blank_line': False, 'insert_blank_line_size': 0.5, 'insert_metadata': False, 'isbn': None, 'italicize_common_cases': True, 'keep_ligatures': False, 'language': None, 'level1_toc': None, 'level2_toc': None, 'level3_toc': None, 'line_height': 0, 'linearize_tables': False, 'lrf': False, 'margin_bottom': 5.0, 'margin_left': 5.0, 'margin_right': 5.0, 'margin_top': 5.0, 'markup_chapter_headings': True, 'max_toc_links': 50, 'minimum_line_height': 120.0, 'mobi_file_type': 'old', 'mobi_ignore_margins': False, 'mobi_keep_original_images': False, 'mobi_toc_at_start': False, 'no_chapters_in_toc': False, 'no_inline_navbars': True, 'no_inline_toc': False, 'output_profile': <calibre.customize.profiles.KindleDXOutput object at 0x7f17f06a6790>, 'page_breaks_before': None, 'personal_doc': '[PDOC]', 'prefer_author_sort': False, 'prefer_metadata_cover': False, 'pretty_print': False, 'pubdate': None, 'publisher': None, 'rating': None, 'read_metadata_from_opf': None, 'remove_fake_margins': True, 'remove_first_image': False, 'remove_paragraph_spacing': False, 'remove_paragraph_spacing_indent_size': 1.5, 'renumber_headings': True, 'replace_scene_breaks': '', 'search_replace': None, 'series': None, 'series_index': None, 'share_not_sync': False, 'smarten_punctuation': False, 'sr1_replace': '', 'sr1_search': '', 'sr2_replace': '', 'sr2_search': '', 'sr3_replace': '', 'sr3_search': '', 'start_reading_at': None, 'subset_embedded_fonts': False, 'tags': None, 'test': False, 'timestamp': None, 'title': None, 'title_sort': None, 'toc_filter': None, 'toc_threshold': 6, 'toc_title': None, 'unsmarten_punctuation': False, 'unwrap_lines': True, 'use_auto_toc': False, 'verbose': 2}
    InputFormatPlugin: Recipe Input running
    Using custom recipe
    Synthesizing mastheadImage
    [ part of the log output by print statements, deleted here to hide mediapart secret static urls ]
    Parsing all content...
    Parsing feed_0/article_1/index.html ...
    Parsing feed_0/index.html ...
    Initial parse failed, using more forgiving parsers
    Parsing feed_0/index.html as HTML
    Parsing feed_0/article_9/index.html ...
    Parsing feed_0/article_5/index.html ...
    Parsing feed_0/article_7/index.html ...
    Parsing feed_0/article_3/index.html ...
    Parsing feed_0/article_4/index.html ...
    Parsing feed_0/article_0/index.html ...
    Parsing feed_0/article_8/index.html ...
    Parsing feed_0/article_2/index.html ...
    replaced 2 nbsp indents with inline styles
    Parsing index.html ...
    Forcing index.html into XHTML namespace
    Parsing feed_0/article_6/index.html ...
    Referenced file u'feed_1/index.html' not found
    Referenced file u'/favicon.png' not found
    Reading TOC from NCX...
    Merging user specified metadata...
    Detecting structure...
    Flattening CSS and remapping font sizes...
    Source base font size is 12.00000pt
    Removing fake margins...
    Found 30 items of level: div_2
    Found 33 items of level: div_1
    Found 4 items of level: div_5
    Found 51 items of level: div_4
    Found 12 items of level: p_2
    Found 220 items of level: p_3
    Ignoring level p_2
    Ignoring level div_5
    div_2 left margin stats: Counter({u'': 23})
    div_2 right margin stats: Counter({u'': 23})
    div_1 left margin stats: Counter({u'': 11})
    div_1 right margin stats: Counter({u'': 11})
    div_4 left margin stats: Counter({u'': 51})
    div_4 right margin stats: Counter({u'': 51})
    p_3 left margin stats: Counter({u'0': 220})
    p_3 right margin stats: Counter({u'0': 220})
    Cleaning up manifest...
    Trimming unused files from manifest...
    Creating MOBI Output...
    Serializing resources...
    Converting TOC for MOBI periodical indexing...
    Using mastheadImage supplied in manifest...
    Creating MOBI 6 output
    Generating in-line TOC...
    Applying case-transforming CSS...
    Parsing manglecase.css ...
    Parsing tocstyle.css ...
    Rasterizing SVG images...
    Converting XHTML to Mobipocket markup...
    Serializing markup content...
    Compressing markup content...
    Generating MOBI index for a periodical
    MOBI output written to /tmp/calibre_2.49.0_tmp_xJe3BH/FSqo3L_recipe_out.mobi

Last edited by DanielBonnery; 03-06-2016 at 09:26 PM. |
|