09-09-2020, 09:16 AM | #1 |
Member
Posts: 10
Karma: 10
Join Date: Sep 2020
Device: KOBO Forma
|
No articles (only big titles) in 'Mediapart'
Hi folks,
I first tried emailing Daniel Bonnery (last Sunday) so as not to bother you with this, but I have had no answer so far, so today I decided to ask here.
The issue, as the title says, is that Calibre fetches the news from the French daily "Mediapart" without complaint, but the resulting EPUB contains only one nearly blank page (plus the cover image) with just the section headings ("La Une" and "Brèves") and the navigation links ("Section suivante" and "Menu principal"). No articles, not even their titles.
I'm running Debian (buster / stable) and Calibre 4.23 (from the calibre site, not the Debian package, as recommended). There is no problem with Calibre itself (which is an awesome piece of work!), only with the recipe. I tried another recipe ("20minutes") and it works as expected. I also tried some Python snippets that Kovid recently added to other problematic recipes; they broke nothing, but they did not fix the issue either. :/
As I said in my "Introducing myself" post, I'm not good at Python at all. I can only try a few things, since I've been writing shell scripts for 20 years, that's all.
As you will see, the issue seems to come from a 'Failed to find print version for: https://...' error raised for each article URL.
Here is the output of my tests on the command line (with 'ebook-convert'). I have omitted the green lines, as I don't think they are relevant enough for the space they take. I should add that the plugins that fail to initialize (in the first 4 lines) are in what should be the right place: '/home/xxxxx/.config/calibre/plugins/'.
Code:
ebook-convert mediapart.recipe .epub --username 'xxxxxxxx' --password 'xxxxxxxxx.' --test -vv --debug-pipeline debug
Failed to initialize plugin: u'/home/xxxxx/.config/calibre/plugins/KePub Output.zip'
Failed to initialize plugin: u'/home/xxxxx/.config/calibre/plugins/KoboTouchExtended.zip'
Failed to initialize plugin: u'/home/xxxxx/.config/calibre/plugins/KePub Metadata Reader.zip'
Failed to initialize plugin: u'/home/xxxxxx/.config/calibre/plugins/KePub Metadata Writer.zip'
Conversion options changed from defaults:
debug_pipeline: u'debug'
test: (2, 2)
verbose: 2
[snipped irrelevant green lines here]
1% Conversion de l’entrée en HTML…
InputFormatPlugin: Recipe Input running
Using custom recipe
Using user agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36
1% Récupération des flux…
1% Récupération du flux La Une...
1% Tentative de téléchargement de la couverture…
34% Téléchargement de la couverture de https://static.mediapart.fr/files/M%20Philips/logo-mediapart.png
1% Génération du titre de journal
Synthesizing mastheadImage
QApplication: invalid style override 'gtk2' passed, ignoring it.
Available styles: Windows, Fusion
Failed to find print version for: https://www.mediapart.fr/journal/international/060920/l-armee-francaise-arme-ses-drones-mais-le-debat-est-confisque
Traceback (most recent call last):
  File "site-packages/calibre/web/feeds/news.py", line 1247, in build_index
  File "<string>", line 143, in print_version
TypeError: 'NoneType' object has no attribute '__getitem__'
Failed to find print version for: https://www.mediapart.fr/journal/international/060920/au-sahel-le-spectre-de-la-menace-fantome
Traceback (most recent call last):
  File "site-packages/calibre/web/feeds/news.py", line 1247, in build_index
  File "<string>", line 143, in print_version
TypeError: 'NoneType' object has no attribute '__getitem__'
Failed to find print version for: https://www.mediapart.fr/journal/fil-dactualites/060920/le-patron-de-suez-juge-l-offre-de-veolia-aberrante-pour-suez-et-funeste-pour-la-france
Traceback (most recent call last):
  File "site-packages/calibre/web/feeds/news.py", line 1247, in build_index
  File "<string>", line 143, in print_version
TypeError: 'NoneType' object has no attribute '__getitem__'
Failed to find print version for: https://www.mediapart.fr/journal/fil-dactualites/060920/coronavirus-la-corse-passe-en-zone-rouge-comme-5-autres-departements-francais
Traceback (most recent call last):
  File "site-packages/calibre/web/feeds/news.py", line 1247, in build_index
  File "<string>", line 143, in print_version
TypeError: 'NoneType' object has no attribute '__getitem__'
1% Démarrage du téléchargements de [4 fils]...
34% Flux téléchargés vers /tmp/calibre_4.23.0_tmp_wIwwWu/lcDvQ9_plumber/index.html
34% Téléchargement terminé
Input debug saved to: /home/xxxxx/debug/input
Parsing all content...
Parsing feed_1/index.html ...
Initial parse failed, using more forgiving parsers
Parsing feed_1/index.html as HTML
Parsing feed_0/index.html ...
Initial parse failed, using more forgiving parsers
Parsing feed_0/index.html as HTML
Parsing index.html ...
Forcing index.html into XHTML namespace
Reading TOC from NCX...
Parsed HTML written to: /home/xxxxx/debug/parsed
34% Exécution des transformations du livre numérique…
Merging user specified metadata...
Detecting structure...
Structured HTML written to: /home/xxxxx/debug/structure
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Found 2 items of level: p_2
Found 3 items of level: div_2
Found 5 items of level: div_1
Ignoring level p_2
div_2 left margin stats: Counter()
div_2 right margin stats: Counter()
div_1 left margin stats: Counter()
div_1 right margin stats: Counter()
Cleaning up manifest...
Trimming unused files from manifest...
Processed HTML written to: /home/xxxxx/debug/processed
Creating EPUB Output...
67% Exécution de l'extension EPUB Output
Found non-unique filenames, renaming to support broken EPUB readers like FBReader, Aldiko and Stanza...
{u'feed_0/index.html': u'feed_0/index_u1.html', u'index.html': u'index_u2.html'}
Splitting markup on page breaks and flow limits, if any...
Looking for large trees in feed_1/index.html...
No large trees found
Looking for large trees in index_u2.html...
No large trees found
Looking for large trees in feed_0/index_u1.html...
No large trees found
The cover image has an id != "cover". Renaming to work around bug in Nook Color
EPUB output written to /home/xxxxx/mediapart.epub
Sortie sauvegardée vers /home/xxxxx/mediapart.epub
I'm not sure what other information would be useful, so feel free to ask if you need more. Thanks a lot for reading my poor English through to the end of this long post! Hoping to hear from you later. |
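The repeated TypeError in the log is the usual symptom of indexing into None: a lookup on the page found nothing, and the code then tried to read an item from the empty result. The recipe's actual line 143 is not shown in this thread, so the following is only a minimal sketch of that failure pattern and one possible guard. The pattern PRINT_LINK and both function names are hypothetical, and a regular expression stands in for whatever HTML lookup the real recipe performs. On Python 3 the same mistake is reported as "'NoneType' object is not subscriptable".

```python
import re

# Hypothetical pattern, for illustration only: the markup the real recipe
# looks for is not shown in the thread.
PRINT_LINK = re.compile(r'<a[^>]+class="print"[^>]+href="([^"]+)"')


def fragile_print_version(html):
    """Mimics the failure: index the lookup result without checking it."""
    match = PRINT_LINK.search(html)  # None when the page has no such link
    return match[1]                  # raises TypeError when match is None


def safe_print_version(html, url):
    """Return the print link if present, otherwise fall back to the article URL."""
    match = PRINT_LINK.search(html)
    if match is None:
        return url
    return match[1]
```

With a guard of this kind, a change in the site's markup would degrade to downloading the normal article page instead of dropping every article.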
09-09-2020, 10:19 AM | #2 |
Member
Posts: 10
Karma: 10
Join Date: Sep 2020
Device: KOBO Forma
|
Oops... !
Replying to myself because I realize I forgot to mention that I began my tests of the "Mediapart" recipe with Calibre 4.22, and it was already broken there (the recipe version is the same, so that seems logical; at least the problem is not linked to the new code in 4.23). |
09-09-2020, 10:45 AM | #3 |
creator of calibre
Posts: 43,835
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Somebody who reads French is going to have to update this recipe. I made some preliminary fixes, but given that I don't read French or subscribe to this website, that's all I am going to do. https://github.com/kovidgoyal/calibr...8a7d39bae550ca
|
09-09-2020, 12:53 PM | #4 |
Member
Posts: 10
Karma: 10
Join Date: Sep 2020
Device: KOBO Forma
|
Erf,
you really are incredibly responsive (as I've seen in other threads)! :O [Sorry, I can't activate the "Quote message ..." button.]
« Somebody that reads French is going to have to update this recipe. »
Ahem... I was afraid of that! [I got no answer from Daniel Bonnery.]
« I made some preliminary fixes, »
Thanks a lot, you're too much, Kovid! :O I'll have a look at them (no time right now).
« but given I don't read French nor subscribe to this website, [...] »
Yes, of course, no problem. Since I am starting to understand a few words of Python, speak French, and have a subscription to "Mediapart", this is MY job from now on. I'll test your fixes and come back soon with the stderr output. Maybe these fixes will be enough, who knows?
Once again: thank you very much, Kovid! |
09-10-2020, 12:44 PM | #5 |
Member
Posts: 10
Karma: 10
Join Date: Sep 2020
Device: KOBO Forma
|
It's me again, coming back from "debugland".
So, as I expected (since you rewrote almost the whole recipe), there are no more red lines from 'ebook-convert' and it finally yields the articles. BUT one "small" problem remains: it yields only the first 10 or 15 lines of each article, and these end with a sentence like "Please feel free to subscribe to Mediapart in order to read blahhh...". So I wondered: is this related to the "logging snippet"? Is that snippet useful? And so on. At this stage, I think the best thing to do is to open a new thread. I'm going to open "Subscribing in 'Mediapart' [new] recipe don't work" ... |
01-12-2021, 09:29 AM | #6 |
Junior Member
Posts: 4
Karma: 16
Join Date: Dec 2020
Device: Kindle 10th Generation
|
New major update of the Mediapart recipe
Hi! Just to let you know that I have updated the Mediapart recipe. The major changes in the new version are:
1) Article summaries are now available.
2) Additional sections (International, France, Economie and Culture) have been added through custom entries in the function my_parse_index.
3) The cover image has been changed so that it doesn't disappear from the Kindle menu.
You can see the new version of the recipe here: https://github.com/kovidgoyal/calibr...diapart.recipe |
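For readers who want to see what adding sections and summaries means in practice, here is a minimal sketch of the data shape a calibre recipe's parse_index method is documented to return: a list of (section title, list of articles) tuples, where each article is a dict with keys such as 'title', 'url' and 'description'. The helper make_index, the SECTIONS tuple and the sample input are hypothetical and are not taken from the actual recipe, whose my_parse_index may be organized quite differently.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.mediapart.fr/"

# Section names from the post above; their order here is an assumption.
SECTIONS = ("International", "France", "Economie", "Culture")


def make_index(scraped):
    """Turn {section: [(title, path, summary), ...]} into the parse_index shape."""
    index = []
    for section in SECTIONS:
        articles = [
            {
                "title": title,
                "url": urljoin(BASE_URL, path),
                "description": summary,  # shown as the article summary
            }
            for title, path, summary in scraped.get(section, [])
        ]
        if articles:  # leave out sections with no articles
            index.append((section, articles))
    return index
```

A recipe would return a list of this shape from parse_index; calibre then downloads each 'url' and files the article under its section title.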