Old 09-09-2020, 09:16 AM   #1
j33p
Member
j33p began at the beginning.
 
j33p's Avatar
 
Posts: 10
Karma: 10
Join Date: Sep 2020
Device: KOBO Forma
No articles (only big titles) in 'Mediapart'

Hi folks,

I first tried to contact Daniel Bonnery by email (last Sunday) so as not to bother you with this, but I have had no answer so far.
So today I decided to ask here.

The issue, as the title says, is that calibre fetches the news from the French daily "Mediapart" without any problem, BUT when I look inside the EPUB, I find only one almost blank page (plus the cover image) with just the section headings ("La Une" and "Brèves", i.e. front page and briefs) plus the links ("Section suivante" and "Menu principal", i.e. next section and main menu), and that's all I get.
No articles, not even their titles.

I'm using Debian (buster / stable)
and calibre version 4.23 (installed from the calibre site, not the Debian package, as recommended).

There is no problem with calibre itself (which is an awesome piece of work!), only with the recipe.
I've tried another recipe ("20minutes") and that one works as expected.

I've also tried some Python snippets that I've seen Kovid add recently to some problematic recipes. They broke nothing, but they didn't fix the issue either. :/

As I said in my « Introducing myself » post, I'm not good at Python at all. I can only try a few things, since I've been writing shell scripts for 20 years, that's all.

As you'll see, the issue seems to come from a 'Failed to find print version for: https://...' error for each article URL.


Here is the output of my command-line tests with 'ebook-convert'. I have omitted the green lines, as IMHO they aren't relevant enough for the space they take up.

[I should point out that the plugins that seem to fail to initialize (in the first four lines) are in the right (?) place: '/home/xxxxx/.config/calibre/plugins/']

Code:
 ebook-convert mediapart.recipe  .epub  --username 'xxxxxxxx' --password 'xxxxxxxxx.' --test -vv --debug-pipeline debug
Failed to initialize plugin: u'/home/xxxxx/.config/calibre/plugins/KePub Output.zip'
Failed to initialize plugin: u'/home/xxxxx/.config/calibre/plugins/KoboTouchExtended.zip'
Failed to initialize plugin: u'/home/xxxxx/.config/calibre/plugins/KePub Metadata Reader.zip'
Failed to initialize plugin: u'/home/xxxxxx/.config/calibre/plugins/KePub Metadata Writer.zip'
Conversion options changed from defaults:
  debug_pipeline: u'debug'
  test: (2, 2)
  verbose: 2

                   [snipped irrelevant green lines here]

1% Conversion de l’entrée en HTML…
InputFormatPlugin: Recipe Input running
Using custom recipe
Using user agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36
1% Récupération des flux…
1% Récupération du flux La Une...
1% Tentative de téléchargement de la couverture…
34% Téléchargement de la couverture de https://static.mediapart.fr/files/M%20Philips/logo-mediapart.png
1% Génération du titre de journal
Synthesizing mastheadImage
QApplication: invalid style override 'gtk2' passed, ignoring it.
    Available styles: Windows, Fusion
Failed to find print version for: https://www.mediapart.fr/journal/international/060920/l-armee-francaise-arme-ses-drones-mais-le-debat-est-confisque
Traceback (most recent call last):
  File "site-packages/calibre/web/feeds/news.py", line 1247, in build_index
  File "<string>", line 143, in print_version
TypeError: 'NoneType' object has no attribute '__getitem__'

Failed to find print version for: https://www.mediapart.fr/journal/international/060920/au-sahel-le-spectre-de-la-menace-fantome
Traceback (most recent call last):
  File "site-packages/calibre/web/feeds/news.py", line 1247, in build_index
  File "<string>", line 143, in print_version
TypeError: 'NoneType' object has no attribute '__getitem__'

Failed to find print version for: https://www.mediapart.fr/journal/fil-dactualites/060920/le-patron-de-suez-juge-l-offre-de-veolia-aberrante-pour-suez-et-funeste-pour-la-france
Traceback (most recent call last):
  File "site-packages/calibre/web/feeds/news.py", line 1247, in build_index
  File "<string>", line 143, in print_version
TypeError: 'NoneType' object has no attribute '__getitem__'

Failed to find print version for: https://www.mediapart.fr/journal/fil-dactualites/060920/coronavirus-la-corse-passe-en-zone-rouge-comme-5-autres-departements-francais
Traceback (most recent call last):
  File "site-packages/calibre/web/feeds/news.py", line 1247, in build_index
  File "<string>", line 143, in print_version
TypeError: 'NoneType' object has no attribute '__getitem__'

1% Démarrage du téléchargements de [4 fils]...
34% Flux téléchargés vers /tmp/calibre_4.23.0_tmp_wIwwWu/lcDvQ9_plumber/index.html
34% Téléchargement terminé
Input debug saved to: /home/xxxxx/debug/input
Parsing all content...
Parsing feed_1/index.html ...
Initial parse failed, using more forgiving parsers
Parsing feed_1/index.html as HTML
Parsing feed_0/index.html ...
Initial parse failed, using more forgiving parsers
Parsing feed_0/index.html as HTML
Parsing index.html ...
Forcing index.html into XHTML namespace
Reading TOC from NCX...
Parsed HTML written to: /home/xxxxx/debug/parsed
34% Exécution des transformations du livre numérique…
Merging user specified metadata...
Detecting structure...
Structured HTML written to: /home/xxxxx/debug/structure
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Found 2 items of level: p_2
Found 3 items of level: div_2
Found 5 items of level: div_1
Ignoring level p_2
div_2  left margin stats: Counter()
div_2  right margin stats: Counter()
div_1  left margin stats: Counter()
div_1  right margin stats: Counter()
Cleaning up manifest...
Trimming unused files from manifest...
Processed HTML written to: /home/xxxxx/debug/processed
Creating EPUB Output...
67% Exécution de l'extension EPUB Output
Found non-unique filenames, renaming to support broken EPUB readers like FBReader, Aldiko and Stanza...
{u'feed_0/index.html': u'feed_0/index_u1.html',
 u'index.html': u'index_u2.html'}
Splitting markup on page breaks and flow limits, if any...
    Looking for large trees in feed_1/index.html...
    No large trees found
    Looking for large trees in index_u2.html...
    No large trees found
    Looking for large trees in feed_0/index_u1.html...
    No large trees found
The cover image has an id != "cover". Renaming to work around bug in Nook Color
EPUB output written to /home/xxxxx/mediapart.epub
Sortie sauvegardée vers   /home/xxxxx/mediapart.epub
Et voilà !
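
For readers hitting the same traceback: in Python 2, `TypeError: 'NoneType' object has no attribute '__getitem__'` is what you get when code indexes a value that is `None` (Python 3 words it as "'NoneType' object is not subscriptable"). A plausible reading, though not confirmed by the log, is that `print_version` looks up a "print" link on the article page, finds nothing because the site layout changed, and then indexes the empty result. Below is a minimal standalone sketch of a guarded lookup. The CSS class `print`, the sample markup, and the extra `html` parameter are assumptions made for illustration; a real recipe's `print_version(self, url)` receives only the URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PrintLinkFinder(HTMLParser):
    """Remember the href of the first <a> carrying the wanted CSS class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.href is not None:
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if self.wanted_class in classes:
            self.href = attributes.get("href")


def print_version(url, html, wanted_class="print"):
    """Return the print-friendly URL, or the original URL if none is found.

    `wanted_class` is a hypothetical class name, not taken from Mediapart.
    """
    finder = PrintLinkFinder(wanted_class)
    finder.feed(html)
    if finder.href is None:
        # Indexing a missing lookup result (None['href']) is the kind of
        # operation that raises the TypeError shown in the log; fall back
        # to the normal article URL instead.
        return url
    return urljoin(url, finder.href)
```

With a guard like this, an article whose print link cannot be found is still fetched from its normal URL rather than failing with the error above.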

I can't think of any other information that would be useful to you, so feel free to ask if you need more.

Thanks a lot for reading my poor English to the end of this long post!
Hoping to read you later.
Old 09-09-2020, 10:19 AM   #2
j33p
Oops!

Replying to myself because I realize I forgot to mention that I began my tests of the "Mediapart" recipe with calibre 4.22, and it was already broken there. That seems logical, since the recipe version is the same, but at least it shows the problem is not linked to the new code in 4.23.
Old 09-09-2020, 10:45 AM   #3
kovidgoyal
creator of calibre
Posts: 43,835
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Somebody who reads French is going to have to update this recipe. I made some preliminary fixes, but given that I don't read French and don't subscribe to this website, that's all I am going to do. https://github.com/kovidgoyal/calibr...8a7d39bae550ca
Old 09-09-2020, 12:53 PM   #4
j33p
Erf,

you really are incredibly responsive (as I've seen in other threads)! :O

[sorry, but I can't get the "Quote message ..." button to work]

« Somebody who reads French is going to have to update this recipe. »

Ahem... I was afraid of that!
[I got no answer from Daniel Bonnery]

« I made some preliminary fixes, »

Thanks a lot, Kovid, that's more than I expected! :O
I'll have a look at them (no time right now).

« but given that I don't read French and don't subscribe to this website, [...] »

Yes, of course, no problem. Since I'm starting to understand a few words of Python, and I do of course speak French and have a "Mediapart" subscription, this is MY job from now on.

I'll test your fixes and come back soon with the 'stderr' output. Maybe these fixes will be sufficient, who knows?
Once again: thank you very much, Kovid!
Old 09-10-2020, 12:44 PM   #5
j33p
It's me again, coming back from "debugland".

So, as I expected (since you rewrote almost the whole recipe), there are no more red lines from 'ebook-convert', and at last it yields the articles.

BUT one « small » problem remains: it actually yields only the first 10 or 15 lines of each article, followed by a sentence like: « Please, feel free to subscribe to Mediapart in order to read blahhh... »
So I wondered: is this linked to the "logging snippet"? Is that snippet doing anything useful? And so on.

At this stage, I think the best thing to do is to open a new thread.
I'm going to open "Subscribing in 'Mediapart' [new] recipe don't work" ...
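
A quick way to see how many articles are affected is to scan the downloaded HTML for the subscription teaser. The sketch below walks a directory tree and lists the `.html` files containing a given phrase. The function name and the marker text are hypothetical; substitute the exact sentence Mediapart prints at the end of a truncated article.

```python
import os


def find_truncated_articles(root, marker):
    """Return sorted paths of .html files under root whose text contains marker."""
    hits = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".html"):
                continue
            path = os.path.join(folder, name)
            # Tolerate odd bytes so one badly encoded page does not stop the scan.
            with open(path, encoding="utf-8", errors="replace") as handle:
                if marker in handle.read():
                    hits.append(path)
    return sorted(hits)
```

Pointed at the `debug/input` directory that `--debug-pipeline debug` writes, an empty result would suggest that full articles were fetched, while a hit for every article would suggest the subscriber session is not being used.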
Old 01-12-2021, 09:29 AM   #6
loic2000
Junior Member
loic2000 began at the beginning.
 
Posts: 4
Karma: 16
Join Date: Dec 2020
Device: Kindle 10th Generation
New major update of the Mediapart recipe

Hi! Just to let you know that I have published a major update of the Mediapart recipe. The main changes in the new recipe are:
1) Article summaries are now available.
2) Additional sections (International, France, Economie and Culture) have been added through custom entries in the function my_parse_index.
3) The cover image has been reworked so that it doesn't disappear from the Kindle menu.


You can see the new version of the recipe here:
https://github.com/kovidgoyal/calibr...diapart.recipe
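
For anyone who wants to add further sections: calibre's `parse_index` is documented as returning a list of `(section title, list of articles)` tuples, where each article is a dict with keys such as `title`, `url` and `description`. The sketch below only illustrates that data shape. The helper name `build_index` and the example.org URL are made up for illustration and are not taken from the recipe's `my_parse_index`.

```python
def build_index(sections):
    """Turn {section title: [(title, url, summary), ...]} into a parse_index-style list."""
    index = []
    for section_title, entries in sections.items():
        articles = [
            {"title": title, "url": url, "description": summary}
            for title, url, summary in entries
        ]
        if articles:  # leave out sections that ended up with no articles
            index.append((section_title, articles))
    return index


# Hypothetical input: one populated section and one empty section.
example = build_index({
    "International": [("Some headline", "https://example.org/a", "One-line summary")],
    "Culture": [],
})
```

Here `example` holds a single `("International", [...])` entry, because the empty "Culture" section is left out. The `description` field is where an article summary would go.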