09-09-2020, 09:16 AM   #1
j33p
Member
Join Date: Sep 2020
Device: KOBO Forma
No articles (only big titles) in 'Mediapart'

Hi folks,

Last Sunday I first tried to contact Daniel Bonnery by email, so as not to bother you with this, but hmm... no answer (so far).
So today I decided to ask here.

The issue, as the title says, is that Calibre fetches the news from the French daily "Mediapart" without any problem, BUT... when I look inside the EPUB, all I get is a single blank page (plus the cover image) with just the section titles ("La Une" [front page] and "Brèves" [news in brief]) and the navigation links ("Section suivante" [next section] and "Menu principal" [main menu]), and... that's all.
No articles, not even their titles; almost nothing.

I'm running Debian (buster / stable)
and Calibre 4.23 (installed from the calibre website, not the Debian package, as recommended).

Calibre itself works fine (it's awesome work!); the problem is only with the recipe.
I tried another recipe ("20minutes") and it works as expected.

I also tried some Python snippets that I've seen Kovid add recently to some problematic recipes; they broke nothing... but didn't fix the issue(s) either. :/

As I said in my « Introducing myself » post, I'm not good at Python at all; I can only try a few things, since I've been writing shell scripts for 20 years, that's all.

As you'll see, the issue seems to come from a 'Failed to find print version for: https://...' error for each article URI, each one followed by a 'TypeError' raised at line 143 of the recipe's 'print_version'.
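The traceback says that at line 143 of the recipe's 'print_version', something that came back as None is being subscripted. Below is a minimal, self-contained model of that failure mode; it is NOT the Mediapart recipe's actual code (which isn't shown here). The names 'find_print_link', 'print_version_fragile', 'print_version_guarded' and 'fetch_feed', the 'print' CSS class and the example.org URLs are all hypothetical. In a real recipe 'print_version(self, url)' only receives the URL; the sketch passes the page HTML in directly so it runs without network access. 'fetch_feed' is a simplified stand-in (an assumption based on the log) for how calibre appears to handle the exception: log it and skip the article, so if every article fails, the feed pages end up with only their section titles.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional


class LinkFinder(HTMLParser):
    """Records the href of the first <a> whose class list contains css_class."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and self.href is None and self.css_class in classes:
            self.href = attr_map.get("href")


def find_print_link(html: str) -> Optional[dict]:
    # Hypothetical lookup: returns {"href": ...}, or None when nothing matches
    # (e.g. a login page, or markup that changed on the site).
    finder = LinkFinder("print")
    finder.feed(html)
    return {"href": finder.href} if finder.href else None


def print_version_fragile(url: str, html: str) -> str:
    # Subscripts the lookup result without checking it. When it is None this
    # raises TypeError: on Python 2 (calibre 4.x) the message is
    # "'NoneType' object has no attribute '__getitem__'", on Python 3
    # "'NoneType' object is not subscriptable".
    return find_print_link(html)["href"]


def print_version_guarded(url: str, html: str) -> str:
    link = find_print_link(html)
    # Fall back to the regular article URL instead of raising.
    return link["href"] if link else url


def fetch_feed(pages: Dict[str, str],
               print_version: Callable[[str, str], str]) -> List[str]:
    """Simplified model, not calibre's code: an article whose print_version
    raises is logged and skipped, so nothing is downloaded for it."""
    downloaded = []
    for url, html in pages.items():
        try:
            target = print_version(url, html)
        except Exception as exc:
            print(f"Failed to find print version for: {url} ({exc!r})")
            continue
        downloaded.append(target)
    return downloaded
```

With a page that has no matching link, 'fetch_feed(pages, print_version_fragile)' returns an empty list (every article skipped), while 'fetch_feed(pages, print_version_guarded)' returns the original article URLs. The guard only stops the crash; if the lookup is failing because the fetched page is not the expected one (for instance a login page), the real fix is upstream of that line.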


Here is the output of my test on the command line (with 'ebook-convert') [I've omitted the green lines, as they aren't very relevant (IMHO) given how much space they take up]:

[I should point out that the plugins that seem to fail to initialize (in the first 4 lines) are indeed in the right (?) place: '/home/xxxxx/.config/calibre/plugins/']

Code:
 ebook-convert mediapart.recipe  .epub  --username 'xxxxxxxx' --password 'xxxxxxxxx.' --test -vv --debug-pipeline debug
Failed to initialize plugin: u'/home/xxxxx/.config/calibre/plugins/KePub Output.zip'
Failed to initialize plugin: u'/home/xxxxx/.config/calibre/plugins/KoboTouchExtended.zip'
Failed to initialize plugin: u'/home/xxxxx/.config/calibre/plugins/KePub Metadata Reader.zip'
Failed to initialize plugin: u'/home/xxxxxx/.config/calibre/plugins/KePub Metadata Writer.zip'
Conversion options changed from defaults:
  debug_pipeline: u'debug'
  test: (2, 2)
  verbose: 2

                   [snipped irrelevant green lines here]

1% Conversion de l’entrée en HTML…
InputFormatPlugin: Recipe Input running
Using custom recipe
Using user agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36
1% Récupération des flux…
1% Récupération du flux La Une...
1% Tentative de téléchargement de la couverture…
34% Téléchargement de la couverture de https://static.mediapart.fr/files/M%20Philips/logo-mediapart.png
1% Génération du titre de journal
Synthesizing mastheadImage
QApplication: invalid style override 'gtk2' passed, ignoring it.
    Available styles: Windows, Fusion
Failed to find print version for: https://www.mediapart.fr/journal/international/060920/l-armee-francaise-arme-ses-drones-mais-le-debat-est-confisque
Traceback (most recent call last):
  File "site-packages/calibre/web/feeds/news.py", line 1247, in build_index
  File "<string>", line 143, in print_version
TypeError: 'NoneType' object has no attribute '__getitem__'

Failed to find print version for: https://www.mediapart.fr/journal/international/060920/au-sahel-le-spectre-de-la-menace-fantome
Traceback (most recent call last):
  File "site-packages/calibre/web/feeds/news.py", line 1247, in build_index
  File "<string>", line 143, in print_version
TypeError: 'NoneType' object has no attribute '__getitem__'

Failed to find print version for: https://www.mediapart.fr/journal/fil-dactualites/060920/le-patron-de-suez-juge-l-offre-de-veolia-aberrante-pour-suez-et-funeste-pour-la-france
Traceback (most recent call last):
  File "site-packages/calibre/web/feeds/news.py", line 1247, in build_index
  File "<string>", line 143, in print_version
TypeError: 'NoneType' object has no attribute '__getitem__'

Failed to find print version for: https://www.mediapart.fr/journal/fil-dactualites/060920/coronavirus-la-corse-passe-en-zone-rouge-comme-5-autres-departements-francais
Traceback (most recent call last):
  File "site-packages/calibre/web/feeds/news.py", line 1247, in build_index
  File "<string>", line 143, in print_version
TypeError: 'NoneType' object has no attribute '__getitem__'

1% Démarrage du téléchargements de [4 fils]...
34% Flux téléchargés vers /tmp/calibre_4.23.0_tmp_wIwwWu/lcDvQ9_plumber/index.html
34% Téléchargement terminé
Input debug saved to: /home/xxxxx/debug/input
Parsing all content...
Parsing feed_1/index.html ...
Initial parse failed, using more forgiving parsers
Parsing feed_1/index.html as HTML
Parsing feed_0/index.html ...
Initial parse failed, using more forgiving parsers
Parsing feed_0/index.html as HTML
Parsing index.html ...
Forcing index.html into XHTML namespace
Reading TOC from NCX...
Parsed HTML written to: /home/xxxxx/debug/parsed
34% Exécution des transformations du livre numérique…
Merging user specified metadata...
Detecting structure...
Structured HTML written to: /home/xxxxx/debug/structure
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Found 2 items of level: p_2
Found 3 items of level: div_2
Found 5 items of level: div_1
Ignoring level p_2
div_2  left margin stats: Counter()
div_2  right margin stats: Counter()
div_1  left margin stats: Counter()
div_1  right margin stats: Counter()
Cleaning up manifest...
Trimming unused files from manifest...
Processed HTML written to: /home/xxxxx/debug/processed
Creating EPUB Output...
67% Exécution de l'extension EPUB Output
Found non-unique filenames, renaming to support broken EPUB readers like FBReader, Aldiko and Stanza...
{u'feed_0/index.html': u'feed_0/index_u1.html',
 u'index.html': u'index_u2.html'}
Splitting markup on page breaks and flow limits, if any...
    Looking for large trees in feed_1/index.html...
    No large trees found
    Looking for large trees in index_u2.html...
    No large trees found
    Looking for large trees in feed_0/index_u1.html...
    No large trees found
The cover image has an id != "cover". Renaming to work around bug in Nook Color
EPUB output written to /home/xxxxx/mediapart.epub
Sortie sauvegardée vers   /home/xxxxx/mediapart.epub
Et voilà !
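Because the command above uses '--debug-pipeline debug', the downloaded recipe output should be saved under '/home/xxxxx/debug/input'. The sketch below counts how many article folders each feed actually got. It assumes (an assumption, not confirmed by the log, which only shows the feed_N/index.html files) that calibre's news download stores each fetched article in a 'feed_N/article_M/' sub-folder; 'count_articles' is a hypothetical helper, not part of calibre.

```python
import os
import re
from typing import Dict


def count_articles(debug_input_dir: str) -> Dict[str, int]:
    """Count article_M sub-directories under each feed_N directory.

    Assumed layout: feed_N/index.html plus one feed_N/article_M/ per article.
    """
    counts: Dict[str, int] = {}
    for name in sorted(os.listdir(debug_input_dir)):
        feed_dir = os.path.join(debug_input_dir, name)
        # Only look at directories named like feed_0, feed_1, ...
        if not (re.fullmatch(r"feed_\d+", name) and os.path.isdir(feed_dir)):
            continue
        counts[name] = sum(
            1
            for entry in os.listdir(feed_dir)
            if re.fullmatch(r"article_\d+", entry)
            and os.path.isdir(os.path.join(feed_dir, entry))
        )
    return counts
```

For example, 'print(count_articles("/home/xxxxx/debug/input"))'. If the print_version diagnosis is right, every feed would be expected to show 0 articles, which would match the two near-empty feed_N/index.html pages in the log.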

I can't think of any other information that would be useful, but feel free to ask if you need anything else.

Thanks a lot for reading my poor English all the way to the end of this loooong post!
Hope to hear from you soon.