Old 03-20-2018, 03:26 PM   #31
hegi
Enthusiast
hegi began at the beginning.
 
Posts: 44
Karma: 10
Join Date: Dec 2012
Device: Kindle 4 & Kindle PW 3G
Hi Divingduck,

Thanks for your hints. The debug directory is what I already used for my last post. The bit with the print statements comes in handy; however, when I try to fill these with the regexps, e.g. like this:

Code:
    print '*** c-overline tag    --->:', (re.compile(r'(<span class="c-overline">[^<]*)(</span>)', re.DOTALL|re.IGNORECASE), lambda match: match.group(1) + ': ' + match.group(2))
    print '*** hcf-location-mark --->:', (re.compile(r'(<span class="hcf-location-mark">[^<]*)(</span>)', re.DOTALL|re.IGNORECASE), lambda match: match.group(1) + '. ' + match.group(2))
I don't get any meaningful output. Or can you make sense of output like this:
Code:
*** c-overline tag    --->: (<_sre.SRE_Pattern object at 0x7fdef18de540>, <function <lambda> at 0x7fdee09ddaa0>)
*** hcf-location-mark --->: (<_sre.SRE_Pattern object at 0x7fdee0dcb5e8>, <function <lambda> at 0x7fdee09ddaa0>)
In theory, I'd be a step further if I could manage to capture the match.group information in the logfile.
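The print statements above only show object reprs because they print the (pattern, function) pair itself, not anything the pattern matched. One way to get at the match.group values, sketched here under the assumption that the replacement function is called once per match, is to wrap the lambda in a small logging helper. The helper name `logged` and the sample HTML are hypothetical and not part of the recipe.

```python
import re

def logged(label, func):
    """Wrap a replacement function so every match is printed before it is replaced."""
    def wrapper(match):
        # match.groups() holds the captured groups as a tuple
        print('*** %s --->: %r' % (label, match.groups()))
        return func(match)
    return wrapper

pattern = re.compile(r'(<span class="c-overline">[^<]*)(</span>)', re.DOTALL | re.IGNORECASE)
replacement = logged('c-overline tag', lambda match: match.group(1) + ': ' + match.group(2))

# Hypothetical input, only for illustration
sample = '<span class="c-overline">Autoindustrie</span>'
result = pattern.sub(replacement, sample)
print(result)
```

If the pattern never matches, the wrapper is never called and nothing is logged, which is itself a useful hint that the regex does not fit the downloaded HTML.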

I think I'm really stuck here, and this is quite frustrating.

Thanks a lot in advance.

Hegi.
Old 03-20-2018, 07:39 PM   #32
Divingduck
Wizard
Divingduck ought to be getting tired of karma fortunes by now.
 
Posts: 1,161
Karma: 1404241
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
You're welcome.

I had a bit of time to take a closer look at the problem.

There are two things I saw.
One is to keep in mind when a regex is applied. You are using preprocess_regexps, which works on the downloaded HTML as its input. So you can check debug\input\ to see what the downloaded HTML file looks like to calibre at the moment you manipulate it.
The second problem is that the class attribute you are looking for contains spaces, and that does not work (I think it never worked).

Taking that into account, I would do it slightly differently. I don't match the complete class string; I look only at the end of the class name for a unique identification:

... c-overline--article"> ... </span> ...
Code:
(re.compile(r'(c-overline--article">[^>]*)(</span>)', re.DOTALL|re.IGNORECASE), lambda match: match.group(1) + ': ' + match.group(2))
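To illustrate the difference, here is a small standalone sketch that applies both patterns with `pattern.sub(function, html)`. The sample HTML is made up; it only assumes that the span carries several space-separated class names ending in c-overline--article, as described above.

```python
import re

# Hypothetical markup: the class attribute holds more than one class name
html = '<span class="c-overline c-overline--article">Elektroauto</span><h2>VW ID.4</h2>'

# Original pattern: expects the class attribute to be exactly "c-overline"
strict = re.compile(r'(<span class="c-overline">[^<]*)(</span>)', re.DOTALL | re.IGNORECASE)
# Relaxed pattern: anchors only on the end of the class string
loose = re.compile(r'(c-overline--article">[^>]*)(</span>)', re.DOTALL | re.IGNORECASE)

add_colon = lambda match: match.group(1) + ': ' + match.group(2)

strict_result = strict.sub(add_colon, html)  # no match, HTML comes back unchanged
loose_result = loose.sub(add_colon, html)    # colon is inserted before </span>

print(strict_result)
print(loose_result)
```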
I attach an updated version of the recipe.
Attached Files
File Type: zip WirtschaftsWoche_AGe_V4.3.zip (1.8 KB, 254 views)
Old 03-22-2018, 04:10 PM   #33
hegi
Enthusiast
hegi began at the beginning.
 
Posts: 44
Karma: 10
Join Date: Dec 2012
Device: Kindle 4 & Kindle PW 3G
Thanks Divingduck,

... as usual, the problem was hiding in plain sight, and once you know the solution, everything seems simple and easy.

I took the liberty of merging my earlier fork of your recipe with your current version to come up with an improved version. Please feel free to review, edit, or enhance it even further.

My evolutionary changes over the last five years:
  • added a regexp to insert ". " after hcf-location-mark (the place where the article is set)
  • further CSS entries for teaser text and other elements
  • options for conversion and duplicate articles
  • optional settings to reduce size on b/w readers
  • played a bit with tag filtering

For the Amazon Kindle [4|Paperwhite] these settings work nicely:
Code:
    # if you want to reduce size for a b/w or E-ink device, enable the following settings:
    compress_news_images  = True
    #compress_news_images_auto_size = 16
    scale_news_images     = (400,300)
    compress_news_images_max_size = 35
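For orientation, here is a rough sketch of the size budgets these settings imply. It assumes the behaviour described in the calibre recipe API documentation: compress_news_images_auto_size aims for at most (w * h) / factor bytes per image, and compress_news_images_max_size is a limit in KBytes that overrides the auto size. The helper name and the conversion 1 KB = 1024 bytes are assumptions for illustration, not calibre code.

```python
def auto_size_budget(width, height, factor):
    """Approximate byte budget for auto compression: (w * h) / factor."""
    return (width * height) // factor

scale_news_images = (400, 300)        # value from the settings above
compress_news_images_max_size = 35    # KBytes, value from the settings above

# Budget if the commented-out auto size of 16 were used instead
auto_budget = auto_size_budget(*scale_news_images, factor=16)
# Budget implied by the max size setting (assuming 1 KB = 1024 bytes)
max_budget = compress_news_images_max_size * 1024

print('auto-size budget for a 400x300 image:', auto_budget, 'bytes')
print('max-size budget:', max_budget, 'bytes')
```

Under these assumptions the auto size of 16 would be the stricter of the two for a 400x300 image, which may be why it is left commented out here.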
Currently one of my former versions ships with calibre out of the box. So, once you are happy with the combined efforts as well, we should ask Kovid to integrate the recipe upstream.

Thanks again and looking forward to your comments.

Hegi.
Attached Files
File Type: zip WiwoOnline_4.4.zip (2.0 KB, 250 views)
Old 03-22-2018, 05:04 PM   #34
Divingduck
Wizard
Divingduck ought to be getting tired of karma fortunes by now.
 
Posts: 1,161
Karma: 1404241
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
Thanks, you are welcome. It's fine with me.

DD

PS: No need to ask for approval. I like your changes to the recipe.
Old 12-27-2020, 05:58 AM   #35
hegi
Enthusiast
hegi began at the beginning.
 
Posts: 44
Karma: 10
Join Date: Dec 2012
Device: Kindle 4 & Kindle PW 3G
Hi Divingduck,
I have noticed for some time that for some article formats pictures are no longer downloaded with the recipe, while for other articles it still works.

I haven't had a chance yet to dig deeper, but I wonder whether you have already had a look at it?
Cheers,
Hegi.
Old 12-28-2020, 03:16 AM   #36
Divingduck
Wizard
Divingduck ought to be getting tired of karma fortunes by now.
 
Posts: 1,161
Karma: 1404241
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
Hi Hegi,
Hope you are well these days.

Yes, I did, but quite some time ago.
Attached Files
File Type: zip WirtschaftsWoche_AGe_V4.3.zip (1.8 KB, 176 views)
Old 12-29-2020, 04:55 AM   #37
Divingduck
Wizard
Divingduck ought to be getting tired of karma fortunes by now.
 
Posts: 1,161
Karma: 1404241
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
I made a quick update, as I saw a small issue today.

@hegi, I forgot to mention: you need to integrate your additional code for your Kindle. I still use my old Sony device.

Best regards,
DD
Attached Files
File Type: zip WirtschaftsWoche_AGe_V4.4.zip (1.8 KB, 161 views)

Last edited by Divingduck; 12-29-2020 at 05:01 AM.
Old 12-29-2020, 05:22 AM   #38
hegi
Enthusiast
hegi began at the beginning.
 
Posts: 44
Karma: 10
Join Date: Dec 2012
Device: Kindle 4 & Kindle PW 3G
Hi Divingduck,
Thanks a lot for your quick reply and the new version of the recipe.

According to the comments in the recipe, this is the version from March 2018, when we last wrote about this. However, a quick diff of the versions shows that there are some changes.

I am currently loading both versions (mine and this one) to compare the output.
Could it be that you omitted updating the comments (date/version) in your last adaptations?

According to my analysis, the most relevant difference between our versions is the following code in my recipe:

Quote:
# don't duplicate articles from "Schlagzeilen" / "Exklusiv" to other rubrics
ignore_duplicate_articles = {'title', 'url'}
However, the observed problem remains. If there are picture galleries like in this article (https://www.wiwo.de/unternehmen/auto.../26185402.html), the output contains only the text of the gallery, like this:

Quote:
1 / 8

Volkswagen's new electric model, the ID.4, had its digital world premiere at the end of September. It could already be pre-ordered, and it is now due to reach customers in the first weeks of the new year.

Image: Volkswagen

2 / 8

Where the ID.3 competes in the shrinking compact class, the ID.4 enters the booming segment of handy SUVs. And while the former will be available only in Europe, the Lower Saxons are celebrating the latter as a world car. No other car, VW believes, will be more important in the fight against Tesla & Co. No wonder, then, that the group is beating the drum loudly for its electric world citizen in the making and, even before the official unveiling in late summer, issued invitations to a first drive in the only lightly camouflaged prototype at the otherwise strictly secret test site in Ehra-Lessien.

Image: Volkswagen
Any ideas on how to tackle this issue?

Thanks a lot in advance ...

Hegi.
Old 12-31-2020, 10:34 AM   #39
hegi
Enthusiast
hegi began at the beginning.
 
Posts: 44
Karma: 10
Join Date: Dec 2012
Device: Kindle 4 & Kindle PW 3G
Hi Divingduck,
... I just saw that you sent another post on Tuesday while I was preparing mine. Sorry for any confusion caused.

As it appears, your change fixed the picture-gallery issue ... well, at least almost. Within the galleries there is some "extra content", mostly "internal ads". Have a look here: https://www.wiwo.de/politik/deutschl.../26760374.html. As a result, only the first 5 pics go into the ebook, and the additional text is only there for 7 out of 17 ... I suspect these are tweaks to nag readers into buying premium ...

Due to the changes in the articles (the teaser text no longer seems to start with a location), this additional CSS code of mine (no longer included in your version) seems obsolete:
Code:
   .hcf-location-mark {font-style: italic; font-weight:bold}                                 
   .c-overline {font-size: 1em; text-align: left;font-style: normal; font-weight:bold}
However, I'm a bit surprised that the .c-overline format no longer takes effect (I also tried other things, like italics and smaller fonts), as the insertion of the colon afterwards in your code still works ...

I won't play with this any longer for now ... well, let's say for this year.
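One possible explanation, offered purely as an assumption since the thread does not show the current markup: a CSS class selector such as .c-overline only applies when c-overline appears as a complete, space-separated token in the class attribute, whereas the regex only needs the substring c-overline--article"> to be present. The sketch below illustrates that difference with made-up class strings; `css_class_matches` is a hypothetical helper, not part of calibre.

```python
import re

def css_class_matches(class_attr, wanted):
    """Mimic a CSS class selector: the name must be a whole space-separated token."""
    return wanted in class_attr.split()

regex = re.compile(r'c-overline--article">', re.IGNORECASE)

# Hypothetical class strings, not taken from the site
old_style = 'c-overline c-overline--article'
new_style = 'c-overline--article'

for class_attr in (old_style, new_style):
    tag = '<span class="%s">' % class_attr
    print(class_attr,
          '| CSS .c-overline applies:', css_class_matches(class_attr, 'c-overline'),
          '| regex matches:', bool(regex.search(tag)))
```

If the site did drop the plain c-overline token, styling .c-overline--article instead might be worth a try; the HTML in the debug input directory would confirm or refute this.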

Thanks and all the best to you
Hegi.
Old 08-04-2022, 01:56 PM   #40
hegi
Enthusiast
hegi began at the beginning.
 
Posts: 44
Karma: 10
Join Date: Dec 2012
Device: Kindle 4 & Kindle PW 3G
Hi Divingduck,
I realised the picture galleries are broken again. Did you have another go at the recipe in the last 19 months to optimise it?

If so, I'd be glad if you could share your current version.
Cheers,
Hegi.
Old 08-05-2022, 04:31 AM   #41
Divingduck
Wizard
Divingduck ought to be getting tired of karma fortunes by now.
 
Posts: 1,161
Karma: 1404241
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
Hi hegi,

this is the version I currently use. Pictures are still missing for some articles, but the recipe itself should work.

I will take a look at the recipe this weekend. Maybe there is something I can improve.

Let me know how the attached version is working for you.

Best regards,
DD
Attached Files
File Type: zip WirtschaftsWoche_AGe_v4.5.zip (2.0 KB, 65 views)
Old 08-05-2022, 10:44 AM   #42
Divingduck
Wizard
Divingduck ought to be getting tired of karma fortunes by now.
 
Posts: 1,161
Karma: 1404241
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
I decided to build a new one, as there was too much old, obsolete stuff in it. It should show all article pictures again.

Best regards,
DD
Attached Files
File Type: zip WirtschaftsWoche_AGe_v5.0.zip (1.6 KB, 59 views)
Old 08-06-2022, 05:26 AM   #43
hegi
Enthusiast
hegi began at the beginning.
 
Posts: 44
Karma: 10
Join Date: Dec 2012
Device: Kindle 4 & Kindle PW 3G
Hi Divingduck,
thanks for your effort and sharing the results. That's appreciated.

While your new version shows the gallery pictures, the old version shows the explanatory texts for the pictures (but not the pics themselves).

Here is an example article for this behaviour: https://www.wiwo.de/technologie/mobi.../28567350.html

Looking at the HTML source, there seem to be extra "</a>" tags at the end of each picture entry without corresponding "<a ...>" tags. Look here (last line of the section):

Code:
<div data-macro="lead-media" id="biga_wrapper">
<div
class="o-media o-media--lead o-media--imagegallery u-margin-xxxxl ">
<div class="c-imagegallery u-margin-xl " data-cy="imageGallery">
<div class="c-imagegallery__slick ajaxify"
data-command='{"imageGallery": {"showAllURL": "/technologie/mobilitaet/e-mobility/585-ps-so-faehrt-sich-der-kia-ev6-gt/28567350.html", "nextSlidesToPreload": 5, "main" : true, "lastButtonText": "Alle Bilder anzeigen", "isPaid" : false}, "trackSlickNav": ""}'>
<div class="c-imagegallery__slick-item">
<div class="c-imagegallery__image-wrapper u-margin-xxl u-margin-mobile-bottom-xl">
<div class="c-imagegallery__hs">
<div class="c-imagegallery__image--preloader"></div>
<picture
class="c-imagegallery__image c-imagegallery__image--landscape">
<source srcset="/images/kia_ev6gt_1/28567348/3-format10380.jpg, /images/kia_ev6gt_1/28567348/3-format10760.jpg 2x"
class="js-picture--mobile"
media="screen and (max-width: 767px)"
>
<source srcset="/images/kia_ev6gt_1/28567348/3-format11000.jpg, /images/kia_ev6gt_1/28567348/3-format12000.jpg 2x"
media="screen and (min-width: 768px)"
class="js-picture--desktop"
>
<img src="/images/kia_ev6gt_1/28567348/3-format11000.jpg" srcset="/images/kia_ev6gt_1/28567348/3-format11000.jpg, /images/kia_ev6gt_1/28567348/3-format12000.jpg 2x"
alt="Fast 600 PS schlummern unter dem Blechkleid des neuen Kia EV6 GT Quelle: Kia">
</picture> </div>
</div>
<div class="c-metadata o-article__element u-flex u-margin-xl">
<div class="u-flex__item u-flex__item--static">
<div class="c-imagegallery__counter">1 / 10</div>
</div>
<div class="u-flex__item">
<p>Kia hat in den vergangenen Jahren einen bemerkenswerten Aufstieg hingelegt. Aus der einstigen Low-Budget-Marke ist einer der härtesten Konkurrenten für die europäischen und japanischen Autogiganten geworden. Mittlerweile wildern die Südkoreaner sogar im Revier von Tesla – dank der konzerneigenen Plattform E-GMP, auf dem auch der Konzernbruder Hyundai Ioniq 5 steht, nehmen sie bei reinen Elektrofahrzeugen eine technologische Führungsrolle ein. Auch die wahrscheinlich mindestens 70.000 Euro teure neue Spitzenversion der EV6-Baureihe, der 430 kW/585 PS starke GT, kann dank seines 800-Volt-Bordnetzes den 77,4 kWh großen Akku theoretisch in 18 Minuten von 10 auf 80 Prozent laden.</p>
<p>
Bild:
Kia
</a>
I haven't had a chance yet to find out whether this is a general issue in the CMS or whether it's only related to this article ...
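To check other articles for the same symptom, a small standalone script along these lines could count closing "</a>" tags that have no opening partner. This is only a sketch using Python's standard html.parser; the class name AnchorBalance is made up, and the snippet fed to it is just the tail of the excerpt above.

```python
from html.parser import HTMLParser

class AnchorBalance(HTMLParser):
    """Count </a> tags that appear without a preceding, still open <a ...>."""
    def __init__(self):
        super().__init__()
        self.opened = 0        # <a> tags currently open
        self.stray_closes = 0  # </a> tags with nothing to close

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.opened += 1

    def handle_endtag(self, tag):
        if tag == 'a':
            if self.opened > 0:
                self.opened -= 1
            else:
                self.stray_closes += 1

# Tail of the gallery entry quoted above
snippet = '<p>\nBild:\nKia\n</a>'

checker = AnchorBalance()
checker.feed(snippet)
checker.close()
print('stray </a> tags:', checker.stray_closes)
```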

Hope this helps
Cheers,
Hegi.
Old 08-07-2022, 04:11 AM   #44
Divingduck
Wizard
Divingduck ought to be getting tired of karma fortunes by now.
 
Posts: 1,161
Karma: 1404241
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
Hi Hegi,
I just saw your comment. Indeed, there is a problem with this kind of article. It seems the parsing is somehow wrong. The pictures are in the download and in the recipe output (you can see this when you download with debugging enabled).

I noticed that there is always a set of two pictures for each picture to display: a background mask and the picture itself.

Code:
<div class="calibre9">
<div class="calibre9">
<div class="calibre9"></div>
<picture>
<source media="screen and (max-width: 767px)" srcset="/images/kia_ev6gt_1/28567348/3-format10380.jpg, /images/kia_ev6gt_1/28567348/3-format10760.jpg 2x">
<source media="screen and (min-width: 768px)" srcset="/images/kia_ev6gt_1/28567348/3-format11000.jpg, /images/kia_ev6gt_1/28567348/3-format12000.jpg 2x">
<img alt="Fast 600 PS schlummern unter dem Blechkleid des neuen Kia EV6 GT Quelle: Kia" src="images/img1_u8.jpg" class="calibre3">
</picture> </div>
</div>
The two <source media ...> lines are the problem. The pictures are shown when I delete these tags manually. I need to find a way to delete those tags before processing.
And maybe also a way to keep the long description that gets lost inside this container, without breaking other articles.

I guess I need to dig deeper into the API and see whether there is something that can help me write a better recipe.
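One way the <source ...> tags might be removed before processing is a regex pair in the same (pattern, function) shape used for preprocess_regexps earlier in the thread. The following is an untested sketch run on a shortened copy of the markup above, not a confirmed fix; the name drop_source is made up.

```python
import re

# Same (compiled pattern, replacement function) shape as the pairs shown earlier
drop_source = (re.compile(r'<source\b[^>]*>', re.DOTALL | re.IGNORECASE), lambda match: '')

# Shortened copy of the markup quoted above
sample = (
    '<picture>\n'
    '<source media="screen and (max-width: 767px)" '
    'srcset="/images/kia_ev6gt_1/28567348/3-format10380.jpg, '
    '/images/kia_ev6gt_1/28567348/3-format10760.jpg 2x">\n'
    '<img alt="Kia EV6 GT" src="images/img1_u8.jpg" class="calibre3">\n'
    '</picture>'
)

pattern, replacement = drop_source
cleaned = pattern.sub(replacement, sample)
print(cleaned)
```

calibre recipes also have a remove_tags setting (for example remove_tags = [dict(name='source')]), which may be the cleaner route; whether either approach restores the gallery pictures has not been verified here.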

Last edited by Divingduck; 08-07-2022 at 04:48 AM. Reason: Edit because I was not ready with the answer...
Old 08-07-2022, 06:38 AM   #45
hegi
Enthusiast
hegi began at the beginning.
 
Posts: 44
Karma: 10
Join Date: Dec 2012
Device: Kindle 4 & Kindle PW 3G
Hi Divingduck,
... no hurry needed. But I'd be glad if you could share the results in case you reach a breakthrough on this issue.

Thanks a lot in advance
Hegi.