03-20-2018, 03:26 PM | #31 |
Enthusiast
Posts: 44
Karma: 10
Join Date: Dec 2012
Device: Kindle 4 & Kindle PW 3G
|
Hi Divingduck,
thanks for your hints. The debug directory is what I already used for my last post. The bit with the print statements comes in handy. However, when I try to fill these with the regexps, e.g. like this:
Code:
print '*** c-overline tag --->:', (re.compile(r'(<span class="c-overline">[^<]*)(</span>)', re.DOTALL|re.IGNORECASE), lambda match: match.group(1) + ': ' + match.group(2))
print '*** hcf-location-mark --->:', (re.compile(r'(<span class="hcf-location-mark">[^<]*)(</span>)', re.DOTALL|re.IGNORECASE), lambda match: match.group(1) + '. ' + match.group(2))
all I get is this output:
Code:
*** c-overline tag --->: (<_sre.SRE_Pattern object at 0x7fdef18de540>, <function <lambda> at 0x7fdee09ddaa0>)
*** hcf-location-mark --->: (<_sre.SRE_Pattern object at 0x7fdee0dcb5e8>, <function <lambda> at 0x7fdee09ddaa0>)
I think I'm really stuck here, and this is quite frustrating. Thanks a lot in advance. Hegi. |
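The print statements in the post above only show object representations because the (pattern, function) tuple itself is printed; the regex is never applied to any text. Here is a minimal sketch of how such a pair could be tried out by hand. It assumes Python 3, and both the sample HTML and the helper name try_rule are made up for illustration; the real input is whatever sits in the debug directory.

```python
import re

# One (pattern, replacement function) pair, in the same shape used for preprocess_regexps
rule = (
    re.compile(r'(<span class="hcf-location-mark">[^<]*)(</span>)', re.DOTALL | re.IGNORECASE),
    lambda match: match.group(1) + '. ' + match.group(2),
)

# Hypothetical sample markup, not copied from the real site
sample = '<span class="hcf-location-mark">Berlin</span>'


def try_rule(rule, html):
    """Apply one (pattern, function) pair and return the new HTML and the match count."""
    pattern, func = rule
    return pattern.subn(func, html)


new_html, count = try_rule(rule, sample)
print('*** hcf-location-mark matches:', count)
print('*** result --->:', new_html)
```

A match count of 0 on a real downloaded file would mean the pattern does not fit the markup, which is easier to spot than a missing change in the finished ebook.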
03-20-2018, 07:39 PM | #32 |
Wizard
Posts: 1,161
Karma: 1404241
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
|
You're welcome.
I had a bit of time to take a closer look at the problem, and I saw two things. The first is to keep in mind when a regex is applied. You are using preprocess_regexps, which means the regexes run against the downloaded HTML as source input. You can therefore check debug\input\ to see what the downloaded HTML looks like to calibre at the moment you manipulate it. The second problem is that the class attribute you are looking for contains spaces, and that does not work (I think it never has). Taking that into account, I would do it slightly differently. I don't care about the complete class string; I look only at the end of the class name for a unique identification: ... c-overline--article"> ... </span> ... Code:
(re.compile(r'(c-overline--article">[^>]*)(</span>)', re.DOTALL|re.IGNORECASE), lambda match: match.group(1) + ': ' + match.group(2)) |
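To illustrate the difference between the two patterns, here is a small sketch that runs both against the same text. It assumes Python 3, and the sample markup (in particular the full class string) is an assumption for illustration, not copied from the site.

```python
import re

# Hypothetical markup: the class attribute holds several space-separated names
sample_html = '<span class="c-overline c-overline--article">Berlin</span> Teaser text'

# Pattern that expects the class attribute to be exactly "c-overline"
old_pattern = re.compile(r'(<span class="c-overline">[^<]*)(</span>)', re.DOTALL | re.IGNORECASE)
# Pattern that anchors only on the end of the class string
new_pattern = re.compile(r'(c-overline--article">[^>]*)(</span>)', re.DOTALL | re.IGNORECASE)


def add_colon(match):
    """Insert ': ' between the tag text and the closing </span>."""
    return match.group(1) + ': ' + match.group(2)


old_result = old_pattern.sub(add_colon, sample_html)
new_result = new_pattern.sub(add_colon, sample_html)

print('old pattern changed something:', old_result != sample_html)
print('new pattern result:', new_result)
```

With this sample, the first pattern leaves the text untouched because the attribute value does not end right after c-overline, while the second one finds the span.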
03-22-2018, 04:10 PM | #33 |
Enthusiast
Posts: 44
Karma: 10
Join Date: Dec 2012
Device: Kindle 4 & Kindle PW 3G
|
Thanks Divingduck,
... as usual, the problem was in plain sight, and once you know the solution, everything seems simple and easy. I took the liberty of merging my earlier fork of your recipe with your current version to come up with an improved version. Please feel free to review, edit, or enhance it even further. My evolutionary changes over the last five years:
For Amazon Kindle [4|Paperwhite] these settings work nicely: Code:
# if you want to reduce size for a b/w or E-ink device, uncomment the following 4 lines:
compress_news_images = True
#compress_news_images_auto_size = 16
scale_news_images = (400,300)
compress_news_images_max_size = 35
Thanks again and looking forward to your comments. Hegi. |
03-22-2018, 05:04 PM | #34 |
Wizard
Posts: 1,161
Karma: 1404241
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
|
Thanks, you are welcome. It's fine for me.
DD PS: No need to ask for approval. I like your changes for the recipe. |
12-27-2020, 05:58 AM | #35 |
Enthusiast
Posts: 44
Karma: 10
Join Date: Dec 2012
Device: Kindle 4 & Kindle PW 3G
|
Hi Divingduck,
I have noticed for some time that for some article formats, pictures are no longer downloaded with the recipe, while for other articles it still works. I haven't had a chance yet to dig deeper, but I wonder whether you have maybe already had a look at it? Cheers, Hegi. |
12-28-2020, 03:16 AM | #36 |
Wizard
Posts: 1,161
Karma: 1404241
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
|
Hi Hegi,
Hope you are well these days. Yes I did, but quite some time ago |
12-29-2020, 04:55 AM | #37 |
Wizard
Posts: 1,161
Karma: 1404241
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
|
I made a quick update as I saw a small issue today.
@hegi, I forgot to mention: you need to integrate your additional code for your Kindle. I still use my old Sony device. Best regards, DD Last edited by Divingduck; 12-29-2020 at 05:01 AM. |
12-29-2020, 05:22 AM | #38 | ||
Enthusiast
Posts: 44
Karma: 10
Join Date: Dec 2012
Device: Kindle 4 & Kindle PW 3G
|
Hi Divingduck,
thanks a lot for your quick reply and the new version of the recipe. According to your comments, this is the version from March 2018, when we last wrote about this. However, a quick diff of the versions shows that there seem to be some changes. I am currently loading both versions (mine and this one) to compare the output. Could it be that you omitted updating the comments (date/version) on your last adaptations? According to my analysis, the most relevant difference between our versions is the following code within my recipe: Quote:
Quote:
Thanks a lot in advance ... Hegi. |
||
12-31-2020, 10:34 AM | #39 |
Enthusiast
Posts: 44
Karma: 10
Join Date: Dec 2012
Device: Kindle 4 & Kindle PW 3G
|
Hi Divingduck,
... just saw that you sent another post on Tuesday while I was preparing mine. Sorry for any confusion caused. As it appears, your change fixed the picture gallery issue ... well, at least almost. Within the galleries there is some "extra content", mostly "internal ads". Have a look here: https://www.wiwo.de/politik/deutschl.../26760374.html. As a result, only the first 5 pics go into the ebook, and the additional text is only for 7 out of 17 in ... I suspect these are tweaks to nag readers into buying premium ... Due to the changes in the articles (the teaser text no longer seems to start with a location), this additional CSS code of mine (no longer included in your version) seems obsolete: Code:
.hcf-location-mark {font-style: italic; font-weight:bold}
.c-overline {font-size: 1em; text-align: left;font-style: normal; font-weight:bold}
I won't play with this any longer for now ... well, let's say for this year. Thanks and all the best to you, Hegi. |
08-04-2022, 01:56 PM | #40 |
Enthusiast
Posts: 44
Karma: 10
Join Date: Dec 2012
Device: Kindle 4 & Kindle PW 3G
|
Hi Divingduck,
I realised the picture galleries are broken again. - Did you have another go at the recipe in the last 19 months to optimise it? If so, I'd be glad if you could share your current version. Cheers, Hegi. |
08-05-2022, 04:31 AM | #41 |
Wizard
Posts: 1,161
Karma: 1404241
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
|
Hi hegi,
this is the version I currently use. Pictures are still missing for some articles, but the recipe itself should work. I will take a look at the recipe this weekend. Maybe there is something I can improve. Let me know how the attached version works for you. Best regards, DD |
08-05-2022, 10:44 AM | #42 |
Wizard
Posts: 1,161
Karma: 1404241
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
|
I decided to build a new one, as there was too much old, obsolete stuff in it. It should show all article pictures again.
Best regards, DD |
08-06-2022, 05:26 AM | #43 |
Enthusiast
Posts: 44
Karma: 10
Join Date: Dec 2012
Device: Kindle 4 & Kindle PW 3G
|
Hi Divingduck,
thanks for your effort and for sharing the results. That's appreciated. While your new version shows the gallery pictures, the old version shows the explanatory texts for the pictures (but not the pictures themselves). Here is an example article for this behaviour: https://www.wiwo.de/technologie/mobi.../28567350.html Looking at the HTML source, there seem to be extra "</a>" tags at the end of each picture entry without corresponding "<a ...>" tags. Look here (last line of the section): Code:
<div data-macro="lead-media" id="biga_wrapper">
  <div class="o-media o-media--lead o-media--imagegallery u-margin-xxxxl ">
    <div class="c-imagegallery u-margin-xl " data-cy="imageGallery">
      <div class="c-imagegallery__slick ajaxify" data-command='{"imageGallery": {"showAllURL": "/technologie/mobilitaet/e-mobility/585-ps-so-faehrt-sich-der-kia-ev6-gt/28567350.html", "nextSlidesToPreload": 5, "main" : true, "lastButtonText": "Alle Bilder anzeigen", "isPaid" : false}, "trackSlickNav": ""}'>
        <div class="c-imagegallery__slick-item">
          <div class="c-imagegallery__image-wrapper u-margin-xxl u-margin-mobile-bottom-xl">
            <div class="c-imagegallery__hs">
              <div class="c-imagegallery__image--preloader"></div>
              <picture class="c-imagegallery__image c-imagegallery__image--landscape">
                <source srcset="/images/kia_ev6gt_1/28567348/3-format10380.jpg, /images/kia_ev6gt_1/28567348/3-format10760.jpg 2x" class="js-picture--mobile" media="screen and (max-width: 767px)" >
                <source srcset="/images/kia_ev6gt_1/28567348/3-format11000.jpg, /images/kia_ev6gt_1/28567348/3-format12000.jpg 2x" media="screen and (min-width: 768px)" class="js-picture--desktop" >
                <img src="/images/kia_ev6gt_1/28567348/3-format11000.jpg" srcset="/images/kia_ev6gt_1/28567348/3-format11000.jpg, /images/kia_ev6gt_1/28567348/3-format12000.jpg 2x" alt="Fast 600 PS schlummern unter dem Blechkleid des neuen Kia EV6 GT Quelle: Kia">
              </picture>
            </div>
          </div>
          <div class="c-metadata o-article__element u-flex u-margin-xl">
            <div class="u-flex__item u-flex__item--static">
              <div class="c-imagegallery__counter">1 / 10</div>
            </div>
            <div class="u-flex__item">
              <p>Kia hat in den vergangenen Jahren einen bemerkenswerten Aufstieg hingelegt. Aus der einstigen Low-Budget-Marke ist einer der härtesten Konkurrenten für die europäischen und japanischen Autogiganten geworden. Mittlerweile wildern die Südkoreaner sogar im Revier von Tesla – dank der konzerneigenen Plattform E-GMP, auf dem auch der Konzernbruder Hyundai Ioniq 5 steht, nehmen sie bei reinen Elektrofahrzeugen eine technologische Führungsrolle ein. Auch die wahrscheinlich mindestens 70.000 Euro teure neue Spitzenversion der EV6-Baureihe, der 430 kW/585 PS starke GT, kann dank seines 800-Volt-Bordnetzes den 77,4 kWh großen Akku theoretisch in 18 Minuten von 10 auf 80 Prozent laden.</p>
              <p> Bild: Kia </a>
Hope this helps. Cheers, Hegi. |
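One way to confirm the stray "</a>" observation is to count opening and closing anchor tags in a saved gallery fragment. The following is only a sketch under assumptions: it uses Python 3's standard html.parser, the class name AnchorBalance is made up, and the fragment is a shortened stand-in for the markup quoted above.

```python
from html.parser import HTMLParser


class AnchorBalance(HTMLParser):
    """Count opening <a ...> and closing </a> tags in a piece of HTML."""

    def __init__(self):
        super().__init__()
        self.opened = 0
        self.closed = 0

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.opened += 1

    def handle_endtag(self, tag):
        if tag == 'a':
            self.closed += 1


# Shortened stand-in for the end of one gallery entry
fragment = '<div class="u-flex__item"> <p>Beschreibung</p> <p> Bild: Kia </a>'

checker = AnchorBalance()
checker.feed(fragment)
checker.close()
print('opening <a> tags:', checker.opened)
print('closing </a> tags:', checker.closed)
```

If the closing count is higher than the opening count on a real page, the markup is unbalanced at the source, which could explain why a parser drops or misplaces the surrounding content.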
08-07-2022, 04:11 AM | #44 |
Wizard
Posts: 1,161
Karma: 1404241
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
|
Hi Hegi,
I just saw your comment. Indeed, there is a problem with this kind of article. It seems the parsing is somehow wrong. The pictures are in the download and in the recipe (you can see this when you download with debugging enabled). I noticed there is always a set of two pictures for each picture to display: a background mask and the picture itself. Code:
<div class="calibre9">
  <div class="calibre9">
    <div class="calibre9"></div>
    <picture>
      <source media="screen and (max-width: 767px)" srcset="/images/kia_ev6gt_1/28567348/3-format10380.jpg, /images/kia_ev6gt_1/28567348/3-format10760.jpg 2x">
      <source media="screen and (min-width: 768px)" srcset="/images/kia_ev6gt_1/28567348/3-format11000.jpg, /images/kia_ev6gt_1/28567348/3-format12000.jpg 2x">
      <img alt="Fast 600 PS schlummern unter dem Blechkleid des neuen Kia EV6 GT Quelle: Kia" src="images/img1_u8.jpg" class="calibre3">
    </picture>
  </div>
</div>
And maybe as well for the lost long description inside this container without destroying other articles. I guess I need to dig deeper into the API and see whether there is something that can help me build a better recipe. Last edited by Divingduck; 08-07-2022 at 04:48 AM. Reason: Edit because I was not ready with the answer... |
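For digging into such a container, it can help to list every image candidate a picture element carries. This is a rough sketch, not a fix for the recipe: it assumes Python 3's standard html.parser, the class name PictureSources is made up, and splitting srcset on commas is a simplification that would break on URLs containing commas.

```python
from html.parser import HTMLParser


class PictureSources(HTMLParser):
    """Collect image URL candidates from <source srcset> and <img src> tags."""

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'source' and attrs.get('srcset'):
            # srcset is a comma-separated list of "url [descriptor]" entries
            for entry in attrs['srcset'].split(','):
                parts = entry.split()
                if parts:
                    self.candidates.append(parts[0])
        elif tag == 'img' and attrs.get('src'):
            self.candidates.append(attrs['src'])


# The picture element from the debug output quoted above
snippet = (
    '<picture>'
    '<source media="screen and (max-width: 767px)" srcset="/images/kia_ev6gt_1/28567348/3-format10380.jpg, /images/kia_ev6gt_1/28567348/3-format10760.jpg 2x">'
    '<source media="screen and (min-width: 768px)" srcset="/images/kia_ev6gt_1/28567348/3-format11000.jpg, /images/kia_ev6gt_1/28567348/3-format12000.jpg 2x">'
    '<img alt="Fast 600 PS schlummern unter dem Blechkleid des neuen Kia EV6 GT Quelle: Kia" src="images/img1_u8.jpg" class="calibre3">'
    '</picture>'
)

parser = PictureSources()
parser.feed(snippet)
parser.close()
for url in parser.candidates:
    print(url)
```

In this snippet only the img src points at a downloaded local file, while the srcset entries still hold the original site paths, which may be worth keeping in mind when deciding what to strip.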
08-07-2022, 06:38 AM | #45 |
Enthusiast
Posts: 44
Karma: 10
Join Date: Dec 2012
Device: Kindle 4 & Kindle PW 3G
|
Hi Divingduck,
... no hurry needed. But I'd be glad if you could share the results in case you reach a breakthrough on this issue. Thanks a lot in advance, Hegi. |