Old 03-20-2018, 04:26 PM   #31
hegi
Enthusiast
hegi began at the beginning.
 
Posts: 34
Karma: 10
Join Date: Dec 2012
Device: Kindle 4 & Kindle PW 3G
Hi Divingduck,

Thanks for your hints. The debug directory is what I already used for my last post. The tip about the print statements comes in handy. However, when I try to fill them with the regexps, e.g. like this:

Code:
    print '*** c-overline tag    --->:', (re.compile(r'(<span class="c-overline">[^<]*)(</span>)', re.DOTALL|re.IGNORECASE), lambda match: match.group(1) + ': ' + match.group(2))
    print '*** hcf-location-mark --->:', (re.compile(r'(<span class="hcf-location-mark">[^<]*)(</span>)', re.DOTALL|re.IGNORECASE), lambda match: match.group(1) + '. ' + match.group(2))
I don't get any meaningful output. Or can you make sense of output like this:
Code:
*** c-overline tag    --->: (<_sre.SRE_Pattern object at 0x7fdef18de540>, <function <lambda> at 0x7fdee09ddaa0>)
*** hcf-location-mark --->: (<_sre.SRE_Pattern object at 0x7fdee0dcb5e8>, <function <lambda> at 0x7fdee09ddaa0>)
In theory, I'd be a step further if I could manage to capture the match.group information in the log file.
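The printed tuple only shows the repr of the compiled pattern and of the lambda, because no match object exists until the pattern is applied to some HTML. A minimal sketch of one way to log the groups is to wrap the replacement callable so that it prints each match before returning the replacement. This assumes the second element of each pair is called as a replacement function during substitution, as the lambda signature suggests. The helper name `logged` and the sample HTML are hypothetical, and the sketch uses the Python 3 `print()` function rather than the Python 2 print statement shown above.

```python
import re


def logged(label, repl):
    """Wrap a replacement callable so every match is printed before it is replaced."""
    def wrapper(match):
        # match.groups() holds the captured groups of this particular match
        print('***', label, '--->', match.groups())
        return repl(match)
    return wrapper


# Same shape as one entry of preprocess_regexps: (compiled pattern, callable)
pattern = re.compile(r'(<span class="c-overline">[^<]*)(</span>)',
                     re.DOTALL | re.IGNORECASE)
replacement = logged('c-overline tag',
                     lambda match: match.group(1) + ': ' + match.group(2))

# Hypothetical sample input, only for trying the wrapper outside calibre
sample = '<p><span class="c-overline">Autoindustrie</span> Text</p>'
result = pattern.sub(replacement, sample)
print(result)
```

If nothing is printed for a given label, the pattern never matched, which is itself useful information when debugging a recipe.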

I think I'm really stuck here, and this is quite frustrating.

Thanks a lot in advance.

Hegi.
Old 03-20-2018, 08:39 PM   #32
Divingduck
Wizard
Divingduck ought to be getting tired of karma fortunes by now.
 
Posts: 1,150
Karma: 1404167
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
You're welcome.

I had a bit of time to take a closer look at the problem.

I noticed two things.
First, keep in mind when a regex is applied. You are using preprocess_regexps, which means the regexes run against the downloaded HTML as source input. You can therefore check debug\input\ to see what the downloaded HTML file looks like to calibre at the moment you manipulate it.
Second, the class attribute you are looking for contains spaces, and matching it that way does not work (I think it never did).

Taking that into account, I would do it slightly differently. I don't care about the complete class string; I only look for the end of the class attribute, which is enough for a unique identification:

... c-overline--article"> ... </span> ...
Code:
(re.compile(r'(c-overline--article">[^>]*)(</span>)', re.DOTALL|re.IGNORECASE), lambda match: match.group(1) + ': ' + match.group(2))
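To see why anchoring on the tail of the class attribute helps, here is a small sketch that runs both approaches against a sample span. The markup in `html` is hypothetical (the real class list on the site may differ), and `add_colon` is just a named version of the lambda above.

```python
import re

# Pattern from the post: anchor only on the end of the class attribute
pattern = re.compile(r'(c-overline--article">[^>]*)(</span>)',
                     re.DOTALL | re.IGNORECASE)


def add_colon(match):
    """Insert ': ' between the span text and the closing tag."""
    return match.group(1) + ': ' + match.group(2)


# Hypothetical markup with several space-separated class names
html = '<span class="c-overline c-overline--article">Elektroauto</span><h2>Titel</h2>'

# A pattern that expects the attribute to be exactly "c-overline" finds nothing
exact = re.compile(r'<span class="c-overline">[^<]*</span>')
found_exact = exact.search(html) is not None

fixed = pattern.sub(add_colon, html)
print(found_exact)
print(fixed)
```

The tail-anchored pattern does not depend on how many other class names precede the one of interest, which makes it less fragile when the site adds or reorders classes.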
I attach an updated version of the recipe.
Attached Files
File Type: zip WirtschaftsWoche_AGe_V4.3.zip (1.8 KB, 181 views)
Old 03-22-2018, 05:10 PM   #33
hegi
Thanks Divingduck,

As usual, the problem was hiding in plain sight, and once you know the solution, everything seems simple and easy.

I took the liberty of merging my earlier fork of your recipe with your current version to come up with an improved one. Please feel free to review, edit, or enhance it further.

My incremental changes over the last five years:
  • added a regexp that appends ". " after hcf-location-mark (the place where the article is set)
  • further CSS entries for teaser text and other elements
  • options for conversion and duplicate articles
  • optional settings to reduce size on b/w readers
  • experimented a bit with tag filtering

For the Amazon Kindle [4|Paperwhite], these settings work nicely:
Code:
    # if you want to reduce size for a b/w or E-ink device, uncomment the following 4 lines:
    compress_news_images  = True
    #compress_news_images_auto_size = 16
    scale_news_images     = (400,300)
    compress_news_images_max_size = 35
Currently one of my former versions ships with calibre OOTB. So, once you are happy with the combined efforts as well, we should ask Kovid to integrate the recipe upstream.

Thanks again and looking forward to your comments.

Hegi.
Attached Files
File Type: zip WiwoOnline_4.4.zip (2.0 KB, 187 views)
Old 03-22-2018, 06:04 PM   #34
Divingduck
Thanks, you are welcome. It's fine with me.

DD

PS: No need to ask for approval. I like your changes to the recipe.
Old 12-27-2020, 06:58 AM   #35
hegi
Hi Divingduck,
I have noticed for some time that for some article formats, pictures are no longer downloaded by the recipe, while for other articles it still works.

I haven't had a chance to dig deeper yet, but I wonder whether you have already had a look at it?
Cheers,
Hegi.
Old 12-28-2020, 04:16 AM   #36
Divingduck
Hi Hegi,
Hope you are well these days.

Yes, I did, but quite some time ago.
Attached Files
File Type: zip WirtschaftsWoche_AGe_V4.3.zip (1.8 KB, 106 views)
Old 12-29-2020, 05:55 AM   #37
Divingduck
I made a quick update as I saw a small issue today.

@hegi, I forgot to mention: you need to integrate your additional code for your Kindle. I still use my old Sony device.

Best regards,
DD
Attached Files
File Type: zip WirtschaftsWoche_AGe_V4.4.zip (1.8 KB, 92 views)

Last edited by Divingduck; 12-29-2020 at 06:01 AM.
Old 12-29-2020, 06:22 AM   #38
hegi
Hi Divingduck,
thanks a lot for your quick reply and the new version of the recipe.

According to your comments in the recipe, this is the version from March 2018, when we last wrote about this. However, a quick diff of the versions shows that there are some changes.

I am currently running both versions (mine and this one) to compare the output.
Could it be that you omitted updating the comments (date/version) in your latest adaptations?

According to my analysis the most relevant difference between our versions is the following code within my recipe:

Quote:
# don't duplicate articles from "Schlagzeilen" / "Exklusiv" to other rubrics
ignore_duplicate_articles = {'title', 'url'}
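As an illustration of the intended effect of this setting (not calibre's actual implementation), the following sketch keeps the first article seen for each title and each URL and drops later repeats. The function `drop_duplicates` and the sample entries are hypothetical.

```python
def drop_duplicates(articles, keys=('title', 'url')):
    """Keep the first article for every title and every URL; drop later repeats."""
    seen = {key: set() for key in keys}
    kept = []
    for article in articles:
        # An article counts as a duplicate if its title OR its URL was seen before
        if any(article[key] in seen[key] for key in keys):
            continue
        for key in keys:
            seen[key].add(article[key])
        kept.append(article)
    return kept


# Hypothetical feed entries: the same article listed under two rubrics
articles = [
    {'rubric': 'Schlagzeilen', 'title': 'VW ID.4', 'url': 'https://example.com/a'},
    {'rubric': 'Unternehmen', 'title': 'VW ID.4', 'url': 'https://example.com/a'},
    {'rubric': 'Politik', 'title': 'Haushalt', 'url': 'https://example.com/b'},
]
unique = drop_duplicates(articles)
print([a['rubric'] for a in unique])
```

With both 'title' and 'url' in the set, an article that reappears under another rubric is skipped even if only one of the two fields is identical.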
However, the observed problem remains. If an article contains a picture gallery, like this one (https://www.wiwo.de/unternehmen/auto.../26185402.html), only the text of the gallery ends up in the output, like this:

Quote:
1 / 8

Volkswagen's new electric model, the ID.4, had its digital world premiere at the end of September. It could already be pre-ordered, and it is now due to reach customers in the first weeks of the new year.

Image: Volkswagen

2 / 8

While the ID.3 competes in the shrinking compact class, the ID.4 launches in the booming segment of handy SUVs. And while the one will be available only in Europe, the Lower Saxons are celebrating the other as a world car. No other car, VW believes, will be more important in the fight against Tesla & Co. No wonder, then, that the group is drumming hard for its electric world citizen in the making and, even before the official unveiling in late summer, invited guests to a first drive in a now only lightly camouflaged prototype at the otherwise strictly secret test site in Ehra-Lessien.

Image: Volkswagen
Any ideas on how to tackle this issue?

Thanks a lot in advance ...

Hegi.
Old 12-31-2020, 11:34 AM   #39
hegi
Hi Divingduck,
I just saw that you sent another post on Tuesday while I was preparing mine. Sorry for any confusion caused.

It appears that your change fixed the picture gallery issue, or at least almost. Within the galleries there is some "extra content", mostly "internal ads". Have a look here: https://www.wiwo.de/politik/deutschl.../26760374.html. As a result, only the first 5 pictures go into the ebook, and the additional text is included for only 7 out of 17. I suspect these are tweaks to nag readers into buying premium.

Due to changes in the articles (the teaser text no longer seems to start with a location), this additional CSS of mine (no longer included in your version) seems obsolete:
Code:
   .hcf-location-mark {font-style: italic; font-weight: bold}
   .c-overline {font-size: 1em; text-align: left; font-style: normal; font-weight: bold}
However, I'm a bit surprised that the .c-overline style no longer takes effect (I also tried other things, like italics and smaller fonts), since the insertion of the colon after it in your code still works.
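One possible explanation, not confirmed in the thread: a CSS class selector such as .c-overline matches only elements that carry c-overline as a whole class name, whereas the regex in the recipe matches the text c-overline--article"> as a plain substring. If the span now carries only names like c-overline--article, the regex would still fire while the CSS rule would not. The sketch below illustrates that difference; the class attribute used is hypothetical.

```python
import re

# Hypothetical class attribute; the actual markup on the site is not shown in the thread
class_attr = 'c-headline__overline c-overline--article'

# A CSS class selector like .c-overline compares against whole class tokens
tokens = class_attr.split()
css_matches = 'c-overline' in tokens

# The regex only needs the attribute to end with the given text
html = '<span class="%s">Elektroauto</span>' % class_attr
regex_matches = re.search(r'c-overline--article">[^>]*</span>', html) is not None

print(css_matches, regex_matches)
```

If this is what happens, a rule written for the full class name (for example .c-overline--article) would be the thing to try, after checking the actual class names in debug\input\.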

I won't play with this any longer for now ... well, let's say for this year.

Thanks and all the best to you
Hegi.
All times are GMT -4.

