03-20-2018, 03:26 PM   #31
hegi
Enthusiast
hegi began at the beginning.
 
Posts: 26
Karma: 10
Join Date: Dec 2012
Device: Kindle 4 & Kindle PW 3G
Hi Divingduck,

Thanks for your hints. The debug directory is what I already used for my last post. The tip about the print statements comes in handy; however, it breaks down when I try to fill them with the regexps, e.g. like this:

Code:
    print '*** c-overline tag    --->:', (re.compile(r'(<span class="c-overline">[^<]*)(</span>)', re.DOTALL|re.IGNORECASE), lambda match: match.group(1) + ': ' + match.group(2))
    print '*** hcf-location-mark --->:', (re.compile(r'(<span class="hcf-location-mark">[^<]*)(</span>)', re.DOTALL|re.IGNORECASE), lambda match: match.group(1) + '. ' + match.group(2))
I don't get any meaningful output. Or can you make sense of output like this:
Code:
*** c-overline tag    --->: (<_sre.SRE_Pattern object at 0x7fdef18de540>, <function <lambda> at 0x7fdee09ddaa0>)
*** hcf-location-mark --->: (<_sre.SRE_Pattern object at 0x7fdee0dcb5e8>, <function <lambda> at 0x7fdee09ddaa0>)
In theory, I'd be a step further if I could manage to capture the match.group information in the log file.
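
One possible way to capture that information, as a minimal sketch: the print statements above only show the repr of the compiled pattern and the lambda, because nothing has been matched at that point. Wrapping the replacement function lets every match print its groups when the substitution actually runs. The helper `logged`, the `rules` list, the `apply_rules` loop, and the sample HTML are hypothetical names for illustration, and the loop assumes each pair is applied roughly as `pattern.sub(function, html)`.

```python
import re

# Hypothetical sample; real input would be the downloaded article HTML.
sample_html = ('<span class="c-overline">Konjunktur</span> '
               '<span class="hcf-location-mark">Berlin</span>')

def logged(label, replacement):
    """Wrap a replacement function so every match prints its groups first."""
    def wrapper(match):
        print('*** %s --->: %r' % (label, match.groups()))
        return replacement(match)
    return wrapper

rules = [
    (re.compile(r'(<span class="c-overline">[^<]*)(</span>)', re.DOTALL | re.IGNORECASE),
     logged('c-overline tag', lambda match: match.group(1) + ': ' + match.group(2))),
    (re.compile(r'(<span class="hcf-location-mark">[^<]*)(</span>)', re.DOTALL | re.IGNORECASE),
     logged('hcf-location-mark', lambda match: match.group(1) + '. ' + match.group(2))),
]

def apply_rules(html, rules):
    # Assumption: the (pattern, function) pairs are applied one after another.
    for pattern, function in rules:
        html = pattern.sub(function, html)
    return html

result = apply_rules(sample_html, rules)
```

If a rule prints nothing at all, its pattern never matched the input, which is useful information in itself.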

I think I'm really stuck here, and this is quite frustrating.

Thanks a lot in advance.

Hegi.
03-20-2018, 07:39 PM   #32
Divingduck
Wizard
Divingduck ought to be getting tired of karma fortunes by now.
 
Posts: 1,058
Karma: 1293081
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
You're welcome.

I had a bit of time to take a closer look at the problem.

I noticed two things.
The first is to remember when a regex is applied. You are using preprocess_regexps, which means the regexes run against the downloaded HTML as source input. You can therefore check debug\input\ to see what the downloaded HTML looks like to calibre at the moment you manipulate it.
The second problem is that the class you are looking for includes spaces in its name, and that does not work (I think it never has).
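
The first point can be checked with a small script. This is a rough sketch, assuming the debug directory holds the downloaded HTML files under an input subfolder as described above; the function name `find_matches` and the file extensions it looks for are illustrative assumptions, not part of calibre.

```python
import io
import os
import re

def find_matches(root, pattern):
    """Return (path, groups) for every match of pattern in HTML files under root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Only look at files that are likely to contain the downloaded HTML.
            if not name.lower().endswith(('.html', '.htm', '.xhtml')):
                continue
            path = os.path.join(dirpath, name)
            with io.open(path, encoding='utf-8', errors='replace') as handle:
                html = handle.read()
            for match in pattern.finditer(html):
                hits.append((path, match.groups()))
    return hits
```

Calling it with something like `find_matches(os.path.join('debug', 'input'), some_compiled_pattern)` and getting an empty list back would suggest the pattern does not fit the HTML as calibre sees it.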

Taking that into account, I would do it slightly differently. I don't care about the complete class string; I look only at the end of the class name, which is enough for a unique identification:

... c-overline--article"> ... </span> ...
Code:
(re.compile(r'(c-overline--article">[^>]*)(</span>)', re.DOTALL|re.IGNORECASE), lambda match: match.group(1) + ': ' + match.group(2))
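
To illustrate the difference, here is a small sketch comparing the original pattern with the one above. The sample HTML is hypothetical: it assumes the real class attribute holds several space-separated names and ends in c-overline--article. The helper name `add_colon` is made up for the example.

```python
import re

# Hypothetical markup: a class attribute with several space-separated names.
sample_html = '<span class="c-overline c-overline--article">Konjunktur</span>'

# Original pattern: expects the class attribute to be exactly "c-overline".
exact = re.compile(r'(<span class="c-overline">[^<]*)(</span>)', re.DOTALL | re.IGNORECASE)

# Pattern from above: anchors only on the end of the class string.
suffix = re.compile(r'(c-overline--article">[^>]*)(</span>)', re.DOTALL | re.IGNORECASE)

def add_colon(match):
    # Same replacement as the lambda: insert ': ' before the closing tag.
    return match.group(1) + ': ' + match.group(2)

unchanged = exact.sub(add_colon, sample_html)   # no match, HTML stays as it was
fixed = suffix.sub(add_colon, sample_html)      # match, ': ' is inserted
```

With this input, `unchanged` is identical to `sample_html`, while `fixed` has the colon added before `</span>`.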
I have attached an updated version of the recipe.
Attached Files
File Type: zip WirtschaftsWoche_AGe_V4.3.zip (1.8 KB, 11 views)
03-22-2018, 04:10 PM   #33
hegi
Thanks Divingduck,

As usual, the problem was hiding in plain sight, and once you know the solution, everything seems simple and easy.

I took the liberty of merging my earlier fork of your recipe with your current version to come up with an improved one. Please feel free to review, edit, or enhance it even further.

My evolutionary changes over the last five years:
  • added a regexp to append ". " after hcf-location-mark (the place where the article is set)
  • further CSS entries for teaser text and other elements
  • options for conversion and duplicate articles
  • optional settings to reduce size on b/w readers
  • experimented a bit with tag filtering

For Amazon Kindle [4|Paperwhite] these settings work nicely:
Code:
    # if you want to reduce size for a b/w or E-ink device, uncomment the following 4 lines:
    compress_news_images  = True
    #compress_news_images_auto_size = 16
    scale_news_images     = (400,300)
    compress_news_images_max_size = 35
Currently, one of my earlier versions ships with calibre out of the box. So, once you are happy with the combined effort as well, we should ask Kovid to integrate the recipe upstream.

Thanks again and looking forward to your comments.

Hegi.
Attached Files
File Type: zip WiwoOnline_4.4.zip (2.0 KB, 12 views)
03-22-2018, 05:04 PM   #34
Divingduck
Thanks, you are welcome. It's fine with me.

DD

PS: No need to ask for approval. I like your changes for the recipe.
All times are GMT -4.

