remove_tags_after with more values

niederrhymer · 01-17-2012, 07:45 AM

Hello,

I want to put feeds from more than one website into a single .epub. Basically it works quite well, but I run into problems with the details.

"feeds" obviously is an array:

Code:

feeds = [
    (u'FAZ - Politik', u'http://www.faz.net/aktuell/politik/?rssview=1'),
    (u'RP - Duesseldorf', u'http://feeds.rp-online.de/rp-online/rss/duesseldorf-stadt?format=xml'),
    (u'International', u'http://www.nzz.ch/nachrichten/international?rss=true'),
]

So is "remove_tags".

But "remove_tags_before" and "remove_tags_after" require a single value.

Is it possible to rewrite the code so that I can use an array for these as well? I may have to override a function of "BasicNewsRecipe". But which one?

Please, can you help me?

I am new to Python, but I have some experience in other programming languages, e.g. Java, VB, C#. So please feel free to get a bit more technical.

regards
Tom

roedi06 · 01-17-2012, 08:08 AM

Quote:

But "remove_tags_before" and "remove_tags_after" require a single value.

Is it possible to rewrite the code so that I can use an array for these as well? I may have to override a function of "BasicNewsRecipe". But which one?

Can you explain to me what the point of multiple "remove_tags_before" and "remove_tags_after" values is? If you use more of those, you are basically implementing "remove_tags". The "before" and "after" options are a bonus to make life easier if you just want to keep a certain portion of the page, without the hassle of removing every other tag. Within that portion, you still need to specify other remove_tags, of course.

niederrhymer · 01-17-2012, 09:00 AM

Yes, I can. Sorry about that, I thought it was obvious because I use more than one news source.

I need this for the "Neue Zürcher Zeitung" and for the "Rheinische Post", and certainly for some more.

All those news sources should be collected in _one_ single EPUB file.

Quote:

Originally Posted by roedi06

Can you explain to me what the point of multiple "remove_tags_before" and "remove_tags_after" values is?

"remove_tags" is quite good if you only want to remove some pictures or picture captions. It's a mess to use it as a substitute for "remove_tags_after" and "remove_tags_before".

Quote:

If you use more of those, you are basically implementing "remove_tags".

That's exactly what I want and I want it depending on my current source, e.g. "remove_tags_before = dict(id='headline')" for the Rheinische Post and "remove_tags_before = dict(name='p', attrs={'class':'dachzeile'})" for the NZZ. And I don't want to have two recipes and two epub files.

Quote:

The "before" and "after" options are a bonus to make life easier if you just want to keep a certain portion of the page, without the hassle of removing every other tag. Within that portion, you still need to specify other remove_tags, of course.

Any ideas?

roedi06 · 01-17-2012, 12:42 PM

I see what you mean now. I didn't read it carefully enough. For multiple feeds, what you ask does make sense.

Can't you work with something like: IF this feed is selected, then remove_tags_before and remove_tags_after are THIS, and so on?

kovidgoyal · 01-17-2012, 12:48 PM

No, you cannot have per feed values in remove_tags_* . You will need to implement the cleanup yourself, in preprocess_html
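
A minimal sketch of what such a hand-written cleanup could look like. Inside calibre, preprocess_html(self, soup) receives a BeautifulSoup tree and should return it; so that the example runs outside calibre, the standard library's ElementTree is used here as a stand-in. The names BEFORE_MARKERS, find_first, trim_before and cleanup are hypothetical, the two markers are the values quoted earlier in this thread, and attribute matching is simplified to exact string equality. Instead of asking which feed an article came from, it simply tries each source's marker and trims at the first one present in the page.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-source markers, copied from the values quoted in this thread.
BEFORE_MARKERS = [
    dict(id='headline'),                           # Rheinische Post
    dict(name='p', attrs={'class': 'dachzeile'}),  # NZZ
]


def find_first(root, spec):
    """Return the first element (document order) matching spec, or None.

    Simplified stand-in for soup.find(**spec): 'name' is the tag name,
    'attrs' and any other keyword are attribute filters (exact match only).
    """
    name = spec.get('name')
    attrs = dict(spec.get('attrs') or {})
    attrs.update({k: v for k, v in spec.items() if k not in ('name', 'attrs')})
    for el in root.iter():
        if name is not None and el.tag != name:
            continue
        if all(el.get(k) == v for k, v in attrs.items()):
            return el
    return None


def trim_before(body, marker):
    """Remove everything that precedes marker, keeping marker's ancestors."""
    parent_of = {child: parent for parent in body.iter() for child in parent}
    node = marker
    while node is not body:
        parent = parent_of[node]
        parent.text = None  # drop text in front of the first child
        for sibling in list(parent):
            if sibling is node:
                break
            parent.remove(sibling)  # also drops the sibling's trailing text
        node = parent


def cleanup(body, markers=BEFORE_MARKERS):
    """Trim at the first marker found in this page; otherwise leave it alone."""
    for spec in markers:
        marker = find_first(body, spec)
        if marker is not None:
            trim_before(body, marker)
            break
    return body
```

In a real recipe the same loop would presumably live in preprocess_html and call soup.find(**spec) on the BeautifulSoup tree instead of find_first; the trimming logic would have to be adapted to BeautifulSoup's sibling and parent navigation.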

niederrhymer · 01-18-2012, 04:15 AM

Thanks kovidgoyal,

I came across "preprocess_html" myself yesterday and started getting to grips with soup. If I find a solution, I'll post it here, but I think it'll take some time: learning Python plus a new library, with little spare time.

I'll keep you up-to-date.

sup · 08-20-2016, 10:44 PM

Quote:

Originally Posted by kovidgoyal

No, you cannot have per feed values in remove_tags_* . You will need to implement the cleanup yourself, in preprocess_html

If that is indeed the case, would you please update the documentation here?

remove_tags_after shows this example:

Code:

remove_tags_after = [dict(id='content')]

which is actually wrong, because when I use it like that (with the "[]" making it a list), I get this:

Code:

TypeError: find() argument after ** must be a mapping, not list

Also, the documentation should note that even if the basic syntax is the same as with remove_tags, it must not be a list.

(BTW: I would love it if one could use lists in these cases as well. I am writing a recipe for a magazine that uses special formatting for certain articles, so for those articles I have to somehow re-implement remove_tags_before myself. Hopefully it won't be that hard :-)).
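
The TypeError quoted above is what Python raises whenever a list is unpacked with ** into keyword arguments, which suggests the value is being passed straight to a find(**spec) call. A small illustration, where find is a hypothetical stand-in and not BeautifulSoup's own method:

```python
def find(name=None, attrs=None, **kwargs):
    """Hypothetical stand-in that only accepts keyword-style filters."""
    return {'name': name, 'attrs': attrs or {}, **kwargs}


def try_find(spec):
    """Call find(**spec); return 'ok' or the TypeError message."""
    try:
        find(**spec)
        return 'ok'
    except TypeError as exc:
        return str(exc)


# A dict unpacks into keyword arguments without trouble.
result_dict = try_find(dict(id='content'))

# A list is not a mapping, so ** unpacking fails before find() even runs.
result_list = try_find([dict(id='content')])
```

Here result_dict is 'ok', while result_list holds a message of the "argument after ** must be a mapping, not list" kind; the exact prefix of that message varies between Python versions.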

kovidgoyal · 08-21-2016, 12:15 AM

Lists containing dicts work perfectly well for remove_tags_after. That has nothing to do with them being per-feed.

remove_tags_after can be either a dict or a list containing dicts.
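
For anyone writing their own cleanup code who wants to accept both forms in the same way, one option is to normalize the value first. This is only a sketch; the helper name as_spec_list is hypothetical and this is not calibre's implementation.

```python
def as_spec_list(value):
    """Return value as a list of dicts.

    Accepts None (no filter), a single dict, or a list/tuple of dicts.
    """
    if value is None:
        return []
    if isinstance(value, dict):
        return [value]
    return list(value)


# Both spellings end up as the same list of filters.
single = as_spec_list(dict(id='content'))
several = as_spec_list([dict(id='content'), dict(name='footer')])
```

The cleanup code can then always loop over the result, whichever form the recipe author wrote.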

sup · 08-21-2016, 08:28 AM

Sorry, my bad. I meant remove_tags_before. True, the example for that is not a list, but it still links to the remove_tags syntax, which says it takes a list of dicts. A sentence saying that remove_tags_before only accepts a single dict and not a list of them would be helpful (making it accept lists would be even better :-)).

kovidgoyal · 08-21-2016, 10:41 AM

You can match multiple kinds of things with a single dict, but here you go:

https://github.com/kovidgoyal/calibr...dd72bb84cfd12e

sup · 08-21-2016, 10:51 AM

Thanks!



