#1
Junior Member
Posts: 3
Karma: 10
Join Date: Feb 2013
Device: Kindle Keyboard
Include Author and Publication Date from feed
I believe this information is included both in the feed data and in the page of the article itself. The feed in question is:
http://politikon.es/feed/
Many thanks for your help. Perhaps seeing how this is done will help me make heads or tails of the rest.
#2
Junior Member
Posts: 3
Karma: 10
Join Date: Feb 2013
Device: Kindle Keyboard
First efforts
Here is my first effort. The articles are pretty clean and auto cleanup works fine. I just wanted to include a line under the title with the date and author. That information appears to be contained in a <div class="post-meta">, but auto_cleanup_keep alone does not seem to be enough to pull it in.
Spoiler:
Thanks for any help.
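The recipe from the collapsed spoiler is not preserved in this thread. As a rough illustration only, a first attempt of the kind described might look like the sketch below. The class name, `oldest_article` and `max_articles_per_feed` values are assumptions, and the `try/except` stand-in exists only so the snippet can be imported outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in base class so the sketch can be imported outside calibre.
    class BasicNewsRecipe(object):
        pass


class PolitikonSketch(BasicNewsRecipe):
    title = 'Politikon'           # assumed display title
    oldest_article = 7            # assumed: days of articles to fetch
    max_articles_per_feed = 100   # assumed limit

    # Let calibre's automatic cleanup strip the page down to the article...
    auto_cleanup = True
    # ...but ask it never to remove the block holding author and date.
    # The value is an XPath expression; "post-meta" is the class named above.
    auto_cleanup_keep = '//div[@class="post-meta"]'

    feeds = [('Politikon', 'http://politikon.es/feed/')]
```

Note that `@class="post-meta"` only matches when that is the entire class attribute. If the div carries several classes, `//div[contains(@class, "post-meta")]` is a looser alternative worth trying.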
#3
Creator of calibre
Posts: 22,509
Karma: 2944574
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
If you can't get it to work with auto_cleanup_keep, you will have to clean up manually using the remove_tags and keep_only_tags directives instead.
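For readers unfamiliar with those directives, here is a minimal sketch of what manual cleanup could look like. Only the `post-meta` class comes from this thread; the `h1`, `entry-content` and `sharedaddy` selectors are hypothetical placeholders that would need to be replaced with names read from the actual page source. The `try/except` stand-in is there only so the snippet can be imported outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in base class so the sketch can be imported outside calibre.
    class BasicNewsRecipe(object):
        pass


class PolitikonManualSketch(BasicNewsRecipe):
    title = 'Politikon'           # assumed display title
    use_embedded_content = False  # assumed: work on the downloaded article page
    auto_cleanup = False          # rely on the manual rules below instead
    feeds = [('Politikon', 'http://politikon.es/feed/')]

    # Keep only these containers; everything else on the page is dropped.
    keep_only_tags = [
        dict(name='h1'),                                     # assumed title tag
        dict(name='div', attrs={'class': 'post-meta'}),      # named in this thread
        dict(name='div', attrs={'class': 'entry-content'}),  # hypothetical body div
    ]

    # Then strip unwanted pieces from whatever was kept.
    remove_tags = [
        dict(name='div', attrs={'class': 'sharedaddy'}),     # hypothetical share bar
    ]
```

The trade-off is that manual rules give full control over what survives but have to be updated whenever the site changes its markup.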
#4
Junior Member
Posts: 3
Karma: 10
Join Date: Feb 2013
Device: Kindle Keyboard
A little victory
Revisiting this, I think I have achieved what I wanted. But I have some questions and would appreciate anyone shedding some light on the subject.
The solution
All the information I wanted was there in the HTML of the article page. I kept auto_cleanup, since it had been working fine, and used auto_cleanup_keep to include all the tags around that information. These were nested three levels deep in some cases. One of the tags I wanted was <abbr>, which is unusual, so I replaced it with a wildcard (*), since I suspect it might have been causing the earlier failure. I also had to match on an unusual attribute (rel) for one <span>, since it had no 'id' or 'class' and its title was too specific. To achieve all this I had to set use_embedded_content=False. Here's how it came out:
Spoiler:
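The working recipe from the spoiler is not preserved here, so the following is only a sketch of the approach described: several XPath pieces joined with `|`, one using a wildcard in place of `<abbr>` and one matching a `<span>` by its `rel` attribute. The `rel="author"` and `class="published"` values and the sample markup are hypothetical. The small helper counts matches with Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath and needs well-formed markup, so it is a quick sanity check rather than a replica of how calibre evaluates the expression.

```python
import xml.etree.ElementTree as ET

try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in base class so the sketch can be imported outside calibre.
    class BasicNewsRecipe(object):
        pass

# Hypothetical XPath pieces; real attribute values come from the page source.
KEEP = '|'.join([
    '//div[@class="post-meta"]',                         # outer container
    '//div[@class="post-meta"]//span[@rel="author"]',    # span matched by rel
    '//div[@class="post-meta"]//*[@class="published"]',  # wildcard instead of abbr
])


class PolitikonKeepSketch(BasicNewsRecipe):
    title = 'Politikon'           # assumed display title
    use_embedded_content = False  # fetch the article page, not the feed body
    auto_cleanup = True
    auto_cleanup_keep = KEEP
    feeds = [('Politikon', 'http://politikon.es/feed/')]


def count_matches(xhtml, keep_expr):
    """Return how many elements each '|'-separated piece matches."""
    root = ET.fromstring(xhtml)
    # ElementTree wants relative paths, so '//x' becomes './/x'.
    return {part: len(root.findall('.' + part)) for part in keep_expr.split('|')}


# Hypothetical, well-formed stand-in for the article markup.
SAMPLE = (
    '<html><body><div class="post-meta">'
    '<span rel="author">Jane Doe</span> '
    '<abbr class="published" title="2013-02-01">1 Feb 2013</abbr>'
    '</div><p>Body text</p></body></html>'
)

if __name__ == '__main__':
    for part, n in count_matches(SAMPLE, KEEP).items():
        print(n, part)
```

Listing the parent container as well as its children mirrors the "three levels deep" observation above: each level that should survive cleanup gets its own piece of the expression.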
The problems
As I said, I had to force Calibre not to use the embedded content, even though it is all there and I can easily identify the bits of information I want in the source of the RSS feed. Applying the same technique to the feed, however, does not yield the results I want. I don't understand how Calibre picks up and uses the tags from the RSS source. I am not a programmer, and from what I have read I cannot work out enough of what is going on behind the scenes. Enabling a few HTML tags I get, but the RSS content surely requires more processing.
Cheers for any advice or pointers regarding the RSS issue. Since the data is in the feed, it seems preferable to use it from there.
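One possible explanation, offered tentatively: in an RSS feed the author and date are usually separate fields of each item rather than tags inside the embedded article HTML, so an XPath in auto_cleanup_keep would have nothing to match there. The untested sketch below shows one way a recipe might copy those fields into the embedded content by overriding `parse_feeds`. The helper `with_byline`, the `byline` class name and the use of the article's `author` and `formatted_date` attributes are assumptions, read defensively with `getattr`. The `try/except` stand-in exists only so the snippet can be imported outside calibre.

```python
from html import escape

try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in base class so the sketch can be imported outside calibre.
    class BasicNewsRecipe(object):
        pass


def with_byline(content, author, date_text):
    """Prepend an author/date line to an article's embedded HTML."""
    parts = [p.strip() for p in (author, date_text) if p and p.strip()]
    if not parts:
        return content
    byline = '<p class="byline">%s</p>' % escape(' | '.join(parts))
    return byline + (content or '')


class PolitikonFeedSketch(BasicNewsRecipe):
    title = 'Politikon'          # assumed display title
    use_embedded_content = True  # use the article body shipped in the feed
    feeds = [('Politikon', 'http://politikon.es/feed/')]

    def parse_feeds(self):
        # Let calibre parse the feeds as usual, then adjust each article.
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            for article in feed.articles:
                # Assumed attribute names; missing ones fall back to ''.
                author = getattr(article, 'author', '') or ''
                date_text = getattr(article, 'formatted_date', '') or ''
                article.content = with_byline(article.content, author, date_text)
        return feeds
```

Keeping the string handling in a plain function means it can be checked on its own, for example `with_byline('<p>x</p>', 'Ana', '1 Feb')`, before trying the recipe inside calibre.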