news feed to text file recipe
Dear All,
I am new to the forum, but I think I searched pretty thoroughly for anything related and found nothing close to an answer, so please do not humiliate me if I missed something obvious.
I would like the news articles to be fetched from RSS and stored in txt format, all in one file. I have been playing with Calibre for some time now and appreciate the flexibility of the tool very much, but unfortunately I am at ground zero in Python.
Below is how far I have got on the subject. If anyone could take it further, give any suggestions, or tell me if I am going in the wrong direction, please do.
The best option is of course a sample recipe, but I think that is too much to ask for...
Thanks a lot in advance,
Redp
How far I got:
1. I understand that I need to find the tag identifying the main body of text in a specific newspaper. For one newspaper it is div class="text", for example.
2. Then I should use keep_only_tags so that only that text is kept in the stored html.
3. Then I suspect I need to override preprocess_html and inside it somehow make use of tag_to_string, so that only plain text is stored.
4. Finally, I have to concatenate all of this kind-of "html" into one file: the txt file with all the news inside as plain text.
Huh, I managed to write a very simple recipe that covers items 1 and 2, but I have no idea if items 3 and 4 are a good plan. Even if it is a good plan, I feel so miserable because I do not understand Python...
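To make items 1 to 4 concrete, here is a minimal sketch in plain Python, outside of Calibre and using only the standard library. It is not a recipe and does not use the Calibre API. It assumes the article html has already been downloaded and that the body sits in div class="text". The names DivTextExtractor, article_to_text and write_articles are made up for illustration.

```python
from html.parser import HTMLParser

# Tags after which a space is inserted so words from separate blocks do not run together.
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3"}


class DivTextExtractor(HTMLParser):
    """Collect the plain text found inside <div class="text"> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside a matching div
        self.chunks = []  # pieces of text collected so far

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        # Count nested divs so we know when the matching div really ends.
        if self.depth > 0 or "text" in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag in BLOCK_TAGS:
            self.chunks.append(" ")
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def article_to_text(html):
    """Items 1 to 3: keep only the article body and strip all markup."""
    parser = DivTextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace into single spaces.
    return " ".join("".join(parser.chunks).split())


def write_articles(articles, path):
    """Item 4: write (title, html) pairs into one plain-text file."""
    with open(path, "w", encoding="utf-8") as out:
        for title, html in articles:
            out.write(title + "\n" + article_to_text(html) + "\n\n")
```

Calling write_articles([("Title", "<div class=\"text\"><p>Body</p></div>")], "news.txt") would produce a single txt file with one title line and one body line per article. Inside a real recipe the same idea would presumably have to go through keep_only_tags and preprocess_html instead.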