Calibre recipes - Page 5

Happy Mullet · 02-02-2013, 07:38 AM

Quote:

Originally Posted by kovidgoyal

https://calibre.kovidgoyal.net/wiki/UserRecipes

Normally, when you try to connect securely,
sites will present trusted identification to prove that you are
going to the right place. However, this site's identity can't be verified.

Mozilla gave me this warning when I tried to access the page. It scared me, of course.
Thanks,
Dave

dgvirtual · 10-17-2013, 06:55 AM

I was wondering if someone could help me make a recipe for a news source, http://www.lrytas.lt. The website splits longer articles into pages, but the whole article is available via the print version. However, I do not know how to make the recipe fetch the print version.

Here is a link to a multipage article reached through the RSS channel (http://www.lrytas.lt/kiti/rss.htm):

http://www.lrytas.lt/-13819312591379...m_campaign=rss

and here is the print version:

http://www.lrytas.lt/print.asp?k=new...12591379871087

I would also like to cut away the header "printed from www.lrytas.lt", the code of which reads like this:

<tr align="left">
<td width="140"><img src="/img/logo_small.gif" alt="Lietuvos Rytas Logo" style="border: 0pt none ;"></td><td ><strong>Šis puslapis atspausdintas iš http://www.lrytas.lt</strong>
</td>
</tr>

Help would be appreciated. I tried to work it out by studying other recipes, but failed.
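
For the header row quoted above, here is a minimal standalone sketch of one way to strip it, assuming the row can be recognised by its logo_small.gif image. The names HEADER_ROW and strip_print_header are hypothetical, and calling the helper from a recipe hook such as preprocess_raw_html is an assumption that has not been tested against the live site.

```python
import re

# Matches a single table row that contains the small logo image.
# The (?:(?!</tr>).) construct stops the match from running past the
# end of the row, so neighbouring rows are left alone.
HEADER_ROW = re.compile(
    r'<tr\b[^>]*>(?:(?!</tr>).)*?logo_small\.gif(?:(?!</tr>).)*?</tr>',
    re.IGNORECASE | re.DOTALL,
)


def strip_print_header(html):
    """Return html without the 'printed from' header row (hypothetical helper)."""
    return HEADER_ROW.sub('', html)
```

A regular expression is enough here because the row is small and has a distinctive marker; a parser-based approach would be more robust if the markup changes.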

mauropiccolo · 10-20-2013, 03:29 PM

Quote:

Originally Posted by dgvirtual

I was wondering if someone could help me make a recipe for a news source, http://www.lrytas.lt. The website splits longer articles into pages, but the whole article is available via the print version. However, I do not know how to make the recipe fetch the print version.
....

Try this,
it is not optimized, but seems to work.
Regards, Mauro

Code:

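#!/usr/bin/env python
# -*- coding: utf-8 -*-

__license__ = 'GPL v3'
__author__ = "mauropiccolo"

import re

# Explicit import so the base class is defined wherever the file is loaded.
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1382294260(BasicNewsRecipe):
    title = u'http://www.lrytas.lt/'
    oldest_article = 7  # days
    max_articles_per_feed = 100
    auto_cleanup = True

    feeds = [(u'Energetika', u'http://www.lrytas.lt/rss/?tema=47')]

    def print_version(self, url):
        # Fetch the article page and look for its link to the print view,
        # which holds the whole article on a single page.
        soup = self.index_to_soup(url)
        a = soup.find("a", attrs={"href": re.compile(r'^/print\.asp')})
        if a:
            url = 'http://www.lrytas.lt' + a["href"]
        # If no print link was found, the original URL is returned unchanged.
        return url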
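
The lookup that print_version performs can also be tried outside calibre on saved HTML. The following is a minimal sketch using only the standard library; PrintLinkFinder and find_print_url are hypothetical names, and the sample URL in the usage note is made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PrintLinkFinder(HTMLParser):
    """Remembers the first <a href> whose value starts with /print.asp."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag == 'a' and self.href is None:
            href = dict(attrs).get('href') or ''
            if href.startswith('/print.asp'):
                self.href = href


def find_print_url(page_html, page_url):
    """Return the absolute print URL, or page_url if no print link exists."""
    finder = PrintLinkFinder()
    finder.feed(page_html)
    if finder.href is None:
        return page_url
    # urljoin resolves the site-relative href against the article URL.
    return urljoin(page_url, finder.href)
```

For example, find_print_url('<a href="/print.asp?id=1">x</a>', 'http://www.lrytas.lt/a.htm') resolves the link against the site root, and a page without such a link gives back the article URL unchanged.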

dgvirtual · 10-23-2013, 07:19 AM

Quote:

Originally Posted by mauropiccolo

Try this,
it is not optimized, but seems to work.
Regards, Mauro

Thanks a lot! It does fetch whole articles now, but it drops the title from each article page (the titles still appear in the TOC pages and in the ebook TOC), and every article is instead titled "lrytas.lt Puslapis spausdinimui" (which means "lrytas.lt print page").

How do I bring back the titles?

mauropiccolo · 10-23-2013, 02:08 PM

Quote:

Originally Posted by dgvirtual

... but it drops the title from each article page (the titles still appear in the TOC pages and in the ebook TOC), and every article is instead titled "lrytas.lt Puslapis spausdinimui" (which means "lrytas.lt print page").
How do I bring back the titles?

Sorry, but I can't reproduce the problem, at least not with the calibre ebook-viewer.

See the attachment; it contains the recipe and the output.
$ ebook-convert lrytas.recipe .mobi --output-profile kindle --test

Do you see the problem in my output too?

dgvirtual · 10-29-2013, 04:28 PM

I tried running it from the command line and producing a MOBI file (to get a fuller picture, I omitted "--test"). And yes, I do get the same (or a very similar) problem. Only about every third article has a title. Here you can see the file I generated: https://db.tt/wavF6eN4

Here is a page that ended up without a title: http://www.lrytas.lt/print.asp?k=new...15991382340219
and this one got a title: http://www.lrytas.lt/print.asp?k=new...75391380839523

I have no clue why...

mauropiccolo · 10-30-2013, 12:08 PM

Quote:

Originally Posted by dgvirtual

.....I have no clue why...

Me neither.
Try this:

Code:

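#!/usr/bin/env python
# -*- coding: utf-8 -*-

__license__ = 'GPL v3'
__author__ = "mauropiccolo"

import re

# Explicit import so the base class is defined wherever the file is loaded.
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1382294260(BasicNewsRecipe):
    title = u'http://www.lrytas.lt/'
    oldest_article = 7  # days
    max_articles_per_feed = 100
    auto_cleanup = True
    recursions = 5  # follow links up to 5 levels deep from each article

    feeds = [(u'Energetika', u'http://www.lrytas.lt/rss/?tema=47')]

    def is_link_wanted(self, url, tag):
        # Follow only links whose text marks the next page of an article
        # ("psl." abbreviates "puslapis", i.e. page).
        desc = self.tag_to_string(tag, False)
        if "psl. &gt;&gt;" in desc:
            self.log('Following multipage link: %s' % url)
            return True
        return False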
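
The test in is_link_wanted depends on the exact form of the link text. Here is a minimal sketch of the same check, assuming the text may arrive either with "&gt;" entities or already decoded to ">>". The names NEXT_PAGE_MARKER and looks_like_next_page are hypothetical, and whether calibre passes the text escaped or decoded is not confirmed here.

```python
from html import unescape

# The marker as it reads once entities are decoded.
NEXT_PAGE_MARKER = 'psl. >>'


def looks_like_next_page(link_text):
    """True if the link text contains the next-page marker in either form."""
    # unescape() turns "&gt;" into ">" and leaves decoded text untouched.
    return NEXT_PAGE_MARKER in unescape(link_text)
```

Inside the recipe, the result of self.tag_to_string(tag, False) could be passed to this helper in place of the literal substring test.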

dgvirtual · 10-30-2013, 03:09 PM

Now I see all the headings, but some headings are not followed by the right text.

Here is the output file:
https://db.tt/kIAAmZIH

And here is a screenshot of a fragment of the wrong text that shows up in every second article in place of the right text:

https://db.tt/OyBs9od7

Just search for the text fragment "Lietuvoje gaminama elektros energija yra pernelyg brangi"; it is repeated in 12 different articles.

Sorry to trouble you with this problem. It is easier to read the news online.
