Old 02-02-2013, 07:38 AM   #61
Happy Mullet
New User
Posts: 1
Karma: 10
Join Date: Feb 2013
Location: Onboard boat
Device: Nextbook7-se
Question

Quote:
Originally Posted by kovidgoyal View Post
Normally, when you try to connect securely,
sites will present trusted identification to prove that you are
going to the right place. However, this site's identity can't be verified.

Mozilla gave me this warning when I tried to access the page. It scared me, of course.
Thanks,
Dave
Old 10-17-2013, 06:55 AM   #62
dgvirtual
Enthusiast
Posts: 30
Karma: 2848
Join Date: Feb 2013
Location: Lithuania
Device: Kobo Glo
Help needed (print version of an article)

I was wondering if someone could help me make a recipe for the news source http://www.lrytas.lt. The website splits longer articles into pages, but the whole article is available through the print version. However, I do not know how to make a recipe fetch the print version.

Here is the link to a multipage article reached through the RSS feed (http://www.lrytas.lt/kiti/rss.htm):

http://www.lrytas.lt/-13819312591379...m_campaign=rss

and here is the print version:

http://www.lrytas.lt/print.asp?k=new...12591379871087

I would also like to cut away the header "printed from www.lrytas.lt", the code of which reads like this:

<tr align="left">
<td width="140"><img src="/img/logo_small.gif" alt="Lietuvos Rytas Logo" style="border: 0pt none ;"></td><td ><strong>Šis puslapis atspausdintas iš http://www.lrytas.lt</strong>
</td>
</tr>

Help would be appreciated. I tried to work it out by learning from other recipes, but failed.
Old 10-20-2013, 03:29 PM   #63
mauropiccolo
Member
Posts: 12
Karma: 10
Join Date: Sep 2013
Device: kindle
Quote:
Originally Posted by dgvirtual View Post
I was wondering if someone could help me make a recipe for a news source http://www.lrytas.lt. The website divides longer articles into pages, but you can access the whole article via print version. However, I do not know how to produce the print version.
....
Try this,
it is not optimized, but seems to work.
Regards, Mauro

Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

__license__ = 'GPL v3'
__author__ = 'mauropiccolo'

import re

from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1382294260(BasicNewsRecipe):
    title = u'http://www.lrytas.lt/'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = True

    feeds = [(u'Energetika', u'http://www.lrytas.lt/rss/?tema=47')]

    def print_version(self, url):
        # Fetch the article page and look for its "print" link.
        soup = self.index_to_soup(url)
        a = soup.find('a', attrs={'href': re.compile(r'^/print\.asp')})
        if a:
            # The href is site-relative, so prepend the host.
            url = 'http://www.lrytas.lt' + a['href']
        # Fall back to the original URL when no print link is found.
        return url
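
The recipe above does not touch the "printed from www.lrytas.lt" header row quoted in post #62. What follows is a minimal sketch of one way to strip that row from the raw HTML, not a tested part of the recipe. The names `HEADER_MARKER`, `HEADER_ROW` and `strip_print_header` and the sample markup are made up for illustration, and it assumes the live print pages contain the marker text exactly as quoted in the thread.

```python
import re

# Marker text copied from the header row quoted in the thread (assumption:
# the live print pages use exactly this wording).
HEADER_MARKER = 'Šis puslapis atspausdintas iš'

# Match a single <tr>...</tr> that contains the marker. The negative
# lookahead stops the match from crossing into any other table row.
HEADER_ROW = re.compile(
    r'<tr\b[^>]*>(?:(?!</?tr\b).)*?'
    + re.escape(HEADER_MARKER)
    + r'(?:(?!</?tr\b).)*?</tr>',
    re.DOTALL | re.IGNORECASE,
)


def strip_print_header(html):
    """Return html with the 'printed from' header row removed (hypothetical helper)."""
    return HEADER_ROW.sub('', html)


# Hypothetical sample: one ordinary row followed by the header row.
sample = (
    '<table><tr><td>Article text</td></tr>'
    '<tr align="left">\n'
    '<td width="140"><img src="/img/logo_small.gif" alt="Lietuvos Rytas Logo"></td>'
    '<td ><strong>Šis puslapis atspausdintas iš http://www.lrytas.lt</strong>\n'
    '</td>\n'
    '</tr></table>'
)

print(strip_print_header(sample))
```

In a recipe, a pattern like this could presumably be listed in `preprocess_regexps` with a replacement function that returns an empty string. That wiring has not been tried against this site.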
Old 10-23-2013, 07:19 AM   #64
dgvirtual
Enthusiast
Posts: 30
Karma: 2848
Join Date: Feb 2013
Location: Lithuania
Device: Kobo Glo
Quote:
Originally Posted by mauropiccolo View Post
Try this,
it is not optimized, but seems to work.
Regards, Mauro
Thanks a lot! It fetches whole articles now, but it cuts the title from each article page (the titles are still there in the TOC pages and in the ebook TOC), and every article gets the title "lrytas.lt Puslapis spausdinimui" (which means "lrytas.lt print page").

How do I bring back the titles?
Old 10-23-2013, 02:08 PM   #65
mauropiccolo
Member
Posts: 12
Karma: 10
Join Date: Sep 2013
Device: kindle
Quote:
Originally Posted by dgvirtual View Post
... but it cuts away each title in the article page (titles are there in the TOC pages as well as ebook TOC), and every article has a title "lrytas.lt Puslapis spausdinimui" (which means "lrytas.lt print page").
How do I bring back the titles?
Sorry, but I can't reproduce the problem, at least not with the calibre ebook-viewer.

See the attachment; it contains the recipe and the output, which I generated with:
$ ebook-convert lrytas.recipe .mobi --output-profile kindle --test

Do you see the problem in my output too?
Old 10-29-2013, 04:28 PM   #66
dgvirtual
Enthusiast
Posts: 30
Karma: 2848
Join Date: Feb 2013
Location: Lithuania
Device: Kobo Glo
I tried running it from the command line and producing a MOBI file (to get a fuller picture I omitted "--test"). And yes, I do get the same (or a very similar) problem: only about every third article has a title. Here you can see the file I generated: https://db.tt/wavF6eN4

Here is a page that ended up without a title: http://www.lrytas.lt/print.asp?k=new...15991382340219
and this one got a title: http://www.lrytas.lt/print.asp?k=new...75391380839523

I have no clue why...
Old 10-30-2013, 12:08 PM   #67
mauropiccolo
Member
Posts: 12
Karma: 10
Join Date: Sep 2013
Device: kindle
Quote:
Originally Posted by dgvirtual View Post
.....I have no clue why...
Me neither.
Try this:
Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

__license__ = 'GPL v3'
__author__ = 'mauropiccolo'

from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1382294260(BasicNewsRecipe):
    title = u'http://www.lrytas.lt/'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = True
    # Follow links up to five levels deep from each article page.
    recursions = 5

    feeds = [(u'Energetika', u'http://www.lrytas.lt/rss/?tema=47')]

    def is_link_wanted(self, url, tag):
        # Follow only links whose text marks the next page of an article.
        desc = self.tag_to_string(tag, False)
        if 'psl. &gt;&gt;' in desc:
            self.log('Following multipage link: %s' % url)
            return True
        return False
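
The link test in `is_link_wanted` depends on whether the link text arrives with HTML entities still encoded ("psl. &gt;&gt;") or already decoded ("psl. >>"). The thread does not say which form `tag_to_string` returns here. The following is a minimal standalone sketch of a check that tolerates both. The helper name `is_next_page_link`, the constant `NEXT_PAGE_MARKER` and the sample link texts are hypothetical, and this is not claimed to fix the wrong-text problem reported later in the thread.

```python
import html

# Decoded form of the marker used in the recipe above.
NEXT_PAGE_MARKER = 'psl. >>'


def is_next_page_link(link_text):
    """Return True when the link text looks like a 'next page' link (hypothetical helper)."""
    # Decode entities so that 'psl. &gt;&gt;' and 'psl. >>' compare equal.
    text = html.unescape(link_text)
    # Collapse runs of whitespace before comparing.
    text = ' '.join(text.split())
    return NEXT_PAGE_MARKER in text


# Hypothetical link texts, for illustration only.
for text in ('2 psl. &gt;&gt;', '2 psl. >>', 'Komentarai'):
    print(repr(text), is_next_page_link(text))
```

Inside the recipe, the same comparison could replace the `if` test in `is_link_wanted`, applied to the string returned by `self.tag_to_string(tag, False)`.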
Old 10-30-2013, 03:09 PM   #68
dgvirtual
Enthusiast
Posts: 30
Karma: 2848
Join Date: Feb 2013
Location: Lithuania
Device: Kobo Glo
Now I see all the headings, but some headings are not followed by the right text.

Here is the output file:
https://db.tt/kIAAmZIH

And here is a screenshot of a fragment of the wrong text that appears in every second article instead of the right text:

https://db.tt/OyBs9od7

Just search for the text fragment "Lietuvoje gaminama elektros energija yra pernelyg brangi"; it is repeated in 12 different articles.

Sorry to trouble you with this problem. It is easier to read the news online.