02-02-2013, 07:38 AM | #61 | |
New User
Posts: 1
Karma: 10
Join Date: Feb 2013
Location: Onboard boat
Device: Nextbook7-se
|
Quote:
sites will present trusted identification to prove that you are going to the right place. However, this site's identity can't be verified.
Mozilla gave me this when I tried to access the page. It scared me, of course. Thanks, Dave
|
10-17-2013, 06:55 AM | #62 |
Enthusiast
Posts: 30
Karma: 2848
Join Date: Feb 2013
Location: Lithuania
Device: Kobo Glo
|
help needed (print version of an article)
I was wondering if someone could help me make a recipe for the news source http://www.lrytas.lt. The website splits longer articles into pages, but the whole article is available through the print version. However, I do not know how to make the recipe fetch the print version.
Here is the link to a multipage article accessed through the RSS channel (http://www.lrytas.lt/kiti/rss.htm): http://www.lrytas.lt/-13819312591379...m_campaign=rss and here is the print version: http://www.lrytas.lt/print.asp?k=new...12591379871087
I would also like to cut away the header "printed from www.lrytas.lt", whose code reads like this:
<tr align="left"> <td width="140"><img src="/img/logo_small.gif" alt="Lietuvos Rytas Logo" style="border: 0pt none ;"></td><td ><strong>Šis puslapis atspausdintas iš http://www.lrytas.lt</strong> </td> </tr>
Help would be appreciated. I tried to do it by learning from other recipes, but failed.
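One possible way to drop the "printed from" header is a regular expression over the downloaded HTML. The sketch below is plain Python so it can be tried outside calibre; the pattern is derived only from the markup quoted in the post, and the names `PRINT_HEADER` and `strip_print_header` are made up for illustration. Whether the live page matches this markup exactly is an assumption to check.

```python
import re

# Matches the table row that holds the small logo and the "printed from" notice.
# Built from the markup quoted in the post; the real page may differ, so treat
# the pattern as an assumption to verify against the site.
PRINT_HEADER = re.compile(
    r'<tr[^>]*>\s*<td[^>]*>\s*<img[^>]*logo_small\.gif[^>]*>.*?</tr>',
    re.DOTALL | re.IGNORECASE,
)


def strip_print_header(html):
    """Return html with the print-header row removed (unchanged if absent)."""
    return PRINT_HEADER.sub('', html)


# Inside a recipe, the same compiled pattern could be plugged into the
# preprocess_regexps attribute of BasicNewsRecipe, for example:
# preprocess_regexps = [(PRINT_HEADER, lambda match: '')]
```

Matching on the logo file name keeps the pattern from touching other table rows, at the cost of breaking if the site renames the image.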
10-20-2013, 03:29 PM | #63 | |
Member
Posts: 12
Karma: 10
Join Date: Sep 2013
Device: kindle
|
Quote:
It is not optimized, but it seems to work. Regards, Mauro
Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
__license__ = 'GPL v3'
__author__ = "mauropiccolo"

import re

from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1382294260(BasicNewsRecipe):
    title = u'http://www.lrytas.lt/'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = True
    feeds = [(u'Energetika', u'http://www.lrytas.lt/rss/?tema=47')]

    def print_version(self, url):
        # Fetch the article page and look for a link to /print.asp
        soup = self.index_to_soup(url)
        a = soup.find("a", attrs={"href": re.compile(r'^/print\.asp')})
        if a:
            # Use the print version, which holds the whole article on one page
            url = 'http://www.lrytas.lt' + a["href"]
        return url
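To check the link lookup without running a full download, the same idea can be tried on a saved page with the standard library alone. This is only a sketch: `find_print_url` is a hypothetical helper, not part of calibre, and it uses a regular expression instead of the soup lookup in the recipe. It assumes the print link is a root-relative `href` written with double quotes.

```python
import re
from html import unescape
from urllib.parse import urljoin

# An <a> tag whose href starts with /print.asp (double-quoted href assumed).
PRINT_HREF = re.compile(r'<a\b[^>]*\bhref="(/print\.asp[^"]*)"', re.IGNORECASE)


def find_print_url(article_html, article_url):
    """Return the absolute print-version URL, or article_url if none is found."""
    match = PRINT_HREF.search(article_html)
    if match is None:
        return article_url
    # hrefs in HTML source may contain &amp;, so decode entities before joining
    return urljoin(article_url, unescape(match.group(1)))
```

Falling back to the original URL mirrors the recipe: an article without a print link is still downloaded, just in its paged form.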
|
10-23-2013, 07:19 AM | #64 | |
Enthusiast
Posts: 30
Karma: 2848
Join Date: Feb 2013
Location: Lithuania
Device: Kobo Glo
|
Quote:
How do I bring back the titles? |
|
10-23-2013, 02:08 PM | #65 | |
Member
Posts: 12
Karma: 10
Join Date: Sep 2013
Device: kindle
|
Quote:
See the attachment; it contains the recipe and the output.
$ ebook-convert lrytas.recipe .mobi --output-profile kindle --test
Do you see the problem in my output too?
|
10-29-2013, 04:28 PM | #66 |
Enthusiast
Posts: 30
Karma: 2848
Join Date: Feb 2013
Location: Lithuania
Device: Kobo Glo
|
I tried running it from the command line and producing a MOBI file (to get a fuller picture I omitted "--test"). And yes, I get the same (or a very similar) problem: only about every third article has a title. Here you can see the file I generated: https://db.tt/wavF6eN4
Here is a page that ended up without a title: http://www.lrytas.lt/print.asp?k=new...15991382340219 and this one got a title: http://www.lrytas.lt/print.asp?k=new...75391380839523 I have no clue why... |
10-30-2013, 12:08 PM | #67 |
Member
Posts: 12
Karma: 10
Join Date: Sep 2013
Device: kindle
|
Me too. Try this:
Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
__license__ = 'GPL v3'
__author__ = "mauropiccolo"

from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1382294260(BasicNewsRecipe):
    title = u'http://www.lrytas.lt/'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = True
    recursions = 5  # follow links up to five levels deep
    feeds = [(u'Energetika', u'http://www.lrytas.lt/rss/?tema=47')]

    def is_link_wanted(self, url, tag):
        # Follow only the "next page" links of a multipage article
        desc = self.tag_to_string(tag, False)
        if "psl. >>" in desc:
            self.log('Following multipage link: %s' % url)
            return True
        return False
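The filter in `is_link_wanted` comes down to one test: does the visible text of a link contain "psl. >>"? A stand-alone sketch of that test, using only the standard library, makes it easy to see which links on a saved page would be followed. `LinkCollector` and `wanted_links` are hypothetical names, and the sample markup in the usage note is invented, not copied from the site.

```python
from html.parser import HTMLParser

NEXT_PAGE_MARKER = "psl. >>"  # marker text taken from the recipe above


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element with an href."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...> element
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def wanted_links(html):
    """Return the hrefs whose link text contains the next-page marker."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [href for href, text in parser.links if NEXT_PAGE_MARKER in text]
```

For example, `wanted_links('<a href="/a?p=2">2 psl. &gt;&gt;</a>')` would select `/a?p=2`, while a link with any other text is skipped.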
10-30-2013, 03:09 PM | #68 |
Enthusiast
Posts: 30
Karma: 2848
Join Date: Feb 2013
Location: Lithuania
Device: Kobo Glo
|
Now I see all the headings, but some headings are not followed by the right text
Here is the output file: https://db.tt/kIAAmZIH And here is a screenshot of a fragment of the wrong text that appears in every second article instead of the right text: https://db.tt/OyBs9od7 Just search for the text fragment "Lietuvoje gaminama elektros energija yra pernelyg brangi"; it is repeated in 12 different articles. Sorry to trouble you with this problem. It is easier to read the news online.
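Searching for one fragment by hand only finds the repetition already known. A small diagnostic can list every paragraph that shows up in more than one article, given the article texts extracted from the output. This is a sketch under assumptions: `repeated_fragments` is a hypothetical helper, the input is a plain dict of title to text, and paragraphs are taken to be separated by newlines.

```python
from collections import defaultdict


def repeated_fragments(articles, min_length=40):
    """Map each paragraph found in several articles to the sorted titles holding it.

    articles: dict of article title -> plain text (one paragraph per line).
    min_length: shorter paragraphs are ignored to skip bylines and similar noise.
    """
    seen = defaultdict(set)
    for title, text in articles.items():
        for paragraph in text.split("\n"):
            paragraph = paragraph.strip()
            if len(paragraph) >= min_length:
                seen[paragraph].add(title)
    return {p: sorted(titles) for p, titles in seen.items() if len(titles) > 1}
```

An empty result would suggest each article carries its own text; a non-empty one shows which articles share a paragraph.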
|