Old 08-24-2010, 04:40 PM   #2506
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by poluk
I am trying to adapt the Financial Times recipe to Lloyd's List, and I get this error.
Could you tell me what to change in "log-in-box", based on the part of the webpage source that handles the login?
You didn't post your recipe or the login page you are trying to access, so it's a bit hard to advise you. However, from the error, it looks like your recipe probably attempts to find the login form by "name" and you have used the "id."

I don't do many login recipes, but it's been my experience that if the form is not identified by "name=" in the html, you need to use this:

Code:
br.select_form(nr=0) 
or 
br.select_form(nr=1)
to find the form by its sequential number on the page, instead of:

Code:
br.select_form(name='log-in-box')
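
To decide which `nr` value to try, it can help to list the forms on the login page in document order along with their `name` and `id` attributes. The following is a minimal sketch using only the Python 3 standard library, not calibre's or mechanize's API; the sample HTML and the `list_forms` helper are made up for illustration.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect (index, name, id) for every <form> on the page, in document order."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag == 'form':
            a = dict(attrs)
            self.forms.append((len(self.forms), a.get('name'), a.get('id')))


def list_forms(html):
    """Hypothetical helper: return the list of forms found in an HTML string."""
    lister = FormLister()
    lister.feed(html)
    return lister.forms


# Hypothetical login page: the second form has an id but no name attribute,
# so select_form(name='log-in-box') would not find it, while nr=1 would.
sample = '''
<form action="/search"><input name="q"></form>
<form id="log-in-box" action="/login"><input name="user"></form>
'''

for nr, name, form_id in list_forms(sample):
    print(nr, name, form_id)
```

The first column is the sequential position of the form, which is what `nr=` refers to when no other selection criteria are given.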
Old 08-24-2010, 04:53 PM   #2507
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by kerrware
It seemed to download the first two articles into separate directories, each with an index.html first and an image subdirectory. Displaying the index file in Firefox shows the article data is being downloaded OK.
When I run the recipe in Calibre, I get the index summary pages OK, but all the articles referred to contain only header lines (Next Link, etc.) and footer lines (downloaded by Calibre, etc.).
Have I missed something out?
What happens when you click on the index.html in the first directory? Does Firefox allow you to click through to the articles and see the article content? (As dwanthny said, if you had used CODE tags, it would have been easier to run your recipe to check it out.)
Old 08-24-2010, 05:25 PM   #2508
cisaak
Member
cisaak began at the beginning.
 
Posts: 17
Karma: 10
Join Date: Aug 2010
Device: Kindle DX
Formatting Masthead

In my newspaper recipe, I have replaced the standard Kindle masthead with "MYTEXT" using the following command:

def get_masthead_title(self):
    return 'MYTEXT'

Unfortunately, MYTEXT is truncated when viewed on my Kindle's screen. Apparently, I must use CSS to format the substitute masthead. I have used CSS to format other tags, e.g., the body of the article, but I do not know how to apply CSS to the masthead. Can anyone help?
Old 08-24-2010, 05:43 PM   #2509
poluk
Enthusiast
poluk is on a distinguished road
 
Posts: 34
Karma: 54
Join Date: Jul 2008
Device: not yet
Thanks for your help, Starson17!
Here is the recipe code:

Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = '2008, Darko Miletic <darko.miletic at gmail.com>'
'''
Lloyds
'''

from calibre.web.feeds.news import BasicNewsRecipe

class Lloyd(BasicNewsRecipe):
    title                 = u'Lloyd'
    __author__            = 'Darko Miletic and Sujata Raman'
    description           = 'Shipping News'
    oldest_article        = 2
    language = 'en'

    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    needs_subscription    = True
    simultaneous_downloads= 1
    delay                 = 1

    LOGIN = 'http://www.lloydslist.com/ll/login.htm'

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            br.open(self.LOGIN)
            br.select_form(nr=0) 
            br['username'] = self.username
            br['password'] = self.password
            br.submit()
        return br

   

    feeds = [
        (u'Containers', u'http://www.lloydslist.com/ll/sector/containers/?service=rss'),
        (u'Dry Cargo', u'http://www.lloydslist.com/ll/sector/dry-cargo/?service=rss'),
        (u'Finance', u'http://www.lloydslist.com/ll/sector/finance/?service=rss'),
        (u'Insurance', u'http://www.lloydslist.com/ll/sector/insurance/?service=rss'),
        (u'Port and Logistic', u'http://www.lloydslist.com/ll/sector/ports-and-logistics/?service=rss'),
        (u'Regulation', u'http://www.lloydslist.com/ll/sector/regulation/?service=rss'),
        (u'Ship Operation', u'http://www.lloydslist.com/ll/sector/ship-operations/?service=rss'),
    ]

    def preprocess_html(self, soup):
        content_type = soup.find('meta', {'http-equiv':'Content-Type'})
        if content_type:
            content_type['content'] = 'text/html; charset=utf-8'
        return soup
As you suggested, I changed how the recipe looks for the form, and now I get a new error (so we are making progress, thanks to you!):

Quote:
ClientForm.ControlNotFoundError: no control matching name 'username'
Old 08-24-2010, 05:52 PM   #2510
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by poluk
As you suggested, I changed how the recipe looks for the form, and now I get a new error (so we are making progress, thanks to you!):

Code:
ClientForm.ControlNotFoundError: no control matching name 'username'
Your username control is not named 'username'. Find the form and determine the name of the control that is submitted as the username.

IOW, this is wrong:
Code:
br['username'] = self.username
it should be:
Code:
br['something_else_not_username'] = self.username
You probably have the name of the password control wrong, too.
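
One way to find the real control names is to save the login page's source and list the `name` attribute of every input inside each form. This is only a sketch with the Python 3 standard library; the sample form and the names `login_email` and `login_pass` are invented, and the actual Lloyd's List names have to be read from the real page.

```python
from html.parser import HTMLParser


class ControlLister(HTMLParser):
    """Map each form's sequential index to the (name, type) of its controls."""

    CONTROL_TAGS = ('input', 'select', 'textarea')

    def __init__(self):
        super().__init__()
        self.controls = {}    # form index -> list of (name, type)
        self._current = None  # index of the form being parsed, if any
        self._count = 0

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == 'form':
            self._current = self._count
            self._count += 1
            self.controls[self._current] = []
        elif tag in self.CONTROL_TAGS and self._current is not None:
            # Fall back to the tag name when there is no type attribute
            self.controls[self._current].append((a.get('name'), a.get('type', tag)))

    def handle_endtag(self, tag):
        if tag == 'form':
            self._current = None


def list_controls(html):
    """Hypothetical helper: return {form index: [(control name, control type), ...]}."""
    parser = ControlLister()
    parser.feed(html)
    return parser.controls


# Invented login form: the names below are placeholders, not the real site's.
sample = '''
<form action="/login" method="post">
  <input type="text" name="login_email">
  <input type="password" name="login_pass">
  <input type="submit" value="Sign in">
</form>
'''

for nr, controls in list_controls(sample).items():
    for name, kind in controls:
        print(nr, kind, name)
```

Whatever names show up next to the text and password controls are the keys to use in `br[...]` in place of 'username' and 'password'.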

Last edited by Starson17; 08-24-2010 at 06:05 PM.
Old 08-24-2010, 06:01 PM   #2511
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by cisaak
In my newspaper recipe, I have replaced the standard Kindle masthead with "MYTEXT" ...

Unfortunately, MYTEXT is truncated when viewed on my Kindle's screen. Apparently, I must use CSS to format the substitute masthead. I have used CSS to format other tags, e.g., the body of the article, but I do not know how to apply CSS to the masthead. Can anyone help?
Not without a Kindle (anyone want to send me one?), as I'm not sure where in the recipe the Kindle is picking up the masthead.

However, the masthead is only used in a few places in an EPUB. Open the EPUB, find the masthead and change the css file to modify its properties, then convert the EPUB to whatever format Kindle uses and see if that fixes it. If so, modify the extra_css in your recipe to make the same change.

If you have a problem understanding this, take it a step at a time, and let me know which step you have trouble with.
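
For the "find the masthead" step, an EPUB is a zip archive, so its text files can be searched directly. The sketch below is standard-library Python 3 and only locates candidate lines; the in-memory archive, its file names, and the `masthead` class are stand-ins, since what a real calibre-built EPUB contains was not shown here.

```python
import io
import zipfile

TEXT_SUFFIXES = ('.html', '.xhtml', '.css', '.opf', '.ncx')


def find_in_epub(epub, needle='masthead'):
    """Return (member, line number, stripped line) for each text line mentioning needle."""
    hits = []
    with zipfile.ZipFile(epub) as zf:
        for member in zf.namelist():
            if not member.lower().endswith(TEXT_SUFFIXES):
                continue
            text = zf.read(member).decode('utf-8', errors='replace')
            for lineno, line in enumerate(text.splitlines(), 1):
                if needle.lower() in line.lower():
                    hits.append((member, lineno, line.strip()))
    return hits


# Tiny in-memory stand-in for a real EPUB; names and markup are made up.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('index.html', '<body>\n<img class="masthead" src="logo.jpg"/>\n</body>')
    zf.writestr('stylesheet.css', 'body { margin: 0 }\n')
buf.seek(0)

for hit in find_in_epub(buf):
    print(hit)
```

With a real file, the call would take a path instead, for example `find_in_epub('my_news.epub')` (a hypothetical file name). The class or id found on the masthead element is then the selector to try in the EPUB's CSS and, if that works, in `extra_css`.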
Old 08-24-2010, 08:17 PM   #2512
TonytheBookworm
Addict
TonytheBookworm is on a distinguished road
 
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
I know in the calibre preferences under conversion and mobi output there is a dropdown that allows you to pick the font you wish to use. It would be good to have a user customized size as well in there.
Old 08-24-2010, 11:25 PM   #2513
naisren
Enthusiast
naisren began at the beginning.
 
Posts: 41
Karma: 12
Join Date: Jul 2009
Device: ppc
main menu, section menu, css for calibre mobipocket output

Calibre gives us many ways to customize news from almost any site, so I use Calibre to get news instead of Mobipocket Reader.
I have run into several issues while using Calibre. Could you kindly help me solve them?
1. Menu in the navigation part of each article
Clicking a menu link pops up an error on both PC and PDA.
2. How can I avoid or reduce the warning "Property: Invalid value for "CSS Level 2.1" property: 225 [85:1: width]" in the output produced by a recipe?
Attached Thumbnails
SectionMenu.gif (20.0 KB, ID: 57118)
mainMenu.gif (7.6 KB, ID: 57119)
CSS.gif (2.2 KB, ID: 57120)
Old 08-24-2010, 11:43 PM   #2514
DoctorOhh
US Navy, Retired
DoctorOhh ought to be getting tired of karma fortunes by now.
Posts: 9,890
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Updated National Review Online Recipe

This recipe wasn't working due to a redirected feed. I corrected the recipe, removed one old feed, and added two new feeds.
Attached Files
File Type: zip nationalreviewonline.zip (1.8 KB, 248 views)
Old 08-24-2010, 11:46 PM   #2515
naisren
Enthusiast
naisren began at the beginning.
 
Posts: 41
Karma: 12
Join Date: Jul 2009
Device: ppc
Code:
<li><a href="/Business_Etiquette_1.html" />Business Etiquette</a></li>
As you can see, there is a "/" at the end of the opening tag
Code:
<a href="/Business_Etiquette_1.html" />
and another "/" in the closing tag
Code:
</a>
A browser treats this as if the first "/" were not there, i.e.
Code:
<li><a href="/Business_Etiquette_1.html">Business Etiquette</a></li>
Calibre does not seem to handle it the way Firefox or IE does: it appears to treat the tag as closed at the first "/" and skip what follows.

The "a" link tag is one case; the div tag has the same problem, for example
Code:
<div id="text"/>......</div>
How can I deal with such code in a recipe? I can't get any links using
soup.find(id='text').findAll('a') on the code above.
Old 08-25-2010, 04:09 AM   #2516
kerrware
Junior Member
kerrware began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Jun 2010
Device: none
My Recipe fails to place Articles data in epub

Thanks for the feedback. I'm new to the forum, so I'm still learning.
Hopefully I've added the recipe code correctly this time.

Quote:
Originally Posted by Starson17
What happens when you click on the index.html in the first directory? Does Firefox allow you to click through to the articles and see the article content? (As dwanthny said, if you had used CODE tags, it would have been easier to run your recipe to check it out.)
Yes, Firefox does allow me to click through to the articles and see the article content. I've since created a second recipe for another site (which does not require a login, so I only used the basic Add New Recipe page in Calibre), and that worked the first time (apart from possibly needing a bit of pruning).

Code:
from calibre.web.feeds.news import BasicNewsRecipe
import re

class AdvancedUserRecipe1282596648(BasicNewsRecipe):
    title          = u'Ilkeston Advertiser'
    oldest_article = 7
    max_articles_per_feed = 100
    needs_subscription = True

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            br.open('http://auth.jpress.co.uk/login.aspx?ReturnURL=http%3a%2f%2fwww.ilkestonadvertiser.co.uk%2ftemplate%2fRegister.aspx%3fReturnURL%3dhttp%3a%2f%2fwww.ilkestonadvertiser.co.uk%2ffrontpage.aspx&SiteRef=IAS')
            br.select_form(name='Form1')
            br['ctl00$txtEmailAddress']  = self.username
            br['ctl00$txtPassword'] = self.password
            br.submit()
        return br

    feeds          = [(u'Ilkeston Today - News', u'http://www.ilkestonadvertiser.co.uk/getfeed.aspx?sectionid=795&format=rss')]
Old 08-25-2010, 08:49 AM   #2517
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by kerrware
Yes, Firefox does allow me to click through to the articles and see the article content.
If you are seeing the article content stored locally (when running ebook-convert), and you can click through from the initial index.html to the index.html files in the folders to see that content, then I see no reason why you should have problems converting the html structure, with article content, to an EPUB. Where is the problem occurring? I'd check it for you, but have no username/password for the site.
Old 08-25-2010, 09:06 AM   #2518
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by naisren
As you can see, there is a "/" at the end of the opening tag
Code:
<a href="/Business_Etiquette_1.html" />
and another "/" in the closing tag
Code:
</a>
Calibre does not seem to handle it the way Firefox or IE does: it appears to skip what follows the first "/".
The "a" link tag is one case; the div tag has the same problem, for example
Code:
<div id="text"/>......</div>
How can I deal with such code in a recipe? I can't get any links using
soup.find(id='text').findAll('a') on the code above.
Sorry, but I can't quite follow your question. Are you saying you can't reference tags by "id" or "href," etc.?

I've never run into the trailing slashes inside opening tags like you've posted, so I have no first hand experience. I would still expect normal referencing to work, but if it doesn't, you have various options. You can try search and replace to remove them with preprocess_regexps. You could remove just the slashes, or modify the whole tag with S&R, or use pre or postprocess_html and Beautiful Soup to identify the tag and extract or modify it. It's possible the slashes are confusing Beautiful Soup, so printing the results (see code in my post above on how to do this) might help you figure out what the recipe is seeing and where it's being confused.

More info would be needed to advise further.
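
As a rough illustration of the search-and-replace option, here is a sketch of a `preprocess_regexps`-style list: pairs of a compiled pattern and a function that receives the match and returns the replacement. The pattern only targets `a`, `div` and `span` tags, which is an assumption based on the snippets quoted above, and `apply_regexps` is a made-up stand-in so the rule can be tried outside calibre.

```python
import re

# Same shape as a recipe's preprocess_regexps: (compiled pattern, replacement function).
# The pattern matches an opening <a>, <div> or <span> tag that ends in "/>"
# and rewrites it without the slash.
preprocess_regexps = [
    (re.compile(r'<(a|div|span)\b([^>]*?)\s*/>', re.IGNORECASE),
     lambda match: '<%s%s>' % (match.group(1), match.group(2))),
]


def apply_regexps(html, rules=preprocess_regexps):
    """Hypothetical stand-in for the recipe run: apply each rule in order."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


sample = ('<div id="text"/><a href="/Business_Etiquette_1.html" />'
          'Business Etiquette</a><br/></div>')
print(apply_regexps(sample))
```

Because the pattern names the tags explicitly, `<br/>` and the closing tags are left alone. Whether this is enough to make Beautiful Soup see the links would still need to be checked by printing what the recipe receives.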
Old 08-25-2010, 11:47 AM   #2519
naisren
Enthusiast
naisren began at the beginning.
 
Posts: 41
Karma: 12
Join Date: Jul 2009
Device: ppc
Quote:
Originally Posted by Starson17
Sorry, but I can't quite follow your question. Are you saying you can't reference tags by "id" or "href," etc.?

I've never run into the trailing slashes inside opening tags like you've posted, so I have no first hand experience. I would still expect normal referencing to work, but if it doesn't, you have various options. You can try search and replace to remove them with preprocess_regexps. You could remove just the slashes, or modify the whole tag with S&R, or use pre or postprocess_html and Beautiful Soup to identify the tag and extract or modify it. It's possible the slashes are confusing Beautiful Soup, so printing the results (see code in my post above on how to do this) might help you figure out what the recipe is seeing and where it's being confused.

More info would be needed to advise further.
Thanks for your help, and sorry for my confusing wording.

The following is part of the source code from which I am trying to get the feed.

Code:
<div id="rightContainer" />
<span id="list" />
<ul><li><a href="/Health_Report_1.html" target="_blank">[ <font color=#E43026>Health Report</font> ] </a> <a href="/lrc/201008/se-health-cancer-developing-world-25aug10.lrc" target=_blank><img src=/images/lrc.gif border=0></a> <a href="/VOA_Special_English/Experts-Urge-More-Efforts-to-Fight-Cancer-in-Poor-Countries-38652_1.html" target="_blank"><img src=/images/yi.gif border=0></a> <a href="/VOA_Special_English/Experts-Urge-More-Efforts-to-Fight-Cancer-in-Poor-Countries-38652.html" target="_blank">Experts Urge More Efforts to Fight Cancer in Poor Countries  (2010-8-25)</a></li></ul>
</span>
</div>
My recipe is
Code:
import re
from calibre.web.feeds.news import BasicNewsRecipe

class VOA(BasicNewsRecipe):

    title      = 'VOA News'
    __author__ = 'voa'
    description = 'VOA through 51'
    language = 'en'
    remove_javascript = True

    remove_tags_before = dict(id=['rightContainer'])
    remove_tags_after  = dict(id=['listads'])
    remove_tags        = [
                          dict(id=['contentAds']), dict(id=['playbar']), dict(id=['menubar']), 
                         ]    
    no_stylesheets = True
    extra_css = '''
                '''


    def parse_index(self):
        soup = self.index_to_soup('http://www.51voa.com/')
        feeds = []
        section = []
        title = None

        #for x in soup.find(id='list').findAll('a'):
        for x in soup.find(id='rightContainer').findAll('a'):
            # Keep only links to Special English or Standard English articles
            if '/VOA_Special_English/' in x['href'] or '/VOA_Standard_English/' in x['href']:
                article = {
                    # href already starts with "/", so the base URL has no trailing slash
                    'url' : 'http://www.51voa.com' + x['href'],
                    'title' : self.tag_to_string(x),
                    'date': '',
                    'description': '',
                }
                section.append(article)

        feeds.append(('Newest', section))

        return feeds
I use this recipe to fetch the feed from the source code above, but I get no links. Could you give an example of how to use "regexps" to deal with the odd code here, including the case where a
Code:
<br/>
tag appears? Thanks a lot for your help.
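
A possible way to handle the `<br/>` case is to strip the stray slash from every tag except the ones that are normally written self-closed. The sketch below is standalone Python 3, not a calibre recipe: `fix_self_closed` and `LinkCollector` are invented names, the sample is a trimmed-down version of the source quoted above, and the link collector only stands in for the Beautiful Soup loop in `parse_index`.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Elements that may legitimately be written as <br/>, <img/>, and so on.
VOID = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
        'link', 'meta', 'source', 'track', 'wbr'}

SELF_CLOSED = re.compile(r'<([a-zA-Z][a-zA-Z0-9]*)\b([^>]*?)\s*/>')


def fix_self_closed(html):
    """Drop the stray slash from non-void tags, leaving <br/> and friends alone."""
    def repl(match):
        if match.group(1).lower() in VOID:
            return match.group(0)
        return '<%s%s>' % (match.group(1), match.group(2))
    return SELF_CLOSED.sub(repl, html)


class LinkCollector(HTMLParser):
    """Collect (absolute url, link text) for anchors whose href contains a wanted path."""

    def __init__(self, base, wanted):
        super().__init__()
        self.base, self.wanted = base, wanted
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href') or ''
            self._href = href if any(w in href for w in self.wanted) else None
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            # urljoin avoids the double slash from 'http://host/' + '/path'
            self.links.append((urljoin(self.base, self._href), ''.join(self._text).strip()))
            self._href = None


sample = '''<div id="rightContainer" />
<span id="list" />
<ul><li><a href="/Health_Report_1.html">[ Health Report ]</a>
<a href="/VOA_Special_English/Experts-Urge-More-Efforts-38652.html">Experts Urge More Efforts</a><br/></li></ul>
</span>
</div>'''

cleaned = fix_self_closed(sample)
collector = LinkCollector('http://www.51voa.com/',
                          ('/VOA_Special_English/', '/VOA_Standard_English/'))
collector.feed(cleaned)
print(cleaned)
print(collector.links)
```

In a recipe, the same pattern and replacement function could be placed in `preprocess_regexps`; whether the index page fetched by `index_to_soup` is affected by that setting is something to verify rather than assume.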
Old 08-25-2010, 01:26 PM   #2520
miangue
Junior Member
miangue began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Aug 2010
Location: Colombia
Device: Sony PRS-300
Quote:
Originally Posted by Starson17
extra_css is used to control formatting. Search this thread for some samples and read here.
Thanks, Starson17. I added the "extra_css" line, and it came out like this:

Code:
class AdvancedUserRecipe1282450582(BasicNewsRecipe):
    title          = u'LaRepublica.com'
    oldest_article = 7
    max_articles_per_feed = 100
    use_embedded_content   = False
    no_stylesheets = True
    extra_css = '''
                    .titulo {font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
                    .periodista {font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
                    .fecha_publicacion {font-family:Helvetica,Arial,sans-serif;font-size:small;}
	'''
    keep_only_tags    = [
                       dict(name='div', attrs={'id':['noticia']})
                             ]
    remove_tags = [
                       dict(name='div', attrs={'id':['iconos', 'relacionados', 'documentos_adjuntos']}),
                       dict(name='span', attrs={'id':['comentarios']})
                        ]

    feeds          = [(u'Noticias', u'http://www.larepublica.com.co/rss/larepublica.xml')]
But it still does not work. What could I be doing wrong?

Can anyone help me, please?

I should clarify that the tags whose formatting I want to change are:

Code:
<div id="titulo">
<div id="periodista">
<div id="fecha_publicacion">
THANK YOU!!!
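
One thing that may be worth checking: in CSS, a selector written `.titulo` matches elements with class="titulo", while `#titulo` matches elements with id="titulo". Since the tags listed above carry `id` attributes, a version of the stylesheet with id selectors might look like the sketch below. It is untested against the actual site, and `selector_kinds` is just a made-up helper for checking which kind of selector a stylesheet uses.

```python
import re

# Recipe-style stylesheet using id selectors (#name) to match <div id="..."> tags.
extra_css = '''
    #titulo {font-family:Arial,Helvetica,sans-serif; font-weight:bold; font-size:large;}
    #periodista {font-family:Arial,Helvetica,sans-serif; font-weight:normal; font-size:small;}
    #fecha_publicacion {font-family:Helvetica,Arial,sans-serif; font-size:small;}
'''


def selector_kinds(css):
    """Hypothetical helper: return {name: 'id' or 'class'} for simple rule selectors."""
    kinds = {}
    for prefix, name in re.findall(r'([#.])([A-Za-z_][\w-]*)\s*\{', css):
        kinds[name] = 'id' if prefix == '#' else 'class'
    return kinds


print(selector_kinds(extra_css))
```

If the formatting still does not change after switching to `#`, the next step would be to look at the downloaded HTML and confirm the divs keep their ids after `keep_only_tags` and `remove_tags` are applied.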