#2506 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
I don't do many login recipes, but it's been my experience that if the form is not identified by "name=" in the HTML, you need to select it by position, like this:
Code:
br.select_form(nr=0) or br.select_form(nr=1)
instead of selecting it by name, like this:
Code:
br.select_form(name='log-in-box')
|
#2507 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
|
|
#2508 |
Member
Posts: 17
Karma: 10
Join Date: Aug 2010
Device: Kindle DX
|
Formatting Masthead
In my newspaper recipe, I have replaced the standard Kindle masthead with "MYTEXT" using the following method:

    def get_masthead_title(self):
        return 'MYTEXT'

Unfortunately, MYTEXT is truncated when viewed on my Kindle's screen. Apparently, I must use CSS to format the substitute masthead. I have used CSS to format other tags, e.g., the body of the article, but I do not know how to apply CSS to the masthead. Can anyone help?
#2509 | |
Enthusiast
Posts: 34
Karma: 54
Join Date: Jul 2008
Device: not yet
|
Thanks for your help, Starson17!
Here is the recipe's code:
Code:

    #!/usr/bin/env python
    __license__ = 'GPL v3'
    __copyright__ = '2008, Darko Miletic <darko.miletic at gmail.com>'
    '''
    Lloyds
    '''

    from calibre.web.feeds.news import BasicNewsRecipe

    class Lloyd(BasicNewsRecipe):
        title = u'Lloyd'
        __author__ = 'Darko Miletic and Sujata Raman'
        description = 'Shipping News'
        oldest_article = 2
        language = 'en'
        max_articles_per_feed = 100
        no_stylesheets = True
        use_embedded_content = False
        needs_subscription = True
        simultaneous_downloads = 1
        delay = 1
        LOGIN = 'http://www.lloydslist.com/ll/login.htm'

        def get_browser(self):
            br = BasicNewsRecipe.get_browser()
            if self.username is not None and self.password is not None:
                br.open(self.LOGIN)
                br.select_form(nr=0)
                br['username'] = self.username
                br['password'] = self.password
                br.submit()
            return br

        feeds = [
            (u'Containers', u'http://www.lloydslist.com/ll/sector/containers/?service=rss'),
            (u'Dry Cargo', u'http://www.lloydslist.com/ll/sector/dry-cargo/?service=rss'),
            (u'Finance', u'http://www.lloydslist.com/ll/sector/finance/?service=rss'),
            (u'Insurance', u'http://www.lloydslist.com/ll/sector/insurance/?service=rss'),
            (u'Port and Logistic', u'http://www.lloydslist.com/ll/sector/ports-and-logistics/?service=rss'),
            (u'Regulation', u'http://www.lloydslist.com/ll/sector/regulation/?service=rss'),
            (u'Ship Operation', u'http://www.lloydslist.com/ll/sector/ship-operations/?service=rss'),
        ]

        def preprocess_html(self, soup):
            content_type = soup.find('meta', {'http-equiv':'Content-Type'})
            if content_type:
                content_type['content'] = 'text/html; charset=utf-8'
            return soup

Quote:
|
|
#2510 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
IOW, this is wrong:
Code:
br['username'] = self.username
and it needs to look more like this, using whatever field name the login form actually defines:
Code:
br['something_else_not_username'] = self.username
Last edited by Starson17; 08-24-2010 at 06:05 PM.
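To find out what that field is really called, one option is to list the forms on the login page together with their field names. What follows is only a minimal sketch using the Python 3 standard library; the sample HTML and the names `FormLister`, `list_forms`, `user_email` and `user_pass` are made up for illustration and are not taken from the Lloyd's List site.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect each <form> on a page with its index, name and field names."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per <form>, in document order
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form':
            self._current = {'nr': len(self.forms),
                             'name': attrs.get('name'),
                             'fields': []}
            self.forms.append(self._current)
        elif tag in ('input', 'select', 'textarea') and self._current is not None:
            # Unnamed controls (e.g. a bare submit button) cannot be set by key.
            if attrs.get('name'):
                self._current['fields'].append(attrs['name'])

    def handle_endtag(self, tag):
        if tag == 'form':
            self._current = None


def list_forms(html):
    """Return a list of {'nr', 'name', 'fields'} dicts for the given HTML."""
    parser = FormLister()
    parser.feed(html)
    parser.close()
    return parser.forms


# Hypothetical page: a named search form first, then an unnamed login form.
SAMPLE = '''
<form name="search"><input name="q"></form>
<form action="/login">
  <input name="user_email">
  <input type="password" name="user_pass">
  <input type="submit" value="Go">
</form>
'''

for form in list_forms(SAMPLE):
    print(form['nr'], form['name'], form['fields'])
```

The `nr` printed for the login form is the value to try in `br.select_form(nr=...)`, and the listed field names are the keys to use in `br['...']`.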
|
#2511 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
However, the masthead is only used in a few places in an EPUB. Open the EPUB, find the masthead and change the css file to modify its properties, then convert the EPUB to whatever format Kindle uses and see if that fixes it. If so, modify the extra_css in your recipe to make the same change. If you have a problem understanding this, take it a step at a time, and let me know which step you have trouble with.
|
#2512 |
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
I know in the calibre preferences under conversion and mobi output there is a dropdown that allows you to pick the font you wish to use. It would be good to have a user customized size as well in there.
|
#2513 |
Enthusiast
Posts: 41
Karma: 12
Join Date: Jul 2009
Device: ppc
|
main menu, section menu, css for calibre mobipocket output
Calibre gives us many choices for customizing news from almost any site, so I use Calibre to get news instead of Mobipocket Reader.
I ran into several issues while using Calibre; could you kindly help solve them?
1. Menu in the navigation part of each article: clicking a menu link pops up an error on both PC and PDA.
2. How can I avoid or reduce warnings such as "Property: Invalid value for "CSS Level 2.1" property: 225 [85:1: width]" in the recipe's output?
#2514 |
US Navy, Retired
Posts: 9,890
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
|
Updated National Review Online Recipe
This recipe wasn't working due to a redirected feed. I corrected the recipe. Removed one old feed and added two new feeds.
|
#2515 |
Enthusiast
Posts: 41
Karma: 12
Join Date: Jul 2009
Device: ppc
|
Code:
<li><a href="/Business_Etiquette_1.html" />Business Etiquette</a></li>
Code:
<a href="/Business_Etiquette_1.html" />
Code:
</a>
Code:
<li><a href="/Business_Etiquette_1.html">Business Etiquette</a></li>
The link "a" tag is one case; the "div" tag has the same problem, for example:
Code:
<div id="text"/>......</div>
soup.find(id='text').findAll('a') to handle the mentioned code.
#2516 | |
Junior Member
Posts: 7
Karma: 10
Join Date: Jun 2010
Device: none
|
My Recipe fails to place Articles data in epub
Thanks for feedback - new to forum so still learning.
Hopefully I've added the recipe code correctly this time. Quote:
Code:

    from calibre.web.feeds.news import BasicNewsRecipe
    import re

    class AdvancedUserRecipe1282596648(BasicNewsRecipe):
        title = u'Ilkeston Advertiser'
        oldest_article = 7
        max_articles_per_feed = 100
        needs_subscription = True

        def get_browser(self):
            br = BasicNewsRecipe.get_browser()
            if self.username is not None and self.password is not None:
                br.open('http://auth.jpress.co.uk/login.aspx?ReturnURL=http%3a%2f%2fwww.ilkestonadvertiser.co.uk%2ftemplate%2fRegister.aspx%3fReturnURL%3dhttp%3a%2f%2fwww.ilkestonadvertiser.co.uk%2ffrontpage.aspx&SiteRef=IAS')
                br.select_form(name='Form1')
                br['ctl00$txtEmailAddress'] = self.username
                br['ctl00$txtPassword'] = self.password
                br.submit()
            return br

        feeds = [(u'Ilkeston Today - News', u'http://www.ilkestonadvertiser.co.uk/getfeed.aspx?sectionid=795&format=rss')]

|
#2517 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
If you are seeing the article content stored locally (when running ebook-convert), and you can click through from the initial index.html to the index.html files in the folders to see that content, then I see no reason why you should have problems converting the html structure, with article content, to an EPUB. Where is the problem occurring? I'd check it for you, but have no username/password for the site.
|
#2518 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
I've never run into the trailing slashes inside opening tags like you've posted, so I have no first hand experience. I would still expect normal referencing to work, but if it doesn't, you have various options. You can try search and replace to remove them with preprocess_regexps. You could remove just the slashes, or modify the whole tag with S&R, or use pre or postprocess_html and Beautiful Soup to identify the tag and extract or modify it. It's possible the slashes are confusing Beautiful Soup, so printing the results (see code in my post above on how to do this) might help you figure out what the recipe is seeing and where it's being confused. More info would be needed to advise further. |
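As a rough illustration of the search-and-replace option, here is a sketch of a rule in the (compiled pattern, callable) shape that preprocess_regexps expects. The pattern, the choice of tags (a, div, span) and the `apply_rules` helper are assumptions made for this example; `apply_rules` only stands in for what the recipe framework would do with the rules, so the snippet can be tried outside calibre.

```python
import re

# Opening <a>, <div> or <span> tags wrongly written as self-closing,
# e.g. <a href="..." /> followed later by a separate </a>.
STRAY_SLASH = re.compile(r'<(a|div|span)\b([^>]*?)\s*/>', re.IGNORECASE)

# Same shape as a recipe's preprocess_regexps: (compiled pattern, callable).
preprocess_regexps = [
    (STRAY_SLASH, lambda match: '<%s%s>' % (match.group(1), match.group(2))),
]


def apply_rules(html, rules=preprocess_regexps):
    """Apply each substitution rule in order (illustrative helper only)."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html


broken = '<li><a href="/Business_Etiquette_1.html" />Business Etiquette</a></li>'
print(apply_rules(broken))
```

Tags that are legitimately self-closing, such as `<br/>`, are left alone because the pattern only names the three container tags.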
|
#2519 | |
Enthusiast
Posts: 41
Karma: 12
Join Date: Jul 2009
Device: ppc
|
Quote:
The following is part of the page source from which I am trying to build the feed.
Code:

    <div id="rightContainer" />
    <span id="list" />
    <ul><li><a href="/Health_Report_1.html" target="_blank">[ <font color=#E43026>Health Report</font> ] </a>
    <a href="/lrc/201008/se-health-cancer-developing-world-25aug10.lrc" target=_blank><img src=/images/lrc.gif border=0></a>
    <a href="/VOA_Special_English/Experts-Urge-More-Efforts-to-Fight-Cancer-in-Poor-Countries-38652_1.html" target="_blank"><img src=/images/yi.gif border=0></a>
    <a href="/VOA_Special_English/Experts-Urge-More-Efforts-to-Fight-Cancer-in-Poor-Countries-38652.html" target="_blank">Experts Urge More Efforts to Fight Cancer in Poor Countries (2010-8-25)</a></li></ul>
    </span>
    </div>

Code:

    import re
    from calibre.web.feeds.news import BasicNewsRecipe

    class VOA(BasicNewsRecipe):
        title = 'VOA News'
        __author__ = 'voa'
        description = 'VOA through 51'
        language = 'en'
        remove_javascript = True
        remove_tags_before = dict(id=['rightContainer'])
        remove_tags_after = dict(id=['listads'])
        remove_tags = [
            dict(id=['contentAds']),
            dict(id=['playbar']),
            dict(id=['menubar']),
        ]
        no_stylesheets = True
        extra_css = ''' '''

        def parse_index(self):
            soup = self.index_to_soup('http://www.51voa.com/')
            feeds = []
            section = []
            title = None
            #for x in soup.find(id='list').findAll('a'):
            for x in soup.find(id='rightContainer').findAll('a'):
                # Keep only links that point at article pages
                if '/VOA_Special_English/' in x['href'] or '/VOA_Standard_English/' in x['href']:
                    article = {
                        'url' : 'http://www.51voa.com/' + x['href'],
                        'title' : self.tag_to_string(x),
                        'date': '',
                        'description': '',
                    }
                    section.append(article)
            feeds.append(('Newest', section))
            return feeds

Code:

    <br/>

|
#2520 | |
Junior Member
Posts: 4
Karma: 10
Join Date: Aug 2010
Location: Colombia
Device: Sony PRS-300
|
Quote:
Code:

    from calibre.web.feeds.news import BasicNewsRecipe

    class AdvancedUserRecipe1282450582(BasicNewsRecipe):
        title = u'LaRepublica.com'
        oldest_article = 7
        max_articles_per_feed = 100
        use_embedded_content = False
        no_stylesheets = True
        extra_css = '''
            .titulo {font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
            .periodista {font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
            .fecha_publicacion {font-family:Helvetica,Arial,sans-serif;font-size:small;}
        '''
        keep_only_tags = [
            dict(name='div', attrs={'id':['noticia']})
        ]
        remove_tags = [
            dict(name='div', attrs={'id':['iconos', 'relacionados', 'documentos_adjuntos']}),
            dict(name='span', attrs={'id':['comentarios']})
        ]
        feeds = [(u'Noticias', u'http://www.larepublica.com.co/rss/larepublica.xml')]

Can anyone help me, please? ... I should clarify that the tags whose format I want to change are:
Code:

    <div id="titulo">
    <div id="periodista">
    <div id="fecha_publicacion">

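One thing that may be worth checking, assuming the article markup really is as listed above: in CSS a selector starting with "." matches a class attribute, while "#" matches an id attribute, so rules written as `.titulo` would not apply to `<div id="titulo">`. A sketch of the stylesheet rewritten with id selectors follows; it is shown as a plain string so it can be tried outside calibre, and in the recipe it would be assigned to the `extra_css` class attribute.

```python
# Sketch only: assumes the tags carry id="titulo", id="periodista" and
# id="fecha_publicacion" (not class=...), as in the markup quoted above.
extra_css = '''
    #titulo {font-family:Arial,Helvetica,sans-serif; font-weight:bold; font-size:large;}
    #periodista {font-family:Arial,Helvetica,sans-serif; font-weight:normal; font-size:small;}
    #fecha_publicacion {font-family:Helvetica,Arial,sans-serif; font-size:small;}
'''

print(extra_css)
```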
|