#1
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
Can't seem to keep my articles!
I whipped up this recipe for globes.co.il.
I have everything down, but the articles are wider than the PDF output. I have no idea what is going on. Any ideas? The code: Spoiler:
Example of a print page here. Thank you.
#2
creator of calibre
Posts: 45,598
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You need to remove non-reflowable markup such as tables, pre tags, or other tags with an explicitly specified width.
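To illustrate what removing that kind of markup means, here is a minimal standalone sketch using only the Python standard library. It is not calibre code: the `ReflowCleaner` class, the `clean` helper, and the two tag/attribute lists are hypothetical names chosen for this example. It drops `width` and `style` attributes and rewrites table and pre tags as plain divs. Comments and doctype declarations are discarded, so treat it as a demonstration of the idea rather than a full HTML serializer.

```python
from html import escape
from html.parser import HTMLParser

# Tags that do not reflow; rewritten as <div> (an assumed list for this sketch).
NON_REFLOWABLE = {"table", "tbody", "thead", "tr", "td", "th", "pre"}
# Attributes that can pin an element to a fixed width (an assumed list).
FIXED_WIDTH_ATTRS = {"width", "style"}


class ReflowCleaner(HTMLParser):
    """Re-emit HTML without fixed-width attributes or table/pre tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _open(self, tag, attrs, end=">"):
        # Swap non-reflowable tags for div and skip fixed-width attributes.
        parts = ["div" if tag in NON_REFLOWABLE else tag]
        for name, value in attrs:
            if name in FIXED_WIDTH_ATTRS:
                continue
            if value is None:
                parts.append(name)
            else:
                parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + end)

    def handle_starttag(self, tag, attrs):
        self._open(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._open(tag, attrs, end="/>")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % ("div" if tag in NON_REFLOWABLE else tag))

    def handle_data(self, data):
        # Text arrives with entities decoded, so escape it again on output.
        self.out.append(escape(data, quote=False))


def clean(html):
    """Return html with non-reflowable markup neutralised."""
    parser = ReflowCleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(clean('<table width="800"><tr><td style="width:600px">text</td></tr></table>'))
```

Inside a recipe, the attribute half of this is what a setting like `remove_attributes = ['width','style']` is for, so a sketch like this is mainly useful for checking a saved print page by hand.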
#3
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
Got it.
Thanks, Kovid.
Ready to be built in: Code:
import re

from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1283848012(BasicNewsRecipe):
    description = 'This is a recipe of Globes.co.il.'
    cover_url = 'http://www.the7eye.org.il/SiteCollectionImages/BAKTANA/arye_avnery_010709_377.jpg'
    title = u'Globes'
    language = 'he'
    __author__ = 'marbs'
    # Right-to-left text, and nothing wider than the page
    extra_css = 'img {max-width:100%;} body {direction: rtl; max-width:100%;} title {direction: rtl;} .article_description {direction: rtl;} a.article {direction: rtl; max-width:100%;} .calibre_feed_description {direction: rtl;}'
    simultaneous_downloads = 5
    remove_javascript = True
    timefmt = '[%a, %d %b, %Y]'
    oldest_article = 1
    max_articles_per_feed = 100
    # Drop explicit widths and inline styles so the content can reflow
    remove_attributes = ['width', 'style']
    feeds = [(u'שוק ההון', u'http://www.globes.co.il/webservice/rss/rssfeeder.asmx/FeederNode?iID=585'),
             (u'נדל"ן', u'http://www.globes.co.il/webservice/rss/rssfeeder.asmx/FeederNode?iID=607'),
             (u'וול סטריט ושווקי העולם', u'http://www.globes.co.il/webservice/rss/rssfeeder.asmx/FeederNode?iID=1225'),
             (u'ניתוח טכני', u'http://www.globes.co.il/webservice/rss/rssfeeder.asmx/FeederNode?iID=1294'),
             (u'היי טק', u'http://www.globes.co.il/webservice/rss/rssfeeder.asmx/FeederNode?iID=594'),
             (u'נתח שוק וצרכנות', u'http://www.globes.co.il/webservice/rss/rssfeeder.asmx/FeederNode?iID=821'),
             (u'דין וחשבון', u'http://www.globes.co.il/webservice/rss/rssfeeder.asmx/FeederNode?iID=829'),
             (u'רכב', u'http://www.globes.co.il/webservice/rss/rssfeeder.asmx/FeederNode?iID=3220'),
             (u'דעות', u'http://www.globes.co.il/webservice/rss/rssfeeder.asmx/FeederNode?iID=845'),
             (u'קניון המניות - טור שבועי', u'http://www.globes.co.il/webservice/rss/rssfeeder.asmx/FeederNode?iID=3175'),
             (u'סביבה', u'http://www.globes.co.il/webservice/rss/rssfeeder.asmx/FeederNode?iID=3221')]

    def print_version(self, url):
        # The article id is whatever follows the first "=" in the article URL
        split1 = url.split("=")
        print_url = 'http://www.globes.co.il/serve/globes/printwindow.asp?did=' + split1[1]
        return print_url

    def preprocess_html(self, soup):
        # Remove the row with the black background and the row just before it
        black_row = soup.find('tr', attrs={'bgcolor': 'black'})
        if black_row is not None:
            previous_row = black_row.findPrevious('tr')
            if previous_row is not None:
                previous_row.extract()
            black_row.extract()
        return soup

    def fixChars(self, string):
        # Replace lsquo (\x91)
        fixed = re.sub("\x91", u"\u2018", string)
        return fixed
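
The `print_version` step above takes whatever follows the first "=" in the article URL, which breaks if the URL carries more than one query parameter or none at all. A slightly more defensive sketch, written as a standalone function so it can be tried outside calibre, might look like this. The example article URL and the `PRINT_BASE` name are assumptions for illustration; only the print-window address comes from the recipe.

```python
from urllib.parse import parse_qsl, urlsplit

# Print-window address taken from the recipe; the constant name is hypothetical.
PRINT_BASE = 'http://www.globes.co.il/serve/globes/printwindow.asp?did='


def print_version(url):
    """Build the print-page URL from the first query parameter of an article URL.

    Returns the URL unchanged when it has no query value to work with.
    """
    params = parse_qsl(urlsplit(url).query)
    if not params:
        return url
    article_id = params[0][1]  # value of the first parameter only
    return PRINT_BASE + article_id


if __name__ == "__main__":
    # Hypothetical article URL with an extra parameter after the id.
    print(print_version('http://www.globes.co.il/news/article.aspx?did=12345&fid=585'))
```

With the plain `split("=")` approach the same URL would yield `12345&fid` as the id, and a URL without any "=" would raise an IndexError.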
Last edited by kovidgoyal; 11-18-2010 at 01:08 PM.