#1
Enthusiast
Posts: 49
Karma: 475062
Join Date: Aug 2012
Device: nook simple touch
Hi all!
I'm new to recipes. It's an awesome feature for creating EPUB books of the news to read on my Nook 2. Today I modified the BBC News in Chinese recipe, but the elements in my EPUB book end up with multiple values in their class attribute, like this:
Code:
<div class="module bodytext">
What I want is:
Code:
<div class="module">
#2
onlinenewsreader.net
Posts: 327
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
Code:
for div in soup.findAll('div','module bodytext'):
    div['class']='module'
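The loop above relies on the lookup treating `'module bodytext'` as a match for the element's class attribute. If a page writes the classes in another order or with extra whitespace, and the lookup compares the whole attribute string, nothing is found. A minimal sketch of a token-based check; `has_all_classes` is a hypothetical helper, not part of calibre or BeautifulSoup.

```python
# Sketch: compare class attributes by their whitespace-separated tokens
# instead of as one string, so "module bodytext", "bodytext module" and
# " module  bodytext " are all treated alike.

def has_all_classes(class_attr, required):
    """Return True if every name in `required` is one of the tokens in
    `class_attr`. A missing attribute (None) counts as no classes."""
    tokens = set((class_attr or '').split())
    return set(required).issubset(tokens)


if __name__ == '__main__':
    for value in ['module bodytext', 'bodytext module', ' module  bodytext ', 'module']:
        print(repr(value), has_all_classes(value, ('module', 'bodytext')))
```

One way to use it in a recipe would be `soup.findAll('div', attrs={'class': lambda v: has_all_classes(v, ('module', 'bodytext'))})`. This assumes the BeautifulSoup bundled with your calibre hands the attribute value to a callable matcher, so it is worth checking against the version you have before relying on it.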
#3
Enthusiast
Posts: 49
Karma: 475062
Join Date: Aug 2012
Device: nook simple touch
Quote:
Code:
# -*- coding: utf-8 -*-
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, NavigableString, CData, Tag

class AdvancedUserRecipe1277443634(BasicNewsRecipe):
    title = u'BBC中文网'
    oldest_article = 1.5
    max_articles_per_feed = 1000
    feeds = [
        (u'\u4e3b\u9875', u'http://www.bbc.co.uk/zhongwen/simp/index.xml'),
        (u'\u56fd\u9645\u65b0\u95fb', u'http://www.bbc.co.uk/zhongwen/simp/world/index.xml'),
        (u'\u4e24\u5cb8\u4e09\u5730', u'http://www.bbc.co.uk/zhongwen/simp/china/index.xml'),
        (u'\u91d1\u878d\u8d22\u7ecf', u'http://www.bbc.co.uk/zhongwen/simp/business/index.xml'),
        (u'\u7f51\u4e0a\u4e92\u52a8', u'http://www.bbc.co.uk/zhongwen/simp/interactive/index.xml'),
        (u'\u97f3\u89c6\u56fe\u7247', u'http://www.bbc.co.uk/zhongwen/simp/multimedia/index.xml'),
        (u'\u5206\u6790\u8bc4\u8bba', u'http://www.bbc.co.uk/zhongwen/simp/indepth/index.xml'),
        (u'\u82f1\u8bed\u6559\u5b66', u'http://www.bbc.co.uk/zhongwen/simp/elt/index.xml')
    ]
    template_css = u'''
        .article_date {color: gray;font-family:"仿宋","fs",serif;}
        .article_description {font-family:"微软雅黑","黑体","ht",sans-serif; text-indent: 0pt;font-size: 0.8em;}
        a.article {font-weight: bold; text-align:left;font-family:"宋体","zw",serif;}
        a.feed {font-weight: bold;}
        .calibre_navbar {font-family:"微软雅黑","黑体","ht",sans-serif;}'''
    extra_css = u'''
        @font-face {
            font-family:"zw";
            src:local("宋体"), local("DK-SONGTI"), url(../fonts/zw.ttf),
                url(res:///opt/sony/ebook/FONT/zw.ttf), url(res:///Data/FONT/zw.ttf),
                url(res:///opt/sony/ebook/FONT/tt0011m_.ttf),
                url(res:///ebook/fonts/../../mnt/sdcard/fonts/zw.ttf),
                url(res:///ebook/fonts/../../mnt/extsd/fonts/zw.ttf),
                url(res:///ebook/fonts/zw.ttf), url(res:///ebook/fonts/DroidSansFallback.ttf),
                url(res:///fonts/ttf/zw.ttf), url(res:///../../media/mmcblk0p1/fonts/zw.ttf),
                url(res:///DK_System/system/font/zw.ttf), url(res:///abook/fonts/zw.ttf),
                url(res:///system/fonts/zw.ttf), url(res:///system/media/sdcard/fonts/zw.ttf),
                url(res:///media/fonts/zw.ttf), url(res:///sdcard/fonts/zw.ttf),
                url(res:///system/fonts/DroidSansFallback.ttf), url(res:///mnt/MOVIFAT/font/zw.ttf),
                url(res:///media/flash/fonts/zw.ttf), url(res:///media/sd/fonts/zw.ttf),
                url(res:///opt/onyx/arm/lib/fonts/AdobeHeitiStd-Regular.otf),
                url(res:///../../fonts/zw.ttf), url(res:///../fonts/zw.ttf);}
        @font-face {
            font-family:"fs";
            src:local("仿宋"), local("DK-FANGSONG"), url(../fonts/fs.ttf),
                url(res:///opt/sony/ebook/FONT/fs.ttf), url(res:///Data/FONT/fs.ttf),
                url(res:///opt/sony/ebook/FONT/tt0011m_.ttf),
                url(res:///ebook/fonts/../../mnt/sdcard/fonts/fs.ttf),
                url(res:///ebook/fonts/../../mnt/extsd/fonts/fs.ttf),
                url(res:///ebook/fonts/fs.ttf), url(res:///ebook/fonts/DroidSansFallback.ttf),
                url(res:///fonts/ttf/fs.ttf), url(res:///../../media/mmcblk0p1/fonts/fs.ttf),
                url(res:///DK_System/system/font/fs.ttf), url(res:///abook/fonts/fs.ttf),
                url(res:///system/fonts/fs.ttf), url(res:///system/media/sdcard/fonts/fs.ttf),
                url(res:///media/fonts/fs.ttf), url(res:///sdcard/fonts/fs.ttf),
                url(res:///system/fonts/DroidSansFallback.ttf), url(res:///mnt/MOVIFAT/font/fs.ttf),
                url(res:///media/flash/fonts/fs.ttf), url(res:///media/sd/fonts/fs.ttf),
                url(res:///opt/onyx/arm/lib/fonts/AdobeHeitiStd-Regular.otf),
                url(res:///../../fonts/fs.ttf), url(res:///../fonts/fs.ttf);}
        @font-face {
            font-family:"kt";
            src:local("楷体"), local("DK-KAITI"), url(../fonts/kt.ttf),
                url(res:///opt/sony/ebook/FONT/kt.ttf), url(res:///Data/FONT/kt.ttf),
                url(res:///opt/sony/ebook/FONT/tt0011m_.ttf),
                url(res:///ebook/fonts/../../mnt/sdcard/fonts/kt.ttf),
                url(res:///ebook/fonts/../../mnt/extsd/fonts/kt.ttf),
                url(res:///ebook/fonts/kt.ttf), url(res:///ebook/fonts/DroidSansFallback.ttf),
                url(res:///fonts/ttf/kt.ttf), url(res:///../../media/mmcblk0p1/fonts/kt.ttf),
                url(res:///DK_System/system/font/kt.ttf), url(res:///abook/fonts/kt.ttf),
                url(res:///system/fonts/kt.ttf), url(res:///system/media/sdcard/fonts/kt.ttf),
                url(res:///media/fonts/kt.ttf), url(res:///sdcard/fonts/kt.ttf),
                url(res:///system/fonts/DroidSansFallback.ttf), url(res:///mnt/MOVIFAT/font/kt.ttf),
                url(res:///media/flash/fonts/kt.ttf), url(res:///media/sd/fonts/kt.ttf),
                url(res:///opt/onyx/arm/lib/fonts/AdobeHeitiStd-Regular.otf),
                url(res:///../../fonts/kt.ttf), url(res:///../fonts/kt.ttf);}
        @font-face {
            font-family:"ht";
            src:local("微软雅黑"), local("DK-HEITI"), url(../fonts/ht.ttf),
                url(res:///opt/sony/ebook/FONT/ht.ttf), url(res:///Data/FONT/ht.ttf),
                url(res:///opt/sony/ebook/FONT/tt0011m_.ttf),
                url(res:///ebook/fonts/../../mnt/sdcard/fonts/ht.ttf),
                url(res:///ebook/fonts/../../mnt/extsd/fonts/ht.ttf),
                url(res:///ebook/fonts/ht.ttf), url(res:///ebook/fonts/DroidSansFallback.ttf),
                url(res:///fonts/ttf/ht.ttf), url(res:///../../media/mmcblk0p1/fonts/ht.ttf),
                url(res:///DK_System/system/font/ht.ttf), url(res:///abook/fonts/ht.ttf),
                url(res:///system/fonts/ht.ttf), url(res:///system/media/sdcard/fonts/ht.ttf),
                url(res:///media/fonts/ht.ttf), url(res:///sdcard/fonts/ht.ttf),
                url(res:///system/fonts/DroidSansFallback.ttf), url(res:///mnt/MOVIFAT/font/ht.ttf),
                url(res:///media/flash/fonts/ht.ttf), url(res:///media/sd/fonts/ht.ttf),
                url(res:///opt/onyx/arm/lib/fonts/AdobeHeitiStd-Regular.otf),
                url(res:///../../fonts/ht.ttf), url(res:///../fonts/ht.ttf);}
        body { padding: 0%; margin-top: 0%; margin-bottom: 0%; margin-left: 1%; margin-right: 1%; line-height:130%; font-family:"宋体","zw",serif; text-align: justify; text-indent: 0em; color: black;}
        p { margin-top: 5pt; margin-bottom: 5pt; line-height: 130%; font-family:"宋体","zw",serif; text-align: justify; text-indent: 2em;}
        div { margin:0px; padding:0px; line-height:130%; text-align: justify; font-family:"宋体","zw",serif;}
        h1 { margin-top: 1em; margin-bottom: 0.5em; font-family:"微软雅黑","黑体","ht",sans-serif; font-size: xx-large; line-height: 130%; text-align: center; text-indent: 0em;}
        h2 { margin-top: 1em; margin-bottom: 0.5em; font-family:"微软雅黑","黑体","ht",sans-serif; font-size: x-large; line-height: 130%; text-align: center; text-indent: 0em;}
        h3 { margin-top: 1em; margin-bottom: 0.5em; font-family:"微软雅黑","黑体","ht",sans-serif; font-size: large; line-height: 130%; text-align: center; text-indent: 0em;}
        h4 { margin-top: 1em; margin-bottom: 0.5em; font-family:"微软雅黑","黑体","ht",sans-serif; font-size: medium; text-align: center; text-indent: 0em; line-height: 130%;}
        div.datestamp{font-family:"楷体","kt",serif;text-align: justify;text-indent: 0em;text-align: center;}
        .articledescription {font-family: "微软雅黑", 'ht', sans-serif;}
        span {font-family:"微软雅黑","黑体","ht",sans-serif;}
        span.lastupdated {font-family:"楷体","kt",serif;}
        a {font-family:"微软雅黑","黑体","ht",sans-serif;}
        ul {font-family:"宋体","zw",serif;}
        li {font-family:"宋体","zw",serif;}
        ol {font-family:"宋体","zw",serif;}
        div.module {text-indent: 0em;text-align: center;}
        img {text-align: center;}
        p.caption {font-family:"仿宋","fs",serif;text-align: center;text-indent: 0em;}
        hr {height: 1px; border: 0px; color: black; background-color: black}
        '''
    __author__ = 'k4user'
    __version__ = '1.0'
    language = 'zh'
    publisher = 'BBC'
    description = 'BBC news in Chinese'
    category = 'News, Chinese'
    remove_javascript = True
    use_embedded_content = False
    no_stylesheets = True
    encoding = 'UTF-8'
    conversion_options = {'linearize_tables':True}
    masthead_url = 'http://wscdn.bbc.co.uk/zhongwen/simp/images/1024/brand.jpg'
    keep_only_tags = [
        dict(name='h1'),
        dict(name='p', attrs={'class':['primary-topic','summary']}),
        dict(name='div', attrs={'class':['bodytext','datestamp','module']}),
    ]
    remove_tags = [dict(name='br', attrs={'class':['calibre12','calibre11']})]

    def preprocess_html(self, soup):
        # Collapse <div class="module bodytext"> into <div class="module">
        for div in soup.findAll('div','module bodytext'):
            div['class']='module'
        return soup
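When a class-based lookup like the one in `preprocess_html` changes nothing, a quick check is to print the class strings the article page really contains, since a trailing space, a different class order, or the class sitting on a different element could each defeat a match on the full string. A minimal Python 3 sketch using only the standard library's `html.parser` (not calibre's bundled BeautifulSoup); `DivClassCollector` and `list_div_classes` are illustrative names, and you would feed it the HTML of an article page you saved yourself.

```python
from html.parser import HTMLParser


class DivClassCollector(HTMLParser):
    """Record the raw class attribute of every <div> start tag."""

    def __init__(self):
        super().__init__()
        self.div_classes = []

    def handle_starttag(self, tag, attrs):
        if tag == 'div':
            # attrs is a list of (name, value) pairs; a div without a
            # class attribute is recorded as None.
            self.div_classes.append(dict(attrs).get('class'))


def list_div_classes(html):
    """Return the class attribute of each <div> in document order."""
    parser = DivClassCollector()
    parser.feed(html)
    parser.close()
    return parser.div_classes


if __name__ == '__main__':
    sample = '<div class="module bodytext ">a</div><div class="datestamp">b</div><div>c</div>'
    for value in list_div_classes(sample):
        # repr() makes hidden whitespace visible
        print(repr(value))
```

Printing with `repr()` is the point of the exercise: a value such as `'module bodytext '` looks identical to `'module bodytext'` in plain output.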
#4
Enthusiast
Posts: 49
Karma: 475062
Join Date: Aug 2012
Device: nook simple touch
Quote:
Code:
def preprocess_html(self, soup):
    for div in soup.findAll('div',attrs={'class':'module bodytext'}):
        div['class']='module'
    return soup
Quote:
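If the soup-based rewrite still has no effect, calibre recipes also accept a `preprocess_regexps` class attribute: a list of `(compiled_regex, function)` pairs applied to the raw downloaded HTML before it is parsed, where the function receives the match object and returns the replacement text. A minimal sketch, assuming the page contains the attribute literally as `class="module bodytext"` (either quote style, extra whitespace allowed); `apply_rules` is a hypothetical helper that only imitates the substitution so the pattern can be tried outside calibre.

```python
import re

# Assumption: the downloaded markup contains the attribute literally as
# class="module bodytext" (or with single quotes / extra whitespace).
MODULE_BODYTEXT = re.compile(r'class\s*=\s*(["\'])\s*module\s+bodytext\s*\1',
                             re.IGNORECASE)

# In a recipe this list would be a class attribute of the
# BasicNewsRecipe subclass.
preprocess_regexps = [
    (MODULE_BODYTEXT, lambda match: 'class="module"'),
]


def apply_rules(html, rules):
    """Hypothetical stand-in for calibre's substitution step, for testing."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html


if __name__ == '__main__':
    print(apply_rules('<div class="module bodytext">content</div>', preprocess_regexps))
    # prints <div class="module">content</div>
```

Only the `preprocess_regexps` list (plus `import re`) would go into the recipe itself. A regex on raw HTML is more brittle than a parsed-tree lookup, so it is best kept as narrow as this one.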
#5
onlinenewsreader.net
Posts: 327
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
There is clearly some other issue with the recipe. As you can see from this Python session, what I proposed does work.
Code:
C:\Python27>python
Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from BeautifulSoup import *
>>> x=BeautifulSoup('<html><body><div class="module bodytext">content</div></body></html>')
>>> print x
<html><body><div class="module bodytext">content</div></body></html>
>>> y=x.findAll('div','module bodytext')
>>> print y
[<div class="module bodytext">content</div>]
>>> for y in x.findAll('div','module bodytext'): y['class']='module'
...
>>> print x
<html><body><div class="module">content</div></body></html>
>>>
Last edited by nickredding; 08-13-2012 at 10:01 PM.
Thread | Thread Starter | Forum | Replies | Last Post |
Value of Attribute "Class" is Invalid Error | TFaire | ePub | 2 | 09-23-2011 11:25 AM |
How do you remove class="whitespace"? | greenlees | Conversion | 8 | 07-03-2011 02:54 AM |
Changing or removing <div class="calibrenavbar"> | ptsefton | Recipes | 3 | 05-28-2011 08:30 AM |
keeping or removing a div with multiple classes | JohnsonZA | Recipes | 1 | 09-25-2010 10:33 AM |
Remove spacing between paragraphs - what about div tags ? | NASCARaddicted | Calibre | 5 | 11-07-2009 05:47 AM |