Old 09-29-2013, 07:34 AM   #1
RayV
Does Calibre recognise <image> tags?

I'm downloading articles from the Telegraph UK RSS feed http://www.telegraph.co.uk/news/worldnews/rss with the built-in recipe.

Images inside <div id="mainBodyArea" ..> that the web page references with <image ..> tags are not being saved in the EPUB that Calibre generates.

Example from the web page:

<image refid="3783387" version="c" width="460" height="287" caption="" declared-caption="" src="http://may-be-another-web-site.com/multimedia/archive/03783/picture.jpg" photographer="" name=""></image>

is saved as:

<image src="http://may-be-another-web-site.com/multimedia/archive/03783/picture.jpg" version="c" caption="" photographer="" height="287" width="460" declared-caption="" refid="3783387" name=""/>

in the Calibre epub.



I modified the recipe, using post #21 ("Embed images into an ebook" by kiavash) from the 'Re-usable code' sticky, to change the <image> tags to <img>, and it worked: all images are now embedded in the Calibre EPUB.
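
For anyone who wants to check the rename outside Calibre, here is a minimal sketch that uses only the Python standard library. It is not what the recipe does (the recipe renames the tag on the parsed soup in preprocess_html). The function name image_to_img and the regex approach are assumptions made for illustration, and a regex like this is only reasonable on simple markup such as the example above.

```python
import re


def image_to_img(html):
    """Rename <image ...> elements to <img ...> in a string of HTML."""
    # Rename opening tags: <image ...> or <image .../> becomes <img ...>.
    html = re.sub(r'<image\b', '<img', html, flags=re.IGNORECASE)
    # <img> is a void element, so drop any explicit </image> closing tags.
    return re.sub(r'</image\s*>', '', html, flags=re.IGNORECASE)


if __name__ == '__main__':
    sample = ('<image refid="3783387" width="460" height="287" '
              'src="http://may-be-another-web-site.com/multimedia/archive/03783/picture.jpg">'
              '</image>')
    print(image_to_img(sample))
```

Unlike the recipe, this sketch renames every <image> tag, not only those whose src contains 'jpg'.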


So, is the problem that Calibre doesn't recognise <image> tags?


Modified recipe:

Spoiler:

__license__ = 'GPL v3'
__copyright__ = '2008-2010, Darko Miletic <darko.miletic at gmail.com>'
'''
telegraph.co.uk
'''

from calibre.web.feeds.news import BasicNewsRecipe


class TelegraphUK(BasicNewsRecipe):
    title = 'Telegraph World News-4'
    __author__ = 'Darko Miletic and Sujata Raman'
    description = 'News from United Kingdom'
    oldest_article = 1
    category = 'news, politics, UK'
    publisher = 'Telegraph Media Group ltd.'
    max_articles_per_feed = 12
    no_stylesheets = True
    language = 'en_GB'
    remove_empty_feeds = True
    use_embedded_content = False

    extra_css = '''
        h1{font-family :Arial,Helvetica,sans-serif; font-size:1.2 em; }
        h2{font-family :Times; font-size:1 em; font-style: italic; color:#444444;}
        .story{font-family :Arial,Helvetica,sans-serif; font-size: .6 em;}
        .byline{color:#666666; font-family :Arial,Helvetica,sans-serif; font-size: .6 em; font-style: italic}
        #a{color:#234B7B; }
        .imageExtras{color:#666666; font-family :Arial,Helvetica,sans-serif; font-size: .6 em;}
        .caption {font-family :Times; font-size: .7 em; font-style: italic}
        sup {font-family :Times; font-size: .7 em; font-style: italic}
    '''

    conversion_options = {
        'comment': description,
        'tags': category,
        'publisher': publisher,
        'language': language,
    }

    # Keep only the headline, the byline and the article body.
    keep_only_tags = [
        dict(name='div', attrs={'class': ['storyHead', 'byline']}),
        dict(name='div', attrs={'id': 'mainBodyArea'}),
    ]
    remove_tags = [
        dict(name='div', attrs={'class': ['related_links_inline', "imgindex", "next", "prev", "gutterUnder", 'ssImgHide', 'imageExtras', 'ssImg hide', 'related_links_video']}),
        dict(name='ul', attrs={'class': ['shareThis shareBottom']}),
        dict(name='span', attrs={'class': ['num', 'placeComment', 'credit']}),
    ]

    feeds = [
        (u'World News', u'http://www.telegraph.co.uk/news/worldnews/rss'),
    ]

    # Ref: https://www.mobileread.com/forums/sho...0&postcount=21
    def preprocess_html(self, soup):
        # Rename every <image> whose src contains 'jpg' to <img>,
        # so that the picture is included in the final ebook.
        for figure in soup.findAll('image', attrs={'src': lambda x: x and 'jpg' in x}):
            figure.name = 'img'
        return soup

    def populate_article_metadata(self, article, soup, first):
        # Use the first image of the article as its thumbnail in the table of contents.
        if first and hasattr(self, 'add_toc_thumbnail'):
            picdiv = soup.find('img')
            if picdiv is not None:
                self.add_toc_thumbnail(article, picdiv['src'])

    def get_article_url(self, article):
        url = article.get('link', None)
        # Skip entries without a link, and picture-gallery pages.
        if url is None or 'picture-galleries' in url or 'pictures' in url or 'picturegalleries' in url:
            return None
        return url
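
The gallery filter in get_article_url can be tried on its own, without Calibre. This is a minimal sketch, assuming the feed entry behaves like a dict with an optional 'link' key. The names filter_article_url and GALLERY_MARKERS are hypothetical and not part of Calibre. It also returns None when the entry has no link, since a substring test on None would raise a TypeError.

```python
# Substrings that mark picture-gallery pages, taken from the recipe above.
GALLERY_MARKERS = ('picture-galleries', 'pictures', 'picturegalleries')


def filter_article_url(article):
    """Return the article URL, or None if it is missing or a picture gallery."""
    url = article.get('link')
    if not url:
        # No link in the feed entry: nothing to download.
        return None
    if any(marker in url for marker in GALLERY_MARKERS):
        # Gallery pages are skipped, as in the recipe.
        return None
    return url


if __name__ == '__main__':
    print(filter_article_url({'link': 'http://www.telegraph.co.uk/news/worldnews/story.html'}))
    print(filter_article_url({'link': 'http://www.telegraph.co.uk/news/picturegalleries/worldnews/'}))
    print(filter_article_url({}))
```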
Old 09-29-2013, 09:28 AM   #2
theducks
The standard tag is <img>.