#1 |
Junior Member
Posts: 6
Karma: 10
Join Date: Jun 2013
Device: Kindle Touch
Recipe works fine with EPUB output, but not with MOBI
Hi,
Here's an interesting one. The following recipe produces a perfectly decent .epub, but when the output is .mobi (even when converting the good .epub to .mobi), almost all of the articles somehow disappear... Of course I have a Kindle, so I can't use EPUB. Does this make sense to anyone? Thanks. The recipe: Code:

    import string, re
    from calibre import strftime
    from calibre.web.feeds.recipes import BasicNewsRecipe
    from calibre.ebooks.BeautifulSoup import BeautifulSoup

    class WV(BasicNewsRecipe):
        title = 'Workers Vanguard'
        description = 'Current issue of Workers Vanguard'
        needs_subscription = False
        no_stylesheets = True

        def print_version(self, url):
            return string.join(["http://www.spartacist.org/print/english/wv/", url],'')

        def parse_index(self):
            soup = self.index_to_soup('http://spartacist.org/english/wv/index.html')
            articles = []

            # find print URL of main article in index page
            for div in soup.findAll(text=re.compile("Printable")):
                a = div.findParent('a', href=True)
                if not a:
                    continue
                else:
                    url1 = string.split(re.sub(r'\?.*', '', a['href']), '/')
                    url = string.join([url1[-2], '/', url1[-1]],'')

            # get issue number and date. Note we could find issue number from the URLs...
            for div in soup.findAll(id='folio'):
                a = div.string
                if a:
                    date = a
                    print string.join(['Found date: ', date])
                    self.timefmt = date
                else:
                    pubname = div.i.string
                    print(pubname)
                    issuenostring = div.i.findNextSibling(text=True)
                    print string.join(['Found issue number string: ', issuenostring])
                    self.title = string.join([pubname, issuenostring], '')

            # find headline of main article in index page
            for div in soup.findAll(id='headline'):
                headline = div.string
                print(string.join(['Found article ', headline, 'at url', url]))
                articles.append({'title':headline, 'url':url, 'description':'', 'date':date})

            # find following articles (parsing Table of Contents at right of index page)
            for div in soup.findAll(id='smlheadline'):
                a = div.find('a', href=True)
                if not a:
                    continue
                else:
                    url = re.sub(r'\?.*', '', a['href'])
                    headline = a.string
                    print(string.join(['Found article', headline, 'at url', url]))
                    articles.append({'title':headline, 'url':url, 'description':'', 'date':date})

            return [(string.join(['Workers Vanguard', issuenostring], ''), articles)]

        def postprocess_html(self, soup, first):
            for div in soup.findAll(id='headline'):
                div.name = 'h1'
            for div in soup.findAll(id='kicker'):
                div.name = 'h2'
            for div in soup.findAll(id='subhead'):
                div.name = 'h3'
            for div in soup.findAll(id='nytimes'):
                div.name = 'h3'
            for div in soup.findAll(id='wvquote'):
                div.name = 'blockquote'
            for div in soup.findAll(id='wvcite'):
                div.name = 'blockquote'
            return soup
#2 |
creator of calibre
Posts: 45,229
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
It's likely that the website uses some kind of markup that MOBI does not support. Tables are a common culprit; for those you can try setting
conversion_options = {'linearize_tables':True} in your recipe.
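For anyone reading along, here is a minimal sketch of where that setting would sit in a recipe. It assumes the missing content really is caused by tables, which this thread does not confirm. The class name `WVLinearized` and the stub fallback are illustrative only; the stub just lets the snippet be imported outside calibre.

```python
# Sketch only: WVLinearized and the stub below are hypothetical names,
# not part of the original recipe.
try:
    # Available when the recipe is run inside calibre.
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in base class so the snippet can be imported outside calibre.
    class BasicNewsRecipe(object):
        conversion_options = {}


class WVLinearized(BasicNewsRecipe):
    title = 'Workers Vanguard'
    description = 'Current issue of Workers Vanguard'
    no_stylesheets = True

    # Recipe-level conversion settings. 'linearize_tables' asks the
    # converter to flatten table markup into ordinary block content.
    conversion_options = {'linearize_tables': True}
```

The attribute goes at class level next to `title` and `no_stylesheets`; the rest of the recipe (`parse_index`, `print_version`, `postprocess_html`) would stay as posted above.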