#1
Junior Member
Posts: 6
Karma: 10
Join Date: Jun 2013
Device: Kindle Touch
Recipe works fine with EPUB output, but not with MOBI
Hi,
Here's an interesting one. The following recipe produces a perfectly decent EPUB, but with MOBI output (even when converting the good EPUB to MOBI), almost all of the articles somehow disappear. I have a Kindle, so I can't use EPUB. Does this make sense to anyone? Thanks.

The recipe:
```python
import re
from calibre.web.feeds.recipes import BasicNewsRecipe


class WV(BasicNewsRecipe):
    title = 'Workers Vanguard'
    description = 'Current issue of Workers Vanguard'
    needs_subscription = False
    no_stylesheets = True

    def print_version(self, url):
        # Article URLs are relative paths like "<issue>/<page>"
        return 'http://www.spartacist.org/print/english/wv/' + url

    def parse_index(self):
        soup = self.index_to_soup('http://spartacist.org/english/wv/index.html')
        articles = []
        # Defaults, so a change in the page layout doesn't cause a NameError below
        url = None
        date = ''
        issuenostring = ''
        # find print URL of main article in index page
        for div in soup.findAll(text=re.compile('Printable')):
            a = div.findParent('a', href=True)
            if not a:
                continue
            url1 = re.sub(r'\?.*', '', a['href']).split('/')
            url = url1[-2] + '/' + url1[-1]
        # get issue number and date. Note we could find issue number from the URLs...
        for div in soup.findAll(id='folio'):
            a = div.string
            if a:
                date = a
                self.log('Found date:', date)
                self.timefmt = date
            else:
                pubname = div.i.string
                self.log(pubname)
                issuenostring = div.i.findNextSibling(text=True)
                self.log('Found issue number string:', issuenostring)
                self.title = pubname + issuenostring
        # find headline of main article in index page
        for div in soup.findAll(id='headline'):
            if url is None:
                continue  # no "Printable" link was found
            headline = div.string
            self.log('Found article', headline, 'at url', url)
            articles.append({'title': headline, 'url': url, 'description': '', 'date': date})
        # find following articles (parsing the Table of Contents at right of index page)
        for div in soup.findAll(id='smlheadline'):
            a = div.find('a', href=True)
            if not a:
                continue
            url = re.sub(r'\?.*', '', a['href'])
            headline = a.string
            self.log('Found article', headline, 'at url', url)
            articles.append({'title': headline, 'url': url, 'description': '', 'date': date})
        return [('Workers Vanguard' + issuenostring, articles)]

    def postprocess_html(self, soup, first):
        # Turn the site's id-tagged divs into semantic heading/quote elements
        id_to_tag = {
            'headline': 'h1', 'kicker': 'h2', 'subhead': 'h3',
            'nytimes': 'h3', 'wvquote': 'blockquote', 'wvcite': 'blockquote',
        }
        for element_id, tag in id_to_tag.items():
            for div in soup.findAll(id=element_id):
                div.name = tag
        return soup
```
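The URL handling in parse_index and print_version can be pulled out and checked without running calibre. This is a minimal sketch that mirrors the recipe's string logic; the function names are hypothetical, and the sample href is made up because the real index page's link format isn't shown in the thread.

```python
import re

# Base of the printable pages, copied from print_version() in the recipe above.
PRINT_BASE = 'http://www.spartacist.org/print/english/wv/'


def main_article_path(href):
    """Mirror the recipe's handling of the 'Printable' link: drop any query
    string, then keep only the last two path components (issue/page)."""
    parts = re.sub(r'\?.*', '', href).split('/')
    return parts[-2] + '/' + parts[-1]


def print_version(path):
    """Same as the recipe's print_version(): prepend the print base URL."""
    return PRINT_BASE + path


# Hypothetical href; the real markup on the index page may differ.
path = main_article_path('/print/english/wv/1025/article.html?foo=1')
print(path)                 # 1025/article.html
print(print_version(path))  # http://www.spartacist.org/print/english/wv/1025/article.html
```

Running the sidebar hrefs through the same functions is a quick way to confirm every article resolves to a sensible print URL before blaming the MOBI output.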

#2
creator of calibre
Posts: 45,617
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
It's likely that the website uses some kind of markup that MOBI does not support. A common culprit is tables, for which you can try adding conversion_options = {'linearize_tables': True} to your recipe class.
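To check whether tables are the culprit before changing the recipe, you could measure how much of an article's text sits inside table elements. This is a rough sketch using only Python's standard html.parser; the class and function names are hypothetical, and it assumes you have saved one of the article's print pages locally as HTML.

```python
from html.parser import HTMLParser


class TableTextCounter(HTMLParser):
    """Count characters of text inside <table> elements versus outside them."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # current nesting depth of <table> elements
        self.skip = 0      # nesting inside <script>/<style>, whose text is ignored
        self.in_table = 0  # characters of text found inside tables
        self.outside = 0   # characters of text found outside tables

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            self.depth += 1
        elif tag in ('script', 'style'):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag == 'table' and self.depth:
            self.depth -= 1
        elif tag in ('script', 'style') and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if self.skip:
            return
        n = len(data.strip())  # whitespace between tags counts as nothing
        if self.depth:
            self.in_table += n
        else:
            self.outside += n


def table_text_ratio(html):
    """Fraction of the page's text that is inside tables (0.0 if no text)."""
    parser = TableTextCounter()
    parser.feed(html)
    parser.close()
    total = parser.in_table + parser.outside
    return parser.in_table / total if total else 0.0


sample = '<p>Intro</p><table><tr><td>Body text</td></tr></table>'
print(round(table_text_ratio(sample), 2))  # 0.64
```

A ratio near 1.0 would suggest the article body is laid out in tables, which is the case linearize_tables targets. Since the text also vanishes when converting the finished EPUB, the same option can be tried there with ebook-convert's --linearize-tables flag.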