MobileRead Forums > E-Book Software > Calibre > Recipes

07-12-2013, 06:28 PM   #1
Rackamouth
Device: Kindle Touch
Recipe works fine with EPUB output, but not with MOBI

Hi,

Here's an interesting one. The following recipe produces a perfectly decent .epub, but with .mobi output (even when converting the good .epub to .mobi), almost all of the articles somehow disappear. Of course I have a Kindle, so I can't use EPUB. Does this make sense to anyone?

Thanks.

The recipe:
Code:
import string, re
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class WV(BasicNewsRecipe):

	title       = 'Workers Vanguard'
	description = 'Current issue of Workers Vanguard'
	needs_subscription = False
	no_stylesheets = True    
    
	def print_version(self, url):
		return string.join(["http://www.spartacist.org/print/english/wv/", url],'')

	def parse_index(self):
		soup = self.index_to_soup('http://spartacist.org/english/wv/index.html')
		articles = []
		
		# find print URL of main article in index page
		for div in soup.findAll(text=re.compile("Printable")):
			a = div.findParent('a', href=True)
			if not a: continue 
			else: 
				url1 = string.split(re.sub(r'\?.*', '', a['href']), '/')
				url = string.join([url1[-2], '/', url1[-1]],'')
						
		# get issue number and date. Note we could find issue number from the URLs...
		for div in soup.findAll(id='folio'):
			a = div.string
			if a:
				date = a
				print(string.join(['Found date: ', date]))
				self.timefmt = date
			else:
				pubname = div.i.string
				print(pubname)
				issuenostring = div.i.findNextSibling(text=True)
				print(string.join(['Found issue number string: ', issuenostring]))
				self.title = string.join([pubname, issuenostring], '')
		
		# find headline of main article in index page
		for div in soup.findAll(id='headline'):
			headline = div.string
			print(string.join(['Found article ', headline, 'at url', url]))
			articles.append({'title':headline, 'url':url, 'description':'', 'date':date})
		
		# find the remaining articles (parsing the table of contents at the right of the index page)
		for div in soup.findAll(id='smlheadline'):
			a = div.find('a', href=True)
			if not a: continue 
			else: 
				url = re.sub(r'\?.*', '', a['href'])
				headline = a.string
				print(string.join(['Found article', headline, 'at url', url]))
				articles.append({'title':headline, 'url':url, 'description':'', 'date':date})
				
		return [(string.join(['Workers Vanguard', issuenostring], ''), articles)]
		
	def postprocess_html(self, soup, first):
		for div in soup.findAll(id='headline'):
			div.name = 'h1'
		for div in soup.findAll(id='kicker'):
			div.name = 'h2'
		for div in soup.findAll(id='subhead'):
			div.name = 'h3'
		for div in soup.findAll(id='nytimes'):
			div.name = 'h3'
		for div in soup.findAll(id='wvquote'):
			div.name = 'blockquote'
		for div in soup.findAll(id='wvcite'):
			div.name = 'blockquote'
		return soup
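
A note for readers on a current calibre release, which runs on Python 3: the `string.join` and `string.split` functions used above no longer exist there. A minimal sketch of the same URL handling with plain string methods follows. The helper names `print_path` and `print_url` and the sample href are hypothetical and not part of the original recipe.

```python
import re

# Same prefix that print_version() uses in the recipe above
PRINT_BASE = "http://www.spartacist.org/print/english/wv/"

def print_path(href):
    """Drop any query string and keep the last two path components."""
    parts = re.sub(r'\?.*', '', href).split('/')
    return '/'.join(parts[-2:])

def print_url(path):
    """Rough equivalent of print_version(): prefix the path with the print base URL."""
    return PRINT_BASE + path

# Hypothetical href, for illustration only
example = print_path('/print/english/wv/1027/article.html?ref=index')
print(print_url(example))
```

This only covers the URL handling; it is not expected to change the MOBI behaviour described above.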
07-12-2013, 10:48 PM   #2
kovidgoyal
creator of calibre
It's likely that the website uses some kind of markup that MOBI does not support. A common culprit is tables, for which you can try

conversion_options = {'linearize_tables':True}

in your recipe.
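
In the recipe above, that would be a class-level attribute next to `no_stylesheets`. A minimal sketch, trimmed to the relevant attributes: the `try`/`except` fallback is only an assumption so that the snippet also loads outside calibre, and whether tables are actually the cause on this site is untested.

```python
try:
    from calibre.web.feeds.recipes import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch loads outside calibre (assumption; not needed in a real recipe)
    BasicNewsRecipe = object

class WV(BasicNewsRecipe):
    title = 'Workers Vanguard'
    description = 'Current issue of Workers Vanguard'
    needs_subscription = False
    no_stylesheets = True
    # Passed on to the conversion step; asks calibre to flatten tables into linear text
    conversion_options = {'linearize_tables': True}
```

The `parse_index`, `print_version` and `postprocess_html` methods stay as they are; only the extra attribute is new.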

All times are GMT -4.