Old 02-23-2010, 10:55 PM   #1486
nickredding
onlinenewsreader.net
Posts: 328
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
Mobipocket reader, also Kindle. I could try EPUB, but I doubt that's where the issue is. I did a debug pipeline and the emdash has been replaced in the input directory, before any of the output-specific processing is performed. I think the substitution must be happening in BeautifulSoup.
Old 02-23-2010, 11:15 PM   #1487
kovidgoyal
creator of calibre
Posts: 45,410
Karma: 27757236
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The news download system replaces entities with their UTF-8 equivalents. That's expected. Are you saying they're being saved as cp1252 in the input sub directory?
Old 02-23-2010, 11:26 PM   #1488
nickredding
onlinenewsreader.net
The original news source looks like
Quote:
this — that
and it shows up in the input directory of the debug pipeline as
Quote:
this â€” that
where those three characters are the three UTF-8 byte codes that represent emdash. I'm looking at the input directory with MS Expression Web. I guess I'm not understanding how these three UTF-8 byte codes are supposed to get back to an emdash for display on a device via Mobipocket reader, Kindle, MS Expression Web, or anything else.
Old 02-24-2010, 12:19 AM   #1489
kovidgoyal
creator of calibre
Basically the UTF-8 byte sequence for an emdash is rendered as an emdash by viewers that understand UTF-8 and have the necessary fonts to render the character.
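
To make the byte-level picture concrete, here is a minimal Python 3 sketch (plain Python, not calibre code) of how one em dash becomes three bytes, and what a viewer gets when it reads those bytes with the wrong codec:

```python
# One em dash, U+2014, as a Python string.
em_dash = "\u2014"

# Encoding to UTF-8 yields the three bytes discussed in this thread.
utf8_bytes = em_dash.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x80\x94'

# A viewer that treats the bytes as UTF-8 recovers the single em dash.
decoded_ok = utf8_bytes.decode("utf-8")

# A viewer that treats the same bytes as cp1252 gets three separate characters.
decoded_wrong = utf8_bytes.decode("cp1252")
print(len(decoded_ok), len(decoded_wrong))  # 1 3
```

So the bytes themselves are fine; what matters is which encoding the reader of those bytes assumes.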

Does the resultant MOBI display correctly in the calibre viewer?
Old 02-24-2010, 06:45 AM   #1490
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Quote:
Originally Posted by nickredding
I'm having a problem with a news feed that has em dashes included literally (instead of as an entity such as &mdash;), and they are being handled as follows: the em dash is recognized as such and translated into a Unicode em dash (U+2014), which then turns up in the output as its UTF-8 equivalent (0xE2 0x80 0x94) and is displayed as â€”, which is the CP1252 interpretation of those three character codes. I can't figure out how to fix this -- preprocess_regexps doesn't work. Can anyone help?
Would you mind sending your code?
Old 02-24-2010, 09:41 AM   #1491
nickredding
onlinenewsreader.net
Quote:
Basically the UTF-8 byte sequence for an emdash is rendered as an emdash by viewers that understand UTF-8 and have the necessary fonts to render the character.

Does the resultant MOBI display correctly in the calibre viewer?
No -- it shows â€” just like Mobipocket reader, Kindle and Internet Explorer (from the input directory of the debug pipeline).
Old 02-24-2010, 10:00 AM   #1492
nickredding
onlinenewsreader.net
Quote:
Would you mind sending your code?
I was running the standard recipe for the Vancouver Province. The article in question is off the index page now, but if you run the recipe on
http://www.theprovince.com/sports/20...576/story.html you'll see the problem--two emdashes in the body of the article

Last edited by nickredding; 02-24-2010 at 10:07 AM.
Old 02-24-2010, 10:35 AM   #1493
kovidgoyal
creator of calibre
That webpage incorrectly declares its encoding to be iso-8859-1, when it is actually utf-8. Set encoding='utf-8' in your recipe.
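
In a recipe, that setting is a class attribute of BasicNewsRecipe. A minimal fragment, assuming a hypothetical class name and leaving out the feeds and cleanup rules of the standard recipe:

```python
from calibre.web.feeds.news import BasicNewsRecipe


class ProvinceEncodingFix(BasicNewsRecipe):  # hypothetical class name
    title = u'Vancouver Province'
    # Override the charset the page declares; the content is really UTF-8.
    encoding = 'utf-8'
```

This only runs inside calibre, so treat it as a sketch of where the line goes rather than a complete recipe.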
Old 02-24-2010, 10:39 AM   #1494
nickredding
onlinenewsreader.net
Quote:
That webpage incorrectly declares its encoding to be iso-8859-1, when it is actually utf-8. Set encoding='utf-8' in your recipe.
Bingo -- thank you very much!
Old 02-24-2010, 10:42 AM   #1495
nickredding
onlinenewsreader.net
How did you deduce the encoding is utf-8?
Old 02-24-2010, 10:53 AM   #1496
kovidgoyal
creator of calibre
Simple, I tried decoding it using utf-8, and the emdash was correctly decoded. I use a program called iconv to do this conveniently, but you can use calibre-debug as well
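
The same kind of check can be sketched in plain Python 3. This is only an illustration, not what iconv or calibre-debug do internally; the sample bytes and the helper name `looks_like_utf8` are made up for the example. A strict UTF-8 decode rejects most non-ASCII text saved in a single-byte encoding, whereas an ISO-8859-1 decode never fails, so only the UTF-8 attempt tells you anything.

```python
def looks_like_utf8(raw):
    """Return True if the bytes decode cleanly as strict UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical page fragment: an em dash stored as UTF-8 (three bytes).
page_bytes = b"this \xe2\x80\x94 that"
# The same visible text as a genuine cp1252 page would store it (0x97).
legacy_bytes = b"this \x97 that"

print(looks_like_utf8(page_bytes))    # True
print(looks_like_utf8(legacy_bytes))  # False
```

If the strict decode succeeds and the result contains the expected em dash, the page is almost certainly UTF-8 regardless of what its header claims.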
Old 02-25-2010, 10:53 AM   #1497
mrgrossm
Junior Member
mrgrossm began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Feb 2010
Device: Barnes & Noble Nook, Sony 505
Detroit News and Detroit Free Press

I created recipes for both the Detroit News and the Free Press, but I can't get them right! The biggest problem is that both have a background: the News one is light enough, but the Free Press one is really dark. Also, both have lots of junk after the article that I don't know how to get rid of.

Can anybody help?
Attached Files
File Type: zip Detroit.zip (774 Bytes, 204 views)
Old 02-26-2010, 06:25 AM   #1498
macrogeek
Junior Member
macrogeek began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Feb 2010
Device: Sony Reader Pocket Ed. (PRS-300)
I used the editor to make a quick and dirty recipe for Kukuburi.com.
I'm pretty happy w/ the result, but can't seem to export the recipe from Calibre.

Would anyone like to clean it up and save it as a file?
I didn't know how to trim the bottom buttons out of the feed.

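from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1267141443(BasicNewsRecipe):
    title = u'Kukuburi'
    oldest_article = 30  # days
    max_articles_per_feed = 100
    # Each entry is (feed title, feed URL); here the URL doubles as the title.
    feeds = [(u'http://feeds.feedburner.com/kukuburi?format=xml', u'http://feeds.feedburner.com/kukuburi?format=xml')]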
Old 02-27-2010, 02:11 AM   #1499
aoitenshi
Junior Member
aoitenshi began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Feb 2010
Device: Sony PRS 600
Can I request a recipe for TODAY Online (SG)?

http://www.todayonline.com/RSS
Old 02-27-2010, 08:23 AM   #1500
moriakaice
Memento Mori
moriakaice began at the beginning.
 
Posts: 36
Karma: 10
Join Date: Apr 2007
Device: eClicto, iPad WiFi, Kindle 3 WiFi
Quote:
Originally Posted by macrogeek
I used the editor to make a quick and dirty recipe for Kukuburi.com.
I'm pretty happy w/ the result, but can't seem to export the recipe from Calibre.

Would anyone like to clean it up and save it as a file?
I didn't know how to trim the bottom buttons out of the feed.
Here, this should do the job:
Code:
#!/usr/bin/env  python

__license__ = 'GPL v3'
__author__ = 'Mori'
__version__ = 'v. 0.1'
'''
Kukuburi.com
'''

from calibre.web.feeds.news import BasicNewsRecipe
import re

class KukuburiRecipe(BasicNewsRecipe):
	__author__ = 'Mori'
	language = 'en'

	title = u'Kukuburi'
	publisher = u'Ramón Pérez'
	description = u'KUKUBURI by Ramón Pérez'
	
	no_stylesheets = True
	remove_javascript = True
	
	oldest_article = 100
	max_articles_per_feed = 100
	
	feeds = [
		(u'Kukuburi', u'http://feeds2.feedburner.com/Kukuburi')
	]
	
	preprocess_regexps = [
		(re.compile(i[0], re.IGNORECASE | re.DOTALL), i[1]) for i in 
		[
			(r'<!--.*?-->', lambda match: ''),
			(r'<div class="feedflare".*?</div>', lambda match: '')
		]
	]
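
To see what the two patterns strip, the sketch below applies the same regular expressions to a made-up HTML snippet using only the standard `re` module. The sample markup is an assumption for illustration, not the actual feed content; calibre runs `preprocess_regexps` as substitution rules on the downloaded HTML.

```python
import re

# Same patterns and flags as in the recipe above.
rules = [
    (re.compile(pattern, re.IGNORECASE | re.DOTALL), repl)
    for pattern, repl in [
        (r'<!--.*?-->', lambda match: ''),
        (r'<div class="feedflare".*?</div>', lambda match: ''),
    ]
]

# Made-up sample markup, for illustration only.
sample = (
    '<p>Page 1</p><!-- tracking\ncomment -->'
    '<div class="feedflare"><a href="#">Share</a></div>'
)

cleaned = sample
for regex, repl in rules:
    cleaned = regex.sub(repl, cleaned)

print(cleaned)  # <p>Page 1</p>
```

Because the match is non-greedy, the second pattern stops at the first closing div tag, so a feedflare block that itself contained nested div elements would leave some markup behind.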