Custom recipes (archive, read-only) - Page 94

angelad · 02-08-2010, 09:56 PM

What a misleading thread title, hehe.

kiklop74 · 02-09-2010, 11:45 AM

Quote:

Originally Posted by jackietreehorn

I am trying to create a custom recipe for the Turkish newspaper site Radikal.

Here is a proper recipe for Radikal newspaper with support for Sony reader:

Starson17 · 02-09-2010, 12:42 PM

Quote:

Originally Posted by Starson17

The wife asks me to request a recipe (code) for recipes (food) from Epicurious:
http://www.epicurious.com/services/rss

She particularly wants the Healthy Recipes, New Recipes, Latest Features and Blogs (the first four feeds).
Thanks!

I'm still working on this recipe, but I've got some fundamental questions. Two of the four feeds work fine. They link to food recipes and I can pull the print version or the web page, process that and get good results. My question relates mostly to the Latest Features and Blogs feeds.

Those two link to web pages that discuss food recipes. The discussion usually includes a link to a food recipe. That link is similar to (or the same as) the direct food recipe link in the other two feeds, but it is one deeper. How does a link like that get handled? What I see is the link on the page has this format:

Code:

/recipes/food/photo/Easy-Chicken-Masala-357252

and I'm seeing calibre produce a link like this:

Code:

file:///recipes/food/photo/Easy-Chicken-Masala-357252

The calibre link goes nowhere. What controls whether a calibre recipe follows a link and retrieves the content? I've tried setting recursion deep enough. That doesn't seem to do it. I'm keeping the tag that includes that link. Does the get_article_url method apply to links on pages, or only on the links in the feed? I've read the wiki and user manual, but if the answers are there, they haven't sunk in.
Thanks.
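A path-only href like the one above has to be joined to the address of the page it came from before it can be fetched. A minimal sketch using only the Python standard library follows. The page address is a made-up example, and this is not calibre's internal code, just an illustration of how such a link resolves:

```python
from urllib.parse import urljoin

# Hypothetical address of the feature page that contained the link.
page_url = 'http://www.epicurious.com/articlesguides/example-feature'

# Root-relative href as it appears in the page source.
href = '/recipes/food/photo/Easy-Chicken-Masala-357252'

# Resolve the href against the page address to get a fetchable URL.
absolute = urljoin(page_url, href)
print(absolute)
```

Without a base address there is nothing to resolve the path against, which would be consistent with the file:/// link shown above.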

kovidgoyal · 02-09-2010, 01:12 PM

recursions and match_regexps are the settings you need to handle multipage articles.
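A rough sketch of how those two settings might look in a recipe class. The class name, title and feed address are placeholders, the pattern is only a guess based on the link format quoted earlier in the thread, and the fallback class exists solely so the snippet can be run outside calibre:

```python
import re

try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch can be read and run outside calibre.
    class BasicNewsRecipe(object):
        pass


class EpicuriousSketch(BasicNewsRecipe):
    title = 'Epicurious (sketch)'

    # Hypothetical feed address; substitute the real feed URLs.
    feeds = [('Latest Features', 'http://www.example.com/features.rss')]

    # Follow links one level below each article page.
    recursions = 1

    # Only links whose URL matches one of these patterns are followed.
    match_regexps = [r'/recipes/food/']


if __name__ == '__main__':
    # Quick check of the pattern against the link format from the thread.
    link = '/recipes/food/photo/Easy-Chicken-Masala-357252'
    print(any(re.search(p, link) for p in EpicuriousSketch.match_regexps))
```

Keeping the pattern narrow matters, since every matching link on every article page is fetched.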

Starson17 · 02-09-2010, 01:25 PM

Quote:

Originally Posted by kovidgoyal

recursions and match_regexps are the settings you need to handle multipage articles.

Thanks. I hadn't realized that what I needed was in the API docs. Reading those has begun to set me straight.

BTW, I enjoy looking at the way you improve the proposed code I've submitted. I just looked over your changes to the bug fix I proposed on preventing the "read metadata only from filename" option from applying to recipes. While mine worked, your code is much more readable and modular.

kovidgoyal · 02-09-2010, 01:32 PM

I've learned the hard way over the last three years that putting effort into keeping code readable and modular is the best way to ensure that calibre continues to grow and attracts contributions from as many people as possible, people like you.

Starson17 · 02-09-2010, 02:08 PM

Quote:

Originally Posted by kovidgoyal

recursions and match_regexps are the settings you need to handle multipage articles.

This comment solved my problem immediately. (After beating my head for too long.) Do you have a similar pithy comment to help me transform a link that I have followed with match_regexps into a more printer friendly format?

IOW, I'm already transforming links in the RSS feed into the printerfriendly format I need for the food recipes. I'm using the get_article_url method and changing the link with replace().

AFAICT, the get_article_url method only operates on the links in the feed, not the multipage links. Where is the best spot to grab a recursed link on a page that points to a food recipe and transform it into the same type of printerfriendly link before I follow it?
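For reference, the feed-level transformation described above could look roughly like this. The '/photo/' and '/printerfriendly/' path segments are assumptions made for illustration (the thread does not give the actual print URL scheme), and the mixin class name is hypothetical:

```python
def to_print_url(url):
    # Assumed URL scheme: swap the '/photo/' segment for '/printerfriendly/'.
    return url.replace('/photo/', '/printerfriendly/')


class FeedLinkMixin(object):
    # Meant to be pasted into a BasicNewsRecipe subclass.
    def get_article_url(self, article):
        # 'article' is the parsed feed entry; return None if it has no link.
        link = article.get('link', None)
        return to_print_url(link) if link else None
```

This only touches the links taken from the feed itself, which is the limitation the question is about.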

kovidgoyal · 02-09-2010, 03:27 PM

There isn't (recursion following happens in a whole different module). The only workaround is to use preprocess_html and index_to_soup to do it manually.
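One possible reading of that workaround, shown as a lighter variant: rewrite the recipe links inside preprocess_html so they point at the print version. The URL scheme, the SITE constant and the class name are assumptions, and whether the fetcher then follows the rewritten link is something to verify in a real run. The full manual route would also call index_to_soup on the rewritten address and merge the result by hand, which is not shown here:

```python
from urllib.parse import urljoin

SITE = 'http://www.epicurious.com/'  # base used to resolve path-only hrefs


def print_url_for(href):
    # Assumed URL scheme: '/photo/' pages have a '/printerfriendly/' twin.
    return urljoin(SITE, href).replace('/photo/', '/printerfriendly/')


class ManualLinkMixin(object):
    # Meant to be pasted into a BasicNewsRecipe subclass.
    def preprocess_html(self, soup):
        for a in soup.findAll('a', href=True):
            if '/recipes/food/' in a['href']:
                # Point the link at the print version. self.index_to_soup(...)
                # could fetch that page here if its content should be merged
                # into this one by hand.
                a['href'] = print_url_for(a['href'])
        return soup
```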

Starson17 · 02-09-2010, 03:48 PM

Quote:

Originally Posted by kovidgoyal

There isn't (recursion following happens in a whole different module). The only workaround is to use preprocess_html and index_to_soup to do it manually.

Thank you. Your comment still saves me a lot of effort.
Another question. (I know, they are endless. I will not be offended if you do not answer; I'm sure your time is better spent writing code than dragging me up the learning slope.)

I keep getting this error (below 'Processing images...') when trying to get food recipe images:

Spoiler:

The image it reports as "Not Found", however, is easily retrieved in Firefox. I've looked at the headers in a Firefox session and considered whether it is a robots.txt, cookie or user agent issue, but I can't figure it out. It retrieves fine in FF when I block cookies, and AFAICT the fetch process uses a FF user agent and ignores robots.txt. I've even tried using a delay. Is this something I need mechanize for, fetching the image in a browser session, or am I missing something simpler?

Edit:
I think I've figured it out. There is an ASCII 0A (line feed) character in the middle of the link in the page source, right where it breaks after 'http://www.epicurious.com/images/recipesmenus/2010/2010_february/' and before '357252_116.jpg'.

I see another error in the output where it says it can't find 'http://www.epicurious.com/images/recipesmenus/2010/2010_february/%20357252_116.jpg' (note the %20).

The problem seems to be in the page source, but I'm not sure why it works in FF. Perhaps FF is cleaning it up somehow. Do I need to use preprocess_html to fix this?

kovidgoyal · 02-09-2010, 05:13 PM

Stick some print statements into fetch_url to debug the session. Also try customizing get_browser to disable cookies, handle refreshes, etc.

gwolosh · 02-09-2010, 05:46 PM

Looking for a recipe for feed://www.chabad.org/tools/rss/dailystudy_rss.xml

Support kindle2; not worried about Hebrew fonts

Thanks

Starson17 · 02-09-2010, 05:47 PM

Quote:

Originally Posted by kovidgoyal

Stick some print statements into fetch_url to debug the session. Also try customizing get_browser to disable cookies, handle refreshes, etc.

I'm not sure if you saw my edit - the 0A character problem. I hadn't thought of debugging fetch_url. I was going down the road of trying to use preprocess_regexps.

I'm not sure if I understood it, but it looks like that will let me match and replace some portion of the fetched html page before it gets processed. I was thinking I could just remove the 0A character that was causing the problem. (I have some other uses for processing the html with a regex search replace).

However, the API described using re.compile to compile the regex, and I think I need to import re. Would this approach work, and if so, where do I import re from?

Edit:

OK, I should learn to think before typing. I solved it (with your help). The import format was easy to find: I just searched for where you used re.compile and found the answer was simply 'import re'.

The print statement in fetch_url was absolutely vital to let me see that the fetch was getting a '\n' at the broken link point. I was able to remove that char with preprocess_regexps.

Thanks for the help!
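The thread does not show the rule that was actually used, so the following is only a sketch of the kind of preprocess_regexps entry that would remove a line break from an image address. The HTML fragment is a reconstruction from the description above, and apply_rules is a hypothetical stand-in for whatever calibre does with the list, included so the rule can be tried on its own:

```python
import re

# Each entry pairs a compiled pattern with a callable that receives the
# match object and returns the replacement text.
preprocess_regexps = [
    # A line break between a '/' and a .jpg file name: keep only the '/'.
    (re.compile(r'/\s*\n\s*(?=[\w.-]+\.jpg)'), lambda m: '/'),
]


def apply_rules(html, rules):
    # Stand-in for the fetch step; not calibre's actual code.
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


# Reconstructed fragment: the src value is split across two lines.
broken = ('<img src="http://www.epicurious.com/images/recipesmenus/'
          '2010/2010_february/\n 357252_116.jpg">')
print(apply_rules(broken, preprocess_regexps))
```

The lookahead keeps the rule from touching other slashes that happen to sit at the end of a line.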

eagle0877 · 02-09-2010, 06:24 PM

I would like a recipe for the following site:

http://thechronicleherald.ca/

Thanks

Starson17 · 02-09-2010, 07:09 PM

Quote:

Originally Posted by Starson17

The print statement in fetch_url was absolutely vital to let me see that the fetch was getting a '\n' at the broken link point. I was able to remove that char with preprocess_regexps.

My recipe is done, but if anyone else has used preprocess_regexps in a recipe - I'd like to know if there's any way to use the string that matches the regex in the replacement string? Thanks!
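On reusing the matched text: the second item of each preprocess_regexps pair is a callable that is handed the match object, so m.group() is available when building the replacement. A small sketch, where the pattern, the SITE constant and the apply_rules helper are all illustrative assumptions rather than code from any existing recipe:

```python
import re

SITE = 'http://www.epicurious.com'  # assumed base address

preprocess_regexps = [
    # Capture the path-only href and put it back with the site prepended.
    (re.compile(r'href="(/recipes/food/[^"]+)"'),
     lambda m: 'href="' + SITE + m.group(1) + '"'),
]


def apply_rules(html, rules):
    # Stand-in for the fetch step; not calibre's actual code.
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


sample = '<a href="/recipes/food/photo/Easy-Chicken-Masala-357252">recipe</a>'
print(apply_rules(sample, preprocess_regexps))
```

m.group(0) would give the whole matched string, while numbered groups give the captured parts.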

kiklop74 · 02-09-2010, 09:20 PM

Quote:

Originally Posted by gwolosh

Looking for a recipe for feed://www.chabad.org/tools/rss/dailystudy_rss.xml

Support kindle2; not worried about Hebrew fonts

Thanks

I'll take a look this weekend, but keep in mind that right-to-left language support is not very good on most of the available devices.
