#1396 |
Groupie
Posts: 163
Karma: 64
Join Date: Jun 2009
Device: kindle dx
|
What a misleading thread title, hehe.
|
#1397 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Here is a proper recipe for Radikal newspaper with support for Sony reader:
|
#1398 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Those two link to web pages that discuss food recipes. The discussion usually includes a link to a food recipe. That link is similar to (or the same as) the direct food recipe link in the other two feeds, but it is one level deeper. How does a link like that get handled?
What I see is that the link on the page has this format:
Code:
/recipes/food/photo/Easy-Chicken-Masala-357252
Code:
file:///recipes/food/photo/Easy-Chicken-Masala-357252
Thanks.
|
#1399 |
creator of calibre
Posts: 45,414
Karma: 27757236
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
recursions and match_regexps are the settings you need to handle multipage articles.
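For anyone following along, the two settings mentioned here are class attributes on a recipe. The sketch below is only an illustration, not the recipe from this thread: the class name, feed URL and pattern are placeholders, and `would_follow` is a hypothetical helper that merely approximates how a link might be checked against `match_regexps`.

```python
import re

try:
    # Only importable inside calibre's own Python environment.
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in base class so the sketch also runs outside calibre.
    class BasicNewsRecipe(object):
        pass


class FoodFeeds(BasicNewsRecipe):  # hypothetical recipe name
    title = 'Food feeds (sketch)'
    feeds = [('Recipes', 'http://example.com/feed.xml')]  # placeholder URL

    # Follow links found on each article page, one level deep.
    recursions = 1
    # Only links matching one of these patterns are followed.
    match_regexps = [r'/recipes/food/']


def would_follow(url, patterns=FoodFeeds.match_regexps):
    # Approximation (an assumption): a link qualifies if any pattern
    # is found somewhere in it.
    return any(re.search(p, url) for p in patterns)
```

With `recursions = 0` (the default) no links are followed at all, so both settings are needed together.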
|
#1400 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
BTW, I enjoy looking at the way you improve the code I've proposed. I just looked over your changes to the bug fix I proposed for preventing the "read metadata only from filename" option from applying to recipes. While mine worked, your code is much more readable and modular.
|
#1401 |
creator of calibre
Posts: 45,414
Karma: 27757236
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I've learned the hard way over the last three years that putting effort into keeping code readable and modular is the best way to ensure that calibre continues to grow and attracts contributions from as many people as possible, like you.
|
#1402 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
IOW, I'm already transforming links in the RSS feed into the printerfriendly format I need for the food recipes. I'm using the get_article_url method and changing the link with replace(). AFAICT, the get_article_url method only operates on the links in the feed, not the multipage links.
Where is the best spot to grab a recursed link on a page that points to a food recipe and transform it into the same type of printerfriendly link before I follow it?
Last edited by Starson17; 02-09-2010 at 01:10 PM.
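The feed-link transformation described in this post might look roughly like the sketch below. The exact printer-friendly path is an assumption (the post only says a "printerfriendly" format is used), and `to_printer_friendly` and the class name are hypothetical.

```python
try:
    # Only importable inside calibre's own Python environment.
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in base class so the sketch also runs outside calibre.
    class BasicNewsRecipe(object):
        pass


def to_printer_friendly(url):
    # Assumption: the printer-friendly page differs from the photo page
    # only in this one path segment.
    return url.replace('/recipes/food/photo/',
                       '/recipes/food/printerfriendly/')


class FoodFeeds(BasicNewsRecipe):  # hypothetical recipe name

    def get_article_url(self, article):
        # 'article' is the parsed feed entry; its link is rewritten
        # before the article page is downloaded.
        url = article.get('link', None)
        return to_printer_friendly(url) if url else url
```

As the post notes, this hook only sees links that come from the feed itself, not links found later on downloaded pages.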
|
#1403 |
creator of calibre
Posts: 45,414
Karma: 27757236
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
There isn't (recursion following happens in a whole different module). The only workaround is to use preprocess_html and index_to_soup to do it manually.
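A rough sketch of what "doing it manually" could mean: inside `preprocess_html`, find the recipe link, rewrite it, fetch that page with `index_to_soup`, and splice its body into the current page. This is only one reading of the suggestion. The base URL is inferred from the image links quoted elsewhere in the thread, the printer-friendly path is an assumption, and `recipe_link_to_print_url` and the class name are hypothetical.

```python
from urllib.parse import urljoin

try:
    # Only importable inside calibre's own Python environment.
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in base class so the sketch also runs outside calibre.
    class BasicNewsRecipe(object):
        pass

BASE = 'http://www.epicurious.com'  # inferred from the thread, an assumption


def recipe_link_to_print_url(href, base=BASE):
    # Make a site-relative link absolute, then switch it to the
    # (assumed) printer-friendly path.
    absolute = urljoin(base, href)
    return absolute.replace('/recipes/food/photo/',
                            '/recipes/food/printerfriendly/')


class FoodFeeds(BasicNewsRecipe):  # hypothetical recipe name

    def preprocess_html(self, soup, first_fetch):
        for a in soup.findAll('a', href=True):
            if '/recipes/food/' in a['href']:
                # Fetch the linked recipe ourselves instead of relying
                # on recursion, then append it to this page.
                linked = self.index_to_soup(
                    recipe_link_to_print_url(a['href']))
                body = linked.find('body')
                target = soup.find('body')
                if body is not None and target is not None:
                    body.name = 'div'  # avoid nesting <body> in <body>
                    target.append(body)
                break  # only the first recipe link is handled
        return soup
```

Resolving the link against the site with `urljoin` also avoids a site-relative link such as `/recipes/food/photo/...` being treated as a local `file:///` path.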
|
#1404 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Another question (I know, they are endless. I will not be offended if you do not answer; I'm sure your time is better spent writing code than dragging me up the learning slope). I keep getting this error (below 'Processing images...') when trying to get food recipe images:
Spoiler:
The image it says is "Not Found," however, is easily retrieved in Firefox. I've looked at the headers in a Firefox session and considered whether it is a robots.txt, cookies or user agent issue, but I can't figure it out. It retrieves fine in FF when I block cookies, and AFAICT the fetch process uses a FF user agent and ignores robots.txt. I've even tried using a delay. Is this something I need to use mechanize for, fetching the image in a browser session, or am I missing something simpler?
Edit: I think I've figured it out. There is an ASCII 0A character in the middle of the link in the page source, right where it breaks after 'http://www.epicurious.com/images/recipesmenus/2010/2010_february/' and before '357252_116.jpg'. I see another error in the output where it says it can't find 'http://www.epicurious.com/images/recipesmenus/2010/2010_february/%20357252_116.jpg' (note the %20). The problem seems to be in the page source, but I'm not sure why it works in FF. Perhaps FF is cleaning it up somehow. Do I need to use preprocess_html to fix this?
Last edited by Starson17; 02-09-2010 at 04:20 PM.
|
#1405 |
creator of calibre
Posts: 45,414
Karma: 27757236
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Stick some print statements into fetch_url to debug the session. Also try customizing get_browser to disable cookies, handle refreshes, etc.
|
#1406 |
Junior Member
Posts: 3
Karma: 10
Join Date: Feb 2010
Device: kindle 2
|
Looking for a recipe for feed://www.chabad.org/tools/rss/dailystudy_rss.xml
It should support the Kindle 2; I'm not worried about Hebrew fonts. Thanks.
#1407 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
I'm not sure I understood it, but it looks like that will let me match and replace some portion of the fetched html page before it gets processed. I was thinking I could just remove the 0A character that was causing the problem. (I have some other uses for processing the html with a regex search and replace.) However, the API description uses re.compile to compile the regex, and I think I need to import re. Would this approach work, and if so, where do I import re from?
Edit: OK, I should learn to think before typing. I solved it (with your help). The import format was easy to find: I just searched for where you used re.compile and found the answer was just 'import re'. The print statement in fetch_url was absolutely vital to let me see that the fetch was getting a '\n' at the broken link point. I was able to remove that char with preprocess_regexps. Thanks for the help!
Last edited by Starson17; 02-09-2010 at 05:12 PM.
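The fix described in this post could be sketched as follows. The actual pattern used in the recipe was not posted, so the regex here is an assumption, and the loop at the end only imitates how the rules would be run over the downloaded HTML.

```python
import re

# Each rule pairs a compiled pattern with a callable that takes the
# match object and returns the replacement text.
preprocess_regexps = [
    # Join a src attribute that is broken across a line:
    # group 1 is the part before the newline, group 2 the part after.
    (re.compile(r'(src="[^"\n]*)\n\s*([^"]*")'),
     lambda m: m.group(1) + m.group(2)),
]

broken = ('<img src="http://www.epicurious.com/images/recipesmenus/'
          '2010/2010_february/\n357252_116.jpg">')

cleaned = broken
for pattern, func in preprocess_regexps:
    cleaned = pattern.sub(func, cleaned)
```

In a recipe, the `preprocess_regexps` list would be a class attribute, and `import re` goes at the top of the recipe file.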
|
#1408 |
Junior Member
Posts: 1
Karma: 10
Join Date: Jan 2010
Device: Sony E-reader Pocket
|
Chronicle Herald
|
#1409 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
My recipe is done, but if anyone else has used preprocess_regexps in a recipe - I'd like to know if there's any way to use the string that matches the regex in the replacement string? Thanks!
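One possible answer, offered as a sketch: because the second element of each rule is a callable that receives the match object, the matched text is available through `m.group(0)` (the whole match) or `m.group(1)`, `m.group(2)`, and so on (captured groups). The patterns and the wrapper markup below are made up for illustration, and `apply_rules` is a hypothetical helper that only imitates how the rules would be applied.

```python
import re

preprocess_regexps = [
    # Reuse the whole match: wrap every <img> tag in a <div>.
    (re.compile(r'<img\b[^>]*>', re.IGNORECASE),
     lambda m: '<div class="photo">' + m.group(0) + '</div>'),
    # Reuse a captured group: turn <b>text</b> into <strong>text</strong>.
    (re.compile(r'<b>(.*?)</b>', re.DOTALL),
     lambda m: '<strong>' + m.group(1) + '</strong>'),
]


def apply_rules(html, rules):
    # Run each (pattern, callable) rule over the text in order.
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html
```

For example, `apply_rules('<b>Hi</b>', preprocess_regexps)` gives `'<strong>Hi</strong>'`.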
|
#1410 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
I'll take a look this weekend, but keep in mind that right-to-left language support is not very good on most of the available devices.
|