04-15-2011, 04:18 AM | #1 |
Junior Member
Posts: 6
Karma: 10
Join Date: Apr 2011
Device: Kindle 3
|
skip_ad_pages & bad image links
Hi,
I have a problem with the skip_ad_pages method. The feed I want to parse gives me a "wrong" article URL such as "http://bad/advertisement/page/story01.htm", which points to an advertisement page containing the right article URL, such as "http://right/article/url/article.shtml". I use the skip_ad_pages method to get the right page, and it works except for the img links in the real page. Calibre prepends the wrong article URL to every img tag whose "src" attribute is a relative path like "path/to/image.jpg", so the final image URL is "http://bad/advertisement/page/path/to/image.jpg" and not "http://right/article/url/path/to/image.jpg". This causes calibre to fail when it tries to fetch the image, because it follows the wrong link. What is the best way to solve this? Thank you all in advance |
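To make the symptom concrete, here is a small sketch of standard relative-URL resolution using the example URLs from this post. It assumes calibre resolves relative image paths the usual way against whatever URL it has stored for the article; that is a guess about calibre's internals, not something confirmed in this thread. It is written for Python 3 (`urllib.parse`).

```python
from urllib.parse import urljoin

# Example URLs quoted in the post above
ad_page_url = "http://bad/advertisement/page/story01.htm"
real_page_url = "http://right/article/url/article.shtml"
img_src = "path/to/image.jpg"

# Resolving the relative src against the ad page URL gives the broken link
wrong_image_url = urljoin(ad_page_url, img_src)

# Resolving it against the real article URL gives the link that should be fetched
right_image_url = urljoin(real_page_url, img_src)

print(wrong_image_url)
print(right_image_url)
```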
04-15-2011, 09:20 AM | #2 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
I have no idea of the "best" way. I'd probably use postprocess_html (as I understand skip_ad_pages is after preprocessing), then findAll the image links and fix them.
|
04-15-2011, 09:51 AM | #3 |
Junior Member
Posts: 6
Karma: 10
Join Date: Apr 2011
Device: Kindle 3
|
That is one of the solutions I was thinking about... but how can I "forward" the correct URL to the postprocess method?
I think the problem is that calibre keeps the wrong article URL in its parsed "internal" feed/article structure and doesn't replace it with the correct URL. Is there a way to perform this substitution? Perhaps in the skip_ad_pages method itself? |
04-15-2011, 10:01 AM | #4 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
I've never needed skip_ad_pages, so I'm not familiar with it, and it's only used in two other builtin recipes I know of. I'm a bit surprised that you are finding this, as I would have expected it to have been seen in those other recipes. FYI, here's the code for those other recipes: Spoiler:
Whatever solution you find, post it here. |
|
04-15-2011, 10:33 AM | #5 |
Junior Member
Posts: 6
Karma: 10
Join Date: Apr 2011
Device: Kindle 3
|
I think I didn't explain the problem clearly.
I already use the skip_ad_pages method in my recipe, in the same way as the first code you quoted, and it works except for the image links... so text is fetched correctly but images are not. I'm looking for a way to replace the article URL calibre holds with the correct URL, because the final HTML is the right HTML but the article URL in the internal structure is still the wrong one. For this reason (I think) calibre prepends the wrong link when it builds the image source URL. I don't know if this is possible inside the skip_ad_pages method, because it only takes a "soup" and only returns a "soup". I don't mind whether it is done inside skip_ad_pages or elsewhere, but I don't know which method I can use for this replacement or how to do it. I hope it is clearer now... my English is very rusty... |
04-15-2011, 10:57 AM | #6 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
I think I understood it the first time. (If I didn't, then I still don't understand). Why do you think images worked with those two recipes? Do they not use relative links for images, or is there something different about your site? I'm just curious to know the answer to this, even if it doesn't help you.
I understand you'd like to know how to change the internal base url so that relative urls for images work correctly after the ad page is skipped. I don't know the answer, but I posted how I'd try. I'd change relative links for images to full links with postprocess_html, so the internal base url should be irrelevant. You asked how to pass the correct part. I'd have to think about it. Is it available in the soup of the page? If not, didn't you have it in skip_ad_pages method? |
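A minimal sketch of the "change relative image links to full links" idea, for anyone who wants to try it. The helper name `make_image_links_absolute` and its `base_url` parameter are made up for illustration; how the correct `base_url` reaches the hook is exactly the open question in this thread. It assumes a BeautifulSoup-style soup whose tags support `findAll`, `.get('src')` and item assignment, and Python 3 for `urllib.parse`.

```python
from urllib.parse import urljoin


def make_image_links_absolute(soup, base_url):
    """Rewrite every <img src> in soup so it is resolved against base_url.

    base_url is assumed to be the URL of the real article page
    (hypothetical parameter, the caller has to supply it).
    """
    for img in soup.findAll('img'):
        src = img.get('src')
        if src:
            # urljoin leaves already-absolute URLs unchanged
            img['src'] = urljoin(base_url, src)
    return soup
```

If the correct URL is only known inside skip_ad_pages, one option might be to apply a helper like this there, before the page is handed back, so that the internal base URL no longer matters. That is untested.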
04-16-2011, 05:32 AM | #7 | |||
Junior Member
Posts: 6
Karma: 10
Join Date: Apr 2011
Device: Kindle 3
|
Quote:
Quote:
Quote:
Anyway, in the meantime I found two workarounds that solve my problem. The first: I discovered that the final correct link is also available in the feed itself, but inside the "guid" tag rather than the "link" tag, so I overrode the get_article_url method to extract the correct link directly, with no need for skip_ad_pages. The second: with a bit of easy "reverse engineering" I worked out how to decode the wrong link into the right link, again by overriding the get_article_url method. Either way, the images work... |
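The first workaround can be sketched roughly as follows. This is not the poster's actual code: the function name `pick_article_url` is hypothetical, and it assumes the parsed feed entry behaves like a dict that exposes the feed's "guid" and "link" values under those keys. The fallback to "link" is an added safety net, not something described in the post.

```python
def pick_article_url(article):
    """Prefer the feed entry's guid when it looks like a URL, else use link.

    `article` is assumed to be a dict-like feed entry (hypothetical shape).
    """
    guid = article.get('guid', None)
    if guid and guid.startswith('http'):
        return guid
    # Fall back to the ordinary link if there is no usable guid
    return article.get('link', None)


# Inside a recipe, the override would presumably look like:
#     def get_article_url(self, article):
#         return pick_article_url(article)
```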
|||
|