#1
Member
Posts: 19
Karma: 10
Join Date: Feb 2010
Location: Los Angeles, CA
Device: Kindle 3
Calibre Recipe HTML content differs from raw html of index.html.
First, many thanks for this excellent piece of software. I'm just learning to fully take advantage of its features.
The question, then an explanation.

> Why would calibre be receiving different HTML content than wget or my browser (with JavaScript disabled)?

Explanation: I'm writing a custom recipe for a blog I like, www.worldofweirdthings.com. It should be pretty basic: I just want to pull down the daily article and strip some of the social widget images out. I started with the auto-generated recipe, which looks like:

Code:

    # vim:ft=python
    from calibre.web.feeds.recipes import BasicNewsRecipe

    class WorldofWeirdThingsBlog(BasicNewsRecipe):
        title = u'World of Weird Things'
        oldest_article = 1
        max_articles_per_feed = 5

        feeds = [(u'World of Weird Things', u'feed://worldofweirdthings.com/feed/')]

Code:

    <div class="addtoany_share_save_container">
      <ul class="addtoany_list">
        <li>
          <a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save"><img src="http://worldofweirdthings.com/wp-content/plugins/add-to-any/share_save_256_24.png" width="256" height="24" alt="Share/Bookmark"/></a>
        </li>
      </ul>
      <script type="text/javascript"><!--
        var a2a_config = a2a_config || {};
        a2a_config.linkname="could a geocentrist even screw in a lightbulb?";
        a2a_config.linkurl="http://worldofweirdthings.com/2010/09/16/could-a-geocentrist-even-screw-in-a-lightbulb/";
        a2a_config.show_title=1;
        a2a_color_main="ffffff";a2a_color_border="000000";a2a_color_link_text="3c607b";a2a_color_link_text_hover="1c2a36";a2a_color_bg="ffffff";
      //-->
      </script>
      <script type="text/javascript" src="http://static.addtoany.com/menu/page.js"></script>
    </div>

Code:

    from calibre.web.feeds.recipes import BasicNewsRecipe

    class WorldofWeirdThingsBlog(BasicNewsRecipe):
        title = u'World of Weird Things'
        oldest_article = 1
        max_articles_per_feed = 5

        remove_tags = [dict(name='div', attrs={'class':'addtoany_share_save_container'})]

        feeds = [(u'World of Weird Things', u'feed://worldofweirdthings.com/feed/')]

Code:

    ebook-convert my.recipe blog.mobi

Code:

    def preprocess_html(self, soup):
        print(soup.prettify())
        return soup  # the hook must hand the soup back to calibre

The HTML that was printed did not have the div tag, but still had the image, outside the tag. It was almost as if the containing div were stripped but the URL was left intact. This is what I saw at the bottom.

Code:

    <p>
      <a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save">
        <img src="http://worldofweirdthings.com/wp-content/plugins/add-to-any/share_save_256_24.png" width="256" height="24" alt="Share/Bookmark" />
      </a>
    </p>

Last edited by krunk; 09-17-2010 at 04:56 PM.
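
One way to pin down exactly how the two versions differ is to compare which classed elements each one contains. The following is a minimal sketch using only the Python standard library, not anything from calibre itself; `ClassCollector` and `classed_tags` are hypothetical helper names, and the two HTML strings are the excerpts quoted in the post above.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect (tag, class) pairs for every element that has a class attribute."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; self-closing tags
        # such as <img ... /> are also routed through this method.
        cls = dict(attrs).get("class")
        if cls:
            self.seen.add((tag, cls))


def classed_tags(html):
    """Return the set of (tag, class) pairs found in an HTML string."""
    parser = ClassCollector()
    parser.feed(html)
    parser.close()
    return parser.seen


# Excerpt of the page as fetched with wget (taken from the post).
page_html = """
<div class="addtoany_share_save_container">
<ul class="addtoany_list"><li>
<a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save">
<img src="http://worldofweirdthings.com/wp-content/plugins/add-to-any/share_save_256_24.png"
     width="256" height="24" alt="Share/Bookmark"/></a>
</li></ul>
</div>
"""

# Excerpt of what preprocess_html printed inside calibre (taken from the post).
calibre_html = """
<p>
<a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save">
<img src="http://worldofweirdthings.com/wp-content/plugins/add-to-any/share_save_256_24.png"
     width="256" height="24" alt="Share/Bookmark" /></a>
</p>
"""

only_in_page = classed_tags(page_html) - classed_tags(calibre_html)
in_both = classed_tags(page_html) & classed_tags(calibre_html)

print("only in wget page:", sorted(only_in_page))
print("in both:", sorted(in_both))
```

Run against these excerpts, the container div and the list show up only in the wget copy, while the `a2a_dd addtoany_share_save` link is in both, which would be consistent with a `remove_tags` rule on the div never finding anything to match. One possible explanation, not confirmed anywhere in this thread, is that calibre is building the article from content embedded in the feed rather than from the page itself; `BasicNewsRecipe` has a `use_embedded_content` setting that may be worth checking.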
#2
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
BTW, recipe discussions usually go on in the Custom Recipe thread: https://www.mobileread.com/forums/sho...32543&page=183
It's inhabited by all the recipe junkies.

Last edited by Starson17; 09-16-2010 at 04:33 PM.
#3
Member
Posts: 19
Karma: 10
Join Date: Feb 2010
Location: Los Angeles, CA
Device: Kindle 3
Quote:
I'll check that thread, though things tend to get lost in epic threads.
#4
Not who you think I am...
Posts: 374
Karma: 30283
Join Date: Jan 2010
Location: Honolulu
Device: PocketBook 360 -- Ivory
There should be a recipe sub-forum.