#2491 |
Groupie
Posts: 151
Karma: 1002968
Join Date: Dec 2008
Device: none
|
Well, I've attempted this but I'm not getting what I expected.
Anyone care to take on the Baltimore Sun? http://www.baltimoresun.com/about/bl...62819.htmlpage
#2492 | |
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
Quote:
For example:
Code:
<p class="calibre9" This is the line I wish to keep </p>
<p class="calibre9" This is the line I wish to delete </p>
<p class="calibre9" Some more stuff </p>
[code ]
var string = "This is the line I wish to delete"
remove_tag [ where contains(string) ]
or
var string = "This is the line I wish to delete"
var replacestring = "Calibre Rocks"
replace_tag [ replace_where(string,replacestring) ]
[/code]
The above is just pseudo code, but I hope you understand my logic.
|
#2493 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
Also, your examples are missing the closing tag marker ">" after <p class="calibre9".
However, assuming that you're just using that as an example (i.e., you're as lazy as I am and didn't want to go back and open up the original site), the answer to your question is "yes, it's possible to insert or remove a line" and "yes, I understand your pseudo code."
Am I correct in thinking that your next question is "How?"
Spoiler:
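The contents of the spoiler were not preserved here. As a rough illustration only, the pseudo code from the earlier post could be sketched with plain regular expressions, the same kind of search and replace that preprocess_regexps performs. The helper names `remove_paragraphs` and `replace_in_paragraphs` are made up for this sketch, and the sample HTML is the example from the post with the missing ">" added.

```python
import re

# Matches one <p ...>...</p> block; non-greedy so it stops at the first </p>.
P_BLOCK = re.compile(r'<p\b[^>]*>.*?</p>\s*', re.DOTALL | re.IGNORECASE)


def remove_paragraphs(html, needle):
    """Delete every <p> block whose contents include `needle`."""
    return P_BLOCK.sub(lambda m: '' if needle in m.group(0) else m.group(0), html)


def replace_in_paragraphs(html, needle, replacement):
    """Swap `needle` for `replacement`, but only inside <p> blocks."""
    return P_BLOCK.sub(lambda m: m.group(0).replace(needle, replacement), html)


sample = (
    '<p class="calibre9">This is the line I wish to keep</p>\n'
    '<p class="calibre9">This is the line I wish to delete</p>\n'
    '<p class="calibre9">Some more stuff</p>\n'
)

target = 'This is the line I wish to delete'
cleaned = remove_paragraphs(sample, target)
swapped = replace_in_paragraphs(sample, target, 'Calibre Rocks')
```

This only works because <p> blocks do not nest; for tags that can contain themselves, a regex like this is not reliable.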
|
||
#2494 | |
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
Quote:
Let's say that in every parse I get something that has a doubleclick.net ad in it. I tried
Code:
filter_regexps = [r'feedads\.g\.doubleclick\.net']
I thought, well, maybe I could use preprocess_regexps and remove all the instances of doubleclick first. So then I looked in the Beautiful Soup documentation, and after a big headache I'm still kinda lost.
I tried this as well...
Code:
preprocess_regexps = [(re.compile(r'feedads\.g\.doubleclick\.net', re.DOTALL), lambda m: '')]
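One possible reason the second attempt has no visible effect: the pattern matches only the hostname, so the substitution deletes those characters and leaves the surrounding tags in place. The sketch below shows the difference on made-up markup (the real ad HTML from the feed was never posted, so the `page` string and the `whole_link` pattern are assumptions).

```python
import re

# Hypothetical ad markup; the actual HTML from the feed is not in the thread.
page = (
    '<p>Story text</p>'
    '<a href="http://feedads.g.doubleclick.net/~a/abc/0/da">'
    '<img src="http://feedads.g.doubleclick.net/~a/abc/0/di" border="0"></a>'
)

# The pattern from the post: matches the hostname text and nothing else.
host_only = re.compile(r'feedads\.g\.doubleclick\.net', re.DOTALL)

# A wider pattern: the whole <a>...</a> element whose tag mentions the host.
whole_link = re.compile(
    r'<a\b[^>]*feedads\.g\.doubleclick\.net[^>]*>.*?</a>',
    re.DOTALL | re.IGNORECASE,
)

after_host_only = host_only.sub('', page)    # tags survive with broken URLs
after_whole_link = whole_link.sub('', page)  # the ad element is gone

# Same (pattern, function) pair shape as the attempt in the post.
preprocess_regexps = [(whole_link, lambda m: '')]
```

Whether the wider pattern fits the real page depends on how the ad is actually marked up, so it would need checking against the page source.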
|
#2495 | |
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
Quote:
This works for me. The only issue I have is that, for the life of me, I can't figure out how to get it to remove the doubleclick.net ad that it puts on some of the articles. Maybe someone can help you/me out on that one. I have tried filter_regexps with no go. Anyway, enjoy...
|
#2496 |
Pew Pew!
Posts: 29
Karma: 270
Join Date: Aug 2010
Device: Kindle v3
|
Made these with icons. Hopefully they help and get added to the main program. I made a .py for all the Gawker Media brand websites and Consumerist, and added the icons for good measure.
Gawker.com, deadspin.com, io9.com, jalopnik.com, jezebel.com, kotaku.com, lifehacker.com, fleshbot.com, Consumerist.com
Consumerist is done and working well; I just can't figure out how to remove a lone Twitter image that shows up on each page. It's small though, so not a big detractor.
#2497 |
Junior Member
Posts: 3
Karma: 10
Join Date: Aug 2010
Device: PRS 900
|
Hello, I'm new here. Can somebody help me create a recipe for www.europasur.es? I tried editing a recipe for El Pais, but the RSS links are very different... Thank you
|
#2498 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
Let's start with filter_regexps. I've only used it once. It's used to prevent a link from being followed. Most of the time, you're not following a link because recursion is off and Calibre isn't following links on the pages. What you normally want to do is remove the link or graphic from your page, not prevent it from being followed by Calibre.
OTOH, I use preprocess_regexps a lot, but as a sort of last resort. It's simply a powerful search and replace on the HTML. You could do most of your remove_tags with preprocess_regexps if you wanted to. But it's not tag-aware, so remove_tags is better in most cases (it won't be confused if there's a div tag inside a div tag, where S&R might find the open div tag of an outer tag and the close div of an inner tag).
Why don't you show me the actual page source for the doubleclick you want to deal with, or give me a link, so I can understand what you are trying to remove?
BTW, if you look at page source with your browser, it may not be the same as what Calibre sees. It may also be wrong if you look at it with FireBug. To see it as Calibre will see it, I like to do this:
Code:
def preprocess_html(self, soup):
    # Print the parsed page as Calibre sees it, then return it unchanged.
    print('The soup is: ', soup)
    return soup
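The nested-div pitfall mentioned above can be reproduced outside Calibre. The sketch below runs a non-greedy search and replace on made-up HTML, then contrasts it with a small depth-counting stripper built on Python's standard html.parser. The `DivStripper` class and `strip_div` helper are invented for illustration; this is not how remove_tags is implemented, and the sketch ignores comments, entities and self-closing tags.

```python
import re
from html.parser import HTMLParser

# Made-up page: an ad div that contains another div.
page = ('<div class="ad"><div class="inner">buy now</div>'
        '<p>more ad text</p></div><p>Article text</p>')

# Search and replace stops at the FIRST </div>, which closes the inner div.
naive_result = re.sub(r'<div class="ad">.*?</div>', '', page, flags=re.DOTALL)


class DivStripper(HTMLParser):
    """Rebuilds the HTML while skipping divs of one class, nesting included."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0   # greater than 0 while inside the div being dropped
        self.out = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == 'div':
                self.depth += 1   # nested div: one more close to wait for
            return
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'div' and self.css_class in classes:
            self.depth = 1
            return
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            if tag == 'div':
                self.depth -= 1
            return
        self.out.append('</%s>' % tag)

    def handle_data(self, data):
        if not self.depth:
            self.out.append(data)


def strip_div(html, css_class):
    parser = DivStripper(css_class)
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)


aware_result = strip_div(page, 'ad')
```

In a recipe, the tag-aware route would normally be an entry such as dict(name='div', attrs={'class': 'ad'}) in remove_tags, with the class name taken from the real page.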
||
#2499 | |
Enthusiast
Posts: 34
Karma: 54
Join Date: Jul 2008
Device: not yet
|
Hi,
I'm trying to adapt the Financial Times recipes to Lloyd's List, and I get this error:
Quote:
Code:
"<div class="grid_4 prefix_2 controls-container">
  <div class="grid_4 first last common-box last-in-row" id="log-in-box">
    <h2 class="common-box-header">Please Log In</h2>
    <form class="log-in" method="post" action="/ll/security_check">
      <fieldset>
        <label for="j_username">Username:</label>
        <input class="common-field log-in-page" type="text" name="j_username" id="j_username" value="" tabindex="1"/>
        <label for="j_password">Password:</label>
        <input class="common-field log-in-page" type="password" name="j_password" id="j_password" tabindex="2"/>
        <input class="submit log-in-page" type="submit" value="Log In" tabindex="4"/>
        <label for="_spring_security_remember_me"><input type="checkbox" id="_spring_security_remember_me" name="_spring_security_remember_me" tabindex="3"/>Remember me</label>
        <a class="pwd-reminder" href="/ll/forgotten-password.htm">Forgotten your password?</a>
      </fieldset>
|
#2500 | ||
Junior Member
Posts: 4
Karma: 10
Join Date: Aug 2010
Location: Colombia
Device: Sony PRS-300
|
How can I change the title font?...
Let's see if someone can help me.
I made this recipe for a magazine in Colombia (larepublica.com.co). Everything comes out the way I want, except for one problem: the title of each story comes out in the same font as the article text, and I want it big and bold. How can I do this? What command should I add?... Thanks! Here's the recipe:
Quote:
Quote:
Last edited by miangue; 08-23-2010 at 04:03 PM. |
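One common way to restyle titles in a recipe is the extra_css attribute, which adds CSS to the downloaded pages. The sketch below is only a guess at how that might look here: the recipe itself was not preserved in this thread, so the class name, the title string and the h1/h2 selector are all assumptions, and the selector would need to match whatever tag the site really uses for story titles.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Outside Calibre (for example, when only checking syntax) use a stub base.
    BasicNewsRecipe = object


class LaRepublicaSketch(BasicNewsRecipe):
    # Hypothetical name and title; the original recipe is not shown in the thread.
    title = 'La Republica (sketch)'

    # Assumption: story titles are rendered as <h1> or <h2>.
    # Change the selector to match the real page source.
    extra_css = '''
        h1, h2 { font-size: x-large; font-weight: bold; }
    '''
```

Printing the soup from preprocess_html is one way to find out which tag and class the title actually carries.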
||
#2501 | |
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
Quote:
As for the issue where I had the doubleclick: it was the Baltimore Sun. I took a stab at it for that guy/gal who wanted someone to look at it. For the most part everything is fine with the RSS feed, except it puts in that Google ad on some of the pages, generally the first article. When I look at the original source it has ad.doubleclick.net in it; then after it is rendered with Calibre it is feedads.g.doubleclick.net. Here is the recipe I am currently using for it...
Spoiler:
Personally, I don't like the RSS feed of that site. I have considered trying to make a feed myself from this... http://www.baltimoresun.com/services...print-edition/ which actually gives you some nice pretty images and so forth. I figured in that case I would simply use recursions = 1 and then somehow strip what I didn't want, using maybe keep_only_tags or remove_tags. Or I could somehow make a print_version that looks for the text "Print" inside an <a> tag, then simply gets that URL and passes it back. The only issue with using the print version is that I lose the photos, which I don't want to do. It is just something I'm playing with to learn, and also to help someone else in the process.
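The "find the link whose text is Print" idea can be sketched with the standard html.parser module. The markup and URLs below are invented, and `PrintLinkFinder` and `find_print_link` are names made up for this sketch. Note that a recipe's print_version method receives only the article URL, so the article page would have to be fetched first before anything like this could run on it.

```python
from html.parser import HTMLParser


class PrintLinkFinder(HTMLParser):
    """Records the href of the first <a> whose visible text is exactly 'Print'."""

    def __init__(self):
        super().__init__()
        self.current_href = None   # href of the <a> being read, if any
        self.text = []             # text collected inside that <a>
        self.found = None

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.current_href = dict(attrs).get('href')
            self.text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self.current_href is not None:
            if self.found is None and ''.join(self.text).strip() == 'Print':
                self.found = self.current_href
            self.current_href = None


def find_print_link(html):
    """Return the href of the 'Print' link, or None if there is none."""
    finder = PrintLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.found


# Invented sample markup, not taken from the Baltimore Sun site.
sample = ('<a href="/news/story.html">Story</a> '
          '<a href="/news/story.html?track=print">Print</a>')
print_url = find_print_link(sample)
```

If the href is relative, as in this sample, it would still have to be joined with the site's base URL before being returned.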
|
#2502 |
Junior Member
Posts: 7
Karma: 10
Join Date: Jun 2010
Device: none
|
My Recipe fails to place Articles data in epub.
Been trying to create my first simple recipe for a local paper, the Ilkeston Advertiser (Derbyshire, England), with free RSS feeds. Managed to get the logon process working and ran the recipe in test mode. It seemed to download the first two articles into separate directories, each with an index.html and an image subdirectory. Displaying the index file in Firefox shows the article data is being downloaded OK.
When I run the recipe in Calibre I get the index summary pages OK, but all the articles referred to contain just header lines (Next Link, etc.) and footer lines (downloaded by Calibre, etc.). Have I missed something out? Thanks.
Spoiler:
|
#2503 |
US Navy, Retired
Posts: 9,890
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
|
I applaud you using the spoiler tags, but first you have to wrap your recipe in the code tags (the # above) then wrap that with the spoiler tags. Placing your recipe in the code tags keeps your recipe intact with the critical spaces in their proper places. This makes trying your recipe and reviewing it easier on those that have the needed skills to assist you.
|
#2504 | |||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
Quote:
|
|||
#2505 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
|
|