06-02-2011, 04:50 PM   #3
BRGriff
Device: Kindle 3
Thank you Starson17! The Arcamax fix is working great and the comics are larger and much easier to read.

GoComics is having trouble with its servers in the aftermath of its merger with Comics.com, so I am having difficulty testing changes to the recipe code. I did try the preprocess code you submitted above, but it didn't work for me. Maybe I did something wrong. I placed the code in the recipe after the "articles = self.make_links(url)" line and before the "def make_links(self, url):" subroutine.
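
On placement, a minimal sketch, assuming the "preprocess code" refers to calibre's preprocess_html(self, soup) hook: in Python it would need to be its own method at class level, indented the same as "def make_links", rather than sitting inside the body of the method that calls self.make_links(url). StubRecipeBase and ComicsRecipe are hypothetical stand-ins so the layout runs outside calibre (a real recipe would subclass BasicNewsRecipe), parse_index is assumed to be the caller, and the method bodies are placeholders.

```python
class StubRecipeBase:
    """Hypothetical stand-in for calibre's BasicNewsRecipe."""


class ComicsRecipe(StubRecipeBase):
    def parse_index(self):
        # Assumed caller of make_links; the URL is the one quoted in the post.
        url = "http://www.gocomics.com/kitandcarlyle/2011/06/02"
        articles = self.make_links(url)
        return [("Comics", articles)]

    # A hook goes here, at class level, between the two methods.
    def preprocess_html(self, soup):
        # Placeholder body: return the parsed page unchanged.
        return soup

    def make_links(self, url):
        # Placeholder body: one article entry per URL.
        return [{"title": "placeholder", "url": url}]
```

If the pasted code ended up indented one level deeper than "def make_links", Python would treat it as part of the preceding method, which could explain why it appeared to do nothing.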

I have also tried two "remove_tags" entries, neither of which worked:

dict(name='h1', attrs={'span':['by']})
dict(name='span', attr={'by':['']})
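
For comparison, calibre documents a tag spec as dict(name='tag name', attrs={...}), where each key in attrs is an HTML attribute name such as class or id. Read that way, the first attempt asks for an h1 that carries an attribute called "span", and the second passes "attr" instead of "attrs". The sketch below is a simplified stand-in for that matching rule, not calibre's or BeautifulSoup's actual code; spec_matches is a hypothetical helper.

```python
def spec_matches(tag_name, tag_attrs, spec):
    """Simplified stand-in: does a tag satisfy a name/attrs spec?"""
    if "name" in spec and spec["name"] != tag_name:
        return False
    for attr_name, wanted in spec.get("attrs", {}).items():
        # A list of wanted values is treated as "any of these".
        allowed = wanted if isinstance(wanted, list) else [wanted]
        if tag_attrs.get(attr_name) not in allowed:
            return False
    return True


# Tags from the GoComics heading: neither the <h1> nor the <span> has attributes.
h1 = ("h1", {})
span = ("span", {})

first_try = dict(name="h1", attrs={"span": ["by"]})
bare_span = dict(name="span")

print(spec_matches(*h1, first_try))    # the <h1> has no attribute named "span"
print(spec_matches(*span, bare_span))  # a name-only spec matches every <span>
```

A name-only spec such as dict(name='span') would presumably strip every span on the page, so it may be too broad for the real recipe.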

It has occurred to me that I need to get rid of not only the author's name but also the comic's name, both of which appear in the heading below:
<h1 ><a href="/kitandcarlyle/2011/06/02">Kit 'N' Carlyle</a><span> by Larry Wright</span></h1>
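
To make the structure of that heading concrete, here is a sketch using only Python's standard html.parser (not the BeautifulSoup interface a calibre recipe works with): the comic's name is the text of the a tag and the author is the text of the span, so the two can be picked apart by tag. HeadingParser is a hypothetical helper.

```python
from html.parser import HTMLParser


class HeadingParser(HTMLParser):
    """Collects text inside <h1>, keyed by the innermost open tag."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.text = {}

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if "h1" in self.stack:
            key = self.stack[-1]
            self.text[key] = self.text.get(key, "") + data


heading = ('<h1 ><a href="/kitandcarlyle/2011/06/02">Kit \'N\' Carlyle</a>'
           '<span> by Larry Wright</span></h1>')
parser = HeadingParser()
parser.feed(heading)
parser.close()

comic_name = parser.text["a"]
byline = parser.text["span"].strip()
print(comic_name)
print(byline)
```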

What may be easier: the URL appears elsewhere in the original site HTML without the comic strip name or the author's name. The HTML is shown below:

<div class="social-box">
<ul>
<li>
<form id="myspacepostto" method="post" action="http://www.myspace.com/index.cfm?fuseaction=postto" target="_blank">
<input type="hidden" name="u" value="http://www.gocomics.com/kitandcarlyle/2011/06/02"/>
</li>
</ul>
</div><!-- end div.social-box -->
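
As a rough illustration of reading the URL out of that block, the sketch below uses Python's standard html.parser (again, not the BeautifulSoup interface calibre provides) to find the input whose name attribute is "u" and return its value attribute. HiddenInputParser is a hypothetical helper, and the snippet is the trimmed HTML from this post rather than the full GoComics page.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Records the value of the first <input name="u">."""

    def __init__(self):
        super().__init__()
        self.url = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input .../> also arrive here by default.
        attr_map = dict(attrs)
        if tag == "input" and attr_map.get("name") == "u" and self.url is None:
            self.url = attr_map.get("value")


snippet = """
<div class="social-box">
<ul>
<li>
<form id="myspacepostto" method="post" action="http://www.myspace.com/index.cfm?fuseaction=postto" target="_blank">
<input type="hidden" name="u" value="http://www.gocomics.com/kitandcarlyle/2011/06/02"/>
</li>
</ul>
</div>
"""

finder = HiddenInputParser()
finder.feed(snippet)
finder.close()
print(finder.url)
```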

I have edited out the extraneous HTML code. Once GoComics is up and running smoothly, I will try adding to "keep_only_tags" the code: dict(name='input', attrs={'u':['value']}). Do you think that might work?
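
One observation on that spec, offered as a sketch rather than a tested fix: in the quoted HTML, "u" is the value of the input's name attribute and the URL sits in its value attribute, so a spec keyed on attribute names would look like the one below. Since calibre documents keep_only_tags as keeping only the matched tags and their children, a hidden input with no children would probably leave nothing to display; reading the URL out of the tag may be the more useful route.

```python
# Sketch of the recipe setting only; not verified against the live site.
keep_only_tags = [
    dict(name="input", attrs={"name": "u"}),
]
```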

I very much appreciate all your help and patience.