02-22-2009, 09:19 AM   #272
kovidgoyal
creator of calibre
Quote:
Originally Posted by howsey
Thanks for that. I've now got it working reasonably well. The next issue is that the article contains hyperlinks. The default processing seems to replace each link with its element text and then include the URL in brackets afterwards. Is there a way to stop the URL from coming out? My initial thought was to try the pre/post-processing functions, but this appears to filter out way too early.
Code:
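def preprocess_html(self, soup):
    # Method of the recipe class: blank every link target so no URL is printed after the link text.
    for a in soup.findAll('a', href=True):
        a['href'] = ''
    return soup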
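To see the effect outside calibre, here is a rough standalone sketch that uses only the standard library. It is not how calibre does it: blank_hrefs and HREF_RE are made-up names for illustration, and a regular expression stands in for the BeautifulSoup call above. The link text stays, only the target is emptied.

```python
import re

# Hypothetical helper for illustration only; not part of calibre.
# Matches the href attribute inside an <a ...> tag, with either quote style.
HREF_RE = re.compile(
    r'(<a\b[^>]*?\bhref\s*=\s*)(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)


def blank_hrefs(html):
    """Return html with every <a href="..."> value replaced by an empty string."""
    # Keep the tag prefix and the original quote character, drop the URL.
    return HREF_RE.sub(lambda m: m.group(1) + m.group(2) + m.group(2), html)


sample = '<p>See <a href="http://example.com/page">this article</a>.</p>'
cleaned = blank_hrefs(sample)
print(cleaned)
```

In a real recipe the soup-based method is the better fit, since the HTML has already been parsed by the time preprocess_html runs.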