Old 02-25-2015, 09:28 PM   #3
Apollo Mok
I would like to remove the text "<br>" inside the content attribute of the <meta> tag, as shown below:

<h1 id="article-title" class="sub_ex_article-title">
sometext<br>
<meta content="sometext<br>" name="title">
</h1>


The child <meta> tag is malformed and I can't remove the <br> with preprocess_html,
because BeautifulSoup doesn't seem to work with this malformed HTML.

Should I use preprocess_raw_html instead? Is there any sample code available?
How can I remove the whole <meta ... name="title"> line?
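
To make the question concrete, here is a rough sketch of the kind of thing I have in mind: a plain regex run over the raw page text before any parser sees it. The names strip_title_meta and META_TITLE_RE are made up for illustration, and the pattern assumes the attributes always appear as content="..." followed by name="title", exactly as in the snippet above. I have not verified this inside a real recipe.

```python
import re

# Hypothetical pattern: matches the whole <meta content="..." name="title"> tag.
# The quoted content value is matched with [^"]* so the stray "<br>" inside it
# does not end the match early. Assumes this exact attribute order.
META_TITLE_RE = re.compile(
    r'<meta\s+content="[^"]*"\s+name="title"\s*/?>\s*',
    re.IGNORECASE,
)


def strip_title_meta(raw_html):
    """Return raw_html with the title <meta> tag (and trailing whitespace) removed."""
    return META_TITLE_RE.sub('', raw_html)


sample = (
    '<h1 id="article-title" class="sub_ex_article-title">\n'
    'sometext<br>\n'
    '<meta content="sometext<br>" name="title">\n'
    '</h1>\n'
)
cleaned = strip_title_meta(sample)
print(cleaned)

# In a recipe, this could presumably be hooked in along these lines (untested):
#
#     def preprocess_raw_html(self, raw_html, url):
#         return strip_title_meta(raw_html)
```

If the attribute order or quoting varies between pages, the pattern would need to be loosened accordingly.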

Thanks

Last edited by Apollo Mok; 02-25-2015 at 09:33 PM.