Hello,
Using this wonderful program (thanks a lot, Govid!), I have tried to add support for "Le Monde", a French newspaper. It was working pretty well, but yesterday they changed both their page structure and their encoding, switching from UTF-8 to ISO-8859-1.
Now my new profile captures the articles, but with the wrong encoding.
If I add, for instance, this to the regex:
<head><meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1"></head>
my characters are correct, but none of the junk gets stripped from the articles any more.
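To illustrate what I think is going on with the characters, here is a minimal Python sketch. It is not the program's actual code: the sample headline and the helper name `decode_page` are made up for the example. It only shows that ISO-8859-1 bytes read as UTF-8 lose their accented letters, while the same bytes read as ISO-8859-1 come through intact.

```python
# Hypothetical sample text; the real pages are of course much longer.
original = "Le général a été élu"

# What the site would now send: the same text as ISO-8859-1 bytes.
raw = original.encode("iso-8859-1")


def decode_page(data: bytes, encoding: str) -> str:
    """Decode raw page bytes, replacing undecodable bytes instead of raising."""
    return data.decode(encoding, errors="replace")


# Old assumption (UTF-8): each accented letter becomes U+FFFD, the replacement character.
wrong = decode_page(raw, "utf-8")

# New encoding (ISO-8859-1): the accented letters survive.
right = decode_page(raw, "iso-8859-1")

print("\ufffd" in wrong)   # True when the text was garbled
print(right == original)   # True when the text was decoded correctly
```

This only covers the character problem. It says nothing about why the stripping stops working once the meta tag is added.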
Here is my profile
I would be very grateful for your help...