I'd like to ask why you don't incorporate
Readability technology to fetch news?
Because it is:
free (open source) and also used in Safari Reader
works great.
You can find out more at
https://www.readability.com/
You can also install Safari to check it out:
http://www.webmonkey.com/2010/06/saf...ifies-the-web/
This technology is very powerful and will extract news a lot better...
Here is the source code:
http://code.google.com/p/arc90labs-readability/