I tried encoding='cp1252'.
This makes the long dash show up as
� and makes a lot of the text unreadable.
I think the replace solution is much better than figuring out the encoding. The problem is only with the em dash, and they use a lot of them.
I also tried latin1, and that doesn't work either.
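For reference, a rough sketch of what the replace approach might look like in Python. This is only an illustration: `normalize_dashes` is a made-up helper name, and it assumes the text is already decoded to a `str` and that the em dash (or the � left behind by a failed decode) is the only problem character, as described above.

```python
def normalize_dashes(text: str) -> str:
    """Replace em dashes and replacement characters with a plain hyphen."""
    # U+2014 is the em dash; U+FFFD is the "�" shown when a byte could not be decoded.
    # Treating every U+FFFD as a dash is an assumption that only holds if the
    # em dash is the sole character that fails to decode.
    for ch in ("\u2014", "\ufffd"):
        text = text.replace(ch, "-")
    return text


# Hypothetical sample: 0x97 is the em dash in cp1252, which is not valid UTF-8 on its own.
raw = b"price \x97 quantity"
text = raw.decode("utf-8", errors="replace")
print(normalize_dashes(text))
```

If other characters turn out to be affected too, this would silently turn them into hyphens as well, so it is worth spot-checking the output.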