Not that I use any terrible crapple-specific stuff, but:
Sounds like you've got some invalid characters (commas, colons, etc.) in the hrefs, or you're using direct addressing: href="something.html#stuff" often causes trouble, while href="../Text/something.html#stuff" will be fine.
Anyway, if you need help, it's a good idea to provide us with an example or two. This regex:
Code:
<a\b(?:\s*\w+="[^"]*")*\s*(href="[^"]+")(?:\s*\w+="[^"]*")*\s*>
will grab every <a> tag that has an href. Group 1 contains just the href="..." attribute itself. If you could run this over your HTML and save the matches, that would be quite helpful.
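If you're not sure how to run that over your files, here's a rough Python sketch that does it. It assumes Python 3.9+ and UTF-8 encoded (x)html files; the script name find_hrefs.py and the output file hrefs.txt are just made-up names. It applies the regex above unchanged and writes group 1 of every match to a text file, one per line, prefixed with the file it came from:

```python
import re
import sys
from pathlib import Path

# The regex from the post, unchanged. Group 1 captures the full href="..." attribute.
A_HREF = re.compile(r'<a\b(?:\s*\w+="[^"]*")*\s*(href="[^"]+")(?:\s*\w+="[^"]*")*\s*>')


def find_hrefs(html: str) -> list[str]:
    """Return group 1 for every <a ...> tag that has an href attribute."""
    return [m.group(1) for m in A_HREF.finditer(html)]


def main(paths: list[str], out_file: str = "hrefs.txt") -> None:
    # Hypothetical output file name; change it to whatever you like.
    with open(out_file, "w", encoding="utf-8") as out:
        for p in paths:
            text = Path(p).read_text(encoding="utf-8")
            for href in find_hrefs(text):
                # Tab-separated: source file, then the matched href attribute.
                out.write(f"{p}\t{href}\n")


if __name__ == "__main__":
    files = sys.argv[1:]
    if files:
        main(files)
    else:
        print("usage: python find_hrefs.py FILE.html [FILE.html ...]")
```

Run it as e.g. `python find_hrefs.py Text/*.xhtml` and post the contents of hrefs.txt. Note the regex only matches double-quoted attributes whose names are plain word characters, so a tag like <a data-x="1" href="..."> or one using single quotes won't be picked up.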