Old 08-25-2010, 09:06 AM   #2518
Starson17
Quote:
Originally Posted by naisren
as you see, there is "/" in the code
Code:
<a href="/Business_Etiquette_1.html" />
, and another "/" in
Code:
</a>
It seems Calibre cannot handle this the way a browser (Firefox or IE) does; it appears to skip everything after the first "/".
The "a" link tag is one case; div tags have the same problem, for example:
Code:
<div id="text"/>......</div>
How can I deal with such code in a recipe? I can't get any links from the code above using:
soup.find(id='text').findAll('a')
Sorry, but I can't quite follow your question. Are you saying you can't reference tags by "id" or "href," etc.?

I've never run into trailing slashes inside opening tags like the ones you've posted, so I have no first-hand experience. I would still expect normal referencing to work, but if it doesn't, you have several options. You can use search and replace via preprocess_regexps to remove just the slashes or to rewrite the whole tag. Alternatively, use preprocess_html or postprocess_html with Beautiful Soup to identify the tag and extract or modify it. It's possible the slashes are confusing Beautiful Soup, so printing the results (see the code in my post above for how to do this) might help you figure out what the recipe is seeing and where it's getting confused.
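
As a rough sketch of the search-and-replace option, something like the following might work. It is untested against your actual feed, and the names SELF_CLOSED_OPEN_TAG, strip_trailing_slash and fix_markup are made up for illustration. It assumes the only problem is a stray "/" at the end of opening a and div tags, and it deliberately leaves tags such as br alone. The sample string is built from the snippets you posted.

```python
import re

# Hypothetical pattern: an opening <a> or <div> tag that was written
# as if it were self-closing, e.g. <div id="text"/> or <a href="..." />.
SELF_CLOSED_OPEN_TAG = re.compile(r'<(a|div)\b([^<>]*?)\s*/>', re.IGNORECASE)


def strip_trailing_slash(match):
    # Rebuild the tag from its name and attributes, dropping the "/".
    return '<%s%s>' % (match.group(1), match.group(2))


def fix_markup(html):
    # Apply the replacement to every matching tag in the page source.
    return SELF_CLOSED_OPEN_TAG.sub(strip_trailing_slash, html)


sample = ('<div id="text"/>'
          '<a href="/Business_Etiquette_1.html" />Etiquette</a>'
          '<br/></div>')
print(fix_markup(sample))

# Inside a recipe class, the same pair would be listed as:
#     preprocess_regexps = [(SELF_CLOSED_OPEN_TAG, strip_trailing_slash)]
```

preprocess_regexps takes a list of (compiled pattern, replacement function) pairs and applies them to the raw page before it is parsed, so if the slashes really are what confuses Beautiful Soup, soup.find(id='text').findAll('a') should have a better chance afterwards.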

More info would be needed to advise further.