08-06-2009, 10:11 AM   #638
jbambridge
One extra thought:

Checking:
Quote:
File "calibre\ebooks\oeb\base.pyo", line 917, in _parse_xhtml
in the source code shows that this is part of the code that removes empty <a></a> tags. That fits the example I gave, where the publisher has left a strange link in the text.

Adding
Code:
remove_tags = [dict(name='a')]
to the recipe is a workaround, although this also destroys valid <a> tags.
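
A narrower variant might be to strip only the anchors that are actually empty, before the parser ever sees them. The following is an untested-in-calibre sketch, not a confirmed fix: the names EMPTY_A and strip_empty_anchors and the sample markup are made up for illustration, and it assumes the offending links have the plain form <a ...></a> with nothing but whitespace inside.

```python
import re

# Matches an <a> element whose body is empty or whitespace only,
# e.g. <a href="x"></a>. Anchors containing text or child tags do not match.
# The \b keeps tags such as <abbr> from being mistaken for <a>.
EMPTY_A = re.compile(r'<a\b[^>]*>\s*</a>', re.IGNORECASE)


def strip_empty_anchors(html):
    """Return html with empty <a></a> elements removed (hypothetical helper)."""
    return EMPTY_A.sub('', html)


# Made-up sample: one empty link and one valid link.
sample = '<p>Before <a href="http://example.com/x"></a> after <a href="/y">link</a>.</p>'
print(strip_empty_anchors(sample))
```

In a recipe, the same pattern could probably be supplied through preprocess_regexps, e.g. preprocess_regexps = [(EMPTY_A, lambda match: '')]. Whether that runs early enough to avoid the _parse_xhtml error is an assumption I have not verified.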

My Python is not up to fixing the _parse_xhtml code myself, though.

Can anyone suggest a better workaround (one that doesn't delete any valid content) or a fix to the Python code?

John

P.S. I've attached the offending article as an example of the empty <a> tags. index.txt is the result after processing by the recipe, and problem.txt is the original HTML file.
Attached Files
File Type: txt index.txt (6.0 KB, 241 views)
File Type: txt problem.txt (101.2 KB, 309 views)