One extra thought:
Checking this line of the traceback:
File "calibre\ebooks\oeb\base.pyo", line 917, in _parse_xhtml
in the source code shows that it is part of the code that removes empty <a></a> tags. That matches the example I gave, where the publisher has left a strange link in the text.
Adding
remove_tags = [dict(name='a')]
to the recipe is a workaround, although it also destroys valid <a> tags.
My Python is not up to fixing the _parse_xhtml code myself, though.
Can anyone suggest a better workaround (one that doesn't delete any valid content) or a fix to the Python code?
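One direction I have not tested against this article: strip only the empty <a></a> pairs from the raw HTML before it is parsed, and leave every link that has content alone. The sketch below uses only the standard re module. The names EMPTY_ANCHOR and strip_empty_anchors are made up for illustration, and it is an assumption that the recipe's preprocess_regexps hook runs early enough to avoid the failure in _parse_xhtml.

```python
import re

# Matches an <a ...> tag that is closed straight away, with at most
# whitespace between the opening and closing tags.
# The \b keeps it from matching other tags that start with "a", e.g. <abbr>.
EMPTY_ANCHOR = re.compile(r'<a\b[^>]*>\s*</a>', re.IGNORECASE)


def strip_empty_anchors(html):
    """Return html with empty <a></a> pairs removed; links with content stay."""
    return EMPTY_ANCHOR.sub('', html)


# Assumption: in a recipe the same pattern could be listed like this,
# in place of remove_tags = [dict(name='a')].
preprocess_regexps = [(EMPTY_ANCHOR, lambda match: '')]
```

Note that this also removes empty named anchors such as <a name="top"></a>, so it would break any in-page links that point at them.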
John
P.S. I've attached the offending article as an example of the empty <a> tags. index.txt is the output after processing by the recipe, and problem.txt is the original HTML file.