12-18-2016, 05:32 AM   #12
slowsmile
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
@Doitsu... Interesting what you say about Kindle. Theirs is a proprietary format that is closely related to epub, with some peculiar quirks, much like iBooks' proprietary version of epub. You can get away with that if you are a mammoth company like those two.

Here's another piece of BeautifulSoup (BS) code for html that I've found very useful:

Spoiler:
Code:
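    # Remove all anchors, but preserve anchors with internet links
    # (http, https, mailto, or anything containing '@').
    for m in soup.findAll('a'):
        tag = str(m)  # serialized tag: attributes plus link text
        if not ('href="http:' in tag or
                'href="https:' in tag or
                'mailto:' in tag or
                '@' in tag):
            # drop the <a> wrapper but keep its contents
            m.replaceWithChildren()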
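
As a rough sketch of the same idea, the check can be made against the href attribute itself rather than the serialized tag, so that an '@' or 'http:' appearing only in the link text does not influence the decision. The names is_external_link, unwrap_internal_anchors and EXTERNAL_SCHEMES are made up for illustration, and soup is assumed to be a bs4 BeautifulSoup object. In bs4, find_all() and unwrap() are the current names for findAll() and replaceWithChildren(). Note one difference from the snippet above: a bare e-mail address without a mailto: prefix is not treated as a link here.

```python
from urllib.parse import urlsplit

# Schemes treated as "internet links". This set is an assumption that
# mirrors the http/https/mailto checks in the snippet above.
EXTERNAL_SCHEMES = {"http", "https", "mailto"}


def is_external_link(href):
    """Return True if href uses one of the external schemes."""
    if not href:
        # <a name="..."> or <a id="..."> targets carry no href at all
        return False
    return urlsplit(href.strip()).scheme.lower() in EXTERNAL_SCHEMES


def unwrap_internal_anchors(soup):
    """Unwrap every <a> that is not an external link, keeping its children."""
    # find_all() returns a plain list, so modifying the tree while
    # looping over the result is safe.
    for a in soup.find_all("a"):
        if not is_external_link(a.get("href")):
            a.unwrap()
    return soup
```

With bs4 installed, this would be called as unwrap_internal_anchors(BeautifulSoup(html, "html.parser")). Keeping the test in its own function makes it easy to adjust which schemes count as external.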


In my conversion plugin, I've also noticed significant differences between the ODF html produced by OpenOffice (OO) and LibreOffice (LO). One problem I had was clearing out the myriad FONT, FACE and SIZE declarations in these two different ODF html versions.

I used this code to remove all SIZE = 3 attributes from the html because they were causing problems. Notice that OO uses an integer while LO uses a numeric string for the size value.

Spoiler:
Code:
    # remove all 'size = 3' font declarations from OO or LO html
    for x in soup.findAll('font'):
        if x.has_attr('size'):
            if x['size'] == "3" or x['size'] == 3:
                x.replaceWithChildren()
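
A possible generalization, offered only as a sketch: normalize the attribute value through str() so that one comparison covers both the integer and the string form, and pass the size in as a parameter. The names has_size and unwrap_sized_fonts are hypothetical, and soup is again assumed to be a bs4 BeautifulSoup object. bs4 normally hands attribute values back as strings, so the str() call is mainly a safeguard.

```python
def has_size(tag, size=3):
    """Return True if tag has a size attribute equal to size (int or string)."""
    value = tag.get("size")
    # Compare as text so that 3, "3" and " 3 " are all treated alike.
    return value is not None and str(value).strip() == str(size)


def unwrap_sized_fonts(soup, size=3):
    """Unwrap every <font> whose size matches, keeping the enclosed content."""
    for font in soup.find_all("font"):
        if has_size(font, size):
            font.unwrap()
    return soup
```

Calling unwrap_sized_fonts(soup) removes the size-3 wrappers, while unwrap_sized_fonts(soup, size=2) would do the same for another value. Font tags with a different size, or with only FACE or COLOR attributes, are left alone.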


Both Tidy and BS have saved my bacon on many occasions. They are both remarkably useful and easy to use for processing html.

Last edited by slowsmile; 12-18-2016 at 05:39 AM.