05-22-2018, 07:25 AM | #16 |
creator of calibre
Posts: 43,778
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Code:
def preprocess_raw_html(self, raw_html, url):
    # Dump the HTML exactly as calibre received it, so it can be inspected
    with open('/path/to/tempfile.html', 'wb') as f:
        f.write(raw_html.encode('utf-8'))
    return raw_html
05-22-2018, 08:38 AM | #17 |
Big Poppa
Posts: 110
Karma: 10
Join Date: Jul 2010
Device: Nook
|
Yep, the raw HTML looks the same as it does in my browser. There's a multiline comment in the head tag, but the other six are just plain comments, generally with one space inside. So it seems to be a bug somewhere in beautifulsoup, which isn't parsing comments properly? (Or are multiline comments in head not to spec?)
Either way, this regex doesn't do the job. Where is the final HTML generated? Just in the epub, you mean?
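For anyone following along, here is a minimal sketch of stripping comments from the raw HTML before it reaches the parser. The helper name `strip_comments` is hypothetical, and this is not the regex from the recipe under discussion. It only illustrates one common pitfall: without `re.DOTALL`, `.` does not match newlines, so a multiline comment is left untouched.

```python
import re

# Hypothetical helper, not part of calibre. With re.DOTALL, '.' also matches
# newlines, so comments that span several lines are matched as well.
COMMENT_RE = re.compile(r'<!--.*?-->', re.DOTALL)


def strip_comments(raw_html):
    """Return raw_html with every <!-- ... --> comment removed."""
    return COMMENT_RE.sub('', raw_html)
```

In a recipe this could be called from `preprocess_raw_html` before returning `raw_html`. It is a blunt tool: it would also remove anything that merely looks like a comment inside a script or style block.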
05-22-2018, 09:08 AM | #18 |
Big Poppa
Posts: 110
Karma: 10
Join Date: Jul 2010
Device: Nook
|
Through trial and error, removing the head tag manually seems to fix it. Not sure if it's a bug or just bad HTML on NYT's part, but the multiline ASCII art is what kills it.
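A rough sketch of that workaround, assuming the head element can be dropped wholesale. The names `strip_head` and `RecipeSketch` are made up for illustration; in a real recipe the method would live on the recipe class itself.

```python
import re

# Matches the first <head ...> ... </head> block, case-insensitively and
# across newlines. '\b' keeps it from matching tags such as <header>.
HEAD_RE = re.compile(r'<head\b[^>]*>.*?</head>', re.DOTALL | re.IGNORECASE)


def strip_head(raw_html):
    """Return raw_html with its first <head> element removed."""
    return HEAD_RE.sub('', raw_html, count=1)


class RecipeSketch(object):
    # Stand-in for the recipe class; only the hook from this thread is shown.
    def preprocess_raw_html(self, raw_html, url):
        return strip_head(raw_html)
```

Note that this throws away the title and meta tags along with the comment, and it would cut too early if the text `</head>` ever appeared inside the comment itself.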
|