Thread: HTML5 parsing
08-09-2012, 01:08 AM   #8
kovidgoyal
creator of calibre
Posts: 45,378
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I can add that to calibre for the next release. Just to make sure I get it right, the patch needed is:

Code:
=== modified file 'src/calibre/ebooks/BeautifulSoup.py'
--- src/calibre/ebooks/BeautifulSoup.py 2010-04-17 16:37:28 +0000
+++ src/calibre/ebooks/BeautifulSoup.py 2012-08-09 05:06:42 +0000
@@ -1454,7 +1454,8 @@
     #According to the HTML standard, these block tags can contain
     #another tag of the same type. Furthermore, it's common
     #to actually use these tags this way.
-    NESTABLE_BLOCK_TAGS = ['blockquote', 'div', 'fieldset', 'ins', 'del']
+    NESTABLE_BLOCK_TAGS = ['blockquote', 'div', 'fieldset', 'ins', 'del',
+            'article', 'aside', 'header', 'footer', 'nav', 'figcaption', 'figure', 'section']
 
     #Lists can contain other lists, but there are restrictions.
     NESTABLE_LIST_TAGS = { 'ol' : [],
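
Why the tag list matters, roughly: BeautifulSoup 3 treats a tag that is not listed as nestable as implicitly closed when another tag of the same name opens. As a result, nested HTML5 `<section>` elements come out as siblings instead of parent and child. The sketch below is a toy model of that rule built on Python's standard `html.parser`. It is not calibre's or BeautifulSoup's actual code. `ToyTreeBuilder`, `outline`, `OLD_NESTABLE` and `NEW_NESTABLE` are hypothetical names, and only the two tag lists come from the patch above.

```python
from html.parser import HTMLParser

# Tag lists taken from the patch: before and after adding the HTML5 tags.
OLD_NESTABLE = {'blockquote', 'div', 'fieldset', 'ins', 'del'}
NEW_NESTABLE = OLD_NESTABLE | {'article', 'aside', 'header', 'footer',
                               'nav', 'figcaption', 'figure', 'section'}


class ToyTreeBuilder(HTMLParser):
    """Hypothetical, simplified model of BeautifulSoup 3's "smart pop" rule.

    Opening a non-nestable tag while a tag of the same name is still open
    implicitly closes the earlier one (and everything inside it), so the
    two end up as siblings. Other BS3 rules are deliberately ignored.
    """

    def __init__(self, nestable):
        super().__init__()
        self.nestable = nestable
        self.root = ('[root]', [])   # nodes are (tag_name, children) pairs
        self.stack = [self.root]     # currently open elements

    def _pop_to(self, tag):
        # Close the most recent open element named `tag`, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_starttag(self, tag, attrs):
        if tag not in self.nestable:
            self._pop_to(tag)        # non-nestable: no same-name nesting
        node = (tag, [])
        self.stack[-1][1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        self._pop_to(tag)


def outline(html, nestable):
    """Return (tag, depth) pairs for every element, in document order."""
    builder = ToyTreeBuilder(nestable)
    builder.feed(html)
    builder.close()
    result = []

    def walk(node, depth):
        for child in node[1]:
            result.append((child[0], depth))
            walk(child, depth + 1)

    walk(builder.root, 0)
    return result


html = '<section><h1>A</h1><section><h2>B</h2></section></section>'
print(outline(html, OLD_NESTABLE))
# [('section', 0), ('h1', 1), ('section', 0), ('h2', 1)]
print(outline(html, NEW_NESTABLE))
# [('section', 0), ('h1', 1), ('section', 1), ('h2', 2)]
```

With the old list the inner `section` lands at depth 0, as a sibling of the outer one. With the patched list it stays nested at depth 1. The real parser applies further rules (reset-nesting triggers, list and table handling) that this model leaves out.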