01-03-2023, 11:46 PM   #26
kovidgoyal
creator of calibre
The heuristics work using regexps, and these sometimes become very slow depending on the markup. In this case the culprit will be the detect_soft_breaks() function, whose regex is:
Code:
(?P<initline><(?P<outer>p|div)[^>]*>\s*(<(?P<inner1>font|span|[ibu])[^>]*>)?\s*(<(?P<inner2>font|span|[ibu])[^>]*>)?\s*(<(?P<inner3>font|span|[ibu])[^>]*>)?\s*\s*(?P<init_content>.*?)(</(?P=inner3)>)?\s*(</(?P=inner2)>)?\s*(</(?P=inner1)>)?\s*</(?P=outer)>)\s*<div[^>]*>\s*</div>\s*(?P<line_two><(?P<linetwo_ter>p|div)[^>]*>\s*(<(?P<linetwo_ner1>font|span|[ibu])[^>]*>)?\s*(<(?P<linetwo_ner2>font|span|[ibu])[^>]*>)?\s*(<(?P<linetwo_ner3>font|span|[ibu])[^>]*>)?\s*\s*(?P<line_two_content>.*?)(</(?P=linetwo_ner3)>)?\s*(</(?P=linetwo_ner2)>)?\s*(</(?P=linetwo_ner1)>)?\s*</(?P=linetwo_ter)>)
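
To see how the markup affects this, here is a rough Python sketch that compiles the regex above and times it. The helper names (find_soft_break, make_unbroken_paragraphs, time_search), the sample HTML, and the choice of no regex flags are all assumptions made for illustration; the flags calibre actually passes are not shown in this post. The likely slow case is a long run of paragraphs on a single line with no empty <div></div> between them: the lazy .*? in init_content can keep extending to the end of the line from every opening tag, so the search time should grow roughly quadratically with the length of that line.

```python
import re
import time

# Regex copied verbatim from the post; split across literals only for line length.
SOFT_BREAK_PATTERN = (
    r"(?P<initline><(?P<outer>p|div)[^>]*>\s*"
    r"(<(?P<inner1>font|span|[ibu])[^>]*>)?\s*"
    r"(<(?P<inner2>font|span|[ibu])[^>]*>)?\s*"
    r"(<(?P<inner3>font|span|[ibu])[^>]*>)?\s*\s*"
    r"(?P<init_content>.*?)"
    r"(</(?P=inner3)>)?\s*(</(?P=inner2)>)?\s*(</(?P=inner1)>)?\s*"
    r"</(?P=outer)>)"
    r"\s*<div[^>]*>\s*</div>\s*"
    r"(?P<line_two><(?P<linetwo_ter>p|div)[^>]*>\s*"
    r"(<(?P<linetwo_ner1>font|span|[ibu])[^>]*>)?\s*"
    r"(<(?P<linetwo_ner2>font|span|[ibu])[^>]*>)?\s*"
    r"(<(?P<linetwo_ner3>font|span|[ibu])[^>]*>)?\s*\s*"
    r"(?P<line_two_content>.*?)"
    r"(</(?P=linetwo_ner3)>)?\s*(</(?P=linetwo_ner2)>)?\s*(</(?P=linetwo_ner1)>)?\s*"
    r"</(?P=linetwo_ter)>)"
)

# Assumption: compiled without flags; calibre may use different ones.
SOFT_BREAK_RE = re.compile(SOFT_BREAK_PATTERN)


def find_soft_break(html):
    """Return the first soft-break candidate match in html, or None."""
    return SOFT_BREAK_RE.search(html)


def make_unbroken_paragraphs(count):
    """Hypothetical worst-case input: many <p> blocks on one line, no empty div."""
    return "".join("<p>some text %d</p>" % i for i in range(count))


def time_search(html):
    """Return the seconds taken by one search over html."""
    start = time.perf_counter()
    SOFT_BREAK_RE.search(html)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Doubling the paragraph count shows how the search time scales.
    for count in (50, 100, 200):
        print(count, time_search(make_unbroken_paragraphs(count)))
```

Since . does not match a newline unless DOTALL is set, the same paragraphs spread over separate lines give .*? far less room to backtrack, which is one reason the slowdown depends so heavily on how the source markup is laid out.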

Last edited by kovidgoyal; 01-03-2023 at 11:55 PM.