MobileRead Forums - View Single Post - KindleUnpack (MobiUnpack): Extracts text, images and metadata from Kindle/Mobi files

DiapDealer · 06-03-2012, 09:44 AM

I'm guessing it's some of my shoddy regex work at lines 70-73 of mobi_html.py.

I'm not entirely certain of Python's default lazy/greedy settings for its regex engine, but it seems to me that something similar to (line 71):

Code:

link_pattern = re.compile(r'''<a(.*?)filepos=['"]{0,1}0*(\d+)['"]{0,1}(.*?)>''', re.IGNORECASE)

and consequently (line 73):

Code:

srctext = link_pattern.sub(r'''<a\1href="#filepos\2"\3>''', srctext)

might do the trick (unless that id attribute itself needs additional massaging, that is).
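For anyone who wants to try the idea outside of mobi_html.py, here is a minimal standalone sketch. The two regex lines are the ones suggested above; the `fix_filepos_links` wrapper and the sample strings are made up purely for illustration and are not part of KindleUnpack. On the lazy/greedy question: Python's quantifiers are greedy by default, and adding `?` (as in `.*?`) makes them lazy, which is what keeps the first group from swallowing everything up to the last `filepos=` on the line.

```python
import re

# Pattern as proposed for line 71: lazy groups on either side of filepos,
# optional quotes, and leading zeros skipped before the captured offset.
link_pattern = re.compile(r'''<a(.*?)filepos=['"]{0,1}0*(\d+)['"]{0,1}(.*?)>''', re.IGNORECASE)


def fix_filepos_links(srctext):
    # Hypothetical helper wrapping the substitution proposed for line 73.
    # Attributes before (\1) and after (\3) filepos are preserved as-is.
    return link_pattern.sub(r'''<a\1href="#filepos\2"\3>''', srctext)


# Made-up sample: an anchor carrying an id attribute ahead of filepos.
sample = '<a id="note1" filepos=0000012345 >See note</a>'
fixed = fix_filepos_links(sample)
print(fixed)

# Greedy vs. lazy on a line holding two links: the greedy version of the
# first group runs on to the last filepos, the lazy version stops at the first.
two_links = '<a filepos=0000000010 >one</a> <a filepos=0000000020 >two</a>'
greedy = re.search(r'<a(.*)filepos=', two_links).group(1)
lazy = re.search(r'<a(.*?)filepos=', two_links).group(1)
print(repr(greedy))
print(repr(lazy))
```

Note that the replacement always writes a lowercase `<a`, even if the source tag was uppercase, since the pattern is case-insensitive but the replacement text is fixed.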

Spoiler:

Or blow it up... my regex-fu is not mighty.

Last edited by DiapDealer; 06-03-2012 at 09:50 AM.