MobileRead Forums - View Single Post - [GUI Plugin] Multi-Column Search

killo3967 · 11-07-2022, 12:22 PM

Hello

I am trying to extract the date on which the content of the book was created. This date is found in title.xhtml with this format:

<p class="ePUBfirma"><strong class="sans">Wolfman2408</strong> <code class="ePUBfecha sans">24.05.13</code></p>

I have used the following regular expression in a "TXT Query":

^(?!.*(\bfax\b|\bisbn\b|\blegal\b)).*((0?[1-9]|[123]\d)[-](0?[1-9]|1[012])[-]([1][9]|[2][0])?\d\d)|((0?[1-9]|[123]\d)[\/](0?[1-9]|1[012])[\/]([1][9]|[2][0])?\d\d)|((0?[1-9]|[123]\d)[\.](0?[1-9]|1[012])[\.]([1][9]|[2][0])?\d\d)$

The expression is this complex because I was getting false positives from numbers that follow the words 'isbn', 'fax' or 'legal deposit', and because the dates appear in several numeric formats with different separators.
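For comparison, here is a minimal sketch of the same idea in plain Python, outside the plugin. It is only an illustration under some assumptions: the names `DATE_RE`, `SKIP_RE` and `find_dates` are made up for this example, the case-insensitive matching of the excluded words is my own choice, and I do not know whether a "TXT Query" can be split into two steps like this. It uses one date pattern with a backreference, so the separator has to be the same in both places, and it filters out the 'fax', 'isbn' and 'legal' lines separately instead of using a lookahead.

```python
import re

# One date pattern: day, separator, month, the SAME separator again, year.
# The day and month classes mirror the ones in the expression above.
DATE_RE = re.compile(
    r"\b(0?[1-9]|[123]\d)"    # day
    r"([-/.])"                # separator, captured as group 2
    r"(0?[1-9]|1[012])"       # month
    r"\2"                     # backreference: same separator as before
    r"((?:19|20)?\d\d)\b"     # two- or four-digit year
)

# Words that mark a line as a source of false positives.
# Case-insensitive matching is an assumption, not taken from the original.
SKIP_RE = re.compile(r"\b(?:fax|isbn|legal)\b", re.IGNORECASE)


def find_dates(text):
    """Return date strings from lines that do not mention fax, isbn or legal."""
    found = []
    for line in text.splitlines():
        if SKIP_RE.search(line):
            continue  # skip the whole line, as the negative lookahead intends
        found.extend(m.group(0) for m in DATE_RE.finditer(line))
    return found
```

With the sample line from title.xhtml, `find_dates` should return only `24.05.13`, while a line such as `ISBN: 12-03-2010` is ignored.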

I have checked the regular expression in pythex:

https://pythex.org/?regex=%5E(%3F!.*...ll=1&verbose=0

And it works, inserting the date into a column called "generated".

The problem is that it takes almost 30 seconds per book, and when I select 40 or more books the program hangs.

Is this normal, or is it a fault in the regex or in the plugin?

Also, is there a way to format the output so that the date separators '.' and '-' are replaced with '/'?
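To show what I mean by replacing the separators, this is a small sketch in plain Python. The function name `normalize_separators` is hypothetical, and I do not know whether the plugin offers a way to apply something like this to the matched text.

```python
import re


def normalize_separators(date_text):
    """Replace '.' and '-' in an extracted date string with '/'."""
    return re.sub(r"[.-]", "/", date_text)
```

For example, `normalize_separators("24.05.13")` gives `24/05/13`, and a date that already uses '/' is left unchanged.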

Thank you
