Old 11-07-2022, 12:22 PM   #313
killo3967
TXT Query hangs on a big regex expression

Hello

I am trying to extract the date on which the content of the book was created. This date is found in title.xhtml in this format:

<p class="ePUBfirma"><strong class="sans">Wolfman2408</strong> <code class="ePUBfecha sans">24.05.13</code></p>

I have used the following regular expression in a "TXT Query":

^(?!.*(\bfax\b|\bisbn\b|\blegal\b)).*((0?[1-9]|[123]\d)[-](0?[1-9]|1[012])[-]([1][9]|[2][0])?\d\d)|((0?[1-9]|[123]\d)[\/](0?[1-9]|1[012])[\/]([1][9]|[2][0])?\d\d)|((0?[1-9]|[123]\d)[\.](0?[1-9]|1[012])[\.]([1][9]|[2][0])?\d\d)$

It is this complex for two reasons: I get false positives when numbers follow the words 'isbn', 'fax' or 'legal deposit', and the dates appear in several numeric formats with different separators ('-', '/' and '.').
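To make the intent concrete, here is a minimal Python sketch of what I am trying to achieve, run outside the plugin. It assumes the book text is available as plain text, one line at a time. The names `EXCLUDE`, `DATE` and `find_date` are made up for illustration, and the lookarounds `(?<!\d)` and `(?!\d)` are my own addition to avoid matching inside longer numbers. It is not the expression I use in the TXT Query and I have not tested whether it behaves any differently there.

```python
import re

# Lines that mention these words tend to contain numbers that look like dates.
EXCLUDE = re.compile(r"\b(?:fax|isbn|legal)\b", re.IGNORECASE)

# Day, separator, month, the SAME separator again (backreference \2),
# then a 2- or 4-digit year. The lookarounds keep the match from starting
# or ending in the middle of a longer run of digits.
DATE = re.compile(
    r"(?<!\d)(0?[1-9]|[123]\d)([-/.])(0?[1-9]|1[012])\2((?:19|20)?\d\d)(?!\d)"
)

def find_date(text):
    """Return the first date-like string on a line with no excluded word, else None."""
    for line in text.splitlines():
        if EXCLUDE.search(line):
            continue  # skip lines such as "ISBN: ..." or "Fax: ..."
        match = DATE.search(line)
        if match:
            return match.group(0)
    return None

sample = "ISBN: 12.05.99\nWolfman2408 24.05.13"
print(find_date(sample))
```

Checking the excluded words per line and using one separator class with a backreference replaces the three near-identical alternatives for '-', '/' and '.'.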

I have checked the regular expression in pythex:

https://pythex.org/?regex=%5E(%3F!.*...ll=1&verbose=0

It works there, and in the plugin the date is inserted into a column called "generated".

The problem is that it takes almost 30 seconds per book, and when I select 40 or more books the program hangs.

Is this normal, or is it a fault in the regex or in the plugin?

Is there a way to format the output so that the date separators '.' and '-' are replaced with '/'?
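For clarity, this is the transformation I mean, sketched in plain Python. The function name `normalize_separators` is hypothetical, and I do not know whether the plugin offers an equivalent option, so this only shows the desired result on an already extracted date string.

```python
import re

def normalize_separators(date_text):
    """Replace '.' and '-' with '/' in an already extracted date string."""
    # Inside a character class '.' is literal, and a trailing '-' is literal too.
    return re.sub(r"[.-]", "/", date_text)

print(normalize_separators("24.05.13"))  # prints 24/05/13
```

A date that already uses '/' passes through unchanged.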


Thank you