[GUI Plugin] Multi-Column Search - Page 21

igorius · 06-20-2022, 11:02 AM

Hello,
I installed MCS, and when I tried to use it, I found that in certain parts of MCS the font is so small that I am not able to read it.
See the attached pic...
Is there any way to change that?
Thanks very much!

DaltonST · 07-11-2022, 03:15 AM

Version 1.0.89 - 14 April 2022 Qt6 Compatibility. Minimum Calibre Version 5.99.8.

applegaa · 08-19-2022, 12:01 AM

In my calibre library, I have two columns I use in "Lookup/Search Name [1]". They are #lastupdated and #lastread. #lastupdated is populated by FanFicFare using metadata pulled from the web site (usually RoyalRoad or ScribbleHub). #lastread is set by me manually in Calibre. My first search criterion is that #lastupdated > #lastread. However, it doesn't always pick up all of the actual matches. For example, the story "Falling with Folded Wings" on RoyalRoad shows a #lastupdated date of "18 Aug 2022" and a #lastread date of "17 Aug 2022" but was not included in the search results. It only happens when those two dates are one day apart, and even then only rarely. I don't think this can be a problem with my column definitions, but I don't have a clue what's going on. Both columns are of type Date with matching formats.

Oh, I should also mention that "Lookup/Search Name [2]" simply checks that "#lastread > 0" so it only shows stories I've started reading. I also have a Final Filter to make sure that a Yes/No flag column is set to TRUE. Those are the only things I've set within MCS. MCS normally works on the stories that it occasionally misses so I don't think the problem is other MCS settings either.

- Andrew

DaltonST · 08-21-2022, 11:19 AM

MCS uses the raw database value of dates in YYYYMMDD format in universal time (GMT) for datetime #custom columns for comparison. That will not change.

Since you have this issue due to your local time zone, simply search using '<=' instead of '<', and afterwards sort the final result rows shown in the Library View of Calibre 'descending' by the appropriate column.

Standard Calibre searches in local time if that is important to you.

DaltonST
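A minimal sketch of the effect described above, assuming the comparison is made on the UTC calendar date as an integer in YYYYMMDD form. The UTC-5 zone, the times of day, and the helper name `utc_yyyymmdd` are hypothetical and chosen only to reproduce the one-day-apart case; this is not the MCS source code.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical local zone (UTC-5); any zone west of GMT shows the same effect.
LOCAL = timezone(timedelta(hours=-5))

def utc_yyyymmdd(local_dt):
    """Return the UTC calendar date of an aware datetime as an int, e.g. 20220818."""
    return int(local_dt.astimezone(timezone.utc).strftime("%Y%m%d"))

# 17 Aug 21:00 local time is already 18 Aug 02:00 in UTC.
last_read = datetime(2022, 8, 17, 21, 0, tzinfo=LOCAL)
# 18 Aug 09:00 local time is 18 Aug 14:00 in UTC.
last_updated = datetime(2022, 8, 18, 9, 0, tzinfo=LOCAL)

read_key = utc_yyyymmdd(last_read)
updated_key = utc_yyyymmdd(last_updated)

# The local dates differ by one day, but the UTC dates are equal,
# so a strict comparison misses the book and an inclusive one finds it.
strict_match = updated_key > read_key
inclusive_match = updated_key >= read_key
print(read_key, updated_key, strict_match, inclusive_match)
```

This also shows why the inclusive comparison produces false positives: every book whose two dates fall on the same UTC day matches, whether or not it was updated after it was read.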

applegaa · 08-21-2022, 06:08 PM

Thanks for the explanation. I don't think that '<=' is a good solution for me, as having false positives will happen more often than these false negatives, so I guess I'll just have them show up a day late on occasion. Thanks again for responding!

- Andrew

DaltonST · 10-30-2022, 04:30 PM

Version 1.0.90 - 30 October 2022 TXT Queries Tab: Add searching of full-text-search.db if a format of .txt does not already exist.

Many Fan Fiction users download their stories in TXT format, and also use the TXT Queries Tab option to update Custom Columns based on their Queries.

See the attached image.

Added: Examples of Custom Column "Types" that this Tab was designed to update based on their Queries are shown in another image below. So, any "textual" Custom Column may be used: Comma Separated (Like Tags); Ampersand Separated (Like Authors); Comments (Short Text or Long Text).

For simple textual values to be updated, and that do not need to be in the Tag Browser, choose "Comments Like Short Text" (e.g. a Title).

To test your regular expressions before you try to update anything, I recommend: https://pythex.org/

DaltonST

killo3967 · 11-04-2022, 04:04 PM

Hi.

First, thanks for this plugin.

I'm doing a "text search" looking for the "epub revision" text which is inside the book in the possible formats:

ePUB v1.1
ePub v1.1
epub v1.1
ePUB r1.1
ePub r1.1
epub r1.1
EPUB v1.1
EPub v1.1
Epub v1.1
EPUB r1.1
EPub r1.1
Epub r1.1

I use this regular expression:

(e|E)(PUB|pub|Pub)\s(v|r)(\d\.\d)

And it works fine.

But the problem is when I want to replace my custom column "#revision" with only the numbers and not all the text. I don't know how to extract capture group 4 (\d\.\d) and insert it instead of all the text.

Now I have "ePub r1.3" instead of "1.3".

Could you help me?

DaltonST · 11-04-2022, 05:02 PM

To test your regular expressions before you try to update anything, I recommend: https://pythex.org/

See the 2 images at the bottom of: https://www.mobileread.com/forums/sh...8&postcount=24

Read the ToolTips in the image of the MCS Tab.

In MCS in the TXT Query Tab, there are actually 3 different regexes possible (only #1 is required):

[#1] Your (e|E)(PUB|pub|Pub)\s(v|r)(\d\.\d) which is the TXT Query itself.

[#2] The "Filter Using Custom Column" regex, which you do NOT need here.

[#3] The "Update This Custom Column" regex, which looks at the text returned by your #1 above: "(e|E)(PUB|pub|Pub)\s(v|r)(\d\.\d)". So, #3 must take the text like "ePub r1.3" and extract only the "1.3".

Your #revision Custom Column must be textual, not numeric or any other Type. Refer to the 2nd image in https://www.mobileread.com/forums/sh...8&postcount=24

The answer to your question of "What regex #3 should I use?" is shown with its test case in an image below: [0-9.]+

Personally, I would not have used any capture groups since the MCS function looks at all of the returned text, and not any single group results. You cannot specify in MCS which group to use. All of it is always used, so your regex must not require a specific group be used. See the 3rd image below for a simpler regex to use instead of your grouped #1: "EPUB [rv][0-9.]+" using IGNORECASE.

The MCS TXT Query regular expression function always compiles with IGNORECASE. Makes things much simpler. The Python regex compile is:
re.escape("\\")
p = re.compile(re_string, re.IGNORECASE|re.DOTALL|re.MULTILINE)
match = p.search(s_string)

Note that since it uses MULTILINE and DOTALL, in some cases it would be necessary to specify where the selected text ends, such as with a trailing \s* . Search the web for: "regular expressions how to stop selection of characters at new-line".
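As a small illustration of the note above, assuming the same three flags shown in the compile call: with DOTALL, a dot also matches a newline, so a greedy `.*` runs to the end of the text unless the pattern says where to stop. The sample text is made up.

```python
import re

# Same flag combination as the compile call quoted in the post.
FLAGS = re.IGNORECASE | re.DOTALL | re.MULTILINE

# Made-up sample of book text.
text = "revision: ePub r1.3\nISBN 978-0-00-000000-0\n"

# With DOTALL, '.' matches '\n' too, so '.*' swallows everything that follows.
greedy = re.compile(r"epub.*", FLAGS).search(text).group(0)

# Excluding the newline explicitly stops the match at the end of the line.
bounded = re.compile(r"epub[^\n]*", FLAGS).search(text).group(0)

print(repr(greedy))
print(repr(bounded))
```

A negated character class such as `[^\n]*` is one way to bound the selection; a more specific pattern such as `[rv][0-9.]+` achieves the same thing by never using a bare dot.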

Also see the related image here: https://www.mobileread.com/forums/sh...21&postcount=2
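The two-step flow described in this post (regex #1 selects text, regex #3 is then applied to that selected text) can be sketched in plain Python as follows. The function name `two_stage` and the sample text are hypothetical, and this is only an approximation of the behaviour described here, not the plugin's actual code.

```python
import re

# Flags taken from the compile call quoted in the post.
FLAGS = re.IGNORECASE | re.DOTALL | re.MULTILINE

def two_stage(text, query_regex, update_regex):
    """Apply query_regex to text, then update_regex to the whole matched text.

    Capture groups are ignored on purpose: only the full match of each
    regex is used, mirroring the description in the post.
    """
    found = re.compile(query_regex, FLAGS).search(text)
    if found is None:
        return None
    value = re.compile(update_regex, FLAGS).search(found.group(0))
    return value.group(0) if value else None

# Made-up sample of book text.
book_text = "Title page\nePub r1.3\nChapter 1"

revision = two_stage(book_text, r"EPUB [rv][0-9.]+", r"[0-9.]+")
print(revision)
```

Because only the full match is used at each step, the narrowing has to come from the second regex rather than from a numbered group in the first.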

dunhill · 11-04-2022, 08:21 PM

Well we already have the revision number. With this plugin.
The translator and the original Title with the Js+ plugin
Now I'm struggling with the edit date
<span class="tdate">06-02-2018</span>

Thanks for these plugins and for sharing the experiences

DaltonST · 11-05-2022, 01:07 PM

Version 1.0.91 - 05 November 2022 TXT Queries Tab: improved ToolTips & miscellany.

DaltonST

killo3967 · 11-05-2022, 01:30 PM

Quote:

Originally Posted by DaltonST

To test your regular expressions before you try to update anything, I recommend: https://pythex.org/

See the 2 images at the bottom of: https://www.mobileread.com/forums/sh...8&postcount=24

Read the ToolTips in the image of the MCS Tab.

In MCS in the TXT Query Tab, there are actually 3 different regexes possible (only #1 is required):

[#1] Your (e|E)(PUB|pub|Pub)\s(v|r)(\d\.\d) which is the TXT Query itself.

[#2] The "Filter Using Custom Column" regex, which you do NOT need here.

[#3] The "Update This Custom Column" regex, which looks at the text returned by your #1 above: "(e|E)(PUB|pub|Pub)\s(v|r)(\d\.\d)". So, #3 must take the text like "ePub r1.3" and extract only the "1.3".

Your #revision Custom Column must be textual, not numeric or any other Type. Refer to the 2nd image in https://www.mobileread.com/forums/sh...8&postcount=24

The answer to your question of "What regex #3 should I use?" is shown with its test case in an image below: [0-9.]+

Personally, I would not have used any capture groups since the MCS function looks at all of the returned text, and not any single group results. You cannot specify in MCS which group to use. All of it is always used, so your regex must not require a specific group be used. See the 3rd image below for a simpler regex to use instead of your grouped #1: "EPUB [rv][0-9.]+" using IGNORECASE.

The MCS TXT Query regular expression function always compiles with IGNORECASE. Makes things much simpler. The Python regex compile is:
re.escape("\\")
p = re.compile(re_string, re.IGNORECASE|re.DOTALL|re.MULTILINE)
match = p.search(s_string)

Note that since it uses MULTILINE and DOTALL, in some cases it would be necessary to specify where the selected text ends, such as with a trailing \s* . Search the web for: "regular expressions how to stop selection of characters at new-line"

Thank you very much.
Your answer was very enlightening, and I can now see where my problem was with the regular expressions. I come from .NET, and in Python they are slightly different.

DaltonST · 11-05-2022, 03:56 PM

https://www.regular-expressions.info/dot.html

.NET regexes are often not compatible with Python, Perl, etc. due to greediness and end-of-line differences. Calibre uses the Python engine for regular expressions.

The article has some interesting tips.

DaltonST

killo3967 · 11-07-2022, 12:22 PM

Hello

I am trying to extract the date on which the content of the book was created. This date is found in title.xhtml with this format:

<p class="ePUBfirma"><strong class="sans">Wolfman2408</strong> <code class="ePUBfecha sans">24.05.13</code></p>

I have used the following regular expression in a "TXT Query"

^(?!.*(\bfax\b|\bisbn\b|\blegal\b)).*((0?[1-9]|[123]\d)[-](0?[1-9]|1[012])[-]([1][9]|[2][0])?\d\d)|((0?[1-9]|[123]\d)[\/](0?[1-9]|1[012])[\/]([1][9]|[2][0])?\d\d)|((0?[1-9]|[123]\d)[\.](0?[1-9]|1[012])[\.]([1][9]|[2][0])?\d\d)$

It is so complex, because there are false positives when it finds numbers after the 'isbn', 'fax' or 'legal deposit' expressions, as well as in the numerical formats and their separators.

I have checked the regular expression in pythex:

https://pythex.org/?regex=%5E(%3F!.*...ll=1&verbose=0

And it works, inserting the date in a column called "generated".

The problem is that it takes almost 30 seconds per book, and when I select 40 or more books the program hangs.

Is this normal, or is it a fault in the regex or in the plugin?

Is there a way to format the output to replace the date separators '.' and '-' with '/' ?

Thank you

DaltonST · 11-07-2022, 12:42 PM

The text in the FTS db column being searched is plain text, not HTML.

Your regex is highly complex, likely causing endless repetitive backtracking by the re module in Python. Hence, it will be slow.

Remember too that Python re's quantifiers are 'greedy' by default.

See the tooltips re: IGNORECASE, MULTILINE and DOTALL.

You are responsible for all formatting using standard Calibre tools.

DaltonST
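For comparison, here is a hedged standalone Python sketch of a cheaper approach to the same task: skip lines that mention fax, isbn or legal with a separate small regex, then look for a date with a pattern that has no leading `.*` to backtrack over. The names `DATE`, `SKIP` and `first_date` are hypothetical, the pattern is stricter than the original (days stop at 31 and both separators must match), and MCS itself only accepts a regex, so this illustrates the idea outside the plugin. Joining the parts with '/' is likewise plain Python, not an MCS feature.

```python
import re

# Day, separator, month, the same separator again, then a 2- or 4-digit year.
DATE = re.compile(
    r"\b(0?[1-9]|[12]\d|3[01])([-/.])(0?[1-9]|1[012])\2((?:19|20)?\d\d)\b"
)

# Lines mentioning these words are ignored to avoid false positives.
SKIP = re.compile(r"\b(?:fax|isbn|legal)\b", re.IGNORECASE)

def first_date(text):
    """Return the first date found as 'day/month/year', or None if there is none."""
    for line in text.splitlines():
        if SKIP.search(line):
            continue
        found = DATE.search(line)
        if found:
            # Groups 1, 3 and 4 are day, month and year; group 2 is the separator.
            return "/".join(found.group(1, 3, 4))
    return None

# Made-up sample of book text.
sample = "ISBN 84.01.33\nWolfman2408 24.05.13\n"
print(first_date(sample))
print(first_date("edited 06-02-2018"))
```

Working line by line keeps each search short, which avoids the repeated rescanning that a leading `^(?!.*...).*` forces when DOTALL lets the dot cross line boundaries.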

DaltonST · 11-08-2022, 10:33 AM

Version 1.0.92 - 08 November 2022 TXT Queries Tab: improved ToolTips; improved GUI responsiveness while executing searches with highly complex regular expressions and long books.

DaltonST

