View Single Post
Old 06-13-2015, 11:17 AM   #1
DaltonST
Deviser
DaltonST is the One.DaltonST is the One.DaltonST is the One.DaltonST is the One.DaltonST is the One.DaltonST is the One.DaltonST is the One.DaltonST is the One.DaltonST is the One.DaltonST is the One.DaltonST is the One.
 
DaltonST's Avatar
 
Posts: 1,479
Karma: 101298
Join Date: Aug 2013
Location: Texas
Device: 8" Win10 Tablet w/Calibre64
[GUI Plugin] Multi-Column Search

[GUI Plugin] Multi-Column Search


Summary:

Advanced Search Tools: Inter-Book; Intra-Book; Cross-Library; Full-Text; Word-by-Book Index; Native SQL Queries; Cross-Library Duplicate Books Search; and more.

Search Across Multiple Columns Interactively, Collectively and Simultaneously. Designed to provide capabilities not found in standard Calibre, such as comparing multiple columns directly, plus searching across Libraries. Performs both 'Intra-Book' and 'Inter-Book' searches. Performs additional specialized searches. Considers the text of TXT formats as a 'virtual column', so performs 'full-text' searches of that text. Performs 'word-book index' searches. Special "Search Comments" results viewer. Cross-Library Duplicate Books Search Using Custom Columns and/or Standard Columns. Metadata Variations. Duplicate values for any Custom Column.

Definitions, Usage, Examples, & Scope:
Spoiler:
  • 'Intra-Book' Search: Compare the columns within a single book.
  • 'Inter-Book' Search: Compare the columns of each book to the same columns in all other books in the same Library.

Usage: MCS is designed to do things that Standard Calibre cannot do. If Standard Calibre can do it, then use Standard Calibre.

Example: Intra-Book query: (#int_cc1 > #int_cc2) AND (title CONTAINS series) in the Current Library. Also show all of the books by all of the Authors who are represented in the query results.

Example: Intra-Book query: (#int_cc1 > #int_cc2) AND (title CONTAINS series) in Another Library that is not the Current Library.

Example: Inter-Book query: Find all books that have the same Author and the same value in a particular Custom Column without having to specify either the Author or the value of the Custom Column. Let MCS do all of the work by comparing each book to every other book in the Current Library. Also show all of the books by all of the Authors who are represented in the query results.

Scope:
Intra-Book queries: The 8 Standard Columns listed below plus any and all Custom Columns are supported:
Spoiler:
  • custom columns (Any) (#........)
  • authors (but 'author' will work too)
  • title
  • series
  • tags
  • publishers
  • comments
  • pubdate (but 'published' will work too)
  • path



Scope: Inter-Book queries: The 3 Standard Columns listed below plus any and all Custom Columns are supported:
Spoiler:

  • custom columns (Any) (#........)
  • authors (but 'author' will work too)
  • title
  • series




Duplicate Custom Column Values :
Spoiler:
See the attached example image of a Raw SQL Query crafted to find Custom Column values that are duplicates. The basic SQL query (that you must customize for your own Custom Column names) is:
Code:
SELECT book  FROM books_custom_column_27_link 
 WHERE value IN
 ( SELECT value 
 FROM books_custom_column_27_link 
 GROUP BY value 
 HAVING COUNT(*) > 1 )




Full Text Search & Filename Search Queries Comments:



Metadata Variation Queries:
Spoiler:

See the attached example image of a Raw SQL Query crafted to find Author metadata variations. This example can easily be changed for other metadata by modifying the basic Query, shown below.
Code:
SELECT book FROM _mcs_authors_by_book,authors WHERE authorname <> authors.name 
 AND SIMILARTO(authorname,authors.name) > 0.85
Caution: if you have a large library (e.g. 50,000 books) with many Authors (e.g. 15,000), then 15,000 Authors will have to be checked against every other Author (14,999) for every book (50,000). That is (15,000 times 14,999 times 50,000) checks. That will take a very long time unless you add some SQL to limit the number of books selected per search. You could limit by one or more metadata attributes to do so. The limitation should be placed immediately after the WHERE keyword to increase performance.



Book Awards Table:
Spoiler:

If you import the attached '_mcs_book_awards' table into your metadata.db, you will be able to use the SQL Query Tab of MCS to identify matching books. You then can then, for example, add a Tag to those books with the name of the Book Award.

I recommend using the Firefox SQL Manager to import the SQL file that is attached, which will also create the table.

There is also a CSV file if you prefer to use that, although you will have to manually create the table with this SQL:

Quote:
CREATE TABLE "_mcs_book_awards" (id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL , author TEXT NOT NULL , title TEXT NOT NULL , award TEXT NOT NULL, UNIQUE(author,title,award) )
Example SQL to use within the SQL Query Tab of MCS:

Quote:
SELECT book FROM _mcs_authors_by_book WHERE authorname IN (SELECT author FROM _mcs_book_awards) AND book IN (SELECT id FROM books WHERE id = book AND title IN (SELECT title FROM _mcs_book_awards WHERE award LIKE '%%' ))
The '%%' will select all books in your library matching the table. To narrow the search to, for example, the Hugo Award, simply change the '%%' to '%hugo%' in order to select all books that have been awarded the Hugo Award. Then, you can add a Tag of "Hugo Award" if you wish.

Obviously, your Author Names and Book Titles have to exactly match the table, which was populated from Wikipedia, GoodReads, and other sources. For that reason, you should probably use this SQL that is a little slower than the above, although optimized for speed:
Quote:
SELECT book FROM _mcs_authors_by_book,_mcs_book_awards WHERE award LIKE '%hugo%' AND substr(authorname,1,3) = substr(author,1,3) AND SIMILARTO(authorname,author) > 0.80 AND _mcs_authors_by_book.book IN ( SELECT id FROM books WHERE books.id = _mcs_authors_by_book.book AND SIMILARTO(books.title,_mcs_book_awards.title) > 0.80)
The "SIMILARTO" function used above in "SIMILARTO(authorname,author) > 0.80" is custom to MCS, and you will not find it in the SQLite documentation. It only works for textual columns, obviously. Also, table _mcs_authors_by_book is an MCS Search Accelerator table. Its columns are 'book' and 'authorname'.

n.b. There is also an MCS Search Accelerator table for Tags, _mcs_tags_by_book, which you can use for other purposes. Its columns are 'book' and 'tagname'.



Instructions:
Refer to the attached User Guide.

Requires Minimum Calibre Version: 2.30

Internal Name: MultiColumnSearch

OS Temporary Files:
Spoiler:

If you have personalized your "temporary" directories, such as using a RAMDisk for your temporary files, and/or have set CALIBRE_TMP_DIR, you must set certain standard environment variables such that Python finds and uses the one that you want Python to find and use. From the Python documentation:

Spoiler:

Quote:
The default directory is chosen from a platform-dependent list, but the user of the application can control the directory location by setting the TMPDIR, TEMP or TMP environment variables

Python searches a standard list of directories to find one which the calling user can create files in. The list is:

  • The directory named by the TMPDIR environment variable.
  • The directory named by the TEMP environment variable.
  • The directory named by the TMP environment variable.
  • A platform-specific location:
    On Windows, the directories C:\TEMP, C:\TMP, \TEMP, and \TMP, in that order.
  • On all other platforms, the directories /tmp, /var/tmp, and /usr/tmp, in that order.
  • As a last resort, the current working directory.


Use of MCS with Other Plug-ins:
Spoiler:
  • If you use a Custom Column for "Original Title" or "Translator", and MCS for Inter-Book Searches for one of those Custom Columns, then you might be interested in my Job Spy/JS+ GUI Tool "Extract Original Title/Translator to Update Custom Columns (EPUBs)".
  • If you use my CALM - Consolidate All Library Metadata plug-in, I suggest that you might also benefit from installing this MCS plug-in so you may directly compare any and all of the Standard and Custom Columns consolidated into the Target Library, which Standard Calibre cannot do.
  • If you use my QuarantineAndScrub add-on, I suggest that you might also benefit from installing this MCS plug-in so you may directly compare the #work_... columns to the "real" columns, which Standard Calibre cannot do.



Version History:
Spoiler:
Version 1.0.74 - 13 October 2017 Full-Text Search enhancements for Tag-like Custom Columns to be updated by merging existing values with the new final search value.
Version 1.0.73 - 12 October 2017 Configuration preferences change.
Version 1.0.72 - 11 October 2017 Major Full-Text Search Enhancements. Option to additionally compare the initial search results to the value in a specified Custom Column, optionally using a Regular Expression to filter the initial search results. Another new option to update a Custom Column using the initial search results after further refining the search results text to be updated by using a Regular expression. These options may be used separately or simultaneously. Refer to the ToolTips MCS.

Version History-Archive:
Spoiler:

Version 1.0.71 - 05 May 2017 Allow themes with user-defined icons.
Version 1.0.70 - 13 April 2017 Changes to way null standard column values are handled in CL queries
Version 1.0.69 - 25 December 2016 Ignore corrupt UTF8 text in Cross-Library Duplicate Books Search
Version 1.0.68 - 03 August 2016 Cross-Library Duplicate Books Search with Current Library using Custom Columns and/or Standard Columns
Version 1.0.67 - 30 July 2016 Allow () in a library path.
Version 1.0.66 - 29 July 2016 Cross-Library Duplicate Books Search Using Custom Columns and/or Standard Columns
Version 1.0.64 - 07 June 2016 Miscellany.
Version 1.0.63 - 02 June 2016 Final Filters: Added a new operator, "REGEX".
Version 1.0.62 - 22 May 2016 Miscellany.
Version 1.0.61 - 16 May 2016 Miscellany.
Version 1.0.60 - 15 May 2016 Changes for Favourites plug-in.
Version 1.0.59 - 06 May 2016 New special results dialog for Intra-Book searches of Comments for a specific string.
Version 1.0.58 - 11 March 2016 SQL Tab user-defined function added: SIMILARTO.
Version 1.0.57 - 08 March 2016 Miscellaneous tweaks.
Version 1.0.56 - 07 March 2016 "Special Queries" enhanced to include more simultaneous comparisons.
Version 1.0.55 - 05 March 2016 Word-by-Book Index Queries: new 'NOT' operator, '!', to use at the beginning of a word when the 'bar' ('|') means 'AND' not 'OR'.

Version 1.0.54 - 04 March 2016 Miscellaneous tweaks.
Version 1.0.53 - 03 March 2016 Miscellaneous tweaks.
Version 1.0.52 - 02 March 2016 Miscellaneous tweaks.
Version 1.0.51 - 02 March 2016 Technical changes when switching libraries to ensure the Word-Book Index gets created in the 'new' library.
Version 1.0.50 - 01 March 2016 New: Word Queries using a book-word index; Revised User Instructions.
Version 1.0.49 - 20 February 2016 User Instructions revised and expanded.
Version 1.0.48 - 19 February 2016 Miscellaneous enhancements.
Version 1.0.47 - 18 February 2016 Miscellaneous enhancements.
Version 1.0.46 - 18 February 2016 Miscellaneous enhancements.
Version 1.0.45 - 17 February 2016 New: Full-Text Search of Books with TXT Formats using Regular Expressions.
Version 1.0.44 - 16 February 2016 New: Raw SELECT SQL Queries can now use the SQLite Regular Expression keyword REGEXP.
Version 1.0.43 - 12 February 2016 New: Raw SELECT SQL Query Tab
Version 1.0.42 - 02 February 2016 Special Queries now supports Series-Like Custom Columns in addition to all Text Custom Columns.
Version 1.0.41 - 01 February 2016 Minimize Button added to MCS dialog.
Version 1.0.40 - 28 January 2016 Special Queries now include as choices all Custom Columns with a datatype of "text".
Version 1.0.39 - 28 January 2016 Special Query execution now confirms that Final Filters passed validation the last time they were saved; otherwise, execution is canceled.
Version 1.0.38 - 27 January 2016 Enhancements to Final Filter validations and ToolTips.
Version 1.0.37 - 27 January 2016 Miscellaneous changes.
Version 1.0.36 - 26 January 2016 New: 'Special Queries' Tab (refer to the ToolTips for details); Bug Fix in Final Filters evaluation logic.
Version 1.0.35 - 20 January 2016 Miscellaneous Final Filter logic improvements.
Version 1.0.34 - 11 January 2016 New Search Option: 'Final Filters'. This will apply a maximum of 10 additional filters at the very end to all books that would otherwise have been displayed.
Version 1.0.33 - 5 January 2016 New Search Option: 'Show All Authors' Books'. This will also mark and display all books from the authors the search has otherwise found.
Version 1.0.32 - 3 January 2016 Improved Cancellation of Inter-Book Searches.
Version 1.0.31 - 29 December 2015 Autocompletion of Standard Column Names and Custom Column Names.
Version 1.0.30 - 28 December 2015 Technical tweaks.
Version 1.0.29 - 13 December 2015 New: MCS Search Accelerator Functionality for search criteria involving Tags. This functionality is automatic and routine. The user has to do nothing to use it.
Version 1.0.28 - 12 December 2015 New: MCS Search Accelerator Functionality for search criteria involving Authors. This functionality is automatic and routine. The user has to do nothing special to use it.
Version 1.0.27 - 12 December 2015 Inter-Book Search regression error fixed.
Version 1.0.26 - 12 December 2015 Major performance enhancements, especially if your queries do NOT need to analyze Authors or Tags. That is because a single book may have many Authors and many Tags.
Version 1.0.25 - 11 December 2015 New: Progress Indicator and ability to Cancel an MCS Search when the number of selected books > 149. Performance enhancements.
Version 1.0.24 - 10 December 2015 New: Double-Metaphone 'Sounds-Like' Fuzzy Equality Comparison Function.
Version 1.0.23 - 07 December 2015 User Instructions Revised.
Version 1.0.22 - 06 December 2015 Additional New 'Fuzzy Logic' Comparison Functions. Refer to the attached examples and also the ToolTips for an explanation.
Version 1.0.21 - 05 December 2015 New 'Fuzzy Logic' Comparison Functions. Refer to the attached examples and also the ToolTips for an explanation.
Version 1.0.20 - 29 November 2015 New 'Comparison Transform Function': 'Compare as: Decomposed & Normalized Alphabet'. This allows searching for non-English metadata that may or may not have been mangled into something English-like. Basically, it changes complex letters into simple letters, removes diacritics, removes spaces, and removes punctuation.
Version 1.0.19 - 27 November 2015 Addition of 'Comparison Transform Functions' that are applied to the values to be compared prior to their comparison.
Version 1.0.18 - 09 November 2015 Major enhancement to support the Post-Search updating of a single Custom Column for books that are both 'Marked' and 'Selected'.
Version 1.0.17 - 06 November 2015 Technical tweaks.
Version 1.0.16 - 05 November 2015 Major enhancement to support Inter-Book queries. Prior versions implicitly supported only Intra-Book queries.
Version 1.0.15 - 21 September 2015 Technical tweak.
Version 1.0.14 - 3 September 2015 New: ToolTips for key fields; Improved Error Messages when an invalid column choice is made.
Version 1.0.13 - 31 August 2015 Technical tweaks.
Version 1.0.12 - 24 June 2015 New: Cross-Library Search Results can be copied to the Clipboard as tab-delimited text for pasting "special" into a spreadsheet or normal pasting into any text document.
Version 1.0.11 - 23 June 2015 Various Enhancements; Revised User Guide.Version 1.0.10 - 23 June 2015 Cross-Library Search Results enhanced.
Version 1.0.9 - 21 June 2015 New: Regular Expressions; Revised User Guide.
Version 1.0.8 - 19 June 2015 New: Cross-Library Searches; Revised User Guide.
Version 1.0.7 - 17 June 2015 Regression error fixed.
Version 1.0.6 - 16 June 2015 Revised User Guide; MCS now validates User Criteria, and auto-corrects it if possible.
Version 1.0.5 - 16 June 2015 Many tweaks.
Version 1.0.4 - 15 June 2015 Added a "Swap Criteria" Button.
Version 1.0.3 - 15 June 2015 Improved handling of blanks, integers, floats and datetime values manually specified as Search Text.
Version 1.0.2 - 14 June 2015 Miscellany
Version 1.0.1 - 13 June 2015 Technical Changes
Version 1.0.0 - 13 June 2015 Initial Release

Attached Thumbnails
Click image for larger version

Name:	mcs_decomposed_and_normalized_alphabet_example.jpg
Views:	1532
Size:	912.9 KB
ID:	144171   Click image for larger version

Name:	mcs_fuzzy_equality_token_set_ratio_example.JPG
Views:	1135
Size:	274.1 KB
ID:	144328   Click image for larger version

Name:	mcs_cross_library_search_using_fuzzy_equality_function_example.JPG
Views:	930
Size:	564.3 KB
ID:	144374   Click image for larger version

Name:	mcs_doublemetaphone_sounds_like_example.jpg
Views:	834
Size:	263.2 KB
ID:	144492   Click image for larger version

Name:	mcs_raw_select_sql_tab.JPG
Views:	746
Size:	279.1 KB
ID:	146256   Click image for larger version

Name:	mcs_word_book_index_queries_tab.JPG
Views:	639
Size:	130.9 KB
ID:	146810   Click image for larger version

Name:	_mcs_book_awards_table.jpg
Views:	624
Size:	429.3 KB
ID:	147039   Click image for larger version

Name:	mcs_sql_query_template_for_book_awards.jpg
Views:	563
Size:	172.8 KB
ID:	147040   Click image for larger version

Name:	mcs_comments_results_viewer.jpg
Views:	509
Size:	735.2 KB
ID:	150311   Click image for larger version

Name:	AuthorMetadataVariationsExampleUsing MCS.jpg
Views:	366
Size:	438.0 KB
ID:	154795   Click image for larger version

Name:	mcs_sql_search_duplicates_in_single_custom_column.jpg
Views:	370
Size:	1.01 MB
ID:	159218   Click image for larger version

Name:	mcs_full_text_search_using_optional_filtering_and_cc_updating.jpg
Views:	317
Size:	1.48 MB
ID:	159405  
Attached Files
File Type: pdf mcs_instructions.pdf (2.65 MB, 644 views)
File Type: txt mcs_word_book_index_example_csv_files_rename_to_zip_to_extract.txt (17.4 KB, 423 views)
File Type: txt mcs_regular_expression_that_emulates_the_NEAR_operator_for_full_text_search_queries.txt (140 Bytes, 430 views)
File Type: txt _mcs_book_awards_csv.txt (265.2 KB, 785 views)
File Type: txt _mcs_book_awards_sql.txt (465.1 KB, 622 views)
File Type: zip multicolumnsearch.zip (2.76 MB, 29994 views)

Last edited by DaltonST; 10-13-2017 at 01:16 PM. Reason: Version 1.0.74
DaltonST is offline   Reply With Quote