#1 |
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2015
Device: none
|
Automated tag association
Hi, I have a bunch of (over 500) text documents that I've imported to the Calibre library, but they don't have any metadata associated with them (they are plain .txt files).
I would like to assign tags to documents based on their content. Basically if a document talks about bridges it is given a bridge tag, if it talks about roadways it is given a roadways tag, and if it talks about bridges and roadways it is given both tags. How would I do this in Calibre? |
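The rule described above (one tag per topic, both tags when both topics appear) can be sketched as a plain keyword match over the text of each file. This is only an illustration, not something Calibre provides: the `TAG_KEYWORDS` mapping and the `tags_for_text` helper are hypothetical names, and the keyword lists are assumptions you would adapt to your own documents.

```python
import re

# Hypothetical mapping: tag name -> keywords that trigger it (adapt to taste).
TAG_KEYWORDS = {
    "bridge": ["bridge", "bridges"],
    "roadways": ["roadway", "roadways"],
}

def tags_for_text(text, tag_keywords=TAG_KEYWORDS):
    """Return the sorted tags whose keywords appear as whole words in text."""
    # Lower-case the text and collect its distinct alphabetic words.
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(
        tag
        for tag, keywords in tag_keywords.items()
        if any(keyword in words for keyword in keywords)
    )
```

Matching whole words avoids tagging a document just because a keyword appears inside a longer word, but it is still purely literal: it says nothing about whether the document is actually about the topic.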
#2 | |
null operator (he/him)
Posts: 21,724
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
You could try downloading the metadata from one or more of the metadata source sites (Amazon, B&N, Goodreads, etc.), but that probably only works for commercial publications.
To do it manually, enter the tags in Metadata Edit (press E), with commas between tags, e.g. 'bridges, roadways'. To the left of the tag field in Metadata Edit there's a button; click it and you get a specialised Tag Editor that makes it easy to select previously defined tags, which helps avoid ending up with both 'roads' and 'roadways'.
You can also edit tags directly in the book list by highlighting the cell and pressing F2, or press Shift+F2 on a Tags cell to open the Tag Editor.
BR
#3 |
Deviser
Posts: 2,265
Karma: 2090983
Join Date: Aug 2013
Location: Texas
Device: none
|
QuarantineAndScrub
The subject add-on has a Tags By Comments capability. Peruse its user guide for more info.
DaltonST |
#4 |
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2015
Device: none
|
Hello,
BetterRed, these aren't commercial publications, unfortunately. I may have to go the manual route if I can't work out DaltonST's plugin.
DaltonST, thanks, I'll take a look at that and post back my results.
#5 | |
null operator (he/him)
Posts: 21,724
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
BR |
#6 |
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2015
Device: none
|
hello, they're not sorted by subject. right now the directory listing is like
Documents
+Roadkill
|`Roadkill.txt
+Fezfez
|`Fezfez.txt
+Apricots and Bonds
 `Apricots and Bonds.txt
DaltonST, I've read through the manual for Q&S and I see how it works with pre-existing metadata (title, tags, etc), but how do I have it use the contents of the file instead of the metadata as indicated in the manual?
#7 |
null operator (he/him)
Posts: 21,724
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Looks like you'll have to do the tagging based on your knowledge of the contents.
When I first created my main library it was with about 8000 'texts', and like you I had no downloadable metadata sources. I worked on the tagging progressively over a couple of months and ended up with about 30 tags.
BR
#8 |
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2015
Device: none
|
alright, thanks BetterRed, if I worked at your rate I'd have these done in about a week. Do you know anything about the aforementioned addon Quarantine & Scrub?
#9 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Quarantine&Scrub has a long and complex user guide. It seems to have niche appeal and TBH I am not sure how many people understand it.
Last edited by eschwartz; 06-18-2015 at 06:16 PM. |
#10 | |
null operator (he/him)
Posts: 21,724
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
I have reservations about whether that approach would work on contents without contextual analysis: Kerouac's On the Road ain't about "Roads". My initial inclination is that it would require human intervention.
But here's a patent aimed at automation; its citations might lead to some implementations:
Patent US6199081 - Automatic tagging of documents and exclusion by content
And here's an interesting PDF paper from a taxonomy consultant:
Taxonomies for Auto-Tagging Unstructured Content
They might inspire someone to write something.
BR
#11 | ||
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2015
Device: none
|
Quote:
I'm new here, but I'm already disappointed that a developer would suggest a plugin that doesn't do what he said.
Quote:
That patent seems to be outside the scope of my needs. I'm not using a network to store these files. The Hedden document is a slideshow presentation of talking points, not something I could use.
#12 | ||
null operator (he/him)
Posts: 21,724
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
I've no idea if DaltonST had this in mind when he suggested you take a look at his PI. If you look through the version history of the PI you'll see there have been many enhancements, many of which stemmed from posts such as yours.
Quote:
Calibre (GUI and command line) and its PIs provide 'canned' solutions to many problems. But they also provide a rich set of tools which, with a bit of lateral thinking, enable the user to fashion their own solutions.
BR
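As one illustration of the command-line route, a script could work out tags for a book and hand them to `calibredb`. The sketch below only builds the argument list and does not run it. It assumes `calibredb set_metadata` accepts a `--field name:value` option and a comma-separated value for `tags`, as described in the calibredb documentation; the helper name `set_tags_command` and the book id `12` are hypothetical, so check the command against your Calibre version before using it on a real library.

```python
import shlex

def set_tags_command(book_id, tags):
    """Build (but do not run) a calibredb call that sets a book's tags.

    Assumes `calibredb set_metadata --field tags:<comma-separated> <id>`.
    """
    field = "tags:" + ",".join(tags)
    return ["calibredb", "set_metadata", "--field", field, str(book_id)]

def as_shell_line(command):
    """Render the argument list as a single, safely quoted shell line."""
    return shlex.join(command)
```

For example, `as_shell_line(set_tags_command(12, ["bridge", "roadways"]))` gives a line that can be reviewed, or collected into a script, before anything touches the library. Keeping the command as a list also means it could be passed straight to `subprocess.run` without going through a shell.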
#13 |
Deviser
Posts: 2,265
Karma: 2090983
Join Date: Aug 2013
Location: Texas
Device: none
|
@jon_joy_1999:
If you haven't finished manually creating Tags within Calibre for your 500 text files, this might help: it creates Comments and Tags automatically from a list of the 'Top N Nouns' in each text file, sorted by frequency in descending order.
[GUI Plugin] English Noun Frequency: https://www.mobileread.com/forums/sho...d.php?t=263684
A typical example for a Factual/Non-fiction book is attached just below.
DaltonST
Last edited by DaltonST; 08-06-2015 at 09:53 AM.
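To give a feel for the 'Top N by frequency' idea, here is a rough sketch that ranks the most frequent words in a text. It is not the plugin's method: it does no noun detection at all, and instead drops a small, hypothetical stop-word list and very short words. The names `STOPWORDS` and `top_words` are made up for this example.

```python
import re
from collections import Counter

# Small, hypothetical stop-word list; a real one would be much longer.
STOPWORDS = {
    "the", "and", "for", "that", "this", "with", "are", "was",
    "but", "not", "from", "have", "has", "its", "into",
}

def top_words(text, n=5, stopwords=STOPWORDS):
    """Return the n most frequent words, ignoring stop words and short words."""
    words = [
        word
        for word in re.findall(r"[a-z]+", text.lower())
        if len(word) > 2 and word not in stopwords
    ]
    # most_common sorts by count in descending order.
    return [word for word, _count in Counter(words).most_common(n)]
```

The resulting list could then be reviewed by hand and trimmed to a small set of tags, since raw frequency alone will surface plenty of words that make poor tags.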