Old 06-15-2015, 07:14 PM   #1
jon_joy_1999
Junior Member
jon_joy_1999 began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Jun 2015
Device: none
Automated tag association

Hi, I have a bunch of (over 500) text documents that I've imported to the Calibre library, but they don't have any metadata associated with them (they are plain .txt files).
I would like to assign tags to documents based on their content. Basically if a document talks about bridges it is given a bridge tag, if it talks about roadways it is given a roadways tag, and if it talks about bridges and roadways it is given both tags.
How would I do this in Calibre?
Old 06-15-2015, 09:54 PM   #2
BetterRed
null operator
BetterRed ought to be getting tired of karma fortunes by now.
 
Posts: 17,525
Karma: 20473555
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by jon_joy_1999 View Post
Hi, I have a bunch of (over 500) text documents that I've imported to the Calibre library, but they don't have any metadata associated with them (they are plain .txt files).
I would like to assign tags to documents based on their content. Basically if a document talks about bridges it is given a bridge tag, if it talks about roadways it is given a roadways tag, and if it talks about bridges and roadways it is given both tags.
How would I do this in Calibre?
@jon_joy_1999 - As far as I know, there's no automatic way to do this based on analysis of the book contents.

You could try downloading the metadata from one or more of the metadata source sites (Amazon, B&N, Goodreads, etc.), but that probably only works for commercial publications.

To do it manually, enter the tags in Metadata Edit (press E), with commas between tags, e.g. 'bridges, roadways'. To the left of the tag field in Metadata Edit there's a button; if you click it, you get a specialised Tag Editor that makes it easy to select previously defined tags, which helps avoid ending up with both 'roads' and 'roadways'.

You can also edit tags directly in the book list by highlighting the cell and pressing F2, or press Shift+F2 on a Tags cell to get the Tag Editor.

BR
Old 06-15-2015, 11:17 PM   #3
DaltonST
Deviser
DaltonST ought to be getting tired of karma fortunes by now.
 
Posts: 1,880
Karma: 1804640
Join Date: Aug 2013
Location: Texas
Device: 10" Win10 Tablet w/Calibre64, CalibreSpy & Freda+
QuarantineAndScrub

The subject add-on has a Tags By Comments capability. Peruse its user guide for more info.

DaltonST
Old 06-16-2015, 12:31 PM   #4
jon_joy_1999
Hello,
BetterRed, these aren't commercial publications unfortunately. I may have to go the manual route if I can't work out DaltonST's plugin

DaltonST, thanks, I'll take a look at that and post back my results.
Old 06-16-2015, 04:24 PM   #5
BetterRed
Quote:
Originally Posted by jon_joy_1999 View Post
Hello,
BetterRed, these aren't commercial publications unfortunately. I may have to go the manual route if I can't work out DaltonST's plugin
@jon_joy_1999 - I had a feeling that was the case. If it's not too late, and you had the originals organised by subject (e.g. in separate directories), you could re-add them in batches and make use of what is shown in the attached image.

[Attached image: Capture.JPG]

BR
Old 06-16-2015, 09:19 PM   #6
jon_joy_1999
Hello, they're not sorted by subject. Right now the directory listing looks like this:

Documents
+Roadkill
|`Roadkill.txt
+Fezfez
|`Fezfez.txt
+Apricots and Bonds
`Apricots and Bonds.txt

DaltonST, I've read through the manual for Q&S and I see how it works with pre-existing metadata (title, tags, etc), but how do I have it use the contents of the file instead of the metadata as indicated in the manual?
Old 06-16-2015, 11:40 PM   #7
BetterRed
Looks like you'll have to do the tagging based on your knowledge of the contents.

When I first created my main library it was with about 8000 'texts', and like you I had no downloadable metadata sources. I worked on the tagging progressively over a couple of months and ended up with about 30 tags.

BR
Old 06-17-2015, 11:18 AM   #8
jon_joy_1999
Alright, thanks BetterRed. If I worked at your rate I'd have these done in about a week. Do you know anything about the aforementioned add-on, Quarantine & Scrub?
Old 06-17-2015, 11:51 AM   #9
eschwartz
Ex-Helpdesk Junkie
eschwartz ought to be getting tired of karma fortunes by now.
 
Posts: 19,420
Karma: 85000000
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quarantine&Scrub has a long and complex user guide. It seems to have niche appeal and TBH I am not sure how many people understand it.

Last edited by eschwartz; 06-18-2015 at 06:16 PM.
Old 06-17-2015, 05:37 PM   #10
BetterRed
Quote:
Originally Posted by jon_joy_1999 View Post
Alright, thanks BetterRed. If I worked at your rate I'd have these done in about a week. Do you know anything about the aforementioned add-on, Quarantine & Scrub?
Only what I've read in the manual. The nearest thing is probably tags from comments. As I understand it, you define word pairs: if the first word is in the comments, then the second word is used as a tag. So you might set it up such that track, street, turnpike, motorway, etc. all result in the book being tagged as 'Roads'.
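To make the word-pair idea concrete, here is a minimal Python sketch of that kind of lookup. It is not how Quarantine & Scrub is implemented (the thread doesn't show that); the mapping, the function name `tags_for_text`, and the simple plural handling are all assumptions made for illustration.

```python
import re

# Hypothetical word pairs: trigger word -> tag to apply.
WORD_TO_TAG = {
    "track": "Roads",
    "street": "Roads",
    "turnpike": "Roads",
    "motorway": "Roads",
    "bridge": "Bridges",
}


def tags_for_text(text, word_to_tag=WORD_TO_TAG):
    """Return the sorted list of tags whose trigger word occurs in text."""
    found = set()
    for word, tag in word_to_tag.items():
        # \b stops "street" from matching inside "streetcar";
        # the optional "s" accepts a simple plural such as "bridges".
        pattern = r"\b" + re.escape(word) + r"s?\b"
        if re.search(pattern, text, re.IGNORECASE):
            found.add(tag)
    return sorted(found)
```

Calling `tags_for_text("The turnpike crosses two bridges.")` yields both tags, while a text that only mentions a streetcar yields none. A lookup like this only sees words, not meaning, so it cannot tell a book about roads from one that merely mentions them.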

I have reservations about whether that approach would work for contents without contextual analysis: Kerouac's On the Road ain't about "Roads". My initial inclination is that it would require human intervention. But here's a patent aimed at automation; its citations might lead to some implementations:

Patent US6199081 - Automatic tagging of documents and exclusion by content

And here's an interesting PDF paper from a taxonomy consultant: Taxonomies for Auto-Tagging Unstructured Content

They might inspire someone to write something

BR
Old 06-18-2015, 02:31 PM   #11
jon_joy_1999
Quote:
Originally Posted by eschwartz View Post
Quarantine&Scrub has an incredibly long user guide and the creator expects you to read it all, doesn't like having to explain it. It seems to have niche appeal and TBH I am not sure how many people understand it.
I read the user guide provided and didn't see any functions that suggested they would do what I wanted; that's why I asked DaltonST how I would use it with the contents of the file.
I'm new here, but I'm already disappointed that a developer would suggest a plugin that doesn't do what he said.
Quote:
Originally Posted by BetterRed View Post
Only what I've read in the manual. The nearest thing is probably tags from comments. As I understand it, you define word pairs: if the first word is in the comments, then the second word is used as a tag. So you might set it up such that track, street, turnpike, motorway, etc. all result in the book being tagged as 'Roads'.

I have reservations about whether that approach would work for contents without contextual analysis: Kerouac's On the Road ain't about "Roads". My initial inclination is that it would require human intervention. But here's a patent aimed at automation; its citations might lead to some implementations:

Patent US6199081 - Automatic tagging of documents and exclusion by content

And here's an interesting PDF paper from a taxonomy consultant: Taxonomies for Auto-Tagging Unstructured Content

They might inspire someone to write something

BR
These files don't even have comments associated with them. As of right now I'm using Agent Ransack to search through the files for keywords and then manually applying tags within Calibre.
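The keyword search itself could also be scripted rather than run by hand. The sketch below walks a folder of .txt files and reports which tags each file would get. The keyword table and the function name `scan_folder` are hypothetical, and plain substring matching is assumed, so "bridge" also matches "bridges".

```python
from pathlib import Path

# Hypothetical keyword -> tag table; adjust to the subjects in the library.
KEYWORDS = {"bridge": "bridges", "roadway": "roadways"}


def scan_folder(root, keywords=KEYWORDS):
    """Map each .txt file under root to the sorted tags its text suggests."""
    results = {}
    for path in sorted(Path(root).rglob("*.txt")):
        # errors="ignore" skips bytes that are not valid UTF-8 instead of failing.
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        tags = sorted({tag for word, tag in keywords.items() if word in text})
        if tags:  # files with no matching keyword are left out of the report
            results[path] = tags
    return results
```

Looping over `scan_folder("Documents").items()` and printing `path.stem` with its tags gives a list to work through in Metadata Edit. If the Calibre book ids are known, `calibredb set_metadata --field tags:...` should be able to apply the tags from the command line, though that step is not covered by this sketch.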

That patent seems to be outside the scope of my needs. I'm not using a network to store these files.

The Hedden document is a slideshow presentation of talking points, not something I could use
Old 06-18-2015, 05:27 PM   #12
BetterRed
Quote:
Originally Posted by jon_joy_1999 View Post
I read the user guide provided and didn't see any functions that suggested they would do what I wanted; that's why I asked DaltonST how I would use it with the contents of the file.
I'm new here, but I'm already disappointed that a developer would suggest a plugin that doesn't do what he said.

These files don't even have comments associated with them.
Given that your documents are plain text, why don't you try pasting a couple of them (yes, the whole document) into the corresponding Comments column and then experiment with the Q&S Tags from Comments facility? Afterwards you can remove the text from Comments.

I've no idea if DaltonST had this in mind when he suggested you take a look at his plugin (PI). If you look through the version history of the PI you'll see there have been many enhancements, many of which stemmed from posts such as yours.

Quote:
Originally Posted by jon_joy_1999 View Post
As of right now I'm using Agent Ransack to search through the files for keywords and then manually applying tags within Calibre.
I use Windows Search in much the same way as you're using Agent Ransack. When I get interested in something I run the relevant searches, copy the result paths to the clipboard, paste them into Notepad++ and turn them into a CSV that I read with the Import List PI to create a Reading List, add a tag, etc.
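The paths-to-CSV step in that workflow can be sketched in a few lines of Python. This is only an illustration: the column names `title` and `tags` are placeholders, the title is assumed to be the file name without its extension, and whatever reads the file (the Import List PI or anything else) would need to be configured to match.

```python
import csv
import io
from pathlib import PureWindowsPath


def paths_to_csv(paths, tag):
    """Build CSV text with one row per path: a title from the file name, plus a tag."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["title", "tags"])  # placeholder column names
    for p in paths:
        # PureWindowsPath splits on both "\" and "/", so pasted Windows paths work.
        writer.writerow([PureWindowsPath(p).stem, tag])
    return buf.getvalue()
```

Given the search results for one keyword as a list of paths, `paths_to_csv(paths, "roadways")` returns text that can be saved as a .csv file.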

Calibre (GUI and command line) and its PIs provide 'canned' solutions to many problems. But they also provide a rich set of tools which, with a bit of lateral thinking, enable users to fashion their own solutions.

BR
Old 08-06-2015, 09:45 AM   #13
DaltonST
@jon_joy_1999:


If you haven't finished manually creating Tags within Calibre for your 500 text files, this might help you by creating Comments and Tags automatically using a list of the 'Top N Nouns' in each text file, sorted by frequency in descending order:


[GUI Plugin] English Noun Frequency : https://www.mobileread.com/forums/sho...d.php?t=263684
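For readers who want a feel for the 'Top N' idea without the plugin, here is a rough Python sketch. It is not the plugin's code and it does not detect nouns: it simply counts all words and drops a small, hypothetical stopword list, so the names `STOPWORDS` and `top_words` are assumptions for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "for", "that", "with", "this", "from", "are", "was", "has"}


def top_words(text, n=10, stopwords=STOPWORDS):
    """Return the n most frequent words in text, most frequent first."""
    words = re.findall(r"[a-z]+", text.lower())
    # Skip stopwords and very short words, which rarely make useful tags.
    counts = Counter(w for w in words if w not in stopwords and len(w) > 2)
    return [word for word, _ in counts.most_common(n)]
```

The resulting list could serve as candidate tags to review by hand before applying them.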


A typical example for a Factual/Non-fiction book is attached just below.





DaltonST
Attached thumbnail: eng_example_english_only.JPG

Last edited by DaltonST; 08-06-2015 at 09:53 AM.