#1 |
Deviser
Posts: 2,265
Karma: 2090983
Join Date: Aug 2013
Location: Texas
Device: none
|
[GUI Plugin] Extract People & Other Metadata
Summary: EPOM uses your personally constructed Python regular expressions to search the text of an ebook and extract metadata from it.
Documentation: The EPOM 'User Guide' consists of all of its ToolTips plus any images and related files attached below.
Requires Minimum Calibre Version: 6.11.0
Other Useful Calibre Plugins to Consider:
Version History: Spoiler:
Last edited by DaltonST; 01-07-2023 at 11:18 AM. |
#2 |
Deviser
Posts: 2,265
Karma: 2090983
Join Date: Aug 2013
Location: Texas
Device: none
|
For Future Use
|
#3 |
light mode user
Posts: 66
Karma: 16268
Join Date: May 2023
Location: New England
Device: I use the Calibre ebook-viewer on macos and Apple Books on ios.
|
Great plugin!
I installed Extract People & Other Metadata through calibre's user plugins and used it to extract AO3 links from the EPUBs I downloaded, though I'm not sure I did it the best way. (I have been intentionally avoiding FanFicFare.) I thought it would be good to share my janky solution in case anyone else needs it.
I couldn't figure out how to remove a set string at the beginning and end of the text, so here's the workaround I came up with. I made a custom column named link and set up three #link custom column extractors.
To identify work links I used the keyword "Posted originally on the.+$". I wanted to make sure I got the right link, because sometimes there are other work links in the file. I filtered the result down to just s/ plus the work id number with "s\/\d+". Then in the tweaks I added
Code:
REMOVE_CHARACTERS=.<>[]()
CALIBRE_TEMPLATE_LANGUAGE_BUILTIN=re(1, 'Posted originally on the Archive of Our Own at ', '')
CALIBRE_TEMPLATE_LANGUAGE_BUILTIN=re(1, 'Posted originally on the Archive of Our Ownhttp://archiveofourownorg/ at ', '')
CALIBRE_TEMPLATE_LANGUAGE_BUILTIN=re(1, 's/','http://archiveofourown.org/works/')
To identify series links it was easier to just assume there was only one and use the keyword "http:\/\/archiveofourown.org\/series\/.+$". There was no trailing period to remove, but I filtered for the properly formatted URL just in case. I only turned this extractor on for works I combined with EpubMerge.
For fanfiction.net works that I downloaded with FicLab I used the keyword "based on content retrieved from.+$". I filtered for s/ plus the story id with "s\/\d+", then replaced s/ with the website URL after removing extraneous characters.
Code:
REMOVE_CHARACTERS=.
CALIBRE_TEMPLATE_LANGUAGE_BUILTIN=re(1, 'based on content retrieved from ', '')
CALIBRE_TEMPLATE_LANGUAGE_BUILTIN=re(1, 's/', 'https://www.fanfiction.net/s/') |
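For anyone who wants to test the patterns outside calibre first, the three steps described for AO3 work links (match the "Posted originally on the" line, narrow it to s/ plus digits, then swap s/ for the site prefix) can be sketched in plain Python. This is only an illustration of the regex logic, not how EPOM runs internally; the sample text, the work id, and the function name are made up for the example.

```python
import re

# Hypothetical sample text; the wording in a real EPUB preface may differ.
SAMPLE = (
    "Some Story Title\n"
    "Posted originally on the Archive of Our Own at "
    "http://archiveofourown.org/works/12345678.\n"
    "Rating: General Audiences\n"
)

AO3_PREFIX = "http://archiveofourown.org/works/"


def extract_ao3_work_url(text):
    """Return the AO3 work URL from the 'Posted originally' line, or None."""
    # Step 1: keyword match, using the same pattern as in the post.
    line = re.search(r"Posted originally on the.+$", text, flags=re.MULTILINE)
    if line is None:
        return None
    # Step 2: narrow the match to "s/" followed by the work id digits.
    tail = re.search(r"s/\d+", line.group(0))
    if tail is None:
        return None
    # Step 3: swap the leading "s/" for the full site prefix.
    return re.sub(r"^s/", AO3_PREFIX, tail.group(0))


print(extract_ao3_work_url(SAMPLE))
```

Narrowing to s/ plus digits also drops the trailing period, so no separate character-removal step is needed in this standalone version.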
#4 |
light mode user
Posts: 66
Karma: 16268
Join Date: May 2023
Location: New England
Device: I use the Calibre ebook-viewer on macos and Apple Books on ios.
|
I now realize this can be done through FanFicFare or mass search and replace, which can find URLs in books, grab metadata, and correctly make the URL an identifier... oops.
https://www.mobileread.com/forums/sh...ntifiers+links https://www.mobileread.com/forums/sh...nk#post4320727 https://www.mobileread.com/forums/sh...ntifiers+links |
#5 |
want to learn what I want
Posts: 1,523
Karma: 7086519
Join Date: Sep 2020
Device: none
|
I wish I had tried this amazing plugin before using the similar tool in Job Spy!
|
#6 |
Member
Posts: 11
Karma: 10
Join Date: Jul 2021
Device: Windows 11
|
This is an awesome tool. Is there a way to bulk extract info from multiple ebooks, instead of doing one at a time?
If it's necessary to do one at a time, is there a way to avoid the result screen popping up (where it shows the updated book, and you need to click back to the home screen, then do another search to get back to where you originally were)? Thank you! |
#7 |
want to learn what I want
Posts: 1,523
Karma: 7086519
Join Date: Sep 2020
Device: none
|
edit: it's ok now
Last edited by Comfy.n; 03-11-2024 at 03:31 PM. |
#8 |
Member
Posts: 11
Karma: 10
Join Date: Jul 2021
Device: Windows 11
|
@Comfy.n: Awesome, tysm!! This has been super useful. One small issue: when I process multiple books without using the FTS index, the results pop up one by one. Consequently, only the last extracted book gets marked, which is a bit inconvenient when going back and forth between different search pages. That aside, the function definitely works.
Question: Is there any way I can use this to extract the first 3-4 paragraphs from an epub?
Background info: I'm trying to generate a cover from the first page of an epub. Calibre's default "set cover from book" doesn't seem to work too well, so my plan is to:
1. Use EPOM to extract the first 3-4 paragraphs from the epub (into custom column #first_pars)
2. Use Generate Cover to create a cover with text from #first_pars |
#9 |
want to learn what I want
Posts: 1,523
Karma: 7086519
Join Date: Sep 2020
Device: none
|
Well, I've used EPOM just for extracting translators and original titles, using the FTS index. That was not too challenging. In your case it would be better if Dalton could help, but unfortunately he has been away from MR for almost a year. Or maybe some regex power user could.
I don't see an easy way to detect the exact beginning of the text, given the variations in ebook structure. However, you could try something like this:
- set the tweak MAXIMUM_LENGTH_TO_ACCEPT= to a large value
- then populate the #first_pars column using a regex that matches, say, the first 1000 chars in the book |
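The "first 1000 chars" idea can be sketched as a single pattern. This is standalone Python with a made-up sample string, not EPOM's internals; how EPOM hands the book text to the regex, and which flags it sets, is an assumption here, which is why `[\s\S]` is used rather than relying on `.` matching newlines.

```python
import re

# Match up to the first 1000 characters of the text, newlines included.
# [\s\S] matches any character, so the pattern does not depend on DOTALL.
FIRST_CHARS = re.compile(r"\A[\s\S]{1,1000}")


def first_chars(text):
    """Return at most the first 1000 characters of text ('' if text is empty)."""
    match = FIRST_CHARS.search(text)
    return match.group(0) if match else ""


# Hypothetical sample: two short paragraphs followed by filler.
sample = "First paragraph.\n\nSecond paragraph.\n\n" + "x" * 2000
snippet = first_chars(sample)
print(len(snippet))
```

A fixed character count will usually cut a paragraph in the middle, so the limit may need adjusting per book.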
#10 |
Member
Posts: 23
Karma: 10
Join Date: Sep 2020
Device: Kindle Paperwhite
|
This is a very useful tool, thanks for it.
I have a problem: after extracting a translator to a custom column, the returned value is, for example, "Translator: Diorki". How can I get only the part from the ":" to the end? |
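One way to keep only the part after the colon is to put the label in a lookbehind, or to use a capture group. A minimal sketch in plain Python follows; whether EPOM stores the whole match or a capture group is not stated in this thread, so both variants are shown, and the sample text is made up. Post #3 above strips a fixed prefix with the CALIBRE_TEMPLATE_LANGUAGE_BUILTIN=re(...) tweak instead, which may be the simpler route inside the plugin.

```python
import re

# Hypothetical sample text containing a translator credit.
text = "Title page\nTranslator: Diorki\nChapter 1"

# Variant A: lookbehind, so the whole match is only the name.
match_a = re.search(r"(?<=Translator: ).+", text)
name_a = match_a.group(0) if match_a else None

# Variant B: capture group, so the name is group 1 of the match.
match_b = re.search(r"Translator:\s*(.+)", text)
name_b = match_b.group(1).strip() if match_b else None

print(name_a, name_b)
```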
#11 |
want to learn what I want
Posts: 1,523
Karma: 7086519
Join Date: Sep 2020
Device: none
|
I had an issue with a book in MOBI format:
UnboundLocalError: cannot access local variable 'is_finished' where it is not associated with a value
Upon conversion to EPUB, the plugin worked fine, thankfully. |
#12 |
Connoisseur
Posts: 70
Karma: 143132
Join Date: Sep 2010
Device: Kindle Keyboard 3G
|
Hello, I'm not sure I understand the plugin correctly. I have EPUB books with this tag in the content.opf, e.g.: <dc:source>https://www.heise.de/blog/CQRS-als-Grundlage-fuer-moderne-flexible-und-skalierbare-Anwendungsarchitektur-10275526.html?view=print</dc:source>
I want to extract the website address and add it to the custom text column #source. How can I do this with this plugin? Or is there another way (e.g. identifiers) to extract such a tag from an ebook file? Greetings from the Münsterland, Maria |
#13 | |
want to learn what I want
Posts: 1,523
Karma: 7086519
Join Date: Sep 2020
Device: none
|
Quote:
From looking at this tooltip, I gather EPOM is capable of extracting text, not OPF XML content:
Last edited by Comfy.n; 02-13-2025 at 02:55 PM. |
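Outside the plugin, the dc:source value can be read directly from the EPUB with Python's standard library. The following is a minimal sketch under assumptions, not an EPOM or calibre feature: the function name is hypothetical, it assumes a standard EPUB container layout, and it only reads the value; writing it into the #source column is left out.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the EPUB container file and by Dublin Core metadata.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def read_dc_source(epub_path):
    """Return the text of the first <dc:source> in the EPUB's OPF, or None."""
    with zipfile.ZipFile(epub_path) as zf:
        # META-INF/container.xml points at the OPF package file.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(".//c:rootfile", CONTAINER_NS)
        if rootfile is None:
            return None
        opf = ET.fromstring(zf.read(rootfile.get("full-path")))
    source = opf.find(".//dc:source", DC_NS)
    if source is None or source.text is None:
        return None
    return source.text.strip()
```

Calling read_dc_source on the path of an EPUB file returns the URL as a string, or None when the book has no dc:source element.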
|
#14 |
Connoisseur
Posts: 70
Karma: 143132
Join Date: Sep 2010
Device: Kindle Keyboard 3G
|
Hello Comfy.n,
Thank you. I had the same suspicion that it won't work with the content.opf. I'll add the idea to the thread. Greetings from the Münsterland, Maria! |
#15 |
Junior Member
Posts: 3
Karma: 10
Join Date: Feb 2025
Device: epub
|
What a wonderful plugin!
I'd like to add all the authors from an anthology with this expression: ^([\p{L}\.'\- ]*\p{Lu}{2,}[\p{L}\.'\- ]*)$ i ^(.*)$ ...but I always get a "no custom column is active...nothing to do" message. What am I doing wrong? I've checked the checkbox on the left. |
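A side note on the expression itself: Python's built-in `re` module does not accept `\p{L}` or `\p{Lu}` (the third-party `regex` module does), and this thread does not say which engine EPOM uses. If the pattern needs testing with the built-in module, the same idea (a line made only of letters, periods, apostrophes, hyphens, and spaces that contains at least two consecutive uppercase letters) can be approximated as below. The function name and sample lines are made up, and this sketch does not address the "no custom column is active" message.

```python
import re

# Letters ([^\W\d_] is "word character that is not a digit or underscore")
# plus period, apostrophe, hyphen, and space; nothing else allowed on the line.
LINE = re.compile(r"(?:[^\W\d_]|[.'\- ])+")


def looks_like_author(line):
    """Approximate the \\p{L} / \\p{Lu}{2,} test using only the stdlib."""
    if not LINE.fullmatch(line):
        return False
    # Stand-in for \p{Lu}{2,}: any two consecutive uppercase characters.
    return any(a.isupper() and b.isupper() for a, b in zip(line, line[1:]))


for candidate in ["Jean-Paul SARTRE", "Jean-Paul Sartre", "Chapter 1 INTRO"]:
    print(candidate, looks_like_author(candidate))
```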