MobileRead Forums - View Single Post - [GUI Plugin] Extract People & Other Metadata

arpeggioaccele · 05-23-2023, 08:02 AM

Great plugin! I installed Extract People & Other Metadata through calibre's user plugins and used it to extract AO3 links from the epubs I downloaded, though I'm not sure I did it the best way. (I have been intentionally avoiding FanFicFare.) I thought it would be good to share in case anyone else needs my janky solution.

I couldn't figure out how to remove a fixed string from the beginning and end of the text, so here's the workaround I came up with.

I made a custom column named link.
I set up three #link custom column extractors:
To identify work links, I used the keyword "Posted originally on the.+$". I wanted to make sure I got the right link, since sometimes there are other work links in the file.
I filtered the result down to just the work ID number and the s/ before it, using "s\/\d+".
Then in tweaks I added:

Code:

REMOVE_CHARACTERS=.<>[]()
CALIBRE_TEMPLATE_LANGUAGE_BUILTIN=re(1, 'Posted originally on the Archive of Our Own at ', '')
CALIBRE_TEMPLATE_LANGUAGE_BUILTIN=re(1, 'Posted originally on the Archive of Our Ownhttp://archiveofourownorg/ at ', '')
CALIBRE_TEMPLATE_LANGUAGE_BUILTIN=re(1, 's/','http://archiveofourown.org/works/')

This removes the extraneous period and the original keyword, then adds the website URL back by replacing the s/. If you don't have a full-text index, the raw HTML needs to be filtered differently.
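If you want to sanity-check the work-link regexes outside calibre, here's a rough Python sketch of the same three steps (keyword, filter, rebuild) using the standard `re` module. It is not the plugin's code: it assumes the plugin's keyword and filter behave roughly like `re.search`, and the sample text is made up.

```python
import re
from typing import Optional

AO3_WORKS = "http://archiveofourown.org/works/"

def extract_ao3_work_link(text: str) -> Optional[str]:
    # Keyword step: only look at the "Posted originally on the ..." line,
    # so other work links elsewhere in the file are ignored.
    line = re.search(r"Posted originally on the.+$", text, flags=re.MULTILINE)
    if line is None:
        return None
    # Filter step: "works/12345678" ends in "s/12345678", so "s\/\d+" keeps
    # just the ID plus its "s/" prefix and drops the trailing period.
    work = re.search(r"s\/\d+", line.group(0))
    if work is None:
        return None
    # Rebuild step: swap the "s/" prefix for the full works URL.
    return work.group(0).replace("s/", AO3_WORKS, 1)

# Made-up sample text; real epubs may word or format this differently.
sample = (
    "Posted originally on the Archive of Our Own at "
    "http://archiveofourown.org/works/12345678.\n"
    "See also http://archiveofourown.org/works/999\n"
)
print(extract_ao3_work_link(sample))  # http://archiveofourown.org/works/12345678
```

The "s/" trick works because every AO3 work URL ends in "works/" followed by digits, so the filter can grab the ID without caring about the rest of the URL.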

To identify series links, it was easier to just assume there was only one and use the keyword "http:\/\/archiveofourown.org\/series\/.+$".
There was no trailing period to remove, but I filtered for the properly formatted URL just in case.
I only turned this extractor on for works I combined with EpubMerge.
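A similar rough Python sketch for the series extractor, again using `re` as a stand-in for the plugin. The keyword is the one above; the "properly formatted URL" filter (`SERIES_URL`) is a hypothetical pattern I'm assuming for illustration, and the sample line is made up.

```python
import re
from typing import Optional

# Keyword from above: from the series URL to the end of that line.
SERIES_KEYWORD = r"http:\/\/archiveofourown.org\/series\/.+$"
# Hypothetical filter for a properly formatted series URL (an assumption).
SERIES_URL = r"http://archiveofourown\.org/series/\d+"

def extract_series_link(text: str) -> Optional[str]:
    # Assume there is only one series link, so take the first hit.
    hit = re.search(SERIES_KEYWORD, text, flags=re.MULTILINE)
    if hit is None:
        return None
    # Trim anything after the numeric series ID (quotes, tags, punctuation).
    url = re.search(SERIES_URL, hit.group(0))
    return url.group(0) if url else None

# Made-up sample line.
sample = '<a href="http://archiveofourown.org/series/4567">My Series</a>\n'
print(extract_series_link(sample))  # http://archiveofourown.org/series/4567
```

Note that the unescaped "." in the keyword matches any character, which is harmless here because the second, stricter pattern does the real validation.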

For fanfiction.net works that I downloaded with FicLab, I used the keyword "based on content retrieved from.+$".
I filtered for the story ID and the s/ with "s\/\d+".
Then I replaced s/ with the website URL after removing extraneous characters:

Code:

REMOVE_CHARACTERS=.
CALIBRE_TEMPLATE_LANGUAGE_BUILTIN=re(1, 'based on content retrieved from ', '')
CALIBRE_TEMPLATE_LANGUAGE_BUILTIN=re(1, 's/', 'https://www.fanfiction.net/s/')
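The fanfiction.net version follows the same pattern. Here's a rough Python sketch with `re` standing in for the plugin; the FicLab credit line in `sample` is invented, so the real wording and URL format may differ.

```python
import re
from typing import Optional

FFN_STORY = "https://www.fanfiction.net/s/"

def extract_ffn_link(text: str) -> Optional[str]:
    # Keyword step: the FicLab credit line, through the end of that line.
    line = re.search(r"based on content retrieved from.+$", text, flags=re.MULTILINE)
    if line is None:
        return None
    # Filter step: keep "s/" plus the numeric story ID.
    story = re.search(r"s\/\d+", line.group(0))
    if story is None:
        return None
    # Rebuild step: swap "s/" for the full story URL prefix.
    return story.group(0).replace("s/", FFN_STORY, 1)

# Made-up credit line; FicLab's exact wording may differ.
sample = "Generated by FicLab based on content retrieved from www.fanfiction.net/s/1234567/1/.\n"
print(extract_ffn_link(sample))  # https://www.fanfiction.net/s/1234567
```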

And now I have a link in book details, though it isn't clickable. I probably should have made it an identifier or something lol, but it works for my purpose of exporting to a spreadsheet.
