I've got a research project that I need to pull specific information for, and I think the shortest path to getting it is building two simple plugins.
In short, I've got a pile of PDFs, EPUBs, and MOBIs that all have footnotes/endnotes I'd like extracted and put into a text file.
I have another pile of the same file types that I'd like to index for every mention of a date or day of the week, ideally with the output going to a .csv.
I think Python can do this through Calibre.
If I'm wrong about that, I'd appreciate someone letting me know before I sink too much time into it.
Now, I'm returning to programming after about 15 years of not doing any, so this is going to be a bumpy, janky mess, but I don't really need anything better. So it'll be a fun project.
I want to start with the hyperlink/footnote/endnote extractor because it's simpler.
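To make the footnote extractor idea concrete, here is a minimal standard-library sketch, not a Calibre plugin. It assumes the book is an EPUB whose notes are marked up with the EPUB 3 `epub:type` attribute (`footnote`, `endnote`, or `rearnote`) and that the XHTML inside is well-formed; plenty of real books mark notes differently, and PDFs and MOBIs would need converting first. The names `NoteCollector`, `extract_footnotes_from_xhtml`, and `extract_footnotes_from_epub` are made up for illustration.

```python
import zipfile
from html.parser import HTMLParser

# Assumption: notes are tagged with one of these EPUB 3 epub:type values.
NOTE_TYPES = {"footnote", "endnote", "rearnote"}


class NoteCollector(HTMLParser):
    """Collects the text inside any element tagged as a note."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.notes = []
        self._depth = 0      # nesting depth inside the current note, 0 = outside
        self._chunks = []    # text pieces of the current note

    def handle_starttag(self, tag, attrs):
        if self._depth:
            self._depth += 1
            return
        types = (dict(attrs).get("epub:type") or "").split()
        if NOTE_TYPES.intersection(types):
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace so each note is one clean line.
                self.notes.append(" ".join("".join(self._chunks).split()))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_footnotes_from_xhtml(markup):
    """Return the note texts found in one XHTML document string."""
    parser = NoteCollector()
    parser.feed(markup)
    parser.close()
    return parser.notes


def extract_footnotes_from_epub(path_or_file):
    """Return note texts from every (X)HTML file inside an EPUB (a zip archive).

    Files are visited in sorted name order, which is not necessarily
    the book's reading order.
    """
    notes = []
    with zipfile.ZipFile(path_or_file) as book:
        for name in sorted(book.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                text = book.read(name).decode("utf-8", errors="replace")
                notes.extend(extract_footnotes_from_xhtml(text))
    return notes
```

Writing the result to a text file is then just `"\n".join(notes)` into an open file. Books that only link to notes with plain anchors, without `epub:type`, would need a different rule, such as following `href="#..."` targets.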
I'm spending this week catching up on the basics, but I want to make sure I'm also building a plan for how this program will work. It looks like Python and Calibre already have libraries for most of what I want to do, so it'll just be a question of reading the documentation and looking at existing plugin code.
I'm considering updating a couple of the Python 2 plugins that haven't been ported to Python 3 yet, just so I can see what less inept people are doing, but I've been unable to find anything quite like what I want.
A friend suggested this library for the date parsing.
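Independent of any particular library, the date index could be sketched with just the standard library. This assumes the book text has already been extracted to plain text, and the patterns only cover weekday names and a few common English date shapes; a real date-parsing library would catch far more. The names `DATE_RE`, `find_date_mentions`, and `rows_to_csv` are hypothetical.

```python
import csv
import io
import re

# Assumption: English weekday and month names, and only these date shapes.
WEEKDAY = r"(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day"
MONTH = (
    r"(?:January|February|March|April|May|June|July|"
    r"August|September|October|November|December)"
)
DATE_RE = re.compile(
    rf"\b(?:{MONTH}\s+\d{{1,2}}(?:,\s*\d{{4}})?"  # March 3 or March 3, 1921
    rf"|\d{{1,2}}\s+{MONTH}(?:\s+\d{{4}})?"       # 3 March or 3 March 1921
    rf"|\d{{4}}-\d{{2}}-\d{{2}}"                  # 1921-03-03
    rf"|{WEEKDAY})\b"
)


def find_date_mentions(text, source):
    """Return (source, line number, matched text) for each date-like mention."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in DATE_RE.finditer(line):
            rows.append((source, line_no, match.group(0)))
    return rows


def rows_to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["source", "line", "match"])
    writer.writerows(rows)
    return buf.getvalue()
```

Calling `find_date_mentions(text, "book.txt")` on each extracted text and writing `rows_to_csv(rows)` to disk would give one index file per pile. Line numbers are used here as the location because plain text has no page numbers.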
My questions for anyone who's made it this far are:
- Is what I want to do possible?
- What do I need to learn to make this work?
- Is there any existing project that might make this easier?
Anyway, thanks.