09-04-2021, 06:47 PM   #1
Trenchant Edges
[New Plugin Development Plan] Extracting footnotes/endnotes and Indexing dates

I have a research project that I'd like to pull specific information for, and I think the shortest path to getting it is to build two simple plugins.

In short, I've got a pile of PDFs, EPUBs, and MOBIs that all have footnotes/endnotes, which I'd like to extract and put into a text file.
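To make that concrete, here is a rough sketch of the extraction step for a single chunk of EPUB markup. It assumes the notes are tagged with the EPUB 3 `epub:type` attribute (`footnote`, `endnote`, or `rearnote`), which plenty of books won't do, and the names `NoteExtractor` and `extract_notes` are just placeholders I made up. It uses only Python's standard library, not any Calibre API.

```python
from html.parser import HTMLParser

# Assumption: notes are marked up EPUB 3 style, e.g. <aside epub:type="footnote">.
NOTE_TYPES = {"footnote", "endnote", "rearnote"}

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class NoteExtractor(HTMLParser):
    """Collects the text of every element whose epub:type marks it as a note."""

    def __init__(self):
        super().__init__()
        self.notes = []
        self._depth = 0   # greater than 0 while inside a note element
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1          # nested element inside a note
            return
        types = (dict(attrs).get("epub:type") or "").split()
        if NOTE_TYPES.intersection(types):
            self._depth = 1           # entering a note
            self._buf = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:          # leaving the note: store its text
            self.notes.append(" ".join("".join(self._buf).split()))

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def extract_notes(xhtml):
    """Return a list of note texts found in one XHTML string."""
    parser = NoteExtractor()
    parser.feed(xhtml)
    parser.close()
    return parser.notes
```

Writing the result out would then just be a matter of joining the list with newlines and saving it to a text file. PDFs and MOBIs would need converting to something HTML-like first, and books that mark notes some other way would need a different test than `epub:type`.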

I have another pile of the same file types for which I'd like to build an index of every mention of a date or day of the week, ideally output to a .csv.
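Here is roughly what I picture for the index, as a minimal sketch working on plain text. It assumes English month and weekday names spelled out in full and only a few date shapes ("September 4, 2021", "4 September 2021", "2021-09-04"), so it will miss a lot. The function names and the `source`/`line`/`match` columns are my own invention.

```python
import csv
import io
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DAYS = "Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"

# Crude pattern: a few common date shapes plus bare weekday names.
DATE_RE = re.compile(
    rf"\b(?:(?:{MONTHS})\s+\d{{1,2}}(?:,\s*\d{{4}})?"   # September 4, 2021
    rf"|\d{{1,2}}\s+(?:{MONTHS})(?:\s+\d{{4}})?"        # 4 September 2021
    rf"|\d{{4}}-\d{{2}}-\d{{2}}"                        # 2021-09-04
    rf"|(?:{DAYS}))\b"                                  # Tuesday
)


def index_dates(text, source):
    """Return (source, line number, matched text) for each date-like match."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in DATE_RE.finditer(line):
            rows.append((source, line_no, match.group(0)))
    return rows


def rows_to_csv(rows):
    """Serialize the index rows as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["source", "line", "match"])
    writer.writerows(rows)
    return out.getvalue()
```

A proper date-parsing library would presumably replace the regex, but the overall shape (scan text, record where each hit was found, write rows to CSV) should stay the same.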

I think Python can do this through Calibre.

If I'm wrong about that, I'd appreciate someone letting me know before I sink too much time into it.

Now, I'm returning to programming after about 15 years of not doing any, so this is going to be a bumpy, janky mess, but I don't really need anything better. So it'll be a fun project.

I want to start with the hyperlink/footnote/endnote extractor because it's simpler.


I'm spending this week catching up on the basics, but I also want to build a plan for how this program will work. It looks like Python and Calibre already have libraries for most of what I want to do, so it'll just be a question of reading the documentation and looking at existing plugin code.
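One piece of the plan I'm assuming will work: using Calibre's `ebook-convert` command-line tool to flatten each book to plain text before scanning it for dates. The sketch below just shells out to it from Python. The helper names are made up, and I haven't checked how well the conversion handles PDFs. Plain text would also lose the footnote markup, so this would only help the date index, not the note extractor.

```python
import shutil
import subprocess
from pathlib import Path


def convert_command(src, dest):
    """Build the ebook-convert call: input file first, then output file."""
    return ["ebook-convert", str(src), str(dest)]


def to_plain_text(src, out_dir):
    """Convert one book to .txt in out_dir and return the new file's path."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH; is Calibre installed?")
    dest = Path(out_dir) / (Path(src).stem + ".txt")
    subprocess.run(convert_command(src, dest), check=True)
    return dest
```

Looping `to_plain_text` over a folder of books would give one text file per book to feed into the date scan.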

I'm considering porting a couple of the plugins that are still on Python 2 to Python 3, just to see what less inept people are doing, but I've been unable to find anything quite like what I want.

A friend suggested this library for the date parsing.

My questions for anyone who's made it this far are:
  1. Is what I want to do possible?
  2. What do I need to learn to make this work?
  3. Is there any existing project that might make this easier?

Anyway, thanks.
