Getting only the content chapters from an ePub?
Hi!
I'm currently working on an app that syncs reading progress from audiobookshelf to Calibre-Web and vice versa. However, I'm running into the problem that it's quite difficult to extract only the content chapters from an ePub, i.e. discarding the foreword, table of contents, etc.
Is there a reliable way to identify those chapters? Since the ePub format doesn't enforce a strict structure for this, many of the books I've looked at are organized slightly differently.
Right now, I'm using a keyword-based approach to decide which chapters contain the meat of the book: if the chapter name contains "prologue", "chapter" or "epilogue", it is treated as a content chapter; if it's something like "foreword", "table of contents" or "acknowledgement", it gets discarded. This mostly works, but it breaks as soon as a book doesn't follow this naming structure.
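For reference, the keyword check described above might look roughly like this. It's a minimal sketch, not the app's actual code: the function name `is_content_chapter` and the keyword tuples are hypothetical, discard keywords are checked first (an assumption, so a title like "Acknowledgements" is never kept), and any title matching neither list is treated as non-content.

```python
# Hypothetical keyword lists based on the approach described above; not exhaustive.
CONTENT_KEYWORDS = ("prologue", "chapter", "epilogue")
DISCARD_KEYWORDS = ("foreword", "table of contents", "acknowledgement")


def is_content_chapter(title: str) -> bool:
    """Return True if a chapter title looks like main content (sketch)."""
    normalized = title.casefold()  # case-insensitive matching
    # Assumption: discard keywords win over content keywords.
    if any(word in normalized for word in DISCARD_KEYWORDS):
        return False
    # Titles matching neither list fall through to False.
    return any(word in normalized for word in CONTENT_KEYWORDS)


titles = ["Cover", "Table of Contents", "Foreword", "Prologue", "Chapter 1",
          "Chapter 2", "Epilogue", "Acknowledgements", "The Storm"]
print([t for t in titles if is_content_chapter(t)])
# ['Prologue', 'Chapter 1', 'Chapter 2', 'Epilogue']
```

The last title shows the failure mode: a book whose chapters are simply titled ("The Storm") gets every real chapter discarded.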