01-13-2025, 04:26 AM
FunnyPocketBook
Join Date: Jan 2025
Device: Kobo Clara BW
Get only the content chapters from an ePub?

Hi!

I'm currently working on an app that syncs reading progress from audiobookshelf to Calibre-Web and vice versa. However, I'm running into the problem that it's quite difficult to extract only the content chapters from an ePub, i.e. discarding the foreword, table of contents, etc.

Is there a way to reliably get those chapters? Since ePub doesn't enforce a strict structure for marking front matter and chapters, many of the books I looked at are structured slightly differently.
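One possible starting point, sketched here as an assumption rather than something the app already does: the EPUB package file itself carries some structural hints. The zip's `META-INF/container.xml` points to the OPF package document, whose `<spine>` lists the files in reading order; an `itemref` with `linear="no"` marks auxiliary content, and an EPUB 2 `<guide>` may contain a `reference type="text"` pointing at the start of the main body. Many books set none of these, so they are hints to combine with title keywords, not a guarantee. EPUB 3 books may instead use `epub:type="bodymatter"` landmarks in the navigation document, which this sketch does not parse. It uses only the Python standard library; `spine_items` and `guide_references` are hypothetical helper names.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespaces from the OCF container and OPF package specifications.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def _read_opf(zf):
    """Locate and parse the OPF package document via META-INF/container.xml."""
    container = ET.fromstring(zf.read("META-INF/container.xml"))
    opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")
    return ET.fromstring(zf.read(opf_path)), posixpath.dirname(opf_path)


def spine_items(epub):
    """Return (zip path, is_linear) for each spine entry, in reading order.

    `epub` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(epub) as zf:
        opf, opf_dir = _read_opf(zf)
        # Manifest hrefs are relative to the OPF file (not URL-decoded here).
        manifest = {
            item.get("id"): posixpath.join(opf_dir, item.get("href"))
            for item in opf.findall("opf:manifest/opf:item", NS)
        }
        items = []
        for ref in opf.findall("opf:spine/opf:itemref", NS):
            # linear="no" flags auxiliary content; the default is "yes".
            is_linear = ref.get("linear", "yes") != "no"
            items.append((manifest[ref.get("idref")], is_linear))
        return items


def guide_references(epub):
    """Return the EPUB 2 <guide> as {type: zip path}; empty if there is none."""
    with zipfile.ZipFile(epub) as zf:
        opf, opf_dir = _read_opf(zf)
        return {
            ref.get("type"): posixpath.join(opf_dir, ref.get("href"))
            for ref in opf.findall("opf:guide/opf:reference", NS)
        }
```

If a guide `text` entry exists, spine items before it are likely front matter; non-linear items can usually be skipped outright.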

Right now, I'm using a keyword-based approach to decide which chapters contain the meat of the book: if the chapter title contains "prologue", "chapter", or "epilogue", it is treated as a content chapter. If it contains something like "foreword", "table of contents", "acknowledgement", etc., it is discarded. This approach mostly works, but it breaks as soon as a book doesn't follow this naming scheme.
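A minimal sketch of the keyword heuristic as described, assuming Python. The keyword lists are illustrative and not exhaustive, and `classify_title` is a hypothetical helper. Returning `"unknown"` instead of silently guessing makes the failure case visible, for example a book whose chapters are titled only by name.

```python
import re

# Illustrative keyword lists mirroring the heuristic described above.
CONTENT_KEYWORDS = ("prologue", "chapter", "epilogue")
SKIP_KEYWORDS = ("foreword", "table of contents", "acknowledgement", "acknowledgment")


def _contains_keyword(title, keywords):
    # Case-insensitive match anchored at a word start, so "chapter" also
    # matches "Chapters" and "acknowledgement" matches "Acknowledgements".
    return any(re.search(r"\b" + re.escape(kw), title, re.IGNORECASE) for kw in keywords)


def classify_title(title):
    """Return "content", "skip", or "unknown" for a chapter title."""
    if _contains_keyword(title, CONTENT_KEYWORDS):
        return "content"
    if _contains_keyword(title, SKIP_KEYWORDS):
        return "skip"
    # Neither list matched: this is where the heuristic breaks down.
    return "unknown"


for t in ("Chapter 1: The Beginning", "Table of Contents", "The Boy Who Lived"):
    print(t, "->", classify_title(t))
```

Titles that land in `"unknown"` could then fall back to structural hints from the package file, if the book provides any.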