#1
Junior Member
![]() Posts: 3
Karma: 10
Join Date: Jan 2025
Device: Kobo Clara BW
Get only chapters with content from ePub?
Hi!
I'm currently working on an app that syncs reading progress between audiobookshelf and Calibre-Web in both directions. However, I'm running into the problem that it's quite difficult to extract only the content chapters from an ePub, i.e. discarding the foreword, table of contents etc. Is there a way to reliably get those chapters? Since ePub doesn't enforce a strict structure for this, many of the books I looked at are structured slightly differently. Right now, I'm using a keyword-based approach to decide which chapters contain the meat of the book: if the chapter name contains prologue, chapter or epilogue, it is a content chapter; if it's something like foreword, table of contents, acknowledgements etc., it gets discarded. This approach mostly works, but it obviously breaks as soon as a book doesn't follow this structure.
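A minimal sketch of the keyword approach described above, in Python (the thread has no code of its own, so the language is an assumption). The keyword lists and the `classify_title` name are hypothetical; the interesting part is the third outcome, where a title such as "The Shire" matches neither list, which is exactly where this approach breaks.

```python
# Hypothetical keyword lists mirroring the approach described above.
CONTENT_KEYWORDS = ("prologue", "chapter", "epilogue")
SKIP_KEYWORDS = ("foreword", "contents", "acknowledgement", "copyright")


def classify_title(title: str) -> str:
    """Classify a TOC title as 'content', 'skip' or 'unknown' by keyword."""
    lowered = title.lower()
    # Check the skip list first so "Contents of Chapter 1" is not kept.
    if any(word in lowered for word in SKIP_KEYWORDS):
        return "skip"
    if any(word in lowered for word in CONTENT_KEYWORDS):
        return "content"
    # Untitled or creatively titled chapters ("The Shire", "1") land here.
    return "unknown"


for title in ["Foreword", "Chapter 1", "Table of Contents", "The Shire", "Epilogue"]:
    print(f"{title!r}: {classify_title(title)}")
```

Whatever ends up in the "unknown" bucket has to be resolved some other way, by a human or by a different heuristic.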
#2
Reading till the spring
Posts: 12,989
Karma: 97000289
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
This isn't really an epub issue. There is no standard name or heading for a "chapter", nor any requirement to have chapters at all, or to have names / titles / headings in the text.
A keyword approach will fail on many ebooks. By convention, each file is regarded as either a front matter section, part of the main body (optionally split into chapters) or an end matter section. But an ebook might also have front matter, body, end matter etc. all in one file. The only 100% accurate method to delete all but the main content is to look at the epub in an epub editor. You need to do it on a copy!
#3
Junior Member
![]() Posts: 3
Karma: 10
Join Date: Jan 2025
Device: Kobo Clara BW
That's too bad, thank you! I forgot to mention that I am using an ePub editor and I'm looking at the XML files of the ePub.
I've seen some ePubs that use a class attribute in the toc.ncx to mark the main chapters of the book. I was hoping there was some sort of widespread convention for marking the main chapters like that.
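Where a book's toc.ncx does carry class attributes, reading them out is straightforward. A sketch assuming an EPUB 2 style NCX document; `ncx_entries` is a hypothetical name that yields the `class`, label and target of every `navPoint`, nested ones included. The value `"chapter"` in the usage line is only an example: class values are whatever the book's producer chose, and many NCX files have none, so expect `None`.

```python
import xml.etree.ElementTree as ET

NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def ncx_entries(ncx_xml):
    """Yield (class, label, src) for every navPoint in a toc.ncx document."""
    root = ET.fromstring(ncx_xml)
    # iter() walks all descendants in document order, so nested navPoints are included.
    for point in root.iter("{http://www.daisy.org/z3986/2005/ncx/}navPoint"):
        label = point.findtext("ncx:navLabel/ncx:text", default="", namespaces=NCX_NS)
        content = point.find("ncx:content", NCX_NS)
        src = content.get("src") if content is not None else None
        yield point.get("class"), label.strip(), src
```

Usage: `chapters = [e for e in ncx_entries(ncx_text) if e[0] == "chapter"]`, with a fallback for books where every class is `None`.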
#4
Reading till the spring
Posts: 12,989
Karma: 97000289
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
Quote:
There can be one, all or multiple chapters per file. Or no chapters.
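One way to see how TOC entries map onto files is to group the TOC targets by file name. A minimal sketch with a hypothetical `chapters_per_file` helper, assuming the targets are `file.xhtml#anchor` strings taken from the TOC:

```python
from collections import defaultdict
from urllib.parse import unquote


def chapters_per_file(toc_srcs):
    """Group TOC targets ("file.xhtml#anchor") by the file they point into."""
    grouped = defaultdict(list)
    for src in toc_srcs:
        path, _, fragment = src.partition("#")
        # None marks an entry that points at the start of the file.
        grouped[unquote(path)].append(fragment or None)
    return dict(grouped)
```

For example, `chapters_per_file(["text.xhtml#ch1", "text.xhtml#ch2", "end.xhtml"])` puts two entries under `text.xhtml` and one under `end.xhtml`. Files that appear in the spine but not in this mapping are the "no chapters" case.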
#5
Wizard
Posts: 1,481
Karma: 9202958
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
And some books might use Book>Part>Chapter
The Lord of the Rings is an example...
The Lord of the Rings
|__ The Fellowship of the Ring
|____ Book One
|______ Chapter 1 - A Long-expected Party
...
|____ Book Two
|______ Chapter 1 - Many Meetings
...
|__ The Two Towers
Some books may even have sub-chapters in the same xhtml file.
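In toc.ncx, a hierarchy like this shows up as nested `navPoint` elements, so an entry's depth says more than its title. A sketch assuming an NCX document; `toc_tree` and `print_tree` are hypothetical names:

```python
import xml.etree.ElementTree as ET

NCX = "{http://www.daisy.org/z3986/2005/ncx/}"


def toc_tree(nav_point, depth=0):
    """Walk nested navPoints, returning (depth, label) pairs in document order."""
    rows = []
    for child in nav_point.findall(NCX + "navPoint"):
        label = child.findtext(f"{NCX}navLabel/{NCX}text", default="").strip()
        rows.append((depth, label))
        # Recurse into this entry's own children (Book -> Part -> Chapter ...).
        rows.extend(toc_tree(child, depth + 1))
    return rows


def print_tree(ncx_xml):
    """Print the TOC indented by nesting depth."""
    nav_map = ET.fromstring(ncx_xml).find(NCX + "navMap")
    for depth, label in toc_tree(nav_map):
        print("  " * depth + label)
```

For a book shaped like the example above, the chapters sit at depth 2, so treating only top-level entries as chapters would be wrong. Looking at the leaves of the tree is one possible heuristic, though sub-chapters inside a single xhtml file may not appear in the TOC at all.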
#6
Bibliophagist
Posts: 42,533
Karma: 162932766
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
I agree that the only reliable method is an epub editor and the Mark One eyeball. As usual, do all the dirty work on a copy or using Sigil's Checkpoints. The copy is my preferred method.
And then you have epubs such as those from Gutenberg, where you can have multiple chapters and partial chapters in a single file. In the last one I edited, chapters 1 & 2 and the first part of chapter 3 were in a single file along with the title page, contents, etc., while the last part of chapter 3, chapters 4 & 5 and the first part of chapter 6 were also in a single file.
Last edited by DNSB; 01-13-2025 at 04:08 PM.
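When chapters cross file boundaries like this, one heuristic is to ignore the files and split the running text at headings instead. A sketch using only the standard library, under two loud assumptions: the documents are passed in spine order, and every chapter starts with an `<h1>` or `<h2>` (many books violate this). `split_by_headings` and `_Collector` are hypothetical names.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2"}           # assumption: chapters start at h1/h2
SKIP_TAGS = {"title", "style", "script"}


class _Collector(HTMLParser):
    """Collect ("heading" | "text", text) chunks from one XHTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._heading = None          # text buffer while inside h1/h2
        self._skip = 0                # depth inside title/style/script

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag in HEADING_TAGS:
            self._heading = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag in HEADING_TAGS and self._heading is not None:
            # Join pieces so "<h2>Chapter <span>3</span></h2>" stays one heading.
            self.chunks.append(("heading", " ".join(self._heading)))
            self._heading = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip:
            return
        if self._heading is not None:
            self._heading.append(text)
        else:
            self.chunks.append(("text", text))


def split_by_headings(documents):
    """Split text at headings, regardless of where the file boundaries fall."""
    sections = [["(before first heading)", []]]
    for html in documents:            # must be in spine order
        parser = _Collector()
        parser.feed(html)
        parser.close()
        for kind, text in parser.chunks:
            if kind == "heading":
                sections.append([text, []])
            else:
                # Text continues the current section, even in a new file.
                sections[-1][1].append(text)
    return [(title, " ".join(body)) for title, body in sections]
```

With a first file ending in "Chapter 2 ... first half" and a second file starting with "second half", the "Chapter 2" section comes back as "first half second half", i.e. the split chapter is stitched together. Front matter lands in the "(before first heading)" bucket, which is only correct if the front matter itself has no h1/h2 headings.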
#7
frumious Bandersnatch
Posts: 7,541
Karma: 19001081
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
And books with two types of overlapping divisions, like the 1001 Nights (or whichever version of the title you prefer): there are stories, stories within stories, etc. and nights.
|
Thread | Thread Starter | Forum | Replies | Last Post |
epub -> epub/mobi/azw3 breaks up chapters | Siavahda | Conversion | 1 | 09-25-2017 12:20 PM |
epub to epub converting: making chapters? | Joy736 | Conversion | 8 | 10-30-2011 05:47 PM |
First visits to Sony store - comparing content to Chapters | hermes | Sony Reader | 7 | 05-31-2011 08:01 AM |
ePub division in chapters | TheWatt | ePub | 3 | 04-04-2011 05:02 PM |
ePub Chapters vs. Stanza Chapters | kjk | Sigil | 4 | 09-14-2009 11:50 AM |