#1 |
Zealot
Posts: 108
Karma: 810
Join Date: Jul 2012
Device: Kobo
I'm looking for suggestions on how to set up a Calibre "Quality Check" library search that finds every book with ANY page #s in its table of contents. I want to isolate the poorly converted (typically old) books whose TOC consists essentially of page #s, so that I can look for alternative versions that are structured without page #s and show actual chapter sections in the TOC. I tried searching the NCX TOC for a single occurrence of "page_[0-9]|page [0-9]|page[0-9]", but that doesn't work: on a sub-sample run I get too many hits, none of which seem to show page #s in the TOC.
Thanks in advance for any help on this.
#2 |
Approach that seems to work
I tried again, and the following modified "Quality Check" search setting does seem to work fairly well:
\>page_[0-9]|\>page [0-9]|\>page[0-9]
There are significantly fewer hits, and most of them do in fact show at least one page # in the TOC. I can then use "Edit ToC" on each of those ePubs to look for books where the TOC contains just page #s and no chapter identification.
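For anyone wondering why the leading ">" makes a difference, here is a small Python sketch. It is only a guess at the cause, not something confirmed in this thread: it assumes the search runs over the raw NCX text, so the unanchored pattern can also hit link targets or ids such as "#page_12", while the ">" version only matches when "page" starts an element's text. The two NCX entries are made up for illustration.

```python
import re

# Both patterns are copied from the posts above.
# In Python's re module, "\>" is simply a literal ">".
LOOSE = re.compile(r"page_[0-9]|page [0-9]|page[0-9]")
ANCHORED = re.compile(r"\>page_[0-9]|\>page [0-9]|\>page[0-9]")

# Hypothetical NCX entries, invented for this illustration.
# Here the label is a real chapter title; "page_12" appears only in the link target.
CHAPTER_ENTRY = (
    '<navPoint id="np1"><navLabel><text>Chapter One</text></navLabel>'
    '<content src="text/part1.html#page_12"/></navPoint>'
)
# Here the visible label itself is a page number.
PAGE_ENTRY = (
    '<navPoint id="np2"><navLabel><text>page 12</text></navLabel>'
    '<content src="text/part1.html#p12"/></navPoint>'
)


def hits(pattern, ncx_text):
    """Return True if the pattern occurs anywhere in the raw NCX text."""
    return pattern.search(ncx_text) is not None


for name, entry in (("chapter entry", CHAPTER_ENTRY), ("page entry", PAGE_ENTRY)):
    print(name, "| loose:", hits(LOOSE, entry), "| anchored:", hits(ANCHORED, entry))
```

If that assumption holds, the loose pattern flags both entries, while the anchored one flags only the entry whose label really is a page number.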
#3 |
Replacing TOC containing only page #s ("Edit TOC")
After using the above auto+manual technique in Calibre to find the page-structured ePubs in my library whose TOC contained only page #s, I was generally unable to find an alternative ePub version with the more typical continuous-flow (not page-structured) text, with only paragraph and chapter breaks and a corresponding TOC.
So instead I left the page structure as-is (I've removed page structures before, and it can be a tedious process) and focused on removing the page#-only TOC and replacing it with a content-meaningful one. Surprisingly, the following "Edit TOC" XPath expression was a useful starting point in almost every case where the ePub was page-structured. After running it, I manually deleted the TOC elements that did not belong and manually inserted others as identified by the ePub's own internal TOC:
//h:td[re:test(., "(^\s*[0-9]{1,2}\s*\n\s*[A-Za-z0-9].{1,80}[a-z]\n*)|(^\s*[0-9]{1,2}\s*$)|(^.{1,80}[a-z]\s*$)|(^\s*[IVX]{1,6}\s*$)|(^\s*prologue)|(^\s*epilogue)|(^\s*chapter)|(^\s*book\s)|(^\s*part\s)|(^\s*map)|(^\s*index)|(^\s*introduction)|(^\s*notes)", "i")]
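To get a feel for which table cells the expression above picks up, the regex part can be tried on a few strings in plain Python. This is only a rough sketch: it assumes that re:test with the "i" flag behaves like Python's re.search with re.IGNORECASE, it reads "chap ter" and "inde x" as "chapter" and "index", and the sample cell texts and the helper name looks_like_toc_entry are invented for illustration.

```python
import re

# The regex from the XPath expression above, split across lines for readability.
TOC_CELL = re.compile(
    r"(^\s*[0-9]{1,2}\s*\n\s*[A-Za-z0-9].{1,80}[a-z]\n*)"  # number, newline, title
    r"|(^\s*[0-9]{1,2}\s*$)"                               # bare 1-2 digit number
    r"|(^.{1,80}[a-z]\s*$)"                                # short line ending in a letter
    r"|(^\s*[IVX]{1,6}\s*$)"                               # Roman numeral
    r"|(^\s*prologue)|(^\s*epilogue)|(^\s*chapter)"
    r"|(^\s*book\s)|(^\s*part\s)|(^\s*map)|(^\s*index)"
    r"|(^\s*introduction)|(^\s*notes)",
    re.IGNORECASE,  # stands in for the "i" flag passed to re:test
)


def looks_like_toc_entry(cell_text):
    """Return True if the cell text would pass the regex test."""
    return TOC_CELL.search(cell_text) is not None


# Hypothetical cell contents from a page-structured book's printed contents page.
samples = [
    "12",
    "XIV",
    "3\nThe River",
    "Chapter 7",
    "The Long Road Home",
    "1984.",
    "He walked to the station and waited for the train.",
]
for text in samples:
    print(repr(text), looks_like_toc_entry(text))
```

Short titles, bare numbers and Roman numerals pass, while ordinary sentences ending in punctuation do not. That fits the experience above of getting a usable first draft that still needs manual cleanup.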