#1 |
Member
Posts: 21
Karma: 13884
Join Date: Jan 2014
Device: apple ipad (3rd generation)
|
Can I break up an HTML file using a TOC?
I have some public domain ebooks in EPUB format that contain a Table of Contents, but each book consists of only a handful of HTML files, each holding multiple chapters. My e-reader only tracks chapter progress within “sections”, meaning within each HTML file, and not according to the TOC, which just links to paragraphs inside each HTML file.
I’d like to break up the chapters into their own separate HTML files using the TOC as a guide, and I’d like to do it automatically rather than manually.
#2 | |
Grand Sorcerer
Posts: 5,175
Karma: 18533687
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
If you see all chapter titles, you can simply insert a Sigil split marker tag before each chapter heading tag. For example, if all chapter headings are <h1> tags, you'd use:
Find: <h1
Replace: <hr class="sigil_split_marker" /><h1
Then select Edit > Split at markers, followed by Tools > Table of Contents > Generate Table of Contents.
If the TOC is empty when you select Tools > Table of Contents > Generate Table of Contents, you can use KevinH's TOCSaver plugin to change paragraph tags to heading tags or insert hidden heading tags.
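For anyone who would rather preview the effect of that Find/Replace outside Sigil, here is a minimal Python sketch of the same substitution. It assumes the chapter headings really are `<h1>` tags; the function name `insert_split_markers` and the sample string are made up for illustration and are not part of Sigil.

```python
import re

# The marker Sigil looks for when you run Edit > Split at markers.
SPLIT_MARKER = '<hr class="sigil_split_marker" />'


def insert_split_markers(xhtml: str, tag: str = "h1") -> str:
    """Insert a split marker in front of every opening <tag ...>."""
    # \b stops "<h1" from also matching a longer tag name such as "<h10".
    pattern = re.compile(r"<" + re.escape(tag) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: SPLIT_MARKER + m.group(0), xhtml)


if __name__ == "__main__":
    # Hypothetical sample content, not taken from any real book.
    sample = '<body><h1>One</h1><p>a</p><h1 class="t">Two</h1><p>b</p></body>'
    print(insert_split_markers(sample))
```

Closing tags such as `</h1>` are left alone because they start with `</`, so each heading gets exactly one marker.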
|
#3 |
Guru
Posts: 874
Karma: 2457128
Join Date: Nov 2011
Device: none
|
If the document consistently gives chapter titles a particular tag (and doesn't use that tag elsewhere) a simple Search and Replace to insert "sigil_split_marker" will do the job.
But if the code were that organised, I suspect the TOC would have been sorted out already. You may have no practical alternative to finding the chapter titles you want to list in a TOC and applying the h1 tag manually. How many chapters? Some jobs are really too small to be worth automating. |
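One way to check whether a tag is used consistently before committing to a Search and Replace is to count how often each tag and class combination appears. The sketch below does that with Python's standard `html.parser`; the class name `TagCounter`, the helper `count_tags`, and the sample markup are hypothetical, and the approach assumes the chapter file parses as ordinary (X)HTML.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count opening tags, keyed by (tag name, class attribute)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # A missing or valueless class attribute is recorded as "".
        cls = dict(attrs).get("class") or ""
        self.counts[(tag, cls)] += 1


def count_tags(xhtml: str) -> Counter:
    parser = TagCounter()
    parser.feed(xhtml)
    parser.close()
    return parser.counts


if __name__ == "__main__":
    # Hypothetical sample content.
    sample = '<h2 class="chap">I</h2><p>x</p><h2 class="chap">II</h2><h2>Notes</h2>'
    for (tag, cls), n in count_tags(sample).most_common():
        print(tag, repr(cls), n)
```

If the count for one combination matches the number of chapters in the TOC, that tag is probably safe to target; if not, manual marking may well be quicker.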
#4 |
Running with scissors
Posts: 1,037
Karma: 10000000
Join Date: Nov 2019
Device: none
|
In addition to what's said above, what I also do in order to have chapter breaks only before chapter headings is to join all of the chapter files into one large html/xhtml file, then do the splitting as per above. (But I've forgotten how I joined the separate files so hunt around and experiment.)
|
#5 |
Bibliophagist
Posts: 12,146
Karma: 59280049
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Forma, Clara HD, Nexus 7 HD, iPad Pro, Tolino epos
|
To continue from what @hobnail said, I also tend to join all the chapter files into a single file, since Gutenberg loves massive files with chapters split across them. To do this, I select the files I want to merge, then right-click and choose Merge (or press Ctrl-M). After this, I insert the split markers and split.
Quite often the split markers are simple to insert, but at other times working out the regex to insert them can be a learning experience.
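As an illustration of the less simple case, suppose the chapter titles are ordinary paragraphs whose text starts with "CHAPTER" or "Chapter" followed by a Roman or Arabic numeral. That layout is purely an assumption for this sketch, as are the names `CHAPTER_PARA` and `mark_chapter_paragraphs`; a real book will need its own pattern.

```python
import re

SPLIT_MARKER = '<hr class="sigil_split_marker" />'

# Matches the start of a <p> whose text begins with "CHAPTER <numeral>",
# allowing a few common inline wrappers (b, i, em, strong, span) first.
CHAPTER_PARA = re.compile(
    r"<p\b[^>]*>\s*"
    r"(?:<(?:b|i|em|strong|span)\b[^>]*>\s*)*"
    r"(?:CHAPTER|Chapter)\s+(?:[IVXLC]+|\d+)\b"
)


def mark_chapter_paragraphs(xhtml: str) -> str:
    """Put a split marker before each paragraph that looks like a chapter title."""
    return CHAPTER_PARA.sub(lambda m: SPLIT_MARKER + m.group(0), xhtml)


if __name__ == "__main__":
    # Hypothetical sample content.
    sample = (
        "<p>Intro</p>"
        '<p class="c"><b>CHAPTER II</b></p>'
        "<p>In chapter 3 we meet the hero.</p>"
        "<p>Chapter 10</p>"
    )
    print(mark_chapter_paragraphs(sample))
```

The same pattern can be pasted into Sigil's Find field in Regex mode, with the marker and `\0` as the replacement, though it is worth testing on a copy of the book first.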
#6 | |
Guru
Posts: 874
Karma: 2457128
Join Date: Nov 2011
Device: none
|
Quote:
Indeed. I'm all in favour of learning experiences. But sometimes you have to balance an hour's research into Regex with the time taken to manually insert 16 chapter breaks! |
|
#7 |
Resident Curmudgeon
Posts: 62,333
Karma: 102150074
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Aura H2O, PRS-650, PRS-T1, nook STR, iPad 4, iPhone SE 2020, PW3
|
While it might take longer to learn regex, once you've learned it, it will eventually take less time.
|
#8 |
Guru
Posts: 874
Karma: 2457128
Join Date: Nov 2011
Device: none
|
But when that hour results in the conclusion that chapters AREN'T marked in any consistent and unique way... :-( If these were well-constructed EPUB files we wouldn't be having to do this job in the first place.
|