MobileRead Forums > E-Book Software > Sigil

07-14-2017, 04:08 PM   #1
anarcat
Enthusiast
 
Posts: 36
Karma: 10
Join Date: Jul 2013
Device: Kobo Glo HD
automatic index generation

Is there a way to automatically generate indexes the same way we can automatically generate a table of contents? I am asking because I have a large ePUB where the H1 tags are in the TOC, but it would be too unwieldy to also include the H2 and H3 tags, which I would like as (two separate) indexes... Is that possible?
07-14-2017, 04:11 PM   #2
Turtle91
Wizard
 
 
Posts: 1,469
Karma: 11526836
Join Date: Dec 2012
Location: Altus, Oklahoma today
Device: iPhone 6/5/iPad 1,2 & Air/Surface Pro/Kindle PW
I don't think there is a fully automatic way, but try creating separate TOCs including just h1, just h2, and just h3. Then copy/paste the results to a new sheet.
07-14-2017, 04:16 PM   #3
anarcat
Enthusiast
 
Posts: 36
Karma: 10
Join Date: Jul 2013
Device: Kobo Glo HD
Quote:
Originally Posted by Turtle91 View Post
I don't think there is a fully automatic way, but try creating separate TOCs including just h1, just h2, and just h3. Then copy/paste the results to a new sheet.
Is there a way to have multiple TOCs? From what I can tell, there's only one TOC per document.

I also don't see how to include only headers below H1: I can get "only h1", "only h1 and h2" or "everything". I'm using 0.9.7, if that matters at all...
07-14-2017, 04:59 PM   #4
Tex2002ans
Guru
 
Posts: 847
Karma: 3900001
Join Date: Jul 2012
Device: Nook
What is the use-case? If we can understand the details of the project, maybe there is a better way of handling it.

The only cases of multiple TOCs I can think of off the top of my head are for "List of Tables" or "List of Illustrations".

Also, the title of the topic is a little confusing. The title says "Indexes", but it seems you are talking about TOCs.
07-14-2017, 05:12 PM   #5
anarcat
Enthusiast
 
Posts: 36
Karma: 10
Join Date: Jul 2013
Device: Kobo Glo HD
Well, you suggested using multiple TOCs, but what I want is an index.

The use case is this: I have a library of song lyrics, and I want to compile them into an ebook of some sort. There are a lot of songs from different artists and albums. I created a simple HTML structure where every h1 is the artist, h2 is the album and h3 is the song title. The song lyrics follow in a PRE tag. This is ordered by Artist/Album, so it works out okay in the main TOC: I take only the H1 tags and get a table of contents of artists. So far so good.

But making a (sorted!) TOC for song titles doesn't make sense anymore, because the content is sorted by Artist. Hence the idea of using indexes instead. The idea would be to have an index of all song titles at the end of the ePUB, ordered by song name.
07-14-2017, 06:50 PM   #6
KevinH
Wizard
 
Posts: 2,524
Karma: 772404
Join Date: Nov 2009
Device: many
Then use search or grep to build one text file of just the artists and another of just the albums. Then feed those text files into Sigil's Index Generation tool. Save the generated index after each run in an external HTML file and merge them.

See the old but still valid Sigil User's Guide for how to generate an index with the Index Generation tool from a tab-separated list of words/phrases.

Should work.
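The first part of KevinH's suggestion (a plain list of just the artists, or just the albums) can also be pulled straight out of the XHTML. This is a minimal Python sketch, not a Sigil feature: `heading_texts` and `word_list` are hypothetical helpers, and it assumes each heading sits in a single `<hN>...</hN>` element. It writes one entry per line, so check the Sigil User's Guide for the exact tab-separated layout the Index Generation tool expects before feeding the result in.

```python
import re
from html import unescape

# Hypothetical helpers (not part of Sigil): collect the text of every
# heading at one level, e.g. level=1 for artists or level=2 for albums.
HEADING_TEMPLATE = r"<h{level}\b[^>]*>(.*?)</h{level}>"


def heading_texts(xhtml: str, level: int) -> list[str]:
    """Return the plain text of each <hN> heading, in document order."""
    pattern = re.compile(HEADING_TEMPLATE.format(level=level),
                         re.DOTALL | re.IGNORECASE)
    texts = []
    for match in pattern.finditer(xhtml):
        inner = re.sub(r"<[^>]+>", "", match.group(1))  # drop inline tags like <em>
        text = " ".join(unescape(inner).split())         # decode entities, collapse whitespace
        if text:
            texts.append(text)
    return texts


def word_list(xhtml: str, level: int) -> str:
    """One entry per line, duplicates removed, first-occurrence order kept."""
    return "\n".join(dict.fromkeys(heading_texts(xhtml, level)))
```

For example, running `word_list` with `level=2` on the text of `Artist01.xhtml` from the sample EPUB further down the thread would give that artist's album names, one per line.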
07-14-2017, 11:43 PM   #7
Turtle91
Wizard
 
 
Posts: 1,469
Karma: 11526836
Join Date: Dec 2012
Location: Altus, Oklahoma today
Device: iPhone 6/5/iPad 1,2 & Air/Surface Pro/Kindle PW
Quote:
Originally Posted by anarcat View Post
Is there a way to have multiple TOCs? From what I can tell, there's only one TOC per document.

I also don't see how to include only headers below H1: I can get "only h1", "only h1 and h2" or "everything". I'm using 0.9.7, if that matters at all...
Sorry, that was a quick response as I was running off to work.

What I meant was that you could run an automated TOC while selecting just H2 to show up on the list. When that is done you would copy and paste that to a different sheet. Delete the TOC, then repeat that with just H3 selected, and copy/paste the results, etc. etc.

That would create your links to the different tags.

Yes it is more manual work than you probably want, but a lot less than it could be.

Hopefully the index options these others have posted about will work better. If they do please post back!

Cheers,
07-15-2017, 01:24 AM   #8
Tex2002ans
Guru
 
Posts: 847
Karma: 3900001
Join Date: Jul 2012
Device: Nook
Quote:
Originally Posted by anarcat View Post
The use case is this: I have a library of song lyrics, and I want to compile them into an ebook of some sort. There are a lot of songs from different artists and albums. I created a simple HTML structure where every h1 is the artist, h2 is the album and h3 is the song title. The song lyrics follow in a PRE tag. This is ordered by Artist/Album, so it works out okay in the main TOC: I take only the H1 tags and get a table of contents of artists. So far so good.

But making a (sorted!) TOC for song titles doesn't make sense anymore, because the content is sorted by Artist. Hence the idea of using indexes instead. The idea would be to have an index of all song titles at the end of the ePUB, ordered by song name.
I would do what Turtle91 mentioned. Generate the entire Sigil TOC (include <h1> (Artists) -> <h3> (Songs)).

STEP 0

Make sure you aren't doing this on your actual EPUB. Save As and make a copy!

MAKE SURE YOU HAVE "Mode: Regex" + "Current File" selected.

MAKE SURE YOU ARE IN "Code View".

I attached a sample EPUB to the end of this post. I will use it for the code examples:

[Attached images: RegexCurrentFile.png, ExampleArtistAlbumSongTOC.png]

Original Code:

Code:
  <div class="sgc-toc-level-1">
    <a href="../Text/Artist01.xhtml">Artist 1</a> 
    <div class="sgc-toc-level-2">
      <a href="../Text/Artist01.xhtml#sigil_toc_id_1">Albumus Example</a> 
      <div class="sgc-toc-level-3">
        <a href="../Text/Artist01.xhtml#sigil_toc_id_2">Albumus Songimus 1</a>
      </div>
      <div class="sgc-toc-level-3">
        <a href="../Text/Artist01.xhtml#sigil_toc_id_3">Albumus Songimus 2</a>
      </div>
      <div class="sgc-toc-level-3">
        <a href="../Text/Artist01.xhtml#sigil_toc_id_4">Albumus Songimus 3</a>
      </div>
      <div class="sgc-toc-level-3">
        <a href="../Text/Artist01.xhtml#sigil_toc_id_5">Albumus Songimus 4</a>
      </div>
    </div>
    <div class="sgc-toc-level-2">
      <a href="../Text/Artist01.xhtml#sigil_toc_id_6">Bulbumus Example</a> 
      <div class="sgc-toc-level-3">
        <a href="../Text/Artist01.xhtml#sigil_toc_id_7">Bulbumus Songimus 1</a>
      </div>
    </div>
    <div class="sgc-toc-level-2">
      <a href="../Text/Artist01.xhtml#sigil_toc_id_8">Callbumus Example</a> 
      <div class="sgc-toc-level-3">
        <a href="../Text/Artist01.xhtml#sigil_toc_id_9">Callbumus Songimus 1</a>
      </div>
    </div>
    <div class="sgc-toc-level-2">
      <a href="../Text/Artist01.xhtml#sigil_toc_id_10">Dollbumus Example</a> 
      <div class="sgc-toc-level-3">
        <a href="../Text/Artist01.xhtml#sigil_toc_id_11">Dollbumus Songimus 1</a>
      </div>
    </div>
  </div>
  <div class="sgc-toc-level-1">
    <a href="../Text/Artist02.xhtml">Bartist 2</a> 
    <div class="sgc-toc-level-2">
      <a href="../Text/Artist02.xhtml#sigil_toc_id_12">Bartimus Example</a> 
      <div class="sgc-toc-level-3">
        <a href="../Text/Artist02.xhtml#sigil_toc_id_13">Bartimus Songimus 1</a>
      </div>
    </div>
  </div>

  <div class="sgc-toc-level-1">
    <a href="../Text/Artist03.xhtml">Cartist 3</a> 
    <div class="sgc-toc-level-2">
      <a href="../Text/Artist03.xhtml#sigil_toc_id_14">Cartimus Example</a> 
      <div class="sgc-toc-level-3">
        <a href="../Text/Artist03.xhtml#sigil_toc_id_15">Cartimus Songimus 1</a>
      </div>
    </div>
  </div>


STEP 1

So... we generated the Sigil TOC. Now we have to throw everything out and be left with just the <h3>s (Songs).

Regex is your friend.

This regex takes Sigil's TOC code and gets rid of the <h1>s (Artists):

Search: <div class="sgc-toc-level-1">\s+(<a[^>]+>[^<]+</a>)
Replace:

This gets rid of the <h2>s (Albums):

Search: <div class="sgc-toc-level-2">\s+(<a[^>]+>[^<]+</a>)
Replace:

And since we need the Songs... what I like to do is just change Sigil's TOC <div> into a <p> with a class:

Search: <div class="sgc-toc-level-3">\s+(<a[^>]+>[^<]+</a>)
Replace: <p class="tocthree">\1</p>
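For anyone who would rather run Step 1 outside Sigil, here is a rough Python equivalent of the three search/replace pairs, assuming the TOC HTML is already in a string. `strip_to_songs` and `song_lines` are hypothetical names, and `song_lines` is only a crude stand-in for the Mend and Prettify cleanup in Step 2: it keeps the `<p class="tocthree">` elements rather than repairing the leftover `</div>`s. Python's `re.sub` accepts the same `\1` backreference used in Sigil's Replace field, so the patterns carry over unchanged.

```python
import re

# Step 1's three search/replace pairs, copied into Python's re module.
# The first two delete the artist (level-1) and album (level-2) entries;
# the third turns each song (level-3) entry into <p class="tocthree">.
STEP1_REPLACEMENTS = [
    (r'<div class="sgc-toc-level-1">\s+(<a[^>]+>[^<]+</a>)', ""),
    (r'<div class="sgc-toc-level-2">\s+(<a[^>]+>[^<]+</a>)', ""),
    (r'<div class="sgc-toc-level-3">\s+(<a[^>]+>[^<]+</a>)',
     r'<p class="tocthree">\1</p>'),
]


def strip_to_songs(toc_html: str) -> str:
    """Apply the three replacements in order (leftover </div>s remain)."""
    for search, replace in STEP1_REPLACEMENTS:
        toc_html = re.sub(search, replace, toc_html)
    return toc_html


def song_lines(html: str) -> list[str]:
    """Crude stand-in for the cleanup: keep only the song paragraphs."""
    return re.findall(r'<p class="tocthree">.*?</p>', html)
```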

STEP 2

Right click > Reformat HTML > Mend and Prettify

OR use

Tools > Reformat HTML > Mend and Prettify All HTML Files.

That should leave you with a list of ONLY the Song Names:

Code:
  <p class="tocthree"><a href="../Text/Artist01.xhtml#sigil_toc_id_2">Albumus Songimus 1</a></p>
  <p class="tocthree"><a href="../Text/Artist01.xhtml#sigil_toc_id_3">Albumus Songimus 2</a></p>
  <p class="tocthree"><a href="../Text/Artist01.xhtml#sigil_toc_id_4">Albumus Songimus 3</a></p>
  <p class="tocthree"><a href="../Text/Artist01.xhtml#sigil_toc_id_5">Albumus Songimus 4</a></p>
  <p class="tocthree"><a href="../Text/Artist01.xhtml#sigil_toc_id_7">Bulbumus Songimus 1</a></p>
  <p class="tocthree"><a href="../Text/Artist01.xhtml#sigil_toc_id_9">Callbumus Songimus 1</a></p>
  <p class="tocthree"><a href="../Text/Artist01.xhtml#sigil_toc_id_11">Dollbumus Songimus 1</a></p>
  <p class="tocthree"><a href="../Text/Artist02.xhtml#sigil_toc_id_13">Bartimus Songimus 1</a></p>
  <p class="tocthree"><a href="../Text/Artist03.xhtml#sigil_toc_id_15">Cartimus Songimus 1</a></p>


Warning: If you mess up any of the regexes or search/replace steps, Sigil may remove important code when it tries to clean up the leftover <div>s. This is why you need to back up.

STEP 3

Now, to alphabetize these songs.

Run another Regex:

Search: (<a .+?>)(.+?)(</a>)
Replace: \2\1\3

What this does is capture the Song name and put it before the <a> link:

Before:

<p class="tocthree"><a href="../Text/Artist01.xhtml#sigil_toc_id_3">Albumus Songimus 2</a></p>

After:

<p class="tocthree">Albumus Songimus 2<a href="../Text/Artist01.xhtml#sigil_toc_id_3"></a></p>

Code:
  <p class="tocthree">Albumus Songimus 1<a href="../Text/Artist01.xhtml#sigil_toc_id_2"></a></p>
  <p class="tocthree">Albumus Songimus 2<a href="../Text/Artist01.xhtml#sigil_toc_id_3"></a></p>
  <p class="tocthree">Albumus Songimus 3<a href="../Text/Artist01.xhtml#sigil_toc_id_4"></a></p>
  <p class="tocthree">Albumus Songimus 4<a href="../Text/Artist01.xhtml#sigil_toc_id_5"></a></p>
  <p class="tocthree">Bulbumus Songimus 1<a href="../Text/Artist01.xhtml#sigil_toc_id_7"></a></p>
  <p class="tocthree">Callbumus Songimus 1<a href="../Text/Artist01.xhtml#sigil_toc_id_9"></a></p>
  <p class="tocthree">Dollbumus Songimus 1<a href="../Text/Artist01.xhtml#sigil_toc_id_11"></a></p>
  <p class="tocthree">Bartimus Songimus 1<a href="../Text/Artist02.xhtml#sigil_toc_id_13"></a></p>
  <p class="tocthree">Cartimus Songimus 1<a href="../Text/Artist03.xhtml#sigil_toc_id_15"></a></p>
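If you would rather script Step 3, this is a minimal Python sketch of the same search/replace. `title_first` is a hypothetical helper name, and like the Sigil regex it assumes one link per line, no `>` inside attribute values, and no tags inside the link text.

```python
import re

# Step 3: move the link text in front of the now-empty <a> element, so the
# song title comes right after the identical <p class="tocthree"> prefix.
# Assumes no ">" inside attribute values and no tags inside the link text.
TITLE_OUT = re.compile(r"(<a .+?>)(.+?)(</a>)")


def title_first(line: str) -> str:
    return TITLE_OUT.sub(r"\2\1\3", line)
```

Applied to the "Before" line above, it should produce the "After" line.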


STEP 4

Now just toss that HTML into any tool that sorts alphabetically for you (I use Notepad++, or you may want to use a website like Text Mechanic). It should alphabetize all the songs:

Code:
  <p class="tocthree">Albumus Songimus 1<a href="../Text/Artist01.xhtml#sigil_toc_id_2"></a></p>
  <p class="tocthree">Albumus Songimus 2<a href="../Text/Artist01.xhtml#sigil_toc_id_3"></a></p>
  <p class="tocthree">Albumus Songimus 3<a href="../Text/Artist01.xhtml#sigil_toc_id_4"></a></p>
  <p class="tocthree">Albumus Songimus 4<a href="../Text/Artist01.xhtml#sigil_toc_id_5"></a></p>
  <p class="tocthree">Bartimus Songimus 1<a href="../Text/Artist02.xhtml#sigil_toc_id_13"></a></p>
  <p class="tocthree">Bulbumus Songimus 1<a href="../Text/Artist01.xhtml#sigil_toc_id_7"></a></p>
  <p class="tocthree">Callbumus Songimus 1<a href="../Text/Artist01.xhtml#sigil_toc_id_9"></a></p>
  <p class="tocthree">Cartimus Songimus 1<a href="../Text/Artist03.xhtml#sigil_toc_id_15"></a></p>
  <p class="tocthree">Dollbumus Songimus 1<a href="../Text/Artist01.xhtml#sigil_toc_id_11"></a></p>
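The sort itself can also be scripted. In this Python sketch (`sort_lines` is a hypothetical name), leading whitespace is stripped first so indentation differences cannot affect the order, and `str.casefold` is used as the key so that a title starting with a lowercase letter is not pushed after every capitalized one, which is what a plain character-code sort would do.

```python
# Step 4: sort the lines. Every line shares the same <p class="tocthree">
# prefix, so ordering whole lines orders them by song title.
def sort_lines(lines: list[str]) -> list[str]:
    # strip() removes indentation differences; casefold() makes the sort
    # case-insensitive instead of putting all capitals before lowercase.
    return sorted((line.strip() for line in lines), key=str.casefold)
```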


STEP 5

Stick the HTML back into Sigil and move the song names back in the links:

Search: <p class="tocthree">(.+?)(<a .+?>)(</a>)
Replace: <p class="tocthree">\2\1\3

That should reverse Step 3.

Before:

<p class="tocthree">Albumus Songimus 2<a href="../Text/Artist01.xhtml#sigil_toc_id_3"></a></p>

After:

<p class="tocthree"><a href="../Text/Artist01.xhtml#sigil_toc_id_3">Albumus Songimus 2</a></p>

Code:
  <p class="tocthree"><a href="../Text/Artist01.xhtml#sigil_toc_id_2">Albumus Songimus 1</a></p>
  <p class="tocthree"><a href="../Text/Artist01.xhtml#sigil_toc_id_3">Albumus Songimus 2</a></p>
  <p class="tocthree"><a href="../Text/Artist01.xhtml#sigil_toc_id_4">Albumus Songimus 3</a></p>
  <p class="tocthree"><a href="../Text/Artist01.xhtml#sigil_toc_id_5">Albumus Songimus 4</a></p>
  <p class="tocthree"><a href="../Text/Artist02.xhtml#sigil_toc_id_13">Bartimus Songimus 1</a></p>
  <p class="tocthree"><a href="../Text/Artist01.xhtml#sigil_toc_id_7">Bulbumus Songimus 1</a></p>
  <p class="tocthree"><a href="../Text/Artist01.xhtml#sigil_toc_id_9">Callbumus Songimus 1</a></p>
  <p class="tocthree"><a href="../Text/Artist03.xhtml#sigil_toc_id_15">Cartimus Songimus 1</a></p>
  <p class="tocthree"><a href="../Text/Artist01.xhtml#sigil_toc_id_11">Dollbumus Songimus 1</a></p>
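And the Step 5 reversal as a Python sketch, under the same assumptions as the Sigil regex (one link per line, simple attributes). `title_inside` is a hypothetical helper name.

```python
import re

# Step 5: the inverse of Step 3, moving the title back inside the link.
TITLE_BACK = re.compile(r'<p class="tocthree">(.+?)(<a .+?>)(</a>)')


def title_inside(line: str) -> str:
    return TITLE_BACK.sub(r'<p class="tocthree">\2\1\3', line)
```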


Now you have your fully alphabetized list of songs with links. Toss that in the Song Index at the end of your book.
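As an aside, Steps 3 to 5 exist because a plain line sort needs the title near the front of each line. A script can skip that round trip by sorting on the link text directly with a key function. This is a different technique from the one above, sketched in Python; `title_key` and `song_index` are hypothetical names, and the regex assumes one simple `<a>` element per line.

```python
import re

# Alternative to Steps 3-5: sort the <p class="tocthree"> lines by their
# link text, extracted as a sort key, instead of moving the text around.
LINK_TEXT = re.compile(r"<a [^>]*>(.*?)</a>")


def title_key(line: str) -> str:
    match = LINK_TEXT.search(line)
    # Fall back to the whole line if no link is found.
    return match.group(1).casefold() if match else line.casefold()


def song_index(lines: list[str]) -> str:
    """Return the non-empty lines, stripped and sorted by link text."""
    ordered = sorted((line.strip() for line in lines if line.strip()),
                     key=title_key)
    return "\n".join(ordered)
```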
Attached file: ExampleArtistLyrics.epub (3.9 KB)

Last edited by Tex2002ans; 07-15-2017 at 01:35 AM.
07-15-2017, 09:34 AM   #9
anarcat
Enthusiast
 
Posts: 36
Karma: 10
Join Date: Jul 2013
Device: Kobo Glo HD
Thanks so much for the detailed responses!

In the end, I found another way of creating that ePUB: I'm generating an RST document, which Sphinx turns into an ePUB, PDF or HTML:

https://github.com/beetbox/beets/pull/2628
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.