|  05-12-2020, 09:11 AM | #1 | 
| Groupie            Posts: 173 Karma: 40000 Join Date: Oct 2013 Device: kindle |  Create index on epub from printed book 
			
			I wonder if there is a way to render the index (the analytical index, not the table of contents) of a printed book without having to manually insert every entry in Sigil's index editor. I've been working with books that have very long and complex indexes, and re-creating them is a huge task. But since the indexes are already written and there are references to the page numbers, maybe there is a quicker way to do it?  I thought of keeping the page numbers when saving the file from Finereader, then turning those numbers into invisible IDs that serve as anchors, and linking the page numbers in the index to them (all doable with some regex wisdom). This solution, though, has at least one big problem: it only works on books whose page numbers are displayed at the top of the page (because the anchor needs to come before the page it refers to, not after). Does anyone know a better solution? Last edited by 1v4n0; 05-12-2020 at 09:46 AM. | 
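The linking half of that idea could look roughly like the Python sketch below. It is only an illustration under assumptions: the anchors are taken to be named `page<N>`, the book text is taken to live in a single file called `text.xhtml`, and each index entry is taken to be a heading followed by comma-separated page numbers. The function name `link_index_entry` is made up for this example.

```python
import re

def link_index_entry(entry, target_file="text.xhtml"):
    """Turn the page numbers of one index entry into links to #pageN anchors.

    Assumption: the entry looks like "Heading, 12, 45"; everything after the
    first comma is treated as page numbers, so digits inside the heading
    itself are left alone.
    """
    heading, comma, locators = entry.partition(",")

    def to_link(match):
        page = match.group(0)
        return f'<a href="{target_file}#page{page}">{page}</a>'

    # \b\d+\b matches each standalone run of digits in the locator part.
    return heading + comma + re.sub(r"\b\d+\b", to_link, locators)


print(link_index_entry("Aristotle, 12, 45"))
```

Page ranges and note references would need extra handling; this only covers plain page numbers.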
|   |   | 
|  05-12-2020, 10:44 AM | #2 | 
| Addict            Posts: 324 Karma: 3200000 Join Date: Oct 2015 Location: Madison, WI Device: Kindle 5th Gen | 
			
			Seems like one problem with your idea is that even if it worked, it would only tell the reader that the topic they're seeking exists somewhere between that anchor and the end of the book. A human will have to find the real target, whether that human is you building the index, or the reader skimming around and hoping they didn't miss it.
		 Last edited by phillipgessert; 05-12-2020 at 10:47 AM. | 
|   |   | 
|  05-12-2020, 11:00 AM | #3 | 
| Groupie            Posts: 173 Karma: 40000 Join Date: Oct 2013 Device: kindle | 
			
			The link will bring the (human) reader to the point where the printed page begins, in which the word is found. Sigil's index targets <p>s, which are shorter than printed pages, but the difference is not tragic imho. Indexes on printed books work the same way, except that you actually notice when you've reached the end of the page.
		 | 
|   |   | 
|  05-12-2020, 11:24 AM | #4 | 
| Addict            Posts: 324 Karma: 3200000 Join Date: Oct 2015 Location: Madison, WI Device: Kindle 5th Gen | 
			
			Yes, that was my point. An extreme example, an anchor on page 1, "the topic you're looking for exists somewhere in this book." If this is for your own use, seems fine though. If not, you might also run into trouble with folks that don't understand this concept of a page and won't page at all.
		 Last edited by phillipgessert; 05-12-2020 at 11:28 AM. | 
|   |   | 
|  05-13-2020, 01:36 AM | #5 | ||||
| Wizard            Posts: 2,306 Karma: 13057279 Join Date: Jul 2012 Device: Kobo Forma, Nook | Quote: 
 When you make it to EPUB, Search&Replace ¤ with a <span>:
 Code:
 <span class="pagebreak"></span>
 Before:
 Code:
 <span class="pagebreak"></span>
 <span class="pagebreak"></span>
 <span class="pagebreak"></span>
 [...]
 After:
 Code:
 <span epub:type="pagebreak" id="page1" title="1"/>
 <span epub:type="pagebreak" id="page2" title="2"/>
 <span epub:type="pagebreak" id="page3" title="3"/>
 [...]
 Quote: 
 To do this, use Regex to set the page numbers apart, then Doitsu's plugin to renumber everything. Page Numbers (Bottom): Spoiler: 
 then just tell incremental IDs to start from 1 higher: After: Spoiler: 
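The renumbering step can also be sketched outside Sigil. The following is a minimal Python illustration, not Doitsu's plugin: it assumes every page start is marked with the exact placeholder `<span class="pagebreak"></span>`, and the names `number_pagebreaks` and `start` are invented for this example. For page numbers printed at the bottom, the marker sits at the end of a page and therefore opens the next one, which is why the counter would start 1 higher.

```python
import re
from itertools import count

# Assumption: this exact placeholder marks every page break in the file.
PLACEHOLDER = re.compile(r'<span class="pagebreak"></span>')

def number_pagebreaks(xhtml, start=1):
    """Replace each placeholder with a numbered EPUB 3 pagebreak span."""
    counter = count(start)

    def numbered(_match):
        n = next(counter)
        return f'<span epub:type="pagebreak" id="page{n}" title="{n}"/>'

    return PLACEHOLDER.sub(numbered, xhtml)


sample = '<span class="pagebreak"></span>A<span class="pagebreak"></span>B'
print(number_pagebreaks(sample))           # markers at the top of each page
print(number_pagebreaks(sample, start=2))  # markers at the bottom: start 1 higher
```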
 Quote: 
 Code:
 381–385
 381–85
 381–5
 And another style where I typically see errors:
 Code:
 385n
 385n10
 Quote: 
 For all the info you'll ever need, see my posts in: "Help with Kindle Page Numbers", "Fractional Page Numbering" (and the entire pyramid of linked threads). Last edited by Tex2002ans; 05-13-2020 at 01:56 AM. | ||||
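The page-range and note styles listed in the post above (381–385, 381–85, 381–5, 385n, 385n10) can be normalized before any linking is attempted. Here is a small Python sketch under assumptions: locators use an en dash or a hyphen, an abbreviated end page borrows its missing leading digits from the start page, and the function name `parse_locator` and its return shape are hypothetical.

```python
import re

RANGE = re.compile(r"^(\d+)[–-](\d+)$")   # 381–385, 381–85, 381–5
NOTE = re.compile(r"^(\d+)n(\d*)$")       # 385n, 385n10

def parse_locator(locator):
    """Return (first_page, last_page, note) for a single index locator.

    note is None for plain pages and ranges, "" for an unnumbered note
    ("385n") and the note number as a string for "385n10".
    """
    m = RANGE.match(locator)
    if m:
        first, last = m.group(1), m.group(2)
        if len(last) < len(first):
            # Abbreviated end page: "381–85" means 381 to 385.
            last = first[: len(first) - len(last)] + last
        return int(first), int(last), None
    m = NOTE.match(locator)
    if m:
        page = int(m.group(1))
        return page, page, m.group(2)
    # Plain page number; anything else raises ValueError.
    page = int(locator)
    return page, page, None


for loc in ["381–385", "381–85", "381–5", "385n", "385n10"]:
    print(loc, parse_locator(loc))
```

Anything that does not parse raises `ValueError`, which is one way to surface OCR errors in the index for manual review.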
|   |   | 
|  05-13-2020, 07:51 AM | #6 | 
| Groupie            Posts: 173 Karma: 40000 Join Date: Oct 2013 Device: kindle | 
			
			Wow, huge thanks Tex2002ans for the post. One question though: how can I add a <span> at the beginning of each page from AFR? There is no string that is repeated on all the pages. | 
|   |   | 
|  05-13-2020, 12:52 PM | #7 | |
| Wizard            Posts: 2,306 Karma: 13057279 Join Date: Jul 2012 Device: Kobo Forma, Nook |  Quote: 
 Or in your typical Non-Fiction book, there is usually an alternating author/book+chapter header on every even/odd page. You would just have to do a few search/replaces.  Every book is different. Side Note: I never export any of the headers/footers out of Finereader anyway, since that introduces more problems than it solves. And keeping all the same page breaks would require exporting as "Exact Copy"... and that is UGLY. Personally, I would just complete all the cleanup work in EPUB, then reintroduce pagebreaks manually. If you were working in InDesign, there are a few different methods mentioned in "Getting InDesign to export pagelists to ePub3 (reflowable)". The gist was adding an invisible anchor at the start of every "master page". Unsure if there's a similar way to do that in Word/DOCX. | |
|   |   | 
|  05-15-2020, 06:24 AM | #8 | |
| Bookmaker & Cat Slave            Posts: 11,503 Karma: 158448243 Join Date: Apr 2010 Location: Phoenix, AZ Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2 | Quote: 
 But realistically, for any decent-sized non-fic book, you're talking 5.5" x 8.5" or 6" x 9" or larger, which means that one printed page=3(ish) eBook "pages" or screens. The links are simply bloody ineffective and more frustrating, in practice, than useful. I mean, it's one thing to fan to a page and then visually skim. It's another to flip to a page/screen and then have to more-carefully (because you don't know where the original "page" ends, do you?) skim-read to find the relevant section. Just saying. This is a topic/subject with which I have wrangled more than once and as much as I hate to say it--and I bloody well do--the only realistic answers are: tell them to use search, OR link to the damn paragraphs, and nobody, anywhere, wants to do that if they're the bookmaker, or to pay to have it done if they're the customer. That's my $0.02. Hitch | |
|   |   | 
|  05-15-2020, 06:57 AM | #9 | 
| Groupie            Posts: 173 Karma: 40000 Join Date: Oct 2013 Device: kindle | 
			
			Thanks, Hitch. I had very similar concerns, in fact. Sigil's index does link to <p>s, and it is relatively simple to create one when there is just one analytical index (most commonly of personal names). It is, however, a much longer task to recreate the indexes when there is more than one (for example one for persons, one for places, one for books, etc).
		 | 
|   |   | 
|  05-15-2020, 07:43 AM | #10 | |
| Bookmaker & Cat Slave            Posts: 11,503 Karma: 158448243 Join Date: Apr 2010 Location: Phoenix, AZ Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2 | Quote: 
 
 The problem has never been, really, getting people from the index to the relevant content, if the customer is willing to pay, (as it takes much longer, obviously, than Real Page Numbers linking) and is also willing to ensure that you know WHERE those page links are really meant to land--it's everything else that goes with it. Hitch | |
|   |   | 
|  05-15-2020, 07:48 AM | #11 | 
| Groupie            Posts: 173 Karma: 40000 Join Date: Oct 2013 Device: kindle | 
			
			Wow, lots of italics and caps there   Yes I get it. It's a complicated issue. Given all that, do you think it's worth creating the index anyway? | 
|   |   | 
|  05-15-2020, 09:18 AM | #12 | |
| Bookmaker & Cat Slave            Posts: 11,503 Karma: 158448243 Join Date: Apr 2010 Location: Phoenix, AZ Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2 | Quote: 
 I think that the state of search is such that many topic indices could be out of work. However, there are index topics that aren't that clear-cut, like the usual "see also" or where you get an area or range of pages that discuss something important to the topic in which you are interested, but don't necessarily say a word or phrase overtly. In those instances, you would not find it/them, if you did search. You have to assess the existence of those, the reality of them, before you can decide what to do around your index, IMHO. So, sorry--it's really your call. Hitch | |
|   |   | 
|  05-15-2020, 03:24 PM | #13 | |||
| Wizard            Posts: 2,306 Karma: 13057279 Join Date: Jul 2012 Device: Kobo Forma, Nook | Quote: 
  And in reality, most publishers opt to just remove the index completely. (My personal preference: Just leave the index in, but unlinked.) As Hitch says, the usefulness of fully linking is 99.9% of the time NOT worth all the extra work needed.* But but but, there are still some advantages of inserting "RPNs" (especially Accessibility). Side Note: And again, on the real-life applicability of RPNs... Amazon strips out RPNs from most books. There are only a few rare exceptions allowed (as Hitch discussed a few months ago). But don't let Amazon's foolish decisions stop you, there's still plenty of EPUB devices/stores out there. And who knows, maybe one day Amazon might get some sense and flip the RPN switch on other types of books, and if your source documents were all prepped ahead of time, you'll be glad.  * Note: Although over the years, I have thought of coding up a little tool to check/link indexes faster+more accurately... but I haven't made any progress on that front.  * * * A similar argument can be made for proper HTML lang markup. Currently, there aren't many ereaders that make use of it, but in the future, support could be expanded: 
 (And there still ARE some very nice edge-cases to proper markup: like Calibre's Multi-Language Spellcheck, where it currently is a huge benefit.) Quote: 
 You would need an actual Word/LaTeX source, linked at the word/paragraph level, plus someone who knew exactly what they were doing with the built-in Index tools. Hint: Even within professional Indexers, this is an extreme minority. Most Indexes are created completely externally, and just appended to the end of a document. So you have a "dumb Index" as the only source (even in books that were fully created digitally). This is kind of like properly using Styles... Properly using Index tools? Absolutely unheard of. :P Quote: 
 It generates a completely unhuman-readable mush of "[1], [2], [3], [...], [50]" links for every single word and mangles the HTML throughout the book with disgusting markup. A similar situation currently happens in a Calibre conversion of a Word document that has a built-in Index. Side Note: I haven't tested this one in a few years, but even worse, Calibre renumbers all of Word's Index's "page numbers" chronologically in the order they appear. For example:
 Actual DOCX Index:
 Code:
 A: 100, 101
 B: 99, 100
 C: 50, 101
 After Calibre conversion:
 Code:
 A: [1], [2]
 B: [3], [1]
 C: [4], [2]
 Last edited by Tex2002ans; 05-15-2020 at 03:31 PM. | |||
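The numbering pattern in that example (each distinct page gets the next label the first time it shows up, and reuses it afterwards) can be reproduced with a few lines of Python. This is only a sketch of the pattern shown in the post, not Calibre's actual code; the function name `renumber_by_first_appearance` is invented here.

```python
def renumber_by_first_appearance(index):
    """Replace page numbers with [n] labels assigned in order of first appearance."""
    labels = {}   # page number -> sequential label
    result = {}
    for term, pages in index.items():
        result[term] = []
        for page in pages:
            if page not in labels:
                labels[page] = len(labels) + 1
            result[term].append(f"[{labels[page]}]")
    return result


docx_index = {"A": [100, 101], "B": [99, 100], "C": [50, 101]}
for term, refs in renumber_by_first_appearance(docx_index).items():
    print(f"{term}: {', '.join(refs)}")
```

The original page numbers are lost in the process, which is why the converted index can no longer be checked against the printed book.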
|   |   | 
|  05-16-2020, 05:59 AM | #14 | 
| Still reading            Posts: 14,926 Karma: 110507267 Join Date: Jun 2017 Location: Ireland Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper | 
			
			I can understand wanting to replicate a printed book index, but unless it's really a concordance it's not at all needed. Search works so well on every make / model of ebook reader that you hardly need sync either. I use search to find the same place on a different make of ereader, which "sync" will likely never do. So being lazy and liking an exactly correct solution, I'd put in a note explaining search and suggesting adding a space before or after the term if too many incorrect results occur. Baffles me that there isn't a whole word option and a match case option. Most people don't even use internet search well. However, the Google Playstore wins the award for the worst search ever: impossible to limit results or do exact searches. Google's web search is now also poisoned by what they think you ought to see, but the ereader searches and ereader app searches return exactly what you asked for. | 
|   |   | 
|  | 