#1
Bibliophagist
Posts: 44,805
Karma: 168802811
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Indexing all words in an ebook
I have an acquaintance who wants a mass of the words in his magnum opus listed in an index, with the page location of each occurrence and each page location linked back to the word. He also wants this to show in an azw3/kfx Kindle format conversion.
You would see a word index (I can't bring myself to call it a concordance) where clicking on a page number, e.g. 76, would move to that page in the ebook, preferably with the cursor on the word aardvark.
aardvark pg. 1, pg. 14, pg. 49, pg. 76
syzygy pg. 21, pg. 48, pg. 103
My suggestion was that he forget the idea, since a reflowable ebook does not have fixed page numbers and creating the index he wants would take an incredible amount of work for no real gain.
His argument is that there must be an automated solution for this task. He pointed to AntConc 3.5.8 as one example; the other example he pointed out was written in Fortran and has not been maintained since the '80s.
I'm asking for feedback from anyone who has managed a similar task.
Edit: could a moderator change the title to all words instead of all works? Please?
Edit2: Thanks!
Last edited by DNSB; 05-20-2020 at 12:57 PM.
#2
Grand Sorcerer
Posts: 5,680
Karma: 23983815
Join Date: Dec 2010
Device: Kindle PW2
Quote:
What you could try is:
(Add Entry supports multiline lists.)
#3
Still reading
Posts: 13,696
Karma: 103837201
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
If it's NOT a concordance, then it's absolutely pointless, as the built-in search will work better.
#4
Bibliophagist
Posts: 44,805
Karma: 168802811
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
That's what I've argued. From his email this AM, he has decided to find a "professional" who will give him what he wants and at the rather low price he wants to pay.
#5
Grand Sorcerer
Posts: 5,680
Karma: 23983815
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Quote:
Quote:
MySQL
NLTK
It's also relatively easy to create a list of all unique words with NLTK and other tools. But insisting on page numbers doesn't make sense.
Quote:
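The remark above about listing all unique words can be illustrated with a minimal sketch. This one uses only the Python standard library rather than NLTK, and the tokenizer pattern and the `unique_words` helper are assumptions made for illustration, not anything the poster described. It treats a word as a lowercase run of letters with optional internal straight apostrophes, which will not suit every text.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase letter runs, allowing internal apostrophes ("isn't").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def unique_words(text: str) -> Counter:
    """Count every distinct word in `text`, ignoring case."""
    return Counter(WORD_RE.findall(text.lower()))


sample = "The aardvark met another aardvark. Syzygy isn't common."
counts = unique_words(sample)

# Print the unique words alphabetically with their frequencies.
for word in sorted(counts):
    print(word, counts[word])
```

A list like this says which words exist and how often, but nothing about where they sit on a "page", which is the part that does not carry over to a reflowable ebook.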
#6
Resident Curmudgeon
Posts: 79,123
Karma: 144284184
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
#7
Bibliophagist
Posts: 44,805
Karma: 168802811
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Quote:
Quote:
Last edited by DNSB; 05-20-2020 at 12:57 PM.
#8
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Indexes serve a purpose because they are human-curated. A random hodgepodge of every occurrence of a word isn't very helpful.
Quote:
Quote:
Original Text:
Code:
This is an example sentence with aardvark and syzygy.
Code:
This is an example sentence with aardvark\index{aardvark} and syzygy\index{syzygy}.
Code:
\newcommand{\indexthis}[1]{#1\index{#1}}
[...]
This is an example sentence with \indexthis{aardvark} and \indexthis{syzygy}.
Now... getting that back into EPUB would be a different problem. lol.
Again, what is the point? At this level, you might as well be using Search, which does exactly what you want. Trying to replicate "jumping to the exact point" would require disgusting word-level markup like Sigil's Index Tool, and even then, in 99.9% of the ereaders, it wouldn't happen like he imagines.
Side Note: And another thing, would he want all variants of "aardvark" ("aardvarks", "aardvarking") under an entry "aardvark" too? Or does he consider those all unique words/entries?
Quote:
What he wants is off-the-deep-end absurd.
Quote:
Quote:
Maybe question this "friend's" sanity. Perhaps recommend them to a mental health specialist for wanting to unleash such horrors onto their potential readers.
Quote:
Last edited by Tex2002ans; 05-20-2020 at 02:08 PM.
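To make the "word-level markup" point in the post above concrete, here is a rough sketch of what wrapping every occurrence of an indexed word in its own anchor might look like. It is not how Sigil's Index Tool works internally; the function names, the `idx-word-n` id scheme, and the `text.xhtml` file name are all hypothetical, chosen only to show how quickly the markup piles up.

```python
import html
import re
from collections import defaultdict

WORD_RE = re.compile(r"[A-Za-z]+")  # assumed: a word is a plain run of ASCII letters


def mark_and_index(paragraphs, targets):
    """Wrap each occurrence of a target word in an anchor.

    Returns (marked_paragraphs, index), where index maps each target word
    to its anchor ids in reading order.
    """
    targets = {t.lower() for t in targets}
    index = defaultdict(list)
    marked = []
    for para in paragraphs:
        pieces, pos = [], 0
        for match in WORD_RE.finditer(para):
            # Escape the text between words; the words themselves are letters only.
            pieces.append(html.escape(para[pos:match.start()], quote=False))
            word, key = match.group(0), match.group(0).lower()
            if key in targets:
                anchor_id = f"idx-{key}-{len(index[key]) + 1}"  # hypothetical id scheme
                index[key].append(anchor_id)
                pieces.append(f'<a id="{anchor_id}">{word}</a>')
            else:
                pieces.append(word)
            pos = match.end()
        pieces.append(html.escape(para[pos:], quote=False))
        marked.append("<p>" + "".join(pieces) + "</p>")
    return marked, dict(index)


def index_entry(word, anchor_ids, href_file="text.xhtml"):
    """Build one index line whose links point back at the anchors."""
    links = ", ".join(
        f'<a href="{href_file}#{anchor}">{n}</a>'
        for n, anchor in enumerate(anchor_ids, start=1)
    )
    return f"<p>{word} {links}</p>"


marked, index = mark_and_index(
    ["An aardvark & a syzygy.", "Aardvark again."], ["aardvark"]
)
print("\n".join(marked))
print(index_entry("aardvark", index["aardvark"]))
```

The link text here is an occurrence counter, not a page number, because the source has no fixed pages to count. Whether a reading app scrolls to the anchor or merely opens the containing screen is up to the app, so the "cursor on the word" behaviour is not something this markup can promise.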
#9
Bibliophagist
Posts: 44,805
Karma: 168802811
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Quote:
#10
Still reading
Posts: 13,696
Karma: 103837201
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
I don't think Tex2002ans or Hitch have done anything to deserve it.
I sometimes think that some people have only used paper and PDFs and think ebooks are PDFs. PDFs are print replicas, proofing tools, or even a file for publishing to paper. I wonder have people asking for these things read real ebooks on real ereaders. Not PDFs on 19" screens or 12" tablets.
Oh, and my name is Manuel, I'm from Barcelona and I know NOTHING about making ebooks. Also that is not a dead rat in the kitchen. It's the cat's toy.
Last edited by Quoth; 05-20-2020 at 05:06 PM.
#11
Bibliophagist
Posts: 44,805
Karma: 168802811
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Quote:
As for PDFs? They are great for their intended purposes. Sadly, as you mentioned, too many people tend to think of PDF as the only ebook format. Reflow? Unreal page number? What's that?
#12
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
I do want to say that I would ABSOLUTELY criminally overcharge him for this. The brain-damage alone is worth thousands, IMHO.
I figure when people see the sign on the website, saying that we're closed, they'll roll their eyes, thinking "but, but, everybody's been 'off' for months!" Not us, man.
Hitch
#13
frumious Bandersnatch
Posts: 7,543
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
I don't understand... Sure, the very proposal of such an index is nuts and ludicrous, but it doesn't seem to be difficult at all to do (except if one wants the page numbers to be the reflowable page numbers, and the "cursor on the word aardvark" part).
As for the stop words, just ignore them (i.e. don't exclude them) and let him remove the corresponding entries once the index is done.
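A minimal sketch of the approach described above, under stated assumptions: locations are 1-based paragraph numbers in the source text (a stand-in, since there are no fixed pages), every word is indexed including stop words, and unwanted entries are dropped afterwards. The `build_index` and `drop_entries` names and the tokenizer pattern are hypothetical.

```python
import re
from collections import defaultdict

# Assumed tokenizer: lowercase letter runs, allowing internal apostrophes.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def build_index(paragraphs):
    """Map every word to the sorted 1-based paragraph numbers where it occurs."""
    index = defaultdict(set)
    for number, para in enumerate(paragraphs, start=1):
        for word in WORD_RE.findall(para.lower()):
            index[word].add(number)
    # Alphabetical entries, each with its locations in ascending order.
    return {word: sorted(locs) for word, locs in sorted(index.items())}


def drop_entries(index, unwanted):
    """Return a copy of the index without the entries the author rejects."""
    unwanted = {w.lower() for w in unwanted}
    return {word: locs for word, locs in index.items() if word not in unwanted}


paragraphs = [
    "The aardvark slept.",
    "A syzygy is rare.",
    "The aardvark woke during the syzygy.",
]
full = build_index(paragraphs)                    # stop words are kept here
trimmed = drop_entries(full, ["the", "a", "is"])  # pruned after the fact

for word, locs in trimmed.items():
    print(word, ", ".join(f"para. {n}" for n in locs))
```

Building the full index first and pruning later keeps the decision about which words count as noise with the author, at the cost of a much longer first draft of the index.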
#14
Still reading
Posts: 13,696
Karma: 103837201
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
#15
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Hitch