Old 12-04-2020, 02:49 PM   #16
Hitch
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by BeckyEbook View Post
I'm not sure if this is exactly what you need, but I will post a few links that may lead you to come up with your own solution.

http://epubsecrets.com/why-i-use-page-list-and-how.php
http://epubsecrets.com/page-list-all...e-doing-it.php

The link to the script is dead, so I'm listing it from web archive:
http://web.archive.org/web/201912181...orohikoscripts

You can write directly to Laura, but I have a feeling you'd better check out the "EPUB Accessibility Using InDesign" video tutorial (available from Lynda.com or Linkedin), which AFAIK includes the PageStaker and EPUBOgrify script.
The latter is not so important anyway, because it is a simple change that can be done in Sigil.
See, there you go. I knew somebody would have some clips.

Hitch
Old 12-04-2020, 04:04 PM   #17
phillipgessert
Addict
Posts: 316
Karma: 3200000
Join Date: Oct 2015
Location: Madison, WI
Device: Kindle 5th Gen
I have not tried this (and frankly even if it works it still sounds pretty miserable) but I wonder if you could work page-by-page unlocking whatever master page element includes the page number, and then use a plugin such as https://www.rorohiko.com/wordpress/i...ds/textstitch/ to auto-thread the page numbers into the document flow.
Old 12-04-2020, 04:29 PM   #18
Hitch
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by phillipgessert View Post
I have not tried this (and frankly even if it works it still sounds pretty miserable) but I wonder if you could work page-by-page unlocking whatever master page element includes the page number, and then use a plugin such as https://www.rorohiko.com/wordpress/i...ds/textstitch/ to auto-thread the page numbers into the document flow.
Oh, Phillip! Who knew you had such masochistic leanings? Quick--what's your safe word?



Hitch
Old 12-04-2020, 04:39 PM   #19
KevinH
Sigil Developer
Posts: 8,764
Karma: 6000000
Join Date: Nov 2009
Device: many
If you have a PDF of the printed version of the book, you should be able to print it to a PostScript file and use Python on that file to extract the page numbers and the first n words and last n words on each page (where n is small, say 3), and save that info to a file. Then use sed or some other stream editor with that info to insert the markers you want in each HTML file.

Some custom programming in python might be needed but should be reusable for future projects.
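
A minimal Python sketch of the "first n words and last n words" step described above. It assumes the per-page text has already been extracted somehow; the `page_fingerprints` name and the (label, text) input shape are hypothetical, chosen only for illustration.

```python
import json

def page_fingerprints(pages, n=3):
    """Return one record per page: its label plus its first and last n words.

    `pages` is a list of (label, text) pairs. Getting that text out of the
    PostScript or PDF file is outside the scope of this sketch.
    """
    records = []
    for label, text in pages:
        words = text.split()
        records.append({
            "page": label,
            "first": words[:n],
            "last": words[-n:],
        })
    return records

# Made-up sample pages, just to show the shape of the saved info.
pages = [
    ("1", "It was a dark and stormy night in the old town"),
    ("2", "The rain fell in torrents except at occasional intervals"),
]
records = page_fingerprints(pages)
print(json.dumps(records, indent=2))
```

Saving the records as JSON (or CSV) keeps them reusable by whatever tool later inserts the markers into the HTML files.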
Old 12-04-2020, 05:39 PM   #20
Tex2002ans
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by BeckyEbook View Post
I'm not sure if this is exactly what you need, but I will post a few links that may lead you to come up with your own solution.
Fantastic. Thanks for sharing.

Quote:
Originally Posted by Ryn View Post
What I really wanted to know was: how do you export the page numbers in an indesign export to epub? It's not an option in the export dialogue box, nor is it something I can easily put together using indesign's byzantine search module.

Is there another way?
BeckyEbook's links would likely work as well. Those 2018 articles are probably better + more modern than the older articles I linked in my Post #4.

Quote:
Originally Posted by Hitch View Post
You don't. There isn't any easy or magic or automatic way to export the RPNs (Real Page Numbers). You create them manually.
Yep, you would think it would be a checkbox in InDesign... especially with how much Adobe talks Accessibility.

Quote:
Originally Posted by Hitch View Post
Knowing Tex, he has some mad coding that will do some of this more easily than I've described, but that's the fundamental process, right there.
I see you didn't read all the links in my earlier post #4!

(Typical Hitch, never reading anything I write! )

Anyway, it's not anything I've ever done in an actual EPUB, just theoretical musings. All the logic is sound though.

Luckily, I haven't had to do an Index in a very long time.
Old 12-04-2020, 05:54 PM   #21
Ryn
Connoisseur
Posts: 55
Karma: 10
Join Date: Feb 2012
Device: none
Quote:
Originally Posted by BeckyEbook View Post
I'm not sure if this is exactly what you need, but I will post a few links that may lead you to come up with your own solution.

http://epubsecrets.com/why-i-use-page-list-and-how.php
http://epubsecrets.com/page-list-all...e-doing-it.php

The link to the script is dead, so I'm listing it from web archive:
http://web.archive.org/web/201912181...orohikoscripts

You can write directly to Laura, but I have a feeling you'd better check out the "EPUB Accessibility Using InDesign" video tutorial (available from Lynda.com or Linkedin), which AFAIK includes the PageStaker and EPUBOgrify script.
The latter is not so important anyway, because it is a simple change that can be done in Sigil.
Thank you for digging up the web archive link for me. This might be just what I'm looking for.

Quote:
Originally Posted by KevinH
If you have a PDF of the printed version of the book, you should be able to print it to a PostScript file and use Python on that file to extract the page numbers and the first n words and last n words on each page (where n is small, say 3), and save that info to a file. Then use sed or some other stream editor with that info to insert the markers you want in each HTML file.

Some custom programming in python might be needed but should be reusable for future projects.
This might also be something I'd consider doing, seeing how big the project is, and how much I loathe working from PDFs. By postscript file, do you mean a text file?

I can see some potential problems with this, as the page numbers are on the bottom of the pages, and some pages are empty, which may confuse the issue, but that might be something I could prompt for.

Quote:
Originally Posted by phillipgessert
I have not tried this (and frankly even if it works it still sounds pretty miserable) but I wonder if you could work page-by-page unlocking whatever master page element includes the page number, and then use a plugin such as https://www.rorohiko.com/wordpress/i...ds/textstitch/ to auto-thread the page numbers into the document flow.
Or I could try my hand at programming an indesign plugin for this express purpose. How hard could it be to get a script to recognize the page numbers, and to cross-reference the indexed page numbers to the first word on the relevant page? Famous last words, I'm sure...

--
Food for thought here folks, thanks a lot!
Old 12-04-2020, 06:37 PM   #22
KevinH
Sigil Developer
Posts: 8,764
Karma: 6000000
Join Date: Nov 2009
Device: many
Yes, a PostScript (.ps) file is typically generated by a PostScript printer driver (it is what is spooled and sent to a PostScript-compatible printer) and contains the commands, in text, that the printer uses to actually print. There are many open-source tools (ghostscript, cups, pdf2ps on Linux) and commercial tools (Adobe Acrobat Pro) that can do this. A ps file is very similar to a pdf file that has been unencoded and uncompressed.

Extracting the info you need from one should be relatively easy. As for missing page labels (or numbers) these can easily be filled in and interpolated correctly from known page labels or values using a simple spreadsheet or software as it is a one-to-one mapping.
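
As a rough illustration of the interpolation idea, here is a Python sketch that fills gaps in a list of page numbers. It assumes plain Arabic numbering with one entry per physical page (roman-numeral front matter is not handled), and `fill_missing_labels` is a hypothetical helper name.

```python
def fill_missing_labels(labels):
    """Fill None entries by counting on from the nearest known page number.

    `labels` holds one entry per physical page, in order: an int where the
    printed number was found, None where it was missing (blank pages etc.).
    """
    filled = list(labels)
    # Forward pass: each gap continues from the previous known value.
    for i in range(1, len(filled)):
        if filled[i] is None and filled[i - 1] is not None:
            filled[i] = filled[i - 1] + 1
    # Backward pass: fill any leading gap from the first known value.
    for i in range(len(filled) - 2, -1, -1):
        if filled[i] is None and filled[i + 1] is not None:
            filled[i] = filled[i + 1] - 1
    return filled

print(fill_missing_labels([None, 2, None, 4, 5, None]))  # [1, 2, 3, 4, 5, 6]
```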

If you can use pdf2ps on your machine it should be easy enough to look at the postscript file in a text editor and look for "showpage" and decide for yourself how hard it would be to extract just what you need.
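
For a first look at how the file divides into pages, a small Python sketch like the following splits the PostScript source at each `showpage`. It assumes `showpage` appears literally in the file; output from some drivers may wrap or alias it, in which case this will not find the page breaks.

```python
import re

def split_ps_pages(ps_text):
    """Split PostScript source into per-page chunks at each `showpage`."""
    chunks = re.split(r"\bshowpage\b", ps_text)
    # Whatever follows the final showpage is trailer material, not a page.
    return chunks[:-1]

# Tiny hand-written sample, not real driver output.
sample = "%!PS\n(Hello) show\nshowpage\n(World) show\nshowpage\n%%EOF\n"
pages = split_ps_pages(sample)
print(len(pages))  # 2
```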

Last edited by KevinH; 12-04-2020 at 07:12 PM.
Old 12-04-2020, 07:25 PM   #23
Hitch
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by Tex2002ans View Post

I see you didn't read all the links in my earlier post #4!

(Typical Hitch, never reading anything I write! )
Oh, you are being a [something or other]. You know I do, in fact, read all your stuff. (Go ahead, I defy you to find other human beings that do!). I just don't always remember every single thing. And...well, yes, I may on occasion skim. It's not like you write 300-word Blogger posts, now, is it. You're like the Anti-Twitter. You're the one guy I can always count on to make me look terse, my brother.

Quote:
Anyway, it's not anything I've ever done in an actual EPUB, just theoretical musings. All the logic is sound though.

And there it is. :-) That stuff is just bloody tedious. I think it would be fun to write programming or clips, etc., to do it...but HAVING to do it, commercially, is the dog's south end.

Quote:
Luckily, I haven't had to do an Index in a very long time.
You ARE lucky!

Hitch
Old 12-05-2020, 03:54 AM   #24
Ryn
Connoisseur
Posts: 55
Karma: 10
Join Date: Feb 2012
Device: none
Quote:
Originally Posted by KevinH View Post
Yes, a PostScript (.ps) file is typically generated by a PostScript printer driver (it is what is spooled and sent to a PostScript-compatible printer) and contains the commands, in text, that the printer uses to actually print. There are many open-source tools (ghostscript, cups, pdf2ps on Linux) and commercial tools (Adobe Acrobat Pro) that can do this. A ps file is very similar to a pdf file that has been unencoded and uncompressed.

Extracting the info you need from one should be relatively easy. As for missing page labels (or numbers) these can easily be filled in and interpolated correctly from known page labels or values using a simple spreadsheet or software as it is a one-to-one mapping.

If you can use pdf2ps on your machine it should be easy enough to look at the postscript file in a text editor and look for "showpage" and decide for yourself how hard it would be to extract just what you need.
I have looked at this, but both pdf2ps and acrobat's postscript output are heavy on code, and feature encrypted text, rendering those avenues relatively useless.

Acrobat also exports to txt, rtf, doc, docx etc, so I could imagine writing a python script that analyses such a file.

That might give me the strings I could use to iterate through an epub html file and add anchor tags, that I could then link the index entries to. I'd need to account for whitespace, the existence of potential html tags, and other things, probably, but this seems relatively straightforward.
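
A minimal Python sketch of that search-and-anchor step, tolerant of whitespace and inline tags between words. The `page_` id prefix and both function names are assumptions for illustration; HTML entities and phrases that span block elements are not handled.

```python
import re

# Between two words, allow whitespace and/or inline tags such as <i> or </span>.
GAP = r"(?:\s|<[^>]+>)+"

def phrase_pattern(words):
    """Build a regex matching the words in order, tolerant of tags and spacing."""
    return re.compile(GAP.join(re.escape(w) for w in words))

def insert_page_anchor(html, words, page_label):
    """Insert an empty anchor before the first match of `words`.

    Returns the input unchanged if the phrase is not found.
    """
    match = phrase_pattern(words).search(html)
    if match is None:
        return html
    anchor = f'<a id="page_{page_label}"></a>'
    return html[:match.start()] + anchor + html[match.start():]

html = "<p>It was the <i>best</i>\n of times, it was</p>"
result = insert_page_anchor(html, ["the", "best", "of"], "12")
print(result)
```

Because the pattern has no word boundaries, a short first word can match inside a longer one, so longer phrases (or an added boundary check) make the match safer.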

With some sophistication - as the indices feature page ranges and note numbers, too - I might be able to automate the whole thing.
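
To give an idea of what handling page ranges and note numbers could involve, here is a Python sketch that parses index locators. The locator syntax is an assumption ("12", "14-17", "23n4" for note 4 on page 23); the real indices may well use a different convention, and abbreviated ranges such as "123-25" are not handled.

```python
import re

# Assumed syntax: page, optional "-last" (hyphen or en dash), optional "n<note>".
LOCATOR = re.compile(r"^(\d+)(?:[-\u2013](\d+))?(?:n(\d+))?$")

def parse_locator(token):
    """Parse one locator into a (first_page, last_page, note) tuple."""
    m = LOCATOR.match(token.strip())
    if m is None:
        raise ValueError(f"unrecognised locator: {token!r}")
    first = int(m.group(1))
    last = int(m.group(2)) if m.group(2) else first
    note = int(m.group(3)) if m.group(3) else None
    return first, last, note

def parse_locators(text):
    """Parse a comma-separated locator list from one index entry."""
    return [parse_locator(t) for t in text.split(",")]

print(parse_locators("12, 14-17, 23n4"))
```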

Seeing as there are thousands of pages - and thousands of index entries per volume - here, it's definitely worth a try.

edit: oh, no, scratch that. There are footnotes, a lot of them, clouding the issue in the PDF2xxx output, which I need to disregard without losing the numbered lists. Also, the page numbers are in the footers. And of course there are also headers, which I should also disregard. At this point, perhaps I am better off just working from the PDF in the first place, which is not so bad all things considered.

Last edited by Ryn; 12-05-2020 at 04:07 AM. Reason: new sh*t has come to light; she kidnapped herself, man
Old 12-05-2020, 06:16 AM   #25
Doitsu
Grand Sorcerer
Posts: 5,727
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
@Ryn: Do you know that Sigil has a built-in index tool? To use it, all you have to do is generate a plain text file with index entries, e.g. index.txt, and select the following:
  • Tools > Index > Index Editor...
  • Right-click > Open > index.txt > Save
  • Tools > Index > Create Index
Obviously the index entries won't have page numbers, but you might be able to add them later with a custom Python script or a plugin.
Old 12-05-2020, 07:54 AM   #26
Ryn
Connoisseur
Posts: 55
Karma: 10
Join Date: Feb 2012
Device: none
Quote:
Originally Posted by Doitsu View Post
@Ryn: Do you know that Sigil has a built-in index tool? To use it, all you have to do is generate a plain text file with index entries, e.g. index.txt, and select the following:
  • Tools > Index > Index Editor...
  • Right-click > Open > index.txt > Save
  • Tools > Index > Create Index
Obviously the index entries won't have page numbers, but you might be able to add them later with a custom Python script or a plugin.
Hi there Doitsu, yeah I did know that. And I have considered using it.

Thing is, these are custom-built indexes with thousands of entries per volume. I doubt I could do remotely as good a job as the original indexers, who likely spent upward of forty hours on each index.

Needless to say, these books were not created to make any profit whatsoever, and that also goes for the digital edition we're currently putting together. The foundation which has enlisted my help has as a core value the dissemination of these texts, and keeping them safely available for future generations.

I generally dissuade clients from including indexes, but in this case I am willing to make an exception. And I personally resonate with the subject, so my participation is not a chore at all.

That being said, I dislike unnecessary monotonous labor as much as most people, if not more, so being smart about it and using tech to my advantage, I'm all for that!
Old 12-05-2020, 08:26 AM   #27
Ryn
Connoisseur
Posts: 55
Karma: 10
Join Date: Feb 2012
Device: none
Quote:
Originally Posted by BeckyEbook View Post
I'm not sure if this is exactly what you need, but I will post a few links that may lead you to come up with your own solution.

http://epubsecrets.com/why-i-use-page-list-and-how.php
http://epubsecrets.com/page-list-all...e-doing-it.php

The link to the script is dead, so I'm listing it from web archive:
http://web.archive.org/web/201912181...orohikoscripts

You can write directly to Laura, but I have a feeling you'd better check out the "EPUB Accessibility Using InDesign" video tutorial (available from Lynda.com or Linkedin), which AFAIK includes the PageStaker and EPUBOgrify script.
The latter is not so important anyway, because it is a simple change that can be done in Sigil.
This turns out to work like a charm, even when exporting to ePub 2, which I feared might be a problem.

Thanks Becky!

Now I just need to write me some python logic to rid myself of the task of manually linking the index to the pages. But that's the fun part
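
One possible shape for that linking logic, as a Python sketch. It assumes the page anchors carry ids of the form `page_N` and that a mapping from page label to XHTML file is already available; both are assumptions (the ids produced by the script may differ), and `index_entry_html` is a hypothetical helper name.

```python
from html import escape

def index_entry_html(term, pages, page_to_file):
    """Render one index entry with each page number linked to its anchor.

    `page_to_file` maps a page label to the XHTML file holding its anchor.
    Pages missing from the mapping are left as plain text.
    """
    links = []
    for page in pages:
        target = page_to_file.get(page)
        if target is None:
            links.append(escape(page))
        else:
            links.append(
                f'<a href="{escape(target)}#page_{escape(page)}">{escape(page)}</a>'
            )
    return f"<p>{escape(term)}, {', '.join(links)}</p>"

entry = index_entry_html("diesel engines", ["12", "45"], {"12": "ch01.xhtml"})
print(entry)
```

Leaving unmapped pages as plain text makes it easy to spot, after a run, which locators still need attention.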

Laura mentioned another script that might actually serve my purpose even better: LiveIndex, found here: https://www.id-extras.com/products/liveindex/

I'm mentioning it, just in case anyone else ever comes across a similar use case.

Last edited by Ryn; 12-05-2020 at 10:04 AM. Reason: LiveIndex script mention
Old 12-05-2020, 09:56 AM   #28
KevinH
Sigil Developer
Posts: 8,764
Karma: 6000000
Join Date: Nov 2009
Device: many
I see you found your solution but for the record there are ps2txt and ps2ascii you can use to display these as well as this useful article I used in the past:

https://www.cs.waikato.ac.nz/~ihw/pa...tract-Text.pdf

It prepends a short and sweet extra postscript function to the original postscript which redefines the show methods to give you text output that would be easier to parse.

FWIW, I find that working with a "ps printer device" can extract text electronically that is very hard to get at in other ways without a scanner.
Old 12-05-2020, 10:01 AM   #29
KevinH
Sigil Developer
Posts: 8,764
Karma: 6000000
Join Date: Nov 2009
Device: many
And you need to keep in mind that you are linking to the top of a "printed" page and not to the word itself, which may appear nowhere on the actual screen, as that "printed page" will generally be much longer than the screen holds. So it will just get readers close at best.

In many ways, a good search function replaces the need for indexes almost completely.


Quote:
Originally Posted by Ryn View Post
This turns out to work like a charm, even when exporting to ePub 2, which I feared might be a problem.

Thanks Becky!

Now I just need to write me some python logic to rid myself of the task of manually linking the index to the pages. But that's the fun part
Old 12-05-2020, 10:12 AM   #30
Ryn
Connoisseur
Posts: 55
Karma: 10
Join Date: Feb 2012
Device: none
Quote:
Originally Posted by KevinH View Post
And you need to keep in mind that you are linking to the top of a "printed" page and not to the word itself, which may appear nowhere on the actual screen, as that "printed page" will generally be much longer than the screen holds. So it will just get readers close at best.

In many ways, a good search function replaces the need for indexes almost completely.
Well, yes and no. Of course, I always use the exact same argument when attempting to dissuade clients from insisting on index inclusion.

But... while searching is active, it presupposes you know exactly what you are searching for, and you might not always know what you don't know.

An Index, otoh, has done this work for you, and then some. A good index will have collated different locations pertaining to the way "angular momentum" pertains to "diesel engines," for example. (Not even sure that that is a thing, but allow me the liberty.)

This passive searching allows for a deeper sense of discovery in books that are more encyclopedic in scope.

Not relevant to the vast majority of books that reach our devices, I would be the first to agree, but in some cases, very much a desirable addition.