MobileRead Forums

MobileRead Forums (https://www.mobileread.com/forums/index.php)
-   Sigil (https://www.mobileread.com/forums/forumdisplay.php?f=203)
-   -   Creating page lists on Windows – EPUBogrify alternative? (https://www.mobileread.com/forums/showthread.php?t=339101)

Monaghan 04-29-2021 06:48 AM

Creating page lists on Windows – EPUBogrify alternative?
 
I am looking to improve accessibility in my EPUBs by adding a page list.

I was following Laura Brady's workflow (on LinkedIn Learning – here's essentially the same workflow detailed on epubsecrets), but unfortunately stalled at the step right after Pagestaker, which calls for a MacOS-only application called EPUBogrify.

Ideally I would like to keep my workflows on my workstation PC.

In this MobileRead thread, BeckyEbook mentions that:

Quote:

[EPUBogrify] is not so important anyway, because it is a simple change that can be done in Sigil.
But this is not elaborated on. Does anyone know what this simple change is or whether there is an alternative to EPUBogrify for those of us not wanting to use MacOS?

BeckyEbook 04-29-2021 08:44 AM

Perhaps the simplest replacement for EPUBogrify would be a Sigil plugin that would only do two things:
1. Replacement.
Find:
Code:

<span class="com-rorohiko-pagestaker-style">(\d+)</span>
Replace with:
Code:

<span epub:type="pagebreak" id="page\1" title="\1" />
2. Creating a text file with the content (like EPUBogrify):
Code:

<li><a href="FILENAME.XHTML#page\1">\1</a></li>


or putting that data right into the nav file.

In fact, the only problem is to put this file name in the text file, because there is no variable/placeholder in the "naked" Sigil that would allow you to insert the current file name in the "Replace" field. I do not complain about this lack, but this example only shows that such a need sometimes exists :)
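
One possible way around the missing file-name placeholder is to do both steps in a small Python script instead of Sigil's Find & Replace. The sketch below is only an illustration, not a Sigil plugin and not what EPUBogrify does internally: the function name `convert_file` and the file name `chapter05.xhtml` are made up for the example, and it assumes the nav links should point at the `page…` ids created by the replacement.

```python
import re

# Pattern from the post above: the span Pagestaker leaves in the XHTML.
PAGESTAKER = re.compile(r'<span class="com-rorohiko-pagestaker-style">(\d+)</span>')


def convert_file(filename, xhtml):
    """Return (converted_xhtml, nav_items) for one XHTML file.

    filename is the href the nav document should use for this file
    (hypothetical here; in a real book it depends on the folder layout).
    """
    nav_items = []

    def to_pagebreak(match):
        page = match.group(1)
        # Build the list item here, where the file name is known.
        nav_items.append(f'<li><a href="{filename}#page{page}">{page}</a></li>')
        return f'<span epub:type="pagebreak" id="page{page}" title="{page}" />'

    return PAGESTAKER.sub(to_pagebreak, xhtml), nav_items


sample = '<p><span class="com-rorohiko-pagestaker-style">85</span>Text</p>'
converted, items = convert_file("chapter05.xhtml", sample)
print(converted)
print("\n".join(items))
```

Because the callback runs once per match, the list items come out in the same order as the page markers in the file.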

JSWolf 04-29-2021 10:05 AM

How do page lists improve accessibility?

KevinH 04-29-2021 11:09 AM

We could add a nav page list generator to Sigil if the spans were marked with a sigil_pagelist class, or in some other way, in the first replace.

I will look into it if people are interested.

Also, doing this in a plugin would be very straightforward.

KevinH

Doitsu 04-29-2021 02:07 PM

Quote:

Originally Posted by KevinH (Post 4117187)
Also, doing this in a plugin would be very straightforward.

Shameless plug: PageList plugin.

The plugin supports NCX and NAV pagelists.

EDIT: The plugin only supports books created with Sigil. If the nav file is not in the same folder as the other XHTML files, the page list hrefs will need to be manually updated.
I might fix this limitation in the next version.

BeckyEbook 04-29-2021 02:43 PM

@Doitsu: Oh, I don't know how I could have forgotten this plugin!

@Monaghan:
The first point – replacement.
The second point – use the PageList plugin.
Good luck!

Tex2002ans 04-29-2021 03:01 PM

Quote:

Originally Posted by Monaghan (Post 4117142)
I am looking to improve accessibility in my EPUBs by adding a page list.

[...]

In this MobileRead thread, BeckyEbook mentions [...]

But this is not elaborated on. Does anyone know what this simple change is or whether there is an alternative to EPUBogrify for those of us not wanting to use MacOS?

Hmmm... were my answers in that thread (Posts #4+) not satisfactory?

Doitsu's plugins really help, but you'd still need to go through and mark where the page breaks actually occur.

- - -

If you're working in Microsoft Word, DAISY has since released a tool called "WordToEPUB":

https://daisy.org/activities/software/wordtoepub/

and they've also released a few videos about it:

Youtube: "WordToEPUB Extended Tutorial – Accessible EPUB in Seconds"

(They even mentioned how great Sigil is! :D)

~38 mins is where they discuss generating page numbers.

The same caveat applies there, though: you have to manually mark in your DOCX where pages occur, then the tools will help convert those numbers into the needed <a> or <span> markup.

Quote:

Originally Posted by JSWolf (Post 4117174)
How do page lists improve accessibility?

Allows readers to navigate in alternate ways (similar to TOC).

Instead of jumping to the next screen/heading, you can navigate by jumping to the next page (think Text-to-Speech).

You can also sync together with print-book readers (think a book club, classroom, or citation where they say "On page 45, this this and this occurs [...]")

For more details, check out all the previous threads discussing this topic:

2020: "Correct Page Numbers in Kindle?" (Post #20)

or do a search in your favorite search engine:

Code:

page numbers Tex2002ans Doitsu site:mobileread.com
Whenever I discuss page numbers, I usually bring up Doitsu's plugins. :D That should lead you to all the MR threads on the topic.

DiapDealer 04-29-2021 03:17 PM

Sounds like it all boils down to manually marking where pages occur. The rest can be easily automated (provided it's marked with all the info necessary to create the PageList).

Tex2002ans 04-29-2021 03:28 PM

Quote:

Originally Posted by DiapDealer (Post 4117272)
Sounds like it all boils down to manually marking where pages occur.

Pretty much.

In Post #5 of "Create index on epub from printed book", I even explained how to:
  1. go from nothing
  2. mark beginning-of-pages with rare character like the ¤ CURRENCY SIGN (U+00A4)
  3. Doitsu's plugins
  4. page markup

Quote:

Originally Posted by DiapDealer (Post 4117272)
The rest can be easily automated (provided it's marked with all the info necessary to create the PageList).

Yep, pretty much.

I know there was some InDesign plugin that automatically marked every page with a hidden anchor.

But I'm not aware of such a thing for Word/LibreOffice or other programs.

Everything I'm aware of is still the ol' manual markings.

Monaghan 04-30-2021 09:01 AM

Thanks, all. This is all very helpful information.

Quote:

Originally Posted by Tex2002ans (Post 4117274)
I know there was some InDesign plugin that automatically marked every page with a hidden anchor.

Yes indeed. My EPUB workflow begins with the print book made in InDesign. I have used Pagestaker to mark the beginnings of the pages, so that side of things is sorted. It was the next step I was having issues with, so I have plenty to look into now. Much appreciated, everyone.

Tex2002ans 04-30-2021 03:56 PM

Quote:

Originally Posted by Monaghan (Post 4117486)
Yes indeed. My EPUB workflow begins with the print book made in InDesign. I have used Pagestaker to mark the beginnings of the pages, so that side of things is sorted. It was the next step I was having issues with, [...].

Post example code from Pagestaker, and I could create a regex to convert it.

Export an EPUB from InDesign, then post a few pages of code here (or just attach a sample EPUB).

Once I see the pattern, it should be easy to map that over + come up with instructions for Doitsu's plugins. :D

Monaghan 05-03-2021 12:20 PM

That's kind of you. It's basically just this tag:

Code:

<span class="com-rorohiko-pagestaker-style">85</span>
But do let me know if you need more code!

KevinH 05-03-2021 01:42 PM

One related question I have always had, if you look at the actual printed book, does the position of this tag end that page number or start that page number?

In other words using your example, is the

Code:

<span class="com-rorohiko-pagestaker-style">85</span>
put after the last word on page 85 or is it marking the start of page 85 material (before the first word on page 85)?

The reason I ask (other than the off by one page issue) is that I have seen ebooks that tried this based on where the physical page number was printed in the pdf (top header or bottom footer) which makes it even more confusing. There must be a convention.

Doitsu 05-03-2021 03:47 PM

Quote:

Originally Posted by Monaghan (Post 4118332)
It's basically just this tag:
Code:

<span class="com-rorohiko-pagestaker-style">85</span>

I updated my Sigil PageList plugin. If you run it once to generate the PageList.json preference file and then change the contents to:

Code:

{
  "tag": "span",
  "attribute": "class",
  "value": "com-rorohiko-pagestaker-style"
}

it should work for your book, as long as there are no duplicate numbers or page gaps.

Since the plugin will also add missing epub:type and id attributes, you might want to test it with a copy of your book.
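
Since duplicate numbers and page gaps are the stated caveat, it may be worth checking for them before running anything on the book. The following Python sketch is not part of the PageList plugin. It assumes the markers look exactly like the Pagestaker span quoted in this thread and contain only arabic numbers, and the function name `check_pages` is made up for the example.

```python
import re

# Assumption: markers look like the Pagestaker span quoted in this thread.
MARKER = re.compile(r'<span class="com-rorohiko-pagestaker-style">(\d+)</span>')


def check_pages(xhtml_files):
    """Return (duplicates, gaps) for the page numbers found.

    xhtml_files is a list of XHTML strings in reading order.
    """
    pages = [int(n) for text in xhtml_files for n in MARKER.findall(text)]
    seen = set()
    duplicates = []
    for page in pages:
        if page in seen and page not in duplicates:
            duplicates.append(page)
        seen.add(page)
    gaps = []
    if seen:
        # Every number between the lowest and highest page should be present.
        gaps = [n for n in range(min(seen), max(seen) + 1) if n not in seen]
    return duplicates, gaps


span = '<span class="com-rorohiko-pagestaker-style">{}</span>'
book = [span.format(1) + span.format(2), span.format(2) + span.format(5)]
duplicates, gaps = check_pages(book)
print("duplicates:", duplicates)
print("gaps:", gaps)
```

In the sample, page 2 is marked twice and pages 3 and 4 are missing, so both lists come back non-empty.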

Tex2002ans 05-03-2021 05:00 PM

Quote:

Originally Posted by KevinH (Post 4118369)
One related question I have always had, if you look at the actual printed book, does the position of this tag end that page number or start that page number?

It always goes at the very beginning of the page, before the first text.

See DAISY, Accessible Publishing Knowledge Base: "Page Navigation"

Quote:

Does the page number reflect the page that is ending or the page that is starting?

The page number always reflects the page that is starting.

Should the page break marker placement follow the print position?

No, page break markers are always placed at the start of the page's content, regardless of whether the page number is printed at the top or bottom of the page in the print edition. When a user jumps to a specific page, they want to jump to the start of the content for that page, not the end.

Where do I put the page break if a word is hyphenated across a page?

Place the page marker after the word. Do not retain the print hyphenation and insert the number in the middle of the word.
Quote:

Originally Posted by Doitsu (Post 4118429)
I updated my Sigil PageList plugin.

Fantastic.

Quote:

Originally Posted by Monaghan (Post 4118332)
That's kind of you. It's basically just this tag:

Code:

<span class="com-rorohiko-pagestaker-style">85</span>
But do let me know if you need more code!

Then you can use this regex (EPUB3):

Find: <span class="com-rorohiko-pagestaker-style">(\d+)</span>
Replace: <span epub:type="pagebreak" id="page\1" title="\1"/>

That will convert that span into:

Code:

<span epub:type="pagebreak" id="page85" title="85"/>
which can then be fed into Doitsu's plugin.
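
If you would rather try the regex on a snippet before doing a Replace All in Sigil, the same Find/Replace pair can be exercised in Python, whose `re.sub` also accepts `\1` in the replacement. This is only a test harness for the pattern above. The sample paragraph is made up, and Sigil's own regex engine may differ from Python's in other respects.

```python
import re

# Find/Replace pair copied from the post above.
FIND = r'<span class="com-rorohiko-pagestaker-style">(\d+)</span>'
REPLACE = r'<span epub:type="pagebreak" id="page\1" title="\1"/>'

# Made-up sample text around the tag Monaghan posted.
sample = (
    '<p>End of 84. '
    '<span class="com-rorohiko-pagestaker-style">85</span>'
    'Start of 85.</p>'
)
result = re.sub(FIND, REPLACE, sample)
print(result)
```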

Note: You may also want a separate regex to deal with front matter that uses roman numeral page numbers (no idea how EPUBogrify generates those).

I usually use the ol':

\b[xiv]+\b

to find lowercase roman numerals... but definitely don't do a mass Search/Replace unless you know what you're doing. :P

If EPUBogrify uses the same code, it'll be:

Find: <span class="com-rorohiko-pagestaker-style">(\b[xiv]+\b)</span>

DNSB 05-03-2021 05:44 PM

Quote:

Originally Posted by Tex2002ans (Post 4118451)
It always goes at the very beginning of the page, before the first text.

See DAISY, Accessible Publishing Knowledge Base: "Page Navigation"

From what I've seen, while placing the page number marker at the start of text is common, placing it at the end of the text is also used. Who needs to pay attention to standards when we can do it our better way?

BeckyEbook 05-03-2021 05:46 PM

Code from EPUBogrify:
Find: <span class="com-rorohiko-pagestaker-style">(\d+|[ivxclm]+)</span>
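
A quick Python check of what this pattern does and does not pick up. The Replace string is borrowed from the earlier posts in this thread and is an assumption here, since only the Find side was quoted from EPUBogrify. The sample spans are made up.

```python
import re

# Find pattern as quoted above: arabic digits or lowercase roman numerals.
FIND = re.compile(
    r'<span class="com-rorohiko-pagestaker-style">(\d+|[ivxclm]+)</span>'
)
# Assumption: same replacement as used for the digits-only regex.
REPLACE = r'<span epub:type="pagebreak" id="page\1" title="\1"/>'

sample = (
    '<span class="com-rorohiko-pagestaker-style">xii</span>'
    '<span class="com-rorohiko-pagestaker-style">1</span>'
    '<span class="com-rorohiko-pagestaker-style">Note</span>'
)
labels = FIND.findall(sample)
converted = FIND.sub(REPLACE, sample)
print(labels)
print(converted)
```

Because the pattern is anchored between the opening and closing span tags, it only touches spans whose entire content is a number or a lowercase roman numeral, so the third span is left alone.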

Tex2002ans 05-03-2021 06:19 PM

Quote:

Originally Posted by DNSB (Post 4118469)
From what I've seen, while placing the page number marker at the start of text is common, placing it at the end of the text is also used. Who needs to pay attention to standards when we can do it our better way?

Placing it at the end makes absolutely no sense. If you say:

"On page 123, author wrote X, Y, Z."

You need the marker to jump you to the very beginning of that page's text.

Quote:

Originally Posted by BeckyEbook (Post 4118470)
Code from EPUBogrify:
Find: <span class="com-rorohiko-pagestaker-style">(\d+|[ivxclm]+)</span>

:thumbsup:

DNSB 05-03-2021 10:51 PM

Quote:

Originally Posted by Tex2002ans (Post 4118475)
Placing it at the end makes absolutely no sense. If you say:

"On page 123, author wrote X, Y, Z."

You need the marker to jump you to the very beginning of that page's text.

:thumbsup:

I didn't say it made sense since it doesn't to me. I've just seen it done. Much like the people who use .htm or .xml for file names in epubs, it may not be standard but they don't seem to care.

Tex2002ans 05-04-2021 01:45 AM

Quote:

Originally Posted by DNSB (Post 4118526)
I didn't say it made sense since it doesn't to me. I've just seen it done.

Well, I'd report their off-by-one pages to them. They're likely not even aware of their error.

Monaghan 05-04-2021 10:51 AM

You wonderful people. Thanks for all.


All times are GMT -4.
