Old 11-13-2022, 01:34 PM   #1
Thertzler
Member
Posts: 13
Karma: 1138306
Join Date: Mar 2018
Device: none
Print Page Approximator for EPUB and EPUB3 v1.1.8

After years of reading ebooks with reflowing page numbers and reading progress measured in percent, I discovered KOReader's support for print/reference pages and quickly got hooked. Now I'm one of those people who expect ebooks to retain the page numbering of the print version regardless of font or screen size, and it distresses me how few ebooks seem to use the pageList feature.

It distressed me even more that there was no quick way to add a page list yourself to ebooks that don't have it.
- I don't use ADE and don't think that a single page detection algorithm fits all books.
- Calibre can approximate page numbers when converting to kfx but does not offer any option to get those numbers back into an epub.

So I started developing my own tool for the job and I think it's at a point where it's fully usable and produces good results.

Print Page Approximator is a simple command line utility and using it to paginate a book is as simple as this:
Code:
.\page_approximator.exe .\example_book.epub 150
...You should now have a copy of your book with 150 "pages".

As of version 1.1.5 the tool also supports calculating a custom page count based on book contents (characters/words/lines).
Otherwise, it takes any page count you want and calculates page-breaks based on that.
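To illustrate the basic idea of "take a page count and calculate page breaks from it", here is a minimal Python sketch. It is not the tool's actual code: the function name is made up, and it assumes breaks are simply spread evenly over the length of the book's raw text.

```python
def page_break_offsets(text_length: int, page_count: int) -> list[int]:
    """Return the character offsets at which pages 2..page_count would start.

    Hypothetical helper: assumes an even distribution over the raw text.
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    # Page 1 starts at offset 0, so only page_count - 1 breaks are needed.
    return [i * text_length // page_count for i in range(1, page_count)]


print(page_break_offsets(1000, 4))  # [250, 500, 750]
```

A book with 150 "pages" would get 149 such offsets, each of which still has to be moved to a sensible position in the markup.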

And as of version 1.1.8 you can also "upgrade" books that have non-standard page markers, converting said markers to working print reference pages with page-list entries.

For those who want finer control over how page breaks are generated there are quite a few advanced options available, among them are:
  • [--pagingmode] Decide if page breaks can be inserted within any line, after existing line breaks, or only within lines/paragraphs above a certain character count.
  • [--breakmode] Decide if page breaks should be inserted on the next or previous available whitespace character, or if the script shouldn't care and break the page within a word.
  • [--tocpages] You can even tell the tool on which pages the individual chapters are in the print version and it will take it into account using the ebook's ToC.
  • [--romanfrontmatter] Add a number of pages with Roman numerals in the front matter. Can be in the form of a Roman numeral or a normal integer.
  • [--nonlinear] Choose how to handle documents that are designated as 'nonlinear' in the book's spine: append, prepend or ignore.
  • [--unlisted] Choose how to handle documents not listed in the book's spine: append, prepend or ignore.
Details about how to use the more advanced functionality can be found in the Readme and Wiki on GitHub; I don't want to bloat this post with them.
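As a rough illustration of what the break mode choice means, the following Python sketch moves a raw break offset to the next or previous whitespace character, or leaves it alone. The function and the mode names "next", "prev" and "split" are placeholders chosen for this example; the real values accepted by --breakmode are documented in the Readme.

```python
def snap_to_whitespace(text: str, offset: int, mode: str = "next") -> int:
    """Move a raw break offset onto a whitespace character.

    The mode names are hypothetical stand-ins for the break mode choices:
    "next", "prev", or "split" (break wherever the offset falls).
    """
    if mode not in ("next", "prev", "split"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "split":
        return offset
    step = 1 if mode == "next" else -1
    i = offset
    while 0 <= i < len(text):
        if text[i].isspace():
            return i
        i += step
    # No whitespace found in that direction: fall back to the raw offset.
    return offset


sample = "The quick brown fox"
print(snap_to_whitespace(sample, 6, "next"))   # 9
print(snap_to_whitespace(sample, 6, "prev"))   # 3
print(snap_to_whitespace(sample, 6, "split"))  # 6
```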

The output of this tool is spec compliant for both the pageList in EPUB2 and the page-list nav in EPUB3, so if a device supports pageList normally, there should be no problem.
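For readers who have never looked inside an EPUB3 navigation document, a page-list is just an ordered list of links inside a nav element. The Python sketch below builds such a fragment; the function name, file names and anchor ids are invented for the example and are not necessarily what the tool writes.

```python
from xml.sax.saxutils import escape, quoteattr


def page_list_nav(targets: list[tuple[str, str]]) -> str:
    """Build an EPUB3 page-list nav element from (label, href) pairs."""
    items = "\n".join(
        f"    <li><a href={quoteattr(href)}>{escape(label)}</a></li>"
        for label, href in targets
    )
    return (
        '<nav epub:type="page-list" hidden="">\n'
        "  <ol>\n"
        f"{items}\n"
        "  </ol>\n"
        "</nav>"
    )


# Hypothetical targets: page labels and the anchors they point to.
print(page_list_nav([("1", "ch1.xhtml#page_1"), ("2", "ch1.xhtml#page_2")]))
```

In EPUB2 the same information lives in the NCX file as a pageList element containing pageTarget entries.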

Important: For devices/apps that only support the Adobe version of pageList, "page-map" (apparently this includes the standard reader on Kobo; thanks to @Sirtel for testing this), an additional page map file can be generated by appending the flag --page-map.

Personally I don't really have any way to test the results outside of KOReader, so I'd really appreciate feedback about how well page support works on different devices.

I am aware that some might question the point of generating an arbitrary and inaccurate approximation of an already arbitrary and inconsistent metric that is technically obsolete anyway. But I just think it shouldn't be too much to ask that the book that has 344 pages on my shelf also has 344 pages on my tablet and with this it's possible within a few seconds.

Attached are a standalone executable for 64-bit Windows as well as the Python source code for other platforms.
*If you're running the script, please note that Python 3.10 and the "ebooklib" library are required.

...I am also thinking of turning this tool into a calibre plugin, but that's a bit of a long term goal.

Links:
Source on GitHub
Attached Files
File Type: zip epub-print-page-approximator-1.1.8_source.zip (35.4 KB, 316 views)
File Type: zip page_approximator_1.1.8_win_x64.zip (10.44 MB, 363 views)

Last edited by Thertzler; 04-02-2023 at 03:15 PM. Reason: Upgrade to 1.1.8
Old 11-13-2022, 03:56 PM   #2
Sirtel
Grand Sorcerer
Posts: 13,961
Karma: 243829933
Join Date: Jan 2014
Location: Estonia
Device: Kobo Sage & Libra 2
Wow, that's certainly something I'll try on my Kobos! Thank you!
Old 11-13-2022, 04:24 PM   #3
Sirtel
Grand Sorcerer
Posts: 13,961
Karma: 243829933
Join Date: Jan 2014
Location: Estonia
Device: Kobo Sage & Libra 2
...aaand I'm stumped already. How exactly does one use the Windows executable? I know nothing about coding and very little about command line operations, and can't figure it out. Where should I put the epub file? When I try to specify the file path in the command line, I get "Invalid argument" or "No such file or directory".
Old 11-13-2022, 05:45 PM   #4
Thertzler
Member
Posts: 13
Karma: 1138306
Join Date: Mar 2018
Device: none
Quote:
Originally Posted by Sirtel
Where should I put the epub file? When I try to specify the file path in the command line, I get "Invalid argument" or "No such file or directory".
Technically you can use epubs from any location but to keep it as simple as possible just put it in the same folder as the .exe file.
About the tool not liking the file path... Are there spaces in the epub's file name? If so, the path needs to be in quotation marks.
Old 11-13-2022, 05:54 PM   #5
Sirtel
Grand Sorcerer
Posts: 13,961
Karma: 243829933
Join Date: Jan 2014
Location: Estonia
Device: Kobo Sage & Libra 2
Quote:
Originally Posted by Thertzler
Technically you can use epubs from any location but to keep it as simple as possible just put it in the same folder as the .exe file.
About the tool not liking the file path... Are there spaces in the epub's file name? If so, the path needs to be in quotation marks.
No spaces. I copied the file to the same folder as the exe file, but it still gave me "No such file". Then I put quotation marks around the file name, and got "error: the following arguments are required: pages", even though I had the number added like in your example.

In short, I don't know what exactly I should put in the command line. No matter what I do, I get an error. Presumably I have the command wrong in some way. I don't really know anything about command line.
Old 11-14-2022, 06:46 AM   #6
Thertzler
Member
Posts: 13
Karma: 1138306
Join Date: Mar 2018
Device: none
Quote:
Originally Posted by Sirtel
Then I put quotation marks around the file name, and got "error: the following arguments are required: pages", even though I had the number added like in your example.
Well, at least it looks like you're getting close...
Would you mind simply posting the command you are currently trying to run?
Old 11-14-2022, 01:24 PM   #7
Sirtel
Grand Sorcerer
Posts: 13,961
Karma: 243829933
Join Date: Jan 2014
Location: Estonia
Device: Kobo Sage & Libra 2
Quote:
Originally Posted by Thertzler
Well, at least it looks like you're getting close...
Would you mind simply posting the command you are currently trying to run?
Code:
C:\Users\videv\Downloads\page_approximator_win_x64\page_approximator.exe .\EssexDogs.epub 466
No such file or directory. The epub file is in the same folder as the .exe file.
Old 11-14-2022, 02:45 PM   #8
Thertzler
Member
Posts: 13
Karma: 1138306
Join Date: Mar 2018
Device: none
So the most likely issue here is that your command line is not actually executing in the same directory as the executable.

If you are simply opening the command prompt from the start menu, it tends to start in your user directory or maybe in some system32 Windows folder.
It'll find the page_approximator.exe file because you provided the absolute path to it, but not the epub, because its path is relative to the directory the command prompt is currently in.

So either set the cmd location to that folder before running the other command with this:
Code:
cd C:\Users\videv\Downloads\page_approximator_win_x64
...or use the absolute path for the book as well:
Code:
C:\Users\videv\Downloads\page_approximator_win_x64\page_approximator.exe C:\Users\videv\Downloads\page_approximator_win_x64\EssexDogs.epub 466
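To make the relative-versus-absolute distinction concrete, this small Python sketch shows which file a path argument ends up pointing at for a given working directory. The resolve helper is made up for illustration, and the first working directory is only an assumption about where the command prompt happened to start.

```python
from pathlib import PureWindowsPath


def resolve(cwd: str, argument: str) -> PureWindowsPath:
    """Show which file a command-line path argument would refer to."""
    path = PureWindowsPath(argument)
    # Absolute paths are used as-is; relative ones are joined to the
    # current working directory, not to the folder holding the .exe.
    return path if path.is_absolute() else PureWindowsPath(cwd) / path


exe_dir = r"C:\Users\videv\Downloads\page_approximator_win_x64"

# Prompt started in the user directory (assumed): the epub is looked up there.
print(resolve(r"C:\Users\videv", r".\EssexDogs.epub"))

# After cd into the tool's folder, the same argument finds the book.
print(resolve(exe_dir, r".\EssexDogs.epub"))
```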
Old 11-14-2022, 02:52 PM   #9
Sirtel
Grand Sorcerer
Posts: 13,961
Karma: 243829933
Join Date: Jan 2014
Location: Estonia
Device: Kobo Sage & Libra 2
Many thanks! It worked now!

I'll try the book on my Sage. I suspect it may not work, as Kobo has their own page estimation methods, but can't know for sure until trying.
Old 11-14-2022, 03:03 PM   #10
Aleron Ives
Wizard
Posts: 1,768
Karma: 16319690
Join Date: Sep 2022
Device: Kobo Libra 2
Maybe it's out of scope, but would you consider adding an alternate mode for word-based page approximation? My understanding is that ADE uses the number of bytes to approximate pages, so books with lots of formatting end up with more pages than books with sparse formatting. It would be a nice ADE alternative to be able to number the pages based on a user-specified number of characters/words/lines per page, instead, so you could have a consistent page metric across all your e-books.
Old 11-14-2022, 03:04 PM   #11
Sirtel
Grand Sorcerer
Posts: 13,961
Karma: 243829933
Join Date: Jan 2014
Location: Estonia
Device: Kobo Sage & Libra 2
Quote:
Originally Posted by Aleron Ives
Maybe it's out of scope, but would you consider adding an alternate mode for word-based page approximation? My understanding is that ADE uses the number of bytes to approximate pages, so books with lots of formatting end up with more pages than books with sparse formatting. It would be a nice ADE alternative to be able to number the pages based on a user-specified number of characters/words/lines per page, instead, so you could have a consistent page metric across all your e-books.
You can do that via the Count Pages plugin in Calibre, and just enter the resulting number in this approximator.
Old 11-14-2022, 03:24 PM   #12
Aleron Ives
Wizard
Posts: 1,768
Karma: 16319690
Join Date: Sep 2022
Device: Kobo Libra 2
That's good to know, but I much prefer to use a simple CLI utility when possible, especially in cases where batch processing is desirable.
Old 11-14-2022, 04:05 PM   #13
Sirtel
Grand Sorcerer
Posts: 13,961
Karma: 243829933
Join Date: Jan 2014
Location: Estonia
Device: Kobo Sage & Libra 2
Anyway, I'm sorry to report that this method doesn't work in Nickel (the stock reader on eink Kobo devices), either with epubs or kepubs. Epubs on a Kobo use the Adobe page numbering system and kepubs use 1 screen=1 page. There is no way to force the stock reader to use any other system. So this approximator would be of use only in KOReader.

I'm disappointed, even though I kind of expected this. But it would have been nice to use my own page numbering system.
Old 11-14-2022, 04:53 PM   #14
Aleron Ives
Wizard
Posts: 1,768
Karma: 16319690
Join Date: Sep 2022
Device: Kobo Libra 2
Ouch. I wonder if we could convince Kobo to support this in a future firmware release?
Old 11-14-2022, 07:25 PM   #15
Thertzler
Member
Posts: 13
Karma: 1138306
Join Date: Mar 2018
Device: none
Quote:
Anyway, I'm sorry to report that this method doesn't work in Nickel (the stock reader on eink Kobo devices), either with epubs or kepubs. Epubs on a Kobo use the Adobe page numbering system and kepubs use 1 screen=1 page.
Perhaps the new version I just released could help with that... if the standard Kobo reader fully implements the Adobe features, then the page numbering can be manually defined via a page-map.xml.
In the newest release, which I have attached to this post, you can append the flag --page-map to the pagination command and it will add that file to the epub as well.
I've downloaded the desktop version of ADE, and it displays the new page count correctly.

I'd love to update the opening post with this as well but I don't see the edit button mentioned in the FAQ... some restriction for new users I assume?

Quote:
Maybe it's out of scope, but would you consider adding an alternate mode for word-based page approximation? My understanding is that ADE uses the number of bytes to approximate pages, so books with lots of formatting end up with more pages than books with sparse formatting. It would be a nice ADE alternative to be able to number the pages based on a user-specified number of characters/words/lines per page, instead, so you could have a consistent page metric across all your e-books.
I'd say it's slightly out of scope since, as mentioned in the first post, the tool assumes that you already know how many pages you want.
But maybe a feature like that, some sort of page count suggestion, could be added.
In order to find valid locations for page breaks, the page approximator does indeed already create a "raw" representation of the book text without any XML/HTML markup and tags. It further limits the text to the content of tags that can reasonably be assumed to be visible to the reader (so the content of meta/head/style/script tags etc. is omitted).
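A simplified picture of that "raw" text extraction, using only Python's standard library, could look like the sketch below. It is not taken from the tool's source: the class name and the set of skipped tags are assumptions for the sake of the example.

```python
from html.parser import HTMLParser

# Assumed set of elements whose content is never shown to the reader;
# the tool's real list may differ.
SKIPPED = {"head", "style", "script"}


class VisibleText(HTMLParser):
    """Collect text that is not nested inside a skipped element."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # how many skipped elements we are currently inside
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.chunks.append(data)


def visible_text(markup: str) -> str:
    parser = VisibleText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


doc = ("<html><head><title>T</title><style>p{}</style></head>"
       "<body><p>Hello <b>world</b></p><script>x=1</script></body></html>")
print(visible_text(doc))  # Hello world
```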
This could be a pretty good basis for what you want and the character/lines limit would be easily doable.
Words might get a bit more complicated, if only because they can be a bit of a pain to formally define, and you might end up with a difference of more than a few hundred words depending on whether you count contractions as one or two words and other stuff like that.
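As a rough sketch of how such a page count suggestion could work, and of why word counts are ambiguous, consider the Python below. Both functions are hypothetical and are not part of the tool; the word pattern only handles ASCII letters and digits.

```python
import math
import re


def suggest_page_count(text: str, chars_per_page: int) -> int:
    """Suggest a page count from raw text length (at least one page)."""
    return max(1, math.ceil(len(text) / chars_per_page))


def count_words(text: str, split_contractions: bool = False) -> int:
    """Count words; optionally treat "don't" as two words."""
    if split_contractions:
        pattern = r"[A-Za-z0-9]+"
    else:
        pattern = r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*"
    return len(re.findall(pattern, text))


sample = "I don't think it's done"
print(count_words(sample))                           # 5
print(count_words(sample, split_contractions=True))  # 7
print(suggest_page_count("x" * 4500, 1800))          # 3
```

Two extra "words" in a five-word sentence shows how quickly the two definitions drift apart over a whole book.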
Attached Files
File Type: zip page_approximator_v1.1.4_win_x64.zip (10.44 MB, 389 views)
File Type: zip print-page-approximator_v1.1.4_source.zip (31.4 KB, 365 views)
Tags
epub, page breaks, page numbering, python, tool

