After years of reading ebooks with reflowing page numbers and reading progress measured in percentages, I discovered KOReader's support for print/reference pages and quickly got hooked. Now I'm one of those people who expect ebooks to retain the page numbering of the print version regardless of font or screen size, and it distresses me how few ebooks seem to use the pageList feature.
It distressed me even more that there was no quick way to add a page list yourself to ebooks that don't have one. The options I knew of didn't work for me:
- I don't use Adobe Digital Editions (ADE) and don't think that a single page detection algorithm fits all books.
- Calibre can approximate page numbers when converting to KFX but does not offer any option to get those numbers back into an EPUB.
So I started developing my own tool for the job and I think it's at a point where it's fully usable and produces good results.
Print Page Approximator is a simple command line utility, and paginating a book takes a single command:
Code:
.\page_approximator.exe .\example_book.epub 150
...You should now have a copy of your book with 150 "pages".
As of version 1.1.5 the tool also supports calculating a custom page count based on book contents (characters/words/lines).
Otherwise, it takes any page count you want and calculates page breaks based on that.
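To illustrate the idea, here is a simplified sketch in Python. It is not the tool's actual code: the function names and the figure of 1800 characters per page are assumptions made up for this example. It estimates a page count from the text length and then spreads the page breaks evenly over the character count.

```python
import math


def estimate_page_count(text: str, chars_per_page: int = 1800) -> int:
    """Estimate a page count from the text length.

    chars_per_page is an assumed average, not a value used by the tool.
    """
    return max(1, math.ceil(len(text) / chars_per_page))


def page_break_offsets(total_chars: int, page_count: int) -> list[int]:
    """Return the character offsets where pages 2..page_count would begin.

    The breaks are spaced evenly; integer division keeps them as whole offsets.
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    return [i * total_chars // page_count for i in range(1, page_count)]


if __name__ == "__main__":
    # A 90,000-character book split into 150 "pages".
    offsets = page_break_offsets(90_000, 150)
    print(len(offsets), offsets[:3])
```

With 90,000 characters and 150 pages this gives 149 break offsets, each 600 characters apart. A real implementation would still have to map those offsets back into the book's individual documents.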
And as of version 1.1.8 you can also "upgrade" books that have non-standard page markers, converting said markers to working print reference pages with page-list entries.
For those who want finer control over how page breaks are generated, there are quite a few advanced options available, among them:
- [--pagingmode] Decide if page breaks can be inserted within any line, after existing line breaks, or only within lines/paragraphs above a certain character count.
- [--breakmode] Decide if page breaks should be inserted on the next or previous available whitespace character, or if the script shouldn't care and break the page within a word.
- [--tocpages] You can even tell the tool on which pages the individual chapters are in the print version and it will take it into account using the ebook's ToC.
- [--romanfrontmatter] Add a number of pages with Roman numerals in the front matter. The number can be given as a Roman numeral or a normal integer.
- [--nonlinear] Choose how to handle documents that are designated as 'nonlinear' in the book's spine: append, prepend or ignore.
- [--unlisted] Choose how to handle documents not listed in the book's spine: append, prepend or ignore.
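As a rough illustration of what moving a break to the next or previous whitespace means, here is a minimal Python sketch. It is not taken from the tool's source: the function name and the direction values "next", "previous" and "none" are hypothetical and are not the values accepted by --breakmode.

```python
def snap_to_whitespace(text: str, offset: int, direction: str = "next") -> int:
    """Move a proposed page-break offset to a nearby whitespace character.

    direction is one of "next", "previous" or "none" (hypothetical names).
    With "none" the offset is returned unchanged, even inside a word.
    """
    if direction not in ("next", "previous", "none"):
        raise ValueError(f"unknown direction: {direction!r}")

    # Keep the offset inside the text.
    offset = max(0, min(offset, len(text)))
    if direction == "none":
        return offset

    if direction == "next":
        # Scan forward; fall back to the end of the text.
        for i in range(offset, len(text)):
            if text[i].isspace():
                return i
        return len(text)

    # direction == "previous": scan backward; fall back to the start.
    for i in range(min(offset, len(text) - 1), -1, -1):
        if text[i].isspace():
            return i
    return 0


if __name__ == "__main__":
    sample = "hello world again"
    for mode in ("next", "previous", "none"):
        print(mode, snap_to_whitespace(sample, 8, mode))
```

In the sample string, offset 8 falls inside "world", so "next" moves the break to the space after it, "previous" to the space before it, and "none" leaves it where it is.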
Details about how to use the more advanced functionality can be found in the Readme and Wiki on GitHub; I don't want to bloat this post with them.
The output of this tool is spec-compliant for both the pageList in EPUB2 and the page-list nav in EPUB3, so if a device supports pageList normally, there should be no problem.
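For anyone wondering what an EPUB3 page-list nav looks like, here is a small Python sketch that builds one from a list of page labels and targets. It only illustrates the general shape of the markup and is not the tool's own code; the function name, the file names and the anchor ids (such as page_1) are made up for the example.

```python
from html import escape


def build_page_list_nav(pages: list[tuple[str, str]]) -> str:
    """Build an EPUB3 page-list nav element.

    pages holds (label, href) pairs, e.g. ("1", "ch1.xhtml#page_1").
    The surrounding navigation document must declare the epub namespace.
    """
    items = "\n".join(
        f'    <li><a href="{escape(href, quote=True)}">{escape(label)}</a></li>'
        for label, href in pages
    )
    return (
        '<nav epub:type="page-list" hidden="">\n'
        "  <ol>\n"
        f"{items}\n"
        "  </ol>\n"
        "</nav>"
    )


if __name__ == "__main__":
    # Hypothetical targets: one Roman-numbered front matter page, one body page.
    print(build_page_list_nav([
        ("i", "front.xhtml#page_i"),
        ("1", "ch1.xhtml#page_1"),
    ]))
```

Each entry points at an anchor placed in the text where the print page begins, which is what lets a reader keep the same page numbers at any font or screen size.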
Important: For devices/apps that only support the Adobe version of pageList, "page-map" (apparently this includes the standard reader on Kobo; thanks to @Sirtel for testing this), an additional page map file can be generated by appending the flag --page-map.
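For reference, an Adobe-style page map is, as far as I understand the format, a small XML file that lists a name and a target for every page. The Python sketch below shows that general shape under this assumption; it is not the tool's actual output, and the function name and hrefs are hypothetical.

```python
from xml.sax.saxutils import quoteattr


def build_page_map(pages: list[tuple[str, str]]) -> str:
    """Build the text of an Adobe-style page map.

    pages holds (name, href) pairs, e.g. ("1", "ch1.xhtml#page_1").
    The element and attribute names follow my understanding of the format.
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<page-map xmlns="http://www.idpf.org/2007/opf">',
    ]
    for name, href in pages:
        # quoteattr escapes the value and adds the surrounding quotes.
        lines.append(f"  <page name={quoteattr(name)} href={quoteattr(href)}/>")
    lines.append("</page-map>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_page_map([("1", "ch1.xhtml#page_1"), ("2", "ch1.xhtml#page_2")]))
```

The file also has to be listed in the book's manifest and referenced from the spine for a reader to pick it up, which this sketch leaves out.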
Personally, I don't really have any way to test the results outside of KOReader, so I'd really appreciate feedback about how well page support works on different devices.
I am aware that some might question the point of generating an arbitrary and inaccurate approximation of an already arbitrary and inconsistent metric that is technically obsolete anyway. But I just think it shouldn't be too much to ask that the book that has 344 pages on my shelf also has 344 pages on my tablet, and with this tool that's possible within a few seconds.
Attached are a standalone executable for 64-bit Windows as well as the Python source code for other platforms.
*If you're running the script, please note that Python 3.10 and the "ebooklib" library are required.
...I am also thinking of turning this tool into a calibre plugin, but that's a bit of a long-term goal.
Links:
Source on GitHub