A FEW UTILITIES FOR FBPDF ON YOUR KOBO
***
***
***
grunfink/html2epub
https://codeberg.org/grunfink/html2epub
$ git clone https://codeberg.org/grunfink/html2epub.git
html2epub is a simple shell script that converts html pages to epub, so it can be used without having to cross-compile anything for the Kobo.
However, it requires the paste (from the coreutils package) and zip utility programs, which we can acquire from Alpine Linux:
***
coreutils
The basic file, shell and text manipulation utilities
Depends (5)
busybox-binsh**
libacl
libattr
musl*
utmps-libs**
zip
Creates PKZIP-compatible .zip files
Depends (2)
musl*
unzip***
*already got these from elinks and nano installs in Post #3 above.
**already got this from screen install in Post #12 above.
***already got this from base busybox.
* note: Alpine Linux files were migrated from the /mnt/onboard/.adds/koreader/ folder to the /mnt/onboard/.adds/kordir/ folder since Post #14.
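Since html2epub shells out to these helpers at runtime, a quick PATH check can save a confusing failure later. This is just an illustrative sketch (check_deps is my own helper name, not part of html2epub):

```shell
# Report any helper programs html2epub needs that are not on the PATH.
check_deps() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}
check_deps paste zip || echo "install the packages below first"
```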
DOWNLOAD THE REQUIRED PACKAGES:
$ wget https://dl-cdn.alpinelinux.org/alpin...ils-9.1-r0.apk
$ wget https://dl-cdn.alpinelinux.org/alpin...l-2.3.1-r1.apk
$ wget https://dl-cdn.alpinelinux.org/alpin...r-2.5.1-r2.apk
$ wget https://dl-cdn.alpinelinux.org/alpin...ip-3.0-r10.apk
RUN THESE COMMANDS FROM LINUX DESKTOP:
$ cd myalpine/
$ tar zxvf coreutils-9.1-r0.apk
$ tar zxvf libacl-2.3.1-r1.apk
$ tar zxvf libattr-2.5.1-r2.apk
$ tar zxvf zip-3.0-r10.apk
$ cp usr/bin/coreutils scripts/paste
$ cp lib/libacl.so.1.1.2301 libs/libacl.so.1
$ cp lib/libattr.so.1.1.2501 libs/libattr.so.1
$ mv usr/bin/zip scripts/
NOW CONNECT YOUR KOBO TO YOUR PC:
Copy your paste/zip binaries from the scripts/ folder on the PC to the /mnt/onboard/.adds/kordir/scripts/ folder on the kobo.
Copy your libacl.so.1/libattr.so.1 libs from the libs/ folder on the PC to the /mnt/onboard/.adds/kordir/libs/ folder on the kobo.
Copy your html2epub shell script to the /mnt/onboard/.adds/kordir/scripts/ folder on the kobo.
From kobo terminal or ssh session to kobo:
# . /korenv.sh
# mkdir myhtml2epub
# cd myhtml2epub/
Here we download using the full wget binary we acquired from Alpine Linux (and renamed to 'wgets') in Post #3 above:
# wgets --restrict-file-names=windows -k -nd -D codeberg.org -E -r -A jpg,jpeg,png,gif,svg,htm,html -l1 -Q20M https://codeberg.org/grunfink/html2epub
-np, --no-parent don't ascend to the parent directory
-k, --convert-links make links in downloaded HTML or CSS point to local files
-nd, --no-directories don't create directories
-D, --domains=LIST comma-separated list of accepted domains
--exclude-domains=LIST comma-separated list of rejected domains
-E, --adjust-extension save HTML/CSS documents with proper extensions
-r, --recursive specify recursive download
-A, --accept=LIST comma-separated list of accepted extensions
-R, --reject=LIST comma-separated list of rejected extensions
-l, --level=NUMBER maximum recursion depth (inf or 0 for infinite)
-Q, --quota=NUMBER set retrieval quota to NUMBER
# ls
avatar_default.png favicon.svg index.html
favicon.png html2epub.tmp.html
# html2epub html2epub.epub -ntp html2epub.tmp.html
html2epub recommendation: install 'tidy' or 'tidyp'
Gathering data...
Building .epub file...
adding: mimetype (stored 0%)
adding: META-INF/ (stored 0%)
adding: META-INF/container.xml (deflated 34%)
adding: content.opf (deflated 50%)
adding: part-0001.html (deflated 74%)
adding: toc.ncx (deflated 46%)
Finished.
# fbpdf html2epub.epub
(space to page down, q to quit.)
The epub was created. Now let's try another wget variation:
# wgets --restrict-file-names=windows -nd -E -k -p -Q20M "https://www.google.com"
-nd, --no-directories don't create directories
-E, --adjust-extension save HTML/CSS documents with proper extensions
-k, --convert-links make links in downloaded HTML or CSS point to local files
-p, --page-requisites get all images, etc. needed to display HTML page
-Q, --quota=NUMBER set retrieval quota to NUMBER
# html2epub google.epub -ntp index.html.1.html
html2epub: recommendation: install 'tidy' or 'tidyp'
Gathering data...
Building .epub file...
adding: mimetype (stored 0%)
adding: META-INF/ (stored 0%)
adding: META-INF/container.xml (deflated 34%)
adding: content.opf (deflated 50%)
adding: part-0001.html (deflated 58%)
adding: toc.ncx (deflated 48%)
Finished.
# fbpdf google.epub
The page is there but no images! Let's try again:
# html2epub google.epub -ntp index.html.1.html *.png
html2epub: recommendation: install 'tidy' or 'tidyp'
Gathering data...
Building .epub file...
adding: mimetype (stored 0%)
adding: META-INF/ (stored 0%)
adding: META-INF/container.xml (deflated 34%)
adding: avatar_default.png (stored 0%)
adding: content.opf (deflated 50%)
adding: favicon.png (stored 0%)
adding: googlelogo_white_background_color_272x92dp.png (stored 0%)
adding: nav_logo229.png (deflated 0%)
adding: part-0001.html (deflated 58%)
adding: toc.ncx (deflated 48%)
Finished.
# fbpdf google.epub
Now the image is there.
***
***
***
gonejack/html-to-epub
https://github.com/gonejack/html-to-epub
$ wget https://github.com/gonejack/html-to-...v1.0.26.tar.gz
$ mv v1.0.26.tar.gz html-to-epub-v1.0.26.tar.gz
html-to-epub is a Go binary that converts html pages to epub, so we have to cross-compile it for the Kobo.
However, it is self-contained once statically compiled, and it has another benefit, which you will see:
$ tar zxvf html-to-epub-v1.0.26.tar.gz
$ cd html-to-epub-1.0.26/
$ source $HOME/.profile
$ go version
go version go1.20.2 linux/amd64
BUILD FOR DESKTOP:
$ go build
go: downloading github.com/PuerkitoBio/goquery v1.7.1
go: downloading github.com/alecthomas/kong v0.2.17
go: downloading github.com/gabriel-vasile/mimetype v1.3.1
go: downloading github.com/gonejack/get v1.0.9
go: downloading github.com/andybalholm/cascadia v1.2.0
go: downloading golang.org/x/net v0.0.0-20210614182718-04defd469f4e
go: downloading github.com/dustin/go-humanize v1.0.0
go: downloading github.com/go-resty/resty/v2 v2.6.0
go: downloading golang.org/x/sync v0.0.0-20210220032951-036812b2e83c
go: downloading github.com/gofrs/uuid v3.1.0+incompatible
$ mkdir myhtml-to-epub
$ cd myhtml-to-epub/
Here we download just the base html page for the Kobo Developer Forum, using the desktop wget command:
$ wget "https://www.mobileread.com/forums/forumdisplay.php?f=247"
Saving to: ‘forumdisplay.php?f=247’
$ ls
'forumdisplay.php?f=247'
$ ../html-to-epub forumdisplay.php\?f\=247
$ ls
'forumdisplay.php?f=247' images output.epub
With ebook-viewer from calibre, you can see the output.epub:
$ ebook-viewer output.epub
It looks like html-to-epub has gone and downloaded the images linked to by the ‘forumdisplay.php?f=247’ page, even though we did not fetch them with our wget command.
BUILD FOR KOBO:
$ source ~/koxtoolchain/refs/x-compile.sh kobo env bare
(Prepare the build as in https://www.mobileread.com/forums/sh...d.php?t=350054 Post #4, or try the Linaro build as in Post #1.)
$ env GOOS=linux GOARCH=arm CC=arm-kobo-linux-gnueabihf-gcc CXX=arm-kobo-linux-gnueabihf-g++ go build
go: downloading github.com/PuerkitoBio/goquery v1.7.1
go: downloading github.com/alecthomas/kong v0.2.17
go: downloading github.com/gabriel-vasile/mimetype v1.3.1
go: downloading github.com/gonejack/get v1.0.9
go: downloading github.com/andybalholm/cascadia v1.2.0
go: downloading golang.org/x/net v0.0.0-20210614182718-04defd469f4e
go: downloading github.com/dustin/go-humanize v1.0.0
go: downloading github.com/go-resty/resty/v2 v2.6.0
go: downloading golang.org/x/sync v0.0.0-20210220032951-036812b2e83c
go: downloading github.com/gofrs/uuid v3.1.0+incompatible
$ ls -l html-to-epub
-rwxr-xr-x 1 10881529 html-to-epub
$ file html-to-epub
html-to-epub: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, Go BuildID=_DHZTlkwz4xzPeiVUlaW/Sj0LAzd2dr0VlhzcEumV/nqh8YWiS3s137C5H_5Dy/OB662nwm1KaHmiJTMIeA, with debug_info, not stripped
Copy html-to-epub binary to /mnt/onboard/.adds/kordir/scripts/ folder of kobo.
From kobo terminal or ssh session to kobo:
# . /korenv.sh
# mkdir myhtml-to-epub
# cd myhtml-to-epub/
Here we download just the base html page for the Kobo Developer Forum, using the full wget binary we acquired from Alpine Linux in Post #3 above:
# wgets --restrict-file-names=windows "https://www.mobileread.com/forums/forumdisplay.php?f=247"
Saving to: 'forumdisplay.php@f=247'
# ls
forumdisplay.php@f=247
# html-to-epub -o kobodev.epub forumdisplay.php@f\=247
# ls
forumdisplay.php@f=247 images kobodev.epub
Confirm the images have been added to the epub:
# fbpdf kobodev.epub
(space to page down, q to quit.)
OK, let's see if that means we can use the stripped-down busybox version of wget that already comes with the Kobo, removing the need to install the full Alpine Linux version:
# which wget
/usr/bin/wget
# ls -l /usr/bin/wget
lrwxrwxrwx root root /usr/bin/wget -> ../../bin/busybox
# wget "https://www.mobileread.com"
saving to 'index.html'
# html-to-epub -o mobileread.epub index.html
# ls
images index.html mobileread.epub
# fbpdf mobileread.epub
html-to-epub has downloaded the images linked to by the index.html page, even though we could not fetch them with the stripped-down busybox wget command.
This means we can use the html-to-epub binary with nothing but built-in commands, BUT the built-in wget still does not work with all sites, unfortunately.
* Oops, actually the built-in wget can handle it using the -O option:
# wget "https://www.mobileread.com/forums/forumdisplay.php?f=247"
Connecting to www.mobileread.com (162.55.243.172:443)
wget: can't open 'forumdisplay.php?f=247': Invalid argument
# wget -O index.html "https://www.mobileread.com/forums/forumdisplay.php?f=247"
Connecting to www.mobileread.com (162.55.243.172:443)
saving to 'index.html'
index.html 100% |****************************************************| 102k 0:00:00 ETA
'index.html' saved
***
***
***
I have used Koreader to read huge 100MB, 1000-page files, but even it couldn't handle a 45MB, 576-page pdf file of scanned images. Even the sparse fbpdf reader from Post #16 could barely handle it, with an intolerable minute-long delay between page turns.
The mutool utility is included in the attached fbpdf-build.zip file of Post #16 above, and we can use it to split up the pdf file:
# mkdir workdir
# cd workdir/
# mutool draw -L -c gray -o out%d.png big.pdf 1-50
# mutool convert -O compress -o out.pdf out*.png
# ls -l *.pdf
-rwxr-xr-x 47245268 big.pdf
-rwxr-xr-x 8076822 out.pdf
Now we have created a grayscale out.pdf which contains just pages 1-50 of big.pdf.
It took about 15 seconds per page to convert.
Reading in Koreader in reflow mode (with contrast level cranked up a notch) is now acceptable.
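The 50-page chunking above can be scripted so you don't have to track the page ranges by hand. A sketch (split_pdf is my own helper name; 576 is this book's page count): it only prints the mutool commands, so you can inspect them first and pipe the output to sh when satisfied:

```shell
# Print the mutool commands that split big.pdf into grayscale chunk pdfs.
# Usage: split_pdf <total-pages> <chunk-size>   (pipe output to sh to run)
split_pdf() {
  pages=$1
  chunk=$2
  start=1
  while [ "$start" -le "$pages" ]; do
    end=$((start + chunk - 1))
    if [ "$end" -gt "$pages" ]; then end=$pages; fi
    echo "mutool draw -L -c gray -o out%03d.png big.pdf $start-$end"
    echo "mutool convert -O compress -o out$start-$end.pdf out*.png"
    echo "rm -f out*.png"
    start=$((end + 1))
  done
}
split_pdf 576 50
```

Removing the pngs after each chunk keeps the working directory small, which matters on the Kobo's limited storage.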
***
UPDATE:
So I did the same operation on my (MXLinux-converted) chromebook, except that I first tried generating half the book at a time (which takes about 10 minutes), then tried generating the whole book.
Interestingly, even the whole book is now digestible by Koreader, even though the file size is double.
$ mutool draw -L -c gray -o out%03d.png big.pdf 1-288
$ mutool convert -O compress -o out1-288.pdf out*.png
$ mutool draw -L -c gray -o out%03d.png big.pdf 1-552
$ mutool convert -O compress -o out1-552.pdf out*.png
Remove out002.png to out287.png
$ mutool convert -O compress -o out288-552.pdf out*.png
$ rm *.png
$ ls -l *.pdf
-rwxr-xr-x 47245268 big.pdf
-rwxr-xr-x 8076822 out.pdf
-rwxr-xr-x 49088131 out1-288.pdf
-rwxr-xr-x 95610303 out1-552.pdf
-rwxr-xr-x 47191462 out288-552.pdf
* note: I also changed the %d parameter to %03d to generate sequential file names.
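The reason the %03d change matters: the out*.png glob that mutool convert receives expands in lexical order, so plain %d names come back out of page order once you pass page 9 (LC_ALL=C is set here just to make the sort order deterministic):

```shell
# With plain %d, lexical order scrambles pages past 9:
printf 'out%d.png\n' 1 2 10 | LC_ALL=C sort    # out1.png, out10.png, out2.png
# Zero-padded %03d names sort in true page order:
printf 'out%03d.png\n' 1 2 10 | LC_ALL=C sort  # out001.png, out002.png, out010.png
```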
** note: I also tried the qpdf utility as follows:
$ qpdf big.pdf --pages . 1-50 -- out.pdf
and
$ qpdf big.pdf --split-pages=50 -- out.pdf
Result: conversion was super-quick (only a couple of seconds), but the resulting file was much slower to load and to turn pages.
***
***
***
Wget should also be useful for things like sites that require a subscription, although I haven't tried.
This page explains how you might attempt it:
https://stackoverflow.com/questions/...page-with-wget
An example of a useful technique is:
Use "Copy as cURL" in the Network tab of Firefox's browser developer tools and replace curl's flag -H with wget's --header (and also --data with --post-data if needed).
So trying this on the page you get from a login attempt at https://www.mobileread.com/forums/fo...play.php?f=247 yields:
curl 'https://www.mobileread.com/forums/login.php?do=login' -X POST -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) Gecko Firefox' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8' -H 'Accept-Language: en-US,en;q=0.5' -H 'Accept-Encoding: gzip, deflate, br' -H 'Content-Type: application/x-www-form-urlencoded' -H 'Origin: https://www.mobileread.com' -H 'Connection: keep-alive' -H 'Referer: https://www.mobileread.com/forums/fo...play.php?f=247' -H 'Cookie: bblastvisit=...' -H 'Upgrade-Insecure-Requests: 1' -H 'Sec-Fetch-Dest: document' -H 'Sec-Fetch-Mode: navigate' -H 'Sec-Fetch-Site: same-origin' -H 'Sec-Fetch-User: ?1' --data-raw 'vb_login_username=myname&vb_login_password=&s=&securitytoken=guest&do=login&vb_login_md5password=...'
(truncated; for illustrative purposes only)
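The -H/--data-raw substitution is mechanical, so a throwaway helper can do it for you. curl2wget is a hypothetical sketch that only handles the simple single-quoted forms shown in the example above (it also drops -X POST, since wget's --post-data already implies a POST):

```shell
# Convert a simple "Copy as cURL" command line into a wget equivalent.
# Only handles -H '...' headers and --data-raw '...' bodies, as above.
curl2wget() {
  printf '%s\n' "$1" | sed \
    -e 's/^curl /wget /' \
    -e 's/ -X POST//' \
    -e "s/ -H '/ --header '/g" \
    -e 's/ --data-raw / --post-data /'
}
```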
Another example would be getting a token from a "Wallabag-like" server.
You could do this with curl:
$ curl -X POST 'http://127.0.0.1:8081/oauth/v2/token' --data 'client_id=myid&client_secret=mysecret' -H 'Content-Type:application/x-www-form-urlencoded'
{
"access_token": "",
"expires_in": 3600,
"refresh_token": "",
"scope": null,
"token_type": "bearer"
}
Or the same thing with wget:
$ wget 'http://127.0.0.1:8081/oauth/v2/token' --post-data 'client_id=myid&client_secret=mysecret' --header 'Content-Type:application/x-www-form-urlencoded'
Connecting to 127.0.0.1:8081... connected.
HTTP request sent, awaiting response... 200 OK
Length: 122 [application/json]
Saving to: ‘token’
token 100%[==================================>] 122 --.-KB/s in 0s
‘token’ saved [122/122]
$ cat token
{
"access_token": "",
"expires_in": 3600,
"refresh_token": "",
"scope": null,
"token_type": "bearer"
}
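To use the token from a script, the field can be pulled out of the saved JSON with sed; busybox sed on the Kobo is enough, so no jq is needed. get_token is my own helper name, and the naive pattern assumes the flat one-key-per-line response shown above:

```shell
# Pull the access_token value out of the JSON the server returned.
get_token() {
  sed -n 's/.*"access_token": *"\([^"]*\)".*/\1/p' "$1"
}
# Usage: TOKEN=$(get_token token)
```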
These may not work with the stripped-down busybox version of wget that already comes with the Kobo, so you may need to install the full Alpine Linux version, as in Post #3.
***
***
***
USE KOREADER AS "BROWSER":
Although Koreader on Kobo doesn't open external links at present (as it does on Android), we can kludge it into doing so.
In a Kobo terminal or SSH/Telnet to Kobo:
# . /korenv.sh
# cd /mnt/onboard/.adds/koreader/
# cd frontend/apps/reader/modules/
# cp readerlink.lua readerlink.lua.bak
Open the readerlink.lua file in nano or vi:
# nano -l readerlink.lua
Code:
...
138 -- Set up buttons for alternative external link handling methods
139 self._external_link_buttons = {}
140 self._external_link_buttons["10_copy"] = function(this, link_url)
141 return {
142 text = _("Copy"),
143 callback = function()
144 UIManager:close(this.external_link_dialog)
145 end,
146 }
147 end
...
Compare that code to that of readerhighlight.lua:
# nano -l readerhighlight.lua
Code:
...
81 ["03_copy"] = function(this)
82 return {
83 text = C_("Text", "Copy"),
84 enabled = Device:hasClipboard(),
85 callback = function()
86 Device.input.setClipboardText(cleanupSelectedText(this.selected_text.text))
87 this:onClose()
88 UIManager:show(Notification:new{
89 text = _("Selection copied to clipboard."),
90 })
91 end,
92 }
93 end,
...
Try modifying the readerlink.lua copy function to look like the readerhighlight.lua copy function:
*** BACKUP ANY FILES BEFORE MODIFYING. IF YOUR FILES DON'T LOOK LIKE THE ABOVE, THIS KLUDGE IS NOT FOR YOU ***
# nano -l readerlink.lua
Code:
...
138 -- Set up buttons for alternative external link handling methods
139 self._external_link_buttons = {}
140 self._external_link_buttons["10_copy"] = function(this, link_url)
141 return {
142 text = _("Copy"),
143 enabled = Device:hasClipboard(),
144 callback = function()
145 Device.input.setClipboardText(link_url)
146 UIManager:close(this.external_link_dialog)
147 UIManager:show(Notification:new{
148 text = _("Url copied to clipboard."),
149 })
150 end,
151 }
152 end
...
Now the external links copy function should work:
1) In Koreader, open an epub/html file that has external links in it
2) Tap on the external link. The "External link:" dialog should open as usual.
3) Select "Copy"
4) Turn on WiFi. (e.g. I have set spread/pinch gesture for this in Koreader.)
5) Open a terminal. (e.g. I have set two-finger diagonal swipe gesture for this in Koreader.)
6) Type elinks (or wget or whatever command you want.)
7) Tap and hold. The "Clipboard dialog" should open as usual.
8) Select "Paste".
9) Confirm that the pasted URL doesn't contain "rm *" or other such nasties, then press return.
10) You are now browsing the external link!
"q" to quit elinks, "X" to close terminal, and you are right back in your original document in Koreader.
So basically it's tap link, Select "Copy", swipe for terminal, type "elinks/wget", tap screen, select "Paste", and hit return to browse/fetch.
***
***
***
How about browsing without leaving Koreader?
For this we copy the wget111s binary we crosscompiled in Post #1 to the /mnt/onboard/.adds/ folder.
Unfortunately we cannot use the built-in wget command here, because it leaves many links relative, so we would not be able to browse by clicking links in an epub created from its output.
We also copy the html-to-epub binary we crosscompiled above to the /mnt/onboard/.adds/ folder.
Try modifying the readerlink.lua menu function to add an "Add epub" item:
# nano -l readerlink.lua
Code:
...
153 self._external_link_buttons["15_add_epub"] = function(this, link_url)
154 return {
155 text = _("Add epub"),
156 callback = function()
157 if not util.pathExists("/mnt/onboard/.adds/ae/") then
158 os.execute("mkdir /mnt/onboard/.adds/ae")
159 end
160 os.execute("rm -r /mnt/onboard/.adds/ae/images/ ")
161 local datetime = os.date("%Y%m%d-%H%M%S")
162 os.execute("/mnt/onboard/.adds/wget111s --ca-certificate=/mnt/onboard/.adds/koreader/data/ca-bundle.crt -k -O /mnt/onboard/.adds/ae/index.html "..'"'..link_url..'"')
163 os.execute("cd /mnt/onboard/.adds/ae && /mnt/onboard/.adds/html-to-epub -o /mnt/onboard/.adds/ae/"..datetime..".epub /mnt/onboard/.adds/ae/index.html")
164 -- os.execute("rm /mnt/onboard/.adds/ae/index.html")
165 UIManager:close(this.external_link_dialog)
166 UIManager:show(Notification:new{
167 text = _("Added epub to /mnt/onboard/.adds/ae/ folder"),
168 })
169 end,
170 }
171 end
...
Now create an initial html file, index2.html, that we can open in Koreader:
# mkdir /mnt/onboard/.adds/ae
# /mnt/onboard/.adds/wget111s --ca-certificate=/mnt/onboard/.adds/koreader/data/ca-bundle.crt -k -O /mnt/onboard/.adds/ae/index2.html "https://www.mobileread.com/forums/forumdisplay.php?f=247"
In the Koreader file browser, open the index2.html file.
1) Click on an external link and select "Add epub".
2) Go back to filemanager and open the epub.
* Instead of crosscompiling wget, you can use the wget binary you got from Alpine Linux (initially installed to /mnt/onboard/.adds/koreader in Post #3, then migrated to /mnt/onboard/.adds/kordir in Post #14).
Just change line 162 as follows:
# nano -l readerlink.lua
Code:
162 os.execute("LD_LIBRARY_PATH=/mnt/onboard/.adds/kordir/libs /mnt/onboard/.adds/kordir/scripts/wgets --ca-certificate=/mnt/onboard/.adds/koreader/data/ca-bundle.crt -k -O /mnt/onboard/.adds/ae/index.html "..'"'..link_url..'"')
Similarly, you have to grab the initial index2.html file with:
# LD_LIBRARY_PATH=/mnt/onboard/.adds/kordir/libs /mnt/onboard/.adds/kordir/scripts/wgets --ca-certificate=/mnt/onboard/.adds/koreader/data/ca-bundle.crt -k -O /mnt/onboard/.adds/ae/index2.html "https://www.mobileread.com/forums/forumdisplay.php?f=247"
** You can rewrite the above code to use the built-in wget, but you won't be able to follow links past the first page.
*** For searches, a searches.html file could be kept in the ae/ folder:
Code:
<html>
<body>
<ul>
<li><a href='http://www.nosysearchengine.com/search?q="mobileread"+forum+kobo+developer'>Edit search terms</a>
</ul>
</body>
</html>
***
***
***