This post deals with the technical aspects of setting up the overlay and taking it down. So if you're just curious or you want to use it on other Kindles then read on. Otherwise just move along or it will make your head hurt.
Note on naming:
- read only root (FS) is the original filesystem mounted on / that everyone has now
- overlay image (or overlay filesystem) is the filesystem we have stored in a file on /mnt/us, containing all the modifications we would like to make to the read only FS
- writable root (FS) is the virtual filesystem created by combining read only root with overlay image
For simplicity I will not post any shell code here, for that refer to the files. You should not have a problem finding what you need there.
Before the init
To properly set up the overlay on the root FS (or to change the root FS in general) you have to do it at an early stage. And early really means early -- before the init kicks in. To do that you have two choices:
- do it in initrd before init is called
- replace the init with your own and call the real init later
The first option is more difficult since it means patching the cpio archive "hidden" in /dev/mmcblk0. The second option is easier, so we choose to replace the real init with our own shell script that does all the dirty work. Moreover, the real init is already a shell script! What a surprise. The /sbin/init on K5 checks whether it is supposed to use upstart or (older) SysV init and then calls the real init (which is /sbin/init.exe). Another advantage of the second option is that we already have the real root FS at our disposal.
EDIT: Removed the note that you need /bin/sh in initrd for 2), in fact you would need it for 1). For 2) you have /bin/sh from the root filesystem.
To keep things simple I decided to put our script into /sbin/pre_init and the only change to /sbin/init is execution of /sbin/pre_init right before /sbin/init.exe is called.
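This is not the code from the files, only a minimal sketch of what such a hook could look like. It assumes the hook is sourced (so it keeps running in the same process as /sbin/init) and that the patched script ends by calling the function; the function name start_init and the two variables are made up for illustration.

```shell
#!/bin/sh
# Sketch of the tail end of a patched /sbin/init (NOT the actual Kindle script).
# PRE_INIT and REAL_INIT are illustrative variables; the defaults are the
# paths named in the text.
PRE_INIT=${PRE_INIT:-/sbin/pre_init}
REAL_INIT=${REAL_INIT:-/sbin/init.exe}

start_init() {
    # Assumption: the hook is sourced, so it runs in the same process.
    # On success it is expected to start the real init itself and never
    # come back here.
    if [ -r "$PRE_INIT" ]; then
        . "$PRE_INIT"
    fi
    # Fallback: the hook was missing or handed control back to us.
    exec "$REAL_INIT" "$@"
}
```

The last line of the patched script would then be something like: start_init "$@"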
Do not forget that this is a tricky task. And believe me, you don't want to let the Kindle start with only half of the job done. You can bet it will have some issue with that. That means you have to keep track of everything you do so you can roll back later in case of an error.
To set up the writable root FS you have to do the following steps:
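One way to do that kind of bookkeeping is an undo list: every step that succeeds registers the command that reverts it, and on error the list is replayed newest first. This is only a sketch of the idea, not the code from the files; push_undo and rollback are names I made up here.

```shell
#!/bin/sh
# Minimal undo stack. Newest entries are kept at the top of the list so that
# rollback reverts the steps in reverse order.
UNDO_LIST=""

# Register the command that reverts the step we just completed.
push_undo() {
    UNDO_LIST="$*
$UNDO_LIST"
}

# Run every registered undo command, newest first, then clear the list.
rollback() {
    while IFS= read -r cmd; do
        [ -n "$cmd" ] || continue
        eval "$cmd" || echo "rollback step failed: $cmd" >&2
    done <<EOF
$UNDO_LIST
EOF
    UNDO_LIST=""
}
```

Usage would look like: mount -t proc proc /proc && push_undo "umount /proc", and then a single call to rollback in the error path.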
- You will need a /proc otherwise mount will refuse to work (busybox?)
- Next you have to make yourself a /dev with everything you will need before init starts udev. Apart from loop devices or partitions this also means /dev/console and /dev/null, and if you want to write on the eink display also /dev/fb0.
- After that you can start mounting. The first thing you need is /var/local because it contains the file mntus.params, which is necessary to mount /mnt/base-us. For those who don't know, mmcblk0p4 does not contain /mnt/base-us directly. It contains a partition table first and then the filesystem.
- When you have mntus.params you can mount /mnt/base-us that contains the overlay image. It's better to let 'mntus --fsck mount' prepare the loop device and mount it for us so we don't have to deal with it. However note that 'mntus umount' will just umount the filesystem, but will not free the /dev/loop/0 loop device! (!)
- Before we mount the image with the overlay it is a good idea to check it for errors with fsck. But wait! You say that the last mount was done in the future?! But how is that ... ah right, the clock is screwed up (!). One problem is that the Kindle uses its own epoch and changes the date during boot but doesn't bother storing the new date to the hardware clock. The second problem is that when you do a hard reset of your device the clock is reset. The good thing is we can tell e2fsck to ignore the problem with 'mounted in the future', the bad thing is that the only way to do that is by using a configuration file (no command line option). Rather than creating a new file on the read only root I decided to feed the configuration file to e2fsck on its standard input (through /dev/stdin).
- When the overlay image exists and is clean we can mount it to /overlay.
- We load the mini_fo kernel module contained in the image and mount the writable root on /root-rw, created as a combination of the read only root (on /) and our overlay image (on /overlay).
- This is the right time to do any necessary clean up, for us this means umounting /var/local. Unfortunately we can't umount /dev because it is in use (remember the loop device for /mnt/base-us?). And as for the /proc ... we will need it later.
- Now we're almost ready to change the root to our new writable /root-rw by invoking pivot_root. Before we do that we have to remount the procfs from /proc to /root-rw/proc. Again, we do that so that we can call mount later. After the procfs is where we need it we call 'pivot_root /root-rw /root-rw/root-ro'. This will set new / to be located where /root-rw used to be and place old / where /root-rw/root-ro used to be. In plain English: we will have our writable root on / and the old read only root on /root-ro.
- The only thing left to do is move all previously mounted filesystems (now located on /root-ro/dev, /root-ro/overlay and /root-ro/mnt/base-us) to correct locations: /dev, /overlay, /mnt/base-us
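The order of the steps above can be sketched as a dry run. This is not the script from the files: the /var/local partition (mmcblk0p3), the image name overlay.img and the module path are assumptions of mine, and so are the mini_fo option names base= and sto= and the use of E2FSCK_CONFIG together with broken_system_clock to pass the e2fsck configuration on stdin. With DRY_RUN=1 (the default) the commands are only printed.

```shell
#!/bin/sh
# Dry-run sketch of the setup order. Nothing is mounted unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
OVERLAY_IMG=/mnt/base-us/overlay.img   # assumed image file name
MINI_FO_KO=/overlay/mini_fo.ko         # assumed module location on the image

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Check the image, telling e2fsck (assumption: via E2FSCK_CONFIG) to read
# its configuration from stdin and to tolerate a broken clock.
fsck_image() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "e2fsck -p $1"
    else
        printf '[options]\nbroken_system_clock = true\n' |
            E2FSCK_CONFIG=/dev/stdin e2fsck -p "$1"
        [ $? -le 1 ]   # 0 = clean, 1 = errors were corrected
    fi
}

setup_overlay() {
    run mount -t proc proc /proc &&
    run mount -t tmpfs tmpfs /dev &&
    run mknod /dev/console c 5 1 &&
    run mknod /dev/null c 1 3 &&
    run mkdir /dev/loop &&
    run mknod /dev/loop/0 b 7 0 &&
    run mknod /dev/mmcblk0p3 b 179 3 &&   # assumed: /var/local partition
    run mknod /dev/mmcblk0p4 b 179 4 &&
    run mount /dev/mmcblk0p3 /var/local &&
    run mntus --fsck mount &&
    fsck_image "$OVERLAY_IMG" &&
    run mount -o loop "$OVERLAY_IMG" /overlay &&
    run insmod "$MINI_FO_KO" &&
    run mount -t mini_fo -o base=/,sto=/overlay / /root-rw &&
    run umount /var/local &&
    run mount -o move /proc /root-rw/proc &&
    run pivot_root /root-rw /root-rw/root-ro &&
    run mount -o move /root-ro/dev /dev &&
    run mount -o move /root-ro/overlay /overlay &&
    run mount -o move /root-ro/mnt/base-us /mnt/base-us
}
```

Calling setup_overlay prints the sequence, one command per line. A real script would also register an undo action after each step instead of just chaining with &&.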
When all is done it's time to invoke the real init. But instead of using the path /sbin/init.exe which points to our writable root we use /root-ro/sbin/init.exe. That way the binary /sbin/init.exe on our writable root is not used and we can properly remove the writable root later during shutdown.
Safety:
To stay safe in case something bad happens I decided to introduce two sentinel files. We don't want users to get stuck with unusable devices, do we? The first file is /NO_PRE_INIT. If this file exists pre_init is not executed and the control is returned to /sbin/init. The file is created automatically at the beginning of pre_init and removed before pre_init pivots the root. If something bad happens it will contain the error message.
The second file is /mnt/base-us/NO_OVERLAY. If it exists the overlay is not mounted and (after proper cleanup) the control is again returned to /sbin/init. It is automatically created before NO_PRE_INIT is removed and it is removed after a successful startup (see below).
You are free to create any of these files by hand to disable the whole process.
At this point the existence of NO_OVERLAY also leads to the existence of NO_PRE_INIT. This is mainly to discourage the commoners without Linux experience (and no ssh access). After the whole thing is properly tested we can change things to just keep NO_OVERLAY so the user can remove it from the computer by USB mount.
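The life cycle of the two sentinels can be sketched like this. It is an illustration only, not the code from the files: the function names and the ROOT prefix (which lets you try it in a scratch directory) are made up, and the coupling between NO_OVERLAY and NO_PRE_INIT described above is not modelled.

```shell
#!/bin/sh
# ROOT is a hypothetical prefix; empty means the real filesystem.
ROOT=${ROOT:-}

# Print which stage is blocked by a sentinel: pre_init, overlay or none.
blocked_stage() {
    if [ -e "$ROOT/NO_PRE_INIT" ]; then
        echo pre_init
    elif [ -e "$ROOT/mnt/base-us/NO_OVERLAY" ]; then
        echo overlay
    else
        echo none
    fi
}

# Created at the very beginning of pre_init.
arm_pre_init() {
    : > "$ROOT/NO_PRE_INIT"
}

# On error the sentinel stays in place and carries the message.
record_failure() {
    echo "$*" > "$ROOT/NO_PRE_INIT"
}

# Right before the pivot: create NO_OVERLAY first, then drop NO_PRE_INIT,
# so there is never a moment without a sentinel.
hand_over() {
    : > "$ROOT/mnt/base-us/NO_OVERLAY" && rm -f "$ROOT/NO_PRE_INIT"
}
```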
The startup and normal use
EDIT: I forgot to mention that: It is strange, but the Kindle software pollutes the root filesystem with files that could prevent proper startup. This is most likely because it doesn't expect a writable filesystem and what would silently fail now works. It is good to remove such files before anything else is started. One place to do that might be pre_init, but I rather chose to do it in the first init job that is executed (in our case system.conf).
Remember that we have already mounted /mnt/base-us ourselves. The startup job called filesystems.conf doesn't know that and tries to call 'mntus' to mount it. This will fail and the job gets confused and dies. We don't want that. Nor do we want anyone to umount the filesystem, because we're using it. The solution is easy: replace 'mntus' with a fake script that does nothing.
Unfortunately by creating the writable root we have created an environment the Kindle software was not ready for. The main problem is that a lot of things like to remount the root writable, do a couple of things and remount it back to read only. The problem is two-fold. First we would like to keep the root writable and second the mini_fo filesystem doesn't even support remounting. Again replacing '/usr/sbin/mntroot' with a fake script that does nothing solves most of the troubles. But there is still a problem -- the /etc/upstart/firsttime script called by the system.conf upstart job, and system.conf itself. Both try to call 'mount' directly and don't rely on 'mntroot'. So we patch system.conf not to do the remount and not to call '/etc/upstart/firsttime' (it's not a first boot anyway, and it may screw up things we did on purpose).
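A fake script of this kind is tiny. The helper below writes one to a given path; it is a sketch, not the file from the archive, and install_stub is a made-up name.

```shell
#!/bin/sh
# Write a do-nothing stand-in script to the given path.
install_stub() {
    target=$1
    cat > "$target" <<'EOF'
#!/bin/sh
# Stub: accept any arguments, do nothing, report success.
exit 0
EOF
    chmod +x "$target"
}
```

For example: install_stub /usr/sbin/mntroot (after keeping a copy of the original somewhere safe). Callers that run 'mntroot rw' or 'mntroot ro' then get a successful exit code and nothing is remounted.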
After the framework is up upstart emits the 'framework_ready' event. This triggers our after_boot.conf job which removes the NO_OVERLAY sentinel. I assume that at this point all should be OK and the user can use USB mount if necessary.
NOTE: Sometimes the NO_OVERLAY is not removed. I don't know yet whether the after_boot.conf is not started at all, or it fails for some reason. (I don't even know how to reproduce this.)
Shutdown
To keep our overlay filesystem free of errors it is necessary to properly umount the overlay. For that we have to switch back to the read only filesystem.
The upstart job that handles power-off and restarts is called shutdown.conf. In its original version it doesn't do much besides stopping the framework and showing a neat image. For us this is not enough. First we need to stop all other upstart jobs that are still running. This also includes the system.conf script and there lies another issue. This job is expected to never stop and when it does the system_monitor.conf job kicks in and causes a restart. Again, we don't like that so we patch the job to ignore situations when system.conf stops but shutdown.conf is running.
After all the upstart jobs are stopped we have to make sure no other programs are running. The only things we want to keep are ourselves, our parent process and the init. The rest gets sent the TERM signal and later KILL if necessary.
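The "kill everything but us" step can be sketched as follows. This is my illustration, not the code from the files: the function names and the GRACE variable (seconds to wait between TERM and KILL) are made up.

```shell
#!/bin/sh
# Print every PID from the argument list that is not init (1), not
# ourselves and not our parent. Arguments: self parent pid...
victims() {
    self=$1
    parent=$2
    shift 2
    for pid in "$@"; do
        case "$pid" in
            1|"$self"|"$parent") ;;   # keep these alive
            *) echo "$pid" ;;
        esac
    done
}

# List all PIDs currently visible in /proc.
all_pids() {
    for d in /proc/[0-9]*; do
        echo "${d#/proc/}"
    done
}

# Ask nicely first, then kill whatever is still around.
kill_others() {
    list=$(victims "$$" "$PPID" $(all_pids))
    [ -n "$list" ] || return 0
    kill -TERM $list 2>/dev/null
    sleep "${GRACE:-2}"
    for pid in $list; do
        if [ -d "/proc/$pid" ]; then
            kill -KILL "$pid" 2>/dev/null
        fi
    done
    return 0
}
```

Only kill_others touches real processes; victims is a pure filter, so the selection can be checked safely, e.g. victims 5 7 1 5 7 9 12 prints 9 and 12.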
EDIT: After we kill what we can we try to umount all that we can.
When no other process is running we again remount the /proc to /root-ro/proc and pivot the root back to /root-ro. After that we have to free the shutdown.conf file (!) so that we can umount the writable root fs. This is done by starting a script 'shutdown_real'. It handles the rest of the umounting and moves all the mounts we can't remove to /root-ro. It also does all the other things shutdown.conf was supposed to do. But simply running 'shutdown_real' will not do. That would just swap one open file for another. We have to move it onto another filesystem first. However most of them were already umounted, and modifying some filesystem is not nice either. Unfortunately at this point the only tmpfs at our disposal is /root-ro/dev. It's not clean, but I have decided to use it anyway. That is, 'shutdown_real' is first copied to /root-ro/dev and then started from there in the background. That way the upstart job can terminate.
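The copy-and-detach trick on its own can be sketched like this. It is not the code from shutdown.conf: hand_off is a made-up name, and running the copy through sh (so the exec bit on the tmpfs does not matter) is my own choice.

```shell
#!/bin/sh
# Copy the script to a directory on another filesystem and start the copy
# in the background, so the caller no longer keeps the original open.
# Arguments: source script, target directory (default /root-ro/dev).
hand_off() {
    src=$1
    dir=${2:-/root-ro/dev}
    cp "$src" "$dir/shutdown_real" || return 1
    # Detach stdio so the background script holds no files of the caller.
    sh "$dir/shutdown_real" </dev/null >/dev/null 2>&1 &
}
```

After hand_off returns, the calling job can simply exit while the copy keeps running from the tmpfs.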
While all the steps are important, I have tried to point out the few most important things with (!) in the text.
It took me quite some time to write this post, hopefully it wasn't in vain.