I am thinking of raw video and raw audio piped straight out of ffmpeg using the params I posted in the forum. Then interleave them (perhaps with a 1.5 second pre-roll on the audio, so it does not glitch when the video falls one second behind and starts to drop frames now and then).
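To make the interleave idea concrete, here is a rough Python sketch of the mux order only, not real ffmpeg I/O. The frame rate, sample rate and record format are placeholders I made up for illustration (they are not the params from the forum post), and `interleave` is a hypothetical helper name.

```python
def interleave(num_frames, fps=10, sample_rate=8000, preroll_s=1.5):
    """Yield ('A', n_samples) and ('V', frame_index) records in mux order.

    fps, sample_rate and preroll_s are illustrative assumptions.
    """
    preroll = int(preroll_s * sample_rate)          # audio sent ahead of video
    total_audio = num_frames * sample_rate // fps   # audio covering all frames
    sent = min(preroll, total_audio)
    if sent:
        yield ('A', sent)                           # the pre-roll chunk
    for i in range(num_frames):
        yield ('V', i)
        # Keep the audio stream `preroll` samples ahead of the video position,
        # but never send more audio than the clip contains.
        target = min(total_audio, preroll + (i + 1) * sample_rate // fps)
        if target > sent:
            yield ('A', target - sent)
            sent = target
```

The reader plays audio continuously and can skip 'V' records when it is behind; because the audio is always about 1.5 seconds ahead in the stream, a one second video stall should not starve the audio buffer.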
I have no problem with pitch-shifting and speed-changing audio using a 3-phase ring buffer in software. I did that stuff on an old sound card that had a programmable DSP in it, back in the Win 3.x days. Made some cool voice modification demos back then. Happy days.

The point is, we can dynamically change sound to match the video, or vice versa.
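As a minimal sketch of changing the sound to match the video, the simplest approach is plain linear-interpolation resampling. Note this is not the 3-phase ring buffer method mentioned above: it changes speed and pitch together. `resample_linear` and its `ratio` parameter are names I invented for the example.

```python
def resample_linear(samples, ratio):
    """Return `samples` replayed at `ratio` times the original speed.

    ratio > 1 shortens the audio (faster, higher pitch),
    ratio < 1 stretches it (slower, lower pitch).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if len(samples) < 2:
        return list(samples)
    out = []
    last = len(samples) - 1
    n = 0
    while True:
        pos = n * ratio              # read position in the input
        if pos > last:
            break
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, last)]
        # Blend the two neighbouring input samples.
        out.append(samples[i] + (nxt - samples[i]) * frac)
        n += 1
    return out
```

If the video is running, say, 2% slow, calling this with a ratio of 0.98 on each audio chunk would stretch the audio by roughly the same amount, at the cost of a small pitch drop.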
Regarding what a K3 can do, we may need interrupts (or pthreads, i.e. linking with -lpthread) so we can do stuff while it is sitting in the eink syscalls...
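The threading idea, sketched in Python for brevity (on the device this would more likely be C with pthreads). The eink update is passed in as a stand-in callback `update_display`, since I am not assuming any particular eink API here; `run` and `display_worker` are hypothetical names, and the "decode" step is a dummy.

```python
import queue
import threading

def display_worker(frames, update_display):
    """Pull frames off the queue and push them to the (blocking) display."""
    while True:
        frame = frames.get()
        if frame is None:            # sentinel: no more frames
            break
        update_display(frame)        # stand-in for the slow eink call

def run(num_frames, update_display):
    """Decode in the main thread while a worker blocks on the display."""
    frames = queue.Queue(maxsize=2)  # small buffer so the decoder cannot race far ahead
    worker = threading.Thread(target=display_worker,
                              args=(frames, update_display))
    worker.start()
    decoded = 0
    for i in range(num_frames):
        frame = bytes([i % 256]) * 16   # dummy "decoded frame"
        frames.put(frame)               # blocks only when the queue is full
        decoded += 1
    frames.put(None)
    worker.join()
    return decoded
```

The main thread keeps working while the worker sits in the display call. A real player would drop frames instead of blocking on a full queue, which is where the audio pre-roll earns its keep.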