I would recommend a tool such as unpaper, ScanTailor, pdfsandwich, or readablepdf (a small wrapper I wrote around ScanTailor).
It's mostly a question of speed and memory, but of course if anybody wants to help implement something with unpaper, PRs are welcome.
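
For anyone who wants to try the unpaper route, here is a rough sketch of how a preprocessing step could shell out to it. This is only an illustration, not code from this project: `build_unpaper_command` and `clean_page` are hypothetical names, and it assumes the plain `unpaper <input> <output>` invocation with no extra options. Which image formats are accepted depends on the installed unpaper version (older releases may only read PNM files).

```python
import shutil
import subprocess
from pathlib import Path


def build_unpaper_command(src, dst):
    """Build the argument list for a basic unpaper run (hypothetical helper).

    Assumes the simplest invocation: input file first, output file second.
    """
    return ["unpaper", str(Path(src)), str(Path(dst))]


def clean_page(src, dst):
    """Run unpaper on a single page image (hypothetical helper).

    Returns False without doing anything if unpaper is not on PATH;
    raises subprocess.CalledProcessError if unpaper exits with an error.
    """
    if shutil.which("unpaper") is None:
        return False
    subprocess.run(build_unpaper_command(src, dst), check=True)
    return True
```

Running one subprocess per page is the easy part; the speed and memory cost on large documents is the thing that would need measuring before this could go in.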