It looks like a common task: "mark every letter 'p', but skip tags and named entities".
You can "cheat" by converting named entities to numeric ones, which is recommended for ebooks (if I am correct?), but I am looking for an "elegant" solution to this problem.
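
One possible approach, sketched in Python and not meant as a definitive answer: split the text on a regex that captures tags and character references, then touch only the pieces in between. The function name `mark_letter`, the `<b>...</b>` markers, and the simplified tag pattern are all assumptions made for illustration. The pattern does not handle comments, CDATA, `<script>` content, or a `>` inside an attribute value, so a real HTML parser would be safer for arbitrary input.

```python
import re

# Matches either an HTML tag or a character reference (named, decimal, or hex).
# Assumption: tags contain no '>' inside attribute values; comments and CDATA
# sections are not handled.
SKIP = re.compile(r'(<[^>]*>|&(?:#\d+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);)')

def mark_letter(html, letter="p", open_mark="<b>", close_mark="</b>"):
    """Wrap each occurrence of `letter` in plain text, leaving tags and entities alone."""
    # Because the pattern has one capturing group, re.split() returns plain text
    # at even indices and the matched tags/entities at odd indices.
    parts = SKIP.split(html)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            # Plain text: mark the letter (case-sensitive).
            part = part.replace(letter, open_mark + letter + close_mark)
        out.append(part)
    return "".join(out)

print(mark_letter('<p class="top">pop &amp; &para; p</p>'))
# prints: <p class="top"><b>p</b>o<b>p</b> &amp; &para; <b>p</b></p>
```

This keeps named entities such as `&amp;` and `&para;` intact without converting them to numbers first, and the same split-then-replace idea should carry over to any regex engine that can return the delimiters along with the pieces.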