More thoughts on "buffered input" (if the file is too large and you can't figure out how to make Python handle the whole thing at once, and probably a good idea to do anyway, as you never know what kind of system someone might want to run it on, and what the limits of their resources will be)...
IF you're planning on generating a table of contents, or any other set of section listings at the front of the file, then what might be best is something like this:
1) Maintain, entirely in memory, the information about the overall structure of the file (sections, headings, etc) along with "section numbers" or spool file offset for each section. Normally, I wouldn't expect there to be more than a few dozen entries in this structure, maybe up to a few hundred, but that might change depending upon what all info you chose to put there.
2) Create the input buffer and initialize the overall structure (which would also involve starting the first section, it's first block, etc).
3) Create a temporary spool file to hold 'blocks' of processed data. Each time you finish a block in a fashion that you know you won't need to refer to it again until the "dump phase", write it out the temp spool. As you detect new sections/chapters/etc, create a new entry on the "overall structure" heap (see #1 above) with the needed information, including the current point in the spool file (length).
4) Process, process, process...
5) When finished with input, you now start the actual output file using the information in the overall structure, and then dump the rest of the file by re-reading the temp spool file and formatting it to the output in conjunction with the info in the overall structure heap.
Something along that line would allow you to process pretty much ANY size file on almost any machine, and would allow you to do Tables of Content, etc. pretty easily.
Another thing you could do is include, at the front of the document, notes to the user (perhaps command line option, or perhaps write to a separate output file, or just to standard out), information about potential problem areas, so the user would know where to look. ...But I think you were already doing that, so never mind.