Preprocess does very little for PDF input, aside from applying some more aggressive chapter detection patterns.
I haven't seen this sort of problem with any PDFs, aside from multi-column PDFs, which aren't supported at this time. The line unwrapping factor is set under the PDF input options. The current default is 0.45, and you can set it lower to make the unwrapping more aggressive.
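To give a rough feel for why a lower factor unwraps more aggressively, here is a minimal Python sketch of a length-based unwrap heuristic. It is only an illustration under my own assumptions, not calibre's actual code: the function name `unwrap_lines` and the exact threshold rule are hypothetical. The idea is that the factor picks a line length out of the sorted line lengths, and any line at least that long is treated as wrapped and joined to the next one. (On the command line, the same setting should be available as `--unwrap-factor` for PDF input, as far as I know.)

```python
def unwrap_lines(lines, factor=0.45):
    """Join hard-wrapped lines into paragraphs.

    Illustrative sketch only; this is NOT calibre's implementation.
    `factor` is a fraction between 0 and 1 used to pick a length threshold.
    """
    lengths = sorted(len(line.strip()) for line in lines if line.strip())
    if not lengths:
        return []
    # Threshold: the length found at the given fraction of the sorted lengths.
    index = min(int(len(lengths) * factor), len(lengths) - 1)
    threshold = lengths[index]

    paragraphs = []
    current = ""
    for line in lines:
        text = line.strip()
        if not text:
            # A blank line always closes the current paragraph.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        current = f"{current} {text}" if current else text
        # A line shorter than the threshold is assumed to end a paragraph.
        if len(text) < threshold:
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs
```

With a lower factor the threshold length drops, so more lines count as "wrapped" and get joined. With a higher factor more lines are left as paragraph ends.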
You could also open a bug and attach the PDFs, but depending on the cause it may not be addressed until the new PDF engine is ready. In any event, it could be useful to open bugs so that the PDFs can be used as test cases against the new engine.