OK, thanks. That likely means the person modified the original pdftohtml source.
I will have a further look after work, but it sounds like /HtmlOutputDev.cc
is where some of the changes occur, with the introduction of reflow to better handle paragraph and <br> generation.
    // Heuristic: if the last character in str1 is a hyphen,
    // turn off addNewline. This will "glue" hyphenated words
    // that have been split over multiple lines.
    if (reFlow && str1->text[str1->len - 1] == '-') {
        addNewline = 0;
        // Also remove the hyphen
        str1->len--;
        str1->htext->del(str1->htext->getLength() - 1, 1);
    }
    //printf("coalesce %d %d %f? ", str1->dir, str2->dir, d);
    // Is str2 a new paragraph?
    if (nextLine && (
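
To make the hyphen heuristic easier to follow outside the pdftohtml sources, here is a minimal standalone sketch of the same idea. It does not use the real HtmlOutputDev types: reflowLines, its parameters, and the use of std::string are my own assumptions for illustration only. Unlike the excerpt, it also guards against an empty line before looking at the last character.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, NOT part of pdftohtml: joins text lines with "<br>",
// but when reFlow is set and a line ends in '-', drops the hyphen and glues
// the line directly to the next one (same heuristic as the excerpt above).
std::string reflowLines(const std::vector<std::string>& lines, bool reFlow) {
    std::string out;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        std::string line = lines[i];
        const bool isLast = (i + 1 == lines.size());
        bool addNewline = true;

        // Only glue when there is a following line to glue to,
        // and only look at the last character of a non-empty line.
        if (reFlow && !isLast && !line.empty() && line.back() == '-') {
            addNewline = false;
            line.pop_back();  // also remove the hyphen
        }

        out += line;
        if (addNewline && !isLast) {
            out += "<br>";
        }
    }
    return out;
}
```

With reFlow enabled, the two lines "hyphen-" and "ated word" come out as "hyphenated word"; with it disabled they come out as "hyphen-<br>ated word". Note that the heuristic cannot tell a line-break hyphen from a real one, so a compound such as "well-" / "known" would be glued into "wellknown".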
======
and after