Dear All,
I'm new to Calibre, but those of you who are not surely know the problem of broken lines when converting PDF to ePUB: <BR> tags appear in arbitrary places and split the text into thousands of fragments, which looks weird.
This article (https://dearauthor.com/ebooks/calibr...nversion-tips/) suggests using Heuristic Processing during conversion to get rid of the <BR>s, but it didn't work for me: I tried the range from 0.4 to 0.6 with absolutely no result.
The same article proposes using the Search & Replace function, and that was the solution in my case! I used the following expression:
\. +<br>(*SKIP)(*FAIL)|\<br>|\d +<br>
I assumed that <BR>s after a period (".") were an author-defined start of a new passage, so I didn't touch them (\. +<br>(*SKIP)(*FAIL)), while standalone <BR>s (\<br>) and <BR>s which follow any word (\d +<br>) were replaced with nothing (= deleted), as they almost always broke a sentence into useless fragments.
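The logic above can be illustrated outside Calibre with a minimal Python sketch. It assumes the standard `re` module, which has no (*SKIP)(*FAIL), so the "keep" case is captured in group 1 and written back unchanged. The function name `strip_breaks` and the sample text are made up for illustration; this is an emulation of the idea, not what Calibre runs internally.

```python
import re

# Emulates: \. +<br>(*SKIP)(*FAIL)|\<br>|\d +<br>
#   group 1   -> period, spaces, <br>: kept as-is (the "skip" case)
#   <br>      -> any other <br>: deleted
#   \d +<br>  -> a digit, spaces and <br>: the whole match is deleted
PATTERN = re.compile(r"(\. +<br>)|<br>|\d +<br>")

def strip_breaks(html: str) -> str:
    """Return html with every match removed except the group-1 (keep) case."""
    return PATTERN.sub(lambda m: m.group(1) or "", html)

# Hypothetical sample: one break mid-sentence, one after a period.
sample = "This line was split <br>in the middle. <br>Next passage"
print(strip_breaks(sample))
```

In this sample the mid-sentence break is removed, while the break after the period survives.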
Everything would have been perfectly fine except for one thing: the above-mentioned algorithm also deletes the "useful" <BR>s after headlines, which are usually marked with a <b> tag (<b>THIS IS HEADLINE </b><br>), and after paragraphs (chapters???), which are marked with an <a id> tag (<a id="p8"></a> <br>).
So, what I need is to add an exception to my algorithm so that <BR>s are not deleted when they follow </a> and </b> tags. I played around with quite a number of different variants, but still can't find my Holy Grail. Possibly the (*SKIP)(*FAIL) construct does not support multiple skip conditions: I already ignore one pattern from the very beginning and want to add two more, so three in total.
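To make the goal concrete, the desired behaviour can be sketched in Python. This is only an emulation under stated assumptions: it uses the standard `re` module (no (*SKIP)(*FAIL)), puts all three "keep" cases into one capture group, and allows zero or more spaces between </a> or </b> and the <br>, since the two examples above differ in that respect. The name `strip_breaks_keep_tags` is hypothetical, and whether the same grouping can be typed into Calibre's Search & Replace is exactly the open question.

```python
import re

# "Keep" cases: <br> after a period, or after a closing </a> or </b> tag.
KEEP = r"(\. +<br>|</[ab]> *<br>)"
# "Drop" cases, unchanged from the original expression.
DROP = r"<br>|\d +<br>"
PATTERN = re.compile(KEEP + "|" + DROP)

def strip_breaks_keep_tags(html: str) -> str:
    """Delete <br> matches unless they fall under one of the KEEP cases."""
    return PATTERN.sub(lambda m: m.group(1) or "", html)

# Samples taken from the cases described in the post.
for s in ['<b>THIS IS HEADLINE </b><br>',
          '<a id="p8"></a> <br>',
          'a sentence split <br>in two']:
    print(repr(strip_breaks_keep_tags(s)))
```

With this rule the headline and anchor samples come back unchanged, and only the mid-sentence break is removed.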
Any thoughts?