Old 01-23-2013, 03:25 PM   #5
Junior Member
nymano began at the beginning.
Posts: 3
Karma: 10
Join Date: Jan 2013
Device: sony
Let's go back to the origin. The point of my question isn't to argue about versions but to try to fix an issue, and also to share some research results in case they benefit others.

I have a PDF file that I want to convert to EPUB to make it easier to read.
I face three main issues and some additional minor ones.
Two of them can be fixed through regexp replacement rules, and many of the minor ones are handled pretty well by calibre's heuristic mode.

Unfortunately, the paragraph issue does exist.
That is why I started with some hypotheses and directions, but if the creator of the solution states that he built it with the latest version, then the fix has to be found somewhere else.

That is why I made the second post: it sounds like the change isn't deep inside the library but rather in another file of pdftohtml itself, one which hasn't been changed for a long time.

The paragraph issue mentioned in the article does exist for me, and if I use the PDF of that article, the output is what you see below.
You will see that each visual line ends with a <br>, but in reality the paragraph ends later. So I believe this confirms that calibre processes the file the same way, and if someone has made an improvement, it could be worth having a look.
Now, I'm a newbie to all this, but truly the end result could be improved, and if that is the solution ...

I don't have a developer workstation or a Microsoft compiler. I started downloading Cygwin, gcc, and several build tools (make, imake and cmake), but this doesn't seem very productive given the number of files plus the need to identify the library dependencies.

<i><b>From Wikipedia, the free encyclopedia</b></i><br>
Douglas Noël Adams (11 March 1952 – 11 May 2001) was an En-<br>
glish author, comic radio dramatist, and musician. He is best<br>
known as the author of the <i>Hitchhiker’s Guide to the Galaxy </i>series.<br>
<i>Hitchhiker’s </i>began on radio, and developed into a “trilogy” of five<br>
books (which sold more than fifteen million copies during his life-<br>
time) as well as a television series, a comic book series, a computer<br>
game, and a feature film that was completed after Adams’ death.<br>
The series has also been adapted for live theatre using various<br>
scripts; the earliest such productions used material newly written<br>
by Adams. He was known to some fans as <i>Bop Ad </i>(after his illegi-<br>
ble signature), or by his initials ‘DNA’; he was born the year before<br>
the elucidation of the structure of “<i>the meaning of life</i>” or D.N.A. by<br>
Francis Crick and James Watson in Cambridge i.e. where he was<br>
In addition to <i>The Hitchhiker’s Guide to the Galaxy</i>, Douglas Adams<br>
wrote or co-wrote three stories of the science fiction television se-<br>
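For anyone who wants to experiment without rebuilding pdftohtml, here is a rough sketch of how output like the above might be merged back into paragraphs after conversion. It is not what calibre or pdftohtml actually does. The function name `unwrap_br_lines` and the `min_fill` threshold are made up for illustration, and the heuristic assumes justified text where only the last line of a paragraph is noticeably shorter than the rest.

```python
import re


def unwrap_br_lines(html, min_fill=0.8):
    """Merge <br>-terminated visual lines into paragraphs (heuristic sketch).

    Assumption: a line much shorter than the longest line marks the end
    of a paragraph (or is a standalone heading).
    """
    # Each piece between <br> tags is one visual line from the PDF.
    lines = [ln.strip() for ln in re.split(r"<br\s*/?>", html, flags=re.I)]
    lines = [ln for ln in lines if ln]
    if not lines:
        return []

    def visible_len(s):
        # Length of the text without inline tags such as <i> or <b>.
        return len(re.sub(r"<[^>]+>", "", s))

    longest = max(visible_len(ln) for ln in lines)
    paragraphs = []
    current = ""
    for ln in lines:
        if current.endswith("-"):
            # Rejoin a word hyphenated at the line end ("En-" + "glish").
            # Limitation: a genuine hyphen ("co-" + "wrote") is lost too.
            current = current[:-1] + ln
        elif current:
            current += " " + ln
        else:
            current = ln
        is_short = visible_len(ln) < min_fill * longest
        if is_short and not ln.endswith("-"):
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs


if __name__ == "__main__":
    sample = (
        "Title<br>\n"
        "This is a long first line of a para-<br>\n"
        "graph that keeps going for a while and<br>\n"
        "ends here.<br>\n"
        "Another paragraph begins on this line<br>\n"
    )
    for p in unwrap_br_lines(sample):
        print("<p>" + p + "</p>")
```

Line length in characters is only a crude stand-in for the real line width, so with proportional fonts the threshold would likely need tuning, and a paragraph whose last line happens to be full width will not be detected.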