Hi
Experiment report with BRISS and pdftohtml
Using Ubuntu Linux (Lucid), 32-bit.
I used Briss 0.0.8 to crop a two-column PDF book. I had to take out the first pages, which had only one column. Then, starting from page 9, I cropped first the even pages, then the odd pages. The result was very nice and exactly what I expected. Of course, some full-width titles were cut off... but I could read it easily on my 505.
Then, because I am a perfectionist, I tried to use pdfreflow on this cropped PDF (after a pdftohtml -xml pass). I thought it would be easy because it was now a one-column PDF. I got a segmentation fault.
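For anyone who wants to repeat the pdftohtml -xml step on just a few pages, here is a minimal sketch in Python. The file names and the helper names (pdftohtml_command, run_pdftohtml) are made up for illustration. It assumes the poppler pdftohtml binary is on the PATH and relies only on its -xml, -f (first page) and -l (last page) options.

```python
import shutil
import subprocess


def pdftohtml_command(pdf_path, out_prefix, xml=True, first_page=None, last_page=None):
    """Build the pdftohtml argument list (hypothetical helper, not part of any tool)."""
    cmd = ["pdftohtml"]
    if xml:
        cmd.append("-xml")  # XML output, the form pdfreflow is fed with
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]  # last page to convert
    cmd += [pdf_path, out_prefix]
    return cmd


def run_pdftohtml(pdf_path, out_prefix, **options):
    """Run pdftohtml and return its exit code; raise if the binary is missing."""
    if shutil.which("pdftohtml") is None:
        raise FileNotFoundError("pdftohtml not found on PATH")
    return subprocess.run(pdftohtml_command(pdf_path, out_prefix, **options)).returncode


# Example (assumed file names): convert only pages 9-10 of the cropped book.
# run_pdftohtml("book_cropped.pdf", "book_cropped", first_page=9, last_page=10)
```

Converting only a couple of pages keeps the output small enough to check by eye whether the text comes out column by column or with the two columns mixed together.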
So I thought that maybe a plain pdftohtml run would be sufficient. It did indeed process the file, setting aside all the photos. But the end result was surprising: it was as if pdftohtml had processed the original double-column PDF file as a single column. The result was illegible.
I am sure I made no mistake and did process the cropped one-column PDF file. I can't explain this result. It is as if the BRISS file "remembered" its original double columns.
To sum it up:
- BRISS is very efficient at cropping a double-column PDF file
- processing a BRISS cropped file with pdftohtml or pdfreflow yields bad results
Last edited by roger64; 06-02-2010 at 06:11 AM.