06-02-2010, 03:14 AM   #29
roger64
Hi

Experiment report with BRISS and pdftohtml

Using Linux (Ubuntu Lucid, 32-bit).

I used Briss 0.0.8 to crop a two-column PDF book. I had to remove the first pages, which had only one column. Then, starting from page 9, I cropped the even pages first, then the odd pages. The result was very nice and exactly what I expected. Of course, some full-width titles were cut off... but I could read it easily on my 505.

Then, because I am a perfectionist, I tried to use pdfreflow on this cropped PDF (after running it through pdftohtml -xml). I thought it would be easy because it was now a one-column PDF. I got a segmentation fault.

So I thought that maybe a plain pdftohtml run would be sufficient. It did indeed process the file, setting aside all the photos. But the end result was surprising: it was as if pdftohtml had processed the original two-column PDF as a single column. The result was illegible.

I am sure I made no mistake and did process the cropped one-column PDF. I can't explain this result. It is as if the BRISS file "remembered" its original two columns.
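
One possible explanation, offered as an assumption and not verified on this file: cropping tools commonly only change the visible box of each page and leave the text underneath in place, so a text extractor may still see both columns. The Python sketch below illustrates the idea on a made-up sample written in the style of pdftohtml -xml output. The sample coordinates, the column boundary of 300, and the function names all_text and text_in_box are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the style of `pdftohtml -xml` output: the page
# content still holds both columns. All coordinates are invented.
SAMPLE_XML = """<pdf2xml>
<page number="1" top="0" left="0" height="800" width="600">
<text top="100" left="50" width="200" height="12" font="0">left col, line 1</text>
<text top="100" left="320" width="200" height="12" font="0">right col, line 1</text>
<text top="120" left="50" width="200" height="12" font="0">left col, line 2</text>
<text top="120" left="320" width="200" height="12" font="0">right col, line 2</text>
</page>
</pdf2xml>"""


def all_text(xml_string):
    """Return every text item in document order, ignoring any visible box."""
    root = ET.fromstring(xml_string)
    return ["".join(item.itertext()) for item in root.iter("text")]


def text_in_box(xml_string, x_min, x_max):
    """Return only the text items whose left edge lies in [x_min, x_max)."""
    root = ET.fromstring(xml_string)
    kept = []
    for item in root.iter("text"):
        left = int(item.get("left"))
        if x_min <= left < x_max:
            kept.append("".join(item.itertext()))
    return kept


if __name__ == "__main__":
    # Reading everything interleaves the two columns line by line.
    print(all_text(SAMPLE_XML))
    # Keeping only the items inside the "cropped" area gives one column.
    print(text_in_box(SAMPLE_XML, 0, 300))
```

If this guess is right, filtering the XML by position before reflowing might give a cleaner one-column text, but the real column boundary would have to be read from the actual XML file.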

To sum up:
- BRISS is very effective at cropping a double-column PDF file
- processing a BRISS-cropped file with pdftohtml or pdfreflow gives bad results

Last edited by roger64; 06-02-2010 at 06:11 AM.