MobileRead Forums - View Single Post

roger64 · 06-02-2010, 03:14 AM

Hi

Experiment report with BRISS and pdftohtml

Using Ubuntu Linux (Lucid), 32-bit.

I used Briss 0.0.8 to crop a two-column PDF book. I had to take out the first pages, which had only one column. Then, from page 9 on, I cropped first the even pages, then the odd ones. The result was very nice and exactly what I expected. Of course, some full-width titles were cut... but I could read it easily on my 505.

Then, because I am a perfectionist, I tried to use pdfreflow on this cropped PDF (after running it through pdftohtml -xml). I thought it would be easy because it was now a one-column PDF. I got a segmentation fault.

So I thought that maybe a plain pdftohtml run would be sufficient. It did indeed process the file, putting all the photos aside. But the end result was surprising: it was as if pdftohtml had processed the original double-column PDF file as a single column. The result was illegible.

I am sure I made no mistake and did process the cropped one-column PDF file. I can't explain this result. It is as if the BRISS file "remembered" its original double columns.

To sum up:
- BRISS is very good at cropping a double-column PDF file
- processing a BRISS-cropped file with pdftohtml or pdfreflow gives bad results
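
One possible explanation, not verified here for Briss 0.0.8: cropping tools often only change the visible page box and leave the original text in the file, so pdftohtml may still see both columns. If that is the case, a workaround could be to reorder the text fragments of the pdftohtml -xml output by column. The following is only a minimal Python sketch of that idea. It assumes the usual pdftohtml -xml layout (a `page` element with a `width` attribute, `text` elements with `top` and `left` attributes) and a column gutter at the middle of the page. The sample data and the function name `reorder_two_columns` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample imitating pdftohtml -xml output for one two-column page.
SAMPLE_XML = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1000" width="800">
<text top="100" left="50" width="300" height="12" font="0">Left column, line 1</text>
<text top="100" left="450" width="300" height="12" font="0">Right column, line 1</text>
<text top="120" left="50" width="300" height="12" font="0">Left column, line 2</text>
<text top="120" left="450" width="300" height="12" font="0">Right column, line 2</text>
</page>
</pdf2xml>"""


def reorder_two_columns(xml_string):
    """Return the text lines page by page, left column first, then right column.

    Assumption: the gutter between the columns is at the middle of the page.
    """
    root = ET.fromstring(xml_string)
    lines = []
    for page in root.iter("page"):
        middle = float(page.get("width")) / 2
        left, right = [], []
        for text in page.iter("text"):
            # itertext() also collects text nested in <b> or <i> tags.
            content = "".join(text.itertext()).strip()
            fragment = (float(text.get("top")), content)
            if float(text.get("left")) < middle:
                left.append(fragment)
            else:
                right.append(fragment)
        # Inside each column, read from top to bottom.
        for column in (left, right):
            lines.extend(content for _, content in sorted(column))
    return lines


if __name__ == "__main__":
    for line in reorder_two_columns(SAMPLE_XML):
        print(line)
```

Full-width titles would still land in the wrong column with this approach, so it is a starting point rather than a complete fix.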

Last edited by roger64; 06-02-2010 at 06:11 AM.