08-11-2014, 09:48 AM   #879
willus
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by pmarty
Greetings to everyone, this is my first post to the forum.

k2pdfopt -col 1 -cbox 0,1 -de 1.5 -x -ui- 43.pdf
  • When joining lines the baselines are not properly aligned (see warehouses on the second line of output). The line being attached appears to be systematically lower.
  • Indented blocks like headings (1.2 Overview of Database Management System) and ordered lists (1. Conventional users (...)) are not joined at all.
I'm asking for your help with tweaking k2pdfopt parameters to resolve the above issues (if possible). See my attached files. The output is already pretty decent; it just needs a few polishing touches.

Best regards,
pmarty
@pmarty -- Welcome to the MR forums. Thank you for your detailed post and for including your source document.

I believe the joining problem with "warehouses" occurs because your source document is slightly skewed. Try using -as to auto-straighten it. It may not work, though: k2pdfopt only applies the straightening procedure if the page is rotated by more than a threshold amount (since straightening has a CPU cost). It will echo a confirmation to the screen if it straightens a page.

The second issue is caused by the indentation of the following lines. k2pdfopt uses some fuzzy logic to decide when to join lines, and it is hard to make that foolproof. There are currently no options that affect that behavior.
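
For the first issue, here is a rough sketch of the -as suggestion applied to the command from the quoted post. The options are unchanged apart from the added -as. The K2_ARGS variable and the check for k2pdfopt and 43.pdf are only illustrative conveniences, not anything k2pdfopt requires.

```shell
#!/bin/sh
# Options from the quoted post, with -as added to request auto-straightening.
# K2_ARGS is just an illustrative variable name.
K2_ARGS="-col 1 -cbox 0,1 -de 1.5 -as -x -ui- 43.pdf"

if command -v k2pdfopt >/dev/null 2>&1 && [ -f 43.pdf ]; then
    # Left unquoted on purpose so the shell splits the string into separate arguments.
    k2pdfopt $K2_ARGS
else
    # k2pdfopt or the input file is not available here: show the command instead.
    echo "would run: k2pdfopt $K2_ARGS"
fi
```

Watch the screen output while it runs. If no straightening confirmation is echoed for the page, the skew was probably under the threshold and the "warehouses" baseline will look the same as before.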