MobileRead Forums - View Single Post - Have Good Source File But In Conversion to PDF Nearly Every Other Word is Split

kovidgoyal · 07-23-2020, 05:46 AM

Nothing particularly surprising. In a PDF, individual font glyphs are often positioned one by one, not as complete words or sentences. So when extracting text from a PDF, for example when copying, programs have to guess where the word boundaries are from the glyph positions, and they sometimes guess wrong.
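As an illustration, here is a minimal sketch of the kind of guess involved. It assumes each glyph's horizontal position and advance width are already known and that all glyphs sit on one line. It inserts a space wherever the gap between two glyphs is wider than a fixed fraction of the font size. The `Glyph` class, `join_glyphs` function and the 0.25 threshold are hypothetical, chosen only for illustration. This is not how calibre or any particular PDF library actually does it.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str     # the character this glyph represents
    x: float      # left edge of the glyph on the page, in points
    width: float  # advance width of the glyph, in points


def join_glyphs(glyphs, font_size, gap_ratio=0.25):
    """Rebuild one line of text from individually positioned glyphs.

    Hypothetical heuristic: a space is inserted whenever the gap between
    the end of one glyph and the start of the next exceeds
    gap_ratio * font_size.
    """
    if not glyphs:
        return ""
    ordered = sorted(glyphs, key=lambda g: g.x)  # left-to-right order
    out = [ordered[0].char]
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur.x - (prev.x + prev.width)
        if gap > gap_ratio * font_size:
            out.append(" ")  # gap looks wide enough to be a word break
        out.append(cur.char)
    return "".join(out)


# Normal spacing: the only wide gap is the real space between words.
normal = [Glyph("t", 0, 5), Glyph("o", 5, 5), Glyph("b", 13, 5), Glyph("e", 18, 5)]
print(join_glyphs(normal, font_size=10))

# Loose letter spacing inside one word: the same rule guesses wrong.
loose = [Glyph("o", 0, 5), Glyph("t", 5, 5), Glyph("h", 13, 5),
         Glyph("e", 18, 5), Glyph("r", 23, 5)]
print(join_glyphs(loose, font_size=10))
```

The second call shows the failure mode from this thread. A slightly wider gap inside "other" is indistinguishable from a word space under this rule, so the word comes out split.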
