|01-08-2011, 10:03 AM||#1|
Join Date: Jul 2010
Device: Sony PRS-600
convert PDF input issue
Hi there. First off, sorry if this has already been discussed; the searches I did returned too many results for me to find it.
I couldn't find the solution in the calibre manual either.
I have a PDF that is justified text, and in some lines the spacing between the words of a full line seems to be so big that the PDF input module interprets them as "end of line". This of course results in each word becoming its own line.
This _ would _ be _ the _ justified _ line.
I've tried several different input and output settings as well as different output formats, to no avail. The same problem shows up in epub, rtf and txt output, which is why I suspect the PDF input is the problem. I also changed the unwrap factor, both in PDF input and in structure detection; the results show the settings taking effect, but they don't help with this issue.
Can anyone enlighten me?
Last edited by Cid; 01-08-2011 at 10:28 AM.
|01-08-2011, 10:06 AM||#2|
creator of calibre
Join Date: Oct 2006
Location: Mumbai, India
PDF input tries to detect line endings based on the spacing between characters. There's no way around that because of the nature of PDF. It will fail for some PDF files and succeed for others. I'd suggest you try copying and pasting the text from your PDF, or use Acrobat Professional to convert it to HTML. Either of those approaches may use different parameters when interpreting the spaces and so might succeed.
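To illustrate why spacing-based detection can trip over justified text, here is a toy sketch in Python. It is not calibre's actual code: the function name `split_on_gaps`, the `max_gap` threshold, and the word coordinates are all made up for the example. It only shows the general idea that a fixed gap threshold treats normally spaced words as one line but splits a stretched, justified line into one word per line.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (text, x_start, x_end) for each word on one
# visual line. Units are arbitrary; real PDF extraction is far more involved.
Word = Tuple[str, float, float]


def split_on_gaps(words: List[Word], max_gap: float) -> List[str]:
    """Start a new output line whenever the horizontal gap between two
    consecutive words is larger than max_gap (an assumed threshold)."""
    lines: List[List[str]] = []
    prev_end: Optional[float] = None
    for text, x_start, x_end in words:
        if prev_end is None or x_start - prev_end > max_gap:
            lines.append([text])      # gap too wide: treat as a line ending
        else:
            lines[-1].append(text)    # normal gap: same line
        prev_end = x_end
    return [" ".join(line) for line in lines]


# Normally spaced words: gaps of 3 units.
normal = [("This", 0, 20), ("would", 23, 48), ("be", 51, 61)]
# The same words in a justified line: gaps stretched to 12 units.
stretched = [("This", 0, 20), ("would", 32, 57), ("be", 69, 79)]

print(split_on_gaps(normal, max_gap=5.0))     # ['This would be']
print(split_on_gaps(stretched, max_gap=5.0))  # ['This', 'would', 'be']
```

Raising the threshold would keep the stretched line together, but it could also merge text that really is separate, which is presumably why no single setting works for every PDF.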