#1 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
pdf convert artifacts
What causes this type of distortion?
FI F GU G RE 60 Cha h se s Chart r Out u line
It is meant to say "Figure 60 Chase Chart Outline". I think it happens when text sits alongside images?
#2 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
Use the source, Luke. Either use the debug option for converting or use the wizard for the header/footer regex.
#3 |
Grand Sorcerer
Posts: 12,444
Karma: 8012886
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Calibre has tremendous problems with multi-column input. Margin notes (text beside images) are a form of multi-column.
The problem arises because bits of text are intermixed in the document, using absolute page positioning to make it appear correct to the eye. Calibre doesn't use positioning. It processes the text in the order and form in which it appears in the document.
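To illustrate the point about document order versus page position, here is a minimal Python sketch. It is not Calibre's code; the `Fragment` type, the coordinates, and the way the caption is split are all made up for illustration. It only shows how the same fragments read differently depending on whether you follow the order they were written in or their position on the page.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    x: float   # left edge, in points (hypothetical values)
    y: float   # distance from the top of the page, in points
    text: str


# Hypothetical content stream: the pieces of one caption line,
# emitted in an order that does not match what the eye sees.
stream = [
    Fragment(120, 50, "Chase Chart"),
    Fragment(72, 50, "Figure 60"),
    Fragment(200, 50, "Outline"),
]


def stream_order(frags):
    """Join fragments in the order they appear in the document."""
    return " ".join(f.text for f in frags)


def visual_order(frags, line_tolerance=2.0):
    """Join fragments top-to-bottom, then left-to-right.

    Fragments whose y values fall into the same tolerance bucket
    are treated as one line.
    """
    ordered = sorted(frags, key=lambda f: (round(f.y / line_tolerance), f.x))
    return " ".join(f.text for f in ordered)


print(stream_order(stream))  # Chase Chart Figure 60 Outline
print(visual_order(stream))  # Figure 60 Chase Chart Outline
```

A converter that ignores positioning effectively does `stream_order`. Sorting by position works for a single line like this one, but real multi-column pages need more than a simple sort.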
#4 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
That example is not in a header or footer, and it is not multi-column either. It is a single line of text above a diagram, not alongside an image.
It reads "Figure 60 Chase Chart Outline". I will find the source bit within the PDF, then grab and paste an image. I have seen this in other conversions: some letters get picked up twice and the sentence gets split into multiple lines. You can see all the letters of "Chase Chart Outline" in the original post, along with the confusing duplicate letters. I appreciate that the PDF is hard to convert, but surely the duplicate letters should not occur?
Last edited by cybmole; 01-16-2011 at 07:25 AM.
#5 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
What I meant was to look at the XHTML source Calibre produces from the PDF. I'm quite sure the text is presented in exactly the way Calibre converts it. How to work around that is a whole other deal, though.
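For anyone wanting to try this, a rough sketch of driving the conversion with the debug option from Python. `ebook-convert` and its `--debug-pipeline` option are part of Calibre's command line tools; the file and folder names below are placeholders, and the helper function names are my own.

```python
import shutil
import subprocess


def build_debug_command(pdf_path, out_path, debug_dir):
    """Build an ebook-convert call that also dumps intermediate stages.

    --debug-pipeline tells Calibre to save the output of the
    conversion stages into debug_dir for inspection.
    """
    return ["ebook-convert", pdf_path, out_path, "--debug-pipeline", debug_dir]


def run_debug_conversion(pdf_path, out_path, debug_dir):
    """Run the conversion; requires Calibre's CLI tools on PATH."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    subprocess.run(build_debug_command(pdf_path, out_path, debug_dir), check=True)


# Example (placeholder names):
# run_debug_conversion("book.pdf", "book.epub", "debug_out")
```

After it finishes, open the HTML files under the debug folder in a text editor and search for the garbled caption to see how the text arrived from the PDF.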
#6 |
Grand Sorcerer
Posts: 12,444
Karma: 8012886
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Although I can't know without seeing the PDF (actually, the PostScript in the PDF), I am virtually certain that the text is being placed on the image using absolute positioning. This is the same method that multi-column uses to place text on the page so it is visually correct.
Back in the days when I was doing raw PostScript, I saw documents where the order of the text in the document had zero relationship with the visual order. For example, some PostScript generators do bold and shadow by laying the same character down twice, a point or two apart. If you look at the text, you see two characters. Some others do headings after the text, repositioning the heading so that it prints in the right place. I have even seen some where every other line was rendered backwards to aid with justification and avoiding "rivers of white". Capturing text from such a document would be a challenge.
The thing to remember with PDF: what you see on the page may have nothing to do with the flow of text in the source. The more complex the formatting, the more likely this is to be true.
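The double-struck bold/shadow case can be sketched in a few lines of Python. This is only an illustration under assumed data: the `Glyph` type, the 10-point spacing, and the 1-point shadow offset are invented, and nothing here is claimed to be what Calibre or any PDF library does. The idea is that a second copy of the same character within a point or two of one already seen is treated as a shadow and dropped.

```python
from typing import NamedTuple


class Glyph(NamedTuple):
    x: float    # position in points (hypothetical values)
    y: float
    char: str


def drop_shadow_duplicates(glyphs, max_offset=2.0):
    """Keep the first copy of each glyph; drop same-character copies
    drawn within max_offset points of a glyph already kept."""
    kept = []
    for g in glyphs:
        is_duplicate = any(
            k.char == g.char
            and abs(k.x - g.x) <= max_offset
            and abs(k.y - g.y) <= max_offset
            for k in kept
        )
        if not is_duplicate:
            kept.append(g)
    return kept


# "Chase" drawn once, then drawn again shifted by one point (fake bold).
first_pass = [Glyph(72 + 10 * i, 100, c) for i, c in enumerate("Chase")]
second_pass = [Glyph(g.x + 1, g.y + 1, g.char) for g in first_pass]
glyphs = first_pass + second_pass

raw = "".join(g.char for g in glyphs)
clean = "".join(
    g.char for g in sorted(drop_shadow_duplicates(glyphs), key=lambda g: g.x)
)

print(raw)    # ChaseChase
print(clean)  # Chase
```

A plain text extractor sees something like `raw`. Whether the duplicates come out adjacent, interleaved, or on separate lines depends on how the generator ordered the two passes.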
#7 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Makes sense. Also, it's a huge file and Sigil chokes when asked to open the EPUB, so I guess I'll just leave it as a PDF for now.