#1
Zealot
Posts: 119
Karma: 16268
Join Date: Apr 2020
Device: none
Have Good Source File But In Conversion to PDF Nearly Every Other Word is Split
Hi,
I am converting a file to PDF. I have used the same source for these files many times and never had this anomaly before. Once the file arrives in PDF format, it LOOKS great! But when I copy and paste text from it, many of the words come out split. For example, if you copy and paste the words:

with security was demoralizing

you get:

with se curity was demor aliz ing

The splits do not appear in the PDF itself; all the words look perfect there. I've tried running the conversion with Smarten Punctuation on and with Unsmarten Punctuation on. It is a pure text-to-text conversion, not from a scanned image.

What do you think is happening?

Sincerely,
Blaine

Last edited by Blaineoreski; 07-23-2020 at 05:01 AM.
#2
Member Retired
Posts: 805
Karma: 2091358
Join Date: May 2019
Device: Kindle Oasis 1st Gen, PB Era
What format is the source?
#3
creator of calibre
Posts: 45,253
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Nothing particularly surprising. In PDF, individual font glyphs are often positioned one by one, not as complete words or sentences. So when extracting text from a PDF, for example when copying, programs have to guess where the word boundaries are based on the glyph positions, and they sometimes guess wrong.
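To make the explanation above concrete, here is a toy Python sketch of the kind of guess a text extractor makes. It is not calibre's or any PDF viewer's actual code: the `Glyph` record, the `join_glyphs` and `layout` functions and the `gap_ratio` threshold are all hypothetical, and real extractors weigh more signals (font size, character spacing, explicit space glyphs). It simply inserts a space wherever the horizontal gap between two glyphs exceeds a fraction of the previous glyph's width, so a small justification or kerning gap inside a word can be misread as a word break.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge of the glyph on the page (hypothetical units)
    width: float  # advance width of the glyph


def join_glyphs(glyphs, gap_ratio=0.15):
    """Rebuild a line of text from individually positioned glyphs.

    A space is inserted whenever the gap between the end of one glyph and
    the start of the next exceeds gap_ratio * (previous glyph's width).
    The default threshold is an illustrative assumption.
    """
    if not glyphs:
        return ""
    out = [glyphs[0].char]
    for prev, cur in zip(glyphs, glyphs[1:]):
        gap = cur.x - (prev.x + prev.width)
        if gap > gap_ratio * prev.width:
            out.append(" ")  # the extractor guesses a word boundary here
        out.append(cur.char)
    return "".join(out)


def layout(chunks, width=5.0):
    """Place glyphs edge to edge; each chunk is (text, extra_gap_before)."""
    glyphs, x = [], 0.0
    for text, gap in chunks:
        x += gap
        for ch in text:
            glyphs.append(Glyph(ch, x, width))
            x += width
    return glyphs


# A real word gap (3.0) before "se", and a small justification gap (1.0)
# inside "security" that the page renders invisibly.
line = layout([("with", 0.0), ("se", 3.0), ("curity", 1.0)])
print(join_glyphs(line))                 # with se curity
print(join_glyphs(line, gap_ratio=0.3))  # with security
```

With the default threshold the small gap inside "security" is read as a word break; loosening the threshold fixes this line, but a looser threshold can also merge genuine words elsewhere, which is why any single positional heuristic sometimes guesses wrong.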
#4
Zealot
Posts: 119
Karma: 16268
Join Date: Apr 2020
Device: none
Hi!
Feel kinda worried, because this is an IMPORTANT file for the project I'm doing and I'll need to rely on it for lots of quotes. I can use ANY destination filetype. Is there any other filetype to convert to that may avoid the problem? Thanks!!!
#5
creator of calibre
Posts: 45,253
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Not sure what you are asking. PDF is the ONLY filetype with this issue.
#6
Zealot
Posts: 119
Karma: 16268
Join Date: Apr 2020
Device: none
Hi Kovid,
Ah! Yep, just tried RTF and it's all good. More proof of the POWER of Calibre! Can't thank you enough for developing this application. When I think about all the help you're giving people, I'm impressed. Thank you, Kovid!

Sincerely,
Blaine
#7
creator of calibre
Posts: 45,253
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You're welcome
#8
Zealot
Posts: 119
Karma: 16268
Join Date: Apr 2020
Device: none
Hi Kovid,
Is there any FONT for the text source that would largely avoid the glyph problem when converting to PDF? Somehow, I'm thinking a sans-serif font would make things easier? Or... a monospaced font? Here's my thinking: converting to Word, which worked PERFECTLY, lost all the chapter links. So my main goal is: text source > any filetype that will preserve the navigation structure. What do you think?

Sincerely,
Blaine
#9
creator of calibre
Posts: 45,253
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
It doesn't have anything to do with fonts. And conversion to DOCX will preserve links; if it does not, then open a bug report and attach a sample showing the issue.
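Before filing such a report, one quick way to see whether a DOCX still carries its chapter links is to look inside the file, since a DOCX is a ZIP archive. The Python sketch below assumes the links are stored the way WordprocessingML normally represents in-document links: `w:hyperlink` elements with a `w:anchor` attribute that names a `w:bookmarkStart` in `word/document.xml`. The function names are hypothetical, and the check does not inspect TOC field codes or heading styles (which Word's navigation pane relies on), so treat it as a rough diagnostic only.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace (the "w:" prefix in word/document.xml)
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def internal_links(docx):
    """Return (set of bookmark names, list of hyperlink anchors).

    `docx` may be a filesystem path or a binary file-like object.
    Only the main document part, word/document.xml, is examined.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    bookmarks = {el.get(W + "name") for el in root.iter(W + "bookmarkStart")}
    anchors = [el.get(W + "anchor") for el in root.iter(W + "hyperlink")
               if el.get(W + "anchor") is not None]
    return bookmarks, anchors


def missing_targets(docx):
    """Sorted anchors that match no bookmark, i.e. links that lead nowhere."""
    bookmarks, anchors = internal_links(docx)
    return sorted({a for a in anchors if a not in bookmarks})


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        bookmarks, anchors = internal_links(path)
        print(f"{path}: {len(anchors)} internal links, {len(bookmarks)} "
              f"bookmarks, missing targets: {missing_targets(path)}")
```

Run as `python check_links.py book.docx` (a hypothetical file name). A chaptered book that reports zero internal links, or a non-empty list of missing targets, would be the kind of sample worth attaching to a bug report.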
#10
Zealot
Posts: 119
Karma: 16268
Join Date: Apr 2020
Device: none
Hi Kovid,
Ah! You're right. The conversion to Word does keep all the chapters. And it solves the word-spacing problem! The challenge is... the tools I need for editing are all built around my PDF app. So I need a way to get the file BACK to PDF with the chapter/section navigation. It seems Word's own Word > PDF export didn't respect the chapter structure. Trying with other tools. So the idea is: pure text source > Word > PDF with chapter divisions/navigation. Any suggestions?

Sincerely,
Blaine
#11
creator of calibre
Posts: 45,253
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Sorry, no. Really am not a big PDF guy.
#12
Zealot
Posts: 119
Karma: 16268
Join Date: Apr 2020
Device: none
Yep. Yep. Thanks just the same! : )))))
#13
Junior Member
Posts: 4
Karma: 10
Join Date: Jul 2020
Device: Kindle
I noticed the same issue when converting from EPUB to PDF. Try reverting from Calibre 4 to Calibre 3. In version 3, the text rendering in the PDF output is different, and it is possible to select and copy the text elsewhere without the "chopping" effect.