#1 |
Addict
Posts: 217
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
|
Hello,
With some PDFs, I get this kind of garbage when copy-pasting:
Code:
��������������������������������������������������������
����������������������������������������������������������������������������������
���������������������������������������������������������������������������������
�����������������������������������������������������������������������
����������������������������������������������������
Code:
Fonts:
Bauhaus93 (Type1; embedded)
Calibri (Type1; embedded)
Calibri,Italic (TrueType (CID); Identity-H)
Calibri-Bold (Type1; embedded)
Calibri-Bold-KSCms-UHC-H (Type1 (CID); Identity-H; embedded)
Calibri-BoldItalic-KSCms-UHC-H (Type1 (CID); Identity-H; embedded)
Calibri-Italic (Type1; embedded)
Calibri-Italic-KSCms-UHC-H (Type1 (CID); Identity-H; embedded)
Calibri-KSCms-UHC-H (Type1 (CID); Identity-H; embedded)
NirmalaUI-Bold (Type1; embedded)
Thank you.
#2 |
Still reading
Posts: 14,554
Karma: 108666825
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
|
OCR the image?
#3 |
Addict
Posts: 217
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
|
I thought about it, but first I'd like to 1) understand what the problem is and 2) check whether the PDF can be doctored to fix the problem at the root (change fonts?).
#4 |
Fanatic
Posts: 531
Karma: 2268308
Join Date: Nov 2015
Device: none
|
Most PDF tools cannot extract text from identity-encoded fonts. I found that the PDFMiner Python package can.
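For anyone wanting to try this, here is a minimal sketch. It assumes the maintained `pdfminer.six` fork is installed (`pip install pdfminer.six`) and uses its high-level `extract_text` helper. `garbage_ratio` is an illustrative helper, not part of PDFMiner: it estimates how much of the output is unusable, counting replacement characters, control or private-use code points, and the `(cid:NNN)` placeholders that pdfminer.six emits for glyphs it cannot map to Unicode.

```python
import re
import unicodedata

# pdfminer.six writes "(cid:NNN)" for glyphs it cannot map to Unicode.
CID_TOKEN = re.compile(r"\(cid:\d+\)")


def garbage_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that look unusable.

    Each "(cid:NNN)" placeholder counts as one unusable character.
    """
    bad = len(CID_TOKEN.findall(text))
    rest = CID_TOKEN.sub("", text)
    total = bad
    for ch in rest:
        if ch.isspace():
            continue
        total += 1
        # Cc = control, Co = private use, Cn = unassigned
        if ch == "\ufffd" or unicodedata.category(ch) in ("Cc", "Co", "Cn"):
            bad += 1
    return bad / total if total else 0.0


def extract_with_pdfminer(path: str) -> str:
    """Extract text with pdfminer.six (imported lazily; assumed installed)."""
    from pdfminer.high_level import extract_text
    return extract_text(path)
```

Typical use would be `text = extract_with_pdfminer("sample.pdf")` followed by `print(garbage_ratio(text))`, where `sample.pdf` is a hypothetical file name. A ratio near zero suggests the extraction worked; a high ratio suggests the fonts carry no usable Unicode mapping and OCR may be the only route.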
#5 |
Still reading
Posts: 14,554
Karma: 108666825
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
|
Also try exporting the text with Ghostscript (or with Ghostview, a GUI front end for it).
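A rough sketch of the Ghostscript route, assuming the `gs` executable (or `gswin64c` on Windows) is on the PATH. It uses Ghostscript's `txtwrite` output device; the helper names `gs_text_command` and `export_text` are made up for illustration. Since `txtwrite` depends on the same font information as any other extractor, it may well produce the same garbage on this kind of file.

```python
from __future__ import annotations

import shutil
import subprocess


def gs_text_command(pdf_path: str, txt_path: str, gs_binary: str = "gs") -> list[str]:
    """Build a Ghostscript command that writes the PDF's text to txt_path."""
    return [
        gs_binary,
        "-q",                 # suppress the startup banner
        "-dNOPAUSE",          # do not pause between pages
        "-dBATCH",            # exit once the file is processed
        "-sDEVICE=txtwrite",  # text-extraction output device
        f"-sOutputFile={txt_path}",
        pdf_path,
    ]


def export_text(pdf_path: str, txt_path: str) -> bool:
    """Run Ghostscript if it is found on PATH; return True on success."""
    gs = shutil.which("gs") or shutil.which("gswin64c")
    if gs is None:
        return False
    result = subprocess.run(gs_text_command(pdf_path, txt_path, gs))
    return result.returncode == 0
```

The equivalent shell command is the list returned by `gs_text_command` joined with spaces.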
#6 |
Addict
Posts: 217
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
|
"Identity-encoded fonts": what are those?
Before I investigate, would you have the commands handy?
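On the first question, in brief and simplifying: with an `Identity-H` encoding, the text strings in the page are two-byte glyph numbers (CIDs) rather than character codes. A tool can only turn them back into text if the font also carries a `ToUnicode` table mapping glyph numbers to characters; when that table is missing or wrong, copy-paste yields garbage even though the page displays correctly. The toy sketch below illustrates the idea with made-up glyph numbers and a hypothetical `decode_identity_h` helper; no real PDF library is involved.

```python
from typing import Dict, Optional


def decode_identity_h(raw: bytes, to_unicode: Optional[Dict[int, str]]) -> str:
    """Decode a text string drawn with an Identity-H font (toy model).

    Each pair of bytes is one big-endian glyph number (CID). Without a
    ToUnicode-style table there is nothing to translate it with, so every
    glyph becomes U+FFFD, the replacement character.
    """
    cids = [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]
    if to_unicode is None:
        return "\ufffd" * len(cids)
    return "".join(to_unicode.get(cid, "\ufffd") for cid in cids)


# Made-up glyph numbers for illustration; real ones depend on the font subset.
raw = bytes([0x00, 0x2B, 0x00, 0x4C])
table = {0x002B: "H", 0x004C: "i"}

print(decode_identity_h(raw, table))  # Hi
print(decode_identity_h(raw, None))   # two replacement characters
```

This is also why swapping fonts does not help by itself: the missing piece is the mapping from glyph numbers to characters, not the glyph shapes.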
#7 |
Fuzzball, the purple cat
Posts: 1,305
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
Can you attach a page or two of the PDF that gives you the "garbage" cut and paste result?
#8 |
Addict
Posts: 217
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
|
Here's one:
#9 |
Wizard
Posts: 1,640
Karma: 9500498
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
|
Out of curiosity, I ran it through gImageReader, and it found and exported the text correctly...