#1
Addict
Posts: 213
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
Hello,
With some PDFs, I get this kind of garbage when copy-pasting:

Code:
�������������������������������������������������������� ���������������������������������������������������������������������������������� ��������������������������������������������������������������������������������� ����������������������������������������������������������������������� ����������������������������������������������������

Code:
Fonts:
Bauhaus93 (Type1; embedded)
Calibri (Type1; embedded)
Calibri,Italic (TrueType (CID); Identity-H)
Calibri-Bold (Type1; embedded)
Calibri-Bold-KSCms-UHC-H (Type1 (CID); Identity-H; embedded)
Calibri-BoldItalic-KSCms-UHC-H (Type1 (CID); Identity-H; embedded)
Calibri-Italic (Type1; embedded)
Calibri-Italic-KSCms-UHC-H (Type1 (CID); Identity-H; embedded)
Calibri-KSCms-UHC-H (Type1 (CID); Identity-H; embedded)
NirmalaUI-Bold (Type1; embedded)

Thank you.
#2
Still reading
Posts: 14,534
Karma: 108666825
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
OCR the image?
#3
Addict
Posts: 213
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
I thought about it, but first I'd like to 1) understand what the problem is and 2) check whether the PDF can be doctored to solve the problem at the root (change fonts?).
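On point 1, the font listing from post #1 already gives a hint. Entries marked Identity-H are CID fonts: the text in the page is stored as font-internal character IDs rather than Unicode, so copy-paste can only recover real characters if the font also carries a usable ToUnicode map (the listing does not show whether it does). Below is a minimal sketch that flags those entries, assuming the listing is split one font per line in the `Name (Type; flags)` shape from post #1. `FontEntry`, `parse_font_line` and `is_identity_encoded` are hypothetical helper names, not part of any PDF tool.

```python
import re
from typing import NamedTuple, Optional


class FontEntry(NamedTuple):
    name: str
    font_type: str
    encoding: Optional[str]
    embedded: bool


# Assumed line shape: "<name> (<type>; <flag>; <flag>...)"
LINE_RE = re.compile(r"^(?P<name>\S+)\s+\((?P<props>.+)\)$")


def parse_font_line(line: str) -> Optional[FontEntry]:
    """Parse one line of the font listing; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    props = [p.strip() for p in match.group("props").split(";")]
    font_type, flags = props[0], props[1:]
    embedded = "embedded" in flags
    # Anything that is not the "embedded" flag is treated as the encoding name.
    encoding = next((f for f in flags if f != "embedded"), None)
    return FontEntry(match.group("name"), font_type, encoding, embedded)


def is_identity_encoded(entry: FontEntry) -> bool:
    """True for fonts whose codes are raw CIDs (Identity-H / Identity-V)."""
    return entry.encoding in ("Identity-H", "Identity-V")


# A few lines taken from the listing in post #1.
LISTING = """\
Bauhaus93 (Type1; embedded)
Calibri,Italic (TrueType (CID); Identity-H)
Calibri-Bold-KSCms-UHC-H (Type1 (CID); Identity-H; embedded)
NirmalaUI-Bold (Type1; embedded)
"""

if __name__ == "__main__":
    for raw in LISTING.splitlines():
        entry = parse_font_line(raw)
        if entry is not None and is_identity_encoded(entry):
            note = "embedded" if entry.embedded else "NOT embedded"
            print(f"{entry.name}: {entry.encoding}, {note}")
```

This only narrows down which fonts are likely suspects. It does not prove that a ToUnicode map is missing or broken.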
#4
Fanatic
Posts: 531
Karma: 2268308
Join Date: Nov 2015
Device: none
Most PDF tools cannot work with identity-encoded fonts. I found the PDFMiner Python package can.
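A minimal sketch of that route, assuming the maintained fork `pdfminer.six` is installed (`pip install pdfminer.six`) and using its `extract_text` helper from `pdfminer.high_level`. `count_unmapped` and `extract_pdf_text` are hypothetical helper names. As far as I know, pdfminer writes `(cid:N)` placeholders for glyphs it cannot map to Unicode, so the counter looks for those as well as for U+FFFD replacement characters.

```python
import re
import sys

CID_TOKEN = re.compile(r"\(cid:\d+\)")


def count_unmapped(text: str) -> int:
    """Count glyphs that did not map to Unicode.

    Counts "(cid:N)" placeholders plus U+FFFD replacement characters.
    """
    return len(CID_TOKEN.findall(text)) + text.count("\ufffd")


def extract_pdf_text(path: str) -> str:
    """Extract all text from a PDF with pdfminer.six."""
    # Imported lazily so count_unmapped() can be used without pdfminer.six.
    from pdfminer.high_level import extract_text

    return extract_text(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    text = extract_pdf_text(sys.argv[1])
    print(f"unmapped glyphs: {count_unmapped(text)}")
    print(text[:500])
```

If the unmapped count stays high, the fonts probably lack a usable Unicode mapping, and no text extractor is likely to do better. In that case OCR is the realistic fallback.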
#5
Still reading
Posts: 14,534
Karma: 108666825
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
Also try exporting with Ghostscript (or with Ghostview, a GUI front end for it).
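A rough sketch of the Ghostscript route, assuming `gs` is on the PATH (on Windows the console binary is usually `gswin64c`). The `txtwrite` device dumps the text Ghostscript can recover, and `pdfwrite` rewrites the PDF. Whether either one cures the copy-paste garbage depends on the fonts having usable Unicode mappings, so treat it as an experiment. `gs_command` and `run_gs` are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List


def gs_command(src: str, dst: str, device: str = "txtwrite",
               gs_binary: str = "gs") -> List[str]:
    """Build a Ghostscript command line that converts src to dst."""
    return [
        gs_binary,
        "-q",            # suppress the startup banner
        "-dNOPAUSE",     # do not wait for a keypress between pages
        "-dBATCH",       # exit when the input has been processed
        "-dSAFER",       # restrict file access from the PostScript side
        f"-sDEVICE={device}",
        f"-sOutputFile={dst}",
        src,
    ]


def run_gs(src: str, dst: str, device: str = "txtwrite") -> None:
    """Run Ghostscript; raise if the binary is missing or the run fails."""
    binary = shutil.which("gs")
    if binary is None:
        raise FileNotFoundError("Ghostscript ('gs') not found on PATH")
    subprocess.run(gs_command(src, dst, device, binary), check=True)


# Example calls (file names are placeholders):
#   run_gs("input.pdf", "output.txt")               # text dump
#   run_gs("input.pdf", "rewritten.pdf", "pdfwrite")  # rewritten PDF
```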