09-13-2025, 07:06 AM #1
Shohreh
Addict
Posts: 219
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
Question Fixing garbage when pasting from PDF?

Hello,

With some PDFs, I get this kind of garbage when copy-pasting:

Code:
��������������������������������������������������������
����������������������������������������������������������������������������������
���������������������������������������������������������������������������������
�����������������������������������������������������������������������
����������������������������������������������������
FWIW, here are the fonts used by this PDF:
Code:
Fonts: Bauhaus93 (Type1; embedded)
Calibri (Type1; embedded)
Calibri,Italic (TrueType (CID); Identity-H)
Calibri-Bold (Type1; embedded)
Calibri-Bold-KSCms-UHC-H (Type1 (CID); Identity-H; embedded)
Calibri-BoldItalic-KSCms-UHC-H (Type1 (CID); Identity-H; embedded)
Calibri-Italic (Type1; embedded)
Calibri-Italic-KSCms-UHC-H (Type1 (CID); Identity-H; embedded)
Calibri-KSCms-UHC-H (Type1 (CID); Identity-H; embedded)
NirmalaUI-Bold (Type1; embedded)
Do you know of a fix?

Thank you.
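To see at a glance which of the fonts listed above use the Identity-H encoding (the "identity-encoded" fonts mentioned later in the thread), a small Python sketch can filter the listing. It assumes one font per line in the `Name (attribute; attribute; ...)` form shown above; `identity_fonts` is a hypothetical helper, not part of any PDF tool.

```python
import re

# The font listing from the post above, one font per line ("Fonts:" label dropped).
LISTING = """\
Bauhaus93 (Type1; embedded)
Calibri (Type1; embedded)
Calibri,Italic (TrueType (CID); Identity-H)
Calibri-Bold (Type1; embedded)
Calibri-Bold-KSCms-UHC-H (Type1 (CID); Identity-H; embedded)
Calibri-BoldItalic-KSCms-UHC-H (Type1 (CID); Identity-H; embedded)
Calibri-Italic (Type1; embedded)
Calibri-Italic-KSCms-UHC-H (Type1 (CID); Identity-H; embedded)
Calibri-KSCms-UHC-H (Type1 (CID); Identity-H; embedded)
NirmalaUI-Bold (Type1; embedded)
"""

# A line looks like "Name (attr; attr; ...)"; the font name itself has no spaces.
LINE_RE = re.compile(r"^(?P<name>\S+) \((?P<attrs>.*)\)$")


def identity_fonts(listing: str) -> list[str]:
    """Return the names of fonts whose attribute list includes Identity-H."""
    names = []
    for line in listing.splitlines():
        match = LINE_RE.match(line.strip())
        if match and "Identity-H" in match.group("attrs").split("; "):
            names.append(match.group("name"))
    return names


for name in identity_fonts(LISTING):
    print(name)
```

This prints the five Calibri CID variants. `Calibri,Italic` is also the only one of them not marked as embedded.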

09-13-2025, 02:28 PM #2
Quoth
Still reading
Posts: 14,671
Karma: 109269703
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
OCR the image?
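If OCR is the route taken, one way to script it is to rasterize each page and run Tesseract on the images. The sketch below assumes the `pdftoppm` (Poppler) and `tesseract` command-line tools are installed and on the PATH; the helper names, the 300 dpi resolution and the `eng` language are illustrative choices, not requirements.

```python
import glob
import subprocess
from pathlib import Path


def rasterize_cmd(pdf: str, prefix: str, dpi: int = 300) -> list[str]:
    # pdftoppm writes one PNG per page: <prefix>-1.png, <prefix>-2.png, ...
    # (page numbers are zero-padded when the document has 10 or more pages)
    return ["pdftoppm", "-r", str(dpi), "-png", pdf, prefix]


def tesseract_cmd(image: str, lang: str = "eng") -> list[str]:
    # Using "stdout" as the output base makes tesseract print the text
    # instead of writing <base>.txt.
    return ["tesseract", image, "stdout", "-l", lang]


def ocr_pdf(pdf: str, workdir: str = ".", dpi: int = 300, lang: str = "eng") -> str:
    """Rasterize every page of `pdf` into `workdir`, then OCR the images in page order."""
    prefix = str(Path(workdir) / "page")
    subprocess.run(rasterize_cmd(pdf, prefix, dpi), check=True)
    pages = []
    # Leftover page-*.png files from an earlier run in workdir would be picked up too.
    for image in sorted(glob.glob(prefix + "-*.png")):
        result = subprocess.run(tesseract_cmd(image, lang),
                                check=True, capture_output=True, text=True)
        pages.append(result.stdout)
    return "\n".join(pages)
```

Usage would be along the lines of `text = ocr_pdf("garbage.pdf")`. Since OCR ignores the PDF's text layer entirely, a broken font mapping does not matter here; the trade-off is the usual OCR error rate.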

09-13-2025, 05:40 PM #3
Shohreh
Addict
Posts: 219
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
I thought about it, but first I'd like to 1) understand what the problem is and 2) check whether the PDF can be doctored to fix the problem at the root (change fonts?).

09-14-2025, 07:10 AM #4
Sarmat89
Fanatic
Posts: 531
Karma: 2268308
Join Date: Nov 2015
Device: none
Most PDF tools cannot work with identity-encoded fonts. I found the PDFMiner Python package can.
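For context on what "identity-encoded" means: with the Identity-H encoding, each character code in the PDF's text is a 2-byte big-endian number that selects a glyph (a CID), not a letter. Getting Unicode text back relies on the font's optional ToUnicode CMap; if that map is missing or deliberately scrambled, extractors have nothing correct to map the glyphs to. The Python sketch below illustrates the lookup with a made-up CMap fragment. It only understands `bfchar` entries (real CMaps also use `bfrange`), and the helper names are hypothetical.

```python
import re

# A made-up ToUnicode CMap fragment: codes 0x0024 and 0x0048 map to "H" and "i",
# and code 0x0050 maps to the two-letter ligature "fi".
SAMPLE_CMAP = """
3 beginbfchar
<0024> <0048>
<0048> <0069>
<0050> <00660069>
endbfchar
"""

BFCHAR_BLOCK = re.compile(r"beginbfchar(.*?)endbfchar", re.S)
BFCHAR_ENTRY = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def parse_bfchar(cmap: str) -> dict[int, str]:
    """Map character codes to text using only the bfchar sections of a ToUnicode CMap."""
    mapping: dict[int, str] = {}
    for block in BFCHAR_BLOCK.findall(cmap):
        for src, dst in BFCHAR_ENTRY.findall(block):
            # Destinations are UTF-16BE hex strings and may hold several characters.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode_identity_h(data: bytes, to_unicode: dict[int, str]) -> str:
    """Decode a string shown with an Identity-H font: 2-byte big-endian codes."""
    out = []
    for i in range(0, len(data) - 1, 2):
        code = int.from_bytes(data[i:i + 2], "big")
        # No ToUnicode entry means nothing to map the glyph back to, so emit U+FFFD.
        out.append(to_unicode.get(code, "\ufffd"))
    return "".join(out)


table = parse_bfchar(SAMPLE_CMAP)
print(decode_identity_h(bytes.fromhex("00240048"), table))  # Hi
print(decode_identity_h(bytes.fromhex("00240099"), table))  # H followed by U+FFFD
```

The second call shows the failure mode: the glyph still renders fine on screen, but a code with no usable mapping can only come out as a placeholder when copied.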

09-14-2025, 07:42 AM #5
Quoth
Still reading
Posts: 14,671
Karma: 109269703
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
Also try exporting the text with Ghostscript (or Ghostview, its GUI front-end).
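A sketch of the Ghostscript route, assuming a Ghostscript build that includes the `txtwrite` output device. The executable is `gs` on Linux/macOS and usually `gswin64c` on 64-bit Windows; the wrapper function names are hypothetical. Since `txtwrite` also has to rely on the font's own character mapping, it may well produce the same garbage for this PDF.

```python
import subprocess


def gs_txtwrite_cmd(pdf: str, out_txt: str, gs: str = "gs") -> list[str]:
    # -o sets the output file and implies -dBATCH -dNOPAUSE;
    # the txtwrite device writes extracted text instead of rendering pages.
    return [gs, "-sDEVICE=txtwrite", "-o", out_txt, pdf]


def gs_extract_text(pdf: str, out_txt: str, gs: str = "gs") -> str:
    """Run Ghostscript's txtwrite device on `pdf` and return the text it wrote."""
    subprocess.run(gs_txtwrite_cmd(pdf, out_txt, gs), check=True)
    # errors="replace" keeps reading even if some bytes are not valid UTF-8
    with open(out_txt, encoding="utf-8", errors="replace") as f:
        return f.read()
```

On the command line this amounts to `gs -sDEVICE=txtwrite -o garbage.txt garbage.pdf`.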

09-15-2025, 07:13 AM #6
Shohreh
Addict
Posts: 219
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
"identity-encoded fonts": What's that?

Before I investigate, would you have the commands handy?

09-15-2025, 09:50 PM #7
willus
Fuzzball, the purple cat
Posts: 1,306
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Can you attach a page or two of the PDF that gives you the "garbage" cut and paste result?

09-16-2025, 09:12 AM #8
Shohreh
Addict
Posts: 219
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
Here's one:
Attached Files
File Type: pdf garbage.pdf (33.7 KB, 49 views)

09-16-2025, 03:33 PM #9
Karellen
Wizard
Posts: 1,643
Karma: 9500498
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
Out of curiosity, I ran it through gImageReader, and it found and exported the text correctly...
Attached Thumbnails
garbagepdf.jpg (215.2 KB, 66 views)

09-19-2025, 10:06 PM #10
Shohreh
Addict
Posts: 219
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
Why doesn't SumatraPDF display it correctly?

09-20-2025, 12:41 AM #11
Karellen
Wizard
Posts: 1,643
Karma: 9500498
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
Quote:
Originally Posted by Shohreh
Why doesn't SumatraPDF display it correctly?
Sorry, no idea. I am not familiar with that software.

09-21-2025, 03:22 PM #12
willus
Fuzzball, the purple cat
Posts: 1,306
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Thanks for posting an example PDF. Do you have any software that correctly copies and pastes the text from your sample (garbage.pdf)? I tried:

1. Loading the PDF directly into MS Word
2. Extracting the text w/k2pdfopt
3. Copying and pasting from SumatraPDF v3.5.2
4. Copying and pasting from Adobe Reader
5. Copying and pasting from ABBYY FineReader v16

All of them showed the same thing: basically repeated UTF-8 byte sequences 0xEF 0xBF 0xBD, which encode U+FFFD, the Unicode replacement character (�).

I think the PDF itself is likely encoded incorrectly.
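Those three bytes are the UTF-8 form of U+FFFD, the character software substitutes when it cannot map something to real text. A minimal Python sketch for checking extracted or pasted text for it, for example to decide automatically when to fall back to OCR; the function names and the 50% threshold are arbitrary assumptions, not a standard.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, encoded in UTF-8 as EF BF BD


def replacement_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are U+FFFD."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c == REPLACEMENT_CHAR for c in chars) / len(chars)


def looks_like_garbage(text: str, threshold: float = 0.5) -> bool:
    # Hypothetical heuristic: treat the text layer as unusable at or above the threshold.
    return replacement_ratio(text) >= threshold
```

A paste like the one in the first post would score close to 1.0, while ordinary text scores 0.0.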

-Will

PS. gImageReader is a Tesseract OCR front-end. I don't believe it is extracting the text layer from the PDF. I think it's doing OCR on the sample to get the text.

Last edited by willus; 09-21-2025 at 03:31 PM.

09-22-2025, 12:31 PM #13
Jellby
frumious Bandersnatch
Posts: 7,564
Karma: 20150435
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
I think the font is intentionally garbled, possibly to make copy-paste impossible or very inconvenient. With some patience and a hex editor, it may be possible to work out a one-to-one mapping from glyphs to characters.
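If the scrambling is a consistent one-to-one substitution, the table can be built from a stretch whose real text is known (read off the page or taken from OCR) and then applied to the rest. A minimal Python sketch under that assumption: getting the glyph codes out of the PDF (for example as 2-byte values found with a hex editor) is not shown, the helper names are hypothetical, and the codes in the example are made up.

```python
def learn_mapping(codes: list[int], known_text: str) -> dict[int, str]:
    """Pair each glyph code with the character known to appear at that position."""
    if len(codes) != len(known_text):
        raise ValueError("codes and known text must line up one-to-one")
    mapping: dict[int, str] = {}
    for code, char in zip(codes, known_text):
        # setdefault returns any existing entry, so a mismatch means one code
        # would stand for two characters (or the sample is misaligned).
        if mapping.setdefault(code, char) != char:
            raise ValueError(f"code {code:#06x} maps to both {mapping[code]!r} and {char!r}")
    return mapping


def apply_mapping(codes: list[int], mapping: dict[int, str]) -> str:
    # Codes not seen in the sample stay unknown and come out as U+FFFD.
    return "".join(mapping.get(code, "\ufffd") for code in codes)


# Made-up codes for the known word "hello", then decoding another word.
table = learn_mapping([0x4B, 0x48, 0x4F, 0x4F, 0x52], "hello")
print(apply_mapping([0x4B, 0x52, 0x4F, 0x48], table))  # hole
```

The more known text goes into the sample, the fewer codes are left unmapped; the consistency check catches misaligned samples early.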

Here's what pdftotext and fontforge give for the text and font.
Attached Files
File Type: zip garbage.zip (9.9 KB, 9 views)

09-23-2025, 06:58 AM #14
Shohreh
Addict
Posts: 219
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
Thanks. I recently had the same problem with a PDF from a different source; I opened both in SumatraPDF, since it's my default PDF/EPUB reader.
Similar Threads
Thread Thread Starter Forum Replies Last Post
Fixing hyphenation or word breaks from PDF conversion democrite ePub 13 12-10-2023 06:36 PM
Kindle conversion to PDF results in garbage jgt1942 Amazon Kindle 1 12-03-2021 06:23 PM
Problems with fixing PDF's converted to HTML (allignment, font) SpaceCase42 Conversion 4 09-23-2011 12:10 AM
pdf to epub results in 'garbage'? wulfie Calibre 6 09-23-2010 08:01 AM
Blank PDF with Booken - fixing shane Bookeen 6 01-30-2009 02:08 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.