03-22-2020, 05:52 PM | #1 |
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
|
Easy way to check for PDFs with no text or buggy text?
Sometimes PDFs just lack text and need OCR. Sometimes they start with text but lose it to pre-processing bugs. Is there some sort of quality-check tool for PDFs that can find the ones that lack text or have seriously screwed-up text?
|
03-23-2020, 09:15 AM | #2 |
the rook, bossing Never.
Posts: 11,154
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
If you can't select any text in any Linux-based PDF reader, then it's only an image.
Selecting one page and pasting it into a text editor usually shows whether it's rubbish OCR that is only really there to provide search.
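The paste-and-eyeball test above could be roughed out in code along these lines. This is only a sketch: the names `wordlike_ratio` and `classify_text` and both thresholds are made up for illustration, and "share of tokens that look like ordinary words" is just one possible stand-in for "rubbish OCR", not an established measure.

```python
import re

# Hypothetical thresholds; tune them against your own files.
MIN_CHARS = 50        # fewer visible characters than this counts as "no text"
MIN_WORDLIKE = 0.6    # minimum share of tokens that look like ordinary words

# A word-like token: letters only, optionally joined by an apostrophe or hyphen.
WORD_RE = re.compile(r"^[^\W\d_]+(?:['’-][^\W\d_]+)*$")


def wordlike_ratio(text):
    """Return the fraction of whitespace-separated tokens that look like words."""
    # Strip common surrounding punctuation so "dog." still counts as a word.
    tokens = [t.strip(".,;:!?()[]\"'“”‘’") for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if WORD_RE.match(t)) / len(tokens)


def classify_text(text):
    """Label text pasted or extracted from a page as 'no text', 'suspect', or 'ok'."""
    visible = sum(1 for c in text if not c.isspace())
    if visible < MIN_CHARS:
        return "no text"
    if wordlike_ratio(text) < MIN_WORDLIKE:
        return "suspect"
    return "ok"
```

Pages that are mostly tables, formulas, or part numbers will come out as "suspect" with a heuristic this crude, so treat the label as a prompt to go and look rather than a verdict.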
|
03-23-2020, 04:48 PM | #3 |
null operator (he/him)
Posts: 20,567
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
@MarjaE - I'm not aware of any calibre tools that will check PDF content in the way you want.
There may be a third-party utility that can do the check, probably on a single file at a time, meaning you could use it in a script that walks the directory tree. The best place to ask is in the PDF forum ==>> PDF BR
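As one possible shape for such a script, here is a minimal sketch that assumes Poppler's `pdftotext` command-line tool is installed and on the PATH. That tool is my assumption, not something recommended in this thread, and the names `pdftotext_extract` and `find_textless_pdfs` and the 50-character cutoff are hypothetical. It only flags files with next to no extractable text in the first few pages; it does not judge whether the text that is there makes sense.

```python
import os
import subprocess


def pdftotext_extract(path, last_page=5):
    """Return text from the first few pages via Poppler's pdftotext (assumed installed)."""
    # "-l N" stops after page N; "-" as the output name sends the text to stdout.
    result = subprocess.run(
        ["pdftotext", "-l", str(last_page), path, "-"],
        capture_output=True,
        check=False,  # a failed extraction just yields empty text and gets flagged
    )
    return result.stdout.decode("utf-8", errors="replace")


def find_textless_pdfs(root, extract=pdftotext_extract, min_chars=50):
    """Walk root and return the sorted paths of PDFs with almost no extractable text."""
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".pdf"):
                continue
            path = os.path.join(dirpath, name)
            text = extract(path)
            # Count visible characters only; page breaks and newlines do not count.
            visible = sum(1 for c in text if not c.isspace())
            if visible < min_chars:
                flagged.append(path)
    return sorted(flagged)
```

Calling `find_textless_pdfs("/path/to/library")` returns a list of candidates for OCR. The `extract` parameter is there so a different extractor can be swapped in without touching the directory walk.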