03-22-2020, 05:52 PM   #1
MarjaE
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
Easy way to check for PDFs with no text or buggy text?

Sometimes PDFs simply lack a text layer and need OCR. Sometimes they start out with text but lose it to pre-processing bugs. Is there some sort of quality-check tool for PDFs that can find files which lack text or have seriously garbled text?
03-23-2020, 09:15 AM   #2
Quoth
the rook, bossing Never.
Posts: 11,154
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
If you can't select any text in any Linux-based PDF reader, then the PDF is only images.
Selecting one page and pasting it into a text editor usually shows whether the text is rubbish OCR that is really only there to make the document searchable.
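The paste-into-an-editor check can be roughed out as a heuristic over the text extracted from a page. This is a minimal sketch, not an existing tool: the function name classify_page_text and both thresholds are made up for illustration, and it assumes you already have the page text (from copy and paste, or from a text extractor).

```python
import re

# Hypothetical thresholds: tune them against a few PDFs you know are good or bad.
MIN_CHARS = 20             # fewer non-whitespace characters than this counts as "no text"
MIN_WORDLIKE_RATIO = 0.6   # minimum share of tokens that should look like ordinary words

# A "word-like" token: one or more letters (any script), optionally joined by
# apostrophes or hyphens, optionally wrapped in common punctuation.
WORDLIKE = re.compile(r"^[\"'(\[]*[^\W\d_]+(?:['\-][^\W\d_]+)*[\"')\].,;:!?]*$")


def classify_page_text(text: str) -> str:
    """Return 'no text', 'suspect' or 'ok' for text extracted from one PDF page."""
    tokens = text.split()
    if sum(len(token) for token in tokens) < MIN_CHARS:
        return "no text"  # image-only page, or next to nothing extracted
    wordlike = sum(1 for token in tokens if WORDLIKE.match(token))
    if wordlike / len(tokens) < MIN_WORDLIKE_RATIO:
        return "suspect"  # mostly symbols, digits or fragments: likely garbled
    return "ok"


print(classify_page_text(""))                                              # no text
print(classify_page_text("The quick brown fox jumps over the lazy dog."))  # ok
print(classify_page_text("ÿþ#$%& 0x1F@@ ~~~ ^^^ ||| }{}{ 12345 !!!!"))     # suspect
```

This only catches garbage made of symbols and fragments. Text that extracts as plausible-looking but wrong letters (for example when a font's character mapping is broken) will still pass as "ok", so a spot check by eye is still worthwhile.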
03-23-2020, 04:48 PM   #3
BetterRed
null operator (he/him)
Posts: 20,567
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
@MarjaE - I'm not aware of any calibre tools that will check pdf content in the way you want.

There may be a 3rd-party utility that can do the check, probably on one file at a time, meaning you could call it from a script that walks the directory tree.
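A script along those lines might look like the sketch below. It assumes poppler-utils' pdftotext command is installed and on PATH (one example of a single-file utility; the thread doesn't name one). The function names are hypothetical, and it only flags pages from which no text at all could be extracted, not garbled text.

```python
import os
import subprocess
from pathlib import Path


def blank_pages(pdftotext_output: str) -> list[int]:
    """Return 1-based numbers of pages whose extracted text is empty or whitespace."""
    pages = pdftotext_output.split("\f")  # pdftotext separates pages with a form feed
    if pages and pages[-1] == "":
        pages.pop()  # drop the empty piece after the final page's form feed
    return [number for number, page in enumerate(pages, start=1) if not page.strip()]


def run_pdftotext(path: Path) -> str:
    """Extract all text from one PDF via poppler's pdftotext ('-' writes to stdout)."""
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", str(path), "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def scan(root: str, extract=run_pdftotext):
    """Walk the tree under root and yield (path, problem) for each questionable PDF."""
    for dirpath, _subdirs, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(".pdf"):
                continue
            path = Path(dirpath) / name
            try:
                blanks = blank_pages(extract(path))
            except (OSError, subprocess.CalledProcessError) as exc:
                # OSError also covers pdftotext not being installed at all
                yield path, f"could not extract text: {exc}"
                continue
            if blanks:
                yield path, f"{len(blanks)} page(s) without text, e.g. {blanks[:5]}"


# Example (the library path is a placeholder):
# for path, problem in scan("/path/to/Calibre Library"):
#     print(f"{path}: {problem}")
```

Passing a different extract function lets you swap in another extractor, or run each page's text through a garbled-text check instead of only testing for empty pages.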

The best place to ask is the PDF forum.

BR

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Make Sure that You Check Your Text Messages--Amazon May Have Nice Gift for You. | GtrsRGr8 | Deals and Resources (No Self-Promotion or Affiliate Links) | 4 | 07-10-2019 01:50 PM
Renaming a text file is not so easy | roger64 | Editor | 4 | 02-25-2016 08:52 AM
PDFs and Hidden Text Layers | aidren | enTourage Archive | 4 | 04-14-2010 01:23 PM
Missing text in PDFs | Pulp | Bookeen | 9 | 10-02-2008 10:58 AM
PRS-500 pielrf beta - Text to LRF with Easy TOC, autoflow, etc. | EatingPie | Sony Reader Dev Corner | 9 | 05-11-2007 10:51 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.