#1
Member
Posts: 17
Karma: 10
Join Date: Oct 2012
Device: Calibre
Only Convert PDFs with embedded OCRed text to EPUB?
#2
Staff to 4 Cats
Posts: 10,725
Karma: 2485850
Join Date: Aug 2009
Location: The (original) Silicon Valley, USA
Device: Galaxy Tab 2,Black Astak PEz, K4NT(now Wifes)
OCR'd content is text.
Image-only PDFs may need to be OCR'd. So is your question: "How do I differentiate text PDFs from image PDFs?" I would expect that an image-filled PDF file might be quite a bit larger.
__________________
Using: Ubuntu(32 bit):Oneric,Precise and XPpro SP3, W7HP(64)- - Libre Office w/Writer2EPUB
#3
Member
Posts: 17
Karma: 10
Join Date: Oct 2012
Device: Calibre
Code:
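#!/bin/bash
# This script will find all PDFs lacking images in a Calibre library.
# It prints the path of the PDF given as $1 if pdfimages lists no images in it.
#
# Run it with this:
# find ~/Calibre\ Library/ -iname "*.pdf" -print0 | xargs -0 -I{} ./pdf_no_images.bash {} 2> /dev/null > "PDFs lacking images.txt"

# pdfimages -list prints two header lines, then one row per image.
# Skip the header and keep the image-number column of each row.
images=$(pdfimages -list "$1" | awk 'NR > 2 {print $2}')
if [ -z "$images" ]; then
    echo "$1"
fi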
#4
Curmudgeon
Posts: 228
Karma: 556000
Join Date: Jan 2012
Device: iPad, iPhone, Nook Simple Touch
Better to use something like pdftotext and see if it returns nothing. PDF files might contain both images *and* text, and I'm assuming you probably want to convert those as well.
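Something along these lines might do it. This is only a rough sketch, assuming pdftotext from poppler-utils is installed; the helper names has_text and pdf_has_text are made up for illustration. It prints the path of every PDF passed to it that yields any non-whitespace text.

```shell
#!/bin/bash
# Sketch: print each PDF given on the command line that has extractable text.
# Assumes pdftotext (poppler-utils) is on the PATH.

# Hypothetical helper: succeeds if stdin contains any non-whitespace character.
# Page breaks (form feeds) emitted by pdftotext count as whitespace here.
has_text() {
    [ -n "$(tr -d '[:space:]')" ]
}

# Hypothetical helper: succeeds if pdftotext extracts any text from the PDF in $1.
pdf_has_text() {
    pdftotext "$1" - 2>/dev/null | has_text
}

for pdf in "$@"; do
    if pdf_has_text "$pdf"; then
        echo "$pdf"
    fi
done
```

It can be driven by the same find/xargs pipeline as the script above, with the script name swapped in.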
#5
Member
Posts: 17
Karma: 10
Join Date: Oct 2012
Device: Calibre
Oh yes, that's a good point. What I wrote above only finds pure-text PDFs, not mixed text/image ones like the PDFs from Archive.org. I don't think I have many, if any, mixed text/image PDFs, but all my DjVu books are that way. PDFs from Google Books or HathiTrust are mostly images, but they do have a small amount of text (copyright notices, etc.), so making a script ignore that would be more complex.
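One way to ignore that small amount of text might be to require a minimum average amount of text per page. The following is an untested-on-a-real-library sketch, assuming pdftotext and pdfinfo (both from poppler-utils) are installed. The threshold of 200 characters per page and the helper name enough_text are arbitrary choices for illustration and would need tuning.

```shell
#!/bin/bash
# Sketch: print each PDF whose text layer averages at least
# MIN_CHARS_PER_PAGE non-whitespace characters per page.
# Assumes pdftotext and pdfinfo (poppler-utils) are on the PATH.
# The default of 200 is an arbitrary guess, not a recommended value.
MIN_CHARS_PER_PAGE=${MIN_CHARS_PER_PAGE:-200}

# Hypothetical helper: succeeds if chars / pages reaches the threshold.
# $1 = non-whitespace character count, $2 = page count.
enough_text() {
    local chars=$1 pages=$2
    [ "$pages" -gt 0 ] && [ $(( chars / pages )) -ge "$MIN_CHARS_PER_PAGE" ]
}

for pdf in "$@"; do
    # Page count comes from the "Pages:" line of pdfinfo's output.
    pages=$(pdfinfo "$pdf" 2>/dev/null | awk '/^Pages:/ {print $2}')
    # Count extracted characters, ignoring all whitespace and page breaks.
    chars=$(pdftotext "$pdf" - 2>/dev/null | tr -d '[:space:]' | wc -c | tr -d ' ')
    if enough_text "${chars:-0}" "${pages:-0}"; then
        echo "$pdf"
    fi
done
```

A scan with a full OCR layer should clear the threshold easily, while a book with only a copyright notice as text should fall well below it. A short text-only PDF with sparse pages could still be misclassified.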