01-23-2012, 09:27 AM   #1
wastewater
cleanup post scan PDF file

I have a searchable PDF of a book, and I want to clean up the page images the way Scan Tailor does (page splitting, deskew, margin crop). Is there a freeware post-scan processing program for PDF files similar to Scan Tailor? If no such program exists and I convert the PDF to TIFF files to run through Scan Tailor, will I lose the searchable text layer of the PDF, or is there a way to keep it when converting between the two formats?

Any relevant thoughts or comments will be greatly appreciated.

thanks
01-23-2012, 10:43 AM   #2
DSpider
TIFF files can't carry a text layer underneath the image, so you will lose the OCR text if you export the PDF as a bunch of images. Even if you could keep it somehow, page splitting, deskewing and cropping would throw off the OCR positions, so the text would no longer line up with the page image. Here's what you could do:
  • export the PDF as a bunch of PNG images - I would advise against JPG because it could make compression artefacts stand out more, which could result in grain or fuzzy text after Scan Tailor
  • run them through Scan Tailor
  • if you really care about the ability to search, highlight text or copy-paste (for a dictionary look-up, maybe?), you could re-apply "good enough" OCR with ABBYY FineReader; alternatively you could look for some GUI based on tesseract
  • export as PDF
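
For anyone who prefers the command line to a GUI, the steps above can be roughly sketched in Python. This is only a sketch under assumptions: it assumes Poppler's pdftoppm and pdfunite and the tesseract command-line tool are installed, that Scan Tailor is still run by hand between the export and the OCR step, and that its cleaned pages end up as .tif files in one folder. The function names are made up for illustration.

```python
from pathlib import Path
import subprocess


def export_cmd(pdf, out_prefix, dpi=300):
    """Command to render each PDF page to a PNG with Poppler's pdftoppm."""
    return ["pdftoppm", "-png", "-r", str(dpi), str(pdf), str(out_prefix)]


def ocr_cmd(image, lang="eng"):
    """Command to OCR one cleaned page into a searchable one-page PDF.

    tesseract adds the ".pdf" extension to the output base name itself.
    """
    image = Path(image)
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang, "pdf"]


def merge_cmd(page_pdfs, out_pdf):
    """Command to join the per-page PDFs, in sorted order, with pdfunite."""
    return ["pdfunite", *map(str, sorted(page_pdfs)), str(out_pdf)]


def rebuild(cleaned_dir, out_pdf, lang="eng"):
    """OCR every cleaned .tif in cleaned_dir (assumed Scan Tailor output)
    and merge the resulting one-page PDFs into out_pdf."""
    images = sorted(Path(cleaned_dir).glob("*.tif"))
    for image in images:
        subprocess.run(ocr_cmd(image, lang), check=True)
    page_pdfs = [image.with_suffix(".pdf") for image in images]
    subprocess.run(merge_cmd(page_pdfs, out_pdf), check=True)
```

Typical use would be to run `subprocess.run(export_cmd("book.pdf", "pages/page"), check=True)`, process the PNGs in Scan Tailor, then call `rebuild()` on the folder holding the cleaned images. Pages are merged in sorted filename order, so zero-padded page numbers matter.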

Of course, you could go all the way and proofread the OCR either in FineReader or side-by-side, save as .docx or .rtf, track down the fonts, vectorize the cover and any other graphics, do the layout in Word or InDesign and proofread the final product again. This will result in a much smaller file of a substantially better quality. It does take time, yes, but it's a pleasure to read such a book.
All times are GMT -4.