02-09-2009, 05:32 PM | #1 |
Groupie
Posts: 159
Karma: 170
Join Date: Feb 2009
Device: PRS-505
|
Expert help required: Cleaning bad PDF scans
Hi guys,
Well, here's my first post. I have looked around a bit for an answer but couldn't find anything. So here is my problem: I have about 20 books of 100 pages like these, all in PDF format. Needless to say, I need to clean them up. Here is what I need to do:
1: Extract all PDF pages to images (what format is best? TIFF?).
2: Remove the black borders (crop them).
3: Divide the images in half so only one page shows per PDF page.
4: Center all the pages and make them viewable on a PRS-505.
I'm guessing I will have to extract the images from the PDFs and work with those, but doing it one by one for each 200+ page book seems crazy. Is there an easy way of doing this in batch, and what software would I need, free or not, to extract, crop, divide and resize? Any help is appreciated, thanks!
Edit: Software tested: Adobe Acrobat, Readiris, Photoshop, Amber PDF Converter and some more... I couldn't find any way to batch things up, at least a bit. It will probably take 4 or so steps, but hopefully I won't have to manually crop over 1000 pages.
Last edited by Student1; 02-09-2009 at 05:51 PM. Reason: adding information |
02-10-2009, 04:51 AM | #2 |
Addict
Posts: 234
Karma: 214
Join Date: Nov 2008
Device: Galaxy Note 3, Galaxy NotePro 12.2, InkBook
|
You could use Adobe Acrobat (not Reader). It supports both cropping and dividing. There would be no need to extract images.
Batch processing won't work here as my guess is that all these 20 documents have different sizes (i.e. need to have cropping set individually). |
02-10-2009, 05:30 AM | #3 | |
Groupie
Posts: 159
Karma: 170
Join Date: Feb 2009
Device: PRS-505
|
Thanks, I have tried to use it, but it just took too long. I have found a program that claims it can crop based on black borders: Pixedit. I will give it a try and report back! Last edited by Student1; 02-10-2009 at 05:44 AM. |
|
02-10-2009, 10:34 AM | #4 |
Retired & reading more!
Posts: 2,764
Karma: 1884247
Join Date: Sep 2006
Location: North Alabama, USA
Device: Kindle 1, iPad Air 2, iPhone 6S+, Kobo Aura One
|
I have ABBYY FineReader, an OCR program. It does all those things (but only semi-batch, since all the crops are slightly different), plus you can OCR the result, save it to an MS Word file and use BookDesigner to change it to LRF.
Note: it will require more work if there are images but it works well with text and few images. (Also not cheap) |
02-10-2009, 01:49 PM | #5 |
Grand Sorcerer
Posts: 5,185
Karma: 25133758
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié)
|
I adore Pixedit; have been using it for years. It will indeed crop the borders, but (unless there's a newer, updated version) won't do so automatically.
Quickest crop is using the selection tool (the one that's on by default when you open a document) to surround the content you want to keep, and then shift-delete to remove everything outside of it. You can also use the auto-deskew, either page-by-page or for a whole document. However, it sometimes picks an odd horizontal point and skews the whole page badly; you can Undo this (on each page individually, rather than the whole set at once) and manually deskew the page instead. Filtering to remove objects smaller than 3 pixels will get rid of most speckles for up-to-300dpi scans. When re-setting the filter numbers, watch your I's and .'s; when it starts to remove dots from letters & punctuation, it's set too high. |
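The speckle filter described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Pixedit implements it: it assumes the page is already a black-and-white bitmap stored as a list of rows (1 = ink, 0 = paper), that "size" means the number of connected ink pixels, and that diagonal neighbours count as connected. The name `despeckle` and the `min_size` parameter are made up for this example.

```python
from collections import deque


def despeckle(bitmap, min_size=3):
    """Return a copy of bitmap with ink blobs smaller than min_size pixels erased.

    bitmap is a list of rows; each pixel is 1 (ink) or 0 (paper).
    """
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    cleaned = [row[:] for row in bitmap]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if bitmap[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill one connected blob of ink, collecting its pixels.
            blob = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and bitmap[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Blobs below the size limit are treated as scanner noise.
            if len(blob) < min_size:
                for by, bx in blob:
                    cleaned[by][bx] = 0
    return cleaned


# Tiny test page: two isolated specks and one 3-pixel vertical stroke.
page = [
    [1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]
cleaned = despeckle(page, min_size=3)
too_aggressive = despeckle(page, min_size=4)
```

With `min_size=3` the two specks disappear and the stroke survives; raising it to 4 also erases the stroke, which is the same "watch your I's and .'s" problem in miniature.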
02-10-2009, 03:48 PM | #6 |
Groupie
Posts: 159
Karma: 170
Join Date: Feb 2009
Device: PRS-505
|
Thanks guys, I'll try these programs and see how it goes! I tried a few OCR programs as a test but wasn't satisfied; they always seem to mess up on some pages (when there are graphics). FineReader seems interesting; if it can split pages in half, it's a done deal! Thanks for your help, guys. I'll explore a bit and go from there!
|
02-10-2009, 04:46 PM | #7 |
Groupie
Posts: 159
Karma: 170
Join Date: Feb 2009
Device: PRS-505
|
Wow, ABBYY FineReader is a BEAST! It does the job almost perfectly, even with pictures, and that's just on the default settings without adjusting anything... it converted the book almost flawlessly to Word with all images intact! There are still 2 pages per page, but I haven't messed with the options yet; I just wanted to thank you for letting me know about this powerful software! Even OmniPage 16 Pro could not do what ABBYY FineReader can do! I am amazed!!!
|
02-10-2009, 07:59 PM | #8 |
Junior Member
Posts: 6
Karma: 10
Join Date: Jan 2009
Device: Bebook
|
I would probably do something like...
1) Use pdfimages to extract all images.
2) Open a few images and figure out a color curve that pushes all text to black and most background to white. Save a gradient of this that ImageMagick can grok. Figure out where to set some basic crop boxes. This assumes
3) Use ImageMagick to crop and split the series of images all at once.
4) Use ImageMagick to apply the color curve to all the images.
5) Do something useful with the series of converted images... DjVu, OCR, whatever. |
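To make the crop-and-split step concrete, here is a rough pure-Python sketch of the logic. It is not ImageMagick and not the poster's actual commands; it assumes each scan has already been loaded as a grayscale grid (a list of rows, 0 = black, 255 = white), that the borders are dark and the paper is light, and that the two book pages meet at the middle of the cropped area. The function names, `threshold` and `min_fraction` are all invented for the illustration.

```python
def bright_fraction(values, threshold):
    """Fraction of pixels in values that are brighter than threshold."""
    return sum(1 for v in values if v > threshold) / len(values)


def content_box(gray, threshold=128, min_fraction=0.25):
    """Return (top, bottom, left, right) of the light paper area.

    bottom and right are exclusive. A row or column counts as paper when
    at least min_fraction of its pixels are brighter than threshold.
    """
    rows = [y for y, row in enumerate(gray)
            if bright_fraction(row, threshold) >= min_fraction]
    if not rows:
        raise ValueError("no page content found")
    top, bottom = rows[0], rows[-1] + 1

    # Judge columns only over the paper rows, so the dark top and bottom
    # borders do not drag the column brightness down.
    width = len(gray[0])
    cols = [x for x in range(width)
            if bright_fraction([gray[y][x] for y in range(top, bottom)],
                               threshold) >= min_fraction]
    if not cols:
        raise ValueError("no page content found")
    left, right = cols[0], cols[-1] + 1
    return top, bottom, left, right


def crop_and_split(gray, threshold=128, min_fraction=0.25):
    """Crop away the dark border, then cut the spread into two pages."""
    top, bottom, left, right = content_box(gray, threshold, min_fraction)
    middle = (left + right) // 2
    left_page = [row[left:middle] for row in gray[top:bottom]]
    right_page = [row[middle:right] for row in gray[top:bottom]]
    return left_page, right_page


# A 6x10 fake scan: black border, white paper, one dark "ink" pixel.
B, W = 0, 255
scan = [[B] * 10] + [[B, B] + [W] * 6 + [B, B] for _ in range(4)] + [[B] * 10]
scan[2][3] = B  # a bit of text on the left page

box = content_box(scan)
left_page, right_page = crop_and_split(scan)
```

On the fake scan the detected box is rows 1 to 4 and columns 2 to 7, and each half comes out 4 rows by 3 columns. Real scans are noisier than this, so the threshold and fraction would need tuning on a few sample pages first, much as step 2 suggests.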
02-11-2009, 02:13 PM | #9 |
Groupie
Posts: 159
Karma: 170
Join Date: Feb 2009
Device: PRS-505
|
Thanks for the alternative way. What I used was easy reader 9 to crop and auto-divide the pages. The cropping tools were very fast; I had to do the pages one by one, but I could keep the same dimensions when switching to the next page. Then I converted to TIFF, as I wanted to save as image-only PDF/A (no OCR). I used Adobe Acrobat to rebuild the TIFFs into one PDF and saved it as PDF/A. Worked perfectly! Thanks for all your help!!!
|
02-11-2009, 08:28 PM | #10 |
Groupie
Posts: 159
Karma: 170
Join Date: Feb 2009
Device: PRS-505
|
Maybe one last question: it seems my PDFs are 10 times larger (40 megs) than the original files (4 megs). I surely didn't add any more detail, as I used the same scanned images from the originals. I did save as PDF/A, so any tips on dropping the size without changing the original quality?
|
02-11-2009, 09:23 PM | #11 |
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
|
Try unpaper (post-processing scanned and photocopied book pages) to clean up bad .pdf scans.
It's an option callable from within PDFRead v1.8; you enter its options in the (white) input box of the GUI. Last edited by nrapallo; 02-11-2009 at 09:46 PM. Reason: added GUI screen picture |
02-11-2009, 10:09 PM | #12 |
Groupie
Posts: 159
Karma: 170
Join Date: Feb 2009
Device: PRS-505
|
Thanks, going to give it a try!
|
03-03-2009, 05:57 AM | #13 | |
Junior Member
Posts: 1
Karma: 10
Join Date: Mar 2009
Device: none
|
PixEdit
Quote:
Regards HOH |
|
|