06-15-2020, 02:47 PM | #1 |
Connoisseur
Posts: 65
Karma: 2076068
Join Date: Apr 2017
Device: none
|
ABBYY for compression?
I heard that ABBYY FineReader does a better job at compression than Acrobat. I’ve been using Acrobat to make compressed PDFs and have not been satisfied with the results.
Q1: Can I use ABBYY to re-compress my existing PDFs, or would I have to use ABBYY to create the PDFs from scratch?
Q2: Is ABBYY Sprint, which comes bundled with document scanners, able to compress PDFs just as well as the full version of ABBYY?
Thanks. |
06-17-2020, 06:19 PM | #2 |
Groupie
Posts: 184
Karma: 2019866
Join Date: Feb 2018
Device: Kobo Aura-One (using KOReader app), Boox Note-3, iPad(s)
|
Abbyy's OCR engine is probably the best there is. For scans, though, I usually prefer Adobe Acrobat's Clearscan mode over Abbyy's compression (I've only used the older Abbyy versions 11 and 12). A thousand-page textual book (A5 or A4) will usually come out as a 20-40 MB Clearscan pdf, depending on the letter size and the number of graphs, tables and pictures in it, and that file flips through easily even on my older e-ink readers. Abbyy's pdf scans (OCR text layer behind the scanned image) are not as smooth to flip through, i.e. they need newer readers with stronger processors and more memory.
As for Q1: yes, you can use ABBYY to re-compress your existing PDFs. Alternatively, if that would be quicker, you can first use a pdf editor to decompress the pdf pages (to recreate the original images) and then feed those images to ABBYY. Last edited by Marinolino; 06-17-2020 at 07:34 PM. |
|
06-19-2020, 01:35 AM | #3 | |
Connoisseur
Posts: 65
Karma: 2076068
Join Date: Apr 2017
Device: none
|
Quote:
I use Acrobat Pro XI and I've never seen Clearscan mode anywhere in the Tools menu. |
|
06-19-2020, 10:40 AM | #4 | |
Groupie
Posts: 184
Karma: 2019866
Join Date: Feb 2018
Device: Kobo Aura-One (using KOReader app), Boox Note-3, iPad(s)
|
Quote:
It is in the Tools/Text Recognition section. Click the "Edit" button there and a pop-up window will appear with options to choose the Language, PDF Output Style and Downsample resolution. Clearscan has been renamed to "Editable text and images" in recent years, and they say it is no longer available in the Acrobat Standard version.
https://community.adobe.com/t5/acrob...7036679?page=1
Adobe Acrobat DC Help
https://images-na.ssl-images-amazon....1J5kl6swPS.pdf
Scanning tips from page 180:
"
• Acrobat scanning accepts images between 10 dpi and 3000 dpi. If you select Searchable Image or ClearScan for PDF Output Style, input resolution of 72 dpi or higher is required. Also, input resolution higher than 600 dpi is downsampled to 600 dpi or lower.
• To apply lossless compression to a scanned image, select one of these options under the Optimization Options in the Optimize Scanned PDF dialog box: CCITT Group 4 for monochrome images, or Lossless for color or grayscale images. If this image is appended to a PDF document, and you save the file using the Save option, the scanned image remains uncompressed. If you save the PDF using Save As, the scanned image may be compressed.
• For most pages, black-and-white scanning at 300 dpi produces text best suited for conversion. At 150 dpi, OCR accuracy is slightly lower, and more font-recognition errors occur; at 400 dpi and higher resolution, processing slows, and compressed pages are bigger. If a page has many unrecognized words or small text (9 points or smaller), try scanning at higher resolution. Scan in black and white whenever possible.
• When Recognize Text Using OCR is disabled, full 10-to-3000 dpi resolution range may be used, but the recommended resolution is 72 and higher dpi. For Adaptive Compression, 300 dpi is recommended for grayscale or RGB input, or 600 dpi for black-and-white input.
• Pages scanned in 24-bit color, 300 dpi, at 8-1/2–by-11 in. (21.59-by-27.94 cm) result in large images (25 MB) before compression. Your system may require 50 MB of virtual memory or more to scan the image. At 600 dpi, both scanning and processing typically are about four times slower than at 300 dpi.
• Avoid dithering or halftone scanner settings. These settings can improve the appearance of photographs, but they make it difficult to recognize text.
• For text printed on colored paper, try increasing the brightness and contrast by about 10%. If your scanner has color-filtering capability, consider using a filter or lamp that drops out the background color. Or if the text isn’t crisp or drops out, try adjusting scanner contrast and brightness to clarify the scan.
• If your scanner has a manual brightness control, adjust it so that characters are clean and well formed. If characters are touching, use a higher (brighter) setting. If characters are separated, use a lower (darker) setting.
"
Last edited by Marinolino; 06-19-2020 at 03:39 PM. |
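As a rough check of the "25 MB before compression" figure in the scanning tips above, here is a minimal Python sketch of the arithmetic. The function name `uncompressed_scan_bytes` is made up for illustration and is not part of Acrobat or any library; it only assumes that an uncompressed image takes width × height × bits-per-pixel.

```python
def uncompressed_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw size in bytes of one scanned page before any compression.

    Hypothetical helper for illustration only.
    """
    width_px = round(width_in * dpi)    # pixels across the page
    height_px = round(height_in * dpi)  # pixels down the page
    return width_px * height_px * bits_per_pixel // 8


# Letter-size page (8.5 x 11 in) in 24-bit color, as in the quoted tip.
letter_300 = uncompressed_scan_bytes(8.5, 11, 300, 24)
letter_600 = uncompressed_scan_bytes(8.5, 11, 600, 24)

print(f"300 dpi: {letter_300 / 1e6:.1f} MB")
print(f"600 dpi: {letter_600 / 1e6:.1f} MB "
      f"({letter_600 // letter_300}x the data of 300 dpi)")
```

Doubling the resolution quadruples the pixel count, which is in line with the tip that 600 dpi scanning and processing are typically about four times slower than 300 dpi.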
|
06-19-2020, 08:07 PM | #5 |
Connoisseur
Posts: 65
Karma: 2076068
Join Date: Apr 2017
Device: none
|
|
|
06-19-2020, 09:35 PM | #6 | |
Groupie
Posts: 184
Karma: 2019866
Join Date: Feb 2018
Device: Kobo Aura-One (using KOReader app), Boox Note-3, iPad(s)
|
Quote:
After you open your scanned pdf in Acrobat, click on "In This File". The pop-up window that opens shows your current OCR settings (language, exact image or clearscan, resolution) along with an "Edit" button that you can use to change them. Alternatively, click on the "In Multiple Files" option and choose your scanned file(s) and the desired settings from there.
https://helpx.adobe.com/acrobat/11/u...nned_documents
Last edited by Marinolino; 06-19-2020 at 11:22 PM. |
|
06-20-2020, 02:39 AM | #7 |
Connoisseur
Posts: 65
Karma: 2076068
Join Date: Apr 2017
Device: none
|
Ah...I see now. Thanks.
When I click on Text Recognition, I get this pop-up menu. I've always just clicked 'OK' because I'm not trying to 'edit' the document, just OCR it. 'Edit' doesn't seem intuitive for doing OCR. Clicking 'OK' goes ahead and OCRs the document, and I've always just stopped there. However, when I do click 'Edit', other options come up:
- Searchable Image
- Searchable Image (Exact)
- Clearscan
The Clearscan option...
I've always tried to reduce the PDF size by using the PDF Optimizer in another tool menu. If I remember correctly, it does reduce the file size, but it strips the OCR layer off. I'd like to reduce the file size while retaining high quality. I don't need a 50% size reduction, but 20-30% would be nice.
Last edited by Bookchin; 06-20-2020 at 02:41 AM. Reason: spelling |
06-21-2020, 05:57 PM | #8 | |
Groupie
Posts: 184
Karma: 2019866
Join Date: Feb 2018
Device: Kobo Aura-One (using KOReader app), Boox Note-3, iPad(s)
|
Quote:
For my text books (without many pictures and graphics in them), Clearscan mode applied to 300 dpi input files is usually good enough for me, and it results in a 2-4 MB pdf per 100 pages, i.e. a 10-20 MB pdf for a 500-page book. I'd quickly crop double-paged scans and trim their margins using Briss or k2pdfopt beforehand. In the future, if you want even smaller and neater pdfs, you can use Scantailor before applying Clearscan OCR to automatically crop double-paged scans, deskew and despeckle the scanned images, trim the margins, remove the background etc.
https://www.youtube.com/watch?v=Edfs2_YJhx4
https://www.youtube.com/watch?v=dHZmTYTVL44
Last edited by Marinolino; 06-22-2020 at 12:56 PM. |
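To put the 2-4 MB per 100 pages rule of thumb from the post above into numbers for other book lengths, here is a small Python sketch. The function name and its default range are only an illustration of that one poster's experience with mostly-text books at 300 dpi, not a measured property of Clearscan.

```python
def clearscan_size_range_mb(pages, mb_per_100_pages=(2.0, 4.0)):
    """Estimate a (low, high) pdf size in MB from a per-100-pages range.

    Hypothetical helper; the default range is the rule of thumb quoted
    in this thread and will vary with fonts, pictures and scan quality.
    """
    low, high = mb_per_100_pages
    return (pages * low / 100, pages * high / 100)


for pages in (100, 500, 1000):
    low, high = clearscan_size_range_mb(pages)
    print(f"{pages} pages: roughly {low:.0f}-{high:.0f} MB")
```

The 500-page and 1000-page estimates match the 10-20 MB and 20-40 MB figures mentioned earlier in the thread.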
|
06-22-2020, 12:59 AM | #9 | |
Connoisseur
Posts: 65
Karma: 2076068
Join Date: Apr 2017
Device: none
|
Quote:
Thanks. I read the Adobe instructions for scanning that you linked to, and I'm still not too sure what the difference is between Clearscan and Searchable Image, other than that Clearscan replaces the fonts with closely related ones, or something. Yes, I think 20 MB per 500-page book is good enough for me as well. I don't need to drastically reduce the file size; otherwise I would probably just convert them to DjVu files. What do you mean by 'double-paged' scans? |
|
06-22-2020, 10:30 AM | #10 | |
Groupie
Posts: 184
Karma: 2019866
Join Date: Feb 2018
Device: Kobo Aura-One (using KOReader app), Boox Note-3, iPad(s)
|
Quote:
Here is clearscan mode compared to the exact image mode: it offers significantly improved visual quality of the text and averages three times smaller files at 300 dpi and seven times smaller at 600 dpi, for approximately the same processing time.
https://blogs.adobe.com/acrolaw/2009...rscan_is_smal/
R. Borstein's blog with Tips and Techniques
https://blogs.adobe.com/acrolaw/cate...r-recognition/
By a 'double-paged' scan I mean a book scanned with both of its pages (left and right) captured at once rather than separately, as shown from 2:30 min. in this useful Scantailor tutorial, where those double-page scans are automatically rotated and split in half.
Post-processing of scanned or photographed pages using Scan Tailor:
https://youtu.be/N1WK2J1Dr-s?t=144
https://vimeo.com/12524529
Quick and easy pdf splitting and cropping using Briss:
https://superuser.com/questions/7910...e-pass/1303929
https://www.youtube.com/watch?v=4Wp4RIYUqC8
https://www.youtube.com/watch?v=TWfPWUf5y-s
Abbyy would split the scans automatically, and we can also split pages in Acrobat, Foxit or another pdf editor using the Print/Tile function (as shown in the video below), but I'd rather use Briss, k2pdfopt or Scantailor for their additional cropping capabilities.
Page splitting using the Print/Tile function:
https://www.youtube.com/watch?v=PYlk3FtfZSU
Javascript plugin to automatically split pages in Adobe Acrobat:
https://www.mobileread.com/forums/sh...56&postcount=3
https://www.mobileread.com/forums/sh...15&postcount=8
Pdf cropping in Acrobat, or in some pdf editor with similar cropping options (all pages, just selected pages, or even and odd pages separately):
https://www.youtube.com/watch?v=-HIIqj-p3Kk
We can also use Acrobat's cropping tool for splitting the whole double-page pdf. First we have to crop the odd and even pages and save them as separate files. Then we have to split those two files into separate pdfs using the split command, together with the "output options" button and "add label - before original name" chosen there. That way we can quickly and easily merge (concatenate) them in the same order as in the original pdf (part_1_left.pdf, part_1_right.pdf, part_2_left.pdf etc.) instead of the tedious manual way used for the 3rd step in the video below, and without needing some 3rd party (renamer) app to quickly rename the files:
https://www.youtube.com/watch?v=FDDt0hQTPKk
Last edited by Marinolino; 06-24-2020 at 11:21 AM. |
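For anyone who ends up merging the split halves with a script rather than in Acrobat, here is a minimal Python sketch of the ordering step only. It assumes the files follow the `part_<number>_<left|right>.pdf` pattern from the example names in the post above; the function `merge_order` is made up for illustration. A plain alphabetical sort would put part_10 before part_2, so the sketch sorts on the page number as an integer.

```python
import re

# Assumed naming pattern, taken from the example names in the post:
# part_1_left.pdf, part_1_right.pdf, part_2_left.pdf, ...
_NAME = re.compile(r"part_(\d+)_(left|right)\.pdf$")


def merge_order(filenames):
    """Return the file names in original reading order.

    Hypothetical helper: sorts by numeric part, then left before right.
    """
    def key(name):
        match = _NAME.search(name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        side = 0 if match.group(2) == "left" else 1
        return (int(match.group(1)), side)

    return sorted(filenames, key=key)


files = [
    "part_10_left.pdf", "part_2_right.pdf", "part_1_right.pdf",
    "part_1_left.pdf", "part_2_left.pdf", "part_10_right.pdf",
]
for name in merge_order(files):
    print(name)
```

The resulting list can then be handed to whatever tool does the actual concatenation. For books read right to left, the left/right order would need to be flipped.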
|
08-21-2020, 02:44 PM | #11 |
Enthusiast
Posts: 39
Karma: 10
Join Date: Apr 2018
Device: Samsung Galaxy Tab S2, iPad 2 (Bluefire Reader); fire hd 10, Windows
|
In my experience, it doesn't always give the result I want, but it's worth a shot passing heavy pdfs (like old book scans from Open Library/Archive.org) through ABBYY, as it often succeeds in creating a much more viewable file on my PC when Acrobat wasn't able to. You will want to play around with the options, as sometimes one combination of settings gets you the result you want and another doesn't. I've sort of developed a feeling for it over time.
|
|