06-15-2020, 02:47 PM   #1
Bookchin
Enthusiast
 
Posts: 45
Karma: 2076068
Join Date: Apr 2017
Device: none
ABBYY for compression?

I heard that ABBYY FineReader does a better job at compression than Acrobat. I've been using Acrobat to make compressed PDFs and have not been satisfied with the results.

Q1: I was wondering whether I could use ABBYY to re-compress my existing PDFs, or whether I would have to use ABBYY to create the PDFs from scratch?

Q2: Also, is ABBYY SPRINT, which comes bundled with document scanners, able to compress PDFs as well as the full version of ABBYY?

Thanks.
06-17-2020, 06:19 PM   #2
Marinolino
Groupie
 
Posts: 152
Karma: 1042664
Join Date: Feb 2018
Device: Kobo A1, iPad(s), Kindle DX, Hanvon E930
Abbyy's OCR engine is probably the best there is, but for scans I usually prefer Adobe Acrobat's Clearscan mode over Abbyy's compression (though I've only used the older Abbyy versions 11 and 12). A one-thousand-page text book (A5 or A4) will usually turn into a reasonable 20-40 MB Clearscan PDF (depending on the font size and the number of graphs/tables/pictures in it), which also flips through easily on my older e-ink readers. Abbyy's PDF scans (an OCR text layer behind the scanned image) are not as smooth to flip through, i.e. they need newer readers with faster processors and more memory.

As for Q1: yes, you can use ABBYY to re-compress your existing PDFs. Alternatively, you can use a PDF editor to extract the page images first (to recover the original images) and then feed those images to ABBYY, if that would be quicker.

Last edited by Marinolino; 06-17-2020 at 07:34 PM.
06-19-2020, 01:35 AM   #3
Bookchin
Enthusiast
 
Posts: 45
Karma: 2076068
Join Date: Apr 2017
Device: none
Quote:
Originally Posted by Marinolino
...
Thanks. I usually scan documents with my small Brother multifunction printer/scanner and save them as JPEGs. Then I use Acrobat both to convert the JPEGs to a PDF file and to OCR it. As soon as I OCR it, though, the file size really balloons.

I use Acrobat Pro XI and I've never seen Clearscan mode anywhere in the Tools menu.
06-19-2020, 10:40 AM   #4
Marinolino
Groupie
 
Posts: 152
Karma: 1042664
Join Date: Feb 2018
Device: Kobo A1, iPad(s), Kindle DX, Hanvon E930
Quote:
Originally Posted by Bookchin
...
I use Acrobat Pro XI and I've never seen Clearscan mode anywhere in the Tools menu.
You can choose among three PDF Output Styles: Searchable Image, Searchable Image Exact and Clearscan.

They are in the Tools/Text Recognition section.

Click the "Edit" button there to open a pop-up window with options for the language, the PDF Output Style and the downsample resolution.

In recent years Clearscan has been renamed "Editable text and images", and reportedly it is no longer available in the Acrobat Standard version.

https://community.adobe.com/t5/acrob...7036679?page=1

Adobe Acrobat DC Help
https://images-na.ssl-images-amazon....1J5kl6swPS.pdf

Scanning tips from page 180:

" • Acrobat scanning accepts images between 10 dpi and 3000 dpi. If you select Searchable Image or ClearScan for PDF Output Style, input resolution of 72 dpi or higher is required. Also, input resolution higher than 600 dpi is downsampled to 600 dpi or lower.
• To apply lossless compression to a scanned image, select one of these options under the Optimization Options in the Optimize Scanned PDF dialog box: CCITT Group 4 for monochrome images, or Lossless for color or grayscale images. If this image is appended to a PDF document, and you save the file using the Save option, the scanned image remains uncompressed. If you save the PDF using Save As, the scanned image may be compressed.
• For most pages, black-and-white scanning at 300 dpi produces text best suited for conversion. At 150 dpi, OCR accuracy is slightly lower, and more font-recognition errors occur; at 400 dpi and higher resolution, processing slows, and compressed pages are bigger. If a page has many unrecognized words or small text (9 points or smaller), try scanning at higher resolution. Scan in black and white whenever possible.
• When Recognize Text Using OCR is disabled, full 10-to-3000 dpi resolution range may be used, but the recommended resolution is 72 and higher dpi. For Adaptive Compression, 300 dpi is recommended for grayscale or RGB input, or 600 dpi for black-and-white input.
• Pages scanned in 24-bit color, 300 dpi, at 8-1/2–by-11 in. (21.59-by-27.94 cm) result in large images (25 MB) before compression. Your system may require 50 MB of virtual memory or more to scan the image. At 600 dpi, both scanning and processing typically are about four times slower than at 300 dpi.
• Avoid dithering or halftone scanner settings. These settings can improve the appearance of photographs, but they make it difficult to recognize text.
• For text printed on colored paper, try increasing the brightness and contrast by about 10%. If your scanner has color-filtering capability, consider using a filter or lamp that drops out the background color. Or if the text isn’t crisp or drops out, try adjusting scanner contrast and brightness to clarify the scan.
• If your scanner has a manual brightness control, adjust it so that characters are clean and well formed. If characters are touching, use a higher (brighter) setting. If characters are separated, use a lower (darker) setting. "
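The resolution rules and the 25 MB figure in the quoted tips can be checked with a little arithmetic. The sketch below is only an illustration, assuming Python 3.8+; the function names are hypothetical, the style labels are written as they appear in this thread, and treating "downsampled to 600 dpi or lower" as a cap of exactly 600 dpi is an assumption (Acrobat may choose a lower value).

```python
from typing import Optional

# OCR output styles as named in this thread (labels are an assumption).
OCR_STYLES = {"Searchable Image", "Searchable Image Exact", "ClearScan"}


def effective_resolution(dpi: int, output_style: Optional[str] = None) -> int:
    """Hypothetical check of the resolution rules quoted from the Help.

    Assumption: with an OCR output style, input above 600 dpi is treated
    as exactly 600 dpi (the Help only says "600 dpi or lower").
    """
    if not 10 <= dpi <= 3000:
        raise ValueError("Acrobat scanning accepts 10-3000 dpi")
    if output_style in OCR_STYLES:
        if dpi < 72:
            raise ValueError("OCR output styles need at least 72 dpi")
        return min(dpi, 600)
    return dpi  # OCR disabled: the full 10-3000 dpi range may be used


def uncompressed_bytes(width_in: float, height_in: float, dpi: int,
                       bits_per_pixel: int = 24) -> int:
    """Raw image size before compression (24-bit color by default)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    letter_300 = uncompressed_bytes(8.5, 11, 300)
    letter_600 = uncompressed_bytes(8.5, 11, 600)
    print(letter_300)               # 25245000 bytes, i.e. about 25 MB
    print(letter_600 / letter_300)  # 4.0: four times the pixels at 600 dpi
    print(uncompressed_bytes(8.5, 11, 300, bits_per_pixel=1))  # 1051875
```

The 24-bit result matches the "25 MB" in the tip, and doubling the resolution quadruples the pixel count, which fits the "about four times slower" remark. The 1-bit (black-and-white) figure is roughly 24 times smaller, one reason the tips say to scan in black and white whenever possible.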

Last edited by Marinolino; 06-19-2020 at 03:39 PM.
06-19-2020, 08:07 PM   #5
Bookchin
Enthusiast
 
Posts: 45
Karma: 2076068
Join Date: Apr 2017
Device: none





I use Text Recognition to OCR, but there is nothing labeled Clearscan. So are you saying that what used to be called Clearscan is still there, and they just call it Text Recognition now?
06-19-2020, 09:35 PM   #6
Marinolino
Groupie
 
Posts: 152
Karma: 1042664
Join Date: Feb 2018
Device: Kobo A1, iPad(s), Kindle DX, Hanvon E930
Quote:
Originally Posted by Bookchin
...



I use the text recognition to OCR, but there is nothing labeled Clearscan. So you're saying what used to be called Clearscan is still there, they just call it text recognition now?
In this picture the "In This File" option is grayed out (i.e. not available) because you have not yet opened your scanned PDF in Acrobat.

After you open your scanned PDF in Acrobat, you will be able to click "In This File". The pop-up window that opens shows your current OCR settings (language, exact image or Clearscan, resolution) and an "Edit" button that you can use to change them.

Or you can click the available "In Multiple Files" option and then choose your scanned file(s) and the desired settings there.

https://helpx.adobe.com/acrobat/11/u...nned_documents

Last edited by Marinolino; 06-19-2020 at 11:22 PM.
06-20-2020, 02:39 AM   #7
Bookchin
Enthusiast
 
Posts: 45
Karma: 2076068
Join Date: Apr 2017
Device: none
Ah...I see now. Thanks.

When I click Text Recognition, I get this pop-up menu. I've always just clicked 'OK' because I'm not trying to 'edit' the document, just OCR it. 'Edit' doesn't seem intuitive for doing OCR.

Clicking 'OK' goes ahead and OCRs the document. I've always just stopped there.





However, when I do click 'Edit', other options come up:

- Searchable Image
- Searchable Image Exact
- Clearscan





The Clearscan option...




I've always tried to reduce the PDF size using the PDF Optimizer in another tool menu. But, if I remember correctly, while it does reduce the file size, it strips off the OCR layer.

I'd like to reduce the file size while retaining high quality. I don't need a 50% size reduction, but 20-30% would be nice.

Last edited by Bookchin; 06-20-2020 at 02:41 AM. Reason: spelling
06-21-2020, 05:57 PM   #8
Marinolino
Groupie
 
Posts: 152
Karma: 1042664
Join Date: Feb 2018
Device: Kobo A1, iPad(s), Kindle DX, Hanvon E930
Quote:
Originally Posted by Bookchin
...

I'd like to reduce the file size while retaining high quality. I don't need a 50% size reduction, but 20-30% would be nice.
You should experiment with 300 and 600 dpi downsampling and compare the quality and the conversion speed, using e.g. 30-40 sample pages from your 500- or 1000-page PDF scan.
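Picking the 30-40 test pages so that they span the whole book should make the comparison more representative than taking the first few pages. A minimal sketch, assuming Python 3.8+ and a hypothetical helper `sample_pages`; spreading the sample evenly is my assumption, not part of the advice above. The chosen pages can then be extracted with any PDF editor before the OCR runs.

```python
from typing import List


def sample_pages(total_pages: int, sample_size: int = 35) -> List[int]:
    """Return sample_size 1-based page numbers spread evenly through a book.

    Spreading the sample (instead of taking the first pages) is meant to
    catch both plain-text and illustrated pages.
    """
    if total_pages < 1 or sample_size < 1:
        raise ValueError("total_pages and sample_size must be positive")
    if sample_size >= total_pages:
        return list(range(1, total_pages + 1))  # small book: use every page
    step = total_pages / sample_size  # > 1 here, so page numbers are distinct
    return [int(i * step) + 1 for i in range(sample_size)]


print(sample_pages(500, 40)[:5])  # [1, 13, 26, 38, 51]
```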

For my text books (without many pictures and graphics), Clearscan mode applied to 300 dpi input files is usually good enough for me, and it results in a 2-4 MB PDF per 100 pages, i.e. a 10-20 MB PDF for a 500-page book.
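The per-page rate scales linearly, which is where the 10-20 MB (500 pages) and 20-40 MB (1000 pages) figures in this thread come from. A small sketch, assuming the rate stays roughly constant across a book; the default rates are this one user's experience with text-heavy books, and the 50 MB and 37.5 MB inputs in the usage lines are illustrative values, not measurements.

```python
from typing import Tuple


def clearscan_size_range_mb(pages: int, low_per_100: float = 2.0,
                            high_per_100: float = 4.0) -> Tuple[float, float]:
    """Rough output size range, scaling the 2-4 MB per 100 pages reported above.

    Books with many pictures, graphs or small print will likely be larger.
    """
    return (pages / 100 * low_per_100, pages / 100 * high_per_100)


def reduction_percent(before_mb: float, after_mb: float) -> float:
    """Percentage saved when a file shrinks from before_mb to after_mb."""
    return (before_mb - after_mb) / before_mb * 100


print(clearscan_size_range_mb(500))   # (10.0, 20.0)
print(clearscan_size_range_mb(1000))  # (20.0, 40.0)
print(reduction_percent(50, 37.5))    # 25.0, inside a 20-30% target
```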

I'd quickly crop double-paged scans and trim their margins using Briss or k2pdfopt beforehand.

In the future, if you want even smaller and neater PDFs, you can use Scantailor before applying Clearscan OCR to automatically split double-paged scans, deskew and despeckle the scanned images, trim the margins, remove the background, etc.

https://www.youtube.com/watch?v=Edfs2_YJhx4

https://www.youtube.com/watch?v=dHZmTYTVL44
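Briss, k2pdfopt and Scantailor each use their own, more robust detection; the sketch below only illustrates the basic idea behind margin trimming: find the bounding box of the dark pixels on a grayscale page and crop to it with a little padding. It assumes the page is already deskewed and despeckled and is held as a plain list of rows of 0-255 values; the threshold of 128 and the padding are arbitrary choices, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Page = List[List[int]]  # grayscale rows, 0 = black, 255 = white


def content_bbox(page: Page, threshold: int = 128) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of pixels darker than threshold.

    bottom and right are exclusive, like Python slice ends.
    Returns None for a blank page.
    """
    rows = [y for y, row in enumerate(page) if any(p < threshold for p in row)]
    if not rows:
        return None
    cols = [x for x in range(len(page[0])) if any(row[x] < threshold for row in page)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def trim_margins(page: Page, padding: int = 2, threshold: int = 128) -> Page:
    """Crop a page to its content box plus padding, clamped to the page edges."""
    bbox = content_bbox(page, threshold)
    if bbox is None:
        return page  # nothing to trim on a blank page
    top, left, bottom, right = bbox
    top, left = max(0, top - padding), max(0, left - padding)
    bottom = min(len(page), bottom + padding)
    right = min(len(page[0]), right + padding)
    return [row[left:right] for row in page[top:bottom]]
```

Applying the same crop box to every page (or to all odd and all even pages) instead of trimming each page separately keeps the text from jumping around while flipping pages, which is roughly how Briss groups pages.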

Last edited by Marinolino; 06-22-2020 at 12:56 PM.
06-22-2020, 12:59 AM   #9
Bookchin
Enthusiast
 
Posts: 45
Karma: 2076068
Join Date: Apr 2017
Device: none
Quote:
Originally Posted by Marinolino
...

For my text books (without many pictures and graphics therein) Clearscan mode at 300 dpi is usually good enough for me, and it results in 2-4 MB pdf file per 100 pages i.e. 10-20 MB pdf for 500 page book.

I'd quickly crop double-paged scans and trim its margins using Briss or k2pdfopt beforehand.

Thanks. I read the Adobe instructions for scanning that you linked to, and I'm still not too sure what the difference is between Clearscan and Searchable Image, other than it says Clearscan replaces the fonts with closely related ones, or something.

Yes, I think 20MB per 500 page book is good enough for me, as well. I don't need to drastically reduce the file size, otherwise I would probably just convert them to DJVU files.

What do you mean by 'double-paged' scans?
06-22-2020, 10:30 AM   #10
Marinolino
Groupie
 
Posts: 152
Karma: 1042664
Join Date: Feb 2018
Device: Kobo A1, iPad(s), Kindle DX, Hanvon E930
Quote:
Originally Posted by Bookchin
Thanks. I read the Adobe instructions for scanning that you linked to, and I'm still not too sure what the difference is between Clearscan and Searchable Image, other than it says Clearscan replaces the fonts with closely related ones, or something.

...

What do you mean by 'double-paged' scans?
As I've suggested, the best way is to experiment yourself with different kinds of books, e.g. using 30-40 sample pages (scanned at 300 dpi and 600 dpi): first a book with a few pictures and normal-size letters, then another book with more pictures/graphs and smaller letters, etc., applying Clearscan and the other styles with 300 and 600 dpi downsampling.

Here is Clearscan mode compared to the exact image mode: it offers significantly improved visual quality of the text and averages three times smaller files at 300 dpi and seven times smaller at 600 dpi, for approximately the same processing time.

https://blogs.adobe.com/acrolaw/2009...rscan_is_smal/
R.Borstein's blog with Tips and Techniques
https://blogs.adobe.com/acrolaw/cate...r-recognition/

By a 'double-paged' scan I mean a book scanned with both pages (left and right) captured at once rather than separately, as shown from 2:30 in this useful Scantailor tutorial, where the double-page scans are automatically rotated and split in half.

Post-processing of scanned or photographed pages using Scan Tailor.
https://youtu.be/N1WK2J1Dr-s?t=144

https://vimeo.com/12524529
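At its simplest, splitting a double-page scan means cutting the image down the middle. Scantailor also detects the fold and rotates the pages; the sketch below assumes the spread is already straight and that the fold sits at the horizontal center. The image is a plain list of pixel rows, and `gutter` (pixels dropped on each side of the fold) is a hypothetical parameter.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixel values


def split_spread(spread: Image, gutter: int = 0) -> Tuple[Image, Image]:
    """Split a double-page scan into (left, right) halves at the center.

    gutter: pixels to discard on each side of the fold, to drop the dark
    shadow that book scans often show along the spine.
    """
    width = len(spread[0])
    mid = width // 2
    left = [row[: max(0, mid - gutter)] for row in spread]
    right = [row[mid + gutter:] for row in spread]
    return left, right
```

A fixed center cut works for scans taken with the book consistently centered on the glass; when the fold drifts from page to page, a tool that detects it (Scantailor, Briss) is the better choice.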

Quick and easy pdf splitting and cropping using Briss
https://superuser.com/questions/7910...e-pass/1303929

https://www.youtube.com/watch?v=4Wp4RIYUqC8
https://www.youtube.com/watch?v=TWfPWUf5y-s

Abbyy would split the scans automatically, and we can also split pages in Acrobat, Foxit or another PDF editor using the Print/Tile function (as shown in the video below), but I'd rather use Briss, k2pdfopt or Scantailor for their additional cropping capabilities.

Page splitting using Print/Tile function
https://www.youtube.com/watch?v=PYlk3FtfZSU
JavaScript plugin to automatically split pages in Adobe Acrobat
https://www.mobileread.com/forums/sh...56&postcount=3
https://www.mobileread.com/forums/sh...15&postcount=8

PDF cropping in Acrobat, or in a PDF editor with similar cropping options (all pages, just selected pages, or even and odd pages separately)
https://www.youtube.com/watch?v=-HIIqj-p3Kk

We can also use Acrobat's cropping tool to split a whole double-page PDF.
First, crop the odd and even pages and save them as separate files. Then split those two files into separate PDFs using the Split command together with the "Output Options" button, choosing "Add label - before original name" there. That way we can quickly and easily merge (concatenate) them in the original order (part_1_left.pdf, part_1_right.pdf, part_2_left.pdf, etc.) instead of using the tedious manual approach in the 3rd step of the video below, or a third-party renamer app to rename the files:

https://www.youtube.com/watch?v=FDDt0hQTPKk
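Once the files carry a numbered label, putting them back in reading order is a sort on (number, left before right). The sketch assumes the names follow the part_<n>_<left|right>.pdf pattern used in the post above; Acrobat's actual "Add label" output may look different, in which case the regular expression would need adapting. The merge itself would still be done in Acrobat or another PDF tool.

```python
import re
from typing import List, Tuple

# Assumed naming pattern, taken from the example names in the post.
_PATTERN = re.compile(r"part_(\d+)_(left|right)\.pdf$")


def merge_order(filenames: List[str]) -> List[str]:
    """Sort split page files into reading order.

    part_1_left, part_1_right, part_2_left, ... with the part number
    compared numerically, so part_10 comes after part_9.
    """
    def key(name: str) -> Tuple[int, int]:
        m = _PATTERN.search(name)
        if m is None:
            raise ValueError(f"unexpected file name: {name}")
        return int(m.group(1)), 0 if m.group(2) == "left" else 1

    return sorted(filenames, key=key)


files = ["part_10_left.pdf", "part_2_right.pdf", "part_1_right.pdf",
         "part_2_left.pdf", "part_1_left.pdf"]
print(merge_order(files))
```

The numeric key matters: a plain alphabetical sort would put part_10_left.pdf ahead of part_2_left.pdf, which scrambles the page order as soon as a book has ten or more parts.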

Last edited by Marinolino; 06-24-2020 at 11:21 AM.
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.