Old 08-02-2015, 07:08 PM   #9
markom
I have tried the 14-day trial version of the Debenu plug-in for Acrobat on several scanned PDFs (with an OCR layer in the background, saved both as "exact image" and as ClearScan).

https://www.youtube.com/watch?v=QaFL...wWlvmE0w1tCisz

https://www.youtube.com/user/Debenu/videos

The demo version stamps watermarks across every page, but we can just delete them and use the result as we would with the normal version. We can delete them all at once using Acrobat's Preflight tool and its layer separation for knock-outs, or manually, page by page, with some other PDF editor.

http://download.cnet.com/Debenu-PDF-...-10317359.html

It is very easy to use and very quick at hyperlinking the index pages (the links can be drawn as visible rectangles around the numbers or left invisible). We just have to specify the page range, and in five to ten seconds it will link thousands of entries across, e.g., 30-40 index pages. The problem is that it links to the PDF page number only, ignoring the book's own pagination. So if the page with arabic number 1 in the scanned book is page 20 of the PDF (because pages with roman numerals come first), every link will be off by 19 pages, and this cannot be easily rectified because there is currently no option for it.
So we simply have to align page 1 of the PDF with arabic page 1 of the book by deleting the front pages, and then merge them back after hyperlinking if we need them (I've tried it and it worked that way).
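To make the offset arithmetic concrete, here is a small Python sketch. It has nothing to do with the plug-in itself (which, as said, has no such option); the function name `pdf_page_for` and its parameters are made up for illustration only.

```python
def pdf_page_for(printed_page: int, first_arabic_pdf_page: int) -> int:
    """Return the 1-based PDF page that shows the given printed (arabic) page.

    first_arabic_pdf_page is the PDF page on which the book's arabic
    page 1 appears (20 in the example from the post).
    """
    if printed_page < 1:
        raise ValueError("printed page numbers start at 1")
    # Number of front-matter pages (roman numerals, covers, etc.)
    offset = first_arabic_pdf_page - 1
    return printed_page + offset


# Book page 1 sits on PDF page 20, so the offset is 19 pages.
print(pdf_page_for(1, 20))    # 20
print(pdf_page_for(120, 20))  # 139
```

Deleting the 19 front pages before linking just makes this offset zero, which is why the workaround works.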

I guess there would still be problems with scanned PDFs that contain pictures or empty pages that are not numbered in the scanned book (e.g. page 100 with text, then an unnumbered page with a picture, then page 101 with text), since every link after such a page would be off by one more.
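A rough sketch of how the correct targets could be worked out when some pages carry no number, assuming we have noted by hand which PDF pages are the unnumbered ones. Everything here (`build_page_map`, the parameter names, the sample numbers) is hypothetical and not part of Acrobat or the plug-in.

```python
def build_page_map(total_pdf_pages, first_arabic_pdf_page, unnumbered_pdf_pages):
    """Map printed page number -> 1-based PDF page.

    unnumbered_pdf_pages lists the PDF pages (pictures, blanks) that
    carry no printed number and therefore do not advance the count.
    """
    skip = set(unnumbered_pdf_pages)
    page_map = {}
    printed = 1
    for pdf_page in range(first_arabic_pdf_page, total_pdf_pages + 1):
        if pdf_page in skip:
            continue  # picture or blank page: no printed number
        page_map[printed] = pdf_page
        printed += 1
    return page_map


# Front pages already removed, so printed page 1 is PDF page 1.
# PDF page 101 is an unnumbered picture between printed pages 100 and 101.
page_map = build_page_map(110, 1, [101])
print(page_map[100])  # 100
print(page_map[101])  # 102
```

With such a map, every entry after the picture page would point one page further on than a plain "printed number = PDF page" link does.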

Recognition mistakes were also not infrequent in some ClearScan-ed scans, e.g. 120 was linked as 1 and 20 separately, although the OCR layer clearly reads 120, whereas several scans OCR-ed as "exact image" had no such mistakes.
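One way to spot such split numbers would be to compare the numbers in the OCR text of an index line with the link targets that were created for it. This is only a sketch: it assumes we already have both as plain data (how to export them is not covered here), and `check_index_line` is a made-up helper.

```python
import re


def check_index_line(ocr_text, linked_targets):
    """Compare page numbers in the OCR text with the created link targets.

    Returns (missing, unexpected): numbers present in the text but not
    linked, and link targets that do not appear in the text.
    """
    expected = [int(n) for n in re.findall(r"\d+", ocr_text)]
    missing = [n for n in expected if n not in linked_targets]
    unexpected = [n for n in linked_targets if n not in expected]
    return missing, unexpected


# The OCR layer reads "120", but the links went to pages 1 and 20.
missing, unexpected = check_index_line("Aristotle 45, 120", [45, 1, 20])
print(missing)     # [120]
print(unexpected)  # [1, 20]
```

Any line with a non-empty result would be a candidate for fixing by hand.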

We can hyperlink the Index pages and the TOC at the same time, but if we also want to hyperlink the TOC text lines (not just the page numbers), we have to do that manually, by drawing the rectangles in Acrobat.

Example: Hyperlinked TOC numbers (using the blue underlines with a medium width)

[Image violates Posting Guidelines for size - MODERATOR]


If there is already a TOC in the PDF, we can use it to create bookmarks automatically and, vice versa, create a clickable TOC page from the bookmarks.

https://www.youtube.com/watch?v=UqFDl36-o3g

---------

So those without Adobe Acrobat could send a PDF to an acquaintance who has it and ask them to install this or a newer trial of the Debenu plug-in, link the TOC and Index pages (using the Create-Page-Links tool), and then delete all the demo watermarks just as quickly and easily using Acrobat's Preflight/layer-separation tool.

Last edited by Dr. Drib; 08-21-2015 at 02:01 PM.