03-15-2014, 05:17 AM   #10
Tex2002ans
Wizard
Quote:
Originally Posted by AIDM2
So, there's a book that has Sanskrit text at the bottom of almost every page (see image).
Also, here is a link to the Archive.org version:

https://archive.org/details/materiamedicahi00kinggoog

Quote:
Originally Posted by AIDM2
Would it be possible to program ABBYY FineReader to ignore every instance of that text?
You will have to manually intervene in those cases.

You can either (ranked from worst to best):
  • Ignore those sections and clean up the garbage text after you export
  • You can manually delete the garbage text from the Text Window (right side)
    [Images: SanskritDelete.png, SanskritDelete2.png]
  • You can resize the "Text Recognition Box" to not include the footnote
    • Don't forget to "Read" the page again.
    [Image: SanskritResize.png]
  • You can resize the text box, and then mark the footnote as an Image instead.
    • I would choose this method.
    • Don't forget to "Read" the page again.
    [Image: SanskritImage.png]

Marking the Sanskrit footnotes as images will allow you to export them all. From there you can feed those images into an OCR tool that can read Sanskrit, transcribe the text manually (which is easier with the footnotes isolated), or, as a last resort, embed the Sanskrit in the book as images (although I HIGHLY recommend against embedding text as images).
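
If you go the "feed the images into another OCR tool" route, here is a rough sketch of how the batch step might look. This is only an illustration, not something from FineReader itself: it assumes you have the open-source Tesseract command-line tool installed along with its Sanskrit language data (`san`), and that the footnote images were exported as PNG files. The function names and folder names are made up for the example.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="san"):
    """Return the argv for: tesseract <image> <outputbase> -l <lang>."""
    # Assumption: "san" is the installed Tesseract language code for Sanskrit.
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_exported_footnotes(image_dir, out_dir, lang="san", run=subprocess.run):
    """Run OCR on every PNG in image_dir and return the commands issued."""
    image_dir, out_dir = Path(image_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    commands = []
    for image in sorted(image_dir.glob("*.png")):
        # Tesseract appends ".txt" to the output base on its own.
        command = build_tesseract_command(image, out_dir / image.stem, lang)
        run(command, check=True)  # raises if Tesseract exits with an error
        commands.append(command)
    return commands
```

Calling `ocr_exported_footnotes("sanskrit_footnotes", "sanskrit_text")` (hypothetical folder names) would write one text file per exported footnote image. Expect to proofread the results by hand; OCR on old printed Devanagari is unlikely to come out clean.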

Last edited by Tex2002ans; 03-15-2014 at 06:53 AM.