05-29-2021, 08:38 AM   #4
Quoth
Still reading
Posts: 14,414
Karma: 107078855
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
I mean the bad OCRed scans are the source of the problem, not copyright. The Open Library and other copyright shenanigans at Archive have nothing to do with the ghastly mobi/epub quality. They have been scanning paper books themselves for about 12 years, as well as sourcing scans from Google, Microsoft and uploaders. The problem is that none of it is human curated or proofed. It's automated.

I just set up a Linux box last night with a 20 year old Epson Perfection 1200 on SCSI, plus Tesseract and gocr*. The newish funky colour laser printer-copier-scanner is not obviously better and is also downstairs.
I have some 1890s to 1920s books, but I'm likely more interested in OCR of PD PDFs already scanned elsewhere.

Yes, I know about ABBYY FineReader. But I don't have it.

I couldn't find any sort of SCSI adaptor for the laptop. I used to have a PCMCIA card and a laptop that could take them.

[* Xsane seems to want gocr, but 15 years ago I would have saved the scans, adjusted them in PaintShopPro and run the OCR on the files. I can't imagine why I'd do it from inside Xsane, even though I have a sheet feeder]
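
For anyone wanting to try the "save the scans first, then OCR the files" route with Tesseract rather than from inside Xsane, here is a rough sketch in Python. It only assumes the `tesseract` command is on the PATH and that the scans were saved as PNG files in one folder; the function names, the `scans` folder and the `*.png` pattern are made up for illustration, and the language defaults to `eng`.

```python
from pathlib import Path
import shutil
import subprocess


def tesseract_command(image, lang="eng"):
    """Build the command line: tesseract <image> <output base> -l <lang>.

    Tesseract adds ".txt" to the output base itself, so page001.png
    ends up as page001.txt next to the scan.
    """
    image = Path(image)
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang]


def ocr_directory(scan_dir, pattern="*.png", lang="eng"):
    """Run Tesseract over every saved scan in scan_dir, in filename order."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on the PATH")
    outputs = []
    for image in sorted(Path(scan_dir).glob(pattern)):
        # check=True stops the batch at the first page Tesseract rejects
        subprocess.run(tesseract_command(image, lang), check=True)
        outputs.append(image.with_suffix(".txt"))
    return outputs


if __name__ == "__main__":
    # "scans" is a hypothetical folder of already adjusted page images
    for text_file in ocr_directory("scans"):
        print(text_file)
```

Keeping the scans as files means the images can be deskewed or cleaned up in an editor first, and the OCR can be rerun later without rescanning. The text that comes out still needs human proofing.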

Last edited by Quoth; 05-29-2021 at 08:44 AM.