MobileRead Forums - View Single Post

Quoth · 05-29-2021, 08:38 AM

I mean the badly OCRed scans are the source of the problem, not copyright. The Open Library and the other copyright shenanigans at the Archive have nothing to do with the ghastly mobi/epub quality. They have been scanning paper books themselves for about 12 years, as well as sourcing from Google, Microsoft and uploaders. The problem is that none of it is human curated or proofread. It's automated.

I just set up a Linux box last night with a 20-year-old Epson Perfection 1200 on SCSI, plus Tesseract and gocr*. The newish funky colour laser printer-copier-scanner isn't obviously better, and it's also downstairs.
I have some books from the 1890s to 1920s, but I'm probably more interested in OCRing PD (public domain) PDFs already scanned elsewhere.

Yes, I know about ABBYY FineReader, but I don't have it.

I couldn't find any sort of SCSI adaptor for the laptop. I used to have a PCMCIA card and a laptop that could take them.

[* XSane seems to want gocr, but 15 years ago I would have saved the scans, adjusted them in Paint Shop Pro and run the OCR on the files. I can't see why I'd do it from inside XSane, even though I have a sheetfeeder.]
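For anyone wanting the older save-then-OCR-the-files workflow rather than OCR from inside XSane, a minimal sketch might look like the following. It assumes the Tesseract command-line tool is installed and on PATH, that English language data (`eng`) is present, and that each page has been saved as its own image file with zero-padded names so that alphabetical order is page order. The function names `tesseract_command` and `ocr_folder` and the list of file suffixes are hypothetical, chosen just for this example; gocr is not used here.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical choice of which saved scan formats to pick up.
SCAN_SUFFIXES = {".png", ".tif", ".tiff", ".pbm", ".pgm", ".jpg"}


def tesseract_command(image: Path, lang: str = "eng") -> list[str]:
    """Build the CLI call: `tesseract <image> <outputbase> -l <lang>` writes <outputbase>.txt."""
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang]


def ocr_folder(folder: Path, lang: str = "eng", dry_run: bool = False) -> list[list[str]]:
    """OCR every saved scan in `folder`, one text file per image, in file-name order."""
    commands = [
        tesseract_command(p, lang)
        for p in sorted(folder.iterdir())
        if p.suffix.lower() in SCAN_SUFFIXES
    ]
    if commands and not dry_run:
        # Fail early with a clear message rather than on the first subprocess call.
        if shutil.which("tesseract") is None:
            raise RuntimeError("tesseract is not on PATH")
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop if any page fails
    return commands
```

Calling `ocr_folder(Path("scans"), dry_run=True)` only lists the commands it would run, which is handy for checking the page order before letting it loose on a whole book. Any clean-up (deskewing, contrast) would still happen beforehand in an image editor, as in the old Paint Shop Pro routine.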

#4 · Last edited by Quoth; 05-29-2021 at 08:44 AM.