Quote:
Originally Posted by Karellen
|
The exact version of Scan Tailor Advanced I use is by 4lex4:
v1.0.16 was the latest (in 2018).
- - -
Side Note: In September 2019 there was an "Early Access" version, and then it seems like there hasn't been much activity since.
I think, since the 2019 stall, some other person created another fork of it here:
but I have no idea about that fork or what sorts of bugs/fixes have been done since.
- - -
Side Note #2: Looks like you linked to the original "Scan Tailor".
"Scan Tailor Advanced" took all the forks, pulled out all the best features, and combined them all into one super version.
The biggest features for me were:
- multi-core support (so it runs MUCH faster than the original)
- image formats besides TIFF
+ lots of other helpful things, all listed on their GitHub.
- - -
Quote:
Originally Posted by Karellen
It seems quite old, [...]
|
Doesn't matter. It's only used as a middle, pre-processing stage where you are cleaning up the raw images.
I don't foresee too much changing on that front any time soon.
- You feed it the raw photos/scans.
- It helps crop + fix the warping + normalize the B&W/grayscale/colors.
- Then you shove those images into OCR.
You can see me apply it in:
where I quickly:
- took an Archive.org PDF
- ran it through Scan Tailor Advanced
- OCRed it in FineReader
- exported it as EPUB + ran my regex on it.
You can compare my quickly-generated EPUB vs. the auto-generated Archive.org "EPUB".
Still, nowhere near as good as a manually corrected version, but WAY better quality than just spitting out raw text right out of the PDF.
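As a rough illustration of what a regex cleanup pass on OCR output can look like: the sketch below is in Python and only shows a few common fixes (rejoining words split across line breaks, unwrapping lines inside a paragraph, tidying stray spaces). These are not the actual expressions used in the workflow above, and the function name `clean_ocr_text` is made up for the example.

```python
import re


def clean_ocr_text(text: str) -> str:
    """Apply a few conservative regex fixes to OCR-exported plain text.

    Hypothetical helper for illustration only; real cleanup lists are
    usually longer and tuned to the specific book.
    """
    # Rejoin words hyphenated across a line break: "exam-\nple" -> "example".
    # Caveat: this also merges genuine compounds like "well-\nknown".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Unwrap single newlines inside a paragraph; blank lines (paragraph
    # breaks) are left alone.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces or tabs into one space.
    text = re.sub(r"[ \t]{2,}", " ", text)
    # Drop stray spaces before punctuation: "dog ." -> "dog."
    text = re.sub(r" +([,.;:!?])", r"\1", text)
    return text.strip()


if __name__ == "__main__":
    sample = "The quick brown fox jum-\nped over the  lazy dog .\n\nNext para-\ngraph starts here."
    print(clean_ocr_text(sample))
```

The order matters a little: hyphenated line breaks are repaired before lines are unwrapped, otherwise "jum-" and "ped" would end up joined with a space in between.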
Quote:
Originally Posted by Karellen
[...] and you mention generational leaps to the OP, so I wonder if the same applies to this software.
|
Yes, generational leaps in the OCR.
Even on the free/open-source front, there's been a lot of action, but I haven't been following that too closely... Because those tools tended to:
- focus on generating only the raw plaintext, ditching all the important formatting!
- be command-line only.
- (Or have really crappy GUIs.)
That's okay if you are working on a small amount... but when you have to manually tweak/correct/mark pages, a great GUI is key. It will save you so much pain further down the line.