I can only agree. Performance-wise there's a severe limitation on common hardware, measured in tokens per second... I remember some local models being painfully slow to process even simple requests. And then there's the other limitation you mentioned on a similar thread: "they can only be prompted with very limited amounts of text at the moment."
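
For what it's worth, here's a rough sketch of how I'd put a number on "tokens per second" locally. It assumes llama-cpp-python and a GGUF model on disk; the model path and generation settings are just placeholders:

  # Rough sketch: measure local generation speed in tokens per second.
  # Assumes llama-cpp-python is installed and a GGUF model exists at the
  # given path -- both the path and the settings below are placeholders.
  import time

  from llama_cpp import Llama

  llm = Llama(model_path="models/some-model.gguf", n_ctx=2048, verbose=False)

  prompt = "Explain what a context window is in one sentence."

  start = time.perf_counter()
  out = llm(prompt, max_tokens=128)
  elapsed = time.perf_counter() - start

  # The completion dict reports how many tokens were actually generated.
  completion_tokens = out["usage"]["completion_tokens"]
  print(f"{completion_tokens} tokens in {elapsed:.1f}s "
        f"-> {completion_tokens / elapsed:.1f} tokens/s")

On a CPU-only machine the result can easily be single-digit tokens/s, which is exactly the "painfully slow" feeling.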