Quote:
Originally Posted by bjo
But isn't Calibre effectively a library of regexps for searching and replacing particular patterns?
|
No. Regexes are used when necessary, but they are often avoided because they are slow. Tree processing, stream processing, and looping techniques are all used. That statement also doesn't take into account the fact that many formats are binary files that need to be read and written. It only looks at it from the perspective of shifting text markup.
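To give a rough idea of what "tree processing" means next to a regex, here is a minimal sketch in plain Python. It is not calibre code: the sample markup and the function names `to_italic_regex` and `to_italic_tree` are made up for illustration, and it uses the standard library's `xml.etree.ElementTree` rather than whatever parser calibre uses internally. It doesn't measure speed; it only shows the same job done both ways.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample markup, not taken from any real book.
SAMPLE = '<p><span class="italic">a <span class="bold">b</span> c</span></p>'


def to_italic_regex(markup: str) -> str:
    """Regex approach: rewrite italic spans as <i> by pattern matching on text."""
    # The lazy match stops at the first </span>, which here belongs to the
    # nested bold span, so the output ends up mis-nested.
    return re.sub(r'<span class="italic">(.*?)</span>', r"<i>\1</i>", markup)


def to_italic_tree(markup: str) -> str:
    """Tree approach: parse the markup, then edit the matching elements."""
    root = ET.fromstring(markup)
    # Copy the matches into a list first so renaming tags cannot disturb iteration.
    for el in list(root.iter("span")):
        if el.get("class") == "italic":
            el.tag = "i"
            del el.attrib["class"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_italic_regex(SAMPLE))
    print(to_italic_tree(SAMPLE))
```

With nested spans the regex version closes the `<i>` at the wrong place, while the tree version renames the element itself and leaves the nesting alone.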
Quote:
Originally Posted by bjo
And couldn't it be 'opened up' so the user can preview each pattern Calibre thinks it's detected, and how it will be modified, and tweak this if it's not quite right? Or submit new patterns to its database?
|
calibre does most of the processing itself but also incorporates third-party tools to help. PDF input, for instance, is first handled by pdftohtml. It would be possible to open it up more. That said, it's an open source application, and you can already add your own processing plugins at various stages.
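As a rough picture of what "plugins at various stages" can look like, here is a small self-contained sketch. Everything in it is hypothetical: the `Pipeline` class, the stage names, and the two example hooks are invented for illustration and are not calibre's actual plugin API, which you should look up in the calibre documentation.

```python
from typing import Callable, Dict, List

# A hook takes the document text and returns the modified text.
Hook = Callable[[str], str]


class Pipeline:
    """Hypothetical staged conversion pipeline; NOT calibre's real plugin API."""

    STAGES = ("preprocess", "convert", "postprocess")

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Hook]] = {stage: [] for stage in self.STAGES}

    def register(self, stage: str, hook: Hook) -> None:
        """Attach a user-supplied hook to one of the named stages."""
        if stage not in self._hooks:
            raise ValueError(f"unknown stage: {stage}")
        self._hooks[stage].append(hook)

    def run(self, text: str) -> str:
        """Run every registered hook, stage by stage, in registration order."""
        for stage in self.STAGES:
            for hook in self._hooks[stage]:
                text = hook(text)
        return text


def strip_trailing_spaces(text: str) -> str:
    """Example preprocess hook: drop trailing whitespace on each line."""
    return "\n".join(line.rstrip() for line in text.splitlines())


def wrap_paragraphs(text: str) -> str:
    """Example convert hook: wrap each non-empty line in <p> tags."""
    return "".join(f"<p>{line}</p>" for line in text.splitlines() if line)


pipeline = Pipeline()
pipeline.register("preprocess", strip_trailing_spaces)
pipeline.register("convert", wrap_paragraphs)

if __name__ == "__main__":
    print(pipeline.run("Hello  \n\nWorld "))
```

The point of the design is that a user can drop in their own step at a given stage without touching the core conversion code.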