07-26-2010, 06:22 PM   #5
user_none
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Originally Posted by bjo
But isn't Calibre effectively a library of regexps for searching and replacing particular patterns?
No. Regexes are used when necessary, but they are often avoided because they are slow. Tree processing, stream processing, and looping techniques are all used. That statement also doesn't take into account the fact that many formats are binary files that need to be read and written. It only looks at it from the perspective of shifting text markup.
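
To illustrate the difference between regex and tree processing, here is a minimal sketch. It is not calibre's code; the function names and the sample markup are made up for the example, and it only uses Python's standard library.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample markup, not taken from any real book.
SAMPLE = '<div><p class="a">One <b>bold</b> word</p><p>Two</p></div>'


def rename_tags_regex(markup, old, new):
    """Regex approach: pattern-match on the raw text of each tag."""
    # The lookahead stops "b" from also matching "body", "br", etc.
    pattern = re.compile(r'<(/?)' + re.escape(old) + r'(?=[\s>/])')
    return pattern.sub(lambda m: '<' + m.group(1) + new, markup)


def rename_tags_tree(markup, old, new):
    """Tree approach: parse once, then walk the element tree."""
    root = ET.fromstring(markup)
    for elem in root.iter(old):
        elem.tag = new
    return ET.tostring(root, encoding='unicode')


print(rename_tags_regex(SAMPLE, 'b', 'strong'))
print(rename_tags_tree(SAMPLE, 'b', 'strong'))
```

Both calls give the same result on this input. The regex version has to be written carefully to avoid matching the wrong tags, while the tree version works on the parsed structure and never looks at the raw text again.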

Quote:
Originally Posted by bjo
And couldn't it be 'opened up' so the user can preview each pattern Calibre thinks it's detected, and how it will be modified, and tweak this if it's not quite right? Or submit new patterns to its database?
calibre does most of the processing itself but also incorporates third-party tools to help. PDF input, for instance, is first handled by pdftohtml. It would be possible to open it up more, but it's already an open source application, and you can add your own processing plugins at various stages.
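
As a rough idea of what "plugins at various stages" means, here is a generic sketch of a staged pipeline with user hooks. This is not calibre's plugin API: the Pipeline class, the stage names, and the stand-in conversion step are all hypothetical, chosen only to show the shape of the idea.

```python
from typing import Callable, Dict, List

Hook = Callable[[str], str]


class Pipeline:
    """Hypothetical conversion pipeline with user hooks at two stages."""

    STAGES = ("preprocess", "postprocess")

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Hook]] = {s: [] for s in self.STAGES}

    def register(self, stage: str, hook: Hook) -> None:
        if stage not in self._hooks:
            raise ValueError(f"unknown stage: {stage}")
        self._hooks[stage].append(hook)

    def run(self, text: str) -> str:
        # User hooks run before the built-in step...
        for hook in self._hooks["preprocess"]:
            text = hook(text)
        text = self._convert(text)
        # ...and again after it.
        for hook in self._hooks["postprocess"]:
            text = hook(text)
        return text

    @staticmethod
    def _convert(text: str) -> str:
        # Stand-in for the built-in conversion: wrap non-empty lines in <p>.
        return "".join(
            f"<p>{line}</p>" for line in text.splitlines() if line.strip()
        )


pipeline = Pipeline()
pipeline.register("preprocess", lambda t: t.replace("\t", " "))
pipeline.register("postprocess", lambda t: t.replace("<p>", '<p class="x">'))
print(pipeline.run("a\tb\n\nc"))
```

The point is only that a user can change the input before the main step or the output after it without touching the main step itself.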