HTML and TXT are the most problematic.
But then again, what would a human do? Start looking into the file, and near the top you'd find: Author... Title...
We could probably come up with a program, nothing too AI-heavy, that does some pattern matching to figure out the basic metadata.
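
Something along these lines might do for a first pass. It is only a rough sketch: the function name `guess_metadata`, the 40-line cutoff, and the patterns themselves are my own assumptions, and real files will certainly need more variants (different attribute order in the `<meta>` tag, other languages, and so on).

```python
import re

# Assumed patterns for plain-text headers such as "Title: ..." or "by ...".
TXT_PATTERNS = {
    "title": re.compile(r"^\s*title\s*:\s*(.+?)\s*$", re.IGNORECASE),
    "author": re.compile(r"^\s*(?:author\s*:|by\s)\s*(.+?)\s*$", re.IGNORECASE),
}

# Assumed patterns for HTML; this expects name= to come before content=.
HTML_TITLE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
HTML_AUTHOR = re.compile(
    r"<meta\s+name=[\"']author[\"']\s+content=[\"'](.*?)[\"']", re.IGNORECASE
)


def guess_metadata(text, max_lines=40):
    """Guess title and author from the top of a TXT or HTML file."""
    meta = {}
    # Only look near the top, like a human skimming the file would.
    head = "\n".join(text.splitlines()[:max_lines])

    # Try the HTML tags first.
    match = HTML_TITLE.search(head)
    if match:
        meta["title"] = " ".join(match.group(1).split())
    match = HTML_AUTHOR.search(head)
    if match:
        meta["author"] = match.group(1).strip()

    # Fall back to "Title: ..." / "Author: ..." style lines.
    for line in head.splitlines():
        for field, pattern in TXT_PATTERNS.items():
            if field not in meta:
                match = pattern.match(line)
                if match:
                    meta[field] = match.group(1)
    return meta
```

Anything the patterns miss simply stays out of the returned dict, so a file with no recognizable header gives back `{}` and can be flagged for a human to look at.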
After that, all that's needed is to throw them into a table, which is not a big deal.
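
For the table part, a minimal sketch using SQLite from the Python standard library could look like the following. The table name `documents`, its columns, and the helper `store_metadata` are made up for illustration; it assumes each file's metadata is already a dict with optional `title` and `author` keys.

```python
import sqlite3


def store_metadata(records, db_path=":memory:"):
    """Store (filename, metadata dict) pairs in a simple table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "filename TEXT PRIMARY KEY, title TEXT, author TEXT)"
    )
    # Missing fields become NULL, so incomplete files are easy to find later.
    conn.executemany(
        "INSERT OR REPLACE INTO documents VALUES (?, ?, ?)",
        [(name, meta.get("title"), meta.get("author")) for name, meta in records],
    )
    conn.commit()
    return conn
```

A query like `SELECT filename FROM documents WHERE title IS NULL` would then list the files where the guessing failed.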