So, I noticed another oddity in my FiMFiction collection after doing a bunch of metadata cleanup; a story that I knew was quite old was listed as having been first published only earlier this month. Specifically,
http://www.fimfiction.net/story/4464/ was listed as first published on August 25, 2013, when the page says the first chapter went up almost a year ago, on October 29, 2012. I searched around and found some others that were way off too, e.g.
http://www.fimfiction.net/story/14392/ (FFDL says August 2013, page says March 2012) or
http://www.fimfiction.net/story/2702/ (FFDL says August 3, 2013, page says July 19, 2012).
It looks like FiMFiction's API results have gone strange again. The API's metadata matches what FFDL reports, at least if you read it as a UNIX timestamp (not sure what else it could be), so FFDL seems to be passing along whatever date the API gives it. I felt kind of embarrassed that my previous bug report turned out to be a bug in Calibre itself rather than FFDL, so this time I went into adapter_fimfictionnet.py and figured out how to fix my own problem so it wouldn't be a hassle for anyone else.
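For anyone who wants to repeat the timestamp check, here is a minimal sketch of reading a value as a UNIX timestamp. The function name and the example value are made up for illustration (the value is not copied from the real API), and it assumes the API reports seconds since the epoch in UTC.

```python
from datetime import datetime, timezone


def timestamp_to_date(raw_value):
    """Interpret a value as seconds since the UNIX epoch (UTC) and return the date."""
    return datetime.fromtimestamp(int(raw_value), tz=timezone.utc).date()


# Hypothetical value chosen for illustration, not taken from the real API.
example_value = 1377388800
print(timestamp_to_date(example_value))  # 2013-08-25
```

The change to adapter_fimfictionnet.py itself is below.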
Code:
from datetime import datetime  # needed for datetime.now(), if not already imported
import dateutil.parser as dparser
...
oldestChapter = datetime.now()
# Scan all chapters to find the oldest. On FiMFiction it's possible for authors to insert
# new chapters out-of-order or change the dates of earlier ones by editing them.
for dateSpan in soup.findAll('span', {'class':'date'}):
    # The date text is the second child node of the span
    rawChapterDate = dateSpan.contents[1].strip()
    chapterDate = dparser.parse(rawChapterDate)
    if chapterDate < oldestChapter:
        oldestChapter = chapterDate
# Set once, after every chapter has been checked
self.story.setMetadata("datePublished", oldestChapter)
This appears to correctly dig out the oldest chapter posting date from the story's description page and use that as the date published. I updated the metadata on 800 FiMFiction stories in my collection with no errors or apparent weirdness.
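The "keep the oldest date" logic can also be tried on its own, outside the adapter. The sketch below is a standalone illustration, not the adapter code: the function name is hypothetical, and it assumes the dates are already available as strings in a known format (here `%Y-%m-%d`, which is not what the site shows), so it uses the standard library's `strptime` instead of dateutil.

```python
from datetime import datetime


def oldest_date(raw_dates, fmt="%Y-%m-%d"):
    """Return the earliest date among the given strings, or None if there are none.

    The format is an assumption for this sketch; the adapter snippet uses
    dateutil's parser so that it doesn't have to guess the format.
    """
    parsed = [datetime.strptime(raw.strip(), fmt) for raw in raw_dates]
    return min(parsed, default=None)


# Chapter dates listed out of order, as can happen when an author inserts chapters.
print(oldest_date(["2013-08-25", "2012-10-29", "2013-01-03"]))
```

Returning None for an empty list is a choice made for this sketch; the adapter snippet falls back to the current time in that case.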
And then, while I was high on Python, I went and added an extra bit of site-specific metadata I've been thinking about for a while:
Code:
import HTMLParser
...
rawGroupList = soup.find('ul', {'id':'story_group_list'})
if rawGroupList is not None:
    for groupName in rawGroupList.findAll('a', {'href':re.compile('^/group/')}):
        groupString = HTMLParser.HTMLParser().unescape(groupName.string)
        if not isinstance(groupString, basestring):
            groupString = unicode(groupString)
        self.story.addToList("groups", groupString)
(I don't know if that HTML unescape step is really necessary; there was a bunch of escaped HTML in the group titles of the test story I was playing with, and I figured better safe than sorry.)
This also needs "groups" added to plugin-defaults.ini's list of extra metadata for this site, of course. I tested this with those 800 stories too and didn't get any screwy group titles or unhandled errors out of it.
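To show what the unescape step does to a group title, here is a small standalone sketch. It is written for Python 3, where the standard library provides `html.unescape` (the `HTMLParser().unescape` method used in the snippet above is the Python 2 way of doing it). The function name and the sample titles are invented for illustration.

```python
import html


def clean_group_title(raw_title):
    """Turn HTML entities in a group title back into plain characters."""
    return html.unescape(raw_title).strip()


# Made-up titles containing the kind of escaped HTML described above.
for title in ["Twilight &amp; Friends", "&quot;Best&quot; Stories", "It&#39;s Magic"]:
    print(clean_group_title(title))
```

Titles with no entities in them pass through unchanged, so leaving the step in should be harmless even when it isn't strictly needed.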
I hope these snippets are up to standard, and make up for my lazy bug reporting a few days back.