Old 09-05-2013, 04:31 AM   #1769
FaceDeer
So, I noticed another oddity in my FiMFiction collection after doing a bunch of metadata cleanup: a story that I knew was quite old was listed as having been first published only recently. Specifically, http://www.fimfiction.net/story/4464/ was listed as first published on August 25 2013, when the page says the first chapter went up almost a year ago, on October 29 2012. I searched around and found some others that were way off too, e.g. http://www.fimfiction.net/story/14392/ (FFDL says August 2013, page says March 2012) and http://www.fimfiction.net/story/2702/ (FFDL says August 3 2013, page says July 19 2012).

It looks like FiMFiction's API results have gone strange again. The API's metadata matches what FFDL reports, at least if you read it as a UNIX timestamp (not sure what else it could be), so the wrong dates are coming from the site rather than from FFDL. I felt kind of embarrassed that my previous bug report turned out to be a bug in Calibre itself rather than FFDL, so this time I went into adapter_fimfictionnet.py and figured out how to fix the problem myself so it wouldn't be a hassle.
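
For anyone who wants to check the UNIX-timestamp reading themselves, here is a minimal Python 3 sketch. The timestamp below is made up for illustration (it is midnight UTC on August 25 2013, not actual API output), and timestamp_to_date is just a hypothetical helper name.

```python
from datetime import datetime, timezone

def timestamp_to_date(seconds):
    """Interpret an integer as seconds since the UNIX epoch, in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).date()

# Hypothetical value chosen to land on August 25 2013 (not real API output).
example_timestamp = 1377388800
print(timestamp_to_date(example_timestamp).isoformat())
```

If a value from the API decodes to the same date that FFDL reports, then FFDL is reading the API faithfully and the odd date originates upstream.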

Code:
from datetime import datetime  # only needed if the adapter doesn't already import it
import dateutil.parser as dparser
...
        # Scan all chapters to find the oldest. On FiMFiction it's possible for authors
        # to insert new chapters out of order, or to change the dates of earlier ones
        # by editing them, so the first chapter listed isn't necessarily the oldest.
        oldestChapter = datetime.now()
        for chapterSpan in soup.findAll('span', {'class':'date'}):
            rawChapterDate = chapterSpan.contents[1].strip()
            chapterDate = dparser.parse(rawChapterDate)
            if chapterDate < oldestChapter:
                oldestChapter = chapterDate
        self.story.setMetadata("datePublished", oldestChapter)
This appears to correctly dig out the oldest chapter posting date from the story's description page and use that as the date published. I updated the metadata on 800 FiMFiction stories in my collection with no errors or apparent weirdness.
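
The "pick the oldest date" step can also be tried outside the adapter. This is a rough Python 3 sketch using only the standard library instead of dateutil. The "29th Oct 2012" date format and the function names are my assumptions for illustration, not something taken from FiMFiction's pages or from FFDL.

```python
import re
from datetime import datetime

def parse_chapter_date(raw):
    """Parse an assumed '29th Oct 2012' style string into a datetime."""
    # Drop ordinal suffixes (1st, 2nd, 3rd, 4th) so strptime can read the day.
    cleaned = re.sub(r'(\d+)(st|nd|rd|th)\b', r'\1', raw.strip())
    return datetime.strptime(cleaned, '%d %b %Y')

def oldest_chapter_date(raw_dates, fallback=None):
    """Return the earliest parsed date, or fallback if there are none."""
    return min((parse_chapter_date(raw) for raw in raw_dates), default=fallback)

# Made-up chapter dates, deliberately out of order.
sample = ['25th Aug 2013', '29th Oct 2012', '3rd Jan 2013']
print(oldest_chapter_date(sample))
```

One difference from the snippet above: this returns a fallback (None by default) when no dates are found, rather than the current time, so a page with no chapter dates is easy to tell apart from one published today.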

And then, while I was high on Python, I went and added an extra bit of site-specific metadata I've been thinking about for a while:

Code:
import re  # only needed if the adapter doesn't already import it
import HTMLParser
...
        # Collect the names of the groups this story has been added to.
        rawGroupList = soup.find('ul', {'id':'story_group_list'})
        if rawGroupList is not None:
            for groupName in rawGroupList.findAll('a', {'href':re.compile('^/group/')}):
                groupString = HTMLParser.HTMLParser().unescape(groupName.string)
                if not isinstance(groupString, basestring):
                    groupString = unicode(groupString)
                self.story.addToList("groups", groupString)
(I don't know if that HTML unescape step is really necessary. There was a bunch of escaped HTML in the group titles of the test story I was playing with, and I figured better safe than sorry.)
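
As a quick illustration of why the unescape step is harmless, here is a small Python 3 sketch. Python 3 provides html.unescape in place of the Python 2 HTMLParser call used above. The group names are invented, and clean_group_name is a hypothetical helper.

```python
import html

def clean_group_name(raw):
    """Turn HTML entities such as &amp; back into plain characters."""
    return html.unescape(raw)

# Invented group titles: one with escaped HTML, one without.
print(clean_group_name('Twilight &amp; Friends'))
print(clean_group_name('Plain Group Name'))
```

A title with no entities in it passes through unchanged, so running the step on every group name shouldn't hurt the ones that were already clean.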

Plus of course the addition of "groups" to plugin-defaults.ini's list of extra metadata for this site. This was tested with those 800 stories too and I didn't get any screwy group titles out of it or unhandled errors.

I hope these snippets are up to standard, and make up for my lazy bug reporting a few days back.