MobileRead Forums - View Single Post

FaceDeer · 09-05-2013, 04:31 AM

So, I noticed another oddity in my FiMFiction collection after doing a bunch of metadata cleanup: a story that I knew was quite old was listed as having been first published only earlier this month. Specifically, http://www.fimfiction.net/story/4464/ was listed as first published on August 25 2013, when the page says the first chapter went up almost a year ago on October 29 2012. I searched around and found some others that were way off, too, e.g. http://www.fimfiction.net/story/14392/ (FFDL says August 2013, page says March 2012) or http://www.fimfiction.net/story/2702/ (FFDL says August 3 2013, page says July 19 2012).

It looks like FiMFiction's API results have gone strange again. The API's metadata matches what FFDL reports, at least if you read it as a UNIX timestamp (not sure what else it could be). I felt kind of embarrassed that my previous bug report turned out to be a bug in Calibre itself rather than FFDL, so this time I went into adapter_fimfictionnet.py and figured out how to fix my own problem so it wouldn't be a hassle.
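For anyone who wants to check the UNIX-timestamp reading themselves, here's a minimal sketch. The function name and the example value are made up for illustration; the number is not copied from the API.

```python
from datetime import datetime, timedelta

def unix_to_datetime(ts):
    """Interpret ts as seconds since 1970-01-01 00:00:00 UTC."""
    return datetime(1970, 1, 1) + timedelta(seconds=ts)

# Hypothetical value for illustration only, not taken from the FiMFiction API.
example_ts = 1377388800
print(unix_to_datetime(example_ts).date())
```

That example value works out to 2013-08-25, which is the sort of date FFDL was showing for the old stories.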

Code:

from datetime import datetime
import dateutil.parser as dparser
...
        # Start from "now" so that any real chapter date will be older
        oldestChapter = datetime.now()
        # Scan all chapters to find the oldest; on FiMFiction it's possible for authors to insert
        # new chapters out-of-order or change the dates of earlier ones by editing them
        for chapterSpan in soup.findAll('span', {'class':'date'}):
            # The second child node of the span is expected to hold the date text
            rawChapterDate = chapterSpan.contents[1].strip()
            chapterDate = dparser.parse(rawChapterDate)
            if chapterDate < oldestChapter:
                oldestChapter = chapterDate
        self.story.setMetadata("datePublished", oldestChapter)

This appears to correctly dig out the oldest chapter posting date from the story's description page and use that as the date published. I updated the metadata on 800 FiMFiction stories in my collection with no errors or apparent weirdness.
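The same oldest-date scan can be tried outside the adapter. This is only a sketch under some assumptions: `oldest_date` and `parse_iso_day` are hypothetical helper names, and the ISO date format in the sample strings is invented for the example rather than being what FiMFiction's page uses. In the adapter itself, `dparser.parse` would play the role of the `parse` argument.

```python
from datetime import datetime

def oldest_date(raw_dates, parse):
    """Return the earliest datetime among raw_dates, or None if none parse."""
    parsed = []
    for raw in raw_dates:
        try:
            parsed.append(parse(raw.strip()))
        except ValueError:
            # Skip strings the parser rejects instead of aborting the scan.
            continue
    return min(parsed) if parsed else None

def parse_iso_day(text):
    # Hypothetical format chosen for the example only.
    return datetime.strptime(text, "%Y-%m-%d")

samples = ["2013-08-25", " 2012-10-29 ", "not a date"]
print(oldest_date(samples, parse_iso_day))
```

Returning None when nothing parses is a different choice from the snippet above, which starts from datetime.now() and would report today's date if a page had no chapter dates at all.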

And then, while I was high on Python, I went and added an extra bit of site-specific metadata I've been thinking about for a while:

Code:

import re
import HTMLParser
...
        rawGroupList = soup.find('ul', {'id':'story_group_list'})
        if rawGroupList is not None:
            # Each group is a link whose href starts with /group/
            for groupName in rawGroupList.findAll('a', {'href':re.compile('^/group/')}):
                # Turn escaped HTML entities in the title back into plain characters
                groupString = HTMLParser.HTMLParser().unescape(groupName.string)
                if not isinstance(groupString,basestring):
                    groupString = unicode(groupString)
                self.story.addToList("groups", groupString)

(I don't know if that HTML unescape step is really necessary. There was a bunch of escaped HTML in the group titles of the test story I was playing with, and I figured better safe than sorry.)
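To show what the unescape step does to a title, here's a small standalone sketch. `clean_group_title` is a hypothetical helper, not part of FFDL, and the sample title is invented. The try/except import is only there so the sketch runs on both Python 2 and Python 3.

```python
try:
    from html import unescape  # Python 3.4+
except ImportError:
    import HTMLParser  # Python 2, as used in the adapter snippet
    unescape = HTMLParser.HTMLParser().unescape

def clean_group_title(raw_title):
    """Unescape HTML entities in a group title; pass None through untouched."""
    if raw_title is None:
        # A link with no single text child has no usable title.
        return None
    return unescape(raw_title).strip()

# Invented sample title for illustration.
print(clean_group_title("Twilight &amp; Friends"))
```

Titles without any entities come out unchanged, so the step should be harmless for groups that don't need it.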

Plus, of course, I added "groups" to plugin-defaults.ini's list of extra metadata for this site. I tested this with those 800 stories too and didn't get any screwy group titles or unhandled errors out of it.

I hope these snippets are up to standard, and make up for my lazy bug reporting a few days back.

09-05-2013, 04:31 AM	#1769