09-17-2010, 05:09 PM   #2746
Flexicat
Junior Member
Posts: 8
Karma: 10
Join Date: Aug 2010
Device: Kobo
Hello. Can someone give me some assistance in creating a recipe for a site that does not have an RSS feed?

The base URL is "http://archiveofourown.org/tags/Sherlock%20(TV)/works", but the actual story titles seem to be located within HTML on the page that looks like this:

Code:
  <!--title, author, fandom-->
  <div class="header module">
    <h4 title="title">
      <a href="/works/117685">Disorder</a>
      by
      <!-- do not cache -->
    </h4>
As a result, I cannot figure out how to extract the article ID number (117685 in the example above) for use in the recipe. I am guessing that I will have to parse the page's HTML, but I have never done that type of extraction before, and I am not familiar with Python or Beautiful Soup.
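
For reference, here is a rough sketch of the kind of extraction I think is needed. It is only a guess at an approach, not a working recipe: it uses Python's standard-library html.parser instead of Beautiful Soup, it runs against the snippet quoted above and not the live page, and it assumes every story link has an href of the form /works/<digits>. The names SAMPLE_HTML, WorkLinkParser and extract_works are made up for illustration.

```python
import re
from html.parser import HTMLParser

# Sample markup copied from the listing page quoted above (closing </div> added).
SAMPLE_HTML = """
<!--title, author, fandom-->
<div class="header module">
  <h4 title="title">
    <a href="/works/117685">Disorder</a>
    by
    <!-- do not cache -->
  </h4>
</div>
"""

# Assumption: story links always look like /works/<digits>.
WORK_HREF = re.compile(r"^/works/(\d+)$")


class WorkLinkParser(HTMLParser):
    """Collects (work_id, title) pairs from <a href="/works/NNN"> links."""

    def __init__(self):
        super().__init__()
        self.works = []
        self._current_id = None  # ID of the story link being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            match = WORK_HREF.match(href)
            if match:
                self._current_id = match.group(1)
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a matching story link.
        if self._current_id is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_id is not None:
            title = "".join(self._text).strip()
            self.works.append((self._current_id, title))
            self._current_id = None


def extract_works(html):
    """Return a list of (work_id, title) tuples found in the HTML string."""
    parser = WorkLinkParser()
    parser.feed(html)
    parser.close()
    return parser.works


if __name__ == "__main__":
    print(extract_works(SAMPLE_HTML))
```

If something like this is on the right track, each ID could presumably be appended to "http://archiveofourown.org/works/" to build the story URL, but I do not know how that would fit into a recipe.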

Thanks.