Old 08-23-2011, 10:23 PM   #4
rogerx
Enthusiast
Posts: 29
Karma: 244
Join Date: Aug 2011
Location: North Pole, Alaska
Device: Kindle DXG
I've done some research, and it looks like the above snippet would lead me to a more complex recipe file than I want.

Even though I get almost 100% good results with a basic news recipe, clipping the date from one (1) undesirable line with this snippet looks like it would require manually rewriting all of the Calibre functions for the entire HTML fetching and rendering operation (similar to the New York Times recipe).

As such, a simple regexp (i.e. Calibre's preprocess_regexps recipe attribute) should be able to clip the date from the following line of HTML tags. (Note: the undesirable tags occur after the date, immediately following the </span> tag.)

Code:
# I just need "story_item_date updated", trash the rest of the line!
    # <div class="signature_line"><span title="2011-08-22T10:35:58Z" class="story_item_date updated">Aug 22, 2011</span>&nbsp;|&nbsp;1463&nbsp;views&nbsp;|&nbsp;19&nbsp;<a href="/pages/full_story/push?article........class="signature_email_message"></span></div>

    #preprocess_regexps
(... well, I need to read up on regexps in more detail later.)
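For reference, here is a rough sketch of what such a rule might look like. It is untested inside an actual recipe. It assumes preprocess_regexps takes a list of (compiled pattern, callable) pairs, as the Calibre recipe documentation describes, and that the date span and the trailing junk always sit inside the same div. The names DATE_TAIL and apply_regexps are made up for illustration; apply_regexps only exists so the rule can be tried outside Calibre.

```python
import re

# Capture the date <span> and the closing </div>; everything in between
# (view counts, share links, email widget) is matched but not captured.
DATE_TAIL = re.compile(
    r'(<span[^>]*class="story_item_date updated"[^>]*>[^<]*</span>)'
    r'.*?(</div>)',
    re.DOTALL,
)

# Same shape a recipe would use: a list of (compiled pattern, callable) pairs.
# In a real recipe this list would be a class attribute of the recipe class.
preprocess_regexps = [
    (DATE_TAIL, lambda match: match.group(1) + match.group(2)),
]


def apply_regexps(html, rules):
    """Hypothetical helper: run the rules in order, outside Calibre."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html


# The sample line from the post, copied as-is (including the elided URL).
sample = (
    '<div class="signature_line"><span title="2011-08-22T10:35:58Z" '
    'class="story_item_date updated">Aug 22, 2011</span>'
    '&nbsp;|&nbsp;1463&nbsp;views&nbsp;|&nbsp;19&nbsp;'
    '<a href="/pages/full_story/push?article........'
    'class="signature_email_message"></span></div>'
)

cleaned = apply_regexps(sample, preprocess_regexps)
print(cleaned)
```

The non-greedy `.*?` stops at the first closing div after the date, so if the site ever nests another div inside the signature line, the pattern would need adjusting.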

Last edited by rogerx; 08-23-2011 at 10:24 PM. Reason: grammar