04-16-2011, 05:14 AM   #4
aerodynamik
Enthusiast
Posts: 43
Karma: 136
Join Date: Mar 2011
Device: Kindle Paperwhite
Ganymede is right. There is Spiegel Online, and then there is the actual magazine. Some articles from the printed magazine make it online, but I think only a limited number do. In addition, IMO the quality of the writing sometimes differs a lot between the magazine and the online version.

I was looking into writing a recipe for the magazine, and since I don't have much experience with recipes, I would be happy to get some ideas on how to tackle this one.

The layout of the online edition closely follows the printed magazine, i.e. page 1 is at 1.html, page 2 at 2.html, etc.
If a page is an ad, the HTML page still exists and displays the page, but it has no text content, only an image of the page.
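Since pages are numbered and ad pages carry no text, one naive approach is to generate every page URL of an issue and drop the pages without text. A minimal sketch using only the standard library, assuming the URL pattern of the single table-of-contents link shown below (`.../epaper/SP/<year>/<issue>/<page>.html`) holds for every issue; `page_url`, `issue_urls`, `is_ad_page` and the 200-character threshold are hypothetical choices, not part of calibre.

```python
from html.parser import HTMLParser
from typing import List

# Base path taken from the one example link in the table of contents;
# assumed (not confirmed) to be the same for every issue.
EPAPER_BASE = "http://wissen.spiegel.de/wissen/epaper/SP"


def page_url(year: int, issue: int, page: int) -> str:
    """URL of one printed page, e.g. page 27 of issue 40/2010."""
    return f"{EPAPER_BASE}/{year}/{issue}/{page}.html"


def issue_urls(year: int, issue: int, last_page: int) -> List[str]:
    """All page URLs from 1 to last_page inclusive, ad pages included."""
    return [page_url(year, issue, n) for n in range(1, last_page + 1)]


class _TextCollector(HTMLParser):
    """Collects visible text, ignoring <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def is_ad_page(html: str, min_chars: int = 200) -> bool:
    """Heuristic: a page with (almost) no text is treated as an image-only ad.

    min_chars is an arbitrary threshold meant to tolerate navigation links
    and other page furniture; it would need tuning against real pages.
    """
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return sum(len(c) for c in parser.chunks) < min_chars
```

Downloading is left out: a recipe would fetch each URL from `issue_urls` and keep only pages where `is_ad_page` returns False. How to find `last_page`, and whether single-digit issue numbers are zero-padded, are open assumptions.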

There is a table of contents that looks like this (I replaced the German class names with English ones):
Code:
<ul>
	<li class=majorSection>
		<ul>
			<li class=article>
				<a href="http://wissen.spiegel.de/wissen/epaper/SP/2010/40/27.html" title="Artikel S. 27">
					<span class="contentPage">27</span>
					<span class="minorSection">TERRORISMUS</span>
					<span class="header">Zweifel an ...</span>
				</a>
			</li>
			<li>...</li>
		</ul>
	</li>
	<li class=majorSection>...</li>
</ul>
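A minimal sketch of reading that structure with Python's standard-library `html.parser`, assuming the translated class names shown (on the real site they are German, so `SPAN_FIELDS` would need the original names). `TocParser` and `parse_toc` are hypothetical names; inside a calibre recipe the BeautifulSoup object returned by `index_to_soup` could be searched instead.

```python
from html.parser import HTMLParser
from typing import Dict, List

# Maps the (translated) span classes from the excerpt to entry keys.
SPAN_FIELDS = {"contentPage": "page", "minorSection": "section", "header": "title"}


class TocParser(HTMLParser):
    """Collects one entry per link found inside an li.article element."""

    def __init__(self):
        super().__init__()
        self.entries: List[Dict[str, str]] = []
        self._in_article = False  # inside an li.article element
        self._current = None      # entry for the <a> currently being read
        self._field = None        # entry key the current <span> fills

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "li":
            # Any new <li> resets the flag; only li.article enables it.
            self._in_article = "article" in classes
        elif tag == "a" and self._in_article and attrs.get("href"):
            self._current = {"url": attrs["href"], "page": "", "section": "", "title": ""}
        elif tag == "span" and self._current is not None:
            self._field = next((SPAN_FIELDS[c] for c in classes if c in SPAN_FIELDS), None)

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "a" and self._current is not None:
            self.entries.append(self._current)
            self._current = None
        elif tag == "li":
            self._in_article = False

    def handle_data(self, data):
        # Only text inside one of the known spans is recorded.
        if self._current is not None and self._field:
            self._current[self._field] += data.strip()


def parse_toc(html: str) -> List[Dict[str, str]]:
    """Return url/page/section/title entries in table-of-contents order."""
    parser = TocParser()
    parser.feed(html)
    parser.close()
    return parser.entries
```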


In addition, at the bottom of every page there are links to the next and previous pages. This navigation skips pages that contain only ads. From the table of contents, it also looks like ad pages are not linked directly.
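Because the next/previous links already skip ad pages, a recipe could start at page 1 and simply follow them. A minimal sketch: `walk_pages` and the `next_url` callback are hypothetical names, and locating the "next" link inside a fetched page is left to the callback, since the markup of that link isn't shown here.

```python
from typing import Callable, Iterator, Optional


def walk_pages(start_url: str,
               next_url: Callable[[str], Optional[str]],
               max_pages: int = 300) -> Iterator[str]:
    """Yield page URLs by repeatedly following the "next" link.

    next_url(url) must return the URL of the following page, or None on
    the last page. The seen-set and the (arbitrary) max_pages limit guard
    against link cycles or a runaway crawl.
    """
    seen = set()
    url = start_url
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        yield url
        url = next_url(url)
```

For example, with a stub `links = {"1.html": "2.html", "2.html": "5.html", "5.html": None}`, `list(walk_pages("1.html", links.get))` returns `['1.html', '2.html', '5.html']`, skipping pages 3 and 4 the way the site navigation would.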

I could not find any URLs that reflect the content (e.g. spiegel.de/..../deutschland instead of the page number), which would allow me to use the standard feeds approach of BasicNewsRecipe.
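Without usable feeds, calibre's `BasicNewsRecipe` lets a recipe override `parse_index()` instead; it returns a list of `(section title, list of article dicts)` tuples, where each dict carries keys such as `title`, `url`, `description` and `date`. A minimal, calibre-free sketch of building that structure, assuming each table-of-contents entry has already been reduced to a `(section, title, url, page)` tuple; `build_index` is a hypothetical helper and the "Misc" fallback section is an arbitrary choice.

```python
from typing import Dict, List, Tuple

# (section, title, url, page); an assumed shape, not something calibre defines.
Entry = Tuple[str, str, str, str]


def build_index(entries: List[Entry]) -> List[Tuple[str, List[Dict[str, str]]]]:
    """Group entries by section in the shape parse_index() is expected to return.

    Sections keep the order in which they first appear, i.e. magazine order.
    """
    sections: Dict[str, List[Dict[str, str]]] = {}
    for section, title, url, page in entries:
        sections.setdefault(section or "Misc", []).append({
            "title": title,
            "url": url,
            "description": f"Page {page}",
            "date": "",
        })
    return list(sections.items())
```

Inside a recipe, `parse_index` could fetch the table of contents, turn it into such tuples and return `build_index(...)`; calibre would then download each listed URL.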

Is there a recipe that already parses a site like this, i.e. one with page-number URLs or a similar table-of-contents layout, where I could take a peek?

Thanks in advance

Last edited by aerodynamik; 04-16-2011 at 05:17 AM. Reason: Fixed code section and added spoiler section for readability