Old 06-28-2011, 09:07 AM   #38
pini
Member
Join Date: Jun 2011
Location: Wageningen, The Netherlands
Device: PocketBook Pro 903, iPod touch
rss pdf newspaper

@review

Hi, I'll stick to this thread, although its title isn't very recognisable.
Ever since I received my 903 I've been trying to read my newspaper on it, and it works!
BUT the RSS feed is very short, crippled really; it doesn't come close to the complete paper.
The full paper is accessible (I'm a subscriber) as PDF and quite readable, I must say, but downloading all the separate pages is a bit of a pain, so there's my challenge!

I'd like to have an app (RSS doesn't seem too appropriate) that simply downloads all of a day's PDFs into a newly created folder.
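A rough Python sketch of the folder-per-day part, just to make the idea concrete. Everything here is an assumption: `save_issue` and `fetch_bytes` are made-up names, the URLs are placeholders, and the default fetcher does no login handling at all. The demo uses a fake fetcher so it runs offline.

```python
import os
import tempfile
import urllib.request

def fetch_bytes(url):
    """Default fetcher: a plain HTTP GET without any login handling."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def save_issue(urls, root, day, fetch=fetch_bytes):
    """Create root/day if needed and save each URL there under its own file name."""
    folder = os.path.join(root, day)
    os.makedirs(folder, exist_ok=True)
    saved = []
    for url in urls:
        # Use the last path component of the URL as the local file name.
        target = os.path.join(folder, url.rsplit("/", 1)[-1])
        with open(target, "wb") as fh:
            fh.write(fetch(url))
        saved.append(target)
    return saved

# Offline demo: placeholder URLs and a fake fetcher instead of real downloads.
with tempfile.TemporaryDirectory() as root:
    saved = save_issue(
        ["http://example.invalid/a_page001.pdf",
         "http://example.invalid/a_page002.pdf"],
        root,
        "20110628",
        fetch=lambda url: b"%PDF-demo",
    )
    names = sorted(os.path.basename(p) for p in saved)

print(names)
```

Passing the fetcher in as a parameter keeps the saving logic separate from whatever ends up handling the login.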

I did look into the files myself.
The source of the 'online' version of the newspaper (http://www.volkskrant.nl/vk-online/) takes me to the 'current' day and contains a number of lines like:

<option value="VKN01_001" selected="selected">001 Ten Eerste</option>
<option value="VKN01_002">002-003 Ten Eerste</option>
<option value="VKN01_004">004-005 Ten Eerste</option>
<option value="VKN01_006">006-007 Ten Eerste</option>
<option value="VKN01_008">008-009 Binnenland</option>
<option value="VKN02_001">001 V Cover</option>
<option value="VKN02_002">002-003 V Opening</option>
<option value="VKN02_016">016-017 V RTV</option>
<option value="VKN02_018">018-019 V Service</option>
<option value="VKN02_020">020 V DIDU</option>

These differ daily and list the pages that are available THAT day. Each value combines a 'part' (VKN01, VKN02, possibly many more, and VKM01 on Saturdays) with a page number. Most pages are double-page spreads, so after 001 usually only the even numbers show up: 002, 004, etc.
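Pulling the part and page number out of those option values could be sketched in Python like this. The pattern `VK[A-Z]\d{2}_\d{3}` is only a guess generalised from the values seen so far (VKN01, VKN02, VKM01), and `PageOptionParser` and `extract_pages` are made-up names.

```python
import re
from html.parser import HTMLParser

# Assumed shape of a page value: part code, underscore, three-digit page number.
PAGE_VALUE = re.compile(r"^(VK[A-Z]\d{2})_(\d{3})$")

class PageOptionParser(HTMLParser):
    """Collects (part, page) pairs from <option value="..."> tags."""

    def __init__(self):
        super().__init__()
        self.pages = []

    def handle_starttag(self, tag, attrs):
        if tag != "option":
            return
        value = dict(attrs).get("value") or ""
        match = PAGE_VALUE.match(value)
        if match:  # date options such as "20110628" do not match and are skipped
            self.pages.append((match.group(1), match.group(2)))

def extract_pages(html):
    parser = PageOptionParser()
    parser.feed(html)
    return parser.pages

sample = """
<option value="20110628">dinsdag 28 juni 2011</option>
<option value="VKN01_001" selected="selected">001 Ten Eerste</option>
<option value="VKN01_002">002-003 Ten Eerste</option>
<option value="VKN02_020">020 V DIDU</option>
"""

pages = extract_pages(sample)
print(pages)
```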

The current date is the top entry of the date list in the same section:
<option value="20110628">dinsdag 28 juni 2011</option>
<option value="20110627">maandag 27 juni 2011</option>
<option value="20110625">zaterdag 25 juni 2011</option>
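The date values look like plain YYYYMMDD strings, so reading them could be sketched as below. This assumes the newest issue is always listed first, as in the three lines above; `extract_dates` is a made-up name.

```python
import re
from datetime import datetime

# An option whose value is exactly eight digits is taken to be an issue date.
DATE_OPTION = re.compile(r'<option value="(\d{8})"')

def extract_dates(html):
    """Return the issue dates found in the page, in document order."""
    return [datetime.strptime(v, "%Y%m%d").date()
            for v in DATE_OPTION.findall(html)]

sample = """
<option value="20110628">dinsdag 28 juni 2011</option>
<option value="20110627">maandag 27 juni 2011</option>
<option value="20110625">zaterdag 25 juni 2011</option>
"""

dates = extract_dates(sample)
latest = dates[0]  # assumption: the first entry is the current issue
print(latest.strftime("%Y%m%d"))
```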

Combining the date with the part and page values gives the URLs I want:
http://www.volkskrant.nl/vk-online/V...01_page001.pdf
http://www.volkskrant.nl/vk-online/V...01_page002.pdf
http://www.volkskrant.nl/vk-online/V...02_page020.pdf
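The forum software shortened the middle of these URLs, so the real path layout is not visible here. The Python sketch below therefore uses a purely HYPOTHETICAL template on a placeholder host; only the `..._page001.pdf` ending is taken from the lines above. `URL_TEMPLATE` and `build_urls` are made-up names, and the template would have to be replaced with the real one.

```python
# HYPOTHETICAL template: the real URLs are truncated ("V...01_page001.pdf"),
# so this path layout is a placeholder, not the newspaper's actual scheme.
URL_TEMPLATE = "http://example.invalid/{date}/{part}_page{page}.pdf"

def build_urls(date, pages, template=URL_TEMPLATE):
    """Fill the template once per (part, page) pair."""
    return [template.format(date=date, part=part, page=page)
            for part, page in pages]

urls = build_urls("20110628", [("VKN01", "001"), ("VKN02", "020")])
for url in urls:
    print(url)
```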

I would probably be able to parse these out (and would then need to find out how to actually download them, etc.).

My real problem arises when I need to fill in my credentials. If I'm not logged in, I get redirected to a completely different page, and there I am lost...

However, when downloading these pages 'manually' using miniori, the website remembers my previous login and allows me to download no-problemo...
I could of course pm my name/pass, but only if this is an interesting next little project to work on or if you're interested in a Dutch newspaper yourself of course ;-)
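If the site remembers the login through a cookie (which is only my guess from the behaviour above), a script could reuse that cookie instead of filling in the login form. A minimal Python sketch, assuming the browser's cookies can be exported to a Netscape-format cookies.txt file; `opener_from_cookie_file` is a made-up name and the demo cookie is invented so the example runs offline.

```python
import http.cookiejar
import os
import tempfile
import urllib.request

def opener_from_cookie_file(path):
    """Build a URL opener that sends the cookies stored in a cookies.txt file."""
    jar = http.cookiejar.MozillaCookieJar(path)
    # Keep session cookies and expired ones too; the server decides if they still work.
    jar.load(ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

# Offline demo with an invented cookie for a placeholder domain.
demo = (
    "# Netscape HTTP Cookie File\n"
    ".example.invalid\tTRUE\t/\tFALSE\t2147483647\tsession\tabc123\n"
)
with tempfile.TemporaryDirectory() as tmp:
    cookie_path = os.path.join(tmp, "cookies.txt")
    with open(cookie_path, "w") as fh:
        fh.write(demo)
    opener, jar = opener_from_cookie_file(cookie_path)

print([cookie.name for cookie in jar])
```

With a real cookie file, `opener.open(url)` would send the stored cookies along with each request; whether the site accepts them is something only a try will tell.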

hope to hear/read from you
cheers,

pini