11-30-2010, 03:11 AM | #1 |
Junior Member
Posts: 5
Karma: 10
Join Date: Mar 2009
Location: Kenya
Device: Kindle DX
|
Cablegate Wikileaks
Hi all,
I was curious whether anyone following the Wikileaks brouhaha of recent days has found an RSS feed of the documents released at http://cablegate.wikileaks.org. It looks like the docs are being released as a slow trickle, and frankly I'd love to set up a Calibre recipe to capture them as they come through. However, the RSS capabilities on the official site seem to be disabled. If anyone has had any success either finding an RSS feed for the docs or finding another ebook resource for them, please let me know. I did find one site that was converting each document to EPUB (http://www.iphoneworld.ca/news/2010/...e-epub-format/), but sadly they convert each document (currently 278) into an individual EPUB, which makes them a bit chaotic to keep organized. My preference would be to keep them in a single ebook with each document listed as a separate article.
Cheers, JD
12-02-2010, 09:22 PM | #2 |
Dances with penguins
Posts: 54
Karma: 10
Join Date: Oct 2010
Device: Sony PRS-350
|
This would be interesting to see
|
12-04-2010, 07:39 PM | #3 |
Junior Member
Posts: 7
Karma: 12
Join Date: Nov 2010
Location: Mexico
Device: Kindle
|
I thought it was a good idea, so I made this.
It downloads the last two days' worth of released cables (this can be changed via the 'DAYS' variable), and it uses a hardcoded IP address instead of the DNS name.
EDIT: Now it uses one of a few mirrors, since apparently the IP is no longer working (DoSed?).
EDIT: Get the latest version at https://github.com/leamsi/calibre_re...blegate.recipe
I've now tried to make the line breaks of the cables more readable (some things still look bad, but it should be better now).
EDIT: Added karunaji's ideas (thanks!) to improve the handling of line breaks, as well as a couple of other heuristics for the same thing.
Last edited by leamsi; 02-06-2011 at 12:27 PM. Reason: Removed the (outdated) copy which was pasted here. Please use the GitHub link.
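A minimal sketch of a DAYS-style window, assuming each cable is already known as a (release_date, title) pair. The titles and the helper names are hypothetical and this is not the recipe's actual code. Counting back from the newest release date, rather than from today's date, is one option for still getting something on days when nothing new was released.

```python
from datetime import date, timedelta

DAYS = 2  # like the recipe's DAYS setting: how many release days to keep

def release_window(anchor, days):
    """Return the set {anchor, anchor - 1 day, ..., anchor - (days - 1) days}."""
    return {anchor - timedelta(days=n) for n in range(days)}

def select_recent(cables, days=DAYS, anchor=None):
    """Keep the (release_date, title) pairs that fall inside the window.

    With anchor=None the window counts back from the newest release date
    present in `cables`; pass anchor=date.today() to count from today instead.
    """
    if not cables:
        return []
    if anchor is None:
        anchor = max(released for released, _ in cables)
    wanted = release_window(anchor, days)
    # Preserve the input order of the cables that survive the filter.
    return [c for c in cables if c[0] in wanted]
```

For example, with cables released on Dec 3, 4 and 5, `select_recent(cables, 1)` keeps only the Dec 5 ones, while `select_recent(cables, 1, anchor=date(2010, 12, 6))` returns nothing because no cables are dated the 6th.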
12-05-2010, 04:48 PM | #4 |
Dances with penguins
Posts: 54
Karma: 10
Join Date: Oct 2010
Device: Sony PRS-350
|
Leamsi, thank you for the recipe! I was hoping someone would make something; I'm not so good with Python myself. It does the job quite well.
|
12-06-2010, 12:02 AM | #5 |
Junior Member
Posts: 7
Karma: 12
Join Date: Nov 2010
Location: Mexico
Device: Kindle
|
Glad you find it useful!
Even if this feels like reading gossip, I find it fascinating. BTW, I just made a quick change to use wikileaks.ch as the default host, since it doesn't seem to fail as often now.
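A rough sketch of "default host first, then fall back to other mirrors". The two hosts are the ones mentioned in this thread (their current availability is not assumed), the `/index.html` path is hypothetical, and `fetch_from_mirrors` is an illustrative helper, not part of Calibre or the recipe. The fetch function is injectable so the fallback logic can be checked without a network.

```python
import urllib.request

# Hosts named in this thread; the first entry acts as the default host.
MIRRORS = ["http://wikileaks.ch", "http://cablegate.wikileaks.org"]

def default_fetch(url, timeout=30):
    """Download a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()

def fetch_from_mirrors(path, mirrors=MIRRORS, fetch=default_fetch):
    """Try each mirror in order; return (host, body) from the first that answers."""
    errors = []
    for host in mirrors:
        try:
            return host, fetch(host + path)
        except OSError as exc:  # URLError, HTTPError and timeouts are all OSErrors
            errors.append(f"{host}: {exc}")
    raise RuntimeError("all mirrors failed: " + "; ".join(errors))
```

If an ISP blocks the default host, the request fails with an `OSError` and the next mirror in the list is tried instead.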
|
12-06-2010, 02:07 AM | #6 |
Dances with penguins
Posts: 54
Karma: 10
Join Date: Oct 2010
Device: Sony PRS-350
|
Careful doing that; some ISPs are actively blocking that one, since it's "official", I suppose.
|
12-06-2010, 06:16 PM | #7 |
Dances with penguins
Posts: 54
Karma: 10
Join Date: Oct 2010
Device: Sony PRS-350
|
Leamsi: I've noticed some strange behavior with it, however. I was wondering if you or someone else who knows more than me could address it.
The DAYS variable seems to count from December 4. Today being the 6th, I set it to download 1 day, and I got the cables released on the 4th. Nothing from today (the 6th) or the day before (the 5th). Any ideas?
12-06-2010, 06:56 PM | #8 | |
Junior Member
Posts: 7
Karma: 12
Join Date: Nov 2010
Location: Mexico
Device: Kindle
|
Quote:
Also, it seems that no leaks have been released today? The latest on wikileaks.ch is from 5 Dec; nothing on the 6th. Which mirror are you using? (I can take a closer look when I get home, as I'm at work right now.)
|
12-06-2010, 07:00 PM | #9 | |
Dances with penguins
Posts: 54
Karma: 10
Join Date: Oct 2010
Device: Sony PRS-350
|
Quote:
|
|
12-11-2010, 03:49 AM | #10 |
Evangelist
Posts: 421
Karma: 1033566
Join Date: Mar 2010
Location: Latvia
Device: Kindle 3 Wifi, Bookeen Opus
|
I have a suggestion for how to improve line unwrapping a little. You only need to unwrap lines that are longer than a certain threshold, for example 50 characters. Shorter lines are probably headings, so don't remove the line breaks after them.
Also, lines containing ------------ are used for underlining, so don't remove the line breaks before and after them either. Python is not my forte, but here is an example of how it looks: cables_201012102105.epub
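A minimal sketch of the heuristic described above, in plain Python: a line break is replaced by a space only when the line is at least 50 characters long, the next line is not blank, and neither line contains a run of dashes. The 50-character threshold comes from the post; treating a run of four or more dashes as an underline is an assumption. This is an illustration, not the code that went into the recipe.

```python
import re

MIN_WRAPPED_LEN = 50          # threshold suggested in the post
UNDERLINE = re.compile(r"-{4,}")  # assumed: 4+ dashes marks an underline line

def is_underline(line):
    """True if the line contains a run of dashes used for underlining."""
    return UNDERLINE.search(line) is not None

def unwrap(text, min_len=MIN_WRAPPED_LEN):
    """Join hard-wrapped lines, keeping breaks after short lines (likely
    headings), before blank lines, and around dash underlines."""
    lines = text.split("\n")
    out = []
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else None
        join = (
            nxt is not None
            and len(line.rstrip()) >= min_len
            and nxt.strip() != ""
            and not is_underline(line)
            and not is_underline(nxt)
        )
        if join:
            out.append(line.rstrip() + " ")  # merge with the next line
        else:
            out.append(line + ("\n" if nxt is not None else ""))
    return "".join(out)
```

A short heading such as "SUBJECT: ..." keeps its line break, the dashes under it stay on their own line, and only the long body lines get merged into paragraphs.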
12-11-2010, 03:57 PM | #11 |
Junior Member
Posts: 5
Karma: 10
Join Date: Mar 2009
Location: Kenya
Device: Kindle DX
|
A bit late back to the thread, as I've been traveling. Happy to see that this thread took off, and I appreciate the work of leamsi et al.
I'm off into the 'wilderness' for a few days, and it will be great to have these to read. Many thanks!
12-11-2010, 04:53 PM | #12 |
Dances with penguins
Posts: 54
Karma: 10
Join Date: Oct 2010
Device: Sony PRS-350
|
Thanks for your work on this recipe; it's a great help to have it in Git now.
|
12-12-2010, 02:42 PM | #13 |
Junior Member
Posts: 1
Karma: 10
Join Date: Dec 2010
Device: none
|
There is an up-to-date RSS feed of cables, updated as they are released, at http://www.leakfeed.com/. It also offers JSON and XML formats and a basic API for searching and querying cables.
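In a Calibre recipe, a feed like this would normally just be listed in the `feeds` attribute of a `BasicNewsRecipe` subclass, and Calibre fetches and parses it itself. As a rough, stdlib-only illustration of what gets pulled out of each entry, assuming the feed follows the standard RSS 2.0 layout (its exact URL is not given here, and `parse_rss_items` is a hypothetical helper):

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

def parse_rss_items(xml_text):
    """Return (title, link, published) tuples from an RSS 2.0 document.

    `published` is a datetime parsed from <pubDate>, or None if missing.
    """
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        pub = item.findtext("pubDate")
        published = parsedate_to_datetime(pub) if pub else None
        items.append((title, link, published))
    return items
```

Each tuple corresponds to one cable, which maps naturally onto the "one ebook, one article per document" layout asked for at the start of the thread.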
|
12-15-2010, 11:11 PM | #14 |
Dances with penguins
Posts: 54
Karma: 10
Join Date: Oct 2010
Device: Sony PRS-350
|
Git's gone.
|
12-16-2010, 12:07 AM | #15 |
Junior Member
Posts: 7
Karma: 12
Join Date: Nov 2010
Location: Mexico
Device: Kindle
|
|