View Full Version : MapQuester World Atlas Conversion Tool


hacker
02-11-2005, 02:15 AM
As promised over here (http://www.mobileread.com/forums/showthread.php?t=2235), I've taken some time to clean up and cannibalize part of a spider I use daily for Plucker (http://www.plkr.org), and modified it to allow you to run it and spider MapQuest (http://www.mapquest.com/atlas/?region=index) to build yourself a World Atlas with images, country data, and lots of other bits.

The whole script is only 87 lines of actual code! (161 with comments and liberal spacing for readability.) I prefer writing clean, tight, well-commented code. My code is my business card, and this is no exception.

The script is written in Perl, my language of choice, but all modules used are either in core or available via CPAN (http://search.cpan.org/) (perl -MCPAN -e 'install "Module::Name"'). It should be easy to run and figure out. I've commented it where required. My only requirement for using this is that you don't rip off the code, or parts of it, and claim you wrote it, and that you provide some feedback so I can improve it: good, bad, feature requests, bugs you find, whatever. I'd like to know!
Unfortunately, I cannot redistribute the completed version of the maps in mobile format, because that would violate MapQuest's copyright and Terms of Use, but you can see how good it looks in the screenshots below.

The entire script is attached below. Just grab the script and run it in an empty directory. It will spider and fetch the 238-or-so separate pages from MapQuest's World Atlas, strip out the unnecessary HTML, JavaScript, stylesheets, and other non-visible bits, and write each country to its own file. All of the external links to country data are rewritten to reference the local copies. The only pieces fetched remotely are the images themselves.
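
As a rough illustration of the link-rewriting step described above (this is not the attached script's code), here is a minimal sketch using only core Perl. The helper names local_name and rewrite_links and the "region.html" file naming are assumptions made for the example; only the ?region= parameter comes from the MapQuest atlas URL.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map an atlas URL such as "/atlas/?region=france"
# to an assumed local file name, "france.html". Returns undef for
# links that do not carry a region parameter.
sub local_name {
    my ($href) = @_;
    return $href =~ /[?&]region=([A-Za-z0-9_-]+)/ ? "$1.html" : undef;
}

# Rewrite every href that points at an atlas page. Other links, and
# image src attributes, are left alone so images stay remote.
sub rewrite_links {
    my ($html) = @_;
    $html =~ s{href="([^"]*)"}{
        my $href  = $1;                  # copy before calling the helper
        my $local = local_name($href);
        defined $local ? qq{href="$local"} : qq{href="$href"};
    }ge;
    return $html;
}

my $page = '<a href="/atlas/?region=france">France</a> '
         . '<img src="http://example.com/fr.gif">';
print rewrite_links($page), "\n";
```

A regular expression is enough for a sketch like this; for real pages, a proper link extractor or HTML parser is the safer choice.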

When the spidering is complete, it outputs a top-level index file for you to point your mobile creation tool towards, so you can then spider the content yourself, and convert it to the format of your choice.

Hopefully many users will find this useful. Enjoy!

Chaos
02-11-2005, 02:36 AM
Oooooh... I think I'll play with this on the weekend. :)

Looks really nice.

Alexander Turcic
02-11-2005, 05:28 AM
Thank you David!

Colin Dunstan
02-11-2005, 05:46 AM
The script gave me the following error message:
$ perl mapquester.txt
Can't locate HTML/SimpleLinkExtor.pm in @INC (@INC contains: /usr/local/lib/perl5/site_perl/5.6.1/mach /usr/local/lib/perl5/site_perl/5.6.1 /usr/local/lib/perl5/site_perl /usr/local/lib/perl5/5.6.1/BSDPAN /usr/local/lib/perl5/5.6.1/mach /usr/local/lib/perl5/5.6.1 .) at mapquester.txt line 38.
BEGIN failed--compilation aborted at mapquester.txt line 38.
$
I then tried to install the missing module following your instruction, but came up with another error message:

# perl -MCPAN -e 'install "HTML:SimpleLinkExtor"'
Going to read /root/.cpan/sources/authors/01mailrc.txt.gz
Going to read /root/.cpan/sources/modules/02packages.details.txt.gz
Database was generated on Thu, 10 Feb 2005 22:38:20 GMT
CPAN: HTTP::Date loaded ok

There's a new CPAN.pm version (v1.76) available!
[Current version is v1.59_54]
You might want to try
install Bundle::CPAN
reload cpan
without quitting the current session. It should be a seamless upgrade
while we are running...


Going to read /root/.cpan/sources/modules/03modlist.data.gz
Warning: Cannot install HTML:SimpleLinkExtor, don't know what it is.
Try the command

i /HTML:SimpleLinkExtor/

to find objects with matching identifiers.
#

Any help would be appreciated!

Chaos
02-11-2005, 09:54 AM
That should be HTML::SimpleLinkExtor.

Always two colons.

hacker
02-11-2005, 09:55 AM
# perl -MCPAN -e 'install "HTML:SimpleLinkExtor"'

i /HTML:SimpleLinkExtor/

Any help would be appreciated!

You'll want to use two colons between the parent class and the sub-class, not just one as above. It should be executed as follows:

perl -MCPAN -e 'install "HTML::SimpleLinkExtor"'

Try that and see if it helps.
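
To check up front whether the needed modules are installed, something along these lines may help. It is only a sketch: the sub name missing_modules is made up for the example, and the module list is an assumption that should be matched to the script's own "use" lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the names of the modules in @_ that cannot be loaded.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        # Turn "HTML::SimpleLinkExtor" into "HTML/SimpleLinkExtor.pm",
        # the form that require() looks up in @INC.
        (my $file = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

# Assumed module list; adjust it to whatever the script actually uses.
my @missing = missing_modules(qw(HTML::SimpleLinkExtor LWP::UserAgent));
print @missing
    ? "Missing: @missing\n"
    : "All modules found\n";
```

Any name it reports can then be installed with perl -MCPAN -e 'install "Module::Name"', using two colons between the parts.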

Colin Dunstan
02-11-2005, 10:09 AM
That worked! Thanks ;)

Colin Dunstan
02-11-2005, 10:25 AM
Unfortunately, I cannot redistribute the completed version of the maps in mobile format, because that would violate MapQuest's copyright and Terms of Use, but you can see how good it looks in the screenshots below.
But we could share the generated .html files here, couldn't we?

hacker
02-11-2005, 01:27 PM
I've just created three pre-compiled versions, for those users without the right Perl modules installed, using PAR (http://search.cpan.org/%7Eautrijus/PAR-0.87/lib/PAR.pm). These standalone executables contain all of the modules + the Perl stub to run it.

Just drop one of these files in an empty directory, and run it. No muss, no fuss.

Versions for FreeBSD, Linux and Windows are attached below. I haven't written any docs or README to go with it, but it's self-explanatory. If it becomes popular enough, I'll repackage it as a "real" application in the same fashion with docs.

Enjoy!

albertc
03-04-2005, 04:39 AM
Versions for FreeBSD, Linux and Windows are attached below. I haven't written any docs or README to go with it, but it's self-explanatory. If it becomes popular enough, I'll repackage it as a "real" application in the same fashion with docs.


Amazing! (as usual, David)

I have a request: could you please precompile a version for OS X, too?

Thanks
--
Albert

gmorgan_va
08-26-2009, 06:00 PM
Thanks for creating this great Perl script. I needed proxy support and ended up setting my proxy manually by adding the following line after your $ua->agent line:

$ua->proxy('http', 'http://your-http-proxy.com:proxyport/');

And it worked like a charm through the proxy. BTW, if you are behind a proxy and haven't set it, the script just writes the base continent files with 500 Not found errors in them. I'm not sure how to change that so it prints an error message instead.
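
One possible way to get an error message instead of a saved error page is to check the response before writing anything. This is a hedged sketch, not the script's actual code: the sub name fetch_or_warn is made up, and $ua is assumed to be an LWP::UserAgent object like the one the script already creates.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fetch $url with the given user agent and return the page body,
# or warn and return undef when the request fails.
# $ua is expected to behave like an LWP::UserAgent object.
sub fetch_or_warn {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    unless ($response->is_success) {
        warn "Skipping $url: ", $response->status_line, "\n";
        return;
    }
    return $response->decoded_content;
}

# Usage sketch (requires LWP::UserAgent from CPAN):
#   use LWP::UserAgent;
#   my $ua = LWP::UserAgent->new;
#   $ua->env_proxy;   # read proxy settings such as http_proxy from the environment
#   my $html = fetch_or_warn($ua, 'http://www.mapquest.com/atlas/?region=index');
#   # write the output file only when $html is defined
```

Calling env_proxy is an alternative to hard-coding the proxy URL, provided the proxy is set in the environment.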

Thanks for your beautiful script. Now to figure out how to iSilo these files.

nrapallo
08-26-2009, 11:05 PM
:chinscratch: Interesting thread.... ;)

gmorgan_va
08-27-2009, 09:28 AM
So for some reason MapQuest decided to put the Caribbean under North America. Not sure why, but I'm also not a cartographer. I need to examine your script some more to see how it could be modified to gather a second-level index. In the short term, my hack to include the files for the Caribbean is to add it to the index. iSiloX's error log clued me in to the fact that the Caribbean pages weren't being grabbed (it had an error for each page, so they must be linked to from elsewhere in the site). Anyway, now the iSiloX conversion succeeded. Can't wait to see what this looks like on my PocketPC!