04-11-2009, 02:29 PM | #1 |
Sir Penguin of Edinburgh
Posts: 12,375
Karma: 23555235
Join Date: Apr 2007
Location: DC Metro area
Device: Shake a stick plus 1
|
Free Ebook: World Fact eBook
The CIA maintains a reference manual called the World Factbook. They used to release a new edition each year; recently they decided to maintain only the online edition. I found it to be an excellent source of information and wanted to make an offline copy.
The result is what I call the World Fact eBook. It is currently only available in Mobipocket. I decided to focus on Mobipocket because the format has certain specialized HTML tags. This ebook has a search index for article title, keyword, country name, and flag. It can also be used as a dictionary by most versions of Mobipocket Reader. This means that if you are reading news on, for example, the Kindle, you can look up a country name to learn more about it.
The current version, 0.7, can be downloaded here. Epub, IMP, and Sony LRF will be available soon.
P.S. This was my first large project. The source material consisted of over 500 HTML files and close to 800 pictures. Most of the content on the web pages had to be removed. I wrote a fair amount of code to automate the cleanup. I am looking for a new project where I can repeat the process. If you would like some other website converted into an ebook, please let me know. (Please consider the copyright situation before you ask.) |
04-11-2009, 02:55 PM | #2 |
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
|
Great, this Nate the Great!
The conversions of your v0.6 .prc ebook are now available in the EPUB, IMP and LRF E-Book Uploads sections. You do know that the 2009 CIA World Factbook is due out this spring... Last edited by nrapallo; 04-11-2009 at 02:58 PM. Reason: added links |
04-11-2009, 03:11 PM | #3 |
Sir Penguin of Edinburgh
Posts: 12,375
Karma: 23555235
Join Date: Apr 2007
Location: DC Metro area
Device: Shake a stick plus 1
|
Yes and no. The Factbook is updated every 2 weeks. The source material is current as of 24 February 2009. But they did say they will release a major update soon.
|
04-11-2009, 03:24 PM | #4 |
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
|
I know what you mean about updates.
I just converted an ereader .pdb of the 2008 CIA World Factbook (updated as of March 19/09) that is available here. This CIA World Factbook 2008 includes Rank Order Pages and uses smaller images (640x400 max.). There's a point where an update is no longer an update... |
04-12-2009, 08:22 AM | #5 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
First of all, thanks for the conversion. I had a (relatively quick) look at it, and this is really a nicely done book.
I do have a few technical questions though. I have two books with (I assume) similar source material: they are reference books consisting of many html pages with images and some active content (lookup etc.). I know that I'll have to remove the active content, but beyond that I'm really pretty clueless as to what tools I should use for the conversion. I'd like to get a toc like yours, where you first select the character and then get a list of topics starting with that character, but I don't know how to do that (apart from manually writing the html page, but that would be a major pain in the ass). So, my questions are:
- What did you use to parse the html files? I'm assuming some scripting language?
- What program did you use to build the Mobi-file from the multiple html files?
Thanks in advance for your answers. |
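A letter-first toc like the one asked about does not have to be written by hand. The following is only a minimal sketch in Python, not the method used for the World Fact eBook; the function name `build_letter_toc`, the `toc_` anchor prefix, and the file names in the usage line are all made up for illustration, and every title is assumed to be non-empty.

```python
from collections import defaultdict
from html import escape


def build_letter_toc(entries):
    """Build a two-level index from (title, href) pairs.

    The first paragraph links to one section per initial letter; each
    section then lists the topics starting with that letter.
    """
    groups = defaultdict(list)
    for title, href in entries:
        groups[title[0].upper()].append((title, href))

    letters = sorted(groups)
    # Top row: one link per letter, pointing at the section anchors below.
    parts = ["<p>" + " ".join(f'<a href="#toc_{l}">{l}</a>' for l in letters) + "</p>"]
    for l in letters:
        parts.append(f'<h3><a name="toc_{l}"></a>{l}</h3>')
        parts.append("<ul>")
        for title, href in sorted(groups[l]):
            parts.append(f'<li><a href="{escape(href)}">{escape(title)}</a></li>')
        parts.append("</ul>")
    return "\n".join(parts)
```

Usage would look like `build_letter_toc([("Aruba", "aa.html"), ("Belgium", "be.html")])`, with the returned fragment pasted into the toc page.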
04-12-2009, 09:16 AM | #6 | |
Sir Penguin of Edinburgh
Posts: 12,375
Karma: 23555235
Join Date: Apr 2007
Location: DC Metro area
Device: Shake a stick plus 1
|
Quote:
For the finishing touches I used Textpad. It can use regular expressions for the search functions, as well as work on several hundred open files at once. The TOCs didn't quite have to be done by hand. One of the appendices already had one. After changing it to a form I prefer, I copied it to the other files. The anchor tags did have to be put in by hand, though. I then used Mobipocket Creator to make the ebook. The user interface leaves something to be desired, but given that it saves you the effort of manually creating the OPF file, it's not bad. |
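For anyone without Textpad, the same kind of regex cleanup across several hundred files can be scripted. This is a rough sketch in Python under stated assumptions: the two patterns (dropping script blocks and style/class attributes) are only examples of what might need removing, and `clean_html` and `clean_folder` are hypothetical names, not the code actually used for this ebook.

```python
import re
from pathlib import Path

# Example patterns only; real pages would need their own rules.
SCRIPT_RE = re.compile(r"<script\b.*?</script>", re.IGNORECASE | re.DOTALL)
STYLE_ATTR_RE = re.compile(r'\s+(?:style|class)="[^"]*"', re.IGNORECASE)


def clean_html(text: str) -> str:
    """Remove script blocks and inline style/class attributes."""
    text = SCRIPT_RE.sub("", text)
    text = STYLE_ATTR_RE.sub("", text)
    return text


def clean_folder(folder: str) -> int:
    """Clean every .html file in a folder in place; return how many changed."""
    changed = 0
    for path in sorted(Path(folder).glob("*.html")):
        original = path.read_text(encoding="utf-8")
        cleaned = clean_html(original)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed
```

Running it on a copy of the source folder first is the safer choice, since the files are rewritten in place.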
|
04-12-2009, 11:03 AM | #7 | |
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
|
Quote:
The side-effect is that all the <a name>'s inserted can then be referenced from within the ebook. That may have to be done by hand (I semi-automated this) but half the task was done: the insertion of the <a name> (or <a id>) and assignment of unique id labels. This was the technique I used to add all those new hyperlinks to the Webster's Dictionary 1913 v2.0. (A version 2.1 with minor improvements will be uploaded soon.) Last edited by nrapallo; 04-12-2009 at 06:09 PM. Reason: Typo |
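The anchor-insertion half of that job can be sketched in a few lines of Python. This is an illustration only, not the tool used for the dictionary: it assumes the entries to be tagged are marked by heading tags, and the name `add_anchors` and the `id00001`-style labels are invented for the example.

```python
import re
from itertools import count

# Assumption: each entry worth linking to starts with an <h1>..<h6> tag.
HEADING_RE = re.compile(r"<h([1-6])\b[^>]*>", re.IGNORECASE)


def add_anchors(html: str, prefix: str = "id") -> str:
    """Insert an <a name> with a unique, sequential label before each heading."""
    counter = count(1)

    def repl(match):
        label = f"{prefix}{next(counter):05d}"
        return f'<a name="{label}"></a>{match.group(0)}'

    return HEADING_RE.sub(repl, html)
```

Once every entry carries a unique label, a hyperlink elsewhere in the book only needs `href="#id00042"` (or whichever label belongs to the target).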
|
04-12-2009, 12:46 PM | #8 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
Thanks for the help. I'll give it a go and see how it turns out.
|
04-14-2009, 02:58 PM | #9 |
Karmaniac
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
|
First off, that's a great idea!
Perhaps I'll be working on an LRF version of this book myself, for fun, and share it. I just wanted to remind you of the line in the copyright notice which states: Code:
"...The official seal of the CIA, however, may NOT be copied without permission as required by the CIA Act of 1949 (50 U.S.C. section 403m). Misuse of the official seal of the CIA could result in civil and criminal penalties...."
So be sure you don't put the seal in your book! Otherwise thanks for the effort! Looks like an interesting book to assemble! |
04-14-2009, 03:02 PM | #10 |
Karmaniac
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
|
PS: Is the 2008 book edition an update to the 2007, or do they keep the originals of 2007, 2006, 2005 etc. somewhere?
It would be interesting to have one of their first books (eg '92) and compare it to a current release! |
04-14-2009, 03:12 PM | #11 | |
Sir Penguin of Edinburgh
Posts: 12,375
Karma: 23555235
Join Date: Apr 2007
Location: DC Metro area
Device: Shake a stick plus 1
|
Quote:
They just have the one current copy. |
|
04-14-2009, 07:39 PM | #12 |
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
|
You can probably pull some older copies from archive.org.
|
04-15-2009, 11:03 AM | #13 | |
Karmaniac
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
|
Quote:
But tell me: did you take the print version, create one big HTML file, and add the flag + flag info to every country? Unfortunately in LRF I can't keep the original formatting. I was thinking along the lines of creating one chapter per country, with subchapters for the flag, flag info, and map, followed by all the other information. What approach did you use? (I'd be interested to see how you did it.)
I was basically merging all the info into one big file, cleaning it up a bit with Notepad++ advanced search and replace, and then inserting all the flag files manually (basically some copy-paste work). I used the print version because it's cleaner to work with than the web version. (Oh, I also need to remove the tables, which is a bit of a pain; I'm still looking into that. It's easy to remove them with search and replace, but I don't want to delete any valuable info or end up with broken HTML code.)
Then at the end I still need to add the appendixes and the rankorder directory (2001rank.html to 2211rank.html)... I'm still figuring out how to do that by analyzing the content...
Last edited by ProDigit; 04-15-2009 at 11:09 AM. |
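On the table problem: one way to drop the table markup without losing cell text or breaking the remaining HTML is to run the page through a parser instead of search and replace. The sketch below uses Python's standard `html.parser`; the class name `TableFlattener` and the choice to turn each row into a line ending in `<br/>` are assumptions for illustration, and comments and doctype declarations are simply dropped.

```python
from html.parser import HTMLParser

TABLE_TAGS = {"table", "thead", "tbody", "tfoot", "tr", "td", "th"}


class TableFlattener(HTMLParser):
    """Re-emit HTML with table tags removed but cell text kept."""

    def __init__(self):
        # Keep entities such as &amp; exactly as written in the source.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in TABLE_TAGS:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> are copied through once.
        if tag not in TABLE_TAGS:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "tr":
            self.out.append("<br/>\n")  # one table row becomes one line
        elif tag in ("td", "th"):
            self.out.append(" ")        # keep neighbouring cells apart
        elif tag not in TABLE_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def flatten_tables(html: str) -> str:
    parser = TableFlattener()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Because only the table tags themselves are skipped, links and images inside cells survive, which matters for the flag and map pictures.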
|
04-15-2009, 11:36 AM | #14 | |
Sir Penguin of Edinburgh
Posts: 12,375
Karma: 23555235
Join Date: Apr 2007
Location: DC Metro area
Device: Shake a stick plus 1
|
Quote:
I started with the web version, but kept none of the original formatting. Instead, I replaced it with some very basic html tags. The formatting of each group is internally consistent. When you figure out what looks best on the Sony Reader, you can change it all at once. If you instead decide to copy everything into one file, you will need to edit it in a linear fashion. It's going to take you at least 20 hours of work to get the source material to where I have it. Editing it one line at a time is really boring. |
|
04-15-2009, 05:39 PM | #15 |
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
|
Maybe you could upload your simplified HTML too?
|
|