|  01-01-2013, 10:44 PM | #1 | 
| Connoisseur            Posts: 51 Karma: 9502 Join Date: Oct 2010 Location: California Device: Kindle 3 WiFi, Kindle 4 Touch | 
				
				Generate eBook from Bookmarks Archive
			 
			
Hello all. I regularly bookmark articles that interest me, for later reference or easy access. However, my reference folder depends on those pages staying online and unchanged. I'd like not only to download them for my records (which I already do), but also to compile them into an eBook for easier reading and convenience.

Here's the layout:

Abstract goal: turn some subset of my bookmarks into a logically structured eBook.

Specific solution:
1. Export my bookmarks to HTML, then use a program such as wget or HTTrack to download all of the raw HTML files.
2. Filter the HTML files, removing comments, ads, and extra fluff.
3. Using the HTML index from (1), with each link pointing to a cleaned HTML file, compile an eBook with a TOC, possibly recreating the folder structure of my bookmarks.

Main roadblocks:
- For (2): I'm not sure how to do this. Any ideas for tools?
- For (3): I need to find a way to preserve my folder structure.

I'd love some comments on the overall plan, which tools to use, and general feedback. Thanks!
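On roadblock (3): a Firefox bookmarks.html export uses the Netscape bookmark format, where each folder is an <H3> heading followed by a nested <DL> list, so the folder path of every link can be recovered with a small script. Below is a minimal sketch, assuming Firefox's usual one-tag-per-line layout; `list_bookmarks` is a hypothetical helper name, not an existing tool. It prints one "folder path<TAB>URL" line per bookmark, and that folder column could later drive chapter headings in the TOC.

```shell
#!/bin/sh
# Hypothetical helper: print "folder path<TAB>URL" for every bookmark in a
# Netscape-format export such as Firefox's bookmarks.html.
# Assumes one tag per line, which is how Firefox writes the file.
list_bookmarks() {
    awk '
        # Folder title: remember it until the <DL> list that follows opens
        /<H3[ >]/ {
            name = $0
            sub(/.*<H3[^>]*>/, "", name)
            sub(/<\/H3>.*/, "", name)
            pending = name
            next
        }
        # A list opens: go one level deeper, labelled with the pending folder
        /<DL>/ { depth++; folder[depth] = pending; pending = ""; next }
        # A list closes: go back up one level
        /<\/DL>/ { depth--; next }
        # A bookmark: join the non-empty folder names and print them with the URL
        /<A HREF="/ {
            url = $0
            sub(/.*<A HREF="/, "", url)
            sub(/".*/, "", url)
            path = ""
            for (i = 1; i <= depth; i++) {
                if (folder[i] == "") continue
                path = (path == "") ? folder[i] : (path "/" folder[i])
            }
            print path "\t" url
        }
    ' "$@"
}
```

For example, `list_bookmarks bookmarks.html > folders.tsv`, and then `cut -f 2 folders.tsv > urls.txt` gives a plain URL list for wget or HTTrack. Bookmarks that sit outside any folder come out with an empty first column.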
|   |   | 
|  01-02-2013, 02:38 AM | #2 | 
| Wizard            Posts: 4,520 Karma: 121692313 Join Date: Oct 2009 Location: Heemskerk, NL Device: PRS-T1, Kobo Touch, Kobo Aura | 
			
What I would do is open those HTML files in Sigil and clean them up there. The original layout/style probably won't be retained anyway. I would also ditch the folder structure, since it has no meaning in an ebook. If you want to maintain it, I would advise creating it by hand; you can find the information for that on the wiki or on jedisaber's site. It will be a good read anyway.
|   |   | 
|  01-02-2013, 08:39 AM | #3 | 
| Connoisseur            Posts: 95 Karma: 10072 Join Date: Apr 2008 Device: sony | 
			
I would suggest starting with Instapaper or a similar service to help with the cleanup. I do not know what you mean by folders in a book sense. If you mean chapters and sections, that should be easy to do in something like Sigil. If that is not what you mean, I agree with Toxaris that it is meaningless in an ebook.
|   |   | 
|  01-05-2013, 04:17 PM | #4 | 
| Connoisseur            Posts: 51 Karma: 9502 Join Date: Oct 2010 Location: California Device: Kindle 3 WiFi, Kindle 4 Touch | 
				
				Updates
			 
			
I've created an early alpha of a process that seems to work:

1. Export the bookmarks as HTML from Firefox, and keep only the links of interest (Notepad++ is great for this).
2. Clean the file with the following regular expressions in Find and Replace, replacing each match with nothing:
   a) <[^>^A]+>
   b) <A HREF="
   c) " ADD_DATE="[0-9 ]+" LAST_MODIFIED="[0-9 ]+"
   d) >[^<]+<\/A>
   e) <H3 ADD_DATE="[0-9]+" LAST_MODIFIED="[0-9]+">.*
3. Copy the cleaned links to urls.txt.
4. Run this shell script, which e-mails each URL to Instapaper with the page title as the subject:

Code:
    #!/bin/sh
    # Send every URL in urls.txt (one per line) to Instapaper by e-mail,
    # using the page's <title> as the subject line.
    while IFS= read -r url; do
        [ -n "$url" ] || continue    # skip blank lines
        # -s hides curl's progress meter, -L follows redirects
        title=$(curl -sL "$url" | grep -i '<title>.*</title>' | head -n 1 | sed -e 's/<[^>]*>//g')
        echo "$url" | mail -s "$title" YOUR_EMAIL@instapaper.com
    done < urls.txt

Limitations
It seems that Instapaper only exports the last 20 unread articles, so I've been looking into using a Calibre recipe that would download the newest 20, archive them, and grab the next 20. This loop could be run until I have a pile of EPUBs, which could later be glued together using some software.

Questions
1. Does anyone know of a prebuilt recipe that can do this?
2. Are there any programs that can automate the process of gluing together multiple eBooks, while also gluing together the TOC?
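On question 2: rather than merging finished EPUBs, one alternative is to glue the cleaned HTML files together before conversion, by writing a single index page that links them under one heading per folder. This is a sketch under stated assumptions: the input is a hypothetical tab-separated list (folder name, then path to a local cleaned HTML file), one bookmark per line, grouped by folder, with no empty folder names. `make_index` is an illustrative helper, not part of calibre or any other tool, and it does not HTML-escape names.

```shell
#!/bin/sh
# Hypothetical helper: build one index.html that links already-cleaned local
# HTML files, with an <h1> heading per folder and a <ul> of links under it.
# Reads "folder<TAB>file.html" lines on stdin, grouped by folder, with no
# empty folder names; $1 is an optional book title. Nothing is HTML-escaped.
make_index() {
    tab=$(printf '\t')
    printf '<html><head><title>%s</title></head><body>\n' "${1:-Bookmarks}"
    prev=""
    while IFS="$tab" read -r folder file; do
        [ -n "$file" ] || continue           # skip malformed lines
        if [ "$folder" != "$prev" ]; then    # a new folder starts a new section
            [ -n "$prev" ] && printf '</ul>\n'
            printf '<h1>%s</h1>\n<ul>\n' "$folder"
            prev=$folder
        fi
        # link text is the file name without its .html suffix
        printf '<li><a href="%s">%s</a></li>\n' "$file" "$(basename "$file" .html)"
    done
    [ -n "$prev" ] && printf '</ul>\n'
    printf '</body></html>\n'
}
```

For example, `make_index 'My bookmarks' < list.tsv > index.html`, then `ebook-convert index.html bookmarks.epub`. calibre's HTML input follows links in the index to the local files (its `--max-levels` option sets how deep), and its table-of-contents options such as `--level1-toc` can be pointed at the `<h1>` folder headings; check the calibre conversion documentation for the exact XPath syntax.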
|   |   | 
|  01-05-2013, 05:58 PM | #5 | 
| Grand Sorcerer            Posts: 11,470 Karma: 13095790 Join Date: Aug 2007 Location: Grass Valley, CA Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7 | 
			
			For recipe questions I would suggest you post in the recipe subforum of the Calibre forum.
|   |   | 
Tags: bookmarks, html, parsing, scripting
Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Generate eBook from Bookmarks Archive | vaniaspeedy | General Discussions | 0 | 01-01-2013 10:44 PM | 
| How to generate cover from the first page of an ebook? | purgatorios | Library Management | 2 | 11-17-2012 07:42 AM | 
| Could not find an ebook in the archive. | emilyf | Devices | 7 | 04-02-2011 06:10 PM | 
| Ebook archive in case of the apocalypse? | jblitereader | General Discussions | 52 | 08-22-2010 06:13 PM | 
| instapaper.com - Bookmarks service that generate epub and mobi books | celson | Ectaco jetBook | 3 | 03-13-2010 11:10 PM |