#1
Connoisseur
Posts: 51
Karma: 9502
Join Date: Oct 2010
Location: California
Device: Kindle 3 WiFi, Kindle 4 Touch
Generate eBook from Bookmarks Archive
Abstract goal: turn some subset of my bookmarks into a logically structured ebook.

Specific solution:
1. Export my bookmarks to HTML, then use a program such as wget or HTTrack to download all of the raw HTML files.
2. Filter the HTML files, removing comments, ads, and extra fluff.
3. Using the HTML index from (1), with each link pointing to a cleaned HTML file, compile an ebook with a TOC, possibly recreating the folder structure of my bookmarks.

Main roadblocks:
- For (2), I'm not sure how to do this. Any ideas for tools?
- For (3), I need to find a way to preserve my folder structure.

I'd love some comments on the overall plan, which tools to use, and general feedback. Thanks!
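For roadblock (3), one option might be to capture the folder structure before any cleanup, while it is still present in the exported file. Below is a minimal sketch in POSIX shell and awk. It assumes the Netscape-style bookmarks file that Firefox's "Export Bookmarks to HTML" writes (one `<DT><H3>` folder or `<DT><A>` link per line, with `<DL>` blocks marking nesting); the function name `bookmarks_to_tsv` is hypothetical. It prints one tab-separated line per bookmark: folder path, URL, title.

```sh
#!/bin/sh
# Sketch (assumption: Firefox/Netscape bookmark export, one element per line).
# Prints "folder/path<TAB>url<TAB>title" for every bookmark in file $1.
bookmarks_to_tsv() {
  awk '
    # Folder heading: remember the name until its <DL> block opens.
    /<H3[ >]/ {
      name = $0
      sub(/.*<H3[^>]*>/, "", name)
      sub(/<\/H3>.*/, "", name)
      pending = name
      next
    }
    # Opening <DL> enters the most recently seen folder.
    /<DL>/ {
      depth++
      folder[depth] = pending
      pending = ""
      next
    }
    # Closing </DL> leaves the current folder.
    /<\/DL>/ { depth--; next }
    # Bookmark line: extract URL and title, then join the folder path.
    /<A HREF="/ {
      url = $0
      sub(/.*<A HREF="/, "", url)
      sub(/".*/, "", url)
      title = $0
      sub(/.*<A [^>]*>/, "", title)
      sub(/<\/A>.*/, "", title)
      path = ""
      # Level 1 is the unnamed top-level list, so start at level 2.
      for (i = 2; i <= depth; i++)
        path = (path == "" ? folder[i] : path "/" folder[i])
      printf "%s\t%s\t%s\n", path, url, title
    }
  ' "$1"
}
```

For example, `bookmarks_to_tsv bookmarks.html > links.tsv` followed by `cut -f2 links.tsv > urls.txt` would give a plain URL list that `wget -i urls.txt` can download, while links.tsv keeps each page's folder for building the TOC later. Titles are copied verbatim, so HTML entities such as `&amp;` are left as they appear in the export.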
__________________
http://www.eBookJuggler.com Tutorials on eBook conversion, device optimization, Calibre tricks, and more.
#2
Wizard
Posts: 2,108
Karma: 927511
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
What I would do is import those HTML files into Sigil and clean them up there. The layout/style probably won't be retained anyway.
I would also ditch the folder structure, since it has no meaning in an ebook. If you want to maintain it, I would advise creating it by hand. You can find the information for that on the wiki or on jedisaber's site. It will be a good read anyway.
__________________
Creator and maintainer of the e-Book Tools Word add-in. Creator and maintainer of the Clean HTML macro for MS Word.
#3
Connoisseur
Posts: 93
Karma: 10072
Join Date: Apr 2008
Device: sony
I would suggest starting with Instapaper or something similar to help with the cleanup. I do not know what you mean by folders in a book sense. If you mean chapters and sections, that should be easy in something like Sigil. If that is not what you mean, I agree with Toxaris that it is meaningless in an ebook.
#4
Connoisseur
Posts: 51
Karma: 9502
Join Date: Oct 2010
Location: California
Device: Kindle 3 WiFi, Kindle 4 Touch
Updates
I've created an early alpha of a process that seems to work.
1. Export the bookmarks as HTML from Firefox and keep only the links of interest (Notepad++ is great for this).
2. Clean the file with find and replace, using the following regular expressions and replacing each match with nothing:
   a) `<[^>^A]+>`
   b) `<A HREF="`
   c) `" ADD_DATE="[0-9 ]+" LAST_MODIFIED="[0-9 ]+"`
   d) `>[^<]+<\/A>`
   e) `<H3 ADD_DATE="[0-9]+" LAST_MODIFIED="[0-9]+">.*`
3. Copy the cleaned links to urls.txt.
4. Run this shell script:

Code:
```sh
#!/bin/sh
# Email each URL in urls.txt to Instapaper, using the page's <title> as the subject.
while IFS= read -r url; do
  # -s hides curl's progress meter, -L follows redirects; keep the first <title> line and strip its tags.
  title=$(curl -sL "$url" | grep -i -m 1 '<title>.*</title>' | sed -e 's/<[^>]*>//g')
  echo "$url" | mail -s "$title" YOUR_EMAIL@instapaper.com
done < urls.txt
```

Limitations
It seems that Instapaper only exports the 20 most recent unread articles, so I've been looking into using a Calibre recipe that would download the newest 20, archive them, and grab the next 20. This loop could be repeated until I have a pile of EPUBs, which would later be glued together with some other software.

Questions
1. Does anyone know of a prebuilt recipe that can do this?
2. Are there any programs that can automate gluing together multiple eBooks, including merging their TOCs?
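For question 2, instead of gluing separate EPUBs together, another route might be to concatenate the cleaned articles into a single HTML file whose headings mirror the bookmark folders, and then let calibre's `ebook-convert` build a nested TOC from those headings. This is a minimal sketch, assuming a hypothetical tab-separated list with one line per article: folder path (such as `Reading/Linux`), path to a cleaned HTML fragment containing only body content, and article title. `build_book` is a hypothetical name, not part of any tool.

```sh
#!/bin/sh
# Sketch: merge cleaned HTML fragments into one HTML book on stdout.
# Input $1 (assumed format): folder/path<TAB>fragment-file<TAB>title
# Each folder level becomes <h1>, <h2>, ...; each article gets the next level down.
build_book() {
  awk '
    BEGIN {
      FS = "\t"
      print "<html><head><meta charset=\"utf-8\"><title>Bookmarks</title></head><body>"
    }
    {
      n = split($1, part, "/")
      same = 1
      # Emit headings only from the first folder level that changed.
      for (i = 1; i <= n; i++) {
        if (!same || i > pn || part[i] != prevpart[i]) {
          same = 0
          print "<h" i ">" part[i] "</h" i ">"
        }
      }
      for (i = 1; i <= n; i++) prevpart[i] = part[i]
      pn = n
      lvl = n + 1
      print "<h" lvl ">" $3 "</h" lvl ">"
      # Copy the article fragment verbatim.
      while ((getline line < $2) > 0) print line
      close($2)
    }
    END { print "</body></html>" }
  ' "$1"
}
```

Usage might look like `build_book list.tsv > book.html`, followed by something like `ebook-convert book.html book.epub --level1-toc '//h:h1' --level2-toc '//h:h2' --level3-toc '//h:h3'`. Those options cover three TOC levels, so deeper folders would not get their own TOC entries, and more than five nested folders would produce headings past `<h6>`. Titles are inserted as-is, so any `<` or `&` in them should already be escaped.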
#5
Grand Sorcerer
Posts: 8,702
Karma: 3644259
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2
For recipe questions I would suggest you post in the recipe subforum of the Calibre forum.
__________________
Dale DePriest http://pages.suddenlink.net/dalede or http://daledepriest.wikispaces.com currently using an EZ Reader or a Literati or my iPad.
Tags: bookmarks, html, parsing, scripting
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Generate eBook from Bookmarks Archive | vaniaspeedy | General Discussions | 0 | 01-01-2013 10:44 PM |
| How to generate cover from the first page of an ebook? | purgatorios | Library Management | 2 | 11-17-2012 07:42 AM |
| Could not find an ebook in the archive. | emilyf | Devices | 7 | 04-02-2011 06:10 PM |
| Ebook archive in case of the apocalypse? | jblitereader | General Discussions | 52 | 08-22-2010 06:13 PM |
| instapaper.com - Bookmarks service that generate epub and mobi books | celson | Ectaco jetBook | 3 | 03-13-2010 11:10 PM |