
MobileRead Forums > E-Book Formats > Workshop

01-01-2013, 10:44 PM   #1
vaniaspeedy
Connoisseur
Posts: 51
Karma: 9502
Join Date: Oct 2010
Location: California
Device: Kindle 3 WiFi, Kindle 4 Touch
Generate eBook from Bookmarks Archive

Hello all. I regularly bookmark articles of interest to me for later reference or easy access. However, my reference folder depends on those pages staying online and unchanged. I'd like not only to download them for my records (which I already do) but also to compile them into an eBook for easier, more convenient reading. Here's the plan:

Abstract goal - turn some subset of my bookmarks into a logically structured ebook.

Specific Solution
1. Export bookmarks to HTML, then use a program such as wget or HTTrack to download all of the raw HTML files.
2. Filter the HTML files, removing comments, ads, and other clutter.
3. Using the HTML index from (1), with each link pointing to a cleaned HTML file, compile an ebook with a TOC, possibly recreating the folder structure of my bookmarks.
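For step (2), a first pass over the downloaded files can be scripted. This is a rough sketch, not a real HTML parser: strip_noise is a hypothetical helper that only removes HTML comments and <script>/<style> blocks (markup that never displays as text). It matches tags case-insensitively and handles blocks spanning several lines, but it makes no attempt to remove ads, menus, or reader comments.

```shell
#!/bin/sh
# Sketch: remove HTML comments and <script>/<style> blocks from a page.
# Plain text processing, not an HTML parser; ads and menus are NOT removed.
# Reads the named files (or stdin) and writes the result to stdout.
strip_noise() {
    awk '
    {
        line = $0
        out = ""
        while (line != "") {
            if (closer != "") {
                # Inside a block: drop text up to and including its closer
                k = index(tolower(line), closer)
                if (k == 0) { line = ""; break }
                line = substr(line, k + length(closer))
                closer = ""
                continue
            }
            # Find the earliest opener in the rest of this line
            low = tolower(line)
            p = 0
            c = index(low, "<!--")
            if (c && (!p || c < p)) { p = c; closer = "-->" }
            s = index(low, "<script")
            if (s && (!p || s < p)) { p = s; closer = "</script>" }
            y = index(low, "<style")
            if (y && (!p || y < p)) { p = y; closer = "</style>" }
            if (!p) { out = out line; break }
            out = out substr(line, 1, p - 1)
            # Skip the whole "<!--" for comments, only the "<" for tags
            line = substr(line, p + (closer == "-->" ? 4 : 1))
        }
        # Keep blank lines outside blocks; drop lines that were entirely removed
        if (out != "" || ($0 == "" && closer == "")) print out
    }' "$@"
}
```

Usage would be something like: strip_noise page.html > page-clean.html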

Main roadblocks:
- for (2) - I'm not sure how to do this. Any ideas for tools?
- for (3) - I need to find a way to preserve my folder structure.
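On (3), the folder structure is already in the exported file: Firefox writes each folder as a <DT><H3 ...>Folder name</H3> line followed by a nested <DL> list holding its contents. A minimal sketch, assuming Firefox's uppercase tags and one bookmark or folder per line: the hypothetical bookmark_outline helper tracks the <DL> nesting and prints one tab-separated line per link (folder path, title, URL).

```shell
#!/bin/sh
# Sketch: turn a Firefox bookmarks.html export into "path<TAB>title<TAB>url".
# Assumes uppercase tags (<H3>, <DL>, <A HREF=...>) and one entry per line.
bookmark_outline() {
    awk '
    /<H3/ {
        # Remember the folder name; it applies to the next <DL> list
        name = $0
        sub(/.*<H3[^>]*>/, "", name)
        sub(/<\/H3>.*/, "", name)
        pending = name
    }
    /<DL>/ { depth++; folder[depth] = pending; pending = "" }
    /<\/DL>/ { if (depth > 0) depth-- }
    /<A HREF="/ {
        # URL: the text inside HREF="..."
        match($0, /HREF="[^"]*"/)
        url = substr($0, RSTART + 6, RLENGTH - 7)
        # Title: the link text between <A ...> and </A>
        title = $0
        sub(/.*<A [^>]*>/, "", title)
        sub(/<\/A>.*/, "", title)
        # Join the names of all enclosing folders with "/"
        path = ""
        for (i = 1; i <= depth; i++)
            if (folder[i] != "") path = (path == "" ? folder[i] : path "/" folder[i])
        printf "%s\t%s\t%s\n", path, title, url
    }' "$@"
}
```

The folder path column could then drive the headings, e.g. a top-level folder as a chapter and a subfolder as a section.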

I'd love some comments on the overall plan, which tools to use, and general feedback. Thanks!

01-02-2013, 02:38 AM   #2
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
What I would do is load those HTML files into Sigil and clean them up there. You probably won't want to keep the layout/style anyway.
I would also ditch the folder structure, since it has no meaning in an ebook.
If you want to maintain it, I would advise creating it by hand. You can find the information for that on the wiki or on jedisaber's site. It will be a good read anyway.

01-02-2013, 08:39 AM   #3
dgillette.rm
Connoisseur
Posts: 95
Karma: 10072
Join Date: Apr 2008
Device: sony
I would suggest starting with Instapaper or something similar to help with the cleanup. I do not know what you mean by folders in a book sense. If you mean chapters and sections, that should be easy in something like Sigil. If that is not what you mean, I agree with Toxaris that it is meaningless in an ebook.

01-05-2013, 04:17 PM   #4
vaniaspeedy
Connoisseur
Posts: 51
Karma: 9502
Join Date: Oct 2010
Location: California
Device: Kindle 3 WiFi, Kindle 4 Touch
Updates

I've created an early alpha of a process that seems to work.

1. Export the bookmarks as HTML from Firefox and keep only the links of interest (Notepad++ is great for this).
2. Clean the file with find and replace, using the following regular expressions and replacing each match with nothing:
a) <[^>^A]+>
b) <A HREF="
c) " ADD_DATE="[0-9 ]+" LAST_MODIFIED="[0-9 ]+"
d) >[^<]+<\/A>
e) <H3 ADD_DATE="[0-9]+" LAST_MODIFIED="[0-9]+">.*

3. Copy the cleaned links into urls.txt.
4. Run this shell script:

Code:
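#!/bin/sh
# Mail each URL in urls.txt to Instapaper with the page <title> as subject; -s hides curl's progress meter.
while IFS= read -r url || [ -n "$url" ]; do
    title=$(curl -sL "$url" | grep -i '<title>.*</title>' | sed -e 's/<[^>]*>//g')
    echo "$url" | mail -s "$title" YOUR_EMAIL@instapaper.com
done < urls.txt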
5. Log in to Instapaper and download the EPUB.
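The editor pass in steps 2 and 3 can also be scripted. This is a swapped-in approach rather than a translation of the five regexes: bookmark_urls (hypothetical name) keeps only the HREF value of each <A HREF="..."> line with a single sed substitution, and page_title (also hypothetical) extracts just the text between the title tags, which also copes with pages whose whole <head> sits on one line. Both assume each link and each title are on a single line.

```shell
#!/bin/sh
# Sketch: non-interactive versions of the cleanup and title-extraction steps.

# Print the HREF value of every link line in a Firefox bookmarks export.
bookmark_urls() {
    sed -n 's/.*<A HREF="\([^"]*\)".*/\1/p' "$@"
}

# Print the first single-line <title>...</title> text, trimmed of whitespace.
page_title() {
    sed -n 's/.*<[Tt][Ii][Tt][Ll][Ee][^>]*>\(.*\)<\/[Tt][Ii][Tt][Ll][Ee]>.*/\1/p' "$@" |
        head -n 1 |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}
```

For example: bookmark_urls bookmarks.html > urls.txt, and inside the mail loop, title=$(curl -sL "$url" | page_title).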

Limitations
It seems that Instapaper only exports the 20 most recent unread articles, so I've been looking into using a Calibre recipe that would download the newest 20, archive them, and grab the next 20. This loop could be repeated until I have a pile of EPUBs, which I would later merge with some other software.
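If the 20-article limit is the only obstacle, one workaround that needs no recipe is to feed Instapaper 20 URLs at a time and download an EPUB after each batch. A minimal sketch using the standard split utility; make_batches and the batch_ prefix are hypothetical names.

```shell
#!/bin/sh
# Sketch: split a URL list into files of at most SIZE lines (default 20),
# named PREFIXaa, PREFIXab, ... (default prefix batch_).
make_batches() {
    list=$1
    size=${2:-20}
    prefix=${3:-batch_}
    if [ ! -r "$list" ]; then
        echo "make_batches: cannot read $list" >&2
        return 1
    fi
    split -l "$size" "$list" "$prefix"
}
```

make_batches urls.txt should produce batch_aa, batch_ab, and so on; each file can replace urls.txt in the mail loop, one batch per EPUB download.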

Questions
1. Does anyone know of a prebuilt recipe that can do this?
2. Are there any programs that can automate merging multiple eBooks, including merging their TOCs?
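On question 2, a different route (not a merge tool) is to glue the cleaned HTML together before converting, so the TOC is built once. In this sketch, combine_articles is a hypothetical helper that assumes each input is a full HTML page with its <title> and its opening <body> tag each on one line; it writes one <h1> per article followed by that page's body content.

```shell
#!/bin/sh
# Sketch: concatenate cleaned HTML pages into one HTML document on stdout,
# with an <h1> (taken from each page title) before each page body.
combine_articles() {
    printf '<html><head><title>Bookmarks</title></head><body>\n'
    for f in "$@"; do
        # Text between the first <title> and </title> on a single line
        title=$(sed -n 's/.*<[Tt][Ii][Tt][Ll][Ee][^>]*>\(.*\)<\/[Tt][Ii][Tt][Ll][Ee]>.*/\1/p' "$f" | head -n 1)
        printf '<h1>%s</h1>\n' "$title"
        # Copy only what lies between the opening <body ...> tag and </body>
        awk '
        {
            line = $0
            if (!inbody) {
                i = index(tolower(line), "<body")
                if (i == 0) next
                line = substr(line, i)
                # Assumes the <body ...> tag itself fits on one line
                j = index(line, ">")
                if (j == 0) next
                line = substr(line, j + 1)
                inbody = 1
            }
            k = index(tolower(line), "</body>")
            if (k > 0) { print substr(line, 1, k - 1); exit }
            print line
        }' "$f"
    done
    printf '</body></html>\n'
}
```

The combined file could then go through Calibre's converter, e.g. ebook-convert combined.html bookmarks.epub --level1-toc "//h:h1", which should turn each <h1> into a first-level TOC entry.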

01-05-2013, 05:58 PM   #5
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
For recipe questions I would suggest you post in the recipe subforum of the Calibre forum.

Tags
bookmarks, html, parsing, scripting



Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Generate eBook from Bookmarks Archive | vaniaspeedy | General Discussions | 0 | 01-01-2013 10:44 PM
How to generate cover from the first page of an ebook? | purgatorios | Library Management | 2 | 11-17-2012 07:42 AM
Could not find an ebook in the archive. | emilyf | Devices | 7 | 04-02-2011 06:10 PM
Ebook archive in case of the apocalypse? | jblitereader | General Discussions | 52 | 08-22-2010 06:13 PM
instapaper.com - Bookmarks service that generate epub and mobi books | celson | Ectaco jetBook | 3 | 03-13-2010 11:10 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.