View Full Version : Idea for a "Bookit" Plugin -- Maybe Kovid?
dsuden 05-22-2008, 07:41 AM Since discovering Kovid's wonderful web2lrf tool, I look at web pages in a different way. If I come across a large, interesting article on a web page, I immediately think about converting it to an e-book for more comfortable reading at my convenience on my Sony Reader. The only problem is that it takes considerable effort at present, since I have to save the page to html, then open a DOS window (Start - Run - type in CMD), CD to the directory, and then try to remember and type in all the necessary syntax, paths, and filenames to launch the conversion.
This morning, as I ran across a big web article in Firefox, it hit me. What about a Firefox front-end for web2lrf?!
The way it'd work... If you run across a great online article or other document you want to read off line in your reader, you'd click a little icon in a corner of the browser that looks like a book or a little bookworm or something, and the document would be instantly and automatically converted into an e-book and saved to your hard drive. Right-clicking on the icon would allow you to set options such as the place where files should be saved, etc. Or, instead maybe when you left-click the icon, a popup window would appear, letting you choose the where, filename, link-following, and other options supported by web2lrf. The latter would seem like the more powerful method since you could vary things with each e-book.
Am I the only one who thinks this would be amazingly cool?
jplumey 05-22-2008, 08:53 AM I think this might be a really useful tool, but I just don't know enough about how Firefox would interact with the command line. I agree, it would be a cool little utility to have. Right now on my Mac I have an Automator script that does the same thing and works for any URL that I copy to my clipboard. I copy the URL (from a browser or email or whatever) and then run an Automator app that downloads the file and then executes the command line to convert the file to an LRF. Now if only Kovid had a command line utility to import into the database...
harrynewman 05-22-2008, 09:02 AM It is indeed a great idea.
The Librie (prs-500 predecessor) actually came with a tool for IE that did just what you suggest. I used it quite often to convert html ebooks.
Mr. Goodbar 05-22-2008, 09:22 AM For anyone who ever used iSilo on a Palm or Pocket PC, it had a plugin for Internet Explorer that would do this. You also had the capability to set the link depth level, as well as whether that applied to just that site or to off-site links as well. It was a very easy way to grab online magazines, newspapers, etc.
dsuden 05-22-2008, 10:58 AM Hi jplumey
If the utility is capable of taking variables from the command line, it should also be capable, with some adaptation, of being fed variables by a GUI of the kind I've described. The beauty of the plugin approach is that it would allow users to work in a very intuitive manner, right from within their browser. :-)
Dane
kovidgoyal 05-22-2008, 11:45 AM Sure, this would be a great idea, except that I don't know anything about writing firefox plugins (I use konqueror myself), but if someone wants to do this, I'll be happy to provide whatever support is needed from the calibre end.
beowulf573 05-22-2008, 03:08 PM Hmm, at first glance it doesn't look like it would be that hard. My wife has a working dinner tonight, let me see if I can throw together a quick prototype.
beowulf573 05-22-2008, 04:43 PM Ok, I've got a working prototype...well it works in Firefox 2.0 under XP. Let me figure out how to adjust the settings and I'll upload it to my website.
This first version will not be very configurable, but once I get basics in place adding a menu and configuration dialogs shouldn't be that hard.
One question though, I don't use web2lrf very much, how does the recursion setting work? I've tried -r 1 and --max-recursions=1 but it always seems to be following the links.
jplumey 05-22-2008, 04:52 PM Setting up the basic extension seems simple, after looking at some tutorials. Shall we have a race? First one to develop one gets a free Reader? lol just kidding.
jplumey 05-22-2008, 04:54 PM LOL looks like I already lost. Hehe. Let me know if you need a beta tester.
beowulf573 05-22-2008, 06:10 PM Try this. (http://www.heorot.org/wordpress/index.php/bookit/)
Very, very alpha, use at your own risk. I've never written a Mozilla plugin before and I'm sure there's something broken somewhere.
kovidgoyal 05-22-2008, 06:16 PM Set -r 0 to not have it follow links.
revfish 05-22-2008, 08:51 PM If this works, will it work with the GUI Calibre or just the command line version? I don't really know much about command line stuff and prefer my GUI.
I can see this plugin being very, very useful.
beowulf573 05-22-2008, 09:10 PM Right now it just invokes web2lrf. Is there a way to invoke the gui to do the conversion?
I've done some more work on it, I'll upload a new version before I go to bed.
kovidgoyal 05-22-2008, 09:13 PM I can probably add some code so that if you call the GUI with a file argument, it will add the file to the database.
beowulf573 05-22-2008, 10:34 PM Updated (http://www.heorot.org/wordpress/index.php/bookit/)
This version has an icon on the status bar that you double-click to kick off web2lrf; it prompts you first. You can turn off the prompt, and update the settings to use, via the minimal preferences dialog.
Tomorrow I'll look at adding a context menu to the icon for per-document overrides on the web2lrf settings and getting an rdf feed working for updates.
If anyone has any better ideas for a name than Bookit let me know. I just called it that because of the original post.
beowulf573 05-22-2008, 10:35 PM I can probably add some code so that if you call the GUI with a file argument, it will add the file to the database.
That would be great. Can you do it in such a way as to not launch the gui but just add to the database and exit?
kovidgoyal 05-22-2008, 10:57 PM What's the point of not launching the GUI?
beowulf573 05-23-2008, 07:08 AM I guess it depends upon how you use it, if you're creating a bunch of files you may not want to launch the gui every time. However, I'm easy. It can always be made an option.
Is using web2lrf the right way to do this or should we just launch the gui with the url and various options?
kovidgoyal 05-23-2008, 09:15 AM Using web2lrf is the right way. The GUI has a server, which means that once it's started up calling it repeatedly with file names means that it isn't reloaded each time. Also, since presumably the plugin works by you visiting a site, then clicking on the "Book It" button, it works a site at a time?
beowulf573 05-23-2008, 10:15 AM That's right, it invokes web2lrf with the current url and places the resulting lrf in an output directory specified in the preferences.
FYI, I did a quick update over breakfast, it now will update via the rdf mechanism, but you need to do one more manual uninstall and reinstall via my site. After that you'll be able to do a quick 'check for updates'.
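As a rough illustration of the flow described above (current URL in, LRF file out in a configured directory), here is a minimal Python sketch that builds a web2lrf argument list. It is not the extension's actual code, which is not shown in this thread. The function names, the rule for turning a title into a file name, and the example values are assumptions made for illustration; only flags that appear elsewhere in this thread (-u, -o, -t, -r) are used.

```python
import os
import re
import subprocess


def build_web2lrf_command(url, output_dir, title, max_recursions=0,
                          executable="web2lrf"):
    """Return the argument list for one web2lrf run (hypothetical helper)."""
    # Assumed file name rule: replace characters that are awkward in file names.
    safe_name = re.sub(r'[\\/:*?"<>|]', "_", title).strip() + ".lrf"
    output_path = os.path.join(output_dir, safe_name)
    return [
        executable,
        "-u", url,
        "-o", output_path,
        "-t", title,
        "-r", str(max_recursions),
    ]


def convert(url, output_dir, title, **kwargs):
    """Run web2lrf; passing a list avoids shell parsing of the arguments."""
    command = build_web2lrf_command(url, output_dir, title, **kwargs)
    return subprocess.run(command, check=True)
```

Keeping the command construction separate from the call that runs it makes it possible to inspect or log the exact arguments before anything is executed.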
kovidgoyal 05-23-2008, 10:39 AM The next release of calibre, later today, will support passing an argument to calibre.exe to load books into the database. Once you think the extension is ready let me know and I'll add a link to it on the calibre download pages.
beowulf573 05-23-2008, 01:30 PM Great, I should have an hour or two available this evening to do an update. BTW, I used a png version of one of your svg files for the status bar icon. If this is an issue please let me know.
dynabook 05-23-2008, 02:05 PM @kovid
I installed this plug-in on Firefox on my Mac. But I can't find the path to web2lrf to put in the preferences. Is it in the calibre.app folder? If so what would be the path to the executable? Thanks.
kovidgoyal 05-23-2008, 02:25 PM It is in /usr/bin but that should be on your path already, so just putting web2lrf there should do the trick.
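One way to check whether a bare command name such as web2lrf would be found on the path is sketched below in Python. The helper name is made up for illustration, and this says nothing about how the extension itself locates the program.

```python
import shutil


def resolve_executable(configured="web2lrf"):
    """Return the location found for the configured program, or None."""
    # shutil.which searches the directories listed in PATH, the same lookup
    # a shell performs when given a bare command name.
    return shutil.which(configured)


if __name__ == "__main__":
    path = resolve_executable("web2lrf")
    print(path if path else "web2lrf was not found on PATH")
```

If the lookup returns nothing, the full path to the executable would have to be entered in the preferences instead.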
stasys 05-23-2008, 02:54 PM beowulf573, can you please connect your plugin with the ripit plugin, so the program would not attempt to download all the junk banners from sites?
beowulf573 05-23-2008, 03:02 PM I'm not actually doing any downloading, I'm just passing the url and other options to web2lrf. Once it's to a point the core code is working well, I'll add support for as many web2lrf options as I can.
stasys 05-23-2008, 03:14 PM Well, then it is a question for Kovid: is it possible to somehow block banners and other junk? After I clean annoying links and banners out of the page view in Firefox with the ripit plugin, I see them again in the lrf.
kovidgoyal 05-23-2008, 03:30 PM Not really, web2lrf (at least its command line interface) doesn't have options for cleaning up downloaded HTML. If you want to do it like that a better route is to save the HTML to disk and run html2lrf on it.
stasys 05-23-2008, 04:02 PM ... and run html2lrf on it.
How do I run html2lrf with cleaning options? It has no GUI.
beowulf573 05-23-2008, 06:58 PM Ack, I won't be able to get an update out tonight after all. I've redone the options dialog and started an override dialog for one time settings, but don't have time to finish it tonight. And I'll be out of town all weekend, but I'll try to finish it up and add the new "add to db option" Monday when I get home.
beowulf573 05-24-2008, 08:54 AM Ok, I lied, I got an hour to work over breakfast. I did a quick update that has a new options dialog and better behind the scenes code. You can now right click (or whatever on a Mac) to get a context menu on the icon. Options brings up the standard dialog, create will let you modify the options before launching web2lrf. Changes made during create are not persisted.
Monday I'll try to flesh out the rest of the available web2lrf options and then start testing in earnest and see about porting to Firefox 3.
dsuden 05-24-2008, 03:34 PM Wow, that is just great...it works like a charm, beowulf573. It's exactly what I pictured...thanks so much for helping bring this to the Sony Reader community! I'm glad you're going to implement the full range of options, and I'm grateful that in the latest version you can access the options by a right-click, while a double-click will launch the conversion with or without confirmation. You've got this thing nailed!
Very cool that you even used an image of the Sony Reader as the icon. :-)
The Old Man 05-24-2008, 05:47 PM Monday I'll try to flesh out the rest of the available web2lrf options and then start testing in earnest and see about porting to Firefox 3.
I've been following this - a great idea!
Before you worry about Firefox 3 you may want to read this- second paragraph:
http://news.yahoo.com/s/pcworld/20080524/tc_pcworld/146296
Valloric 05-24-2008, 06:24 PM Monday I'll try to flesh out the rest of the available web2lrf options and then start testing in earnest and see about porting to Firefox 3.
Oh, oh, oh! Firefox 3 support, yippee! I'm running FF3B5 (RC1 is not in the Ubuntu repos yet :() on this laptop, and it would be great if you could give us bleeding edge guys some love. :D
Wow!! I sure love this plugin!!
One thing I miss: Is there a way to merge LRF files for books coming in separate chapters?
Can I use Calibre to edit the LRF file and change fonts, add cover, etc?
Valloric 05-25-2008, 07:39 AM Can I use Calibre to edit the LRF file and change fonts, add cover, etc?
You can do this with Book Designer.
Thanx Vall ;)
Never had the need to use it. I'll give it a try.
soilwork 05-26-2008, 03:51 AM beowulf573, thanks for the great plugin. So far, I have done the conversion using a simple batch file but this plugin makes conversion easier. :thanks:
I have one suggestion though. Could you provide more options that let users override the default settings of web2lrf, such as
- Author
- Title
- left/right/top/bottom margin?
I think it would be great if users can customize the title/author/margins of the resulting document.
BTW, how about making a separate thread about this plugin under 'Reader content' subforum? I think this program deserves to be there and get stickied. :)
beowulf573 05-27-2008, 08:10 AM Great, I'm glad it's actually working for folks. I needed a small project to divert me away from my "year long never finished it just keeps going" project at work.
I just uploaded another update that has a fleshed out options dialog. It has many more options in a tabbed format, it hides one tab when doing a create for options that are irrelevant, and has a free form edit field for options I didn't include.
I'm going to start looking at Firefox 3 support (mainly because I'm running Ubuntu and need it myself) and a few other things. For the next update I'll start a new thread under Reader Content and link to it here.
Eddie
jplumey 05-27-2008, 11:17 AM Great job on the plugin Eddie, it's great that you got it up and running so quickly.
stasys 05-27-2008, 11:41 AM beowulf573,
What are those additional parameters for?
beowulf573 05-27-2008, 01:26 PM The additional parameters field is for anything else you want to stick on the web2lrf command line that I didn't include an option for.
For example, I didn't include an option for keeping the downloaded files, but you can stick this:
--keep-downloaded-files
into the additional parameter field and it will be added as is to the command line.
I did a quick update over lunch; I had forgotten to update the version number in a few places, so I fixed that along with a few small bugs, updated the localization text, and put it up as 0.2.5.
beowulf573 05-27-2008, 10:46 PM I've created a new thread (http://www.mobileread.com/forums/showthread.php?p=189783) for the bookit plugin and put a new release up.
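The "added as is" behaviour of the additional parameters field can be pictured with the short Python sketch below. How the extension really splits the free-form text is not stated in this thread; the helper name and the use of shell-style splitting are assumptions.

```python
import shlex


def append_additional_parameters(command, additional):
    """Return command plus the tokens of the free-form field, unchanged."""
    # shlex.split honours quotes, so '--match-regexp ".*72.*"' becomes
    # two tokens: '--match-regexp' and '.*72.*'.
    return list(command) + shlex.split(additional)


base = ["web2lrf", "-u", "http://wikitravel.org/en/Lyon", "-r", "0"]
print(append_additional_parameters(base, "--keep-downloaded-files"))
```

Because nothing in the field is interpreted or validated, a mistyped option would only show up when web2lrf itself rejects it.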
alexxxm 05-28-2008, 01:23 AM the plugin works perfectly here (Linux, fedora fc8 + Firefox 2),
it's a splendid addition!
A couple of questions:
1) I don't understand the difference between "max recursions" and "link levels" in the options
2) in a page I downloaded as a test (Wikitravel for Lyon, http://wikitravel.org/en/Lyon), the text of the links disappears - so that e.g. the text:
"... French language courses are available at Inflexyon, Alliance Francaise, Lyon-Bleu, Ecole Interculturelle de Francais."
is rendered as:
"French language courses are available at , , , ."
Is it possible to keep the links text? I found out it's the same even putting max recursions=1 and link levels=1
bye!
alessandro
beowulf573 05-28-2008, 09:39 PM 1) To be honest I'm not 100% clear either, it has to do with how deep and how many links are followed to create the final file.
2) I see the same thing when executing web2lrf from the command line. I didn't see anything obviously unusual about the links.
kovidgoyal 05-28-2008, 10:53 PM max-recursions controls the number of levels of recursion when downloading, and link-levels controls the number of levels of recursion when converting from HTML to LRF.
There are two options because there are actually two separate components under the hood that do the downloading and converting respectively.
The wikipedia links show up fine for me
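To make "levels of recursion" concrete, here is a toy Python model of following links up to a fixed depth on a made-up, in-memory site. It is only a sketch of the general idea and is not taken from web2lrf; the function name and the example site are invented. In this model a limit of 0 keeps only the starting page, which matches the earlier advice to use -r 0 when no links should be followed.

```python
from collections import deque


def pages_within_depth(start, links, max_depth):
    """Return pages reachable from start by following at most max_depth links."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical site: A links to B and C, B links to D.
site = {"A": ["B", "C"], "B": ["D"]}
print(pages_within_depth("A", site, 0))  # ['A']
print(pages_within_depth("A", site, 1))  # ['A', 'B', 'C']
print(pages_within_depth("A", site, 2))  # ['A', 'B', 'C', 'D']
```

With two separate stages, each with its own limit, the stricter of the two limits would presumably decide what ends up in the final file, but that is an inference and not something stated here.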
kovidgoyal 05-28-2008, 11:39 PM 2) I see the same thing when executing web2lrf from the command line. I didn't see anything obviously unusual about the links.
Actually, there was a regression introduced in 0.4.61 causing this. Fixed in 0.4.63
alexxxm 05-29-2008, 07:13 AM Actually, there was a regression introduced in 0.4.61 causing this. Fixed in 0.4.63
perfect, I just checked and now the link text appears - as it should be.
Unfortunately the test did not start correctly the 1st time, since I still had in the preferences the values "max recursions=1, link levels=1" I used yesterday for testing ... the result was an lrf file more than 6MB big!
Speaking about it - you said:
max-recursions controls the number of levels of recursion when downloading and link-levels controls the number of levels of recursion when converting from HTML to LRF
I'm still not so sure I understand it - so let's try a typical test case: if I wanted to get an HTML page, plus all the others just 1 link away and no more, which values should I set?
Last point: one option I always find useful in web scraper programs is the possibility to ask them to follow just links local to the starting website. This is still not possible in web2lrf, correct?
Thanks for the help
alessandro
kovidgoyal 05-29-2008, 07:55 AM For links just one level away -r 1 should do the trick. You can easily ask the scraper to follow only links of a certain type using the --match-regexp option
alexxxm 05-30-2008, 02:08 AM For links just one level away -r 1 should do the trick. You can easily ask the scraper to follow only links of a certain type using the --match-regexp option
I'm still asking you here even though I just discovered the other thread on "content" - I'll move there once this is cleared up:
I'm trying what you said, put in the bookit options "Max recursions=1", but I'm having trouble with regexps:
from the site http://www.cityguide.travel-guides.com/city/72/city_guide/Europe/Lyon.html
I wanted to follow all the internal links having "72" in the address:
I tried putting Meta-data>Additional parameters
"--match-regexp 72", "--match-regexp=72", "--match-regexp *72*", "--match-regexp=*72*", but none worked: it just saves the original page and that's all
any hint?
thanks...
alessandro
kovidgoyal 05-30-2008, 02:12 AM --match-regexp ".*72.*"
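The difference between the patterns tried so far can be checked with Python's re module, as in the sketch below. Whether web2lrf anchors the pattern at the start of the URL is not stated in this thread, so the anchored match shown here is an assumption that merely fits the advice to write .*72.* and not a bare 72.

```python
import re

url = "http://www.cityguide.travel-guides.com/city/72/city_guide/Europe/Lyon.html"

# "*72*" is a shell-style wildcard, not a regular expression: '*' has
# nothing to repeat at the start of the pattern, so compiling it fails.
try:
    re.compile("*72*")
    wildcard_is_valid = True
except re.error:
    wildcard_is_valid = False

# re.match anchors at the start of the string, so a bare "72" fails there,
# while re.search looks anywhere in the string.
anchored_bare = re.match("72", url) is not None
unanchored_bare = re.search("72", url) is not None
anchored_full = re.match(".*72.*", url) is not None

print(wildcard_is_valid, anchored_bare, unanchored_bare, anchored_full)
# prints: False False True True
```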
alexxxm 05-30-2008, 03:11 AM --match-regexp ".*72.*"
thanks, but no, still not working.
The executed line is:
/usr/bin/python /usr/bin/web2lrf -u http://www.cityguide.travel-guides.com/city/72/city_guide/Europe/Lyon.html -o /usr/src/bookit/Lyon City Guide _ Lyon City Break.lrf -t Lyon City Guide | Lyon City Break -a Bookit -r 1 --link-levels=0 --left-margin=0 --right-margin=0 --top-margin=0 --bottom-margin=0 --match-regexp .*72.* default
and it still does not follow any link (all those visible on the right of the page)
alessandro
igorsk 05-30-2008, 04:00 AM I think you need to escape the asterisks on command line.
kovidgoyal 05-30-2008, 10:53 AM Try .+72.+ to avoid the asterisk problem
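The escaping issue can be illustrated with Python's shlex module, which quotes arguments the way a POSIX shell expects. This is only a sketch: the thread does not say whether the extension passes its command through a shell at all, and the argument list below is a shortened, hypothetical version of the executed line.

```python
import shlex

args = [
    "web2lrf",
    "-t", "Lyon City Guide | Lyon City Break",
    "--match-regexp", ".*72.*",
]

# shlex.quote wraps an argument in single quotes when it contains
# characters a POSIX shell would treat specially (space, '|', '*').
print(" ".join(shlex.quote(a) for a in args))

# '+' and '.' are not special to the shell, so this pattern needs no quoting.
print(shlex.quote(".+72.+"))
```

The first line printed is `web2lrf -t 'Lyon City Guide | Lyon City Break' --match-regexp '.*72.*'` and the second is `.+72.+`, which is why the plus-sign form sidesteps the problem: it survives even when no quoting is applied.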