07-23-2007, 07:51 PM | #1 |
Addict
Posts: 202
Karma: 692
Join Date: Oct 2006
Device: SONY reader
|
HTML merge tool needed
Hi
Do you guys know a good tool to merge a bunch of HTML files into a single document? What I want to do is download manuals or electronic books that are available online and convert them to LRF, PDF or RTF. The conversion works best on a single file. An example of what I'm trying to convert is here: http://www.zeroc.com/doc/Ice-3.2.0/manual/ I guess there are HTML merge utilities out there; I just don't know where to find them. Thanks |
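For readers who just want the merge step, here is a minimal Python sketch, assuming the pages have already been downloaded and that pulling out whatever sits between `<body>` and `</body>` is good enough. The function names (`extract_body`, `merge_html`, `merge_files`) are made up for illustration; they are not part of web2lrf or any other tool mentioned in this thread, and a regex is a rough stand-in for a real HTML parser.

```python
import html
import re
from pathlib import Path

# Matches the content of the first <body>...</body> pair, case-insensitively.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def extract_body(page: str) -> str:
    """Return the markup inside <body>, or the whole text if no body tag exists."""
    match = BODY_RE.search(page)
    return match.group(1).strip() if match else page.strip()


def merge_html(pages, title="Merged document"):
    """Concatenate the bodies of several HTML strings into one HTML document."""
    sections = [extract_body(page) for page in pages]
    body = "\n<hr/>\n".join(sections)  # a rule between the original pages
    return (
        "<html>\n<head><title>" + html.escape(title) + "</title></head>\n"
        "<body>\n" + body + "\n</body>\n</html>\n"
    )


def merge_files(paths, out_path, title="Merged document"):
    """Read the given files in order and write the merged document to out_path."""
    pages = [Path(p).read_text(encoding="utf-8", errors="replace") for p in paths]
    Path(out_path).write_text(merge_html(pages, title), encoding="utf-8")
```

Calling `merge_files(sorted(Path("manual").glob("*.html")), "manual_merged.html")` would merge a hypothetical `manual` directory in file-name order; for a real manual the chapter order would more likely have to come from the table of contents. Links between the original pages are not rewritten.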
07-23-2007, 09:02 PM | #2 | |
Resident Curmudgeon
Posts: 73,897
Karma: 128597114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
07-24-2007, 11:15 AM | #3 |
Addict
Posts: 202
Karma: 692
Join Date: Oct 2006
Device: SONY reader
|
Thanks, I know...
I tried Kovid's tools but am not always happy with the results. I guess it'll take some shell scripting (sed/awk/cat) to get the job done... |
07-24-2007, 11:40 AM | #4 |
creator of calibre
Posts: 43,845
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
May I ask why not?
|
07-24-2007, 11:51 AM | #5 |
Addict
Posts: 202
Karma: 692
Join Date: Oct 2006
Device: SONY reader
|
Hi Kovid
Can you please try this: web2lrf http://www.zeroc.com/doc/Ice-3.2.0/manual/ I guess the problem is the style sheet, or it could even be that the pages are not perfect XHTML... Thanks |
07-24-2007, 12:37 PM | #6 |
creator of calibre
Posts: 43,845
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The correct commandline for that site should be
Code:
web2lrf --url http://www.zeroc.com/doc/Ice-3.2.0/manual/toc.html |
07-24-2007, 12:39 PM | #7 |
Addict
Posts: 202
Karma: 692
Join Date: Oct 2006
Device: SONY reader
|
yes... my bad.
|
07-24-2007, 01:18 PM | #8 |
creator of calibre
Posts: 43,845
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The site converted fine for me. What was the problem?
|
07-24-2007, 07:05 PM | #9 |
Addict
Posts: 202
Karma: 692
Join Date: Oct 2006
Device: SONY reader
|
The results are ugly.
For example, formatting such as code examples vs. "normal" text was lost. Another problem, which is not the fault of the html2lrf tool, is that it would be nice to remove the header and footer, such as the Previous/Next links, from every page. These are needed when the document is presented online, but are only noise once it is converted for offline viewing. |
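As a rough illustration of the header/footer cleanup described above, here is a small Python sketch that drops navigation links before conversion. It assumes the navigation consists of plain `<a>` elements whose entire link text is a word such as "Previous" or "Next"; the real markup of the manual pages is not shown in this thread, so the pattern and the name `strip_nav_links` are hypothetical and would need adjusting to the actual pages.

```python
import re

# Anchors whose whole link text is a navigation word (assumed labels).
NAV_LINK_RE = re.compile(
    r"<a\b[^>]*>\s*(?:previous|prev|next|up|table of contents)\s*</a>",
    re.IGNORECASE,
)


def strip_nav_links(page: str) -> str:
    """Remove navigation-only links; links with other text are left alone."""
    return NAV_LINK_RE.sub("", page)
```

Because the pattern only matches links whose text is exactly one of the listed words, an ordinary link such as "the next chapter" inside the body text is kept. Navigation built from images or tables would not be caught by this.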
07-24-2007, 08:11 PM | #10 |
creator of calibre
Posts: 43,845
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The code examples are in a monospace font; what other formatting do you mean? As for stripping the headers/footers, that is easily done by creating a profile for web2lrf.
|
07-24-2007, 08:26 PM | #11 |
Addict
Posts: 202
Karma: 692
Join Date: Oct 2006
Device: SONY reader
|
thanks
How do I create a profile? Where can I find some info on that? Did you use the web2lrf script to get the monospace? |
07-24-2007, 10:29 PM | #12 |
creator of calibre
Posts: 43,845
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Yeah, the command line I posted before gave me monospaced code samples. Unfortunately, at the moment the only way to create new profiles is by editing the source. I'll add an easier way when I get the time. If you're interested, look at the web2lrf thread, where I've posted a link to some example profiles.
|
07-25-2007, 03:34 AM | #13 |
Addict
Posts: 314
Karma: 1002965
Join Date: Mar 2006
Location: UK
Device: ILiad. Gen 3, PocketBook 360, Kobo Aura HD, Kindle Oasis 2
|
|
07-25-2007, 12:09 PM | #14 |
Addict
Posts: 202
Karma: 692
Join Date: Oct 2006
Device: SONY reader
|
thanks Kovid
I had a look at the Python source. Cool stuff. I'm not a Python guru, I'm still learning it... but I think I can figure it out. |
11-07-2007, 06:04 PM | #15 | |
Junior Member
Posts: 2
Karma: 10
Join Date: Nov 2007
Device: axim x51v
|
Quote:
Have you gotten it to work recently? I remember running it years ago on my old computer. |
|
|