Terrific job, Doogie!
In the interest of getting as much as possible in one place, I'm taking the liberty of reposting an item I did a while back on rewriting links, which is about the only thing Doogie missed. (I also corrected some minor errors in the original version.)
This is an example of how to rewrite links, with a couple of extra tricks thrown in as well.
Let's say you want to grab the columns by Chuck Colson from the Christianity Today website. You'll find them at:
Notice that this page and each of the article pages are loaded with ads, unwanted links, and so on.
On the right, click the link for the printer version. When the printer-friendly box opens, right-click on it and select Properties in Internet Explorer or View Page Info in Firefox. You will find that the URL for the printer-friendly page is:
This is the URL to use in the URL/File field on the Main tab when you create the Sunrise XP document. You'll directly load the printer-friendly main page, eliminating the junk.
Now click on the link for a specific column and you get something like this:
The exact URL depends on the article you clicked. Again, open the printer version, right-click, and select Properties or View Page Info. The printer-friendly URL is:
Now create your Sunrise XP document and create a link filter. Select "Regular Expression" for Match, "Filter all links" for Links, and "Rewrite links matching this pattern" for Filter.
Now, how do you turn the article link into the printable link?
Notice that they are identical up to ".com", then the printable link has some extra stuff (/global/printer.html?), then they end identically. If you check several articles, you'll see that the ending part is different for each article. You need to tell Sunrise to stick the extra text in ahead of the article-specific stuff no matter what it is. You start by specifying the part that is identical for all articles, then replace the rest with "(.*)", which essentially says, "match everything here no matter what it is". The result is:
But "." is a special character in Perl regular expressions (it matches any single character), so you must put a backslash in front of it when you want it to be taken literally. Now you have:
That's what goes in the Pattern field for the link filter. Not only will that match the link for any article, but the (.*) part will also grab all of the last part of the text and save it. Later, you can refer to it as "$1".
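As a rough illustration of why the backslash matters and what "(.*)" saves, here is a small Python sketch. The example.com addresses are made-up stand-ins, not the real site URLs, and Python's re module is only being used to mimic the Perl-style matching, so treat it as an approximation of what Sunrise XP does.

```python
import re

# Hypothetical article URL (a stand-in, not the real site address).
article = "http://www.example.com/columns/2006/some-article.html"

# Same pattern with and without the dots escaped.
unescaped = re.compile(r"http://www.example.com(.*)")
escaped = re.compile(r"http://www\.example\.com(.*)")

# An unescaped "." matches any character, so this lookalike slips through.
lookalike = "http://wwwXexampleYcom/columns/other.html"
unescaped_hits_lookalike = unescaped.match(lookalike) is not None
escaped_hits_lookalike = escaped.match(lookalike) is not None

# The parenthesized (.*) captures everything after ".com".
captured = escaped.match(article).group(1)
print(captured)
```

The captured text is the article-specific tail of the link, which is the piece the rewrite step reuses.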
Now to rewrite the link, you want the part up to ".com", plus the extra stuff you need to insert, followed by the stuff saved as $1. You can write this as:
This is what you put in the Rewrite field.
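If you want to try the idea outside Sunrise XP, the sketch below does the same kind of rewrite in Python. The example.com host is a placeholder and rewrite_link is a name invented for this example; only the inserted "/global/printer.html?" text comes from the walkthrough above. Note that Python writes the saved text as \1 where the Rewrite field uses $1.

```python
import re

# Placeholder host; only the inserted "/global/printer.html?" comes from the post.
PATTERN = r"http://www\.example\.com(.*)"
REWRITE = r"http://www.example.com/global/printer.html?\1"  # \1 plays the role of $1


def rewrite_link(url):
    """Return the printer-friendly form of url, or url unchanged if it does not match."""
    return re.sub(PATTERN, REWRITE, url)


print(rewrite_link("http://www.example.com/columns/2006/some-article.html"))
```

Links that do not match the pattern come back untouched, which is the behavior you would expect from a rewrite rule that only applies to matching links.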
In more complex cases, you may need to use more than one "(.*)". In such a case, when you do the rewrite, the first becomes $1, the second $2 and so on.
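Here is a small sketch of the two-group case, again in Python with an invented URL layout (the "/article/" and "/print/" path pieces are assumptions made up for the example). The first "(.*)" becomes \1 and the second becomes \2, which correspond to $1 and $2 in the Rewrite field.

```python
import re

# Invented layout: a section name, then "/article/", then the article file.
PATTERN = r"http://www\.example\.com/(.*)/article/(.*)"
REWRITE = r"http://www.example.com/print/\1/\2"  # \1 and \2 stand in for $1 and $2

original = "http://www.example.com/news/article/123.html"
groups = re.match(PATTERN, original).groups()
result = re.sub(PATTERN, REWRITE, original)

print(groups)
print(result)
```

Keep in mind that "(.*)" is greedy, so if the fixed text between the two groups can occur more than once in a link, the first group will take as much as it can.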
The link below will take you to a tutorial on Perl Regular Expressions:
An important rule to remember is that, if you use "include" or "exclude" filters along with "rewrite" rules, Sunrise XP will rewrite the links before it applies the filters. A filter that would work on the original form of a link may not work on the rewritten form. Conversely, a filter that excludes only the unwanted links when applied to their original form may also exclude links you want once they have been rewritten.
For example, suppose you want to exclude a link to:
You could write a rule that excludes all links of the regular expression form:
This will work. But if you want to rewrite all of the good links to add "&printer" to the end, you might then look for:
and rewrite this as:
Again, this will work by itself. The problem, however, is that after the link you don't want is rewritten, it will be:
This will no longer match:
and the link will not be excluded.
You must either be more specific about which links you rewrite, so that the garbage link does not get rewritten, or change the exclude filter to look for:
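The rewrite-then-filter ordering can be sketched in Python as follows. Everything here is hypothetical: the example.com links, the "id=ads" unwanted link, the patterns, and the process helper are invented for illustration, and the exclude filter is assumed to be an exact, anchored match. It is not Sunrise XP's actual code, just a model of the order of operations described above.

```python
import re

# Hypothetical rules: append "&printer" to every page link.
REWRITE_PATTERN = r"^(http://www\.example\.com/page\?id=.*)$"
REWRITE_TO = r"\1&printer"

# Exclude filter written against the ORIGINAL form of the unwanted link.
NARROW_EXCLUDE = r"^http://www\.example\.com/page\?id=ads$"
# Exclude filter adjusted to also match the REWRITTEN form.
ADJUSTED_EXCLUDE = r"^http://www\.example\.com/page\?id=ads(&printer)?$"


def process(links, exclude):
    """Rewrite first, then drop links matching the exclude pattern."""
    rewritten = [re.sub(REWRITE_PATTERN, REWRITE_TO, link) for link in links]
    return [link for link in rewritten if not re.search(exclude, link)]


links = [
    "http://www.example.com/page?id=101",
    "http://www.example.com/page?id=ads",  # the link we do not want
]

leaky = process(links, NARROW_EXCLUDE)    # unwanted link survives
fixed = process(links, ADJUSTED_EXCLUDE)  # unwanted link is dropped
print(leaky)
print(fixed)
```

With the narrow filter, the unwanted link has already gained "&printer" by the time the filter runs, so it no longer matches and gets through. The adjusted filter accounts for the rewritten form.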
Last edited by DTM; 04-15-2006 at 09:36 PM.
Reason: Added material about rewriting before filtering