01-23-2015, 10:56 AM   #48
Tex2002ans
First of all, congrats on finally adding the Reports functionality! I will have to mess around with it in the next few weeks. It should be quite helpful on some of the extremely large projects I have been working on lately (Sigil chugs on these absolutely massive files).

Quote:
Originally Posted by kovidgoyal
I have no plans to add a links report. The Check Book tool already checks for broken links and allows you to jump to them, and the editor autocompletes href attributes.
Off the top of my head, I have come up with four use cases showing why the Sigil Links Report is extremely helpful (and why it should probably be added to Calibre's Reports as well).

Use Case #1:

The Links Report is extremely helpful when you are cleaning up HTML files. I use it all the time when I pull a series of HTML articles off of a website to convert into an EPUB.

Say I want to strip all of the links in the book, or remove all of the amazon.com links but keep the ones pointing to cde.com and xyz.com. With the report I can easily sort the links, spot the ones I want, and remove them.
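
Outside of Sigil, the same sort-and-spot step can be roughly sketched in Python. This is only an illustration under my own assumptions, not how Sigil or Calibre builds its report; `LinkCollector` and `links_by_host` are names made up for the example.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from every <a href="..."> element."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href is not None:
            self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def links_by_host(html):
    """Group link hrefs by host name; relative links land under ''."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    grouped = defaultdict(list)
    for href, _text in parser.links:
        grouped[urlparse(href).netloc.lower()].append(href)
    return dict(grouped)
```

Grouping by host puts every amazon.com link in one bucket, which is the same thing sorting the report by target does for you by eye.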

Use Case #2:

Also, if you are working on newer books that were exported from OCR software (Finereader), the software does its best to digitize the links from the original PDF, but it sometimes gets them wrong when a URL was broken across lines. On the visual surface the link looks perfectly fine, but the link itself is broken.

For example, a link might look like this:

Code:
<a href="http://www.sample.com/">http://www.sample.com/</a><a href="sample/sample.html">sample/sample.html</a>
You would be able to easily spot this error in the Links Report. (OK, OK, I know, horrible sample I came up with!)

Here are four real-life "OCR errors" I caught with the Links Report:

Code:
<p>———. 1936. Liquidity. Minnesota Bankers Assoc. Available at: <a href="http://www">http://www</a>.</p>

  <p>24hgold.com/viewcompanyarticle.aspx?langue = en&amp;articleId = 217737</p>

<p>Nobelprize.org. 2008. John Nash interview, September, 2004. Retrieved January 15, 2008 from <a href="http://nobelprize.org/mediaplayer/index.php?id">http://nobelprize.org/mediaplayer/index.php?id</a> = 429</p>

<p>Montaigne, Michel de. “The Profit of One Man is the Damage of Another.” <span style="font-style:italic;">Essays.</span> Chapter XXI. <a href="http://www.uoregon.edu/%7Erbear/montaigne/">http://www.uoregon.edu/%7Erbear/montaigne/</a>&nbsp;1xxi.htm</p>

<p>Development.” Free-Market News Network, February 14 and 15, at <a href="http://www.freemarketnews.com/Analysis/241/6939/notes.asp?wid">http://www.freemarketnews.com/Analysis/241/6939/notes.asp?wid</a>=241&amp;&nbsp;nid=6939 and http:// <a href="http://www.freemarketnews.com/">www.freemarketnews.com/</a> Analysis/241/6949/notes.asp?&nbsp;wid=241&amp;nid= 6949.</p>
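
Some of these truncations can also be flagged mechanically. The sketch below is my own rough heuristic in Python, not something Sigil or Calibre does: it flags absolute links whose host has no dot (like "http://www") or whose query string has no "=" (like "...index.php?id"). `HrefCollector` and `suspicious_hrefs` are hypothetical names, and a hit is only a candidate for review, since a bare query such as "?print" can be legitimate.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HrefCollector(HTMLParser):
    """Collect the href of every <a> element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.hrefs.append(href)


def suspicious_hrefs(html):
    """Return (href, reason) pairs for absolute links that look cut off."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    flagged = []
    for href in parser.hrefs:
        parts = urlparse(href)
        if parts.scheme not in ("http", "https"):
            continue  # relative links, mailto:, etc. are out of scope here
        if "." not in parts.netloc:
            flagged.append((href, "host has no dot"))
        elif parts.query and "=" not in parts.query:
            flagged.append((href, "query has no '='"))
    return flagged
```

It will not catch every case above: a link like the Montaigne one, where the href is a valid URL and the rest of the address trails after the closing tag, still needs a human looking at the report.
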
Use Case #3:

It is also extremely helpful for catching inconsistencies in what text is actually wrapped up in the <a> tags. For example, say I digitized an entire journal. At the bottom of one section, it might say something like:

Code:
<p>Please contact the <a href="http://samplesite.com">Sample Site</a>.</p>
and in another section of the book, it might say:

Code:
<p>Please contact the <a href="http://samplesite.com">Sample Sit</a>e.</p>
and:

Code:
<p>Please contact the <a href="http://samplesite.com">sample site</a>.</p>
If you sort the Links Report, you can also easily spot that something odd happened, because you would see "Sample Site" and "Sample Sit" and "sample site".

These are typically very hard to catch with just your naked eye, or even with a quick perusal of the code, unless you know EXACTLY what you are looking for (and even then, they are easy to miss).
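
The same check can be sketched in Python: collect the text of every link, group it by href, and keep only the hrefs that show up with more than one distinct text. This is an illustration under my own assumptions, not the Links Report's actual implementation; `LinkTextCollector` and `inconsistent_link_texts` are made-up names.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collect (href, link text) pairs from every <a href="..."> element."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href is not None:
            self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def inconsistent_link_texts(html):
    """Map each href to its sorted link texts, keeping hrefs with 2+ variants."""
    parser = LinkTextCollector()
    parser.feed(html)
    parser.close()
    texts = defaultdict(set)
    for href, text in parser.links:
        texts[href].add(text)
    return {href: sorted(t) for href, t in texts.items() if len(t) > 1}
```

Fed the three paragraphs above, it reports http://samplesite.com with the variants "Sample Sit", "Sample Site", and "sample site" side by side.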

Use Case #4:

It is VERY helpful in catching absolutely useless links. For example, Finereader exports a lot of phantom "bookmark##" links:

[Attached image: LinksReport.png]

When you are cleaning out all the cruft, the Links Report makes it very easy (as you can see, Finereader also exports "footnote##" links). This is helpful when you want to get rid of as much useless code as possible, and to spot if you actually did remove it all.

Finereader 12 even introduced this cursed "caption#" class... which in all cases I have seen, is 100% worthless. Most of the time I forget to even look for it, and I just accidentally stumble on it when I am looking at the Links Report.
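
To confirm that all of this cruft is really gone, a rough count can be scripted in Python. Big caveat: I am assuming here that the names show up as `id` or `name` attributes on `<a>` tags, as `href` fragments like "#bookmark12", and as class names like "caption1". Those patterns are guesses for illustration, so adjust them to whatever the export actually contains. `CruftCounter` and `count_cruft` are hypothetical names.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Assumed naming patterns (not confirmed against real Finereader output).
CRUFT_NAME = re.compile(r"^(bookmark|footnote)\d+$")
CRUFT_HREF = re.compile(r"#(bookmark|footnote)\d+$")
CRUFT_CLASS = re.compile(r"^caption\d+$")


class CruftCounter(HTMLParser):
    """Count attributes that match the assumed naming patterns."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue  # attribute written without a value
            if tag == "a" and name in ("id", "name") and CRUFT_NAME.match(value):
                self.counts["anchor"] += 1
            elif tag == "a" and name == "href" and CRUFT_HREF.search(value):
                self.counts["link"] += 1
            elif name == "class":
                hits = sum(1 for cls in value.split() if CRUFT_CLASS.match(cls))
                if hits:
                    self.counts["caption class"] += hits


def count_cruft(html):
    """Return a dict of cruft counts; an empty dict means nothing matched."""
    parser = CruftCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.counts)
```

Running it before and after a cleanup pass gives a quick yes/no on whether anything matching those patterns survived.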

Last edited by Tex2002ans; 01-23-2015 at 11:40 AM.