MobileRead Forums

Doitsu 09-04-2015 07:22 AM

URL Checker plugin
 
[Plugin] URLChecker - Checks URLs

Updated: April 18, 2021
Current Version: "0.3.0"

Credits: The latest version was created by aokai. For more information, see his original post.

Installation:

To install the plugin open Sigil and select:

Plugins > Manage Plugins > Add Plugin > URLChecker_v0.3.0.zip > OK.

Also check the Use Bundled Python check box, if it isn't already checked.

Usage:

To run the plugin select:

Plugins > Validation > URLChecker > Start.

If broken URLs are found, the plugin displays them in the Validation Results window. Otherwise, it displays a list of all working URLs. It also writes a log file to the Desktop.

Warning: Since this plugin mimics a web browser, each visited website might log the IP address of your machine and/or store cookies on it. Theoretically, a visited website might also install drive-by malware on your machine if your machine is insufficiently protected.
For this reason, you might want to use this plugin only with ebooks that you've created yourself or obtained from trustworthy sources.

Note: The plugin might report working URLs as broken if the URL was shortened or if the website's author has disabled web crawling for their site using robots.txt or user-agent-based countermeasures.
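The false positives described in the note come from the way a checker has to probe each link. Below is a minimal sketch using only Python's standard library; it is not the plugin's actual code, and the USER_AGENT string and the function names build_request and check_url are illustrative assumptions. It sends a GET request with a browser-like User-Agent and counts both HTTP error codes and network failures as "broken", which is why a site that blocks bots, or a firewall, can make a working URL look broken.

```python
import urllib.error
import urllib.request

# Hypothetical browser-like User-Agent; some sites reject Python's default one.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0"


def build_request(url):
    """Create a GET request that identifies itself like a desktop browser."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def check_url(url, timeout=10, opener=urllib.request.urlopen):
    """Return (is_working, detail) for a single URL.

    detail is the HTTP status code when the server answered, or the error
    text when the request never got a response (DNS failure, firewall, ...).
    """
    try:
        with opener(build_request(url), timeout=timeout) as response:
            return True, response.getcode()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403 or 404.
        return False, err.code
    except (urllib.error.URLError, OSError, ValueError) as err:
        # No usable answer: network problem, blocked connection or malformed URL.
        return False, str(err)
```

The opener parameter only exists so the network call can be swapped out (for example in tests); in normal use you would simply call check_url("https://example.com").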

License: GNU General Public License v3 (GPL-3)

xingenter 09-06-2015 07:50 AM

I was delighted when I saw this plugin and downloaded it to try. I have Python 2 AND 3 installed on my Linux Mint 17.2 PC, and I am using version 0.8.6 of Sigil as built by DiapDealer. I followed the instructions to install 'requests' for both versions of Python, using the --upgrade parameter. So far so good.
When I ran the plugin on an ePub file with lots of URLs I got this message:

Status: failed

Traceback (most recent call last):
  File "/usr/local/share/sigil/plugin_launchers//python/launcher.py", line 134, in launch
    target_script = __import__(script_module)
  File "/home/markb/.local/share/sigil-ebook/sigil/plugins/URLChecker/plugin.py", line 4, in <module>
    from bs4 import BeautifulSoup
  File "/home/markb/.local/share/sigil-ebook/sigil/plugins/URLChecker/bs4/__init__.py", line 30, in <module>
    from .builder import builder_registry, ParserRejectedMarkup
  File "/home/markb/.local/share/sigil-ebook/sigil/plugins/URLChecker/bs4/builder/__init__.py", line 4, in <module>
    from bs4.element import (
  File "/home/markb/.local/share/sigil-ebook/sigil/plugins/URLChecker/bs4/element.py", line 6, in <module>
    from bs4.dammit import EntitySubstitution
  File "/home/markb/.local/share/sigil-ebook/sigil/plugins/URLChecker/bs4/dammit.py", line 12, in <module>
    from html.entities import codepoint2name
ImportError: No module named html.entities
Error: No module named html.entities

I assume I am missing some other dependency. Can anyone help me to get it working, please?

Thanks,

Mr B

Doitsu 09-06-2015 09:09 AM

I was able to replicate this error on my old Linux machine. It appears to have been caused either by an incorrectly set Python 3 binary path in Sigil or by problems resulting from installing Python 3 alongside Python 2.

On my old machine, the Python 3 path was set to /usr/bin/python, which was linked to the Python 2 executable.

After changing it to /usr/bin/python3 and deleting the Python 2 path, the plugin worked fine.

If that doesn't work for you, try removing the embedded BeautifulSoup4 package and installing it separately with pip or pip3:

1. Open Sigil and select Preferences from the Edit menu.
2. Click Open Preferences Location. This will open the Sigil preferences folder.
3. Open the plugins folder, which contains the URLChecker folder.
4. Open the URLChecker folder and delete the bs4 folder.
5. Install the Python 2/3 versions of BeautifulSoup:

Code:

sudo pip install beautifulsoup4

Code:

sudo pip3 install beautifulsoup4

Then re-test the plugin with either the Python 2 or the Python 3 interpreter path in Sigil left empty. One of the two configurations should definitely work.

P.S. On my old machine I got an IncompleteRead error when I tried to install beautifulsoup4. If this happens to you too, you'll have to reinstall or update pip/pip3.
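To confirm which interpreter a path entered in Sigil actually runs, and whether it can import the modules from the traceback above, you could run a small diagnostic script with that exact binary, e.g. /usr/bin/python3 check_env.py. The file name check_env.py and the helper has_module are hypothetical; the script is a sketch written so that it runs under both Python 2 and Python 3.

```python
import sys


def has_module(name):
    """Return True if `name` can be imported by the running interpreter."""
    try:
        __import__(name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    # Python 2 has no html.entities module (its equivalent was htmlentitydefs),
    # so a "Python 3" path that really points to Python 2 reports it as MISSING.
    print("Interpreter: " + sys.executable + " (" + sys.version.split()[0] + ")")
    for module in ("html.entities", "bs4"):
        status = "OK" if has_module(module) else "MISSING"
        print(module + ": " + status)
```

If html.entities shows up as MISSING, the configured path is a Python 2 interpreter; if only bs4 is missing, installing beautifulsoup4 for that interpreter should fix it.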

xingenter 09-11-2015 04:51 AM

Getting it working
 
:thanks:

I followed your advice. Removing the path to Python 2 made the URL checker work. Removing Python 3 and keeping Python 2 didn't.
Then I deleted the bs4 folder and used pip/pip3 to install Beautiful Soup again. Once I'd done that, I was able to run the plugin with either version of Python. That's great, because I have several plugins now; some need Python 2 and some need Python 3, so I have to have both.

The plugin does make life a lot easier. Thank you for writing it. I really like how the errors appear in the validation window at the bottom, like they do when running FlightCrew.
What I would find helpful though is to either: keep the window open with all the links scanned so it can be cut & pasted into a document OR offer the option to save to a text file too. Being able to navigate directly to the broken links in Sigil is marvellous, but it would be great to have a definitive list of all links. I format books for a small publisher, and I could send them a file of the broken links that URL Checker found.
I hope that makes sense.

I really appreciate your help.

Mr B

eschwartz 09-11-2015 06:21 AM

Slightly :offtopic:

Nice plugin. And I imagine it was the inspiration for URL Checker that is now present (builtin tool) in the latest version of calibre Editor. :thumbsup:

Doitsu 09-11-2015 06:38 AM

Quote:

Originally Posted by evilmrb (Post 3168661)
What I would find helpful though is to either: keep the window open with all the links scanned so it can be cut & pasted into a document OR offer the option to save to a text file too.

I've updated the tool so that it writes a log file to the Desktop folder (or the home folder if your system doesn't have a Desktop folder).
The log file name starts with "URLChecker", followed by the date and time and the extension ".log", e.g. URLChecker_20150912-112720.log.

(I've attached the new version to the first post.)
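The naming scheme described above can be reproduced with a few lines of standard-library Python. This is only a sketch based on that description, not the plugin's code; the function name log_file_path is hypothetical, and the Desktop-then-home fallback follows the wording of the post.

```python
import os
import time


def log_file_path(now=None):
    """Return a path such as ~/Desktop/URLChecker_20150912-112720.log.

    Falls back to the home folder when there is no Desktop folder.
    `now` is an optional time.struct_time; the current local time is used otherwise.
    """
    home = os.path.expanduser("~")
    desktop = os.path.join(home, "Desktop")
    folder = desktop if os.path.isdir(desktop) else home
    stamp = time.strftime("%Y%m%d-%H%M%S", now if now is not None else time.localtime())
    return os.path.join(folder, "URLChecker_" + stamp + ".log")
```

Putting the timestamp in the file name means each run creates a new log instead of overwriting the previous one.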

KevinH 09-11-2015 09:10 AM

Hi Doitsu,

FYI, I just added this to the Sigil Plugin Index thread.

Thanks!
KevinH

eathan 09-15-2015 04:11 AM

Hi Doitsu,

Thank you for an interesting plugin. Taking this opportunity, I would like to ask if you would be interested in adding extra features, which are described in a separate thread:

https://www.mobileread.com/forums/sho...d.php?t=262322

Doitsu 09-15-2015 04:58 AM

Quote:

Originally Posted by eathan (Post 3170865)
Taking this opportunity, I would like to ask if you would be interested in adding extra features, which are described in a separate thread:

At this time, I don't plan on adding any new features. However, the plugin is rather simple, its source code is sufficiently commented, and it uses the well-documented BeautifulSoup HTML parser, so you could easily add the extra features that you're interested in yourself.

xingenter 09-18-2015 04:07 AM

Quote:

Originally Posted by Doitsu (Post 3168702)
I've updated the tool so that it writes a log file to the Desktop folder (or the home folder if your system doesn't have a Desktop folder).
The log file name starts with "URLChecker", followed by the date and time and the extension ".log", e.g. URLChecker_20150912-112720.log.

(I've attached the new version to the first post.)

That's brilliant. Thank you so much!

Mr B

xingenter 09-18-2015 04:11 AM

Quote:

Originally Posted by eschwartz (Post 3168698)
Slightly :offtopic:

Nice plugin. And I imagine it was the inspiration for URL Checker that is now present (builtin tool) in the latest version of calibre Editor. :thumbsup:

I saw that too and thought it was a spooky coincidence. I tested both on an ePub file with lots of links in it. URL checker found a couple of bad links and Calibre found 30. However, I'm certain that there are not 30 bad links so I'd be wary of trusting the Calibre feature too much.

Doitsu 09-18-2015 04:44 AM

Quote:

Originally Posted by evilmrb (Post 3172908)
I saw that too and thought it was a spooky coincidence. I tested both on an ePub file with lots of links in it. URL checker found a couple of bad links and Calibre found 30. However, I'm certain that there are not 30 bad links so I'd be wary of trusting the Calibre feature too much.

Since Kovid Goyal is a professional programmer and I can barely write one line of Python without errors, it's more likely that there's a bug in my very simple plugin.
BTW, it only checks URLs that start with "http". Links that start with "file", "www" or "ftp", or bare domain names (e.g. google.com), won't be checked.

Can you please re-check your file and let me know what kind of broken links Calibre Editor found that my plugin missed?
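The "http only" rule can be illustrated with a small link extractor. The plugin uses BeautifulSoup; this sketch swaps in Python's built-in html.parser so it has no dependencies, and the names LinkCollector and collect_http_links are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> elements that start with "http"."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Only absolute http/https links qualify; "www...", "ftp:", "file:"
            # and bare domains such as "google.com" are skipped.
            if name == "href" and value and value.startswith("http"):
                self.urls.append(value)


def collect_http_links(xhtml):
    """Return the checkable links of one XHTML document, in document order."""
    parser = LinkCollector()
    parser.feed(xhtml)
    parser.close()
    return parser.urls
```

A checker built on a rule like this silently ignores links such as href="www.example.org", which is one way two tools can report very different numbers of broken links for the same book.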

David Kudler 11-09-2015 12:52 PM

Installing beautifulsoup on Mac OS X
 
Thanks so much, Doitsu! :thumbsup:

BTW, it took me a couple of whacks to install beautifulsoup on my Mac. In case anyone else has the same problem, here's how I did it:
  1. Open Terminal
  2. Log in to an admin account in Terminal (if you're not already in one — but you shouldn't use an admin account for everyday work, right?):
    Code:

    login [admin username]
  3. Enter this command:
    Code:

    sudo easy_install beautifulsoup4
  4. Enter your password, sit back and relax.

Once the installation was done, the plugin worked beautifully.

One nice side benefit: it gives me a list of all of the external URLs used in the ebook — which is helpful for making sure all of them are correct. I've been having to search through one by one, which is a pain. (For example, you don't want any Amazon links in a file you're going to load to Apple!)
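If you copy the URLs from the plugin's log into a list, a check like the Amazon example can be automated. This is a hypothetical helper (links_to_domain is not part of the plugin) that matches a domain and its subdomains by host name, so look-alike hosts such as notamazon.com are not flagged. Retailers with regional domains (amazon.co.uk, amazon.de, ...) would each need their own check.

```python
from urllib.parse import urlparse


def links_to_domain(urls, domain):
    """Return the URLs whose host is `domain` or one of its subdomains."""
    domain = domain.lower()
    matches = []
    for url in urls:
        # urlparse() lower-cases the host name; it is None for relative links.
        host = urlparse(url).hostname or ""
        if host == domain or host.endswith("." + domain):
            matches.append(url)
    return matches
```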

AlanHK 11-17-2017 10:17 AM

This will be useful. I have to put similar links in a series of books; this gives a simple way to collect and check them and then paste them into the next one.

However, it reported all my links as "broken", as Sigil is blocked by my firewall. I would prefer an option to just collect links, without checking them. It might also check whether it has any connectivity at all (e.g. ping Google.com) and, if not, report that first rather than alarming the user.
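The suggested connectivity pre-check could be sketched roughly as follows. It is hypothetical and not part of the plugin; the host, port and function name has_connectivity are assumptions. It tries a TCP connection instead of a real ping, because sending ICMP ping packets usually requires elevated privileges.

```python
import socket


def has_connectivity(host="www.google.com", port=443, timeout=3.0,
                     connect=socket.create_connection):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # DNS failures, timeouts and refused or blocked connections
        # are all subclasses of OSError.
        return False
    conn.close()
    return True
```

Because the test runs inside the same process as the checks, a firewall rule that blocks Sigil would also make this pre-check fail, so the user would see one clear "no connection" message instead of a list of broken links.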

Doitsu 11-19-2017 06:09 AM

Quote:

Originally Posted by AlanHK (Post 3613265)
However, it reported all my links as "broken", as Sigil is blocked by my firewall.

Feel free to use Calibre Editor. It has a built-in URL checker. (Tools > External Links > Check external links.)

Quote:

Originally Posted by AlanHK (Post 3613265)
Would prefer an option to just collect links, no checking.

The plugin should generate a log file whose name starts with "URLChecker", followed by the date and time and the extension ".log" (e.g. URLChecker_20171119-110141.log), in the Desktop folder or, if that folder can't be found, in the Documents folder. (The location varies depending on the OS.)

Quote:

Originally Posted by AlanHK (Post 3613265)
Also it might check if it has any connectivity at all (e.g. ping Google.com) and if not, report that first rather than alarming the user.

This plugin is intended for users who know that running a plugin that checks URLs on a machine with no Internet access is pointless.


All times are GMT -4.