09-04-2015, 06:22 AM | #1 |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
URL Checker plugin
[Plugin] URLChecker - Checks URLs
Updated: April 18, 2021
Current Version: "0.3.0"
Credits: The latest version was created by aokai. For more information, see his original post.
Installation: To install the plugin, open Sigil and select: Plugins > Manage Plugins > Add Plugin > URLChecker_v0.3.0.zip > OK. Also check the Use Bundled Python check box if it isn't already checked.
Usage: To run the plugin, select: Plugins > Validation > URLChecker > Start. If broken URLs are found, the plugin displays them in the Validation Results window; otherwise, it displays a list of all working URLs. It also copies a log file to the Desktop.
Warning: Since this plugin mimics a web browser, each visited website might log the IP address of your machine and/or store cookies on it. Theoretically, a site might also install drive-by malware if your machine is insufficiently protected. For this reason, you might want to use this plugin only with ebooks that you've created yourself or obtained from trustworthy sources.
Note: The plugin might report working URLs as broken if the URL was shortened or if the website author has disabled web crawling for their site using robots.txt or user-agent-based countermeasures.
License: GNU General Public License v3 (GPL-3)
Last edited by Doitsu; 04-18-2021 at 10:35 AM. Reason: New version with more detailed error messages
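The robots.txt caveat can be illustrated with Python's standard `urllib.robotparser`. This is a minimal sketch, not the plugin's own code: the robots.txt rules, the `URLChecker` user-agent string and the example.com URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks every crawler from /private/
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def crawl_allowed(url, user_agent="URLChecker", robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    for url in ("https://example.com/index.html",
                "https://example.com/private/page.html"):
        print(url, crawl_allowed(url))
```

A checker that honors such rules would skip the second URL, and a site that enforces them may refuse or redirect the request, which is one way a working link can end up reported as broken.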
09-06-2015, 06:50 AM | #2 |
Member Retired
Posts: 16
Karma: 10
Join Date: Apr 2013
Location: UK
Device: none
I was delighted when I saw this plugin and downloaded it to try. I have both Python 2 and 3 installed on my Linux Mint 17.2 PC, and I am using the version 0.8.6 build of Sigil created by DiapDealer. I followed the instructions to install 'requests' for both versions of Python, using the --upgrade parameter. So far so good.
When I ran the plugin on an ePub file with lots of URLs, I got this message:
Status: failed
Traceback (most recent call last):
  File "/usr/local/share/sigil/plugin_launchers//python/launcher.py", line 134, in launch
    target_script = __import__(script_module)
  File "/home/markb/.local/share/sigil-ebook/sigil/plugins/URLChecker/plugin.py", line 4, in <module>
    from bs4 import BeautifulSoup
  File "/home/markb/.local/share/sigil-ebook/sigil/plugins/URLChecker/bs4/__init__.py", line 30, in <module>
    from .builder import builder_registry, ParserRejectedMarkup
  File "/home/markb/.local/share/sigil-ebook/sigil/plugins/URLChecker/bs4/builder/__init__.py", line 4, in <module>
    from bs4.element import (
  File "/home/markb/.local/share/sigil-ebook/sigil/plugins/URLChecker/bs4/element.py", line 6, in <module>
    from bs4.dammit import EntitySubstitution
  File "/home/markb/.local/share/sigil-ebook/sigil/plugins/URLChecker/bs4/dammit.py", line 12, in <module>
    from html.entities import codepoint2name
ImportError: No module named html.entities
Error: No module named html.entities
I assume I am missing some other dependency. Can anyone help me to get it working, please?
Thanks,
Mr B
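The last line of the traceback is the clue: `html.entities` exists only in Python 3 (Python 2 calls the module `htmlentitydefs`), so the bundled Python 3 copy of BeautifulSoup was most likely being run by a Python 2 interpreter. The sketch below shows the usual compatibility import; `entity_name` is a hypothetical helper for illustration, not a patch for the plugin.

```python
import sys

try:
    # Python 3 location of the HTML entity tables
    from html.entities import codepoint2name
except ImportError:
    # Python 2 fallback: same table, different module name
    from htmlentitydefs import codepoint2name

def entity_name(char):
    """Return the HTML entity name for a single character, or None."""
    return codepoint2name.get(ord(char))

if __name__ == "__main__":
    print("Python", sys.version_info[0], "->", entity_name("&"))
```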
09-06-2015, 08:09 AM | #3 |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
I was able to replicate this error on my old Linux machine. It appears to have been caused either by an incorrectly set Python 3 binary path in Sigil or by problems arising from installing Python 3 alongside Python 2.
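One way to confirm which interpreter a configured path really launches is to resolve its symlinks and ask the interpreter for its major version. A minimal sketch; `/usr/bin/python` is only the example path from this thread, and `interpreter_info` is a hypothetical helper name.

```python
import os
import subprocess
import sys

def interpreter_info(path):
    """Return (resolved_path, major_version) for a Python executable path."""
    # Follow symlinks such as /usr/bin/python -> python2.7
    resolved = os.path.realpath(path)
    # Ask the interpreter itself which major version it is
    out = subprocess.check_output(
        [path, "-c", "import sys; print(sys.version_info[0])"])
    return resolved, int(out.decode().strip())

if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/usr/bin/python"
    resolved, major = interpreter_info(target)
    print(target, "->", resolved, "(Python %d)" % major)
```

If the path entered as Sigil's Python 3 interpreter reports version 2, that mismatch would explain the ImportError above.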
On my old machine, the Python 3 path was set to /usr/bin/python, which was linked to the Python 2 executable. After changing it to /usr/bin/python3 and deleting the Python 2 path, the plugin worked fine. If that doesn't work for you, try removing the embedded BeautifulSoup4 package and installing it separately with pip or pip3:
1. Open Sigil and select Preferences from the Edit menu.
2. Click Open Preferences Location. This will open the Sigil preferences folder.
3. Open the plugins folder, which contains the URLChecker folder.
4. Open the URLChecker folder and delete the bs4 folder.
5. Install the Python 2/3 versions of BeautifulSoup:
Code:
sudo pip install beautifulsoup4
Code:
sudo pip3 install beautifulsoup4
P.S. On my old machine I got an IncompleteError when I tried to install beautifulsoup4. If this happens to you, too, you'll have to re-install/update pip/pip3.
Last edited by Doitsu; 09-06-2015 at 08:39 AM.
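After deleting the bundled bs4 folder, a quick way to confirm that an interpreter can now see the pip-installed packages is `importlib.util.find_spec`. This sketch is Python 3 only (`find_spec` does not exist in Python 2), and the default module list is an assumption based on the dependencies mentioned in this thread: beautifulsoup4 (imported as `bs4`) and `requests`.

```python
import importlib.util

def missing_modules(names=("bs4", "requests")):
    """Return the module names that the current interpreter cannot import."""
    # find_spec returns None for a top-level module that cannot be found
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it with the same interpreter path that Sigil is configured to use; an empty result means the separately installed packages are visible.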
09-11-2015, 03:51 AM | #4 |
Member Retired
Posts: 16
Karma: 10
Join Date: Apr 2013
Location: UK
Device: none
Getting it working
I followed your advice. Removing the path to Python 2 made the URL checker work; removing Python 3 and keeping Python 2 didn't. Then I deleted the bs4 folder and used pip/pip3 to install Beautiful Soup again. Once I'd done that, I was able to run the plugin with either version of Python. That's great, because I now have several plugins, some of which need Python 2 and some Python 3, so I have to keep both.
The plugin does make life a lot easier. Thank you for writing it. I really like how the errors appear in the validation window at the bottom, as they do when running FlightCrew. What I would find helpful, though, is either keeping the window open with all the scanned links so they can be cut and pasted into a document, or an option to save them to a text file as well. Being able to navigate directly to the broken links in Sigil is marvellous, but it would be great to have a definitive list of all links. I format books for a small publisher, so I could then send them a file of the broken links that URL Checker found.
I hope that makes sense. I really appreciate your help.
Mr B
09-11-2015, 05:21 AM | #5 |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Slightly
Nice plugin. I imagine it was also the inspiration for the URL checker that is now built into the latest version of the calibre Editor.
09-11-2015, 05:38 AM | #6 | |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
The log file name starts with "URLChecker_", followed by the date and time and the ".log" extension, e.g. URLChecker_20150912-112720.log. (I've attached the new version to the first post.)
Last edited by Doitsu; 09-12-2015 at 05:17 AM.
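The naming scheme can be reproduced with `strftime` formatting. A sketch only: the format string is inferred from the example file name above, not taken from the plugin's source, and `log_file_name` is a hypothetical helper.

```python
from datetime import datetime

def log_file_name(when=None):
    """Build a name like URLChecker_20150912-112720.log."""
    when = when or datetime.now()
    # %Y%m%d = date, %H%M%S = 24-hour time, joined by a hyphen
    return when.strftime("URLChecker_%Y%m%d-%H%M%S.log")

if __name__ == "__main__":
    print(log_file_name(datetime(2015, 9, 12, 11, 27, 20)))
```

Running it prints URLChecker_20150912-112720.log. Because the timestamp is part of the name, each run produces a separate log instead of overwriting the previous one.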
09-11-2015, 08:10 AM | #7 |
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
Hi Doitsu,
FYI, I just added this to the Sigil Plugin Index thread. Thanks! KevinH
09-15-2015, 03:11 AM | #8 |
Junior Member
Posts: 7
Karma: 10
Join Date: Sep 2013
Device: Kindle paperwhite
Hi Doitsu,
Thank you for an interesting plugin. I'd also like to take this opportunity to ask whether you would be interested in adding the extra features described in a separate thread: https://www.mobileread.com/forums/sho...d.php?t=262322
09-15-2015, 03:58 AM | #9 | |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
09-18-2015, 03:07 AM | #10 | |
Member Retired
Posts: 16
Karma: 10
Join Date: Apr 2013
Location: UK
Device: none
Quote:
Mr B
09-18-2015, 03:11 AM | #11 |
Member Retired
Posts: 16
Karma: 10
Join Date: Apr 2013
Location: UK
Device: none
I saw that too and thought it was a spooky coincidence. I tested both on an ePub file with lots of links in it. URL Checker found a couple of bad links, and Calibre found 30. However, I'm certain there aren't 30 bad links, so I'd be wary of trusting the Calibre feature too much.
09-18-2015, 03:44 AM | #12 | |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
BTW, it only checks URLs that start with "http". Links that start with "file", "www" or "ftp", as well as bare domain names (e.g. google.com), won't be checked. Can you please re-check your file and let me know what kind of broken links Calibre Editor found that my plugin missed?
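The http-only rule can be sketched with Python's standard `html.parser`. This is an illustration under assumptions, not the plugin's code (the plugin itself imports BeautifulSoup); `LinkCollector`, `checkable_urls` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def checkable_urls(xhtml):
    """Return only the links that start with "http" (http:// or https://)."""
    collector = LinkCollector()
    collector.feed(xhtml)
    collector.close()
    return [url for url in collector.hrefs if url.startswith("http")]

if __name__ == "__main__":
    sample = ('<p><a href="https://example.com/">a</a> '
              '<a href="www.example.com">b</a> '
              '<a href="ftp://example.com/f">c</a> '
              '<a href="http://example.org/x">d</a></p>')
    print(checkable_urls(sample))
```

With this rule, "www.example.com" and the ftp link are silently skipped, which is one reason two checkers can report very different numbers of broken links for the same book.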
11-09-2015, 11:52 AM | #13 |
Enthusiast
Posts: 48
Karma: 10000
Join Date: Apr 2011
Device: iPad
Installing beautifulsoup on Mac OS X
Thanks so much, Doitsu!
BTW, it took me a couple of whacks to install beautifulsoup on my Mac. In case anyone else has the same problem, here's how I did it:
Once the installation was done, the plugin worked beautifully. One nice side benefit: it gives me a list of all the external URLs used in the ebook, which is helpful for making sure all of them are correct. I've had to search through them one by one, which is a pain. (For example, you don't want any Amazon links in a file you're going to upload to Apple!)
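Once you have a list of external URLs (e.g. copied from the plugin's log file), screening it for a particular store's links is straightforward. A sketch; `links_to_domain`, the `amazon.com` default and the sample URLs are illustrative only, and regional Amazon domains such as amazon.co.uk would need their own check.

```python
from urllib.parse import urlparse

def links_to_domain(urls, domain="amazon.com"):
    """Return the URLs whose host is domain or one of its subdomains."""
    flagged = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        # Match the exact domain or a real subdomain, not e.g. notamazon.com
        if host == domain or host.endswith("." + domain):
            flagged.append(url)
    return flagged

if __name__ == "__main__":
    print(links_to_domain(["https://www.amazon.com/dp/123",
                           "https://books.apple.com/x"]))
```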
11-17-2017, 09:17 AM | #14 |
Guru
Posts: 668
Karma: 929286
Join Date: Apr 2014
Device: PW-3, iPad, Android phone
This will be useful. I have to put similar links in a series of books, and this gives me a simple way to collect and check them and then paste them into the next book.
However, it reported all my links as "broken" because Sigil is blocked by my firewall. I would prefer an option to just collect the links without checking them. It might also check whether it has any connectivity at all (e.g. by pinging Google.com) and, if not, report that first rather than alarming the user.
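The suggested connectivity pre-check could look roughly like this. Note that the sketch swaps ICMP ping for a plain TCP connection attempt, since ping usually needs elevated privileges and is often blocked anyway; the host, port and `has_connectivity` name are assumptions, not plugin options.

```python
import socket

def has_connectivity(host="www.google.com", port=443, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # A TCP connect stands in for ping here; the socket is closed at once
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts
        return False

if __name__ == "__main__":
    if not has_connectivity():
        print("No Internet access detected; skipping URL checks.")
```

Run before the main loop, a False result would let a checker report one clear "no connection" message instead of flagging every link as broken.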
11-19-2017, 05:09 AM | #15 | |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
The plugin should generate a log file whose name starts with "URLChecker_", followed by the date and time and the ".log" extension, e.g. URLChecker_20171119-110141.log. It is saved in the Desktop folder or, if that can't be found, in the Documents folder. (The location varies, depending on the OS.)
This plugin is intended for users who know that running a plugin that checks URLs on a machine with no Internet access is pointless.
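The location rule described here (Desktop first, Documents as a fallback) can be sketched as follows. The English default folder names are an assumption, and returning the home folder when neither exists is an extra assumption of this sketch, not documented plugin behavior; `log_folder` is a hypothetical helper.

```python
import os

def log_folder(home=None):
    """Return the Desktop folder if it exists, else Documents, else home."""
    home = home or os.path.expanduser("~")
    for name in ("Desktop", "Documents"):
        candidate = os.path.join(home, name)
        if os.path.isdir(candidate):
            return candidate
    # Assumed last resort: the home folder itself
    return home

if __name__ == "__main__":
    print(log_folder())
```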