Old 09-04-2015, 06:22 AM   #1
Doitsu
Wizard
Posts: 3,893
Karma: 10764058
Join Date: Dec 2010
Device: Kindle PW2
URL Checker plugin

[Plugin] URLChecker - Checks URLs

Updated: March 20, 2016
Current Version: "0.2"

I personally think that URLs in eBooks are an abomination, but since an increasing number of authors are using them in their ebooks, I've decided to write a simple no-frills URL checker.

Installation:

Note: Since Sigil 0.9.0 (and higher) comes with Python 3 and beautifulsoup4, you only need to install Python or beautifulsoup4 if you're still using Sigil 0.8.x.

If you're still using Sigil 0.8.x and haven't already installed Python, you'll need to install either Python 2 or Python 3.
You'll also need to install an additional Python library, beautifulsoup4, as an administrator.

To install this Python library open an admin command prompt and enter the following command:

Code:
pip install beautifulsoup4
To install the plugin open Sigil and select:

Plugins > Manage Plugins > Add Plugin > URLChecker_v0.2.zip > OK.

Also check the Use Bundled Python check box, if it isn't already checked.

Usage:

To run the plugin select:

Plugins > Validation > URLChecker > Start.

If broken URLs are found, the plugin will display them in the Validation Results window. Otherwise it'll display a list of all working URLs. It'll also copy a log file to the Desktop.

Warning: Since this plugin mimics a web browser, each visited website might log the IP address of your machine and/or store cookies on it. Theoretically, it might also install drive-by malware on your machine if your machine is insufficiently protected.
For this reason, you might want to use this plugin only with ebooks that you've created yourself or obtained from trustworthy sources.

Troubleshooting:

a) pip command not found

If you've installed the Windows version of Python, you'll have to change to the directory that pip is located in before entering pip commands:

Code:
cd C:\Python27\Scripts
or

Code:
cd C:\Python34\Scripts
b) Python NameError

If you've installed Python 2.7 and you're getting the following error message:

Code:
NameError: name 'HTMLParser' is not defined
you might have installed the html Python 2.7 site package, which interferes with execution of this plugin. To uninstall the html site package enter the following Python 2.7 pip command as an administrator:

Code:
pip uninstall html
If this command doesn't fix the problem, you'll have to install Python 3.4 and delete the Python 2.7 path in the Manage Plugins dialog box.
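
One way to confirm whether a stray html site package is involved is to ask the interpreter where it loads the module from. The following is only a minimal sketch and not part of the plugin; the helper name module_location is made up for illustration. It is meant to be run with the same Python 2.7 interpreter that Sigil uses, where a clean installation has no html module at all.

```python
import importlib


def module_location(name):
    """Return the file a module is loaded from, or None if it can't be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    location = module_location("html")
    if location is None:
        print("No 'html' module found (expected on a clean Python 2.7).")
    elif "site-packages" in location or "dist-packages" in location:
        print("Third-party 'html' package found at: %s" % location)
    else:
        # On Python 3 this is the standard library package.
        print("'html' is loaded from: %s" % location)
```

If the reported path points into site-packages (or dist-packages on some Linux distributions), the pip uninstall command above is the likely fix.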

c) Shortening service URLs or URLs with web crawling countermeasures

The tool might erroneously report working URLs as broken if the URL was shortened or the website author has disabled web crawling for their site using robots.txt or user-agent-based countermeasures.
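
Some of these false positives can come from servers that reject clients without a browser-like User-Agent header. The sketch below shows one way such a check could be written with Python 3's standard urllib module. It is an illustration under that assumption, not the plugin's actual code; the names build_request and check_url and the User-Agent string are hypothetical.

```python
import urllib.error
import urllib.request

# Hypothetical browser-like identifier; any realistic value could be used.
USER_AGENT = "Mozilla/5.0 (compatible; URLCheckSketch/0.1)"


def build_request(url):
    """Create a request that identifies itself with a browser-like User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def check_url(url, timeout=10):
    """Return (ok, detail). urlopen follows redirects, e.g. from URL shorteners."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
            return True, "HTTP %d" % response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 403 or 404.
        return False, "HTTP %d" % error.code
    except (urllib.error.URLError, OSError) as error:
        # DNS failures, refused connections, timeouts and similar problems.
        return False, str(error)
```

A site that actively blocks automated clients may still answer with an error status, so a URL reported as broken is worth opening in a browser before it is removed from the book.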

License: GNU General Public License v3 (GPL-3)
Attached Files
File Type: zip URLChecker_v0.2.zip (1.9 KB, 383 views)

Last edited by Doitsu; 08-16-2017 at 05:10 AM. Reason: Updated Python 3 code to reduce the number of false positives
Old 09-06-2015, 06:50 AM   #2
evilmrb
Member
Posts: 16
Karma: 10
Join Date: Apr 2013
Location: UK
Device: none
I was delighted when I saw this plugin and downloaded it to try. I have Python 2 AND 3 installed on my Linux Mint 17.2 PC. I am using version 0.8.6 of Sigil as created by DiapDealer. I followed the instructions to install 'requests' which I did for both versions of Python and I used the --upgrade parameter. So far so good.
When I ran the plugin on an ePub file with lots of URLs I got this message:

Status: failed

Traceback (most recent call last):
  File "/usr/local/share/sigil/plugin_launchers//python/launcher.py", line 134, in launch
    target_script = __import__(script_module)
  File "/home/markb/.local/share/sigil-ebook/sigil/plugins/URLChecker/plugin.py", line 4, in <module>
    from bs4 import BeautifulSoup
  File "/home/markb/.local/share/sigil-ebook/sigil/plugins/URLChecker/bs4/__init__.py", line 30, in <module>
    from .builder import builder_registry, ParserRejectedMarkup
  File "/home/markb/.local/share/sigil-ebook/sigil/plugins/URLChecker/bs4/builder/__init__.py", line 4, in <module>
    from bs4.element import (
  File "/home/markb/.local/share/sigil-ebook/sigil/plugins/URLChecker/bs4/element.py", line 6, in <module>
    from bs4.dammit import EntitySubstitution
  File "/home/markb/.local/share/sigil-ebook/sigil/plugins/URLChecker/bs4/dammit.py", line 12, in <module>
    from html.entities import codepoint2name
ImportError: No module named html.entities
Error: No module named html.entities

I assume I am missing some other dependency. Can anyone help me to get it working, please?

Thanks,

Mr B
Old 09-06-2015, 08:09 AM   #3
Doitsu
Wizard
Posts: 3,893
Karma: 10764058
Join Date: Dec 2010
Device: Kindle PW2
I was able to replicate this error on my old Linux machine. It appears to have been caused either by an incorrectly set Sigil Python 3 binary path or problems caused by installing Python 3 alongside Python 2.

On my old machine, the Python 3 path was set to /usr/bin/python, which was linked to the Python 2 executable.

After changing it to /usr/bin/python3 and deleting the Python 2 path, the plugin worked fine.
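
The ImportError reported in the previous post is what Python 2 raises, because the html.entities module only exists in Python 3. A quick way to see which version a given interpreter path actually runs is a short script like the following sketch. The function name describe_interpreter is hypothetical, and the script would be saved under any file name and run once with each candidate path, e.g. /usr/bin/python and /usr/bin/python3.

```python
import sys


def describe_interpreter():
    """Report the running interpreter and whether it provides html.entities."""
    try:
        import html.entities  # Python 3 only; Python 2 has htmlentitydefs instead
        has_html_entities = True
    except ImportError:
        has_html_entities = False
    return {
        "executable": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        "has_html_entities": has_html_entities,
    }


if __name__ == "__main__":
    for key, value in sorted(describe_interpreter().items()):
        print("%s: %s" % (key, value))
```

If the path entered as the Python 3 interpreter reports a 2.x version here, it points to the wrong executable.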

If that doesn't work for you, try removing the embedded BeautifulSoup4 package and installing it separately with pip or pip3.

1. Open Sigil and select Preferences from the Edit menu.
2. Click Open Preferences Location. This will open the Sigil preferences folder.
3. Open the plugins folder, which contains the URLChecker folder.
4. Open the URLChecker folder and delete the bs4 folder.
5. Install the Python 2/3 versions of BeautifulSoup:

Code:
sudo pip install beautifulsoup4
Code:
sudo pip3 install beautifulsoup4
Then re-test the plugin with either the Python 2 or the Python 3 Sigil interpreter path empty. One of them should definitely work.
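
Before re-testing, it can help to confirm that each interpreter really sees the separately installed package. The sketch below is one way to do that; the function name beautifulsoup_version is hypothetical, and the script needs to be run once per interpreter (Python 2 and Python 3), since each has its own set of installed packages.

```python
def beautifulsoup_version():
    """Return the installed bs4 version string, or None if bs4 is missing."""
    try:
        import bs4
    except ImportError:
        return None
    return getattr(bs4, "__version__", "unknown")


if __name__ == "__main__":
    version = beautifulsoup_version()
    if version is None:
        print("beautifulsoup4 is not installed for this interpreter.")
    else:
        print("beautifulsoup4 %s is available." % version)
```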

P.S. On my old machine I got an IncompleteError when I tried to install beautifulsoup4. If this happens to you, too, you'll have to re-install/update pip/pip3.

Last edited by Doitsu; 09-06-2015 at 08:39 AM.
Old 09-11-2015, 03:51 AM   #4
evilmrb
Member
Posts: 16
Karma: 10
Join Date: Apr 2013
Location: UK
Device: none
Getting it working



I followed your advice. Removing the path to Python 2 made the URL checker work. Removing Python 3 and keeping Python 2 didn't.
Then I deleted the bs4 folder and used pip/pip3 to install Beautiful Soup again. Once I'd done that I was able to run the plugin with either version of Python. That's great because I have several plugins now and some need Python 2, some need 3 so I have to have both.

The plugin does make life a lot easier. Thank you for writing it. I really like how the errors appear in the validation window at the bottom, like they do when running FlightCrew.
What I would find helpful though is to either: keep the window open with all the links scanned so it can be cut & pasted into a document OR offer the option to save to a text file too. Being able to navigate directly to the broken links in Sigil is marvellous but it would be great to have a definitive list of all links. I format books for a small publisher and I could send a file of broken links that URL Checker found.
I hope that makes sense.

I really appreciate your help.

Mr B
Old 09-11-2015, 05:21 AM   #5
eschwartz
Ex-Helpdesk Junkie
Posts: 19,223
Karma: 83049275
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Slightly

Nice plugin. And I imagine it was the inspiration for URL Checker that is now present (builtin tool) in the latest version of calibre Editor.
Old 09-11-2015, 05:38 AM   #6
Doitsu
Wizard
Posts: 3,893
Karma: 10764058
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by evilmrb View Post
What I would find helpful though is to either: keep the window open with all the links scanned so it can be cut & pasted into a document OR offer the option to save to a text file too.
I've updated the tool so that it writes a log file to the Desktop folder (or the home folder if your system doesn't have a Desktop folder).
The log file name starts with "URLChecker", followed by the date and time and the extension ".log", e.g. URLChecker_20150912-112720.log.
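
For anyone curious how such a file name and location could be derived, here is a minimal sketch. It only reproduces the naming scheme and the Desktop/home fallback described in this post and is not the plugin's actual code; the function name log_file_path and its parameters are assumptions.

```python
import os
import time


def log_file_path(home=None, now=None):
    """Build a path like <Desktop>/URLChecker_20150912-112720.log."""
    home = home or os.path.expanduser("~")
    desktop = os.path.join(home, "Desktop")
    # Fall back to the home folder when there is no Desktop folder.
    folder = desktop if os.path.isdir(desktop) else home
    stamp = time.strftime("%Y%m%d-%H%M%S", now or time.localtime())
    return os.path.join(folder, "URLChecker_%s.log" % stamp)


if __name__ == "__main__":
    print(log_file_path())
```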

(I've attached the new version to the first post.)

Last edited by Doitsu; 09-12-2015 at 05:17 AM.
Old 09-11-2015, 08:10 AM   #7
KevinH
Wizard
Posts: 2,550
Karma: 772404
Join Date: Nov 2009
Device: many
Hi Doitsu,

FYI, I just added this to the Sigil Plugin Index thread.

Thanks!
KevinH
Old 09-15-2015, 03:11 AM   #8
eathan
Junior Member
Posts: 7
Karma: 10
Join Date: Sep 2013
Device: Kindle paperwhite
Hi Doitsu,

Thank you for an interesting plugin. Taking this opportunity, I would like to ask if you would be interested in adding extra features, which are described in a separate thread:

https://www.mobileread.com/forums/sho...d.php?t=262322
Old 09-15-2015, 03:58 AM   #9
Doitsu
Wizard
Posts: 3,893
Karma: 10764058
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by eathan View Post
Taking this opportunity, I would like to ask if you would be interested in adding extra features, which are described in a separate thread:
At this time, I don't plan on adding any new features. However, the plugin is rather simple, its source code is sufficiently commented, and it uses the well-documented BeautifulSoup HTML parser, so you could easily add the extra features that you're interested in yourself.
Old 09-18-2015, 03:07 AM   #10
evilmrb
Member
Posts: 16
Karma: 10
Join Date: Apr 2013
Location: UK
Device: none
Quote:
Originally Posted by Doitsu View Post
I've updated the tool so that it writes a log file to the Desktop folder (or the home folder if your system doesn't have a Desktop folder).
The log file name starts with "URLChecker", followed by the date and time and the extension ".log", e.g. URLChecker_20150912-112720.log.

(I've attached the new version to the first post.)
That's brilliant. Thank you so much!

Mr B
Old 09-18-2015, 03:11 AM   #11
evilmrb
Member
Posts: 16
Karma: 10
Join Date: Apr 2013
Location: UK
Device: none
Quote:
Originally Posted by eschwartz View Post
Slightly

Nice plugin. And I imagine it was the inspiration for URL Checker that is now present (builtin tool) in the latest version of calibre Editor.
I saw that too and thought it was a spooky coincidence. I tested both on an ePub file with lots of links in it. URL checker found a couple of bad links and Calibre found 30. However, I'm certain that there are not 30 bad links so I'd be wary of trusting the Calibre feature too much.
Old 09-18-2015, 03:44 AM   #12
Doitsu
Wizard
Posts: 3,893
Karma: 10764058
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by evilmrb View Post
I saw that too and thought it was a spooky coincidence. I tested both on an ePub file with lots of links in it. URL checker found a couple of bad links and Calibre found 30. However, I'm certain that there are not 30 bad links so I'd be wary of trusting the Calibre feature too much.
Since Kovid Goyal is a professional programmer and I can barely write one line of Python without errors, it's more likely that there's a bug in my very simple plugin.
BTW, it only checks URLs that start with "http." For example, links that start with "file", "www", "ftp" or domain names (e.g. google.com) won't be checked.
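
To illustrate the effect of that rule, here is a small sketch that splits the links in a piece of markup into those that would be checked and those that would be skipped. The plugin itself uses BeautifulSoup; this sketch swaps in Python 3's standard html.parser so that it has no dependencies, and the names LinkCollector and collect_links are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags, split by whether they start with 'http'."""

    def __init__(self):
        super().__init__()
        self.checkable = []
        self.skipped = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Only links that start with "http" are treated as checkable.
        if href.startswith("http"):
            self.checkable.append(href)
        else:
            self.skipped.append(href)


def collect_links(markup):
    """Return (checkable, skipped) lists of href values found in the markup."""
    collector = LinkCollector()
    collector.feed(markup)
    return collector.checkable, collector.skipped


if __name__ == "__main__":
    sample = (
        '<p><a href="https://example.com/">a</a> '
        '<a href="www.example.com">b</a> '
        '<a href="ftp://example.com/f">c</a></p>'
    )
    print(collect_links(sample))
```

With this rule, a link written as www.example.com or ftp://example.com/f ends up in the skipped list, so a tool that also tests such links will report a different number of problems.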

Can you please re-check your file and let me know what kind of broken links Calibre Editor found that my plugin missed?
Old 11-09-2015, 11:52 AM   #13
David Kudler
Enthusiast
Posts: 31
Karma: 10000
Join Date: Apr 2011
Device: iPad
Installing beautifulsoup on Mac OS X

Thanks so much, Doitsu!

BTW, it took me a couple of whacks to install beautifulsoup on my Mac. In case anyone else has the same problem, here's how I did it:
  1. Open Terminal
  2. Log in to an admin account in Terminal (if you're not already in one — but you shouldn't use an admin account for everyday work, right?):
    Code:
    login [admin username]
  3. Enter this command:
    Code:
    sudo easy_install beautifulsoup4
  4. Enter your password, sit back and relax.

Once the installation was done, the plugin worked beautifully.

One nice side benefit: it gives me a list of all of the external URLs used in the ebook — which is helpful for making sure all of them are correct. I've been having to search through one by one, which is a pain. (For example, you don't want any Amazon links in a file you're going to load to Apple!)
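
A list of URLs like that can also be screened automatically. The following sketch groups URLs by host name and picks out those that point to a given domain, such as amazon.com. It assumes the URLs have already been collected into a Python list (the log file format is not described in this thread), and the function names group_by_host and links_to are made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def host_of(url):
    """Return the lower-cased host name of a URL, or '' if it has none."""
    return urlparse(url).hostname or ""


def group_by_host(urls):
    """Map each host name to the list of URLs that point to it."""
    groups = defaultdict(list)
    for url in urls:
        groups[host_of(url)].append(url)
    return dict(groups)


def links_to(urls, domain):
    """Return the URLs whose host is the given domain or one of its subdomains."""
    domain = domain.lower()
    return [
        url for url in urls
        if host_of(url) == domain or host_of(url).endswith("." + domain)
    ]
```

For example, links_to(urls, "amazon.com") would return every link to amazon.com or www.amazon.com, but not a link to an unrelated host whose name merely ends in the same letters.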
All times are GMT -4.