02-05-2023, 11:56 PM   #1
slowsmile
[Plugin] RemoveHTMLTags

Converts epub HTML to several different formats including plain text.

Requirements
Plugin Type: Edit
Licence: MIT (OSI)
Minimum Sigil requirement: v0.9.3 or higher
Python Requirements: Python 3.4+ (Bundled or External)
OS Requirements: Windows, Linux or OSX
*** Tested on Windows 7, 8 & 10 only ***
Current Version: "0.1.3"

Installation
* Select Manage Plugins from the Plugins menu. In the dialog box, select either the Bundled Python or the External Python (Python 3.8+ must be installed on your computer to run this plugin externally).

* Click Add Plugin and select RemoveHTMLTags_vXXX.zip. This loads and installs the plugin into Sigil. You can then run it by selecting Plugins > Edit > RemoveHTMLTags.

Description
This plugin converts epub HTML and can give three different text outputs: two alternative views inside Sigil and one exported text file.
  • Inside Sigil, the plugin produces one of two views of the epub text, depending on how the convert_to_plain_text flag is set in the JSON prefs (see Preferences for details). By default, it leaves only plain text between the HTML <body></body> tags. If the flag is set to false, it generates basic HTML instead: every independent line, heading or paragraph in each epub section in the Book Browser is wrapped in unstyled <p> tags, much like your text would look after loading a simple text file into Sigil.
  • By default, the plugin also exports all XHTML sections from the Book Browser into a single plain text file saved to the Desktop. In that file, the text is laid out in separately defined sections, in block-text format, with added spacing between paragraphs for easier reading or manipulation.
  • This plugin can be used on valid Epub2 and Epub3 files.
  • This plugin is essentially a simpler Python 3.8 version of kbanelas’ smoothRemove plugin, which only runs on Python 2.7.
Note: If you prefer completely unformatted plain text (i.e. with no vertical spacing or line breaks), run Tools > Reformat > Mend and Prettify… in Sigil after running this plugin. That will turn the text in every epub section into a single unformatted blob of plain text.
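
For anyone curious what this kind of tag removal looks like, here is a minimal sketch that uses only html.parser from Python's standard library. It is not the plugin's actual code: the names BodyTextExtractor and body_to_plain_text, and the list of block-level tags, are assumptions made up for illustration. It keeps only the text found between the <body></body> tags and separates paragraphs with a blank line.

```python
from html.parser import HTMLParser

# Assumed set of tags that start a new paragraph of text (illustrative only).
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "blockquote"}


class BodyTextExtractor(HTMLParser):
    """Collects the text inside <body>, one entry per block-level element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_body = False
        self.blocks = []    # finished paragraphs
        self.current = []   # text fragments of the paragraph being read

    def _flush(self):
        # Join the fragments and collapse runs of whitespace to single spaces.
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body and tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag == "body":
            self._flush()
            self.in_body = False
        elif self.in_body and tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        # Inline tags such as <b> or <span> are simply dropped; their text is kept.
        if self.in_body:
            self.current.append(data)


def body_to_plain_text(xhtml):
    """Return the body text of one section, paragraphs separated by a blank line."""
    parser = BodyTextExtractor()
    parser.feed(xhtml)
    parser.close()
    return "\n\n".join(parser.blocks)
```

Calling body_to_plain_text() on the source of one section returns its text with all tags gone, which roughly corresponds to the default plain text view described above.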

Preferences
The editable JSON prefs for this plugin, with their default values, are shown below:

Code:
{
  "convert_to_plain_text": true,
  "save_plain_text_to_file": true,
  "save_file_path": "C:\\Users\\BILL\\Desktop\\textfile.txt",
  "remove_unused_files": false
}
convert_to_plain_text
When this flag is set to “true”, all epub HTML tags in between the <body></body> tags will be removed leaving plain text only. If this flag is set to “false”, all body text will be formatted in basic HTML, using unstyled <p> tags only. The default setting for this flag is “true”.
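
As a rough illustration of the “false” case, the sketch below wraps each paragraph of text in an unstyled <p> tag. This is an assumption about how such output could be built, not the plugin's own code, and the function name blocks_to_basic_html is hypothetical.

```python
from html import escape


def blocks_to_basic_html(blocks):
    """Wrap each non-empty paragraph in an unstyled <p> tag, one per line."""
    lines = []
    for block in blocks:
        text = block.strip()
        if text:
            # Escape &, < and > so the text stays valid inside the XHTML body.
            lines.append("<p>{}</p>".format(escape(text, quote=False)))
    return "\n".join(lines)
```

Headings and paragraphs are treated alike here, which matches the description of every independent line ending up in a plain <p> tag.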

save_plain_text_to_file
When this flag is set to “true”, the plugin saves a text-only version of the epub HTML to a single external file containing just plain text with no HTML. When it is set to “false”, no file is saved. The default setting for this flag is “true”.

save_file_path
Sets the full path of the saved text file. Edit it to change the folder or the file name. The default file name is “textfile.txt”, which is saved to your Desktop folder.
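
To give an idea of how a Desktop default and a single-file export could fit together, here is a small sketch using pathlib. Everything in it is an assumption for illustration: the function names default_save_path and export_sections are hypothetical, and the "=== name ===" section header is an invented layout, not the format the plugin actually writes.

```python
from pathlib import Path


def default_save_path(file_name="textfile.txt"):
    """Build a Desktop path for the current user (assumes a 'Desktop' folder in the home directory)."""
    return Path.home() / "Desktop" / file_name


def export_sections(sections, path):
    """Write all sections to one UTF-8 text file.

    sections is a list of (section_name, list_of_paragraphs) pairs.
    Paragraphs are separated by one blank line, sections by two.
    """
    parts = []
    for name, paragraphs in sections:
        # The header line format is an assumption made for this sketch.
        parts.append("=== {} ===\n\n{}".format(name, "\n\n".join(paragraphs)))
    path = Path(path)
    path.write_text("\n\n\n".join(parts) + "\n", encoding="utf-8")
    return path
```

Passing the value of save_file_path as the path argument would send the output wherever the prefs point, while default_save_path() mirrors the Desktop default.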

remove_unused_files
When this is set to “true” and the plugin is run, all files will be automatically removed from the Styles, Images and Fonts directories in Sigil’s Book Browser. If this flag is set to “false” then those files will not be removed. The default setting is “false”.
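
The four settings above can be pictured as a set of defaults that your edited prefs override. The sketch below shows that idea with the standard json module. It is a standalone illustration, not the way the plugin or Sigil actually reads its prefs file, and the names DEFAULT_PREFS and load_prefs are hypothetical.

```python
import json

# Defaults copied from the prefs listing in this post.
DEFAULT_PREFS = {
    "convert_to_plain_text": True,
    "save_plain_text_to_file": True,
    "save_file_path": "C:\\Users\\BILL\\Desktop\\textfile.txt",
    "remove_unused_files": False,
}


def load_prefs(json_text):
    """Merge user-edited JSON prefs over the defaults, ignoring unknown keys."""
    prefs = dict(DEFAULT_PREFS)
    user_prefs = json.loads(json_text) if json_text.strip() else {}
    for key, value in user_prefs.items():
        if key in DEFAULT_PREFS:
            prefs[key] = value
    return prefs
```

For example, load_prefs('{"convert_to_plain_text": false}') keeps every default except the first flag, which would select the basic HTML output instead of plain text.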

Plugin Run
Just load an epub into Sigil and run the plugin as described above. After that, you can edit or disable settings in the JSON prefs file to your liking. After running the plugin, it is also advisable to run Tools > Restructure Epub to Sigil Norm to avoid any hidden structural problems.

Changes:
v0.1.0 -- Initial Release.
v0.1.1 -- Fixed an update notifier problem.
v0.1.2 -- Improved/stabilized layout results for both plain text and simple html displays.
v0.1.3 -- Added a saved file notification whenever a plain text file is saved to the Desktop.
Attached Files
File Type: zip RemoveHTMLTags_v013.zip (120.4 KB, 1035 views)

Last edited by slowsmile; 02-14-2023 at 07:10 PM.