MobileRead Forums


CalibUser 04-23-2017 09:41 AM

Plugin to import text files
 
Purpose
This plugin will import a text file into an ePub and format it.

Using the plugin
The plugin is very basic. When run, you will be presented with a file dialog box.

Navigate to the directory that contains the file that you want to import, select the required text file and click OK. The text file will be imported into your ePub as a new xhtml section. This new section will be given the same name as your text file.

Technical note

Some ePub readers do not seem to render an empty <p></p> pair as a blank paragraph, so when a blank paragraph is required in the ePub file the plugin writes it to the new xhtml file as: <p>&nbsp;</p>
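
A minimal sketch of the blank-paragraph handling described above, assuming each line of the text file becomes one paragraph. The function name text_to_paragraphs is hypothetical and is not taken from the plugin's source; the escaping step is also an assumption, added so that characters such as '&' cannot break the resulting xhtml.

```python
from xml.sax.saxutils import escape as xmlescape


def text_to_paragraphs(text):
    """Wrap each line in <p> tags; blank lines become <p>&nbsp;</p>."""
    paragraphs = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            # Escape &, < and > so the line is safe inside xhtml.
            paragraphs.append("<p>" + xmlescape(stripped) + "</p>")
        else:
            # An empty <p></p> may be collapsed by some readers,
            # so a non-breaking space keeps the blank paragraph visible.
            paragraphs.append("<p>&nbsp;</p>")
    return "\n".join(paragraphs)
```

For example, text_to_paragraphs("First line\n\nFish & chips") yields three paragraphs, the middle one being <p>&nbsp;</p>.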

The encoding of a text file (utf-8, ascii, etc) can vary from one system to another. This plugin assumes that the text file was saved by the system running Sigil; the plugin attempts to identify the encoding that is likely to have been used on that system and will open the text file using that encoding.

Update
This version of the plugin will:
  • enable you to import multiple text files from a single folder without having to import them individually.
  • convert the '&' symbol from a text file to its html equivalent when a text file is imported, to prevent Sigil showing an error message when importing files containing this symbol
  • include an icon for the toolbar (if anybody wants to provide an improved icon then that would be good as I am not very artistic!)

DiapDealer 04-23-2017 10:17 AM

Thanks for this. I'll get it added to the plugin index.

As far as encoding detection goes, remember that Sigil's bundled python includes the chardet module. It should be able to assist in detecting a file's character encoding.
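
A rough sketch of how chardet might be used here, not the plugin's actual code. It assumes the raw bytes of the text file have already been read; the function name decode_text and the fallback order (strict UTF-8 first, then chardet's guess, then latin-1) are illustrative choices only.

```python
def decode_text(raw):
    """Return (text, encoding_used) for the raw bytes of a text file."""
    # Strict UTF-8 decoding is a reliable first test; utf-8-sig also
    # strips a leading byte order mark if one is present.
    try:
        return raw.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        pass

    # Ask chardet for a guess if the module is available.
    try:
        import chardet
        guess = chardet.detect(raw).get("encoding")
    except ImportError:
        guess = None

    if guess:
        try:
            return raw.decode(guess), guess
        except (UnicodeDecodeError, LookupError):
            pass

    # latin-1 maps every byte to a character, so this never fails,
    # although the result may not be what the author intended.
    return raw.decode("latin-1"), "latin-1"
```

Trying UTF-8 first avoids relying on a statistical guess for very short files, where detection tends to be least reliable.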

Doitsu 04-23-2017 01:43 PM

@CalibUser: I tested the plugin with a UTF-8 text file and it didn't decode it correctly.
Since Sigil comes with bs4, I'd recommend using soup.original_encoding to detect the original encoding.

For example:

Code:

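from sigil_bs4 import BeautifulSoup

def run(bk):
    # more code...
    # fHandle comes from the elided code above (presumably the file
    # object returned by the plugin's file dialog).
    # Read the file as raw bytes so that bs4 can guess the encoding.
    with open(fHandle.name, "rb") as binary_file:
        data = binary_file.read()
        soup = BeautifulSoup(data)
        # The encoding that bs4 detected while decoding the bytes.
        print(soup.original_encoding)
        # Stop here for testing; remove this to continue.
        return -1
        # more code...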

The above code correctly identified my UTF-8 test file. Of course, if you use bs4 as a filter, you might as well use str(soup) to convert an input file with unknown encoding to UTF-8. :)

KevinH 04-23-2017 03:36 PM

FWIW, Sigil only uses utf-8 for all text files. If an epub is opened using any other encoding for its xhtml files, those files are converted to utf-8 upon load and from then on always saved in that format.

CalibUser 04-24-2017 03:16 PM

Thanks for all the feedback.

I tried to use the internet to determine which of the suggested methods for detecting the encoding of the text file was most appropriate: either chardet or beautiful soup. As I could not find a definitive answer to this, I decided to use Doitsu's code for quickness.

The plugin has been updated in the first post of this thread.

KevinH 04-24-2017 06:36 PM

FWIW, soup uses chardet internally if it is available, so using soup is a good idea.

teh603 04-25-2017 11:18 PM

Exactly what I was looking for. Thanks.

Edit: There's a bit of a problem when I try importing text files with angle brackets- it doesn't convert them and Sigil interprets that as a bad tag. Can you add a line or two to correct angle brackets, please?

DiapDealer 04-26-2017 10:29 PM

Since it's just text being imported, it should be fairly easy to escape the text data with something like:

Code:

from xml.sax.saxutils import escape as xmlescape
.
.
.
data = xmlescape(data)

... somewhere in the plugin after the text file is read in. That should take care of ampersands and angle brackets.

Doitsu 04-26-2017 11:08 PM

Since escape ignores straight quotes, it couldn't hurt to proactively replace them with entities:
Code:

data = xmlescape(data).replace('"', '&quot;')
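
As an illustration of the two suggestions above, here is a small self-contained sketch. It passes the quote replacement through the optional entities argument of escape rather than chaining .replace(); the result is the same. The function name escape_for_xhtml and the sample string are made up for this example.

```python
from xml.sax.saxutils import escape as xmlescape


def escape_for_xhtml(data):
    """Escape &, < and > and also replace straight double quotes."""
    # escape() handles &, < and > itself; the extra dictionary adds
    # the double quote, which escape() leaves untouched by default.
    return xmlescape(data, {'"': "&quot;"})


sample = 'Did you know that 4>3? Tom & "Jerry" <HELLO>'
print(escape_for_xhtml(sample))
# prints: Did you know that 4&gt;3? Tom &amp; &quot;Jerry&quot; &lt;HELLO&gt;
```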

CalibUser 04-27-2017 06:23 AM

I have updated the plugin in the first post so that it manages angular brackets that contain recognised HTML tags.

I tested it with variations of the following text file:

This is a line of text
This is <HELLO> <i>another</i> line <b>of</b> text
Did you know that 4>3?
This <i> tag is not matched.


Sigil generates an error when this text is imported because the <i> tag on the last line does not have its matching </i> and (I assume) <HELLO> is not a recognised html tag. However, if you click Yes when Sigil asks 'Are you sure you want to continue?' then the text file is imported with these incorrect tags.


@DiapDealer and Doitsu: Thanks for your tips for a solution; I only saw these after I went to the site to upload my solution to this problem. If there are any further problems with angular brackets then I will consider these solutions.

If the angular brackets do not enclose an html tag, it will be necessary to replace < and > with &lt; and &gt; so that the page can be viewed normally.

DiapDealer 04-27-2017 06:41 AM

To be honest, I don't think I'd bother trying to parse any potential markup in the text files at all. I'd just escape it all and be done.

teh603 04-27-2017 08:05 PM

Gah. Right, sorry. Guess I should've explained myself better. I don't type my own markup, so what I was asking is that the plugin either do or have the option to convert angle brackets (which in HTML are reserved for markup) into the appropriate tags.

CalibUser 04-30-2017 01:15 PM

TextImporter plugin updated to version 0.1.0.4

When you run the plugin it will check for updates; if an update is available you will have the option of either opening the webpage where the plugin is found on www.mobileread.com or being reminded again when you next run the plugin.

If the plugin is up to date then it will check for a new update after 24 hours have elapsed.
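
The timing rule above could be sketched as follows. This is only an illustration under assumptions: the function name update_check_due is hypothetical, and how the plugin actually stores the time of its last check is not described in this thread.

```python
from datetime import datetime, timedelta


def update_check_due(last_check, now, interval=timedelta(hours=24)):
    """Return True if enough time has passed to check for updates again."""
    # No recorded check yet: check straight away.
    if last_check is None:
        return True
    return now - last_check >= interval


last = datetime(2017, 4, 30, 13, 15)
print(update_check_due(last, datetime(2017, 4, 30, 20, 0)))   # False
print(update_check_due(last, datetime(2017, 5, 1, 13, 15)))   # True
```

Keeping the interval as a parameter means the same function still works if the time between checks is made configurable.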

The main window has a checkbox marked 'Convert angular brackets to html code'. By ticking this box angular brackets will be converted to html code.

The main window also has two buttons. One is marked 'Close' and if you click this button the plugin will close and nothing will be imported. The other button is marked 'Get text file' and if you select this button you will be asked to select the required text file. This will be imported into your ePub and the plugin will close.

If you do not tick the box marked 'Convert angular brackets to html code' and you import a text file that contains angular brackets, Sigil will produce a warning message. You can ignore the warning and the text will be imported, but you may need to amend the imported file to ensure it can be read.

I decided to use the code provided by Doitsu to determine the encoding used for the textfile.

I have posted the updated plugin in the first post of this thread

teh603 05-05-2017 09:51 PM

Woot, thanks!

Sorry about not replying sooner. I've been busy with work and writing.

CalibUser 05-06-2017 10:46 AM

@teh603: No problem

CalibUser 05-07-2017 10:38 AM

I have updated the plugin TextImporter in the first post of this thread to version 0.1.0.5.

The previous version of the plugin would allow only one text file to be imported each time you ran the plugin; this version will allow you to import several text files before the plugin closes. It also has a new facility for checking for updates and this allows you to set the time interval between checks for new versions.

I have also uploaded a user guide for the plugin.

chagushu 06-09-2020 12:21 AM

It's a helpful plugin to me, it would be better If you can create an option that we can import multiple files in 1 time?

CalibUser 06-09-2020 08:29 AM

Quote:

Originally Posted by chagushu (Post 3997961)
It's a helpful plugin to me, it would be better If you can create an option that we can import multiple files in 1 time?

I have made a few changes to the plugin, including the ability to import multiple text files from a single folder without having to import them individually.

When the file dialog opens, navigate to the folder containing your text files and CTRL-click on each file that you want to import and then click the OK button on the file dialog. All the selected files will be imported into Sigil when you click the plugin's Close button.

Alternatively you can use SHIFT click in the usual manner to select a group of text files for importing into your ePub.
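
For anyone curious how a multiple-selection dialog like this can be built, here is a hedged sketch using tkinter's askopenfilenames, which returns every file selected with CTRL-click or SHIFT-click. It is not taken from the plugin's source: the function names are hypothetical, and naming each new section after its text file with an .xhtml extension is an assumption based on the first post.

```python
import os


def choose_text_files():
    """Show a file dialog that allows several files to be selected."""
    # Imported here so the rest of the module works without a display.
    from tkinter import Tk, filedialog

    root = Tk()
    root.withdraw()  # hide the empty main window behind the dialog
    paths = filedialog.askopenfilenames(
        title="Select text files",
        filetypes=[("Text files", "*.txt"), ("All files", "*.*")],
    )
    root.destroy()
    return list(paths)


def section_names(paths):
    """Derive one section name per selected file, e.g. notes.txt -> notes.xhtml."""
    names = []
    for path in paths:
        base = os.path.splitext(os.path.basename(path))[0]
        names.append(base + ".xhtml")
    return names
```

Calling section_names(choose_text_files()) would give one section name for each file picked in the dialog.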

isaacbh 08-28-2021 08:39 AM

Can this plugin work on Linux? I get this error when trying to run it:

NameError: name 'ttk' is not defined

KevinH 08-28-2021 06:00 PM

Your python implementation is missing its Tkinter/ttk support.

Quote:

Originally Posted by isaacbh (Post 4149796)
Can this plugin work on Linux? I get this error when trying to run it:

NameError: name 'ttk' is not defined


isaacbh 08-29-2021 03:55 AM

Quote:

Originally Posted by KevinH (Post 4149868)
Your python implementation is missing its Tkinter/ttk support.

I had to change this line in plugin.py:

Code:

import tkinter.ttk as tkinter_ttk

to

Code:

import tkinter.ttk as ttk

Now it works.


All times are GMT -4.