Plugin to import text files
2 Attachment(s)
Purpose
This plugin imports a text file into an ePub and formats it.
Using the plugin
The plugin is very basic. When run, it presents a file dialog box. Navigate to the directory that contains the file you want to import, select the required text file and click OK. The text file is imported into your ePub as a new xhtml section, which is given the same name as your text file.
Technical note
As some ePub readers do not seem to render the empty tag pair <p></p> as a blank paragraph, a blank paragraph is written to the new xhtml file as: <p>&nbsp;</p>
The encoding of a text file (utf-8, ascii, etc) can vary from one system to another. This plugin assumes that the text file was saved by the system running Sigil; it attempts to identify the encoding likely to have been used on that system and opens the text file using that encoding.
Update
This version of the plugin will:
|
Thanks for this. I'll get it added to the plugin index.
As far as encoding detection goes, remember that Sigil's bundled python includes the chardet module. It should be able to assist in detecting a file's character encoding. |
@CalibUser: I tested the plugin with a UTF-8 text file and it didn't decode it correctly.
Since Sigil comes with bs4, I'd recommend using soup.original_encoding to detect the original encoding. For example:
Code:
from sigil_bs4 import BeautifulSoup
|
FWIW, Sigil only uses utf-8 for all text files. If an epub is opened using any other encoding for its xhtml files, those files are converted to utf-8 upon load and from then on always saved in that format.
|
Thanks for all the feedback.
I searched online to work out which of the suggested methods for detecting the encoding of the text file was more appropriate, chardet or Beautiful Soup. As I could not find a definitive answer, I decided to use Doitsu's code for quickness. The plugin has been updated in the first post of this thread. |
FWIW, soup uses chardet internally if it is available, so using soup is a good idea.
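For anyone following along, here is a minimal sketch of how the detection discussed above could be wired up. It is not the plugin's actual code: the function name decode_text, the UTF-8-first ordering and the locale fallback are assumptions made for illustration, and it assumes the 'html.parser' builder is accepted by whichever Beautiful Soup copy gets imported.

```python
import locale

try:
    # Sigil's bundled copy (normally only importable inside Sigil's Python).
    from sigil_bs4 import BeautifulSoup
except ImportError:
    try:
        # A regular bs4 install, if one is present.
        from bs4 import BeautifulSoup
    except ImportError:
        BeautifulSoup = None


def decode_text(raw):
    """Return (text, encoding_name) for the raw bytes of a text file.

    Hypothetical helper, not taken from the plugin.
    """
    # Bytes that decode cleanly as UTF-8 are rarely anything else,
    # so try that first ('utf-8-sig' also strips a leading BOM).
    try:
        return raw.decode('utf-8-sig'), 'utf-8'
    except UnicodeDecodeError:
        pass

    # Otherwise ask Beautiful Soup for its best guess, if it is available.
    if BeautifulSoup is not None:
        guess = BeautifulSoup(raw, 'html.parser').original_encoding
        if guess:
            try:
                return raw.decode(guess), guess
            except (UnicodeDecodeError, LookupError):
                pass

    # Last resort: the system's preferred encoding, replacing bad bytes.
    fallback = locale.getpreferredencoding(False)
    return raw.decode(fallback, errors='replace'), fallback
```

Reading the file in binary mode (open(path, 'rb')) and passing the bytes to decode_text keeps the guessing in one place, whatever system saved the file.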
|
Exactly what I was looking for. Thanks.
Edit: There's a bit of a problem when I try importing text files with angle brackets- it doesn't convert them and Sigil interprets that as a bad tag. Can you add a line or two to correct angle brackets, please? |
Since it's just text being imported, it should be fairly easy to escape the text data with something like:
Code:
from xml.sax.saxutils import escape as xmlescape
|
Since escape ignores straight quotes, it couldn't hurt to proactively replace them with entities:
Code:
data = xmlescape(data).replace('"', '&quot;')
|
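Putting the two snippets above together, a rough sketch of the "escape everything" approach might look like the following. The function name text_to_paragraphs and the choice to wrap each line in its own paragraph are assumptions for illustration, not the plugin's code; blank lines are written with a numeric non-breaking space so the result does not depend on the &nbsp; entity being declared.

```python
from xml.sax.saxutils import escape as xmlescape


def text_to_paragraphs(data):
    """Escape plain text and wrap each line in <p> tags (hypothetical helper)."""
    paragraphs = []
    for line in data.splitlines():
        if line.strip():
            # escape() handles &, < and >; the extra mapping covers straight quotes.
            safe = xmlescape(line, {'"': '&quot;'})
            paragraphs.append('<p>' + safe + '</p>')
        else:
            # Blank line: emit a paragraph holding a non-breaking space.
            paragraphs.append('<p>&#160;</p>')
    return '\n'.join(paragraphs)


print(text_to_paragraphs('Did you know that 4>3?\n\n<HELLO> says "hi" & bye'))
```

Because every line goes through the escape call, nothing in the text file can be mistaken for markup.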
I have updated the plugin in the first post so that it manages angular brackets that contain recognised HTML tags.
I tested it with variations of the following text file:
This is a line of text
This is <HELLO> <i>another</i> line <b>of</b> text
Did you know that 4>3?
This <i> tag is not matched.
Sigil generates an error when this text is imported because the <i> tag does not have its matching </i> and (I assume) <HELLO> is not a recognised html tag. However, if you click Yes when Sigil asks 'Are you sure you want to continue?' then the text file is imported with these incorrect tags.
@DiapDealer and Doitsu: Thanks for your tips for a solution; I only saw these after I went to the site to upload my solution to this problem. If there are any further problems with angular brackets then I will consider these solutions.
To view the page normally, any < and > that do not enclose an html tag will need to be replaced with &lt; and &gt;. |
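As a rough illustration of the idea in this post (keep recognised tags, escape every other angle bracket), something like the sketch below could work. It is not the plugin's code: the KNOWN_TAGS whitelist, the regular expression and the function name are all assumptions, and it makes no attempt to check that tags are properly paired.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical whitelist; a real plugin would likely need a longer list.
KNOWN_TAGS = {'b', 'i', 'em', 'strong', 'u', 'sub', 'sup'}

# Matches simple tags without attributes, such as <i>, </i> or <br/>.
TAG_RE = re.compile(r'</?([A-Za-z][A-Za-z0-9]*)\s*/?>')


def escape_unknown_brackets(line):
    """Escape &, < and > everywhere except inside whitelisted tags."""
    out = []
    pos = 0
    for match in TAG_RE.finditer(line):
        # Plain text before the tag is always escaped.
        out.append(escape(line[pos:match.start()]))
        if match.group(1).lower() in KNOWN_TAGS:
            out.append(match.group(0))          # keep recognised tag as-is
        else:
            out.append(escape(match.group(0)))  # e.g. <HELLO> becomes text
        pos = match.end()
    out.append(escape(line[pos:]))
    return ''.join(out)


print(escape_unknown_brackets('This is <HELLO> <i>another</i> line'))
print(escape_unknown_brackets('Did you know that 4>3?'))
```

An unmatched <i> still passes through untouched, so Sigil would still warn about it. Upper-case tags such as <I> are also kept as typed, although XHTML expects lower-case element names.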
To be honest, I don't think I'd bother trying to parse any potential markup in the text files at all. I'd just escape it all and be done.
|
Gah. Right, sorry. Guess I should've explained myself better. I don't type my own markup, so what I was asking is that the plugin either do or have the option to convert angle brackets (which in HTML are reserved for markup) into the appropriate tags.
|
TextImporter plugin updated to version 0.1.0.4
When you run the plugin it will check for updates; if an update is available you will have the option of either opening the webpage where the plugin is found on www.mobileread.com or being reminded again when you next run the plugin. If the plugin is up to date then it will check for a new update after 24 hours have elapsed.
The main window has a checkbox marked 'Convert angular brackets to html code'. Ticking this box converts angular brackets to html code.
The main window also has two buttons. One is marked 'Close'; if you click it the plugin will close and nothing will be imported. The other is marked 'Get text file'; if you click it you will be asked to select the required text file, which will be imported into your ePub before the plugin closes.
If you do not tick the box marked 'Convert angular brackets to html code' and you import a text file that contains angular brackets, Sigil will produce a warning message. You can ignore the warning and the text will be imported; however, you may need to amend the imported file to ensure it can be read.
I decided to use the code provided by Doitsu to determine the encoding used for the text file. I have posted the updated plugin in the first post of this thread. |
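The 24 hour rule described in this post can be sketched in a few lines. This is only an illustration under assumptions: the function name is made up, and where the time of the last check is stored (for example in the plugin's preferences) is not shown.

```python
from datetime import datetime, timedelta


def update_check_due(last_check, now, interval_hours=24):
    """Return True if enough time has passed since the last update check.

    last_check is a datetime, or None if no check has ever been made.
    """
    if last_check is None:
        return True
    return now - last_check >= timedelta(hours=interval_hours)


# Example: last check was 25 hours ago, so another check is due.
now = datetime(2020, 1, 2, 13, 0)
print(update_check_due(datetime(2020, 1, 1, 12, 0), now))
```

Passing now in as a parameter, rather than calling datetime.now() inside the function, makes the rule easy to test.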
Woot, thanks!
Sorry about not replying sooner. I've been busy with work and writing. |
@teh603: No problem
|
I have updated the plugin TextImporter in the first post of this thread to version 0.1.0.5.
The previous version of the plugin would allow only one text file to be imported each time you ran the plugin; this version will allow you to import several text files before the plugin closes. It also has a new facility for checking for updates and this allows you to set the time interval between checks for new versions. I have also uploaded a user guide for the plugin. |
It's a helpful plugin for me. Would it be possible to add an option to import multiple files at once?
|
When the file dialog opens, navigate to the folder containing your text files and CTRL-click on each file that you want to import and then click the OK button on the file dialog. All the selected files will be imported into Sigil when you click the plugin's Close button. Alternatively you can use SHIFT click in the usual manner to select a group of text files for importing into your ePub. |
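For reference, multi-selection of this kind is available in Tkinter through filedialog.askopenfilenames, which returns all the paths picked with CTRL-click or SHIFT-click. The sketch below is not the plugin's code: both function names and the '.xhtml' naming rule are assumptions based on the description in the first post.

```python
import os


def section_name_for(path):
    """Hypothetical helper: '/home/me/chapter one.txt' -> 'chapter one.xhtml'."""
    base = os.path.splitext(os.path.basename(path))[0]
    return base + '.xhtml'


def pick_text_files():
    """Show a file dialog that allows several text files to be selected."""
    # Imported here so the helper above works even without Tkinter installed.
    import tkinter
    from tkinter import filedialog

    root = tkinter.Tk()
    root.withdraw()  # hide the empty root window behind the dialog
    try:
        paths = filedialog.askopenfilenames(
            title='Select text files',
            filetypes=[('Text files', '*.txt'), ('All files', '*.*')],
        )
    finally:
        root.destroy()
    return list(paths)


if __name__ == '__main__':
    for path in pick_text_files():
        print(path, '->', section_name_for(path))
```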
Can this plugin work on Linux? I get this error when trying to run it:
NameError: name 'ttk' is not defined |
Your python implementation is missing its Tkinter/ttk support.
|
Quote:
Code:
import tkinter.ttk as tkinter_ttk
Code:
import tkinter.ttk as ttk
|
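If the Python that Sigil uses has no Tkinter support at all, the import itself fails. On Debian or Ubuntu based systems the missing piece is usually the python3-tk package. A guarded import along these lines (a sketch only, with a made-up function name) would let a plugin report the problem clearly instead of failing later with a NameError:

```python
def load_ttk():
    """Return (ttk_module, error_message); exactly one of the two is None."""
    try:
        import tkinter.ttk as ttk
    except ImportError as err:
        return None, 'Tkinter/ttk support is missing: ' + str(err)
    return ttk, None


ttk, problem = load_ttk()
if problem:
    print(problem)
```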