Old 04-23-2017, 08:41 AM   #1
CalibUser
Addict
Plugin to import text files

Purpose
This plugin will import a text file into an ePub and format it.

Using the plugin
The plugin is very basic. When you run it, a file dialog box opens.

Navigate to the directory that contains the file that you want to import, select the required text file and click OK. The text file will be imported into your ePub as a new xhtml section. This new section will be given the same name as your text file.

Technical note

Some ePub readers do not seem to render an empty <p></p> pair as a blank paragraph, so wherever a blank paragraph is required in the ePub file the plugin writes it to the new xhtml file as: <p>&nbsp;</p>
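
As a rough illustration of the blank-paragraph rule, the sketch below turns the lines of a text file into paragraphs. It assumes one paragraph per line; the function name `lines_to_paragraphs` is made up for this example and is not taken from the plugin, and escaping of special characters is deliberately left out.

```python
def lines_to_paragraphs(text):
    """Wrap each line in <p>...</p>; blank lines become <p>&nbsp;</p>."""
    paragraphs = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            paragraphs.append("<p>{}</p>".format(stripped))
        else:
            # an empty <p></p> may not be shown as a blank paragraph by some readers
            paragraphs.append("<p>&nbsp;</p>")
    return "\n".join(paragraphs)


print(lines_to_paragraphs("First line\n\nSecond line"))
```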

The encoding of a text file (utf-8, ascii, etc.) can vary from one system to another. This plugin assumes that the text file was saved by the system running Sigil; the plugin attempts to identify the encoding that is likely to have been used on that system and will open the text file using that encoding.
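
One way to approximate "the encoding likely to have been used on this system" is Python's `locale.getpreferredencoding()`. The sketch below is an assumption about how this could be done, not the plugin's actual code, and `read_with_system_encoding` is a hypothetical helper name.

```python
import locale


def read_with_system_encoding(path):
    """Read a text file using the system's preferred encoding."""
    # Assumption: the file was saved on this machine with its default encoding
    encoding = locale.getpreferredencoding(False)
    # errors="replace" avoids a crash if the guess turns out to be wrong
    with open(path, "r", encoding=encoding, errors="replace") as handle:
        return handle.read(), encoding
```

A file saved on a different system (for example a UTF-8 file opened on a machine whose default is cp1252) would be decoded incorrectly by this approach, which is why detecting the encoding from the file's bytes is more robust.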

Updated to version 0.1.0.6
This version of the plugin will:
  • enable you to import multiple text files from a single folder without having to import them individually.
  • convert the '&' symbol in a text file to its html equivalent on import, which prevents Sigil from showing an error message for files that contain this symbol.
  • include an icon for the toolbar (if anybody wants to provide an improved icon then that would be good as I am not very artistic!)
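
The first two points can be sketched as follows. This is only an illustration under stated assumptions: the helper name `read_text_files` is invented, the files are assumed to be UTF-8 with a .txt extension, and the plugin's own implementation may differ.

```python
from pathlib import Path


def read_text_files(folder, encoding="utf-8"):
    """Return {file name without extension: text} for every .txt file in folder."""
    sections = {}
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding=encoding)
        # a bare '&' starts an entity in XHTML, so it is written as &amp;
        sections[path.stem] = text.replace("&", "&amp;")
    return sections
```

Each key could then serve as the name of the new xhtml section, matching the behaviour of naming the section after the text file.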

Updated to version 0.1.0.7
This version has a small bug fix for Windows 11.
Attached Files
File Type: epub TextImporter_v0.1.0.6.epub (221.7 KB, 1511 views)
File Type: zip TextImporter_v0.1.0.7.zip (10.9 KB, 2321 views)

Last edited by CalibUser; 03-03-2022 at 04:07 AM. Reason: Plugin updated to version 0.1.0.7
Old 04-23-2017, 09:17 AM   #2
DiapDealer
Grand Sorcerer
Thanks for this. I'll get it added to the plugin index.

As far as encoding detection goes, remember that Sigil's bundled python includes the chardet module. It should be able to assist in detecting a file's character encoding.
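
A minimal sketch of the chardet route, assuming the module can be imported as `chardet` inside Sigil's bundled Python. `chardet.detect()` takes raw bytes and returns a dictionary with `encoding` and `confidence` keys; the helper name `decode_bytes` and the UTF-8 fallback are choices made for this example.

```python
try:
    import chardet  # ships with Sigil's bundled Python, per the post above
except ImportError:  # e.g. when this is run outside Sigil
    chardet = None


def decode_bytes(data, fallback="utf-8"):
    """Decode raw bytes using chardet's best guess, else the fallback encoding."""
    guess = chardet.detect(data) if chardet else {}
    encoding = guess.get("encoding") or fallback
    try:
        return data.decode(encoding, errors="replace"), encoding
    except LookupError:
        # chardet named a codec that this Python build does not provide
        return data.decode(fallback, errors="replace"), fallback
```

Detection is a statistical guess, so short files can be misidentified; checking `confidence` before trusting the result may be worthwhile.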
Old 04-23-2017, 12:43 PM   #3
Doitsu
Grand Sorcerer
@CalibUser: I tested the plugin with a UTF-8 text file and it didn't decode it correctly.
Since Sigil comes with bs4, I'd recommend using soup.original_encoding to detect the original encoding.

For example:

Code:
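from sigil_bs4 import BeautifulSoup

def run(bk):
    # more code...
    # fHandle is the file object obtained from the file dialog (not shown here)
    with open(fHandle.name, "rb") as binary_file:
        data = binary_file.read()
    # pass the raw bytes to bs4 so that it can guess the encoding
    soup = BeautifulSoup(data, "html.parser")
    print(soup.original_encoding)
    # more code...
    return -1  # a non-zero return value signals failure to Sigil; handy while testing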
The above code correctly identified my UTF-8 test file. Of course, if you use bs4 as a filter, you might as well use str(soup) to convert an input file with unknown encoding to UTF-8.
Old 04-23-2017, 02:36 PM   #4
KevinH
Sigil Developer
FWIW, Sigil only uses utf-8 for all text files. If an epub is opened using any other encoding for its xhtml files, those files are converted to utf-8 upon load and from then on always saved in that format.
Old 04-24-2017, 02:16 PM   #5
CalibUser
Addict
Thanks for all the feedback.

I searched the internet to find out which of the suggested methods for detecting the encoding of the text file was more appropriate: chardet or Beautiful Soup. As I could not find a definitive answer, I decided to use Doitsu's code because it was the quickest to implement.

The plugin has been updated in the first post of this thread.
Old 04-24-2017, 05:36 PM   #6
KevinH
Sigil Developer
FWIW, soup uses chardet internally if it is available, so using soup is a good idea.
Old 04-25-2017, 10:18 PM   #7
teh603
Autism Spectrum Disorder
Exactly what I was looking for. Thanks.

Edit: There's a bit of a problem when I try importing text files with angle brackets: the plugin doesn't convert them and Sigil interprets them as a bad tag. Can you add a line or two to correct angle brackets, please?

Last edited by teh603; 04-25-2017 at 10:26 PM.
Old 04-26-2017, 09:29 PM   #8
DiapDealer
Grand Sorcerer
Since it's just text being imported, it should be fairly easy to escape the text data with something like:

Code:
from xml.sax.saxutils import escape as xmlescape
# ...
# ... rest of the plugin ...
# ...
data = xmlescape(data)
... somewhere in the plugin after the text file is read in. That should take care of ampersands and angle brackets.
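
For reference, this is what `escape` from the standard library does to a line with an ampersand and angle brackets. The sample string is made up for the demonstration.

```python
from xml.sax.saxutils import escape as xmlescape

sample = "Did you know that 4>3? <HELLO> & goodbye"
print(xmlescape(sample))
# Did you know that 4&gt;3? &lt;HELLO&gt; &amp; goodbye
```

The ampersand is replaced first, so the entities produced for the brackets are not escaped a second time.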
Old 04-26-2017, 10:08 PM   #9
Doitsu
Grand Sorcerer
Since escape ignores straight quotes, it couldn't hurt to proactively replace them with entities:
Code:
data = xmlescape(data).replace('"', '&quot;')
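
The same result can also be obtained in a single call, because `escape` accepts an optional dictionary of extra replacements as its second argument. The wrapper name `escape_with_quotes` is only for illustration.

```python
from xml.sax.saxutils import escape as xmlescape


def escape_with_quotes(text):
    """Escape &, < and >, plus straight double quotes."""
    # the extra entities are applied after the three standard replacements
    return xmlescape(text, {'"': "&quot;"})


print(escape_with_quotes('He said "4 > 3" & left'))
# He said &quot;4 &gt; 3&quot; &amp; left
```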
Old 04-27-2017, 05:23 AM   #10
CalibUser
Addict
I have updated the plugin in the first post so that it manages angular brackets that contain recognised HTML tags.

I tested it with variations of the following text file:

This is a line of text
This is <HELLO> <i>another</i> line <b>of</b> text
Did you know that 4>3?
This <i> tag is not matched.


Sigil generates an error when this text is imported because the <i> tag on the last line does not have a matching </i> and (I assume) <HELLO> is not a recognised html tag. However, if you click Yes when Sigil asks 'Are you sure you want to continue?' then the text file is imported with these incorrect tags.


@DiapDealer and Doitsu: Thanks for your tips for a solution; I only saw these after I went to the site to upload my solution to this problem. If there are any further problems with angular brackets then I will consider these solutions.

To view the page normally, it will be necessary to replace < and > with &lt; and &gt; wherever the brackets do not enclose an html tag.
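
One way to get this behaviour (keep recognised tags, escape every other bracket) is to escape everything first and then restore a short whitelist of tags. This is only a sketch: the `ALLOWED_TAGS` set and the function name are invented here, the plugin's actual code is not shown in this thread, and unmatched tags such as a lone <i> are not repaired.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical whitelist; the tags the plugin really recognises are not listed here
ALLOWED_TAGS = {"i", "b", "em", "strong", "u"}

# Matches an escaped simple tag such as &lt;i&gt; or &lt;/b&gt; (no attributes)
_ESCAPED_TAG = re.compile(r"&lt;(/?)([A-Za-z][A-Za-z0-9]*)&gt;")


def escape_except_known_tags(line):
    """Escape &, < and >, then turn whitelisted tags back into markup."""
    def restore(match):
        slash, name = match.group(1), match.group(2).lower()
        if name in ALLOWED_TAGS:
            return "<{}{}>".format(slash, name)
        return match.group(0)  # leave unknown tags escaped

    return _ESCAPED_TAG.sub(restore, escape(line))


print(escape_except_known_tags("This is <HELLO> <i>another</i> line <b>of</b> text"))
# This is &lt;HELLO&gt; <i>another</i> line <b>of</b> text
```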

Last edited by CalibUser; 04-27-2017 at 05:36 AM.
Old 04-27-2017, 05:41 AM   #11
DiapDealer
Grand Sorcerer
To be honest, I don't think I'd bother trying to parse any potential markup in the text files at all. I'd just escape it all and be done.
Old 04-27-2017, 07:05 PM   #12
teh603
Autism Spectrum Disorder
Gah. Right, sorry. Guess I should've explained myself better. I don't type my own markup, so what I was asking is that the plugin either do or have the option to convert angle brackets (which in HTML are reserved for markup) into the appropriate tags.
Old 04-30-2017, 12:15 PM   #13
CalibUser
Addict
TextImporter plugin updated to version 0.1.0.4

When you run the plugin it will check for updates; if an update is available you will have the option of either opening the webpage where the plugin is found on www.mobileread.com or being reminded again when you next run the plugin.

If the plugin is up to date then it will check for a new update after 24 hours have elapsed.
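
The 24-hour rule could be expressed roughly as below. This is a sketch under assumptions: the function name `update_check_due` is hypothetical, and how the plugin stores the time of its last check is not described in this thread.

```python
from datetime import datetime, timedelta

CHECK_INTERVAL = timedelta(hours=24)


def update_check_due(last_check, now=None):
    """Return True if no check has been made yet or 24 hours have elapsed."""
    if last_check is None:
        return True
    now = now or datetime.now()
    return now - last_check >= CHECK_INTERVAL
```

The `now` parameter exists only so that the rule can be exercised with fixed timestamps.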

The main window has a checkbox marked 'Convert angular brackets to html code'. By ticking this box angular brackets will be converted to html code.

The main window also has two buttons. One is marked 'Close' and if you click this button the plugin will close and nothing will be imported. The other button is marked 'Get text file' and if you select this button you will be asked to select the required text file. This will be imported into your ePub and the plugin will close.

If you do not tick the box marked 'Convert angular brackets to html code' and you import a text file that contains angular brackets, Sigil will produce a warning message. You can ignore the warning and the text will still be imported, but you may need to amend the imported file to ensure it can be read.

I decided to use the code provided by Doitsu to determine the encoding used for the text file.

I have posted the updated plugin in the first post of this thread.

Last edited by CalibUser; 04-30-2017 at 12:17 PM.
Old 05-05-2017, 08:51 PM   #14
teh603
Autism Spectrum Disorder
Woot, thanks!

Sorry about not replying sooner. I've been busy with work and writing.

Last edited by teh603; 05-05-2017 at 09:10 PM.
Old 05-06-2017, 09:46 AM   #15
CalibUser
Addict
@teh603: No problem