Old 02-02-2017, 12:25 AM   #1
apoiata
Enthusiast
Posts: 34
Karma: 10
Join Date: Feb 2017
Device: none
Unable to open htm-file.

Hi,

A friend of mine wrote a book in Word. I converted it as suggested in the User Guide: File -> Save As, Filtered HTML.

I was able to open it in version 0.7.1, but today I installed the new version 0.9.7 and it fails to open.

The error is:
---------------------------------
The following file was not loaded due to invalid content or not well formed XML:
[full path file name] (line 786: @787:43: That tag isn't allowed here Currently open tags: html, body, div..)

Try setting the Clean Source preference to Mend XHTML Source Code on Open and reloading the file.
----------------------------------

1. First of all, there is no "Clean Source" preference, only "Mend XHTML Source Code On".
2. I tried opening the file with and without "Open" checked. Same result: the file is not opened and there is only a "Close" button.
3. I don't understand why the older version is able to open the file but the latest one is not.
4. There are a lot of errors in the converted file and I am willing to clean it up, but how can I do that if I am not able to open it in Sigil?
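
One way to look at the spot the parser complains about, without opening the file in Sigil, is to print the text around the reported position with a small script. The following is a minimal Python sketch. It assumes the numbers in the message (787:43) are a 1-based line and column; the function name `show_context` and the file name in the usage comment are made up for illustration.

```python
def show_context(text, line, col, width=40):
    """Return the part of a 1-based line that surrounds a 1-based column."""
    lines = text.splitlines()
    if not 1 <= line <= len(lines):
        raise ValueError("line number is outside the file")
    target = lines[line - 1]
    # Clamp the window so it never runs past either end of the line.
    start = max(0, col - 1 - width)
    end = min(len(target), col - 1 + width)
    return target[start:end]


# Usage sketch (hypothetical file name):
#   with open("book.htm", encoding="utf-8", errors="replace") as handle:
#       print(show_context(handle.read(), 787, 43))
```

Word's filtered HTML is not always saved as UTF-8, so `errors="replace"` is used in the usage comment to keep the script from stopping on an odd byte.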

Sigil
Version: 0.9.7
Loaded Qt: 5.6.1
Build time: 2016.10.29 15:56:51 UTC

Thanks
Old 02-02-2017, 02:05 AM   #2
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
Apparently Sigil thinks it is an XML file instead of HTML.

Have you considered using my Word add-in? It contains various tools and also enables you to create an ePUB directly from Word. The code it produces is clean in itself. If you want, the ePUB can be opened in Sigil automatically after being saved.
Old 02-02-2017, 02:32 AM   #3
BetterRed
null operator (he/him)
Posts: 20,568
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by Toxaris View Post
Apparently Sigil thinks it is an XML file instead of HTML.

Have you considered using my Word add-in? It contains various tools and also enables you to create an ePUB directly from Word. The code it produces is clean in itself. If you want, the ePUB can be opened in Sigil automatically after being saved.


Or use the Import DOCX plugin for Sigil
Or convert the DOCX to EPUB via calibre (GUI or command line)
Or import the DOCX into the calibre book editor

Assuming you have Word 2007 or later, converting the DOCX via any of the above (including ePub Tools) is almost always a better place to start than Filtered HTML.

And there's a lot your friend can do in Word to make conversion easier, such as avoiding the use of 'white space' to align text (horizontally and vertically), and using Word Styles in an attached Template instead of inline styles.

BR
Old 02-02-2017, 03:09 AM   #4
slowsmile
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
The solution to your problem was actually in the error message. You can open any html in Sigil that is derived from a Word doc (filtered html), Word docx (filtered html), AbiWord html, ODF html or even Google Doc html using the following simple procedure in Sigil.

* Open Sigil 0.9.7 and go to Edit > Preferences > General. Then set Mend XHTML Source Code on: to Open and save.

* Now if you open your Word filtered html doc in Sigil you should have no problems. Sigil 'mends' the html by replacing the XMLNS header with the correct version for the epub standard.

Last edited by slowsmile; 02-02-2017 at 03:18 AM.
Old 02-02-2017, 11:35 AM   #5
Notjohn
mostly an observer
Posts: 1,515
Karma: 987654
Join Date: Dec 2012
Device: Kindle
Rather than let Word interpret my docs, I run them through Word2CleanHtml.com online.
Old 02-02-2017, 11:45 AM   #6
DiapDealer
Grand Sorcerer
Posts: 27,549
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by Notjohn View Post
Rather than let Word interpret my docs, I run them through Word2CleanHtml.com online.
Which always seemed a bit silly (and way too resource-intensive) to me, when there are so many solutions to get the same, if not better, results without leaving the Word/Sigil environment. Most of them have been mentioned in this thread (and none of them involve having to upload your entire novel to someone else's servers).
Old 02-02-2017, 10:55 PM   #7
apoiata
It didn't help.

Quote:
Originally Posted by slowsmile View Post
* Now if you open your Word filtered html doc in Sigil you should have no problems. Sigil 'mends' the html by replacing the XMLNS header with the correct version for the epub standard.
As indicated in my initial post, that setting did not help: the file still does not open.
Old 02-03-2017, 09:13 AM   #8
DiapDealer
I've never really come across an html file that Sigil's new parser (Google's Gumbo, by the way ... Sigil 0.7.x used htmlTidy which was prone to destroying data) flat out refused to open. Is there any way you can produce a non-copyright violating, small sample file that exhibits this issue? I'd like to see for myself what Word is putting out that Gumbo can't handle.
Old 02-03-2017, 06:39 PM   #9
slowsmile
@NotJohn... I've used and tested the online Word2CleanHTML app. This app does indeed clean up the HTML, but unfortunately it also zaps most of the styling in the HTML. That sounds rather dumb to me because it means that you must re-style your HTML file from scratch in Sigil. So you're actually duplicating your work and creating more unnecessary work for yourself. Is that sensible?

There are much better ways to clean up your html while, at the same time, also preserving all the styling. For instance, as one of its tasks, my OpenDocHTMLImport plugin will first thoroughly clean out and reformat both your html file and CSS file for you and will leave you with a working html file that you won't have to restyle from scratch in Sigil. I use mostly bs4 for cleaning out the proprietary dross from the html file and this works quite well.

I would also second DiapDealer's comment. Sigil's Mend HTML on Open facility is also a very useful way of loading html files derived from Word (as Web Page, Filtered html), AbiWord, Google and OpenDoc into Sigil. I've also found that, so far, the only html file that it won't load or accept is a Word doc saved as just a Web Page (not filtered html).

From the above, I'm also guessing that the OP probably just saved his Word doc as a 'Web Page', which won't work. But if he had saved his Word doc as 'Web Page, Filtered html' and set 'Mend XHTML file on Open' in Sigil Preferences as advised, then it would probably load into Sigil without any problems. I've also just followed this procedure using a Word filtered htm file and it loaded into Sigil without any problems at all.

Last edited by slowsmile; 02-03-2017 at 09:17 PM.
Old 02-03-2017, 11:14 PM   #10
apoiata
Quote:
Originally Posted by DiapDealer View Post
I've never really come across an html file that Sigil's new parser (Google's Gumbo, by the way ... Sigil 0.7.x used htmlTidy which was prone to destroying data) flat out refused to open. Is there any way you can produce a non-copyright violating, small sample file that exhibits this issue? I'd like to see for myself what Word is putting out that Gumbo can't handle.
Sure. I copied everything from the opening p-tag to the closing p-tag. The error is reported on the last line of the code I copied here. Position 43 is right before "</p>" in the last line.

<p class=podpis style='text-indent:21.3pt'>

<table cellpadding=0 cellspacing=0>
<tr>
<td width=196 height=0></td>
</tr>
<tr>
<td></td>
<td><img width=328 height=156 src="RoberMelamedBook_files/image002.jpg"
alt="links/menashe.jpg"></td>
</tr>
</table>

<br clear=ALL>
Menashe people with Rabbi Avichail (right)</p>
Old 02-04-2017, 11:00 AM   #11
Mark Nord
2B || !2B
Posts: 851
Karma: 194010
Join Date: Feb 2010
Location: Austria
Device: Sony PRS505/650/T1/tolino vision 5
Hi,
your code isn't valid html. (A table is not allowed inside a p: a p may only contain inline content, and table is a block element.)
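
A check for this kind of nesting can be sketched with Python's standard `html.parser` module, which reports tags as they appear in the source and does not repair them. This is only an illustration under stated assumptions: every p is explicitly closed (as in Word's output), the list of block tags is partial, and the names `BlockInParagraphFinder` and `find_blocks_in_paragraphs` are hypothetical.

```python
from html.parser import HTMLParser

# Block-level elements that may not appear inside <p>; a partial list for illustration.
BLOCK_TAGS = {"table", "div", "ul", "ol", "blockquote", "h1", "h2", "h3"}


class BlockInParagraphFinder(HTMLParser):
    """Record block-level start tags that occur while a <p> is still open."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.problems = []  # (tag, line, column) tuples

    def handle_starttag(self, tag, attrs):
        if self.in_paragraph and tag in BLOCK_TAGS:
            line, col = self.getpos()
            self.problems.append((tag, line, col))
        if tag == "p":
            self.in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False


def find_blocks_in_paragraphs(html_text):
    """Return the block-level tags found inside explicitly closed paragraphs."""
    finder = BlockInParagraphFinder()
    finder.feed(html_text)
    finder.close()
    return finder.problems
```

Run over the sample above, it should report the table that sits between the opening and closing p-tag, together with the line on which the table starts.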

But I had a very interesting finding.
Spoiler:
First I nested your code in standard Sigil XML "wrapper"
Code:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html>
<head></head>

<body> 
.....
</body>
</html>
Imported ok.

Then I tried:
Code:
<html>
<head></head>
<body>
....
</body>
</html>
Didn't load

Next I changed your outer p-tags to div,
Code:
<html>
<head>

</head>
<div class=podpis style='text-indent:21.3pt'>
....
</div>
</body>
</html>
This opens fine.

The code in DIVs alone, without any HTML, HEAD, BODY and so on, will also open,
but if the html starts with a
Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
line, it will NOT open.
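
The p-to-div change tried above can also be applied mechanically before loading a file. A rough Python sketch, assuming paragraphs are explicitly closed and never nested, and that no attribute value contains a ">" character; the function name `paragraphs_with_tables_to_divs` is made up for illustration. It is a workaround on the input side, not a description of what Sigil does.

```python
import re

# Matches one explicitly closed paragraph; assumes paragraphs are never nested.
PARAGRAPH = re.compile(r"<p\b([^>]*)>(.*?)</p\s*>", re.IGNORECASE | re.DOTALL)


def paragraphs_with_tables_to_divs(html_text):
    """Turn <p ...>...</p> into <div ...>...</div> when it wraps a <table>."""

    def swap(match):
        attrs, inner = match.group(1), match.group(2)
        if re.search(r"<table\b", inner, re.IGNORECASE):
            # Keep the original attributes (class, style) on the new div.
            return "<div" + attrs + ">" + inner + "</div>"
        return match.group(0)  # ordinary paragraphs are left untouched

    return PARAGRAPH.sub(swap, html_text)
```

Paragraphs without a table pass through unchanged, so the styling of normal text is not affected.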


I'm on Sigil 0.9.1 on Windows
Attached Files
File Type: xml Sigil xml.xml (529 Bytes, 160 views)
File Type: xml html p.xml (378 Bytes, 175 views)
File Type: xml html div.xml (415 Bytes, 176 views)
File Type: xml div.xml (333 Bytes, 501 views)
File Type: xml doctype div.xml (538 Bytes, 170 views)

Last edited by Mark Nord; 02-04-2017 at 03:02 PM. Reason: Set Spoiler, as issue is resolved
Old 02-04-2017, 12:13 PM   #12
DiapDealer
Good information. Thanks! Are you saying that your doctype div.xml file WON'T open with 0.9.1? Because I have no problem opening that one (after renaming to .html) with Sigil 0.9.7.

The only one I have trouble opening (or adding via add existing file) with Sigil v0.9.7 is the "html p.xml" file.
Old 02-04-2017, 12:29 PM   #13
apoiata
Word add-in

I installed the Word add-in and received these errors. Pictures are attached. Again it's a table conversion error.
Attached Thumbnails: WordVersion.JPG (21.0 KB), WordVersion1.JPG (32.9 KB), WordVersion2.JPG (21.1 KB)
Old 02-04-2017, 12:38 PM   #14
DiapDealer
Did you let Sigil try to automatically fix the error?
Old 02-04-2017, 12:59 PM   #15
KevinH
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
DiapDealer, I got the same results as you did. After renaming the files to .html, I was able to load them all except the html_p one.

The problem with the html_p.html file is the lack of a DOCTYPE on the file itself. It seems sigil-gumbo actually repairs differently depending on the DOCTYPE. This was something I did not know but now makes sense.

With no DOCTYPE on the html_p.html file, Sigil literally needs to clean the file twice to get it to a proper clean state. The first pass cleans up a bunch of garbage but not the table in p issue, though it does add the proper DOCTYPE at the end (our Sigil code does that). But without a clearly recognized DOCTYPE, gumbo cleans only to heavily transitional html (very weak cleaning).

The second pass will see the DOCTYPE the first pass added, and then proceed to clean up the table in p problem.

If I simply edit html_p.html and add a <!DOCTYPE html> or the epub2 version of that, at the top of the file before trying to load it, gumbo will properly clean everything in one pass.

So it appears that I will need to check for and add in the DOCTYPE inside CleanSource::Mend before passing anything to gumbo so that gumbo will properly repair the whole mess in one pass.
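
The idea of checking for a DOCTYPE and adding one before the source reaches the parser can be sketched as follows. This is Python written only to illustrate the step, not Sigil's actual C++ code in CleanSource::Mend; the function name `ensure_doctype` and the choice of `<!DOCTYPE html>` as the default are assumptions.

```python
import re

DOCTYPE_PATTERN = re.compile(r"<!DOCTYPE\b", re.IGNORECASE)
# An optional XML declaration must stay in front of the DOCTYPE.
XML_DECLARATION = re.compile(r"\s*<\?xml[^>]*\?>\s*", re.IGNORECASE)


def ensure_doctype(source, doctype="<!DOCTYPE html>"):
    """Return source unchanged if it declares a DOCTYPE, else insert one."""
    if DOCTYPE_PATTERN.search(source):
        return source
    declaration = XML_DECLARATION.match(source)
    insert_at = declaration.end() if declaration else 0
    return source[:insert_at] + doctype + "\n" + source[insert_at:]
```

With a step like this in front of the parser, a file such as html_p.html would arrive with a DOCTYPE already in place, which is the condition under which the single-pass repair was observed.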

I will keep playing around with this.

Thanks for the test cases.