MobileRead Forums > E-Book Software > Sigil
03-22-2010, 10:24 PM   #1
walter2
Junior Member
Posts: 7
Karma: 10
Join Date: Mar 2010
Device: none
Sigil shows a blank document when importing valid HTML

This has me mystified. I have a fine HTML doc (which, as per the instructions, I made with "Save As" from my original Word doc) with no tricky things in it. It views perfectly by itself. When I open it in Sigil, the code window shows some file data (author, etc.), but the book view is totally blank: one single blank page.

I can't see any way I could have done this incorrectly, but I am stumped by this and can't get any further, nor can I find any threads or help that even mention this kind of problem. Any ideas out there?

all help deeply appreciated!
-walter
03-23-2010, 09:05 AM   #2
paulpeer
Zealot
Posts: 147
Karma: 56
Join Date: Dec 2009
Location: Antwerpen
Device: iPhone, Sony PRS-505, EPUBreader
Can you upload your file here, Walter? It's hard to give advice without seeing the document.
03-23-2010, 11:38 AM   #3
walter2
Junior Member
Posts: 7
Karma: 10
Join Date: Mar 2010
Device: none
file sample

I can email the file to you if you can send me an email address. My email is: walter2@sphere.bc.ca

regards,
walter
03-23-2010, 12:51 PM   #4
Valloric
Created Sigil, FlightCrew
Posts: 1,978
Karma: 350515
Join Date: Feb 2008
Device: Sony Reader PRS 505
Don't upload it to the forum, attach it to your issue on the tracker.

Read the reporting issues wiki page.

There is a correct way to report problems, and then there's every other way.
03-23-2010, 09:13 PM   #5
walter2
Junior Member
Posts: 7
Karma: 10
Join Date: Mar 2010
Device: none
It's pretty clear to me that using "Save as HTML" is NOT the answer for importing Word docs. Studying the output HTML files reveals the usual MS forest of unneeded tags and a lot of JavaScript even for the simplest things. As was pointed out to me here, this can't be read by any ebook reader.

My file has some spacing, drop caps, and underlines, all of which turned into literally pages of unwanted tags. Fine. I hand-stripped everything out, saved as HTML, and made sure there was no JavaScript in the resulting code. But I STILL got the empty white page of death in Sigil.
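For anyone who wants to repeat that check by script rather than by eye, here is a rough sketch in Python using only the standard library's `html.parser`. It counts a few constructs that unfiltered Word HTML typically contains: `<script>` tags, namespaced Office tags such as `<o:p>`, inline styles containing `mso-`, and `<!--[if ...]>` conditional comments. The choice of markers is an assumption about typical Word output, not a description of how Sigil decides whether a file will load, and `WordJunkScanner` / `scan_word_html` are hypothetical names.

```python
from html.parser import HTMLParser


class WordJunkScanner(HTMLParser):
    """Counts markup that Word's unfiltered "Save as HTML" output commonly adds (assumed markers)."""

    def __init__(self):
        super().__init__()
        self.script_tags = 0           # <script> elements (JavaScript)
        self.office_tags = 0           # namespaced tags such as <o:p> or <v:shape>
        self.mso_styles = 0            # style="..." attributes containing "mso-"
        self.conditional_comments = 0  # <!--[if ...]> ... <![endif]--> blocks

    def handle_starttag(self, tag, attrs):
        # The default handle_startendtag also routes self-closing tags (e.g. <o:p/>) here.
        if tag == "script":
            self.script_tags += 1
        if ":" in tag:
            self.office_tags += 1
        for name, value in attrs:
            # Attributes without a value arrive as None, so guard before searching.
            if name == "style" and value and "mso-" in value.lower():
                self.mso_styles += 1

    def handle_comment(self, data):
        # Word wraps Office-only markup in comments that start with "[if ...]".
        if data.lstrip().lower().startswith("[if"):
            self.conditional_comments += 1


def scan_word_html(text):
    """Return counts of Word-specific constructs found in an HTML string."""
    scanner = WordJunkScanner()
    scanner.feed(text)
    scanner.close()
    return {
        "script_tags": scanner.script_tags,
        "office_tags": scanner.office_tags,
        "mso_styles": scanner.mso_styles,
        "conditional_comments": scanner.conditional_comments,
    }
```

To scan a saved file, read it first, e.g. `scan_word_html(open("mybook.htm", encoding="cp1252", errors="replace").read())`; the file name and the cp1252 encoding are assumptions (older Word versions often save in a Windows code page). All zeros suggests the obvious Word extras are gone, but it says nothing about other malformed markup.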

No problem, I tried Save as RTF. Nope, again the white page of death in Sigil. Now this was pretty frustrating, as I just couldn't see where the problem could possibly be hiding.

Fine, I saved it as plain text. This worked, and Sigil imported the text fine, but I then had to go back in and fix the style issues in the HTML and make some on-screen edits. So far so good. At least I have a working document.

However, I think it is fair to make these observations:

Sigil should NOT recommend Word's internal HTML conversion as the "best way" to make an import file for Sigil. I tried many different simple text snippets, and all crash when I attempt to load them into Sigil as HTML. Since RTF also didn't work for me, I think you should change the suggested Word export technique to plain text, as that would have saved me hours of work and many inexplicable problems. There are no doubt Word files that can somehow work in a higher-level export format, but there are so many issues with even simple files that I just can't see it as the "recommended" way, especially since there is no guidance at all as to what can go wrong or why it does so in Sigil.

Other than that time-wasting input-format nightmare, I have to say Sigil worked pretty well, although two problems are still making me crazy:

1. How do I get paragraphs to indent automatically? The default is left-aligned blocks of text, which is not very attractive, and I see no way to fix it. I tried altering the p tag in the CSS area, but I could only get the inter-paragraph spaces to go away, not get a leading indent.

2. Why on earth does the entire document reload back to the very start whenever you change anything in the code window? Talk about irritating, especially in a 249-page document, since there's no quick way to get back to where you were.

I also noticed that when saving, the program automatically appends .sgf to the file name, which makes saving as an epub file impossible. You have to go in and edit the file name to get rid of this quirk before saving as an epub file.

One last thing that remains a mystery to me: does the TOC ever appear anywhere in the document? I have my entries in it, but within Sigil I can't see it or use it for navigation at all. How on earth do you make it actually appear? The wiki tutorial says zip on this topic.

many thanks,
walter
03-24-2010, 07:40 AM   #6
Valloric
Created Sigil, FlightCrew
Posts: 1,978
Karma: 350515
Join Date: Feb 2008
Device: Sony Reader PRS 505
Oh boy...

Quote:
Originally Posted by walter2
It's pretty clear to me that using "Save as HTML" is NOT the answer for importing Word docs.
While Word's HTML output is horrible, a lot of people are importing it into Sigil with great success, myself included.

Quote:
Originally Posted by walter2
My file has some spacing, drop caps, and underlines, all of which turned into literally pages of unwanted tags. Fine. I hand-stripped everything out, saved as HTML, and made sure there was no JavaScript in the resulting code. But I STILL got the empty white page of death in Sigil.

No problem, I tried Save as RTF. Nope, again the white page of death in Sigil.
How about attaching your HTML file to a new issue on the tracker? I've suggested this to you already. If you add the "Private" tag, no one but you and me will be able to see it.

You should never see a white page after importing.

Quote:
Originally Posted by walter2
I tried many different simple text snippets, and all crash when I attempt to load them into Sigil as HTML.
Try Sigil 0.2.0β3.

Quote:
Originally Posted by walter2
1. How do I get paragraphs to indent automatically? The default is left-aligned blocks of text, which is not very attractive, and I see no way to fix it. I tried altering the p tag in the CSS area, but I could only get the inter-paragraph spaces to go away, not get a leading indent.
Try this CSS code:

Code:
p {
    text-indent:30px;
}
Feel free to change the pixel value.

Quote:
Originally Posted by walter2
2. Why on earth does the entire document reload back to the very start whenever you change anything in the code window? Talk about irritating, especially in a 249-page document, since there's no quick way to get back to where you were.
Try Sigil 0.2.0β3.

Quote:
Originally Posted by walter2
I also noticed that when saving, the program automatically appends .sgf to the file name, which makes saving as an epub file impossible. You have to go in and edit the file name to get rid of this quirk before saving as an epub file.
Try Sigil 0.2.0β3.

Quote:
Originally Posted by walter2
One last thing that remains a mystery to me: does the TOC ever appear anywhere in the document? I have my entries in it, but within Sigil I can't see it or use it for navigation at all. How on earth do you make it actually appear? The wiki tutorial says zip on this topic.
An epub is not a DOC file. While DOCs have an "inline" TOC (a TOC in the very text of the document), an epub does not. Here, the TOC is external: it is stored in the NCX file inside the epub archive.

People reading your epub book will be able to access the TOC through an always available menu entry. This is "the epub way".

Of course, you can also make an inline TOC with links by hand, but I personally suggest you don't. The NCX TOC is there for a reason, and is more usable than an inline one to which you have to manually scroll etc. It also displays according to the UX of the Reading System: the Sony PRS-505 shows it as a menu, ADE shows it in a tab on the left of the screen etc.
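To make the "external TOC" point concrete: the NCX file is plain XML, and each `navPoint` in its `navMap` pairs a label with a link into one of the book's content files, with nested `navPoint`s acting as sub-entries. The sketch below uses Python's standard `xml.etree.ElementTree` to list those entries from an NCX file that has already been extracted from the epub. The function name `read_ncx_toc` is hypothetical, and the chapter file names in any sample are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by NCX documents (DAISY Z39.86-2005).
NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def read_ncx_toc(ncx_text):
    """Return (depth, label, src) for each navPoint in an NCX document, in document order."""
    root = ET.fromstring(ncx_text)
    nav_map = root.find("ncx:navMap", NCX_NS)
    entries = []

    def walk(parent, depth):
        for point in parent.findall("ncx:navPoint", NCX_NS):
            label = point.findtext("ncx:navLabel/ncx:text", default="", namespaces=NCX_NS)
            content = point.find("ncx:content", NCX_NS)
            src = content.get("src") if content is not None else None
            entries.append((depth, label.strip(), src))
            walk(point, depth + 1)  # nested navPoints are sub-entries of this one

    if nav_map is not None:
        walk(nav_map, 0)
    return entries
```

A reading system does essentially this kind of lookup when it builds its TOC menu or side panel, which is why nothing needs to appear in the body text itself.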

Last edited by Valloric; 03-24-2010 at 02:01 PM. Reason: typo
03-24-2010, 01:06 PM   #7
walter2
Junior Member
Posts: 7
Karma: 10
Join Date: Mar 2010
Device: none
More follow-up to this "blank page" problem, with examples!

For anybody who would like to test this problem for themselves, I have attached the three versions of the file: HTML (RAR'd to fit the upload limits), RTF and TXT.
I am using the current release, 0.1.9.

I just downloaded the new beta of Sigil to test. Do I have to remove my old version first, or can I just install the beta on top of it?

Just out of curiosity: in the end, does the epub format somehow bundle the images used, or do they travel along as individual files, as with HTML?

Many thanks for all the excellent help and suggestions; I am getting very close to a fully working file!
best regards,
walter
sphere research corp.
http://www.sphere.bc.ca
03-24-2010, 02:12 PM   #8
Dave_S
What Title ?
Posts: 1,325
Karma: 1856232
Join Date: Jan 2009
Location: Bavaria Germany
Device: HTC Sensation 4G (KK), Nexus 7b (KK)
Quote:
Originally Posted by walter2
For anybody who would like to test this problem for themselves, I have attached the three versions of the file: HTML (RAR'd to fit the upload limits), RTF and TXT.
I gave the HTML version of your test file a try just for fun, as I am definitely NOT an expert on HTML. I tried the file with two versions of Sigil (0.1.3 and 0.2B3) and saw the same results that you did. Then I tried running the file through HTML Tidy, but Tidy gave up after over 4000 warnings and over 2000 errors. So next I loaded your HTML file in OpenOffice and simply resaved it as HTML. The OpenOffice HTML file went through Tidy with only a few warnings and no errors, and it also loads fine in both versions of Sigil that I tried. I seem to recall that Sigil automatically tries to clean up an HTML file with Tidy, so the fact that Tidy aborts while trying to fix your HTML file may be the reason Sigil ends up with a blank document.

FWIW, the cleaned-up file looks fine in Sigil except for some extra-large spacing between paragraphs, but then I am not the author, so I do not really know what to expect. There are obviously a lot of things that I could have missed in my quick look. In any case the experience was amusing, so thanks for the sample to play with, and I hope my experience with your file helps point you toward a solution.
03-24-2010, 02:15 PM   #9
Valloric
Created Sigil, FlightCrew
Posts: 1,978
Karma: 350515
Join Date: Feb 2008
Device: Sony Reader PRS 505
Quote:
Originally Posted by walter2
For anybody who would like to test this problem for themselves, I have attached the three versions of the file: HTML (RAR'd to fit the upload limits), RTF and TXT.
I just tried loading the HTML file with Sigil 0.2.0beta3, and yes, only a white page is shown. The load fails.

But that's some awful HTML. I don't even think you can call that HTML.

I then opened the file in Word 2007 and saved it as "Web page, filtered" and opened that file just fine with Sigil. The layout is the same in Sigil and Word (as far as I can tell from a quick glance).

You should always use the filtered HTML option when saving HTML from Word, no matter what application you want to use to open the resulting file.

Quote:
Originally Posted by walter2
I just downloaded the new beta of Sigil to test. Do I have to remove my old version first, or can I just install the beta on top of it?
You can install it on top, or side-by-side in a different folder. They can coexist just fine.

Quote:
Originally Posted by walter2
Just out of curiosity: in the end, does the epub format somehow bundle the images used, or do they travel along as individual files, as with HTML?
The way you phrased that question, I'd answer "yes" to both. They are stored as individual files inside the epub archive. An epub is just a ZIP archive with specific contents. Word 2007's new DOCX format works in a similar way (it's also a ZIP archive).
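A small Python sketch with the standard `zipfile` module illustrates this: it builds a tiny in-memory archive laid out like an epub and then lists the image files stored inside it as ordinary archive entries. The folder names (`OEBPS/Text`, `OEBPS/Images`) and the helper names are assumptions for illustration, and the archive is not a valid book (it has no OPF or NCX). The one detail taken from the epub container (OCF) rules is that `mimetype` must be the first entry and stored uncompressed.

```python
import io
import zipfile

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".svg")


def build_minimal_epub_like_zip():
    """Build a tiny in-memory ZIP laid out like an epub (illustration only, not a valid book)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # OCF rule: "mimetype" is the first entry and is stored without compression.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        # Hypothetical placeholder contents; a real epub needs a proper container.xml, OPF, etc.
        zf.writestr("META-INF/container.xml", "<container/>", compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/Text/chapter1.xhtml", "<html/>", compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/Images/cover.jpg", b"\xff\xd8\xff", compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def list_images(epub_bytes):
    """Return the names of image files stored inside an epub (or any ZIP) archive."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        return [name for name in zf.namelist() if name.lower().endswith(IMAGE_EXTENSIONS)]
```

Passing the bytes of a real .epub (for example `open("book.epub", "rb").read()`) to `list_images` should list its images the same way, since each image is just another file inside the archive.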

Last edited by Valloric; 03-24-2010 at 02:18 PM.
03-24-2010, 03:06 PM   #10
walter2
Junior Member
Posts: 7
Karma: 10
Join Date: Mar 2010
Device: none
Well, sad to report that I have Word 2000, not 2007, and it has only one sad flavor of HTML export (grossly overdone and incomprehensible). I do have OpenOffice, however, and the idea of rinsing it through there has some appeal for other docs.

What is this mysterious HTML Tidy application?

many thanks,
walter
03-24-2010, 03:19 PM   #11
Dave_S
What Title ?
Posts: 1,325
Karma: 1856232
Join Date: Jan 2009
Location: Bavaria Germany
Device: HTC Sensation 4G (KK), Nexus 7b (KK)
Quote:
Originally Posted by walter2
What is this mysterious HTML Tidy application?
It checks HTML documents for correctness, and tries to clean up what it can. It is a command line application, but there is also a GUI for it if needed.
http://tidy.sourceforge.net/
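If you want to check files from a script instead of the GUI, here is a rough Python sketch that runs Tidy in report-only mode, assuming the `tidy` command-line program from the link above is installed and on the PATH. The `-q` (quiet) and `-e` (show only errors and warnings) options and the exit codes (0 = clean, 1 = warnings, 2 = errors) come from Tidy's own usage notes; the wrapper function names are hypothetical.

```python
import shutil
import subprocess

# HTML Tidy exit codes: 0 = no warnings or errors, 1 = warnings only, 2 = errors.
TIDY_STATUS = {0: "clean", 1: "warnings", 2: "errors"}


def describe_tidy_exit(code):
    """Translate a Tidy exit code into a short description."""
    return TIDY_STATUS.get(code, f"unexpected exit code {code}")


def check_html_with_tidy(path):
    """Run Tidy in report-only mode on `path` and return (status, report text).

    Assumes a `tidy` executable is on the PATH; returns None if none is found.
    """
    tidy = shutil.which("tidy")
    if tidy is None:
        return None
    # -q suppresses nonessential output; -e reports problems without printing cleaned markup.
    result = subprocess.run([tidy, "-q", "-e", path], capture_output=True, text=True)
    # Tidy writes its warnings and errors to stderr by default.
    return describe_tidy_exit(result.returncode), result.stderr
```

A status of "errors" on a Word export is a hint to re-save it through another program (filtered HTML, OpenOffice) before importing.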
03-24-2010, 08:50 PM   #12
st_albert
Fanatic
Posts: 543
Karma: 64420
Join Date: Feb 2010
Device: none
Quote:
Originally Posted by walter2
Well, sad to report that I have Word 2000, not 2007, and it has only one sad flavor of HTML export (grossly overdone and incomprehensible). I do have OpenOffice, however, and the idea of rinsing it through there has some appeal for other docs.

What is this mysterious HTML Tidy application?

many thanks,
walter
It's been a while, but I seem to recall there used to be a utility specifically designed to clean up Word HTML. (Kind of like Tidy, but with a stronger stomach?) Maybe a Google search can find it.

But in the long run, loading the original .doc or .rtf or whatever into OpenOffice, then saving using the Writer2xhtml plugin is probably the best way to go.
03-24-2010, 11:34 PM   #13
yekim54
What the Dog Saw
Posts: 305
Karma: 981684
Join Date: Jul 2008
Location: Dunn Loring
Device: Sony PRS-505, PRS-650, Asus TF101
Quote:
Originally Posted by walter2
Well, sad to report that I have Word 2000, not 2007, and it has only one sad flavor of HTML export (grossly overdone and incomprehensible).
You might want to download Microsoft's HTML Filter 2.0 and install it into your Word 2000 application to see if it helps. It seems to work well in my Word 2000.

http://support.microsoft.com/?kbid=236967

Last edited by yekim54; 03-24-2010 at 11:47 PM.
03-25-2010, 12:23 AM   #14
roger64
Wizard
Posts: 1,404
Karma: 846401
Join Date: Jan 2009
Device: KoboGlo
About OpenOffice and Writer2xhtml, you may want to have a look here:
http://www.mobileread.com/forums/sho...77&postcount=7
03-25-2010, 01:16 AM   #15
paulpeer
Zealot
Posts: 147
Karma: 56
Join Date: Dec 2009
Location: Antwerpen
Device: iPhone, Sony PRS-505, EPUBreader
Quote:
Originally Posted by walter2
I do have OpenOffice, however, and the idea of rinsing it through there has some appeal for other docs.
In my opinion it's very good practice to open a DOC file in OpenOffice and save it as an ODT file before exporting to HTML. As you might have seen, a DOC file is normally three or more times as big as an ODT file, so OpenOffice does a lot of cleaning work.

Tags
html conversion is blank, html problems, input doc failure

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.