Old 12-17-2010, 05:43 AM   #1
Hitch
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Speakin' of weird: Linux build eating accented chars.

Hey, kids (Valloric!):

Don't try THIS at home. One of my guys is running Xubuntu 10.04. VM is XP SP3. 64-bit. When he imports perfectly good html into Sigil, the accented characters get eaten entirely on export, so that "Déjà vu" comes out as "dj vu." It happened on other words, like resumé, café, etc., so that you get resum, caf, and the like. Oddly enough, my authors are a bit perturbed about this, the ingrates.

FWIW, the html editor he uses outputs Windows-1252 (another convert to NoteTab!), but I don't think that's playing into it, since I use the same editor.

We've tested it with both of us using the same html file...and it works fine in my Sigil, which is plain jane Windows 32-bit, but the Linux build is just sucking the tasty accented vowels right outta there.

Anyone else experiencing this? Anyone else even notice it? We're going to test some more snippets before we put in a bug report, but I didn't see anything else that suggested that it had been noticed by anyone else.

It's actually problematic for me...some of my sub-contractors use Linux, and I can't be giving them books if I have to get ulcers over this, which isn't going to help their income stream, either. Thoughts? Confirms?

Hitch
Old 12-17-2010, 09:20 AM   #2
Valloric
Created Sigil, FlightCrew
Posts: 1,982
Karma: 350515
Join Date: Feb 2008
Device: Kobo Clara HD
The file probably doesn't specify an encoding. Never import such a file; Sigil (or any other application for that matter) has no idea what encoding it's supposed to use so it falls back to the system default encoding. On your machine that matches the encoding of the file (purely by accident!), but on his machine it does not.
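To make the symptom concrete, here is a minimal Python sketch of that kind of mismatch. It assumes the file was saved as Windows-1252 and that the reading side assumes UTF-8 and silently discards bytes it cannot decode. Whether Sigil does exactly this internally is an assumption; the helper name `decode_dropping_bad_bytes` is made up for illustration.

```python
# "Déjà vu" as a Windows-1252 editor writes it to disk:
# é is the single byte 0xE9 and à is the single byte 0xE0.
raw = b"D\xe9j\xe0 vu"


def decode_dropping_bad_bytes(data: bytes, assumed: str) -> str:
    """Decode with an assumed encoding, silently discarding undecodable bytes.

    Hypothetical helper: it only models an importer that guesses the
    encoding and throws away whatever does not fit the guess.
    """
    return data.decode(assumed, errors="ignore")


# Correct guess: the accents survive.
print(decode_dropping_bad_bytes(raw, "cp1252"))

# Wrong guess: 0xE9 and 0xE0 are not valid UTF-8 here, so they vanish.
print(decode_dropping_bad_bytes(raw, "utf-8"))
```

The second call returns the text with the accented letters missing outright, which matches the "eaten" characters described in the first post.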

Make sure your HTML files specify an encoding, either through a <meta> tag, a BOM, or the encoding attribute of the <?xml ?> declaration.
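A quick pre-import check along these lines can catch a file that declares nothing. This is only a sketch: the function name `declared_encoding` is hypothetical, the regular expressions are deliberately simple rather than a full HTML parser, and only the first 1024 bytes are inspected.

```python
import codecs
import re
from typing import Optional

# Byte order marks this sketch recognises, paired with the encoding they imply.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# <?xml version="1.0" encoding="..."?>
_XML_DECL = re.compile(
    rb'<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)

# <meta charset="..."> or <meta http-equiv="Content-Type" content="...; charset=...">
_META = re.compile(
    rb'<meta[^>]*\bcharset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE
)


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding the file declares, or None if it declares nothing."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    head = data[:1024]  # declarations belong near the top of the file
    for pattern in (_XML_DECL, _META):
        match = pattern.search(head)
        if match:
            return match.group(1).decode("ascii").lower()
    return None
```

A result of `None` means the importing application would have to guess, which is the situation to avoid.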

I plan on adding a subsystem that uses heuristics to guess the encoding in such situations, but that subsystem would best be described as an airbag; even when you have one, it's still a good idea to avoid smashing into walls at 80mph.
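For a rough idea of what such a guessing fallback could look like, here is a deliberately crude Python sketch. It is not Sigil's implementation, and the function name `guess_and_decode` is hypothetical. It tries strict UTF-8 first, because arbitrary Windows-1252 text rarely happens to be valid UTF-8, then falls back to Windows-1252, and finally to Latin-1, which accepts every byte.

```python
from typing import Tuple


def guess_and_decode(data: bytes) -> Tuple[str, str]:
    """Return (text, encoding_used) for bytes with no declared encoding.

    Illustrative heuristic only: a real detector would weigh more evidence.
    """
    try:
        # Strict decoding: any invalid byte sequence raises instead of vanishing.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    try:
        return data.decode("cp1252"), "cp1252"
    except UnicodeDecodeError:
        # A few bytes (0x81, for example) are undefined in Python's cp1252 codec.
        return data.decode("latin-1"), "latin-1"
```

The key difference from the failure mode in this thread is that nothing is dropped silently: a wrong guess yields wrong characters rather than missing ones, which is easier to spot in a proof.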

Last edited by Valloric; 12-17-2010 at 09:23 AM.
Old 12-17-2010, 01:24 PM   #3
Hitch
Maybe but not likely

Valloric:

It's entirely possible that I would do that, because my Fu can be, hmmm, hectic; but not this guy. He would no more output html without a specified encoding than the Pope would get married tomorrow in St. Peter's to a male partner. That's not what's going on. If he uses the same file and opens it in Sigil in Windows in his VM, it works fine; if he opens it in Linux, it strips the extended characters.

We'll certainly test it some more--and, as I said, were it I we were discussing, I'd agree that the possibility is good--but it just seems wildly unlikely with this particular guy, who's so OCD he argues with me about using workarounds to accommodate Kindle in the subsequent epub-to-mobi conversion because the coding isn't ideal. {shrug}.

EDIT: Un-be-liebable. He did actually screw up, because we were working in a draft, and that's what happened--he imported sans encoding. At least if it happens again, I'll know what to look for. Hell, we have it (the encoding text) built into our NT clips--still not sure HOW it could have happened--but at least we do know WHAT it was, so I can stop having cardiac arrest about other books.


Thanks,

Hitch

Last edited by Hitch; 12-18-2010 at 02:44 AM. Reason: Holy CRAP! I was {gasp!} wrong!
Tags
accent, linux, remove characters, sigil, xubuntu

