06-26-2010, 11:02 AM | #1 |
Member
Posts: 22
Karma: 10
Join Date: Jun 2010
Location: Toronto, Ontario
Device: kobo
|
Character conversion: "—" --> garbled characters
Hi,
I didn't read the whole Sigil forum, so I don't know if this problem has been brought up before. It has to do with how certain characters are converted into a series of ugly characters. For example, if the original file had the text "this was—before", then after editing the file with Sigil the dash is replaced by a run of garbled characters. The conversion is permanent: if you save the file (after making some changes), the garbled sequence appears everywhere "—" is supposed to be. The problematic "—" is not a regular dash, but it would be nice to keep it as is. There are probably other characters that cause a similar problem, and I believe the problem is related to how the characters are encoded.
[Edit] I did a little searching and found out that the character in question is the Unicode em dash, U+2014. My guess is that Sigil doesn't handle Unicode, but I'm sure this particular dash will be found in many books. Last edited by Ivo; 06-26-2010 at 11:10 AM. |
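The encoding guess is easy to illustrate. The em dash is one code point (U+2014) but three bytes in UTF-8, so a program that reads those bytes with a single-byte codepage shows three characters instead of one. A minimal Python sketch, assuming the bytes were misread as Windows-1252 (the thread doesn't say which fallback encoding was involved, so the characters Ivo actually saw may have looked different); the function names are hypothetical:

```python
# U+2014 EM DASH; the plain hyphen-minus is U+002D.
EM_DASH = "\u2014"


def misread_utf8_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair_cp1252_mojibake(garbled: str) -> str:
    """Reverse the mistake: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


original = "this was" + EM_DASH + "before"
garbled = misread_utf8_as_cp1252(original)

print(hex(ord(EM_DASH)))                            # 0x2014
print(EM_DASH.encode("utf-8"))                      # b'\xe2\x80\x94' (three bytes)
print(ascii(garbled))                               # 'this was\xe2\u20ac\u201dbefore'
print(repair_cp1252_mojibake(garbled) == original)  # True
```

The repair only works while the garbled text is still intact; once a file has been saved in the wrong encoding and further edited, the original bytes may not be recoverable this way.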
06-26-2010, 11:12 AM | #2 |
Created Sigil, FlightCrew
Posts: 1,982
Karma: 350515
Join Date: Feb 2008
Device: Kobo Clara HD
|
Is this an HTML file you're importing or a TXT file? If it's an HTML file, then it's probably specifying the wrong encoding, or none at all.
If it's a TXT file, Sigil recognizes all UTF variants based on BOM presence, and if none is present, falls back to UTF-8. So if you're using a TXT file with a non-Unicode encoding, convert it to UTF-8/16 first. |
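The TXT rule described above (the BOM decides the UTF variant, otherwise assume UTF-8) can be sketched in Python. This only illustrates that rule; it is not Sigil's code, and `decode_txt` is a hypothetical name:

```python
import codecs

# Longest BOMs first: the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_txt(data: bytes) -> str:
    """Pick a UTF variant from the byte order mark; fall back to UTF-8 if there is none."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # Strip the BOM so it doesn't end up as U+FEFF at the start of the text.
            return data[len(bom):].decode(encoding)
    # No BOM: assume UTF-8. In this sketch, non-UTF-8 input raises UnicodeDecodeError.
    return data.decode("utf-8")
```

This is why converting a legacy-encoded TXT file to UTF-8 or UTF-16 first helps: without a BOM, its bytes are simply interpreted as UTF-8.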
06-26-2010, 11:16 AM | #3 |
Member
Posts: 22
Karma: 10
Join Date: Jun 2010
Location: Toronto, Ontario
Device: kobo
|
The original file was HTML, and I had already converted it to ePub using calibre (the intermediate step was importing the HTML into an RTF file; maybe that wasn't necessary). I wanted to make some minor changes and ran into this problem, so the only solution was to unzip the files and use vi.
[Edit] To restate the problem: if you have an epub file that contains this character and try to edit it with Sigil, you lose all the Unicode dashes. Last edited by Ivo; 06-26-2010 at 11:18 AM. |
06-26-2010, 11:22 AM | #4 |
Created Sigil, FlightCrew
Posts: 1,982
Karma: 350515
Join Date: Feb 2008
Device: Kobo Clara HD
|
Hm, create a new issue on the tracker with the epub file. Calibre often incorrectly specifies two different encodings, but I've recently worked around that.
Which version of Sigil are you using? |
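The thread doesn't say which two declarations calibre wrote. Assuming they are the XML declaration and an HTML `<meta>` charset (a common pairing in XHTML files), a Python sketch for spotting such a conflict might look like this. The function names are hypothetical, and the sketch only detects the disagreement; it doesn't decide which declaration should win:

```python
import codecs
import re
from typing import Optional, Tuple

XML_DECL_RE = re.compile(r"""<\?xml\s[^>]*?\bencoding\s*=\s*(["'])([^"']+)\1""")
# Matches <meta charset="..."> and <meta http-equiv="Content-Type" content="...; charset=...">.
META_CHARSET_RE = re.compile(
    r"""<meta\b[^>]*?\bcharset\s*=\s*["']?([A-Za-z0-9._:-]+)""", re.IGNORECASE
)


def _canonical(name: str) -> str:
    """Map aliases such as 'UTF-8' and 'utf8' to one name; unknown names are just lowercased."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return name.lower()


def declared_encodings(head: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (XML declaration encoding, <meta> charset); either may be None."""
    xml = XML_DECL_RE.search(head)
    meta = META_CHARSET_RE.search(head)
    return (xml.group(2) if xml else None, meta.group(1) if meta else None)


def has_conflicting_encodings(head: str) -> bool:
    """True when both declarations are present and name different encodings."""
    xml, meta = declared_encodings(head)
    return xml is not None and meta is not None and _canonical(xml) != _canonical(meta)
```

Normalizing through `codecs.lookup` keeps spelling differences like `UTF-8` versus `utf8` from being reported as conflicts.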
06-26-2010, 11:26 AM | #5 |
Member
Posts: 22
Karma: 10
Join Date: Jun 2010
Location: Toronto, Ontario
Device: kobo
|
Valloric, I first saw this problem with 0.2.2, and it is still there with 0.2.3. I don't believe there is any problem with calibre. It's plain and simple: the epub file correctly displays the long dash when you open it, and then when you try to edit that file with Sigil, the problem happens. If you have trouble reproducing it, I can certainly create a dummy epub file that displays properly on any ebook reader, and when you open it with Sigil you will see the problem.
|
06-26-2010, 11:38 AM | #6 |
Member
Posts: 22
Karma: 10
Join Date: Jun 2010
Location: Toronto, Ontario
Device: kobo
|
It looks like you are right, Valloric. I used the latest calibre, 0.7.5, added a whole bunch of Unicode characters, and there was no problem when I used Sigil for editing. So the problem only occurs with files that were probably incorrectly encoded by calibre. Thanks.
|
06-26-2010, 12:37 PM | #7 |
Created Sigil, FlightCrew
Posts: 1,982
Karma: 350515
Join Date: Feb 2008
Device: Kobo Clara HD
|
I'd still like to see the epub file you were having problems with if you are willing to provide it.
|
06-26-2010, 01:37 PM | #8 |
Member
Posts: 22
Karma: 10
Join Date: Jun 2010
Location: Toronto, Ontario
Device: kobo
|
I have gotten around this problem (there are many ways to solve it), but since I started this thread, I've sent you the file. No matter what I try, I cannot reproduce it with any other file.
There are two reasons why I started using Sigil. One is to fix the font problem with the Kobo device, and the other is that calibre doesn't build a proper TOC even when you set proper headers in the RTF file (at least it didn't work for me). So I edit the epub file with Sigil and set the Part/Chapter structure properly, which recreates the TOC. I don't know why chapters don't work well (for me) with calibre and RTF files; I tried Atlantis and it does an excellent job. In the end, I think Sigil is going to be an excellent tool once it reaches a more mature version. And I'm happy that I can use it under Linux. The only problem is that it is a bit slow. |
06-26-2010, 01:57 PM | #9 | |
Created Sigil, FlightCrew
Posts: 1,982
Karma: 350515
Join Date: Feb 2008
Device: Kobo Clara HD
|
In case you're curious, it was caused by the XML declaration using single quotes instead of double quotes. The regex failed to account for single quotes. |
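A minimal Python sketch of the kind of fix involved. Sigil's real regex isn't shown in the thread, so both patterns below are hypothetical; the point is that an XML declaration may quote its values with either `'` or `"`, and a pattern that only expects double quotes finds no encoding at all in `<?xml version='1.0' encoding='utf-8'?>`:

```python
import re
from typing import Optional

# Hypothetical buggy pattern: only accepts double-quoted values.
NAIVE_RE = re.compile(r'<\?xml[^>]*?\bencoding\s*=\s*"([^"]+)"')

# Accepts either quote character; \1 requires the closing quote to match the opening one.
XML_ENCODING_RE = re.compile(
    r"""<\?xml\s[^>]*?\bencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1"""
)


def declared_encoding(head: str) -> Optional[str]:
    """Return the encoding named in the XML declaration, or None if there isn't one."""
    match = XML_ENCODING_RE.search(head)
    return match.group(2) if match else None


single = "<?xml version='1.0' encoding='utf-8'?>"
print(NAIVE_RE.search(single))    # None: the single-quoted value is missed
print(declared_encoding(single))  # utf-8
```

The backreference is the design choice worth noting: it accepts both quote styles while still rejecting a malformed declaration such as `encoding="utf-8'`.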
|
06-26-2010, 02:01 PM | #10 |
Member
Posts: 22
Karma: 10
Join Date: Jun 2010
Location: Toronto, Ontario
Device: kobo
|
Thanks, I am impressed with the speed of your (re)action!
|
06-26-2010, 08:32 PM | #11 |
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
|
It's been an excellent tool for quite some time, since it's the only readily available program that allows post-creation editing of epubs. Valloric and I may have had disagreements in the past, but Sigil occupies a unique, and very valuable, position in the epub ecosystem.
|
06-26-2010, 09:15 PM | #12 |
Member
Posts: 22
Karma: 10
Join Date: Jun 2010
Location: Toronto, Ontario
Device: kobo
|
Sorry, I take it back. It is an excellent tool! I just wish it were a bit faster (e.g., if I open the archive with 7-Zip, edit a file, and save it back while still in archive mode, which is possible, things go much faster).
There are a few minor things not worth reporting, and given that it's free, I certainly like it a lot. |
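Changing one chapter without unpacking the whole book can also be sketched with Python's standard `zipfile` module. This is not how 7-Zip or Sigil work internally, and `replace_member` is a hypothetical helper. `zipfile` cannot update an entry in place, so the sketch copies every entry into a new archive, writing `mimetype` first and uncompressed as the EPUB container format requires:

```python
import io
import zipfile
from typing import BinaryIO, Union

ZipTarget = Union[str, BinaryIO]  # a path or an open binary file object


def replace_member(src: ZipTarget, dst: ZipTarget, member: str, new_data: bytes) -> None:
    """Copy an EPUB (a ZIP file) from src to dst, swapping in new_data for one member."""
    with zipfile.ZipFile(src) as zin:
        if member not in zin.namelist():
            raise KeyError(member)
        with zipfile.ZipFile(dst, "w") as zout:
            # "mimetype" must be the first entry and stored without compression.
            zout.writestr(zipfile.ZipInfo("mimetype"), zin.read("mimetype"),
                          compress_type=zipfile.ZIP_STORED)
            for item in zin.infolist():
                if item.filename == "mimetype":
                    continue
                data = new_data if item.filename == member else zin.read(item.filename)
                # Fresh ZipInfo keeps the name, timestamp and attributes of the original entry.
                info = zipfile.ZipInfo(item.filename, date_time=item.date_time)
                info.external_attr = item.external_attr
                zout.writestr(info, data, compress_type=zipfile.ZIP_DEFLATED)


if __name__ == "__main__":
    src = io.BytesIO()
    with zipfile.ZipFile(src, "w") as z:
        z.writestr("mimetype", "application/epub+zip")
        z.writestr("OEBPS/chapter1.xhtml", "<p>old text</p>")
    dst = io.BytesIO()
    replace_member(src, dst, "OEBPS/chapter1.xhtml", b"<p>new text</p>")
    with zipfile.ZipFile(dst) as z:
        print(z.namelist())  # ['mimetype', 'OEBPS/chapter1.xhtml']
```

Because every entry is rewritten, this still costs time proportional to the size of the book; it simply avoids the unzip, edit, and re-zip round trip by hand.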
06-27-2010, 07:55 AM | #13 |
Created Sigil, FlightCrew
Posts: 1,982
Karma: 350515
Join Date: Feb 2008
Device: Kobo Clara HD
|
|
06-27-2010, 08:23 AM | #14 | |
Addict
Posts: 281
Karma: 52007
Join Date: Jun 2010
Device: nook
|
I had a regrettably large number of epubs with spelling mistakes and other editorial gaffes. Sigil not only made fixing these possible, it made it easy. I also have fun re-inserting diagrams and illustrations that are present in the dead-tree version of the document but missing from the epub. I really like Sigil, and it is not just because at my day job I'm a fan of Trolltech's Qt framework. |
|
06-27-2010, 10:32 PM | #15 |
You kids get off my lawn!
Posts: 4,220
Karma: 73492664
Join Date: Aug 2007
Location: Columbus, Ohio
Device: Oasis 2 and Libra H2O and half a dozen older models I can't let go of
|
Is this the same issue as the one shown in the two attachments? (I wish I could remember how to copy these so they're always visible, but I never can.)
It has always happened occasionally, but lately it seems to happen to almost every ebook I view. These are existing ePubs to which I add a cover and blurb in Calibre. |
|