MobileRead Forums

Sigil forum thread: Ellipsis character displays in code view but not in book view (https://www.mobileread.com/forums/showthread.php?t=234944)

magmanpi 02-27-2014 09:31 PM

Ellipsis character displays in code view but not in book view
 
I'm trying to fix a book that contains many sentences that have no closing punctuation -- in this case an ellipsis. These sentences all end a paragraph, and in book view there is no ellipsis visible. When I look at the sentences in code view, though, there is a three-dot character that looks like an ellipsis, but it's not the exact same ellipsis you get when you click special characters. The special-character ellipsis has more space between its dots.

I thought it would be a simple fix to have Sigil search for the rogue ellipses and replace them with the special-character ellipses. But when I copy the rogue ellipsis out of code view and search for it, Sigil doesn't find it. Of course, when I try to paste the code into this forum post, the rogue character also doesn't appear, which I guess is why the character doesn't appear in Sigil's book view.

The code looks like this:

<p class="calibre6">"But I would not be able to get another job, with no academic record, no past work history"</p>

In code view, there are three little dots after the word history. The sentences in question aren't broken paragraphs, though. They're supposed to end with an ellipsis because the speaker has been interrupted by another person in the book, and his quote follows.

If anyone knows what the ellipsis-looking characters are and how to replace them, I'd be most grateful. Thanks!

theducks 02-27-2014 09:46 PM

What font are you using? It may be missing that glyph.
& hellip; or & #8230; (no space in either)

magmanpi 02-27-2014 10:58 PM

Quote:

Originally Posted by theducks (Post 2774942)
What font are you using? It may be missing that glyph.
& hellip; or & #8230; (no space in either)

Thanks for the response. I'm not very experienced at this, so please bear with me. I'm not sure how to check the font, but when I go to the font folder in Sigil, it's empty.

& hellip; and & #8230; are HTML codes for an ellipsis, correct?

theducks 02-28-2014 12:18 AM

Quote:

Originally Posted by magmanpi (Post 2774975)
Thanks for the response. I'm not very experienced at this, so please bear with me. I'm not sure how to check the font, but when I go to the font folder in Sigil, it's empty.

& hellip; and & #8230; are HTML codes for an ellipsis, correct?

yes, they are two ways of representing the SAME character
(and they usually show)
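
For anyone who wants to confirm that, here is a minimal Python sketch using the standard-library html module. It only shows that the named entity and the numeric reference decode to one and the same code point, U+2026. The hexadecimal form is added here for illustration and was not mentioned above.

```python
import html

named = html.unescape("&hellip;")    # named entity
numeric = html.unescape("&#8230;")   # decimal numeric reference
hexref = html.unescape("&#x2026;")   # hexadecimal numeric reference (added for illustration)

# All three spellings produce the single character HORIZONTAL ELLIPSIS.
assert named == numeric == hexref == "\u2026"
print(f"U+{ord(named):04X}")  # U+2026
```

So a reader or editor that shows one form should show the other; if neither shows, the character in the file is probably something else.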

No fonts in the font folder just means the fonts are not EMBEDDED.

The line in the stylesheet:
font-family: "Times New Roman", serif;

Anything other than a generic name such as serif or sans-serif is a call for a specific font face.
When the font is not embedded, it is supplied by the system.
If the system does not have it, the generic alternate is used (the serif or sans-serif part).

Toxaris 02-28-2014 03:14 AM

I would be amazed if there is a standard font on a reader that does not have the ellipsis on board. What you can do is extract the XHTML and open it in a hex editor to check the actual (hex) code of the ellipsis. That should help in solving this.

magmanpi 02-28-2014 08:34 AM

Quote:

Originally Posted by theducks (Post 2775000)
yes, they are two ways of representing the SAME character
(and they usually show)

No fonts in the font folder just means the fonts are not EMBEDDED.

The line in the stylesheet:
font-family: "Times New Roman", serif;

Anything other than a generic name such as serif or sans-serif is a call for a specific font face.
When the font is not embedded, it is supplied by the system.
If the system does not have it, the generic alternate is used (the serif or sans-serif part).

When I check the stylesheet, the font-family is "Times New Roman", serif;

What now? Thanks!

magmanpi 02-28-2014 08:38 AM

Quote:

Originally Posted by Toxaris (Post 2775065)
I would be amazed if there is a standard font on a reader that does not have the ellipsis on board. What you can do is extract the XHTML and open it in a hex editor to check the actual (hex) code of the ellipsis. That should help in solving this.

Sorry for not being very familiar with this stuff, but how do I extract the XHTML, and where do I find a hex editor? Thanks for your patience.

Toxaris 02-28-2014 08:56 AM

Right click on the document in the Book Browser on the left. Select 'Save as'. There are various free hex editors out there. I used XVI32 in the past.
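
If installing a hex editor is a hurdle, a few lines of Python can stand in for one. This is only a sketch: hex_context is a made-up helper name, and the sample bytes are a correctly encoded UTF-8 ellipsis (E2 80 A6) for comparison, not bytes taken from the book in question.

```python
def hex_context(data: bytes, needle: bytes, width: int = 8) -> list[str]:
    """Return a hex dump of the bytes around each occurrence of needle."""
    dumps = []
    start = data.find(needle)
    while start != -1:
        lo = max(0, start - width)
        hi = min(len(data), start + len(needle) + width)
        dumps.append(data[lo:hi].hex(" ").upper())
        start = data.find(needle, start + 1)
    return dumps

# Hypothetical sample: the end of a sentence with a healthy UTF-8 ellipsis.
sample = 'history\u2026"'.encode("utf-8")
print(hex_context(sample, b"history", width=4))
# ['68 69 73 74 6F 72 79 E2 80 A6 22']
```

On a file saved from Sigil you would pass the result of pathlib.Path(...).read_bytes() instead of the sample. Anything other than E2 80 A6 where an ellipsis should be is worth a closer look.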

magmanpi 02-28-2014 10:25 AM

Quote:

Originally Posted by Toxaris (Post 2775229)
Right click on the document in the Book Browser on the left. Select 'Save as'. There are various free hex editors out there. I used XVI32 in the past.

Toxaris, I've found phantom em dashes that also don't appear in book view, the same as the ellipses. I downloaded a hex editor, and this is what the phantom em dash looks like:

tell me—"

And the corresponding hex numbers:

74 65 6C 6C 20 6D 65 C2 97 22

What now? Thanks!

theducks 02-28-2014 12:35 PM

Ah! That looks like a character encoding (cp1252) issue with your document.

that & Agrave; should have been a & rdquo;
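
Decoding the posted bytes in Python narrows this down. A sketch, on the assumption (not confirmed in the thread) that the text started life as cp1252 and was read as Latin-1 somewhere along the way: the final 22 is an ordinary straight quote, and the C2 97 pair is UTF-8 for U+0097, an invisible control character that sits at the byte value where cp1252 keeps its em dash.

```python
# The ten bytes reported from the hex editor.
raw = bytes.fromhex("74 65 6C 6C 20 6D 65 C2 97 22")

text = raw.decode("utf-8")
print([f"U+{ord(c):04X}" for c in text[-2:]])  # ['U+0097', 'U+0022']

# Assumption: cp1252 byte 0x97 (em dash) was misread as Latin-1, giving U+0097.
# Reversing that single step restores the intended character.
repaired = text.encode("latin-1").decode("cp1252")
print(f"U+{ord(repaired[-2]):04X}")  # U+2014, the em dash
```

That would also explain why book view shows nothing there: a control character has no glyph to draw.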

Hitch 02-28-2014 07:57 PM

Quote:

Originally Posted by theducks (Post 2775474)
Ah! That looks like a character encoding (cp1252) issue with your document.

that & Agrave; should have been a & rdquo;

Actually, it looks kinda like the rdquo is there, but the em dash isn't. Does anyone else think this looks like WordPerfect?

Hitch

theducks 02-28-2014 07:58 PM

Quote:

Originally Posted by Hitch (Post 2775931)
Actually, it looks kinda like the rdquo is there, but the em dash isn't. Does anyone else think this looks like WordPerfect?

Hitch

:smack:
Hitch caught me again :o

magmanpi 02-28-2014 08:53 PM

Quote:

Originally Posted by theducks (Post 2775474)
Ah! That looks like a character encoding (cp1252) issue with your document.

that & Agrave; should have been a & rdquo;

Thanks for the information. I'm afraid, though, you're talking above my lowly pay grade. :blink: I think &agrave; and &rdquo; are Unicode, but I'm still not sure what I need to do to fix my document. Is there a way to get Sigil to do a search and replace to find and fix the bad code? Sorry for the dumb questions and thanks for your patience.

magmanpi 02-28-2014 08:55 PM

Oops! I posted the above reply before I saw the latest from Hitch and theducks. I'm still lost, though.

theducks 02-28-2014 09:53 PM

Quote:

Originally Posted by magmanpi (Post 2775970)
Oops! I posted the above reply before I saw the latest from Hitch and theducks. I'm still lost, though.

Those are HTML entities, each standing for a single character.

But Hitch may be closer (she does do this for a living) with the idea that this is a WordPerfect artifact.

Did she get that correct? (WP had its own way of encoding some characters.)

Toxaris 03-01-2014 02:55 AM

Hmm, it might indeed be a non-Unicode conversion issue. What is the source and how did you make an ePUB from it?

Hitch 03-01-2014 04:36 AM

Quote:

Originally Posted by Toxaris (Post 2776102)
Hmm, it might indeed be a non-Unicode conversion issue. What is the source and how did you make an ePUB from it?

I'm sure that's exactly what it is, but it feels like WordPerfect to me. Possibly a Pages-->Doc conversion, but WP feels right. (Or...hell, it could be something out of one of those crappy "save your PDF to Word!" websites.)

@magmanpi:

You say you're "fixing" this book? Presumably for someone? Who's going to, what, publish this? And you're using Calibre to fix it?

Hitch

magmanpi 03-02-2014 10:47 PM

Quote:

Originally Posted by Toxaris (Post 2776102)
Hmm, it might indeed be a non-Unicode conversion issue. What is the source and how did you make an ePUB from it?

The source is an html file that I converted to ePub with Calibre. The conversion is pretty good except for the multitude of run-together words apparently caused by the circumflex characters that are visible only in Sigil's code view and not book view. But even though the circumflex characters are visible in code view, Sigil doesn't find them when I copy them into Sigil's search field.

As you suggested, I opened the book in a hex editor, which allowed me to successfully do a search and replace for the circumflex characters. After correcting the errors -- always a missing ellipsis or em dash that caused the words on each side of it to run together -- I copied and pasted the corrected file back into Sigil and deleted the original file. The book appears to read fine now.
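
The same search and replace can be scripted rather than done by hand in a hex editor. A minimal Python sketch, assuming the rogue characters are C1 control characters (U+0080 to U+009F) left over from cp1252 text, as the C2 97 bytes posted earlier suggest. repair_c1 is a hypothetical helper name, and U+0085 for the ellipsis is an assumption, since its bytes were never posted.

```python
def repair_c1(text: str) -> str:
    """Map C1 control characters (U+0080-U+009F) to their cp1252 meanings."""
    out = []
    for ch in text:
        if "\u0080" <= ch <= "\u009f":
            try:
                # Reinterpret the code point as a single cp1252 byte.
                ch = bytes([ord(ch)]).decode("cp1252")
            except UnicodeDecodeError:
                # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in cp1252;
                # leave those characters untouched.
                pass
        out.append(ch)
    return "".join(out)

# Assumed damage: U+0085 where an ellipsis belongs, U+0097 where an em dash belongs.
damaged = 'no past work history\u0085" and later: tell me\u0097"'
fixed = repair_c1(damaged)
print([f"U+{ord(c):04X}" for c in fixed if ord(c) > 0x7F])
# ['U+2026', 'U+2014']
```

Run over the decoded text of each XHTML file and written back as UTF-8, this would restore the punctuation itself, so the run-together words get their ellipsis or dash back without retyping.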

I'm still not sure what caused the rogue characters to appear in the first place, but at least now I have a readable book and I'll know how to fix the problem if it occurs in the future.

Thanks, everyone, for all the help! :thanks:

Hitch 03-03-2014 05:24 AM

Quote:

Originally Posted by magmanpi (Post 2777456)
The source is an html file that I converted to ePub with Calibre. The conversion is pretty good except for the multitude of run-together words apparently caused by the circumflex characters that are visible only in Sigil's code view and not book view. But even though the circumflex characters are visible in code view, Sigil doesn't find them when I copy them into Sigil's search field.

Yes, but what we're all asking is, "HTML file made from WHAT, and how?" An HTML file is (generally) the output of a program -- Word, WordPerfect, Pages, ABBYY FineReader, etc. Do you have any idea what the source was, just out of curiosity?

Quote:

As you suggested, I opened the book in a hex editor, which allowed me to successfully do a search and replace for the circumflex characters. After correcting the errors -- always a missing ellipsis or em dash that caused the words on each side of it to run together -- I copied and pasted the corrected file back into Sigil and deleted the original file. The book appears to read fine now.

I'm still not sure what caused the rogue characters to appear in the first place, but at least now I have a readable book and I'll know how to fix the problem if it occurs in the future.

Thanks, everyone, for all the help! :thanks:

The conversion from a word-processing file (or scanned file, etc.) to HTML is the likely cause, compounded by some lack of attention to the file encoding when it was subsequently loaded into Sigil. We'd all like to know what your source file was -- at least, I would -- just because that's the type of stuff we like to know.

Moreover, there's really no reason for this to occur "again in the future" once you understand what caused it, and what you need to do to prevent that from happening. Which might motivate you to tell us what that source was, so someone here can tell you how to get around the issue of all of it appearing in the first place. Particularly if, as I infer from your penultimate paragraph, you're planning on cleaning or fixing or making ePUBs as an ongoing concern.

Hitch

Toxaris 03-03-2014 05:45 AM

Yup, the export probably did not specify UTF-8. In my add-in I set that explicitly to avoid issues like this.
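
As a rough illustration of what specifying the encoding means in practice, here is a Python sketch that reads an HTML file in its real encoding and writes it back out as UTF-8 before it goes to calibre or Sigil. The function name, the file names, and the cp1252 default are assumptions for the example; the thread never established what the source encoding was.

```python
import tempfile
from pathlib import Path

def reencode_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Read src in its real encoding and write the same text to dst as UTF-8."""
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))

# Demonstration on a throwaway file containing a cp1252 em dash (byte 0x97).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "source_cp1252.html"
    dst = Path(tmp) / "source_utf8.html"
    src.write_bytes(b'<p>tell me\x97"</p>')
    reencode_to_utf8(src, dst)
    converted = dst.read_bytes()

# The single byte 0x97 becomes the three-byte UTF-8 sequence E2 80 94.
print(converted.hex(" "))
```

If the file carries its own charset declaration, that would need to say utf-8 as well, otherwise the next tool in the chain may misread it again.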


All times are GMT -4.