MobileRead Forums

Ellipsis character displays in code view but not in book view (https://www.mobileread.com/forums/showthread.php?t=234944)

Toxaris 03-01-2014 02:55 AM

Hmm, it might indeed be a non-unicode conversion issue. What is the source and how did you make an ePUB from it?

Hitch 03-01-2014 04:36 AM

Quote:

Originally Posted by Toxaris (Post 2776102)
Hmm, it might indeed be a non-unicode conversion issue. What is the source and how did you make an ePUB from it?

I'm sure that's exactly what it is, but it feels like WordPerfect output to me. Possibly a Pages-to-DOC conversion, but WP feels right. (Or... hell, it could be something out of one of those crappy "save your PDF to Word!" websites.)

@magmanpi:

You say you're "fixing" this book? Presumably for someone? Who's going to, what, publish this? And you're using Calibre to fix it?

Hitch

magmanpi 03-02-2014 10:47 PM

Quote:

Originally Posted by Toxaris (Post 2776102)
Hmm, it might indeed be a non-unicode conversion issue. What is the source and how did you make an ePUB from it?

The source is an html file that I converted to ePub with Calibre. The conversion is pretty good except for the multitude of run-together words apparently caused by the circumflex characters that are visible only in Sigil's code view and not book view. But even though the circumflex characters are visible in code view, Sigil doesn't find them when I copy them into Sigil's search field.

As you suggested, I opened the book in a hex editor, which let me search for and replace the circumflex characters. After correcting the errors (always a missing ellipsis or em dash that caused the words on either side of it to run together), I copied and pasted the corrected file back into Sigil and deleted the original file. The book appears to read fine now.
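If the stray "circumflex" characters follow the usual pattern of UTF-8 text being read as Windows-1252 (an assumption; the thread never shows the actual bytes), the hex-editor search and replace can also be scripted. Under that assumption an ellipsis (UTF-8 bytes E2 80 A6) shows up as "â€¦" and an em dash (E2 80 94) as "â€”"; the leading "â" is the circumflex. The sketch below derives the replacement table from the encodings instead of typing it by hand. `fix_mojibake` is a hypothetical helper, not part of Sigil or Calibre.

```python
# Sketch only: assumes the damage is UTF-8 bytes misread as Windows-1252 (cp1252).
# fix_mojibake is a hypothetical helper, not a Sigil or Calibre function.

# Punctuation that commonly gets mangled in ebook text: … — – ‘ ’ “
# The right double quote (U+201D) is left out: its last UTF-8 byte, 0x9D,
# is undefined in cp1252, so Python's strict cp1252 decoder raises on it.
TARGETS = "\u2026\u2014\u2013\u2018\u2019\u201c"

# Map each garbled three-character sequence (e.g. "â€¦") to the intended character.
MOJIBAKE = {ch.encode("utf-8").decode("cp1252"): ch for ch in TARGETS}


def fix_mojibake(text: str) -> str:
    """Replace cp1252-misread UTF-8 sequences with the characters they came from."""
    for garbled, correct in MOJIBAKE.items():
        text = text.replace(garbled, correct)
    return text


if __name__ == "__main__":
    # "â€¦" is a garbled ellipsis, "â€”" (â, €, right double quote) a garbled em dash.
    sample = "He paused\u00e2\u20ac\u00a6then left\u00e2\u20ac\u201dquickly."
    print(fix_mojibake(sample))  # He paused…then left—quickly.
```

Run over the text of each chapter file, this does what the manual search and replace did, but it only helps if the characters really are this kind of mojibake; other encoding mix-ups need a different table.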

I'm still not sure what caused the rogue characters to appear in the first place, but at least now I have a readable book and I'll know how to fix the problem if it occurs in the future.

Thanks, everyone, for all the help! :thanks:

Hitch 03-03-2014 05:24 AM

Quote:

Originally Posted by magmanpi (Post 2777456)
The source is an html file that I converted to ePub with Calibre. The conversion is pretty good except for the multitude of run-together words apparently caused by the circumflex characters that are visible only in Sigil's code view and not book view. But even though the circumflex characters are visible in code view, Sigil doesn't find them when I copy them into Sigil's search field.

Yes, but what we're all asking is, "HTML file made from WHAT, and how?" An HTML file is (generally) the output of a program: Word, WordPerfect, Pages, ABBYY FineReader, etc. Do you have any idea what the source was, just out of curiosity?

Quote:

As you suggested, I opened the book in a hex editor, which let me search for and replace the circumflex characters. After correcting the errors (always a missing ellipsis or em dash that caused the words on either side of it to run together), I copied and pasted the corrected file back into Sigil and deleted the original file. The book appears to read fine now.

I'm still not sure what caused the rogue characters to appear in the first place, but at least now I have a readable book and I'll know how to fix the problem if it occurs in the future.

Thanks, everyone, for all the help! :thanks:

The likely cause is the conversion from the word-processing file (or scanned file, etc.) to HTML, combined with some lack of attention to the file encoding when it was subsequently loaded into Sigil. We'd all like to know what your source file was (at least, I would), just because that's the type of stuff we like to know.

Moreover, there's really no reason for this to happen "again in the future" once you understand what caused it and what you need to do to prevent it. That might motivate you to tell us what the source was, so someone here can tell you how to keep these characters from appearing in the first place. That's particularly true if, as I infer from your penultimate paragraph, you're planning on cleaning, fixing, or making ePUBs as an ongoing concern.

Hitch

Toxaris 03-03-2014 05:45 AM

Yup, the export probably did not specify UTF-8. I know that in my add-in I specify it explicitly to avoid exactly these issues.
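The point about specifying UTF-8 explicitly can be illustrated with a small pre-import step. This is a sketch under the assumption that the source HTML is either already UTF-8 or Windows-1252 (a common default for older Windows word processors; the thread never identifies the program). It re-encodes the bytes as UTF-8 so that nothing downstream has to guess. `to_utf8` is a hypothetical helper, not part of Sigil, Calibre, or Toxaris's add-in.

```python
# Sketch only: assumes the source is UTF-8 or Windows-1252 (cp1252).
# to_utf8 is a hypothetical helper, not a Sigil, Calibre, or add-in API.


def to_utf8(raw: bytes, fallback: str = "cp1252") -> bytes:
    """Return raw re-encoded as UTF-8, decoding with `fallback` if it isn't valid UTF-8."""
    try:
        text = raw.decode("utf-8")  # already valid UTF-8: passes through unchanged
    except UnicodeDecodeError:
        # Bytes undefined in cp1252 (e.g. 0x81) become U+FFFD instead of raising.
        text = raw.decode(fallback, errors="replace")
    return text.encode("utf-8")


if __name__ == "__main__":
    legacy = b"He paused\x85then left\x97quickly."  # cp1252 ellipsis (0x85) and em dash (0x97)
    print(to_utf8(legacy).decode("utf-8"))  # He paused…then left—quickly.
```

For a file on disk, something like `Path("book.html").write_bytes(to_utf8(Path("book.html").read_bytes()))` (with `from pathlib import Path`) would apply it. If the HTML also declares a charset, for example `<meta charset="windows-1252">`, that declaration should be changed to `utf-8` so it matches the bytes.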


All times are GMT -4.
