Quote:
Originally Posted by magmanpi
The source is an html file that I converted to ePub with Calibre. The conversion is pretty good except for the multitude of run-together words apparently caused by the circumflex characters that are visible only in Sigil's code view and not book view. But even though the circumflex characters are visible in code view, Sigil doesn't find them when I copy them into Sigil's search field.
|
Yes, but what we're all asking is, "HTML file made from WHAT, and how?" An HTML file is (generally) the output of a program--Word, WordPerfect, Pages, ABBYY FineReader, etc. Do you have any idea what the source was, just out of curiosity?
Quote:
As you suggested, I opened the book in a hex editor, which allowed me to successfully do a search and replace for the circumflex characters. After correcting the errors -- always a missing ellipsis or emdash that caused the words on each side of it to run together, I copied and pasted the corrected file back into Sigil and deleted the original file. The book appears to read fine now.
I'm still not sure what caused the rogue characters to appear in the first place, but at least now I have a readable book and I'll know how to fix the problem if it occurs in the future.
Thanks, everyone, for all the help!
|
The conversion from word-processing file (or scanned file, etc.) to HTML is the likely origin of the problem, and some lack of attention to the file encoding when the file was subsequently opened in Sigil is what made the stray characters show up. We'd all like to know what your source file was--at least, I would--just because that's the type of stuff we like to know.
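To illustrate the encoding point, here is a minimal Python sketch of one common way this happens: a file saved as UTF-8 gets read as Windows-1252, so each ellipsis or em dash turns into a short run of characters starting with "â". The sample string and the `repair` helper are hypothetical, made up for illustration, and I'm assuming (not certain) that this is the mismatch in your file.

```python
# Hypothetical sample text; the original poster's file is not available.
original = "Wait\u2026 what\u2014now?"  # contains an ellipsis and an em dash

# UTF-8 bytes misread as Windows-1252 produce the stray "â" sequences.
garbled = original.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Reverse the misreading; return text unchanged if it does not round-trip."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of damage (or already clean), so leave it alone.
        return text


print(garbled)
print(repair(garbled))
```

If that is what happened, a round trip like this restores the punctuation in one pass, rather than hunting for each stray character by hand.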
Moreover, there's really no reason for this to occur "again in the future" once you understand what caused it and what you need to do to prevent it. That might motivate you to tell us what that source was, so someone here can tell you how to keep the problem from appearing in the first place. Particularly if, as I infer from your penultimate paragraph, you're planning on cleaning or fixing or making ePUBs as an ongoing concern.
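As a rough idea of what "preventing it" can look like, here is a small Python sketch that normalizes an HTML file's bytes to UTF-8 before the file goes anywhere near a converter. The `to_utf8` helper and the Windows-1252 fallback are my assumptions for illustration; the right fallback depends on what program actually produced the file, which is exactly why the source matters.

```python
def to_utf8(raw: bytes, fallback: str = "cp1252") -> bytes:
    """Return raw as UTF-8 bytes, assuming `fallback` if it is not valid UTF-8."""
    try:
        raw.decode("utf-8")  # already valid UTF-8: keep the bytes as they are
        return raw
    except UnicodeDecodeError:
        # Assumption: the file was saved in the fallback encoding.
        # This raises if the bytes are not valid in that encoding either.
        return raw.decode(fallback).encode("utf-8")


# Byte 0x97 is an em dash in Windows-1252 but is not valid UTF-8 on its own.
legacy = b"word\x97word"
print(to_utf8(legacy).decode("utf-8"))
```

In practice you would read the file in binary mode, pass the bytes through something like this, write the result back out, and make sure the charset declared in the HTML matches.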
Hitch