Quote:
Originally Posted by LittleMissS
I am not sure why I can see an emoji inserted within the code above when I preview my reply to your reply! Would that be in the code in Sigil / epub - or only in here, maybe because I used the bold?
That's a MobileRead thing. It automatically changes :p into a smiley image.
In order to disable that:
1. Press the "Go Advanced" button to get into Advanced posting mode.
2. Below your post, and the "Submit Reply" + "Preview Post" buttons, there's a checkbox for "Disable smilies in text". Check that box.
Sadly, you have to check the box each time, but you only need to check the box if you see smilies you don't intend.
Quote:
Originally Posted by LittleMissS
I don't want to remove all the incidents of p> unless I know that is what I must do.
As others have already pointed out, this was definitely a bad Find&Replace job.
Quote:
Originally Posted by LittleMissS
Do you know if there is an extensive list of code to search for and swap / remove?
If converting from Word's DOCX format, a better choice is:
1. Go back and create a clean Word document (using Styles).
2. Convert using tools so you won't get all the garbage Word code in the first place!
In 2019, I wrote a bit about this and linked to a few resources.
On Styles, I really like these two videos:
Word 2013: Use Quick Styles
How to REALLY use Microsoft Office: Word Styles 101
And for super clean conversion, I recommend Toxaris's EPUBTools. This is an add-on for Microsoft Word (Windows only) that outputs extremely clean EPUBs. All you'll have left is your basic HTML: <h1>, <p>, <i>, [...].
Note: Alternatively, you can use Calibre; for more advanced users, DiapDealer's DOCXImport Sigil plugin; or, as a last resort, Word's "Clean HTML".
Once you have clean Styles in the first place, all future cleanup is MUCH easier, no matter which conversion method you use.
Quote:
Originally Posted by LittleMissS
Do not follow that video.
Regular Expressions (Regex), while extremely helpful and powerful, are much more intermediate/advanced.
* * *
For example, let's say you wanted to "find all the numbers in the book and replace them with 999":
Code:
2 doctors, 3 horses, 678 toys.
With normal Find&Replace, you would have to individually search for:
Find: 678
Find: 3
Find: 2
then replace all with:
Replace: 999
and you'd probably make mistakes and miss lots of numbers.
With regular expressions, special symbols can be used to search for entire groups/categories/multiples of things:
Find (Regex): \d+
Replace: 999
In regular expressions, \d is a special symbol for "any digit" and + is a symbol for "1 or more of the previous thing".
So in plain English, the regex is saying "Look for 1 or more digits in a row, then replace them with 999".
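If it helps to see both approaches side by side, here is a minimal sketch in Python using its built-in `re` module, run on the sample sentence from the Code: box above. Python's regex engine is not the one inside Sigil, but `\d+` means the same thing in both; the variable names are just for illustration.

```python
import re

text = "2 doctors, 3 horses, 678 toys."

# Normal Find&Replace: you must know every number in advance
# and search for each one separately.
by_hand = text
for number in ["678", "3", "2"]:
    by_hand = by_hand.replace(number, "999")

# Regex: \d = "any digit", + = "1 or more of the previous thing".
# \d+ catches each whole run of digits ("2", "3", "678") in one pass.
with_regex = re.sub(r"\d+", "999", text)

print(by_hand)     # 999 doctors, 999 horses, 999 toys.
print(with_regex)  # 999 doctors, 999 horses, 999 toys.
```

Both give the same result here, but only the regex version would also catch a number you didn't know was in the book.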
* * *
What that video gives you is a lot of symbols (/[]*>), which are very easy to mistype or break... ESPECIALLY if your initial document is full of gunk.
Another common error: people forget to switch off Regex mode, then continue doing their normal Search/Replace. But it's too late; they've already used a special symbol and replaced a lot more than they intended (only noticing the broken <p>p> an hour later... and who knows what else they accidentally deleted).
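To make the "forgot to switch off Regex mode" trap concrete, here is a minimal Python sketch. The sample line and the "replace a period with !" task are made up for illustration. The point is that `.` is an ordinary period in normal mode, but in regex it means "any character".

```python
import re

text = "<p>Hello.</p>"

# Normal mode: "." is just a period, so only that one character changes.
normal_mode = text.replace(".", "!")

# Regex mode left on by accident: "." now means "any character",
# so every character in the line is replaced, tags included.
regex_mode = re.sub(".", "!", text)

# If you really are in regex mode, re.escape() turns special symbols
# back into plain literal characters.
escaped = re.sub(re.escape("."), "!", text)

print(normal_mode)  # <p>Hello!</p>
print(regex_mode)   # a row of 13 "!" characters: the whole line is gone
print(escaped)      # <p>Hello!</p>
```

In Sigil, the equivalent safeguard is simply double-checking which search mode is selected before you hit Replace All.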