Old 03-28-2020, 04:31 PM   #17
Tex2002ans
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by LittleMissS View Post
I am not sure why I can see an emoji inserted within the code above when I preview my reply to your reply! Would that be in the code in Sigil / epub - or only in here, maybe because I used the bold?
That's a MobileRead thing. It automatically changes :p into a smiley image.

In order to disable that:

1. Press the "Go Advanced" button to get into Advanced posting mode.

2. Below your post, and the "Submit Reply" + "Preview Post" buttons, there's a checkbox for "Disable smilies in text". Check that box.

Sadly, you have to check the box each time, but you only need to check the box if you see smilies you don't intend.

Quote:
Originally Posted by LittleMissS View Post
I don't want to remove all the incidents of p> unless I know that is what I must do.
As others have already put their fingers on, this was definitely a bad Find&Replace job.

Quote:
Originally Posted by LittleMissS View Post
Do you know if there is an extensive list of code to search for and swap / remove?
If converting from Word's DOCX format, a better choice is:

1. Go back and create a clean Word document (using Styles).

2. Convert using tools so you won't get all the garbage Word code in the first place!

In 2019, I wrote a bit about this and linked to a few resources.

On Styles, I really like these two videos:

Word 2013: Use Quick Styles
How to REALLY use Microsoft Office: Word Styles 101

And on super clean conversion, I recommend Toxaris's EPUBTools. This is an add-on for Microsoft Word (Windows only) that outputs extremely clean EPUBs. All you'll have left is your basic HTML: <h1>, <p>, <i>, [...].

Note: You can also use Calibre or, for more advanced users, DiapDealer's DOCXImport Sigil plugin, or, as a last resort, Word's "Clean HTML".

Once you start from clean Styles, all future cleanup is MUCH easier, no matter which conversion method you use.

Quote:
Originally Posted by LittleMissS View Post
So far, code I have either deleted or swapped has been
about 700 lines of a huge block of code following Cubbon Sigil video. https://www.youtube.com/watch?v=A_Z8aQeEMmg
Do not follow that video.

Regular Expressions (Regex), while extremely helpful and powerful, are more of an intermediate/advanced tool.

* * *

For example, let's say you wanted to "find all the numbers in the book and replace them with 999":

Code:
2 doctors, 3 horses, 678 toys.
With normal Find&Replace, you would have to individually search for:

Find: 678
Find: 3
Find: 2

then replace all with:

Replace: 999

and you'd probably make mistakes and miss lots of numbers.

With regular expressions, special symbols can be used to search for entire groups/categories/multiples of things:

Find (Regex): \d+
Replace: 999

In regular expressions, \d is a special symbol for "any single digit" (0 through 9) and + is a symbol for "1 or more of the previous thing".

So in plain English, what the regex is saying is "Look for a run of 1 or more digits, then replace the whole run with 999".
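
If it helps to see the same idea outside of Sigil, here is a minimal sketch in Python, whose built-in re module understands the same \d+ pattern. The variable names are only for illustration, and Sigil's own regex engine may differ in other details:

```python
import re

text = "2 doctors, 3 horses, 678 toys."

# \d+ matches each run of one or more digits, so 2, 3 and 678
# are each replaced as a whole in a single pass.
result = re.sub(r"\d+", "999", text)

print(result)  # 999 doctors, 999 horses, 999 toys.
```

One search and one replace cover every number, instead of a separate search for each one.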

* * *

What that video gives you is a lot of symbols (/[]*>), which are very easy to mistype/break... ESPECIALLY if your initial document is full of gunk.

Another common error: people forget to switch off Regex mode, then carry on with their normal Search/Replace. By then it's too late: they've already used a special symbol and replaced a lot more than they intended (only noticing the broken <p>p> an hour later... and who knows what else they accidentally deleted).
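
To give a feel for how much damage one special symbol can do, here is a small made-up illustration in Python (the sample text and the "." search are hypothetical, not taken from the book in question). In regex mode a plain period means "any character":

```python
import re

text = "Wait. Stop."

# Normal (literal) search: only the two periods are touched.
literal = text.replace(".", "!")

# Same search with regex mode left on: "." means "any character",
# so every single character on the line gets replaced.
regex = re.sub(".", "!", text)

print(literal)  # Wait! Stop!
print(regex)    # !!!!!!!!!!!
```

The search box looks identical in both cases, which is why this kind of mistake tends to go unnoticed until much later.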

Last edited by Tex2002ans; 03-28-2020 at 04:38 PM.