Don't use Calibre to clean up the filtered HTML. Either do it manually in Sigil or use a program/macro to do it.
Conversion to ePUB in Calibre will cause big changes in your styles. Furthermore, it is not necessary, since Sigil can import HTML without issues. |
Quote:
Karen |
Quote:
But, I'll keep working on it and eventually I will have a decent looking, if not perfect, eBook! Karen |
Quote:
It didn't do what it was intended to do... or it didn't do what you wanted/expected it to do? There's a difference. ;) It certainly should have done what I said it would do, provided you had the ePub open in Sigil, in Code View (on an HTML file), with the F&R widget open (in Regex mode) and set to "All HTML Files". |
Suppressing <br /> tags only in "body text" style.
Could there be a way to destroy the soft hyphens only when they are included in a "body text" paragraph?
Rationale: After using a new (and not perfect) OCR, I found that my recognized text was interspersed with a lot of <br /> tags (soft hyphens?). I usually insert the HTML file into OpenOffice and clear all formatting to begin with. Even so, I realized that these resilient tags survived.
It is not all bad: some poems or songs are nicely transcribed this way. On the other hand, I have to clean out these tags in many standard paragraphs of text.
Sigil provides a simple way out. The user can either remove every one of them, good and bad, or selectively and patiently suppress the useless tags...
There could be a better one: give your songs or poems their own style, keep standard text in its "body text" class, and then launch the following Regex... |
<br />'s are not soft-hyphens.... just to be clear. ;)
Quote:
If there's only one occurrence of the <br /> tag inside a paragraph, this expression should find it (only inside p tags of the class "body-text"):
Code:
<p class="body-text">(?!</p>).*\K<br[^>]*?/>
The following expression should match the first occurrence (if there's more than one) of a <br /> tag inside p tags of the class "body-text".
Code:
(?U)<p class="body-text">(?!</p>).*\K<br[^>]*?/>
It's certainly not ideal, but if you have multiple <br /> tags inside the targeted paragraph (class name "body-text"), you could conceivably run one or the other of these "Replace All" expressions multiple times until the search no longer matches anything. Still quicker than stepping through each occurrence (and will ignore all other p classes), though. |
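For anyone willing to process the HTML file outside Sigil, the repeated "Replace All" passes can be collapsed into one pass with a small script. The following is only a sketch in Python, not something from Sigil itself: the function name `strip_br_in_body_text` and the sample markup are made up for illustration, and it assumes (like the expressions above) that the paragraphs are written exactly as `<p class="body-text">` with no other attributes.

```python
import re

# A whole body-text paragraph, matched non-greedily and across line breaks.
# Assumption: the opening tag is exactly <p class="body-text">.
BODY_TEXT_P = re.compile(r'<p class="body-text">.*?</p>', re.DOTALL)

# <br>, <br/>, <br /> and variants carrying attributes.
BR_TAG = re.compile(r'<br\b[^>]*>')


def strip_br_in_body_text(html: str, replacement: str = "") -> str:
    """Remove every <br /> inside body-text paragraphs; other classes are untouched."""
    return BODY_TEXT_P.sub(
        lambda m: BR_TAG.sub(replacement, m.group(0)),
        html,
    )


if __name__ == "__main__":
    sample = (
        '<p class="body-text">first line <br />second line<br/> third line</p>\n'
        '<p class="poem">line a<br />line b</p>'
    )
    print(strip_br_in_body_text(sample))
```

Because the inner substitution runs on each matched paragraph as a whole, it does not matter whether a paragraph holds one, two, or twenty <br /> tags. Pass `replacement=" "` if the tags sit between words with no surrounding space.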
@DiapDealer
Thanks very much for your reply. I will put it to work soon. Do you think it is possible to join your two commands with a kind of AND/OR link, so that it would destroy the tags two by two, or settle for one? Thanks for the vocabulary. I was not sure about it. Now I know. |
Quote:
|
@DiapDealer
I am very pleased to report full success with your Regex (I used the first one), which deleted, over seven busy rounds, 53/22/7/5/2/2/2 occurrences of the <br /> tag. :thumbsup: :thanks: This is only the tip of the iceberg, because in the odt I had previously destroyed probably over one hundred by hand. I did not know then that I would be using your regex. For information, this is the style breakdown of the test EPUB (classes only): Spoiler:
|
Quote:
|
reverse linking time consuming woes
<a href="../Text/notes.html#scrip1" id="backscrip1">This text is a link</a>
The above is some code in my file that I use to reverse link (or tag/anchor, whatever they call it). You click on a link in one file (in this case, clicking on the text "This text is a link" takes you to the ../Text/notes.html file), where another link is designated as "scrip1", while the original link "This text is a link" is designated as "backscrip1". So they go back and forth.
When there are hundreds of reverse links, it takes me only a short time to paste in the main code, i.e.:
<a href="../Text/scriptures.html#scrip1" id="backscrip1">This text is a link</a>
<a href="../Text/scriptures.html#scrip1" id="backscrip1">This text is a link</a>
<a href="../Text/scriptures.html#scrip1" id="backscrip1">This text is a link</a>
<a href="../Text/scriptures.html#scrip1" id="backscrip1">This text is a link</a>
<a href="../Text/scriptures.html#scrip1" id="backscrip1">This text is a link</a>
But now I have to go back and change the number in the second occurrence of the linking code to "2", then "3", then "4", i.e.:
<a href="../Text/scriptures.html#scrip1" id="backscrip1">This text is a link</a>
<a href="../Text/scriptures.html#scrip2" id="backscrip2">This text is a link</a>
<a href="../Text/scriptures.html#scrip3" id="backscrip3">This text is a link</a>
<a href="../Text/scriptures.html#scrip4" id="backscrip4">This text is a link</a>
...you get the idea. Is there a way to use find and replace so that it searches for this code and bumps up the number for each occurrence, so I don't have to find each one and type in each number myself? :thanks: |
Quote:
|
I was afraid of that. I guess the best thing would be to save it as a template and insert the text, but that still entails manually inserting each occurrence. Is there a quicker way of doing such a task that I just am not aware of yet? Thanks for the consideration.
|
I don't know about Sigil, but this is what I do in vim:
I use a special symbol (¬, |, ¦ are useful for this) where I want the consecutive numbers:
Code:
<a href="../Text/scriptures.html#scrip¬" id="backscrip¬">This text is a link</a>
Code:
:let n=1 | g/¬/s/¬/\=n/g | let n+=1
|
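For readers without vim, the same placeholder trick can be sketched in Python. This is only an illustration, not a Sigil feature: the function name `number_placeholders` is invented here, and like the vim command it assumes each link sits on its own line, so that every ¬ on one line receives the same number.

```python
from itertools import count

PLACEHOLDER = "¬"  # same marker as in the vim example; any unused symbol works


def number_placeholders(text: str, start: int = 1) -> str:
    """Give each line containing the placeholder the next number in sequence.

    All placeholders on the same line get the same number, so href and id
    stay paired (scrip1/backscrip1, scrip2/backscrip2, ...).
    """
    counter = count(start)
    numbered = []
    for line in text.splitlines(keepends=True):
        if PLACEHOLDER in line:
            line = line.replace(PLACEHOLDER, str(next(counter)))
        numbered.append(line)
    return "".join(numbered)


if __name__ == "__main__":
    template = (
        '<a href="../Text/scriptures.html#scrip¬" id="backscrip¬">'
        "This text is a link</a>\n"
    ) * 4
    print(number_placeholders(template), end="")
```

Lines without the placeholder pass through unchanged, so the function can be run over a whole HTML file's text before the file is opened again in an editor.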
Omg are you serious? I will have to give it a go! So how would I go about getting the code into Sigil afterward? That is the only way I know to convert it into epub.
|
MobileRead.com is a privately owned, operated and funded community.