Old 01-09-2022, 06:27 AM   #1
abraum
Convert text formatting from CSS to HTML

Hi,

I often convert scanned books to EPUB, first with ABBYY FineReader and then with Sigil or the calibre editor. ABBYY FineReader uses CSS to format the text (bold, italic, ...) rather than HTML (<b>, <i>, ...). Sometimes I have to make a lot of adjustments to the EPUB file because FineReader does crazy things, and a lot of CSS in the code doesn't make that any easier. Sigil and calibre can display bold or italic text as bold or italic directly in the code. HTML is much shorter than CSS and it's clearer.

There is nothing I can do in FineReader to make it use HTML instead of CSS when creating the EPUB file. Can Sigil or calibre do this, i.e. convert the most important CSS into HTML? Something like:

Code:
<span style="font-weight: bold;">text</span>
into

Code:
<b>text</b>
Or can this be done with a normal HTML editor?
Old 01-09-2022, 08:25 AM   #2
BeckyEbook
Use the TagMechanic plugin.
[Attached image: tagmechanic-example.png]

Last edited by BeckyEbook; 01-09-2022 at 08:28 AM.
Old 01-09-2022, 12:18 PM   #3
Turtle91
Tag mechanic certainly works!

I do these kinds of changes constantly so I just use a regex find/replace:

Code:
find: <span style="font-weight: bold;">(.*?)</span>
replace: <b>\1</b>
or
replace: <strong>\1</strong>

find: <span style="font-style: italic;">(.*?)</span>
replace: <i>\1</i>
or
replace: <em>\1</em>
Unfortunately it's rare to see two ebooks that write the style the same way (the spacing and the trailing ; vary), so I just highlight the <span ....>...</span> in the code and copy it into the Find field, then replace the text between the span tags with "(.*?)".

You could even make the regex more robust like:
<span style="\s*font-style:\s*italic\s*;*\s*">(.*?)</span>
<span style="\s*font-weight:\s*bold\s*;*\s*">(.*?)</span>

and save it as a saved search (group). That fixes all of them with a single mouse click.
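For anyone who wants to try the tolerant patterns outside Sigil first, here is a minimal Python sketch of the same find/replace. The function name and sample string are made up for illustration, and Python's re module stands in for Sigil's regex engine, so treat it as an approximation rather than exactly what the editor does.

```python
import re

# The two tolerant patterns from above, unchanged.
BOLD = re.compile(r'<span style="\s*font-weight:\s*bold\s*;*\s*">(.*?)</span>')
ITALIC = re.compile(r'<span style="\s*font-style:\s*italic\s*;*\s*">(.*?)</span>')


def spans_to_tags(html: str) -> str:
    """Replace simple, non-nested style spans with <b> and <i>."""
    html = BOLD.sub(r"<b>\1</b>", html)
    html = ITALIC.sub(r"<i>\1</i>", html)
    return html


# Hypothetical sample: one span without the trailing ";", one with extra spacing.
sample = (
    'A <span style="font-weight:bold">bold</span> and '
    '<span style="font-style: italic;">italic</span> word.'
)
print(spans_to_tags(sample))
```

Both spelling variants in the sample are caught by the same pair of patterns, which is the point of the extra \s* and ;* pieces.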

Beware: this process assumes that the ebook spans are simplistic. I have seen some books that throw spans around EVERYTHING and this regex would not correctly select the corresponding </span>. In that case it's probably easier to use tag mechanic.

Code example:
Code:
barf:
<p class="para"><span><span class="calibre9"><span><span class="calibre1">
Hello,</span></span><span><span class="italic">buddy!
</span></span><span><span class="calibre1">How are you?
</span></span></span></span></p>

clean:
<p>Hello, <em>buddy!</em> How are you?</p>
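To illustrate why nested spans defeat the non-greedy regex, here is a rough sketch that pairs each </span> with its own opening tag by keeping a stack. It uses Python's standard html.parser; the class name, the function name and the STYLE_TO_TAG table are hypothetical, and this is not how TagMechanic works internally. It assumes a body fragment (doctype and XML declarations are not carried over) and it does not strip the empty wrapper spans shown in the "barf" sample.

```python
from html.parser import HTMLParser

# Hypothetical mapping from a normalised inline style to a replacement tag.
STYLE_TO_TAG = {"font-weight:bold": "b", "font-style:italic": "i"}


class SpanRewriter(HTMLParser):
    """Rewrite style spans as <b>/<i>, matching every </span> to its opener."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.stack = []  # closing tag name owed for each open <span>

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            style = (dict(attrs).get("style") or "").replace(" ", "").rstrip(";")
            new_tag = STYLE_TO_TAG.get(style)
            self.stack.append(new_tag or "span")
            self.out.append(f"<{new_tag}>" if new_tag else self.get_starttag_text())
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # self-closing tags pass through

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            tag = self.stack.pop()  # close with whatever this span was opened as
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def rewrite_spans(html: str) -> str:
    parser = SpanRewriter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


nested = '<span style="font-weight: bold;">a <span class="x">b</span> c</span>'
print(rewrite_spans(nested))
```

On the nested sample, the regex approach would stop at the first </span> and leave the markup unbalanced, whereas the stack closes the outer span as </b> and leaves the inner one alone.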
Old 01-09-2022, 01:17 PM   #4
abraum
I'll test the regex first because I use the calibre editor more than Sigil. Thanks for that.
Old 01-09-2022, 02:05 PM   #5
KevinH
Quote:
Originally Posted by abraum View Post
I'll test the regex first because I use the calibre editor more than Sigil. Thanks for that.
Then why ask your question in the Sigil forum? Why not ask it in the calibre forum?
Old 01-09-2022, 08:01 PM   #6
Tex2002ans
Quote:
Originally Posted by abraum View Post
I often convert scanned books to EPUB, first with ABBYY FineReader and then with Sigil or the calibre editor. ABBYY FineReader uses CSS to format the text (bold, italic, ...) rather than HTML (<b>, <i>, ...).
I explained exactly this a few months ago in:

Go down to where I said:

Quote:
Originally Posted by Tex2002ans View Post
Back in 2020, I partially wrote about my "12-step Finereader Cleanup" (Sigil Saved Searches).

Here's the last 5 steps of my Saved Searches dealing with Finereader tables:

[...]
Like BeckyEbook said, TagMechanic is your best friend.

Step 1. In Sigil, press Tools > Saved Searches.

Step 2. Right-Click in the list and press New Group. Name the new Group "Finereader Cleanup".

Step 3. Then fill it with entries such as:

Name: Fix Bold
Find: <span style="font-weight:bold;">
Replace: <span class="bold">

Name: Fix Italics
Find: <span style="font-style:italic;">
Replace: <span class="italics">

[...]

Step 4. In your Saved Searches, you can click on the bold "Finereader Cleanup" to highlight it, then press Replace All.

This will run all those Search/Replaces on your selected files.

Now ugly Finereader HTML like:

Code:
This is <span style="font-weight:bold;">bold text</span>.
changes into the more human-readable:

Code:
This is <span class="bold">bold text</span>.
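The Saved Searches group in Steps 1 to 4 boils down to an ordered list of literal find/replace pairs. As a rough sketch for checking the entries outside Sigil, the same idea in Python looks like this; FINEREADER_CLEANUP and replace_all are hypothetical names, and only the two entries listed above are included.

```python
# Hypothetical stand-in for the "Finereader Cleanup" group: (name, find, replace).
FINEREADER_CLEANUP = [
    ("Fix Bold", '<span style="font-weight:bold;">', '<span class="bold">'),
    ("Fix Italics", '<span style="font-style:italic;">', '<span class="italics">'),
]


def replace_all(text, group):
    """Apply each literal find/replace in order; return the text and hit counts."""
    counts = {}
    for name, find, replace in group:
        counts[name] = text.count(find)  # how many times this entry matched
        text = text.replace(find, replace)
    return text, counts


cleaned, counts = replace_all(
    'This is <span style="font-weight:bold;">bold text</span>.',
    FINEREADER_CLEANUP,
)
print(cleaned)
print(counts)
```

The per-entry counts are only there as a sanity check: an entry that reports zero hits usually means the book spells the style slightly differently.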
* * *

Then you can use TagMechanic to change those <span class="bold"> into <b>:

Code:
This is <b>bold text</b>.
For example, here's one mini TagMechanic tutorial I wrote in 2020:

Last edited by Tex2002ans; 01-09-2022 at 08:07 PM.
Old 01-14-2022, 08:51 AM   #7
CalibUser
@abraum: You could try the ePubTidyTool plugin [https://www.mobileread.com/forums/sh....php?t=264378].

This can remove unwanted tags and can change tags that contain, e.g.,

<span style="font-weight: bold;">text</span>

to

<b>text</b>