01-09-2022, 06:27 AM | #1 |
Enthusiast
Posts: 31
Karma: 10
Join Date: Aug 2020
Device: Tablet
|
Convert text formatting from CSS to HTML
Hi,
I often convert scanned books to EPUB, first with ABBYY FineReader and then with Sigil or the calibre editor. FineReader uses CSS to format the text (bold, italic, ...) instead of HTML tags (<b>, <i>, ...). Sometimes I have to do a lot of customizing in the EPUB file because FineReader does crazy things, and all the CSS in the code doesn't make that easier. Sigil and calibre can display bold or italic text as bold or italic directly in the code view, and the HTML tags are much shorter and clearer than CSS. There is nothing I can do in FineReader to make it use HTML instead of CSS when it creates the EPUB file. Can Sigil or calibre do this, i.e. convert the most important CSS into HTML? Something like:
Code:
<span style="font-weight: bold;">text</span>
to
Code:
<b>text</b> |
01-09-2022, 08:25 AM | #2 |
Guru
Posts: 692
Karma: 2180740
Join Date: Jan 2017
Location: Poland
Device: Misc
|
Use the TagMechanic plugin.
Last edited by BeckyEbook; 01-09-2022 at 08:28 AM. |
01-09-2022, 12:18 PM | #3 |
A Hairy Wizard
Posts: 3,093
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
TagMechanic certainly works!
I make these kinds of changes constantly, so I just use a regex find/replace:
Code:
find: <span style="font-weight: bold;">(.*?)</span>
replace: <b>\1</b>  or  replace: <strong>\1</strong>
find: <span style="font-style: italic;">(.*?)</span>
replace: <i>\1</i>  or  replace: <em>\1</em>
You could even make the regex more robust, like:
Code:
<span style="\s*font-style:\s*italic\s*;*\s*">(.*?)</span>
<span style="\s*font-weight:\s*bold\s*;*\s*">(.*?)</span>
and save them as a saved search group. That fixes all of them with a single mouse click.
Beware: this process assumes that the ebook's spans are simple. I have seen some books that throw spans around EVERYTHING, and there this regex would not select the matching </span>. In that case it's probably easier to use TagMechanic. Code example:
Code:
barf: <p class="para"><span><span class="calibre9"><span><span class="calibre1"> Hello,</span></span><span><span class="italic">buddy! </span></span><span><span class="calibre1">How are you? </span></span></span></span></p>
clean: <p>Hello, <em>buddy!</em> How are you?</p> |
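A quick way to see what the saved search does, and where it breaks, is to run the same "more robust" patterns outside Sigil. This is a minimal Python sketch, assuming Python's `re` module is close enough to Sigil's and calibre's regex engines for these simple patterns (the engines are not identical). `RULES`, `apply_rules` and the two sample strings are illustrative names, not part of either tool.

```python
import re

# The "more robust" find/replace pairs from the post; \1 in each
# replacement is the text captured by (.*?) in the pattern.
RULES = [
    (r'<span style="\s*font-weight:\s*bold\s*;*\s*">(.*?)</span>', r'<b>\1</b>'),
    (r'<span style="\s*font-style:\s*italic\s*;*\s*">(.*?)</span>', r'<i>\1</i>'),
]


def apply_rules(html: str) -> str:
    """Run every find/replace in order, like a saved-search group."""
    for pattern, replacement in RULES:
        html = re.sub(pattern, replacement, html)
    return html


simple = ('<p>This is <span style="font-weight: bold;">bold</span> and '
          '<span style="font-style:italic">italic</span>.</p>')
nested = '<p><span style="font-weight: bold;">bold <span class="x">inner</span> tail</span></p>'

print(apply_rules(simple))
# <p>This is <b>bold</b> and <i>italic</i>.</p>
print(apply_rules(nested))
# <p><b>bold <span class="x">inner</b> tail</span></p>
```

The second result is the problem described above: the lazy `(.*?)` stops at the first `</span>`, so with a nested span the `</b>` lands inside the inner span and the outer `</span>` is left over. Also, in this Python sketch `.` does not match a line break unless `re.DOTALL` is passed, so a span that wraps across lines in the source is simply skipped.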
01-09-2022, 01:17 PM | #4 |
Enthusiast
Posts: 31
Karma: 10
Join Date: Aug 2020
Device: Tablet
|
I'll test the regex first, because I use the calibre editor more than Sigil. Thanks for that.
|
01-09-2022, 02:05 PM | #5 |
Sigil Developer
Posts: 7,636
Karma: 5433388
Join Date: Nov 2009
Device: many
|
|
01-09-2022, 08:01 PM | #6 | ||
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
Go down to where I said: Quote:
Step 1. In Sigil, press Tools > Saved Searches.
Step 2. Right-click in the list and press New Group. Name the new group "Finereader Cleanup".
Step 3. Then fill it with entries such as:
Name: Fix Bold
Find: <span style="font-weight:bold;">
Replace: <span class="bold">
Name: Fix Italics
Find: <span style="font-style:italic;">
Replace: <span class="italics">
[...]
Step 4. In your Saved Searches, click on the bold "Finereader Cleanup" to highlight it, then press Replace All. This runs all of those Search/Replaces on your selected files.
Now ugly FineReader HTML like:
Code:
This is <span style="font-weight:bold;">bold text</span>.
becomes:
Code:
This is <span class="bold">bold text</span>.
Then you can use TagMechanic to change those <span class="bold"> into <b>:
Code:
This is <b>bold text</b>.
Last edited by Tex2002ans; 01-09-2022 at 08:07 PM. |
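The second step is where a tag-aware tool earns its keep over a plain regex. Below is a minimal sketch of the idea, not TagMechanic's actual code, using Python's standard `html.parser`: it keeps a stack of the spans it has opened, so every `</span>` is rewritten to match its own opening tag even when spans are nested. The `CLASS_TO_TAG` mapping, the `convert` helper, and the rule that only spans whose sole attribute is `class` get converted are assumptions made for this example.

```python
from html.parser import HTMLParser

# Hypothetical mapping for this sketch: span class -> replacement tag.
CLASS_TO_TAG = {"bold": "b", "italics": "i"}


class SpanToTag(HTMLParser):
    """Rewrite <span class="bold">...</span> as <b>...</b>; copy everything else through."""

    def __init__(self):
        # Keep entities such as &amp; exactly as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.stack = []  # one entry per open <span>: replacement tag, or None to keep <span>

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            # Only convert spans whose single attribute is a mapped class,
            # so ids or other attributes are never silently dropped.
            new = CLASS_TO_TAG.get(dict(attrs).get("class")) if len(attrs) == 1 else None
            self.stack.append(new)
            if new:
                self.out.append(f"<{new}>")
                return
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            new = self.stack.pop()  # the closing tag belongs to the most recent open span
            self.out.append(f"</{new}>" if new else "</span>")
        else:
            self.out.append(f"</{tag}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/> copied verbatim

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")  # e.g. <!DOCTYPE html>

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")  # e.g. the <?xml ...?> line of an XHTML file


def convert(markup: str) -> str:
    parser = SpanToTag()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(convert('This is <span class="bold">bold text</span>.'))
# This is <b>bold text</b>.
print(convert('<p><span class="bold">bold <span class="x">inner</span> tail</span></p>'))
# <p><b>bold <span class="x">inner</span> tail</b></p>
```

Unlike the lazy regex, the nested case comes out well-formed: the inner, unmapped span keeps its own `</span>` and the outer one is closed with `</b>`.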
01-14-2022, 08:51 AM | #7 |
Addict
Posts: 201
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
@abraum: You could try the ePubTidyTool plugin [https://www.mobileread.com/forums/sh....php?t=264378].
It can remove unwanted tags and can change, e.g., <span style="font-weight: bold;">text</span> into <b>text</b>. |