05-11-2023, 02:28 PM | #1 |
Connoisseur
Posts: 84
Karma: 10
Join Date: Dec 2008
Device: Kindle Paperwhite, TabPRO 8.4, Galaxy Light, Sony PRS-300
|
<p class="block_15">, etc, etc...
I recently converted a book found only in PDF format to MS Word, and then using Calibre, to ePub. Much of the conversion went quite well. However, the formatting is occasionally "off." Usually, this is a matter of a missing indentation of the first line of a new paragraph. I've traced this to the presence of this tag:
Code:
<p class="block_16">
Code:
<p class="block_15">
Code:
<p class="block_**">
Code:
<p class="block_15">
Code:
<p class="block_38">
FWIW, the
Code:
<p class="block_16">
1) Where is the best source for understanding the function of the many
Code:
<p class="block_**">
2) Other than a search/replace of known offending variants, is there any way to fix these formatting glitches?
3) Is there any way to reduce the likelihood of their being introduced in the first place?
Many thanks. |
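On question 2, one option is to script the search/replace so that every numbered variant is handled in a single pass. Below is a minimal Python sketch, assuming the glitches come from a known set of `block_NN` classes. The `OFFENDING` set and the `TARGET` class are hypothetical placeholders and would need to be checked against the book's actual stylesheet first.

```python
import re

# Assumption for illustration only: which classes are "offending" and which
# class they should become. Inspect the EPUB's CSS before choosing these.
OFFENDING = {"block_16", "block_38"}
TARGET = "block_15"

# Matches an opening <p> tag whose class is block_ followed by digits.
P_CLASS = re.compile(r'(<p\s+class=")(block_\d+)(")')


def normalize_paragraph_classes(xhtml, offending=OFFENDING, target=TARGET):
    """Rewrite offending block_NN classes on <p> tags; leave all others alone."""

    def swap(match):
        cls = match.group(2)
        new_cls = target if cls in offending else cls
        return match.group(1) + new_cls + match.group(3)

    return P_CLASS.sub(swap, xhtml)
```

Classes that are not listed (for example, ones carrying deliberate formatting such as poetry) are left untouched, which is why the sketch uses an explicit set rather than replacing every `block_NN` it finds.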
05-11-2023, 04:49 PM | #2 |
the rook, bossing Never.
Posts: 11,164
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Double check the paragraph styles in the MS Word docx.
Docx to epub conversion is just about perfect if the styles are correct. |
05-12-2023, 09:08 AM | #3 |
Connoisseur
Posts: 84
Karma: 10
Join Date: Dec 2008
Device: Kindle Paperwhite, TabPRO 8.4, Galaxy Light, Sony PRS-300
|
Quoth,
Thanks. I was so relieved to find all of the pdf text and images displayed nicely in a .docx file that I didn't even think to examine what styles were in effect in the document before converting to ePub. As it is, I managed a tedious but effective search/replace of the numerous odd CLASS codes in my ePub files. All 50 chapters... |
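Before a search/replace across 50 chapters, it may help to see which paragraph classes actually occur and how often. This is a rough sketch, assuming the EPUB is an ordinary zip archive with XHTML/HTML content files; the function name `count_paragraph_classes` is made up for this example.

```python
import re
import zipfile
from collections import Counter

# Captures the class attribute value of opening <p> tags.
P_CLASS = re.compile(r'<p\s+class="([^"]+)"')


def count_paragraph_classes(epub_path):
    """Return a Counter of class values used on <p> tags across an EPUB."""
    counts = Counter()
    with zipfile.ZipFile(epub_path) as book:
        for name in book.namelist():
            # Only look at the text content files, not images, CSS or metadata.
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                text = book.read(name).decode("utf-8", errors="replace")
                counts.update(P_CLASS.findall(text))
    return counts
```

Calling `count_paragraph_classes("book.epub").most_common()` lists the classes from most to least used; a class that shows up only a handful of times among thousands of paragraphs is a likely candidate for one of the spurious ones.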
05-12-2023, 01:41 PM | #4 |
the rook, bossing Never.
Posts: 11,164
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
I forget how Word works, but you can search and replace styles in LO Writer. Styles in LO Writer* or MS Word map one-to-one with CSS classes when docx is converted to epub.
So I KNOW these spurious classes come from spurious paragraph styles or direct formatting in the docx. Of course direct formatting is very bad and harder to get rid of. Simply selecting chunks of a chapter, as big as possible, that should share one style and resetting their formatting to the paragraph style works. [* In LO Writer, after the initial docx import, save and load only in odt format while editing; then for the epub, do an extra Save As in docx. Otherwise, editing the docx directly means a conversion every time it is opened, with multiple similar page, paragraph, heading styles, etc. that need fixing each time. This is also an issue when exchanging docx between different versions of MS Word!] Last edited by Quoth; 05-12-2023 at 01:47 PM. |
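To check the paragraph styles on the docx side before converting, something like the following could give a quick inventory. It is only a sketch, assuming a standard docx (a zip archive with the body in `word/document.xml`, where a paragraph's style is referenced by a `<w:pStyle w:val="..."/>` element). The function name is hypothetical, and the values reported are style IDs, which may not match the names shown in the Word or LO Writer style list exactly.

```python
import re
import zipfile
from collections import Counter

# A paragraph's style reference in WordprocessingML: <w:pStyle w:val="StyleId"/>
P_STYLE = re.compile(r'<w:pStyle\s+w:val="([^"]+)"')


def count_docx_paragraph_styles(docx_path):
    """Count explicit paragraph style references in a docx body.

    Paragraphs using the default style carry no <w:pStyle> element,
    so they are not counted here.
    """
    with zipfile.ZipFile(docx_path) as docx:
        xml = docx.read("word/document.xml").decode("utf-8")
    return Counter(P_STYLE.findall(xml))
```

This only reveals spurious paragraph styles. Direct formatting is stored separately from the style reference, so it would not appear in this count.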
05-12-2023, 02:46 PM | #5 |
Connoisseur
Posts: 84
Karma: 10
Join Date: Dec 2008
Device: Kindle Paperwhite, TabPRO 8.4, Galaxy Light, Sony PRS-300
|
Quoth,
Helpful. Thank you. I'm just using MS Word again after years of LO, but have made minimal use of either for creating .docx documents intended for conversion to ePub format. The only clear upside of my approach is that the embedded poetry in my document did get the necessary tags to appear properly in ePub. That was, unfortunately, offset by the many spurious tags, which I eliminated in a tedious "search & destroy" mission. I'll pay more attention to styles in the LO or MS Word documents in future. |
05-12-2023, 04:14 PM | #6 |
Evangelist
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
|
I usually convert PDFs by hand with LibreOffice and regular expressions. The result can be easily converted to EPUB or FB2, and you can easily detect and unify inconsistent formatting.
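As one illustration of the kind of regular-expression cleanup meant here, this is a small Python sketch (not LibreOffice's own Find & Replace, and not necessarily the poster's exact workflow) that rejoins lines hard-wrapped by PDF extraction. It assumes paragraphs are separated by blank lines; the function name `unwrap_lines` is made up for the example.

```python
import re


def unwrap_lines(text):
    """Rejoin hard-wrapped lines while keeping blank-line paragraph breaks."""
    # Rejoin words split by a hyphen at a line end: "conver-\nsion" -> "conversion".
    # Caveat: this also merges genuinely hyphenated words that happen to wrap.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn a single newline (not part of a blank-line break) into a space.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    return text
```

For example, `unwrap_lines("The conver-\nsion went\nwell.\n\nNext para.")` returns `"The conversion went well.\n\nNext para."`.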
|
05-12-2023, 05:20 PM | #7 |
Resident Curmudgeon
Posts: 74,015
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Create a style for just p, such as
Code:
p { margin-top: 0; margin-bottom: 0; text-indent: 1.2em; } |
|