Old 05-11-2023, 02:28 PM   #1
RMOP
Connoisseur
Posts: 84
Karma: 10
Join Date: Dec 2008
Device: Kindle Paperwhite, TabPRO 8.4, Galaxy Light, Sony PRS-300
<p class="block_15">, etc, etc...

I recently converted a book found only in PDF format to MS Word, and then, using Calibre, to ePub. Much of the conversion went quite well. However, the formatting is occasionally "off." Usually, this is a matter of a missing indentation of the first line of a new paragraph. I've traced this to the presence of the tag <p class="block_16">, whereas the properly indented paragraph first lines usually (but not invariably) use <p class="block_15">. In correcting this, I've also found numerous other variants of <p class="block_**">. Some seem to work just like <p class="block_15"> (e.g., <p class="block_38">), but others are quite different in their effects. I have not enumerated all of the variants, but there are probably at least 10.

FWIW, <p class="block_16"> occurs not only at the beginning of a new paragraph but also mid-sentence in the last line of a paragraph. The latter forces a new, unindented paragraph consisting of the last fragment of that sentence.

1) Where is the best source for understanding the function of the many <p class="block_**"> variants?

2) Other than a search/replace of known offending variants, is there any way to fix these formatting glitches?

3) Is there any way to reduce the likelihood of their being introduced in the first place?

Many thanks.
Old 05-11-2023, 04:49 PM   #2
Quoth
the rook, bossing Never.
Posts: 11,164
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
Double check the paragraph styles in the MS Word docx.

Docx to epub conversion is just about perfect if the styles are correct.
Old 05-12-2023, 09:08 AM   #3
RMOP
Quoth,

Thanks. I was so relieved to find all of the pdf text and images displayed nicely in a .docx file that I didn't even think to examine what styles were in effect in the document before converting to ePub. As it is, I managed a tedious but effective search/replace of the numerous odd CLASS codes in my ePub files. All 50 chapters...
Old 05-12-2023, 01:41 PM   #4
Quoth
I forget how Word works, but you can search and replace styles in LO Writer. Styles in LO Writer* or MS Word map one-to-one with CSS classes when docx is converted to epub.

So I KNOW these spurious classes are spurious paragraph styles or direct formatting in the docx.

Of course, direct formatting is very bad and harder to get rid of. What works is selecting chunks of a chapter that are as big as possible and share the same style, then simply resetting their formatting to the paragraph style.

[* In LO Writer, after the initial docx import, do all editing, saving, and loading in odt format; then, for epub, do an extra Save As to docx as well. Otherwise, if you edit the docx directly, it's a conversion every time, with multiple similar page, paragraph, and heading styles etc. that need fixing each time the docx is opened. This is also an issue when exchanging docx files between different versions of MS Word!]
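
To check the docx before converting, something along these lines could list which paragraph styles are actually in use and how many paragraphs carry direct formatting. This is only a minimal sketch, not anything Calibre or Word provides: the function name paragraph_style_report and the choice of which properties count as "direct formatting" (indent, spacing, justification) are assumptions made for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML namespace used inside word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Paragraph properties treated here as "direct formatting" (an assumption:
# indentation, spacing and justification set on the paragraph itself).
DIRECT_TAGS = {f"{{{W}}}ind", f"{{{W}}}spacing", f"{{{W}}}jc"}


def paragraph_style_report(docx_file):
    """Return (Counter of paragraph style IDs, number of directly formatted paragraphs)."""
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    styles = Counter()
    direct = 0
    for para in root.iter(f"{{{W}}}p"):
        style = "(default)"          # paragraphs without an explicit style
        has_direct = False
        ppr = para.find(f"{{{W}}}pPr")
        if ppr is not None:
            for child in ppr:
                if child.tag == f"{{{W}}}pStyle":
                    style = child.get(f"{{{W}}}val", "(default)")
                elif child.tag in DIRECT_TAGS:
                    has_direct = True
        styles[style] += 1
        direct += has_direct
    return styles, direct


# Usage (hypothetical file name):
# styles, direct = paragraph_style_report("book.docx")
# print(styles.most_common(), direct)
```

A long tail of rarely used style IDs, or a high count of directly formatted paragraphs, would suggest the docx needs cleaning up before conversion.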

Last edited by Quoth; 05-12-2023 at 01:47 PM.
Old 05-12-2023, 02:46 PM   #5
RMOP
Quoth,

Helpful. Thank you. I'm just using MS Word again after years of LO, but have made minimal use of either for creating .docx documents intended for conversion to ePub format. The only clear upshot of my approach is that the embedded poetry in my document did get the necessary tags to appear properly in ePub. That was, unfortunately, offset by the many spurious tags, which were eliminated by a tedious "search & destroy" mission. I'll pay more attention to styles in the LO or MS Word documents in future.
Old 05-12-2023, 04:14 PM   #6
Sarmat89
Evangelist
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
I usually convert PDFs by hand with LibreOffice and regular expressions. The result can be easily converted to EPUB or FB2, and you can easily detect and unify inconsistent formatting.
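
For detecting inconsistent formatting in an already converted EPUB, regular expressions can also take inventory of the <p class="..."> variants and show which classes have identical declarations in the stylesheet (and so behave the same). A rough sketch follows; the function names are hypothetical, and the CSS pattern only handles simple single-class rules without comments or grouped selectors.

```python
import re
from collections import Counter, defaultdict

# class attribute on <p> tags only (the \b keeps <pre> etc. from matching)
P_CLASS = re.compile(r'<p\b[^>]*\bclass="([^"]+)"')
# simple rules of the form ".name { declarations }"
CSS_RULE = re.compile(r'\.([\w-]+)\s*\{([^}]*)\}')


def count_p_classes(xhtml):
    """Count how often each class value appears on <p> tags."""
    return Counter(P_CLASS.findall(xhtml))


def _normalise(decl):
    # "text-indent:  1.2em" -> ("text-indent", "1.2em")
    prop, _, value = decl.partition(":")
    return prop.strip().lower(), " ".join(value.split())


def group_equivalent_classes(css):
    """Return lists of class names whose declarations are identical."""
    groups = defaultdict(list)
    for name, body in CSS_RULE.findall(css):
        decls = frozenset(_normalise(d) for d in body.split(";") if d.strip())
        groups[decls].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Usage (hypothetical strings read from a chapter file and the stylesheet):
# print(count_p_classes(chapter_xhtml).most_common())
# print(group_equivalent_classes(stylesheet_css))
```

Classes that land in the same group can safely be unified into one; classes that differ only in text-indent are the likely culprits for the missing first-line indents.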
Old 05-12-2023, 05:20 PM   #7
JSWolf
Resident Curmudgeon
Posts: 74,015
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Create a style for just p, such as
Code:
p {
  margin-top: 0;
  margin-bottom: 0;
  text-indent: 1.2em;
}
and edit out the classes you don't need and leave them as just <p>.
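
Editing out the classes across many chapter files could be scripted rather than done by hand. The following is a minimal sketch under stated assumptions: strip_p_classes is a made-up helper, and it only rewrites tags where class is the sole attribute, exactly in the form <p class="block_NN">. Classes not listed (for example, ones used for poetry) are left alone.

```python
import re


def strip_p_classes(xhtml, classes):
    """Turn <p class="X"> into <p> for every class name X in `classes`."""
    if not classes:
        return xhtml
    names = "|".join(re.escape(c) for c in sorted(classes))
    # Matches only <p> tags whose single attribute is one of the listed classes;
    # the closing quote stops "block_1" from matching "block_15".
    pattern = re.compile(r'<p\s+class="(?:%s)"\s*>' % names)
    return pattern.sub("<p>", xhtml)


# Usage (hypothetical class list; run on a copy of the book first):
# cleaned = strip_p_classes(chapter_xhtml, {"block_15", "block_16", "block_38"})
```

With the plain p rule above in the stylesheet, the stripped paragraphs all pick up the same margins and first-line indent.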

All times are GMT -4.

