02-24-2024, 03:37 PM | #1 |
the rook, bossing Never.
Posts: 11,166
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Converting LO-Writer to Epub2 in Calibre
Epub3 can serve different purposes. That will be considered separately, or you can convert epub2 to epub3 and add the extra epub3 features.
Old Open Office / LO had a plugin. It was poor. I compared the ways of getting from LO Writer 7.4 to an epub2 with Calibre 7.4.
LO Writer's direct epub export offers epub2 or epub3. Many of the chapter headings lose text in the TOC, and the files are bloated with <span> tags within paragraphs. It's still useless.
Back in LO 5.x and Calibre 6.x the best option was an extra Save As in docx and converting that in Calibre to epub2. With LO Writer 7.4 that route currently looks OK, but H1 to H3 are replaced by styled <p>, though page/file breaks, internal links and the TOC are OK. I was sure it used to produce H1 to H3 etc. It may be that the Save As in docx is the issue. I'll test with MS Word later. LO 7.4 certainly seems buggier than the 6.x I was using.
Currently, using the LO Writer 7.4 odt directly with Calibre 7.4 works better than docx. The headings are correct <hx>, where x is the heading level of the LO Writer paragraph style. The 6.x odt converted with older Calibre to epub2 erroneously used list format instead of <h> for headings. It's puzzling, and since Kovid suggested the odt import wasn't being updated, I suspect the changes are in LO Writer 7.4.
Next I'll look at images, where the CSS has always needed checking unless the image is sized in pixels in the source and ends up as px in the CSS with no "auto". Last edited by Quoth; 02-25-2024 at 12:24 PM. |
02-24-2024, 05:10 PM | #2 |
Addict
Posts: 387
Karma: 1638210
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Jutoh, Kobo Forma
|
This is excellent stuff, thank you. I have been using Writer for years and struggling to get an odt document into a good epub with the least confusion and manual editing. (Currently on Writer 7.3.7.2, Calibre 7.2, on Pop_os 22.04, which is Ubuntu under the skin.)
Just to add another dimension, I recently started using Sigil as an experiment, since the window management of the Editor has become a bit iffy. There is a new version of Doitsu's ODT Import plugin for Sigil that is based on the writer2xhtml 1.7 tool. (https://www.mobileread.com/forums/sh...d.php?t=274536) I have been playing with this for just a week or so with a couple of book-length documents, so I am far from an expert, but it has some very nice features:
There is a config file that lets you fine tune it in extreme detail; I've hardly explored it except for break levels and dimension conversion.
I have it set to split on h1-h4, and that just simply works. Perfectly, with the h-levels respected in the epub.
It builds a competent TOC from the levels as broken, so you get the ideal 1 file per TOC entry.
I use a Writer template with my preferred styles pre-defined. I have a separate css file that matches...in that the class "indent" has a 1 cm indent in Writer and a 2em indent in the css file (for example). I have the Sigil ODT Import plugin set up to use my css file as a default. So when I pull in an odt using my "indent" class...there it is in the epub, named "indent" and using the 2em indent. Same for other custom styles, "firstpara", "verse1", you name it. Superb.
But, on the downside, it makes its own css as well, so I will also get an "indent" class in that file that uses the 1 cm from Writer. Easy enough to remove the competing file, but it is confusing. Picture handling is almost non-existent; all images need editing. But on the whole, it is the best and easiest odt to epub converter I have found yet. And as I said, I've only just begun to explore it.
EDIT: thought of a couple of other features: This produces no <span>s, unlike the Calibre ODT importer. For bold and italic it simply uses <b> and <i> tags. No <div>s either. Very simple. Last edited by retiredbiker; 02-24-2024 at 06:27 PM. |
02-25-2024, 06:31 AM | #3 |
the rook, bossing Never.
Posts: 11,166
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
I think that, apart from the issue of headings losing the <hx> tag (easily put back, and both style and page break are correct), the sequence of an extra Save As in docx (not editing the docx) and converting to epub2 in Calibre is far better than other approaches for a novel. All of the paragraph styles are automatically recreated as the correct CSS, including headings. The occasional inline formats are correctly converted to HTML. The only issue is the missing <hx> tags, which can be fixed.
The ODT import/conversion misses some page breaks and adds spurious spans, some of which are wrong. It's far better than Writer2epub or the built-in epub export of later LO Writer versions. So I will continue doing an extra Save As in docx and converting that in Calibre. It's pretty quick to find the headings and restore the <hx> tag. The work needed for any other approach is much greater. This is the section for Calibre posts, so I'm not going to write much about Sigil. I think it's of more advantage for reference works, text books and hand crafted epub2 than for a novel, as the odt and docx import plugins don't automatically map all wordprocessor styles to css like Calibre does. I have used Sigil, but Calibre suits my workflow and source documents better. I don't need to edit the converted epub at all for proofing or beta readers, only for final publishing. There are no tables, dropcaps, images with flowed text, or small caps to be wrangled. Rarely footnotes, and my system for them works fine. The Calibre docx to epub2 is very clean. I'll copy my docx from Linux to Windows 10 and Word 2007 to see whether it is the Save As docx in LO Writer that's losing the <hx> tags or the Calibre conversion. Last edited by Quoth; 02-25-2024 at 06:36 AM. |
02-25-2024, 07:33 AM | #4 |
the rook, bossing Never.
Posts: 11,166
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Calibre is not converting docx Levels to H tags!
Mysteriously I couldn't find any VM with Windows and Word 2007. They all have Word 2002/XP, which can only read docx with a plugin.
My WINE has Word 2003, which isn't much better, except that it works on WINE. So I installed my Word 2007 on my licensed Win10 VM and amazingly the online registration worked. Anyway, eventually I figured out the stupid ribbon and found Document Map and Paragraph styles. Those should be open by default; I have the LO equivalent always open, either docked or floating depending on screen. The docx export from LO Writer 7.4 was perfect. All the Chapter heads were Level 2. All other styles were also correct. The appearance also matched. So Calibre is no longer converting docx Levels to H tags! Though it's doing most of the TOC and file breaks correctly. I'm nearly sure it used to. Note that the paragraph styles in docx and odt that are not body text and have a Level don't have to have Heading in the name. Perhaps that's the issue? Curiously the CSS is correct. It's only the H tag that's missing. |
02-25-2024, 08:46 AM | #5 | |
the rook, bossing Never.
Posts: 11,166
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
The only <div> in the entire docx to epub2 was wrapping the Calibre inserted cover.
There are a few <spans> which made sense, such as Quote:
Anyway, the Edit Spans & Divs plug-in only offers to convert <p to <div, not h2, so I temporarily changed the one <div> to <p> and did a replace to <div matching the class only used for h2, with replace properties entire ticked. Then all </div> to </h2> and all <div to <h2. You could instead use a regex that captures the wildcard content and reuses it in the replacement. Both epub checks passed. Then I edited all the image css, replacing pt (from LO) with 1.3333 times as many px, or a % and auto for larger images (all of which are in a <p that has no margins and a center property). Looks just like the WP source, which I use a small page for. The paper version is edited from the final ebook source by making a copy with headers, footers, page numbering, different page styles (front, body, chapter start, end matter etc), registration, page size, margins and revised fonts/sizes/margins/padding in paragraph styles to suit paper. The big publisher way of doing a PDF first and then the ebook is back-to-front. |
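The regex route mentioned in this post could look roughly like the sketch below. It is only an illustration in Python, not what the plug-in or Calibre's editor does internally. The class name `block_3` is hypothetical (use whichever class Calibre generated for the heading style), and the pattern only handles a single class in the attribute.

```python
import re

def promote_to_heading(xhtml, css_class, level=2):
    """Turn <p class="css_class">...</p> into <hN class="css_class">...</hN>.

    Assumes the class attribute holds exactly one class name and that
    paragraphs are not nested (which valid XHTML guarantees).
    """
    pattern = re.compile(
        r'<p(\s[^>]*\bclass="' + re.escape(css_class) + r'"[^>]*)>(.*?)</p>',
        re.DOTALL,
    )
    # \1 is the captured attribute list, \2 the captured heading content.
    return pattern.sub(rf"<h{level}\1>\2</h{level}>", xhtml)

# "block_3" is a made-up class name used only for this example.
sample = '<p class="block_3">Chapter One</p><p class="block_1">Text</p>'
print(promote_to_heading(sample, "block_3"))
```

Because the attributes are carried over unchanged, the existing CSS class still applies to the new heading tag, so the appearance should stay the same.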
|
02-25-2024, 09:02 AM | #6 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Conversion of heading styles to <h> tags works fine provided the heading styles are named Heading 1, Heading 2, etc. See the demo docx file at
https://manual.calibre-ebook.com/con...word-documents |
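To see which paragraph styles in a docx would and would not meet that naming condition, something like the following sketch can help. It reads `word/styles.xml` from the docx (a docx is a zip archive) and lists each paragraph style with its outline level. The name test (`heading N`, case-insensitive) is an assumption made for illustration and is not taken from Calibre's source; the function names and `book.docx` are hypothetical.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/styles.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def read_styles_xml(docx_path):
    """Return the raw bytes of word/styles.xml from a .docx file."""
    with zipfile.ZipFile(docx_path) as zf:
        return zf.read("word/styles.xml")

def heading_style_report(styles_xml):
    """Return (style name, outline level or None, name looks like 'Heading N').

    Only levels set directly on the style are seen; levels inherited
    through basedOn are not resolved in this sketch.
    """
    rows = []
    for style in ET.fromstring(styles_xml).iter(W + "style"):
        if style.get(W + "type") != "paragraph":
            continue
        name_el = style.find(W + "name")
        name = name_el.get(W + "val") if name_el is not None else style.get(W + "styleId", "")
        lvl_el = style.find(f"{W}pPr/{W}outlineLvl")
        raw = int(lvl_el.get(W + "val")) if lvl_el is not None else 9
        level = raw + 1 if raw < 9 else None  # stored 0-based; 9 means body text
        named = re.fullmatch(r"heading [1-9]", name, re.IGNORECASE) is not None
        rows.append((name, level, named))
    return rows
```

Usage would be along the lines of `for row in heading_style_report(read_styles_xml("book.docx")): print(row)`. A style that has a level but a name outside the Heading pattern is a candidate for the styled <p> behaviour described earlier in the thread.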
02-25-2024, 11:16 AM | #7 | |
the rook, bossing Never.
Posts: 11,166
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Quote:
Thanks. (Also you are either up late or very early!) But that doesn't work if you need different styles at the same heading level. So for now I'll have to edit the final epub, which I have to do anyway to tidy up the image CSS. The <p> tags with correct style, TOC and page breaks for headings are fine for proofing, beta readers etc. The actual <hx> tag is only needed for semantic purposes, like perhaps TTS, because the TOC structure, layout/format/style, page breaks and appearance are all fine. Identical in fact! |
|
02-25-2024, 02:40 PM | #8 | |
the rook, bossing Never.
Posts: 11,166
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Quote:
Main WS is a Dell 7050 with UPS and 23″ 4K HDR screen. Main laptop is a Lenovo with 1920 x 1080 screen. Both hybrid SSD and HDD with /var and /home as HDD partitions. |
|
02-26-2024, 04:48 PM | #9 |
Member
Posts: 15
Karma: 50592
Join Date: Dec 2021
Device: Kobo Libra 2
|
I just tried to convert to epub and came across this thread. (I am not sure if I should start a new thread.)
LibreOffice 7.6, Calibre 7.5. I've got an odt document which is saved as docx. In the document there is one style like this: before text 1,2cm; after text 0cm; first line 0cm; above paragraph 0,3cm; below paragraph 0,3cm. After importing the docx into Calibre and converting to epub, the relevant css looks like this: display: block; text-indent: 0; margin: 8.5pt 0; padding: 0; When I set the margin to 8.5pt 0 8.5pt 34pt, the View in Calibre is OK. The former is without the left margin (0) and the latter is with the left margin of 34pt, or 1,2cm as set in LibreOffice. It's interesting that exporting the odt from LO directly to epub is OK regarding this issue. (margin: 0.1181in 0.0000in 0.1181in 0.4720in) Can someone tell me where the problem could be? |
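The figures in this post are consistent with each other, which a short calculation shows. This is a minimal sketch using the standard definitions 1in = 2.54cm = 72pt; the function names are made up for the example.

```python
PT_PER_INCH = 72.0
CM_PER_INCH = 2.54

def cm_to_pt(cm):
    """Convert centimetres to points."""
    return cm / CM_PER_INCH * PT_PER_INCH

def in_to_pt(inches):
    """Convert inches to points."""
    return inches * PT_PER_INCH

# Values set in LibreOffice (cm) and the ones LO's own epub export wrote (in)
print(round(cm_to_pt(1.2), 1))     # left indent, about 34.0
print(round(cm_to_pt(0.3), 1))     # space above/below, about 8.5
print(round(in_to_pt(0.4720), 1))  # about 34.0
print(round(in_to_pt(0.1181), 1))  # about 8.5
```

So 8.5pt and 34pt are the expected top/bottom and left margins, and the converted CSS is only missing the left value.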
02-26-2024, 05:10 PM | #10 |
the rook, bossing Never.
Posts: 11,166
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Never use cm for margins or padding for ebooks! That's for PDF output.
Use pt, and most ebook software will regard 12pt = 1em. Using em is best for margins, padding and fonts, but LO doesn't have that. So using pt is next best. So 18pt is 1.5em, and 16pt is a reasonable indent. The Kobo kepub format uses a slightly different equivalent for the pt rendering, so any epub intended to be a kepub should have everything converted to em for margins, padding and fonts. The body font should be 12pt as that is 1em. The body font size ought to be 1em always, and usually is if not defined. Also don't set line spacing, or else have the conversion settings remove CSS line-height, because its default is set by the font metrics if not set, and then the user can change it on the reader gadget or app. They can't if you set it in the ebook. docx to epub is fine using pt for margins, padding and fonts. Using cm is meaningless for ebooks. Images may or may not rescale with the user font size on the reader GUI if using pt. You should use px for images, though large ones may need auto and %. The image css will need to be adjusted manually. Last edited by Quoth; 02-26-2024 at 05:13 PM. |
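The pt to em rule of thumb above, and the pt to px factor of 1.3333 used earlier in the thread for images, can be applied to CSS text with a small script. This is a sketch only: it assumes 12pt = 1em as stated in the post and the CSS reference ratio of 96px to 72pt per inch, and the function names are invented for the example.

```python
import re

PT_PER_EM = 12.0     # rule of thumb from the post: 12pt body text = 1em
PX_PER_PT = 96 / 72  # CSS reference units: 96px and 72pt per inch

PT_LENGTH = re.compile(r"(\d*\.?\d+)pt\b")

def _fmt(value):
    """Format with up to 3 decimals, dropping trailing zeros."""
    return f"{value:.3f}".rstrip("0").rstrip(".")

def pt_to_em(css):
    """Rewrite every 'Npt' length in a CSS string as em (margins, padding, fonts)."""
    return PT_LENGTH.sub(lambda m: _fmt(float(m.group(1)) / PT_PER_EM) + "em", css)

def pt_to_px(css):
    """Rewrite every 'Npt' length in a CSS string as px (image sizes)."""
    return PT_LENGTH.sub(lambda m: _fmt(float(m.group(1)) * PX_PER_PT) + "px", css)

print(pt_to_em("text-indent: 18pt"))  # text-indent: 1.5em
print(pt_to_px("width: 150pt"))       # width: 200px
```

Unitless values such as a bare 0 are left alone, so a shorthand like `margin: 8.5pt 0 8.5pt 34pt` keeps its structure after conversion.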
02-27-2024, 02:58 PM | #11 | |
Member
Posts: 15
Karma: 50592
Join Date: Dec 2021
Device: Kobo Libra 2
|
Quote:
I switched LO to the pt measurement unit, without any effect. The margin in the css was again 8.5pt 0. But I had previously saved the odt as Word 2010 365 Document (.docx). When I chose Word 2007 (.docx), Calibre followed the style with a margin of 8.5pt 0 8.5pt 34pt. |
|
02-27-2024, 03:38 PM | #12 |
the rook, bossing Never.
Posts: 11,166
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
I don't see any Word 2010-365 option, just 2007-365.
What version of LO on which OS has that option? The newest Word I have is 2007. I'd not touch Google Docs or Office 365 unless I had to for online collaboration; I collaborate without those. |
02-27-2024, 04:56 PM | #13 |
Member
Posts: 15
Karma: 50592
Join Date: Dec 2021
Device: Kobo Libra 2
|
I've got, only for reading, without a licence, Word for Mac version 16. So it's probably from this installation that LO offers Word 2010-365 file type to save.
I was picking the latest Word without any thinking. |
02-27-2024, 05:12 PM | #14 |
the rook, bossing Never.
Posts: 11,166
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Maybe the Mac LO Writer offers that. I can't see how LO would be affected or care what MS software you have! I'm running Linux and it has the MS option I mentioned even if no MS software is installed, and any MS install would be on WINE anyway. Only Word 2003 works completely on WINE and it doesn't have docx. The docx patches for Word 2003 only install on Windows, and even then not always. I only have Word 2007 on Win10, which I don't use, though technically it's the only Word apart from 2003 that installs on WINE (but is rated lower).
Thanks. |
02-28-2024, 03:37 AM | #15 |
Member
Posts: 15
Karma: 50592
Join Date: Dec 2021
Device: Kobo Libra 2
|
I played with different LO versions and found out that Word 2010-365 file type was introduced with 7.6.0.1 version. Calibre seems not to cooperate well with this latest extension.
Installation of Word doesn't have any impact. |
|