You can probably do it all in Sigil (and a tab is a unique character that Search & Replace can find; it should never appear in a novel).
Obviously Sigil is a great tool for creating complex ebooks, but if you are downloading or importing finished ebooks you didn't make, then Calibre is a more likely starting place. I'd never use Calibre to fix odt/docx/RTF source etc., only content that is already an ebook (which IMO excludes PDF, as that format is not intended to be edited or reflowed; it's designed for WYSIWYG publishing and print).
I also have two alternative sets of conversion filters for all ebooks I add to Calibre:
1) Commercial ebooks: automatically remove all white-space and line-height CSS.
2) Gutenberg ebooks: the above, plus remove the space between paragraphs, set a 1.3em indent, apply smart punctuation and force full justification (only affects body text).
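As a rough illustration of what filter 1) does, here is a minimal Python sketch that strips `white-space` and `line-height` declarations from a stylesheet. `filter_css` is a hypothetical helper, not Calibre's code, and it uses a regex rather than a real CSS parser, so comments or unusual syntax may trip it up. In Calibre itself you don't need to script this: it's the Filter style information option under Look & feel in the conversion dialog.

```python
import re

# Properties removed by the "commercial ebook" filter described above.
FILTERED_PROPERTIES = {"white-space", "line-height"}

# One declaration: "name: value" up to the next ";" (or up to "{" / "}").
DECLARATION = re.compile(r"([A-Za-z-]+)\s*:\s*[^;{}]*;?")


def filter_css(css: str, properties=FILTERED_PROPERTIES) -> str:
    """Return css with every declaration of the given properties removed."""

    def drop(match: re.Match) -> str:
        # Keep anything whose name isn't filtered (including selectors
        # such as "a:hover", which this regex also happens to match).
        name = match.group(1).lower()
        return "" if name in properties else match.group(0)

    return DECLARATION.sub(drop, css)


if __name__ == "__main__":
    print(filter_css("p { color: red; line-height: 1.5; white-space: pre; }"))
```

The removed declarations leave their surrounding spaces behind, which is harmless in CSS.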
I convert even epub to epub.
If I need a non-epub format, I convert from a known "correct" epub. I create PDFs from LO Writer after fixing styles, fonts, headings, headers, footers, page styles (front matter, body, back matter and others), footnotes, page numbers, endnotes etc. The only PDFs I add to Calibre are ones obtained as PDFs that work on the 8″ Sage (instruction booklets etc.), plus some picture books for the tablet. Most of my PDFs are kept in directory structures and are non-fiction.
If I open an epub on the reader and it's an imported one (Gutenberg, other PD, commercial etc.) that's "too awkward" to read, I'd fix it in Calibre.
I'd very rarely export an ebook to docx to edit it, and only if it's a larger omnibus PD title that is excessively badly proofed and formatted. Maybe three times in 15 years.
Saved Searches is a nice feature.
I have used Sigil, but I'd probably only use it to help build a non-fiction book or a textbook. A simple novel with almost no illustrations, saved as an extra docx copy from the odt, converts perfectly and automatically, with all styles mapping to CSS, assuming:
Everything is done with styles
Every page break starts with a heading in the TOC. No explicit page breaks are added (except for PDF); instead, define Insert Page Break Before on the style of any TOC heading.
Only one page style, of a smaller size.
No headers, footers or page numbers.
No conventional footnotes or endnotes (there is a limited solution that works)
No lists (all simulated with non-list paragraph styles)
Only three heading levels (usually 1 or 2)
Only free fonts
No columns
No frames
No tabs, only one space ever, no empty paragraphs
Subscripts & superscripts fit within the enclosing paragraph's line height (possible by editing the style in LO Writer)
No line spacing
No formula/maths.
Images anchored as character and in their own paragraph.
If there is more than one image in a paragraph, they are all the same height.
No text in an image paragraph.
Templates are used to start a new document.
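If you want to check an epub chapter against a few of these rules (no tabs, only one space, no empty paragraphs), a rough Python sketch like the one below can flag offenders. `check_paragraphs` is a hypothetical helper; it uses regexes instead of a proper XHTML parser and assumes a paragraph's source isn't hard-wrapped with indentation, so treat its output as hints. In Sigil, a regex Search for `\t` or for two spaces does much the same job interactively.

```python
import re

# Rough patterns: a <p> element (any attributes) and any tag inside it.
PARAGRAPH = re.compile(r"<p\b[^>]*>(.*?)</p>", re.DOTALL)
TAG = re.compile(r"<[^>]+>")


def check_paragraphs(xhtml: str) -> list[str]:
    """Report tabs, runs of spaces and empty paragraphs, by paragraph number."""
    problems = []
    for number, match in enumerate(PARAGRAPH.finditer(xhtml), start=1):
        inner = match.group(1)
        # Strip inline tags and treat non-breaking spaces as ordinary spaces.
        text = TAG.sub("", inner)
        for nbsp in ("&nbsp;", "&#160;", "\u00a0"):
            text = text.replace(nbsp, " ")
        if "\t" in text:
            problems.append(f"paragraph {number}: tab character")
        if "  " in text:
            problems.append(f"paragraph {number}: more than one space in a row")
        # An image paragraph has no text by design, so it isn't "empty".
        if not text.strip() and "<img" not in inner:
            problems.append(f"paragraph {number}: empty paragraph")
    return problems


if __name__ == "__main__":
    sample = '<p>One\ttwo</p><p>Three  four</p><p>&nbsp;</p><p><img src="a.jpg" alt=""/></p>'
    for problem in check_paragraphs(sample):
        print(problem)
```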
Proofing/annotations are done on the epub on the ereader, and the annotations are pasted back into a tabbed text editor such as Notepad++ or KATE configured for text rather than programming. They are only edited back into the epub if the source is an ebook. If the source is docx or odt, they are edited back into an incremented version number of the odt.
If PDF is needed, then a new copy of the document is edited.
If the source is one or more web pages, the images and text may have to be copied and pasted separately, with the text pasted as Unformatted. It depends. I avoid having to do that.
Note that in LO Writer you can search for direct formatting (say, italics) by regex (a . matches any character) with the format defined and "include styles" set. Use Find All so everything found is selected, then double-click a suitable character style and all the found direct-format italics will now be a style.
There are also two kinds of Clear direct format icons.
The built-in Search & Replace can't search for character styles (it can S&R one paragraph style to a different one), but if you go to Search for Extensions and add Altsearch, it can S&R character styles.
The important style types are Paragraph (headings are a flavour of that), Character, Graphic and, only for paper or PDF, Page styles.
You should always have the Outline window and the Styles window open in Word or LO Writer, and remove ALL the toolbars except the Status bar.
Last edited by Quoth; 07-22-2025 at 01:05 PM.