#1 |
Groupie
Posts: 191
Karma: 1341124
Join Date: Aug 2010
Device: Kindle 3
|
Broken Paragraphs in MS Word
Sorry, it seems VB stuff is unwanted.
Last edited by LordP; 07-15-2025 at 01:16 PM. |
#2 |
Still reading
Posts: 14,073
Karma: 105206895
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
|
Define "Broken paragraph"?
For over 20 years I and others have disabled VB in MS Office for security. Can't this be done with search or search and replace? |
#3 | |
Groupie
Posts: 191
Karma: 1341124
Join Date: Aug 2010
Device: Kindle 3
|
Quote:
A text file that looks like this:
This ebook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this ebook or online at www.gutenberg.org. If you are not located in the United States, you will have to check the laws of the country where you are located before using this eBook. |
|
#4 |
Groupie
Posts: 191
Karma: 1341124
Join Date: Aug 2010
Device: Kindle 3
|
|
#5 |
Still reading
Posts: 14,073
Karma: 105206895
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
|
That text file can be fixed with a regex search and replace. Doesn't need VBA. I've been fixing that for years in Word and now in LO Writer.
I've used word-processing for manuals, project proposals, training materials and novels, with MS Word for Windows since about Office 4.3. In maybe over 25 years I've never enabled or needed VBA in Word, and only about twice in Excel (for stuff that should have been a program). I used to teach word-processing and DTP as well as programming (real VB6, C++, Modula-2, Forth, SQL etc.).
Edit: If it's a downloaded epub it's also easily fixed in Calibre, without exporting as a docx and re-converting back to epub.
Last edited by Quoth; 07-15-2025 at 02:19 PM. |
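For a plain-text file like the Gutenberg excerpt quoted earlier in the thread, the search-and-replace idea can be sketched in Python. This is only an illustration, not the regex anyone here uses in Word or LO Writer. It assumes that paragraphs are separated by a blank line and that every single line break is an unwanted hard wrap. The function name `join_broken_paragraphs` is made up for the example.

```python
import re


def join_broken_paragraphs(text: str) -> str:
    """Join hard-wrapped lines; blank lines are kept as paragraph breaks.

    Assumption: a blank line marks a real paragraph break, and any
    single newline is a wrap that should become a space.
    """
    # Normalise Windows / old Mac line endings so only "\n" remains.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop leading/trailing newlines so they are not turned into spaces.
    text = text.strip("\n")
    # A newline with no newline directly before or after it is a wrap.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces that the join may have created.
    return re.sub(r" {2,}", " ", text)
```

Calling it on text whose lines break mid-sentence returns one line per paragraph, with the blank lines between paragraphs left in place. Files that use indentation instead of blank lines to mark paragraphs would need a different pattern.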
#6 |
Wizard
Posts: 2,625
Karma: 3120635
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
|
#7 |
Still reading
Posts: 14,073
Karma: 105206895
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
|
If it's an ebook it's easiest to fix in Calibre.
I'm no expert on regex, so I decline to look like an idiot and post the idiotic regexes I use in LO Writer.
I replace all tabs with a space.
I replace multiple spaces with a space.
I replace a space at the start of a paragraph with nothing.
I replace a space at the end of a paragraph with nothing.
I replace empty paragraphs with nothing.
I have regexes to find illegal space with punctuation (illegal in English; French is different).
I have a spreadsheet with the list of docs in the first column; the headings of the other columns are the regexes, ready to copy/paste. When a regex has been run on a doc, I put a checkmark in that column. Other sheets have revision level, status, etc.
Only paragraphs that are headings or certain lists (e.g. contents) should end with no punctuation. Those can be found with a regex.
However, if it's a downloaded ebook rather than a source docx/odt, then regex and Diap's Editing Toolbag (global change of HTML tags etc.) in Calibre is best. |
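The cleanup steps listed in this post can be sketched in Python as follows. These are not the poster's LO Writer regexes, which were not shared. The sketch is a guess at equivalent logic, assuming the document has already been split into a list of paragraph strings. The names `clean_paragraphs`, `suspects`, `SPACE_BEFORE_PUNCT` and `NO_END_PUNCT` are invented for the illustration.

```python
import re


def clean_paragraphs(paragraphs):
    """Apply the whitespace steps in order and drop empty paragraphs."""
    cleaned = []
    for para in paragraphs:
        para = para.replace("\t", " ")       # tabs -> a space
        para = re.sub(r" {2,}", " ", para)   # multiple spaces -> one space
        para = para.strip(" ")               # no space at start or end
        if para:                             # skip paragraphs left empty
            cleaned.append(para)
    return cleaned


# A space directly before punctuation is wrong in English text
# (French typography differs, so this check is English-only).
SPACE_BEFORE_PUNCT = re.compile(r" [,.;:!?]")
# Last character is a letter, digit or underscore: normal for headings
# and contents entries, worth a look anywhere else.
NO_END_PUNCT = re.compile(r"\w$")


def suspects(paragraphs):
    """Return paragraphs that need a manual look; nothing is changed."""
    return [
        p for p in paragraphs
        if SPACE_BEFORE_PUNCT.search(p) or NO_END_PUNCT.search(p)
    ]
```

The two checks only report candidates rather than fixing them, since a paragraph with no closing punctuation may be a legitimate heading.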
#8 | |
A Hairy Wizard
Posts: 3,355
Karma: 20171571
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
Quote:
Nothing beats a workflow that you already have set up! However, if you haven't set it up in Calibre yet, you can use Sigil's "Mend and Prettify" function to do the first 4 of those steps (not positive about the tabs) with a click of the mouse. Then you can add as many more regexes as you wish to a Saved Searches group to do them all with another click. The only extra step is having to save your document as an HTML file before importing to Sigil... but you have to do that anyway to make an ePub. |
|
#9 |
Still reading
Posts: 14,073
Karma: 105206895
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
|
Probably can do it all in Sigil (and a tab is a unique character that can be found by Search & Replace and should never be in a novel).
Obviously Sigil is a great tool for creating complex ebooks, but if you are downloading / importing finished ebooks you didn't make, then Calibre is a more likely starting place. I'd never use Calibre to fix odt/docx/RTF source etc., only content that is already an ebook (which IMO excludes PDF, as that is a format not intended to be edited or reflowed; it's designed for WYSIWYG publishing / print).
Also I have two alternative sets of conversion filters for all ebooks I add to Calibre:
1) Automatically remove all white-space CSS and line-height CSS (commercial ebooks).
2) If it's from Gutenberg: the above, plus remove space between paragraphs, set a 1.3em indent, smart punctuation, force full justification (only affects body).
I convert even epub to epub. If I need non-epub, I convert from a known "correct" epub.
I create PDF from LO Writer after fixing styles, fonts, headings, header, footer, page styles (front matter, body, back matter and others), footnotes, page numbers, endnotes etc. I only add PDFs to Calibre that are obtained as PDFs and work on the 8″ Sage (instruction booklets etc.) and some picture books for tablet. Most PDFs are in directory structures and are non-fiction.
If I open an epub on the reader and it's an imported one (Gutenberg, other PD, commercial etc.) that's "too awkward" to read, I'd fix it in Calibre. I'd very rarely export an ebook to docx and edit it, and only if it's a larger omnibus PD title that is excessively badly proofed and formatted. Maybe three times in 15 years.
Saved Searches is a nice feature. I have used Sigil, but I'd probably only use it to help build a non-fiction book or textbook.
A simple novel with almost no illustrations, with an extra docx save from the odt, converts perfectly and automatically, with all styles mapping to CSS, assuming:
Everything is styles.
All pagebreaks start with a heading in the TOC.
No explicit pagebreaks added (except for PDF); define Insert Page Break Before on the style of any TOC header.
Only one smaller size page style.
No headers, footers or page numbers.
No conventional footnotes or endnotes (there is a limited solution that works).
No lists (all simulated with non-list paragraph styles).
Only three heading levels (usually 1 or 2).
Only free fonts.
No columns.
No frames.
No tabs, only one space ever, no empty paragraphs.
Subscripts & superscripts fit within the enclosing paragraph's line height (possible by style edit in LO Writer).
No line spacing.
No formula/maths.
Images anchored as character and in their own paragraph. If more than one image in a paragraph, same height. No text in an image paragraph.
Templates are used to start a new document.
Proofing / annotations are done on the epub on the ereader, and the annotations are pasted back into a tabbed text editor such as Notepad++ or KATE configured for text rather than programming. They are only edited back to epub if the source is an ebook. If the source is docx or odt, they are edited back to an incremented version number of the odt. If PDF is needed, then a new copy of the document is edited.
If the source is a web page or multiple web pages, the images and text may have to be copied / pasted separately, with the text pasted as Unformatted. Depends. I avoid having to do that.
Note that in LO Writer you can search for direct formatting (say italics) by regex (a . matches any character) with the format defined and "include styles". Then you can double click on a suitable character style and all the found direct-format italics will now be a style. There are also two kinds of Clear direct format icons. The built-in Search & Replace can't search for character styles (it can S&R a paragraph style to a different one), but go to Search for Extensions and add Altsearch and it can S&R character styles.
The important style types are Paragraph (heading is a flavour of that), Character, Graphic and, only for paper or PDF, page styles.
You should always have the Outline window and the Styles window open in Word or LO Writer and remove ALL the toolbars except Status.
Last edited by Quoth; Yesterday at 01:05 PM. |
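The first conversion filter above (removing white-space and line-height CSS) is done through Calibre's own conversion settings. Purely to show what such a filter does to a stylesheet, here is a rough Python sketch. It is not Calibre's implementation, and the function name `strip_css_properties` is hypothetical. It uses a plain regex, so it assumes simple stylesheets without those property names appearing inside strings or comments.

```python
import re


def strip_css_properties(css: str, properties=("line-height", "white-space")) -> str:
    """Remove every declaration of the named properties from a CSS string."""
    names = "|".join(re.escape(p) for p in properties)
    # Match "name: value;" or a final "name: value" before the closing
    # brace. The lookbehind stops a match inside a longer property name.
    pattern = re.compile(
        rf"(?<![\w-])(?:{names})\s*:[^;}}]*;?\s*",
        re.IGNORECASE,
    )
    return pattern.sub("", css)
```

Other declarations in the same rule, such as margins or text-indent, are left untouched, so the reader's own line spacing setting can take over.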