Quote:
Originally Posted by conan50
If possible without removing all formatting.
|
Could be possible, could be possible.
Please contact me via PM and send me your file.
(Upload it to Google Drive or another file-sharing site and I can take a look.)
Quote:
Originally Posted by conan50
I have a fantasy novel that I've been working on for ages, around 500 pages long. I think I was using Word 97 when I first started writing it. Eventually I'm going to want to convert it to an ebook, but before I get there I'm trying to figure out the best way to clean up the entire file.
|
Styles.
For more info on the why/how, see three of my posts in "eBook Formatting in Sigil":
Post #48
Post #50
Post #52
especially the 2 videos I linked in "MS Word vs Open Office Word" (Post #3).
Quote:
Originally Posted by conan50
Editing is slow, as in the file is unwieldy compared to files of similar size, which makes me think there is excess garbage in the code from numerous conversions over the years from doc to odt to docx.
My question, what is the best way to clean up a large docx file that likely is loaded with conversion artifacts?
|
Yep, definitely sounds like a bunch of cruft built up and hidden in the background.
... Especially if you've been using the same file for over 20 years and saving it in various programs/formats. Who knows what crap crept in.
In your case, it may be best to export to a super clean/minimalist document, then reimport back so that you're starting from a proper foundation.
Once you have that fantastic base, everything else becomes better.
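If you're curious where the bloat is hiding before you rebuild: a .docx is just a ZIP archive of XML parts, so you can list the parts by size. Here's a minimal sketch in Python (the function name largest_parts and the filename in the usage comment are mine, purely for illustration):

```python
import zipfile


def largest_parts(docx_file, top=10):
    """Return (uncompressed_size, part_name) pairs for the biggest parts.

    docx_file may be a path or a file-like object; a .docx is a ZIP archive.
    """
    with zipfile.ZipFile(docx_file) as archive:
        sizes = [(info.file_size, info.filename) for info in archive.infolist()]
    # Biggest first; ties are broken by name so the order is stable.
    sizes.sort(key=lambda pair: (-pair[0], pair[1]))
    return sizes[:top]


# Example (hypothetical filename):
#   for size, name in largest_parts("my_novel.docx"):
#       print(f"{size:>12,}  {name}")
```

If word/styles.xml or word/document.xml is enormous compared to the amount of actual text, that can be a hint that old styles and direct formatting have piled up. It won't clean anything, it only shows you where to look.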
Quote:
Originally Posted by conan50
Thanks folks! Sounds like it is going to be a big job to clean it up. Pretty much what I figured.
|
Generating clean documents has never been easier.
There are tools (like Toxaris's EPUBTools) to generate super clean ebooks from your Word files.
You still might have to put in a little elbow grease to get some of the more complicated formatting back (like blockquotes/poetry/footnotes), but the vast bulk of the document can be converted super cleanly.
Quote:
Originally Posted by retiredbiker
The traditional "dynamite" approach to fix this is to copy all the text into a plain text editor to remove ALL the formatting.
[...]
Unfortunately, this will blow away italics, and that can be a real pain to put back in if they are used a lot. I found a work-around for this that I used on a book or two with good results. It's a nasty thing, but for what it's worth, I'll add it. I used Writer, but something similar could be done with Word:
[...]
|
I referenced this in passing just 2 days ago.
There's a much easier way of doing this using Word's/LibreOffice's Advanced Find and Replace.
No need to introduce their disgusting HTML exports/imports.
Instead, you search for the word processor's italic formatting within Advanced Find and Replace, then replace each match with the same text wrapped in your own "markdown".
Last year, I wrote step-by-step instructions for:
The 1st one went from (where "italics" and "more italics too" are italicized in the word processor):
Code:
This is italics and more italics too.
to:
Code:
This is \emph{italics} and \emph{more italics too}.
and the 2nd one went from:
Code:
This is <i>italics</i> and <i>more italics too</i>.
to (with the two phrases turned back into real italic formatting):
Code:
This is italics and more italics too.
Microsoft Word and LibreOffice have slightly different buttons, but the concept is the same.
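Once the italics are sitting in the text as plain markers, they survive any trip through a plain-text editor, and swapping one marker style for another is a one-line regex. Here's a rough sketch in Python, done outside the word processor (the function names are mine, and it assumes markers are never nested, never span a line break, and that the italic text contains no "}" of its own):

```python
import re

# Non-greedy matches so each marker pair is handled separately.
ITALIC_TAG = re.compile(r"<i>(.*?)</i>")
EMPH_MACRO = re.compile(r"\\emph\{(.*?)\}")


def tags_to_emph(text):
    """Turn <i>...</i> markers into \\emph{...} markers."""
    # A function replacement avoids backslash-escape surprises in re.sub.
    return ITALIC_TAG.sub(lambda m: "\\emph{" + m.group(1) + "}", text)


def emph_to_tags(text):
    """Turn \\emph{...} markers back into <i>...</i> markers."""
    return EMPH_MACRO.sub(lambda m: "<i>" + m.group(1) + "</i>", text)


print(tags_to_emph("This is <i>italics</i> and <i>more italics too</i>."))
# prints: This is \emph{italics} and \emph{more italics too}.
```

This only shuffles markers around in plain text. Turning the markers back into real italic formatting is still the job of the word processor's Find and Replace.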