12-22-2012, 06:12 AM | #1 |
Addict
Posts: 286
Karma: 7742186
Join Date: Apr 2007
Location: Idaho, USA
Device: Various PalmOS PDAs, Android Phones, Sharper Image Literati
|
Fixing a document with too many carriage returns?
I'm wanting to convert from a PDF that was written by someone who began writing long before personal computers, possibly even before computers numbered more than a few dozen in the world...
The problem is that the source document is formatted as if it had been written on a typewriter, with double-spaced lines and only indents to mark paragraphs. Somewhat randomly there's an additional blank line between paragraphs. When I convert it, every single line becomes its own paragraph, so what comes out is a string of sentence fragments with blank lines between them, each of which is about 1.5 lines long on my phone screen. Is there any way to fix this even partially automatically, or am I stuck scrolling through it, manually deleting every extraneous carriage return and replacing it with a space?
Some "silver citizen" authors take to computers like a duck to water; others try to treat them like extra fancy electric typewriters, using fixed line lengths, double spaces after punctuation and all the other manual formatting one had to do with ink smacked onto paper. It is much easier to simply write the paragraphs and do nothing special other than leave a single blank line between them. Let the software handle all the formatting and flow the text.
I cut my computer writing teeth on WordStar on a Xerox 820-II CP/M computer. It was like an extra fancy electric typewriter! It took me a while to get used to the more "free flowing" capability of word mangling software for Windows and to stop doing things like hitting Enter at the end of every line in e-mails. |
12-22-2012, 06:52 AM | #2 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
Well, most of it is relatively easy. Which program do you want to edit in? In general, if a line ends in a lower case letter, or the next line starts with a lower case letter, then the paragraph break between them is incorrect (not always true, but almost always). You can fix those with a RegEx. The correct syntax really depends on the source document and the editing program.
If a new paragraph always starts with an indent, I think I can come up with an S&R command to solve the remainder of the lines. Again, there is no 100% guarantee; it really depends on the programs you use. If you tell us which program you use, we might be able to help you. |
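For anyone working outside a word processor, the lower case rule described above can be sketched in a few lines of Python. This is only an illustration, not something from the thread: it assumes the PDF text has already been exported to a plain-text string with `\n` line breaks, and the function name `join_soft_breaks` is made up for the example. It also swallows the blank lines left over from double spacing, which is an extra assumption.

```python
import re


def join_soft_breaks(text: str) -> str:
    """Replace line breaks with a space when a lowercase letter sits
    directly before or directly after the break (hypothetical helper)."""
    # Break preceded by a lowercase letter: the sentence is probably
    # still running, so the break is not a real paragraph end.
    text = re.sub(r"(?<=[a-z])\n+", " ", text)
    # Break followed by a lowercase letter: the next line probably
    # continues the previous one (covers lines ending in a comma, etc.).
    text = re.sub(r"\n+(?=[a-z])", " ", text)
    return text


sample = "The quick brown\n\nfox jumps over the lazy dog.\nA new paragraph starts\nhere."
print(join_soft_breaks(sample))
```

As with the rule itself, this is a heuristic: a line that ends in punctuation and is followed by a capitalised word is left alone, whether or not it is a true paragraph break.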
12-22-2012, 07:43 AM | #3 |
Addict
Posts: 286
Karma: 7742186
Join Date: Apr 2007
Location: Idaho, USA
Device: Various PalmOS PDAs, Android Phones, Sharper Image Literati
|
I have Word 2003 on XP Pro.
|
12-22-2012, 08:17 AM | #4 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
Word 2003. Hmm, ok. I cannot guarantee it will work, since I haven't used that version for quite some time.
Try the following S&Rs (activate wildcards!):
Search for: ([a-z])^13
Replace with (there is a space after the 1): \1 
Search for: ^13([a-z])
Replace with (there is a space before the 1):  \1
That should take care of most cases. Commas and other typographic symbols are not taken into account. Handling them is possible, but it might need some trial and error. Word is not always predictable.
Processing the lines with an indent would need a macro; that cannot be done with a simple search and replace. |
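The indent case mentioned above, which needs a macro in Word, could be approached along these lines in Python. This is a rough sketch under stated assumptions, not a tested recipe for this particular PDF: it assumes the text is available as a plain-text string, that the paragraph indents survived the export as leading spaces or tabs, and that every non-indented line continues the previous paragraph. The name `reflow_by_indent` is hypothetical.

```python
def reflow_by_indent(text: str) -> str:
    """Rebuild paragraphs, treating an indented line as a paragraph start
    (hypothetical helper; assumes indents are leading spaces or tabs)."""
    paragraphs = []
    for line in text.splitlines():
        if not line.strip():
            # Blank lines come from double spacing and are ignored.
            continue
        starts_paragraph = line[0] in " \t"
        if starts_paragraph or not paragraphs:
            paragraphs.append(line.strip())
        else:
            # Non-indented line: glue it onto the current paragraph.
            paragraphs[-1] += " " + line.strip()
    # One blank line between paragraphs; the reader software flows the rest.
    return "\n\n".join(paragraphs)


sample = "    First line of one\n\nparagraph continues.\n\n\n    Second paragraph\n\nends here.\n"
print(reflow_by_indent(sample))
```

If the export drops the indents, this approach has nothing to work with, and the lower case heuristic is the fallback.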