MobileRead Forums > E-Book Formats > Workshop

12-22-2012, 06:12 AM   #1
bizzybody
Addict
Posts: 286
Karma: 7742186
Join Date: Apr 2007
Location: Idaho, USA
Device: Various PalmOS PDAs, Android Phones, Sharper Image Literati
Fixing a document with too many carriage returns?

I want to convert a PDF that was written by someone who began writing long before personal computers, possibly even before computers numbered more than a few dozen in the world...

The problem is that the source document is formatted as if it were written on a typewriter, with double-spaced lines and only indents to mark paragraphs. Somewhat randomly, there's an additional blank line between paragraphs.

When I convert it, every single line becomes its own paragraph, so what comes out is a string of sentence fragments with blank lines between them. On my phone screen each fragment is about 1.5 lines.

Is there any way to fix this automatically, even partially, or am I stuck scrolling through it, manually deleting every extraneous carriage return and replacing it with a space?

Some "silver citizen" authors take to computers like a duck to water, some try to treat them like extra fancy electric typewriters, using fixed line lengths, double spaces after punctuation and all the other manual formatting one had to do with ink smacked onto paper. Much easier to simply write the paragraphs and do nothing special other than a single blank line between them. Let the software handle all the formatting and flow the text.

I cut my computer writing teeth on WordStar on a Xerox 820-II CP/M computer. It was like an extra fancy electric typewriter! Took me a while to get used to the more "free flowing" capability of word mangling software for Windows and stop doing things like hitting Enter at the end of every line in e-mails.
12-22-2012, 06:52 AM   #2
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
Well, most of it is relatively easy. Which program do you want to edit in? In general, if a line ends in a lower case letter, or the next line starts with a lower case letter, then the paragraph break is incorrect (not always true, but almost always). You can fix that with a RegEx. The correct syntax really depends on the source document and the editing program.
If a new paragraph always starts with an indent, I think I can come up with an S&R command to handle the remaining lines. Again, no 100% guarantee; it really depends on the programs you use.

If you tell us which program you use, we might be able to help you.
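
For anyone working outside a word processor, the lower-case rule described above could be sketched in Python roughly like this. It assumes the text from the PDF has already been saved as plain text; the function name `join_lowercase_breaks` is made up for this illustration. Breaks followed by an indent are left alone, so indented paragraph starts should survive.

```python
import re

def join_lowercase_breaks(text: str) -> str:
    """Replace a line break with a space when the line before it ends in a
    lower case letter or the line after it starts with one."""
    text = text.replace("\r\n", "\n")  # normalise Windows line endings
    # Rule 1: the break follows a lower case letter and the next
    # non-blank line is not indented.
    text = re.sub(r"([a-z])\n+(?=\S)", r"\1 ", text)
    # Rule 2: the break is followed directly by a lower case letter.
    text = re.sub(r"(?<=\S)\n+([a-z])", r" \1", text)
    return text

sample = (
    "The quick brown\n\nfox jumps over,\n\nthen rests.\n\n"
    "    A new paragraph\n\nstarts here."
)
print(join_lowercase_breaks(sample))
```

As noted above, this is only a heuristic: a line that ends in punctuation and is followed by a capital letter is not joined, so some manual cleanup would still be needed.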
12-22-2012, 07:43 AM   #3
bizzybody
Addict
I have Word 2003 on XP Pro.
12-22-2012, 08:17 AM   #4
Toxaris
Wizard
Word 2003. Hmm, OK. I can't guarantee this will work, since I haven't used that version for quite some time.

Try the following S&Rs (activate wildcards!):
Search for: ([a-z])^13
Replace with (there is a space after the 1): \1

Search for: ^13([a-z])
Replace with (there is a space before the \1): \1

That should take care of most cases. Line breaks after commas and other typographic symbols are not taken into account. Handling them is possible, but it might need some trial and error. Word is not always predictable.
Processing the lines with an indent would need a macro. That can't be done with a simple search and replace.
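
As a rough idea of what such a macro would have to do, here is a small Python sketch that works on a plain-text export instead of inside Word. It is not a Word macro, and it rests on assumptions: every paragraph starts with an indent, no other line is indented, and blank lines carry no meaning. The name `rebuild_paragraphs` and the `min_indent` parameter are invented for this example.

```python
def rebuild_paragraphs(text: str, min_indent: int = 2) -> str:
    """Join lines into paragraphs, starting a new paragraph at each
    indented line and ignoring blank lines."""
    paragraphs = []
    current = []
    for raw in text.splitlines():
        line = raw.expandtabs(4)  # treat a tab as four spaces
        if not line.strip():
            continue  # blank lines come from the double spacing
        leading = len(line) - len(line.lstrip(" "))
        if leading >= min_indent and current:
            # An indent marks a new paragraph: close the previous one.
            paragraphs.append(" ".join(current))
            current = []
        current.append(line.strip())
    if current:
        paragraphs.append(" ".join(current))
    # Separate paragraphs with a single blank line.
    return "\n\n".join(paragraphs)

typed = (
    "    First line of one.\n\nStill one. More\n\ntext here.\n\n\n"
    "    Second starts.\n\nAnd ends."
)
print(rebuild_paragraphs(typed))
```

If the conversion step strips leading spaces before the text reaches this point, the indent test has nothing to work with, so the export would need to be checked first.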