10-27-2010, 10:11 PM   #2
theducks
Well trained by Cats
Posts: 31,108
Karma: 60406498
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by bfollowell
I have an epub that someone did a totally horrible formatting job on. Rather than have each paragraph defined as, of all things, a paragraph, basically each chapter is a paragraph with breaks and extra line spacing between what should be paragraphs. I've tried to clean it up a little by converting with Calibre but have had no luck at all. Since each chapter is really just one giant paragraph, there is no way to remove the extra spacing between the paragraph groups. They aren't really paragraphs as far as Calibre is concerned, so there's nothing to remove.

I'm almost afraid to ask but, is there an easy way to remove all these breaks and clean this up, or am I basically going to have to edit this whole 500-page book by hand in order to clean it up?

I'd love to hear any advice anyone may have to offer.

Thanks.

- Byron Followell
Kinda hard to come up with a precise solution just from your description.

Are you looking at the file in Code View?
Are these line spaces <br /> tags? If so, you could try replacing each one with </p><p> (note that you are ending the previous paragraph and starting another).

Also note that a Replace All may catch the few legitimate breaks in the overhaul.

On a second pass, you then need to decide what to do with the multiple empty paragraphs (<p></p>) that are left.
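Here is a minimal Python sketch of those two passes, in case you would rather script it than use the editor's search and replace. It assumes the breaks really are plain <br /> tags inside one big <p>. The function names and the sample string are made up for illustration, and regexes on HTML are fragile, so work on a copy.

```python
import re

# Assumption: breaks appear as <br>, <br/> or <br /> (any case).
BR = re.compile(r"<br\s*/?>", re.IGNORECASE)
# An "empty" paragraph here means only whitespace or &nbsp; between the tags.
EMPTY_P = re.compile(r"<p(?:\s[^>]*)?>(?:\s|&nbsp;)*</p>\s*", re.IGNORECASE)

def breaks_to_paragraphs(html: str) -> str:
    """Pass 1: end the current paragraph and start a new one at every break."""
    return BR.sub("</p>\n<p>", html)

def drop_empty_paragraphs(html: str) -> str:
    """Pass 2: remove the empty paragraphs left behind by doubled breaks."""
    return EMPTY_P.sub("", html)

sample = "<p>First para.<br /><br />Second para.<br />Third para.</p>"
step1 = breaks_to_paragraphs(sample)
step2 = drop_empty_paragraphs(step1)
print(step2)
```

Pass 1 also converts any legitimate breaks, which is the Replace All risk mentioned above, so check the result before saving.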

Other than that, you might need to craft a regex that finds whatever is unique about those breaks.
After each successful cleaning pass, do a save. Abort (reload without saving) if you get undesired results.
Clean with small fixes over multiple passes instead of trying for just one Replace.
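As one example of "finding the uniqueness": if the unwanted paragraph gaps turn out to be runs of two or more breaks while the legitimate breaks are single, a regex can target only the runs. That pattern is a guess about your file, not something I can confirm from the description, and the names below are hypothetical.

```python
import re

# Assumption: a paragraph gap is two or more consecutive breaks,
# possibly separated by whitespace; a single break is left alone.
DOUBLE_BR = re.compile(r"(?:<br\s*/?>\s*){2,}", re.IGNORECASE)

def split_on_double_breaks(html: str) -> str:
    """Turn each run of 2+ breaks into a paragraph boundary."""
    return DOUBLE_BR.sub("</p>\n<p>", html)

sample = "<p>Line one<br />still para one.<br /><br />\nPara two.</p>"
result = split_on_double_breaks(sample)
print(result)
```

Run it on one chapter first and compare before and after, the same as saving between passes in the editor.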


BTW:
Have you tried saving the file as text (no HTML, looks as it would print), then having Calibre convert it using whichever of these fits best:
1) Treat multiple blank lines as paragraph markers.
2) Remove extra blank lines and treat indents as paragraph markers.
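
To show what option 1 amounts to, here is a rough Python sketch that treats blank lines in a text file as paragraph markers. This is only an illustration of the idea, not Calibre's actual code, and text_to_paragraphs is a made-up name.

```python
import re
from html import escape

def text_to_paragraphs(text: str) -> str:
    """Split plain text on blank lines and wrap each block in <p>...</p>."""
    # One or more blank (or whitespace-only) lines separate paragraphs.
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Join the wrapped lines of a block into a single line of text.
        joined = " ".join(block.split())
        if joined:
            paragraphs.append("<p>" + escape(joined) + "</p>")
    return "\n".join(paragraphs)

sample_text = "First line\nsame para.\n\n\nSecond para.\n"
converted = text_to_paragraphs(sample_text)
print(converted)
```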