Old 10-11-2009, 04:58 PM   #1
p3aul
Captain Courageous
Posts: 235
Karma: 102
Join Date: Apr 2009
Device: calibre, PRS 505
books with cr in the middle of a sentence

What do you do when you have a book in text format that has line breaks and/or carriage returns in the middle of a sentence? How do you repair this, either before or during conversion to an EPUB file?
TIA
Paul
Old 10-11-2009, 06:11 PM   #2
user_none
Sigil & calibre developer
Posts: 2,439
Karma: 950001
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Can you give a little more info on the structure of the text?

Is it:

Code:
this is the first
paragraph that is on two lines.

This is a second one.
or

Code:
This is the first
one that is split.
This is the second.
or

Code:
    This is the first
one that is split.
    This is the second.
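
For anyone unsure which of the three layouts above a file uses, here is a rough Python sketch that guesses. The function name `classify_layout` and the labels it returns are made up for illustration; it only looks for blank lines and leading whitespace, so treat the result as a hint rather than a verdict.

```python
def classify_layout(text: str) -> str:
    """Guess how paragraphs are marked in a hard-wrapped text file.

    Returns one of (hypothetical labels):
      "blank-line" - paragraphs separated by blank lines (first layout)
      "indented"   - paragraphs start with an indented line (third layout)
      "unmarked"   - nothing marks paragraph starts (second layout)
      "empty"      - no text at all
    """
    # Drop leading/trailing newlines only, so a first-line indent survives.
    lines = text.strip("\r\n").splitlines()
    if not any(ln.strip() for ln in lines):
        return "empty"
    # Any blank (or whitespace-only) line inside the text suggests layout 1.
    if any(not ln.strip() for ln in lines):
        return "blank-line"
    # Otherwise, lines that begin with a space or tab suggest layout 3.
    if any(ln[:1] in (" ", "\t") for ln in lines):
        return "indented"
    return "unmarked"


if __name__ == "__main__":
    sample = "this is the first\nparagraph that is on two lines.\n\nThis is a second one.\n"
    print(classify_layout(sample))
```

A file where every line is followed by a blank line will also come back as "blank-line", so it is worth opening the file and checking a few paragraphs by eye as well.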
Old 10-11-2009, 11:38 PM   #3
p3aul
Captain Courageous
It's the first one. Also, I have files like this:

Quote:
Before he could tell her to wait, to write out the words, she put the
car in gear and drove to the dirt road, rocking slowly over the uneven
field. The car made an abrupt turn and disappeared behind a row of
trees. The brake lights flashed once and then she was gone.

He trudged back to the bloody Nissan. Here, he smudged all the
fingerprints but his own and then rearranged the bloody knife, the guns,
and the two bodies until the crime scene told a credible, if dishonest,
story.
Now this is weird! I cut and pasted a sample into the forum editor. It looked like this:

Before he could tell her to wait, to write out the words, she put the
car in gear and drove to the dirt road, rocking slowly over the uneven
field. The car made an abrupt turn and disappeared behind a row of
trees. The brake lights flashed once and then she was gone.

He trudged back to the bloody Nissan. Here, he smudged all the
fingerprints but his own and then rearranged the bloody knife, the guns,
and the two bodies until the crime scene told a credible, if dishonest,
story.
Well, no, it didn't either! That text in the editor, without the bbcode, was supposed to look like my problem text, but when it got posted it came out all right. But it *won't* come out right when I convert it!
Only there was a blank line between lines of text. I put the pasted lines between quote and /quote (bbcode) and they came out correct! Something is going on with the character encoding I think. The text is garbled like that in both notepad and Word 2003. What's the solution?

Thanks,
Paul

Last edited by p3aul; 10-11-2009 at 11:40 PM.
Old 10-12-2009, 12:14 AM   #4
nomesque
Snooty Bestselling Author
Posts: 1,486
Karma: 1000000
Join Date: Aug 2009
Location: Ipswich, QLD, Australia
Device: PRS-650
Quote:
Originally Posted by p3aul View Post
Only there was a blank line between lines of text. I put the pasted lines between quote and /quote (bbcode) and they came out correct! Something is going on with the character encoding I think. The text is garbled like that in both notepad and Word 2003. What's the solution?
If you turn on Show Formatting Marks (click on the back-to-front P next to the zoom percentage list box, or go to Tools -> Options, click on View tab, tick All under the Formatting Marks), what sort of marks is it showing you? Paragraphs (back-to-front P) or line-breaks (right-angled arrow)?
Old 10-12-2009, 02:48 AM   #5
p3aul
Captain Courageous
Quote:
If you turn on Show Formatting Marks (click on the back-to-front P next to the zoom percentage list box, or go to Tools -> Options, click on View tab, tick All under the Formatting Marks), what sort of marks is it showing you? Paragraphs (back-to-front P) or line-breaks (right-angled arrow)?
Paragraphs.

And here is an odd thing: when I do a find and replace and choose the paragraph character (the backward P), I get this combo of symbols, "^|", and then when I click OK it searches the document and can't find anything. You would think it would show the backwards P in the find window.

I thought that if I did a find and replace and replaced the multiple paragraph symbols with just one it would work, but it doesn't.
Old 10-12-2009, 02:59 AM   #6
nomesque
Snooty Bestselling Author
Quote:
Originally Posted by p3aul View Post
Paragraphs.

And here is an odd thing: when I do a find and replace and choose the paragraph character (the backward P), I get this combo of symbols, "^|", and then when I click OK it searches the document and can't find anything. You would think it would show the backwards P in the find window.

I thought that if I did a find and replace and replaced the multiple paragraph symbols with just one it would work, but it doesn't.
Try doing a find on ^p instead.

But warning - if you just replace all ^p with (space) you'll end up with chaos.

Save under a different filename in case it all goes belly-up.

If doing a find on ^p works, do a find/replace, finding ^p^p and replacing it with ^m (a manual page break, used as a temporary marker). Then do another find/replace, finding ^p and replacing it with a space character. Then do yet another find/replace to find ^m and replace it with ^p or ^p^p.
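
The same three passes can be done outside Word. Below is a minimal Python sketch, assuming the file uses the first layout (blank lines between paragraphs). The function name `unwrap_paragraphs` and the NUL character used as the temporary marker are choices made for this example, standing in for Word's ^m.

```python
import re

MARKER = "\x00"  # temporary placeholder; assumed not to occur in the book text


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines, keeping blank-line paragraph breaks."""
    # Normalise Windows (CRLF) and old Mac (CR) line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n").strip("\n")
    # Pass 1: protect real paragraph breaks (one or more blank lines).
    text = re.sub(r"\n(?:[ \t]*\n)+", MARKER, text)
    # Pass 2: every remaining line break is mid-sentence; make it a space.
    text = re.sub(r"[ \t]*\n[ \t]*", " ", text)
    # Pass 3: turn the placeholders back into paragraph breaks.
    return text.replace(MARKER, "\n\n") + "\n"


if __name__ == "__main__":
    sample = (
        "she put the\ncar in gear and drove to the dirt road.\n"
        "\n"
        "He trudged back to the bloody Nissan.\n"
    )
    print(unwrap_paragraphs(sample), end="")
```

As with the Word version, work on a copy of the file. If every line in the source is followed by a blank line, pass 1 will treat each line as its own paragraph, so this sketch is not suitable for that case.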
Old 10-12-2009, 06:18 AM   #7
tbergman
Enthusiast
Posts: 45
Karma: 7642
Join Date: Jul 2009
Device: Kindle
When repairing files like this, I've found a technique that works fairly well. This technique depends on the assumption that a proper paragraph will always end with either a period, a question mark, or a quote.

For the purposes of this discussion, I'm illustrating a paragraph mark as ^p. Of course, the file may have carriage returns or line breaks instead; you'll need to know which in order to do the repair.

With that logic in mind, and using your favorite editor, replace all instances of .^p with some marker; I usually use [.]. Do the same with "^p and ?^p, using markers such as ["] and [?].
Now replace all the ^p left in the doc with either a space or nothing. The choice depends on whether you need a space where the lines get joined.
Now just replace [.] with .^p, ["] with "^p, and so on.

While you sometimes get an extra paragraph because some lines ended with a period without being the end of a paragraph, this technique makes the document quite readable in my experience.
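
For those who would rather script it, here is a minimal Python sketch of the same heuristic. The names `join_by_punctuation` and `PARAGRAPH_ENDINGS` are invented for this example. Instead of swapping markers in and out, it walks the lines and closes a paragraph whenever a line ends in a period, question mark, or double quote, which should give the same result under the assumption stated above.

```python
PARAGRAPH_ENDINGS = (".", "?", '"')  # assumed paragraph-final characters


def join_by_punctuation(text: str, endings=PARAGRAPH_ENDINGS) -> str:
    """Rebuild paragraphs from hard-wrapped lines using end punctuation."""
    paragraphs = []
    current = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored; punctuation decides the breaks
        current.append(line)
        if line.endswith(endings):
            paragraphs.append(" ".join(current))
            current = []
    if current:  # text that ran out before any closing punctuation
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs) + "\n" if paragraphs else ""


if __name__ == "__main__":
    sample = "This is the first\none that is split.\nThis is the second.\n"
    print(join_by_punctuation(sample), end="")
```

The caveat about extra paragraphs applies here too: a wrapped line that happens to end in "Mr." will be treated as the end of a paragraph. Adding other characters to `PARAGRAPH_ENDINGS` (an exclamation mark, for instance) is a judgment call that depends on the book.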

Tom
Old 10-12-2009, 01:18 PM   #8
JMikeD
Evangelist
Posts: 452
Karma: 15000
Join Date: Jul 2008
Device: Various and sundry
That's what I do, also. Except I use something like %%% or &&& instead of a period.

I've never figured out how a conversion can end up with many broken lines like that when the source file doesn't. Very strange.
Old 10-12-2009, 01:21 PM   #9
HarryT
eBook Enthusiast
Posts: 63,494
Karma: 41548799
Join Date: Nov 2006
Location: UK
Device: PW2, iPad Retina Mini, iPhone 4, MS Surface Pro, Onyx T68, N7,
In this post I uploaded an extremely useful freeware tool (for Windows) called "textify", which is excellent for handling files of this type. It will stitch together lines to form paragraphs, indent them to your specification, and much more. It's what I use as the first stage in book creation for any book that starts out life as a text file.
Old 10-12-2009, 01:26 PM   #10
p3aul
Captain Courageous
Thanks, to both of you! The first procedure did the trick, but I am saving both procedures on my hard drive to use in the future. An old man can't always remember what he reads or types.
Paul
Old 10-12-2009, 01:34 PM   #11
p3aul
Captain Courageous
When I posted that last message I hadn't realized that, while I was trying out the first solution, others had posted their solutions too. Many thanks, and like I said, I will save all these to my hard drive. Ever since I installed Calibre I've been pulling my hair out over these garbled text files!
Old 10-13-2009, 07:23 AM   #12
mrmikel
Book Twiddler
Posts: 2,032
Karma: 1424487
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
Thanks, Harry

Thanks for the pointer to textify which makes it as simple as answering a few questions.

In Vista, and perhaps in Windows 7, it is best to install this in a separate document directory or drive if you have one. Since it works from the command line, it is easiest to have the text document you want to convert in the same directory as textify. That gets awkward if you install it in Program Files.

If it is installed in Program Files, it does not store an output html document there, but under your document directory which can make it hard to find.

When I moved it to my document drive, it just created the html right there in the same directory as textify and as the original document.

The program worked well for me and I think I will be able to trust it a bit more than the gutenberg prettifier which seemed to create duplicate paragraphs from time to time.
Old 10-13-2009, 08:18 AM   #13
HarryT
eBook Enthusiast
I have a "Tools" directory for command-line tools, which I add to the path, so it's available from anywhere, then I just start a command prompt in whatever folder the book file is in.
Old 10-13-2009, 03:04 PM   #14
p3aul
Captain Courageous
I just added it to my path statement. Now it works from any directory (folder). I have XP; I don't know if that would work for Vista and 7.
Paul
Old 06-19-2010, 05:09 PM   #15
rschlack
Junior Member
Posts: 4
Karma: 10
Join Date: Jun 2010
Device: Sony Reader pocket edition
I tried to do this in Word on OS X and it kept crashing. Such a bad product. For the more techie crowd, here is a Perl script that will do the same thing.

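#!/usr/bin/perl
use strict;
use warnings;

# Usage (unwrap.pl is just an example name): perl unwrap.pl book.txt > fixed.txt
my $file = shift or die "Usage: $0 file.txt\n";
open(my $fh, '<', $file) or die "Cannot open $file: $!\n";

while (my $line = <$fh>) {
    # Strip the line ending (CR and/or LF) and any trailing spaces.
    $line =~ s/\s+\z//;

    # Skip blank and one-character lines.
    next unless length($line) > 1;
    print $line;

    # A line ending in . ? or " is treated as the end of a paragraph.
    my $lc = substr($line, -1);
    if ($lc eq '.' || $lc eq '?' || $lc eq '"') {
        print "\n\n";
    } else {
        print " ";
    }
}
close($fh);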