#1
Captain Courageous
Posts: 239
Karma: 102
Join Date: Apr 2009
Device: calibre, PRS 505
What do you do when you have a book in text format that has line breaks and/or carriage returns in the middle of sentences? How do you repair this, either before or during conversion to an EPUB file?
TIA,
Paul
#2
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Can you give a little more info on the structure of the text? Is it:
Code:
this is the first paragraph that is on two lines. This is a second one.
Code:
This is the first one that is split. This is the second.
Code:
This is the first one that is split. This is the second.
#3
Captain Courageous
Posts: 239
Karma: 102
Join Date: Apr 2009
Device: calibre, PRS 505
It's the first one. Also, I have files like this:
Quote:
Before he could tell her to wait, to write out the words, she put the car in gear and drove to the dirt road, rocking slowly over the uneven field. The car made an abrupt turn and disappeared behind a row of trees. The brake lights flashed once and then she was gone. He trudged back to the bloody Nissan. Here, he smudged all the fingerprints but his own and then rearranged the bloody knife, the guns, and the two bodies until the crime scene told a credible, if dishonest, story.
Well, no, it didn't, either! That code in the editor, without the BBCode, was supposed to look like my problem text, but when it got posted it came out all right. But it *won't* come out right when I convert it! The difference is that in my file there is a blank line between the lines of text. I put the pasted lines between quote and /quote tags (BBCode) and they came out correct! Something is going on with the character encoding, I think. The text is garbled like that in both Notepad and Word 2003. What's the solution?
Thanks,
Paul
Last edited by p3aul; 10-11-2009 at 11:40 PM.
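Blank lines appearing between every line of text can also be a line-ending problem rather than a character-encoding one. One quick way to check is to count which line endings the raw file actually contains. This is a minimal Python sketch; the function name `line_ending_report` is invented for illustration. For reference, Windows text files normally use CRLF (`\r\n`), Unix-style files a bare LF (`\n`), and classic Mac OS files a bare CR (`\r`).

```python
def line_ending_report(data: bytes) -> dict:
    """Count CRLF, lone CR and lone LF line endings in raw file bytes.

    Every CRLF pair contains exactly one CR and one LF, so subtracting
    the CRLF count from the raw CR and LF counts leaves the lone ones.
    """
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,  # CR not followed by LF
        "LF": data.count(b"\n") - crlf,  # LF not preceded by CR
    }
```

For example, `line_ending_report(open("book.txt", "rb").read())` (the filename is a placeholder). Reading in binary mode matters here, because Python's text mode would translate the endings before you could count them. A file that mixes styles, or has two endings where one is expected, could explain the extra blank lines.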
#4
Snooty Bestselling Author
Posts: 1,485
Karma: 1000000
Join Date: Aug 2009
Location: Ipswich, QLD, Australia
Device: PRS-650
Quote:
#5
Captain Courageous
Posts: 239
Karma: 102
Join Date: Apr 2009
Device: calibre, PRS 505
Quote:
And here is an odd thing: when I do a find and replace and choose the paragraph character (the backward P), I get this combination of symbols, "^|", and then when I click OK it searches the document and can't find anything. You would think it would show the backward P in the Find window. I thought that if I did a find and replace, replacing the multiple paragraph symbols with just one, it would work, but it doesn't.
#6
Snooty Bestselling Author
Posts: 1,485
Karma: 1000000
Join Date: Aug 2009
Location: Ipswich, QLD, Australia
Device: PRS-650
Quote:
But be warned: if you just replace all ^p with a space, you'll end up with chaos. Save under a different filename in case it all goes belly-up.
If doing a find on ^p works:
1. Find ^p^p and replace it with ^m (a manual page break, used as a temporary marker).
2. Find ^p and replace it with a space.
3. Find ^m and replace it with ^p (or ^p^p).
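The same three passes can be sketched on plain text in Python. This assumes the file has already been read with `\n` line endings (Python's default text mode translates CRLF and lone CR to `\n`) and that a blank line marks a real paragraph break. A form feed character stands in for the temporary ^m marker and must not otherwise occur in the text; the function name is hypothetical.

```python
def rejoin_with_placeholder(text: str, marker: str = "\x0c") -> str:
    """Plain-text version of the three find/replace passes.

    Assumes "\n" line endings, that a blank line ("\n\n") is a real
    paragraph break, and that `marker` never appears in the text.
    """
    # Pass 1: protect real paragraph breaks (^p^p -> ^m).
    text = text.replace("\n\n", marker)
    # Pass 2: every remaining newline is a wrapped line, so join with a space (^p -> space).
    text = text.replace("\n", " ")
    # Pass 3: turn the temporary marker back into a paragraph break (^m -> ^p^p).
    return text.replace(marker, "\n\n")
```

The order matters for the same reason it does in Word: if the single breaks were replaced first, the double breaks would turn into two spaces and the real paragraph boundaries would be lost.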
#7
Connoisseur
Posts: 59
Karma: 7642
Join Date: Jul 2009
Device: Kindle
When repairing files like this, I've found a technique that works fairly well. It depends on the assumption that a proper paragraph will always end with a period, a question mark, or a quotation mark.
For the purposes of this discussion, I'm writing a paragraph mark as ^p. Of course, the file may use carriage returns or line breaks; you'll need to know which before doing the repair. With that logic in mind, and using your favorite editor, replace all instances of .^p with some marker; I usually use [.]. Do the same with "^p and ?^p. Now replace all the ^p left in the document with either a space or nothing, depending on whether you need a space when the lines get joined. Finally, replace [.] with .^p, ["] with "^p, and so on.
You sometimes get an extra paragraph break, because some lines may have ended with a period without originally being the end of a paragraph, but in my experience this technique makes the document quite readable.
Tom
#8
Evangelist
Posts: 475
Karma: 15000
Join Date: Jul 2008
Device: Various and sundry
That's what I do, too, except I use something like %%% or &&& as the marker instead of [.].
I've never figured out how a conversion can end up with so many broken lines like that when the source file doesn't have them. Very strange.
#9
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
In this post I uploaded an extremely useful freeware tool (for Windows) called "textify", which is excellent for handling files of this type. It will stitch together lines to form paragraphs, indent them to your specification, and much more. It's what I use as the first stage in book creation for any book that starts out life as a text file.
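textify itself is a Windows binary whose internals aren't described in this thread, so the following is only a rough Python sketch of the same general idea, not textify's actual algorithm: stitch wrapped lines into paragraphs and emit simple HTML with a configurable first-line indent. It assumes paragraphs are separated by blank lines; the function name and the `indent_em` parameter are hypothetical.

```python
import html


def text_to_html(text: str, indent_em: float = 1.5) -> str:
    """Join wrapped lines into paragraphs and emit indented HTML.

    Assumes "\n" line endings and paragraphs separated by blank lines.
    """
    paragraphs = []
    for block in text.split("\n\n"):
        # Drop empty lines left over from runs of several blank lines.
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if lines:
            paragraphs.append(" ".join(lines))
    # Escape &, <, > and quotes so the book text can't break the markup.
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    style = f"p {{ text-indent: {indent_em}em; margin: 0; }}"
    return f"<html><head><style>{style}</style></head><body>\n{body}\n</body></html>"
```

An HTML file like this can then be fed to calibre as the conversion source, which is the role textify's HTML output plays in the workflow described above.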
#10
Captain Courageous
Posts: 239
Karma: 102
Join Date: Apr 2009
Device: calibre, PRS 505
Thanks to both of you! The first procedure did the trick, but I am saving both procedures on my hard drive to use in the future. An old man can't always remember what he reads or types.
Paul
#11
Captain Courageous
Posts: 239
Karma: 102
Join Date: Apr 2009
Device: calibre, PRS 505
When I posted that last message, I hadn't realized that while I was trying out the first solution, others had posted their solutions too. Many thanks; as I said, I will save all of these to my hard drive. Ever since I installed calibre I've been pulling my hair out over these garbled text files!
#12
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
Thanks, Harry, for the pointer to textify, which makes it as simple as answering a few questions.
In Vista, and perhaps in Windows 7, it is best to install it in a separate document directory, or on a separate drive if you have one. Since it works from the command line, it is easiest to have the text document you want to convert in the same directory as textify, and that gets kind of gummed up if you install it in Program Files. If it is installed in Program Files, it does not store the output HTML document there but under your documents directory, which can make it hard to find. When I moved it to my document drive, it created the HTML right there in the same directory as textify and the original document.
The program worked well for me, and I think I will be able to trust it a bit more than the Gutenberg prettifier, which seemed to create duplicate paragraphs from time to time.
#13
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
I have a "Tools" directory for command-line tools, which I add to the path, so it's available from anywhere, then I just start a command prompt in whatever folder the book file is in.
#14
Captain Courageous
Posts: 239
Karma: 102
Join Date: Apr 2009
Device: calibre, PRS 505
I just added it to my PATH. Now it works from any directory (folder). I have XP; I don't know whether that would work for Vista and 7.
Paul
#15
Junior Member
Posts: 4
Karma: 10
Join Date: Jun 2010
Device: Sony Reader pocket edition
I tried to do this in Word on OS X and it kept crashing. Such a bad product. For the more techie crowd, here is a Perl script that does the same thing:
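Code:
#!/usr/bin/perl
# Usage (script name is arbitrary): perl unwrap.pl input.txt > output.txt
# Joins wrapped lines; a line ending in . ? or " ends a paragraph.
use strict;
use warnings;

my $file = shift or die "Usage: $0 input.txt\n";
open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
while (my $line = <$fh>) {
    $line =~ s/\s+\z//;            # strip trailing spaces and the CR/LF ending
    next if $line eq '';           # skip blank lines
    print $line;
    my $lc = substr($line, -1);    # last character of the line
    if ($lc eq '.' || $lc eq '?' || $lc eq '"') {
        print "\n\n";              # end of paragraph
    } else {
        print " ";                 # join with the next line
    }
}
close($fh);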
Thread | Thread Starter | Forum | Replies | Last Post |
Each sentence treated as a paragraph in new versions of calibre | jidiot008 | Calibre | 3 | 11-16-2010 06:00 PM |
Hello from the middle of France | cazeault | Introduce Yourself | 6 | 01-15-2010 12:56 AM |
Here's to the Dash! Post a Great Sentence! | Lima_dat | Writers' Corner | 3 | 04-25-2009 03:41 AM |
hello from the middle of the midwest | Richard Maseles | Introduce Yourself | 6 | 01-08-2009 12:30 PM |
On ideal paragraph and sentence lengths for e-books | Colin Dunstan | News | 6 | 01-13-2006 06:17 AM |