02-24-2018, 02:45 AM   #1
nopi1001
Junior Member
Posts: 4
Karma: 48
Join Date: Feb 2018
Location: SC
Device: kindle paperwhite
Fixing breaks in dialogue

In several of my books, there are points where a single piece of dialogue is broken across several paragraphs. Below is an HTML code example, from an AZW3-formatted book, of the thing I'd like to fix:

<p class="calibre1">"Let's just assume that you really do read minds. </p>
<p class="calibre1">What on earth makes you believe that I can do the same? </p>
<p class="calibre1">I think I would have a much easier time at work if I could read my clients' thoughts." </p>

I would like to change it to something like:

<p class="calibre1">"Let's just assume that you really do read minds. What on earth makes you believe that I can do the same? I think I would have a much easier time at work if I could read my clients' thoughts." </p>

Is there any way to find all these broken dialogues within a file with regex/regex-functions and make them appear whole and uninterrupted? I just started using the whole regex thing and haven't been able to find a way to fix problems like this.
02-24-2018, 03:11 AM   #2
sjfan
Addict
Posts: 281
Karma: 7724454
Join Date: Sep 2017
Location: Bethesda, MD, USA
Device: Kobo Aura H20, Kobo Clara HD
1. Smarten the punctuation before you try this; it makes the matching a lot easier, since “ and ” are used in the main text while " straight quotes are used inside tags.
2. Match the paragraph break explicitly; no special flag is needed, and \s* covers the newline plus any indentation between the tags. E.g. search for something like:
(“[^”\n]*)</p>\s*<p class="calibre1">
and replace with:
\1

Run that replace a few times until it doesn't match any more.

But you probably want to do it by hand (examine each case) unless you're 100% confident all your quotation marks nest properly within each paragraph and the book isn't intentionally using continuing-quotes or the like.
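A minimal Python sketch of step 2, for trying the idea outside the editor. It assumes the punctuation has already been smartened (step 1), so the sample uses curly quotes, and that each `<p>` sits on its own line. The pattern uses `[^”\n]*` (any run of characters other than a closing quote or a newline) so the capture covers the whole unclosed quote without running onto the next line, and the loop mirrors "run that replace a few times". `join_broken_dialogue` is a hypothetical name, and Python's `re` syntax may differ in small details from calibre's search.

```python
import re

# Unclosed “ at the end of a calibre1 paragraph, followed by the next paragraph.
# [^”\n]* keeps the capture on one line (assumes one <p> per line);
# \s* absorbs the newline and any indentation between the tags.
BROKEN_QUOTE = re.compile(r'(“[^”\n]*)</p>\s*<p class="calibre1">')


def join_broken_dialogue(html):
    """Repeat the replace until nothing matches; return (new_html, joins_made)."""
    joins = 0
    while True:
        # \1 keeps the captured text; the </p> ... <p> break is dropped.
        html, n = BROKEN_QUOTE.subn(r"\1", html)
        if n == 0:
            return html, joins
        joins += n


sample = (
    '<p class="calibre1">“Let\'s just assume that you really do read minds. </p>\n'
    '<p class="calibre1">What on earth makes you believe that I can do the same? </p>\n'
    '<p class="calibre1">I think I would have a much easier time at work '
    "if I could read my clients' thoughts.” </p>"
)
fixed, joins = join_broken_dialogue(sample)
print(joins)  # 2
print(fixed)
```

Each pass joins only one break per quote, because the search resumes after the removed `<p ...>` tag, which is why the three-paragraph sample needs two passes.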
02-24-2018, 05:41 AM   #3
nopi1001
Junior Member
Step one was a great idea that I'll have to remember, and the comment on continuing-quotes saved me a lot of work as well.
Although the expression you supplied did not work for me (probably something I was doing wrong or don't understand yet), the logic behind it allowed me to put together something that worked with my initial example and the rest of the book I was working on (and I'll be able to edit it for use in other books as well)! If anyone is curious about what I cobbled together:

\“.*\”(*SKIP)(*FAIL)|(?<thing>\“.*)</p>\s*<p class="calibre1">

Another question did come up, though. In the replace field you said to put a \1. This worked beautifully, but if you could explain why/what exactly this (expression?) is doing, I would greatly appreciate it, as this was one of the things that was tripping me up (being able to dynamically copy & paste part of something found with desired changes).

This has been stumping me for a while now so thanks again! Now I can add another means of efficiently editing books to my slowly growing repertoire!

Update
Just edited the above expression to account for continuing-quotes:

\“.*\”(*SKIP)(*FAIL)|\“.*</p>\s*<p class="calibre1">\“(*SKIP)(*FAIL)|(?<thing>\“.*)</p>\s*<p class="calibre1">
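Python's standard `re` module has no `(*SKIP)(*FAIL)`, so the sketch below approximates the same intent a different way: join only when a paragraph ends inside an unclosed quote *and* the next paragraph does not open with “ (the continuing-quote convention). The negative lookahead `(?!“)` plays the role of the continuing-quote `(*SKIP)(*FAIL)` branch, and `[^”\n]*` excludes quotes that already close on the line. Python writes a named group as `(?P<thing>...)` and refers to it in the replacement as `\g<thing>` (`\1` would also work, since named groups are numbered too). `join_unless_continuing` is a hypothetical helper, not the exact behaviour of the PCRE pattern above.

```python
import re

# Approximation of the (*SKIP)(*FAIL) pattern using only stdlib re:
#   [^”\n]*  -> the quote opened in this paragraph is still unclosed at </p>
#   (?!“)    -> skip continuing quotes, where the next paragraph reopens with “
JOINABLE = re.compile(r'(?P<thing>“[^”\n]*)</p>\s*<p class="calibre1">(?!“)')


def join_unless_continuing(html):
    """Repeat the join until no paragraph break inside an open quote remains."""
    while True:
        html, n = JOINABLE.subn(r"\g<thing>", html)
        if n == 0:
            return html


continuing = (
    '<p class="calibre1">“First paragraph of a long speech. </p>\n'
    '<p class="calibre1">“Second paragraph, same speaker.”</p>'
)
broken = (
    '<p class="calibre1">“One sentence, </p>\n'
    '<p class="calibre1">split in two.”</p>'
)
print(join_unless_continuing(continuing) == continuing)  # True: left alone
print(join_unless_continuing(broken))
# <p class="calibre1">“One sentence, split in two.”</p>
```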

Last edited by nopi1001; 02-24-2018 at 05:51 AM. Reason: updating
02-24-2018, 05:58 AM   #4
BetterRed
null operator (he/him)
Posts: 20,550
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
If you can wrangle your books into Word 2007/10/13/16 you could use the Dialogue Checker in Toxaris' excellent e-Book Tools - a Word add-in.

The add-in itself can import and export EPUB. Alternatively, you could convert to DOCX and then convert the modified DOCX back to EPUB using one of several DOCX->EPUB tools.

BR
02-24-2018, 06:50 AM   #5
JSWolf
Resident Curmudgeon
Posts: 73,887
Karma: 128597114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Where did this eBook come from? That way we may be able to get a sample and have a look at it.

Last edited by DoctorOhh; 02-24-2018 at 07:35 AM. Reason: fixed quote closing tag
02-24-2018, 10:36 AM   #6
deback
Book E d i t o r
Posts: 432
Karma: 288184
Join Date: May 2015
Device: Laptop
Quote:
\“.*\”(*SKIP)(*FAIL)|\“.*</p>\s*<p class="text">\“(*SKIP)(*FAIL)|(?<thing>\“.*)</p>\s*<p class="text">
Thank you very much for this code (changed slightly to match the class I use for body text)! It works great and is something I can use.

I have also been using ctrl-n often to see how many matches there are. Thank you to the poster (in another thread) who posted that!
02-24-2018, 12:07 PM   #7
JSWolf
Resident Curmudgeon
Please don't help until the OP comes back and can prove that the eBook was bought. I took a look at a sample from Amazon and a sample from Kobo and both had no problem. So I don't think this is a legit copy the OP is talking about.
02-24-2018, 02:14 PM   #8
sjfan
Addict
Quote:
Originally Posted by nopi1001
Another question did come up though. In the replace field you said to put a \1. This worked beautifully but if you could explain why/what exactly this (expression?) is doing, I would greatly appreciate it as this was one of the things that was tripping me up (being able to dynamically copy & paste part of something found with desired changes).
In the expression, parentheses are used to create “capturing subpatterns”. These are automatically numbered from left to right, starting with “1”. In the replacement field, a backslash followed by a number is replaced with the contents of the corresponding capturing subpattern.

Consider the sentence “I like Saturday and Sunday, but Samuel prefers sandwiches.”

Search for: .*(Sat[^ ]*).*(san.*)[.]
Replace with: \2 was last, \1 was first
Results in: “sandwiches was last, Saturday was first”

That's a relatively simple case; there are a lot of other possibilities.
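The same example can be tried in Python's `re.sub`, which is a quick way to experiment with backreferences outside the editor. Both groups are written with plain (unescaped) parentheses, since `\(` and `\)` would match literal parenthesis characters instead of capturing. The second half shows that a named group, written `(?P<day>...)` in Python, can be referred to either by name with `\g<day>` or by its number.

```python
import re

sentence = "I like Saturday and Sunday, but Samuel prefers sandwiches."

# Group 1 = (Sat[^ ]*), group 2 = (san.*); the replacement swaps them.
swapped = re.sub(r".*(Sat[^ ]*).*(san.*)[.]", r"\2 was last, \1 was first", sentence)
print(swapped)  # sandwiches was last, Saturday was first

# A named group is also numbered, so \g<day> and \1 give the same result.
by_name = re.sub(r"(?P<day>Sat[^ ]*)", r"[\g<day>]", sentence)
by_number = re.sub(r"(?P<day>Sat[^ ]*)", r"[\1]", sentence)
print(by_name)  # I like [Saturday] and Sunday, but Samuel prefers sandwiches.
print(by_name == by_number)  # True
```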

http://www.pcre.org/current/doc/html...ern.html#SEC19
http://www.pcre.org/current/doc/html...ern.html#SEC14
02-24-2018, 02:33 PM   #9
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by BetterRed
If you can wrangle your books into Word 2007/10/13/16 you could use the Dialogue Checker in Toxaris' excellent e-Book Tools - a Word add-in.
I second this. The Dialogue Check also catches mismatched parentheses/brackets and handles quotation marks in other languages. It is extremely thorough, and it is what I use exclusively now.

I used to use a lot of hackish Regex, but it would always miss hard cases, especially cases of inner/outer quotes.

If you still want to use Regex though, as sjfan mentioned, the most important step is to first smarten the punctuation. There's no reliable way to fix missing quotation marks with dumb (straight) quotes.
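For anyone curious what smartening does to the double quotes, here is a deliberately naive Python sketch. It is *not* calibre's Smarten Punctuation tool: it assumes the straight double quotes in the text are balanced, ignores apostrophes and single quotes entirely, and changes only text between tags, so attribute values such as `class="calibre1"` keep their straight quotes. That split (curly in text, straight in tags) is what makes the later regexes easy. `smarten_double_quotes` is a hypothetical name.

```python
import re


def smarten_double_quotes(html):
    """Naive sketch: straight " -> curly quotes in text only, never inside tags."""
    # re.split with a capturing group keeps the tags: even indices are text,
    # odd indices are tags such as <p class="calibre1">.
    parts = re.split(r"(<[^>]*>)", html)
    for i in range(0, len(parts), 2):
        # A quote at the start of the text or after whitespace, ( or [ opens...
        text = re.sub(r'(^|[\s(\[])"', r"\1“", parts[i])
        # ...and every remaining straight quote is treated as a closing one.
        parts[i] = text.replace('"', "”")
    return "".join(parts)


print(smarten_double_quotes('<p class="calibre1">"Hi," she said. "Bye."</p>'))
# <p class="calibre1">“Hi,” she said. “Bye.”</p>
```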

For example, here is a regex I used to use:

Search: (“[^”\r\n]*)</p>\s+<p>
Replace: \1

(There is a space after that "\1 " in Replace.)

That Regex looks for a LEFT double quote in a paragraph with no RIGHT closing quote after it, and joins that paragraph to the next one.
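Tex2002ans's search/replace can be checked in Python with the easy-to-miss trailing space written out in the replacement string. Excluding `\r` and `\n` from the character class keeps each match on a single line, so a stray unclosed quote can't swallow the following paragraphs, while `\s+` consumes the line break, including Windows-style `\r\n`. The sample HTML is made up for illustration. Like the other patterns in this thread, it joins one break per quote per run, so a quote split across three paragraphs needs a second run.

```python
import re

SEARCH = re.compile(r"(“[^”\r\n]*)</p>\s+<p>")
REPLACE = r"\1 "  # note the space after \1

html = (
    "<p>“This sentence was</p>\r\n"
    "<p>split by the conversion.”</p>\r\n"
    "<p>“Home,” she said.</p>"
)
result = SEARCH.sub(REPLACE, html)
print(result)
# Only the first two paragraphs are joined; the closed “Home,” quote is left alone.
```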

Anyway, there was a lot of quotation mark discussion in previous topics which might also help you:

https://www.mobileread.com/forums/sh...d.php?t=292818
https://www.mobileread.com/forums/sh...d.php?t=212029

Quote:
Originally Posted by JSWolf
Please don't help until the OP comes back and can prove that the eBook was bought. I took a look at a sample from Amazon and a sample from Kobo and both had no problem. So I don't think this is a legit copy the OP is talking about.
This is absurd.

The topic is about fixing breaks in dialogue, which is a very common error across all types of ebooks.
02-24-2018, 03:05 PM   #10
Divingduck
Wizard
Posts: 1,161
Karma: 1404241
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
I really don't understand why people use PDF files and don't mention it.

And moving on from that: how does a regexp really help in this case? The only effect is that the user now has an endless flow of words. Not a really helpful tip either (I know he asked for it).

If we think about it for a second, the first part of JSWolf's request wasn't stupid...
Maybe it would be more efficient to wait for the answer and then give a better solution (e.g. how to do a better conversion of a PDF for this particular problem...)

Last edited by Divingduck; 02-24-2018 at 03:07 PM.
02-24-2018, 03:21 PM   #11
sjfan
Addict
Quote:
Originally Posted by Divingduck
And to go forward with this, what did it really help to use a regexp in this case? The only effect is that the user have now an endless flow of words.
Huh? The regex solutions outlined only join the current quote; they don't turn everything into one continuous paragraph.
02-24-2018, 04:08 PM   #12
nopi1001
Junior Member
Quote:
Originally Posted by JSWolf
Please don't help until the OP comes back and can prove that the eBook was bought. I took a look at a sample from Amazon and a sample from Kobo and both had no problem. So I don't think this is a legit copy the OP is talking about.
Of course the sample you got from Amazon isn't going to show the problem: I crafted the example from the HTML of that book only because it was the one I had open at the time (I was reformatting it to make it easier for me to read). I did not feel the need to open another file and search for the problem there when I knew what it looked like and could make an example quickly. I do understand where you are coming from, though, as pirating is NOT OK! I run into this problem now and then, generally when editing literature converted from PDFs.

I appreciate everyone's suggestions and am thrilled with all I have learned from this thread: both new sources and new logic to use!
02-24-2018, 04:27 PM   #13
JSWolf
Resident Curmudgeon
If you are converting from PDF, then the solution to that is don't do it. It's not worth the hassle.

The problem is that if you have split dialog, you also have other split lines. You can partly fix that by checking for a missing punctuation mark at the end of a line and combining that line with the next one. But the only real fix is to find a good PDF or pBook source and compare the two side by side.
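The partial fix described above (join a line that doesn't end in punctuation with the next one) can be sketched in Python as follows. It is a heuristic, not a cure: it assumes one paragraph per `<p>`, treats `. ! ? : ;`, a closing quote or a closing bracket as a legitimate paragraph ending, and never joins a paragraph whose last character is the end of a tag (such as `</i>`). Headings styled as plain paragraphs would still be joined wrongly, so everything it touches needs checking against a good source. `join_unfinished` is a hypothetical name.

```python
import re

# A paragraph is treated as unfinished if its last visible character is not
# terminal punctuation, a closing quote/bracket, whitespace, or the end of a tag (>).
UNFINISHED = re.compile(r"""([^.!?:;”’")\]>\s])\s*</p>\s*<p(?:\s[^>]*)?>""")


def join_unfinished(html):
    """Join each unfinished paragraph with the one after it (single pass)."""
    return UNFINISHED.sub(r"\1 ", html)


html = (
    '<p class="calibre1">It was a dark and</p>\n'
    '<p class="calibre1">stormy night.</p>\n'
    '<p class="calibre1"><i>The End</i></p>'
)
print(join_unfinished(html))
# The first two paragraphs are joined; the one ending in </i> is left alone.
```

Because the search resumes right after each removed `<p ...>` tag, a chain of several unfinished paragraphs is joined in one pass.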

Last edited by JSWolf; 02-24-2018 at 04:30 PM.
02-24-2018, 04:55 PM   #14
nopi1001
Junior Member
...I explicitly stated in previous posts in this thread that my problem has been solved by sjfan (his solution/logic worked for my problem, and with it I have already made headway in other projects) and that I was further helped along by his later advice and the advice of others. I will continue to convert from PDF and do not appreciate your unhelpful comments.
02-24-2018, 05:01 PM   #15
sjfan
Addict
I’m not sure why you’re hung up on PDFs in particular; creating a clean marked-up copy (be it EPUB, HTML, LaTeX, whatever) from a PDF is no worse than doing so from a paper copy or scanned source. It may be better, depending on where the letterforms came from and whether the alternative requires OCR. It might even be better than starting from a plain text file, depending on how the latter handles line and page breaks and such.

It’s certainly true that you need to do a manual check of things to get everything right. But it’s still worth automating what you can. There are certainly cases like:
Quote:
“Call the general,” he said.

“We’re moving forward with the plan immediately.”
that are virtually impossible to automate; whether to merge those two lines depends on whether they're meant to be spoken by the same person or not. And some paragraphs will incorrectly split on a sentence border by happenstance; that you need to check by hand as well.

But you still want to automate what you can. It saves a lot of work and reduces error rates. If you take care of 90% of cases mechanically, you’re mentally freer as you're reading through and proofing things. It allows you to focus your energy on the cases that really need some thought. And it reduces the chances that you’ll miss something.
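One way to act on "automate what you can" is to let a script do the safe joins and merely *list* the ambiguous spots for proofreading. The Python sketch below flags adjacent paragraphs where the first already contains a closing ” and the second opens with “, which is the shape of the "Call the general" example; the decision to merge stays with a human. It over-reports by design (ordinary two-speaker exchanges have the same shape) and assumes simple `<p>` paragraphs with no nesting. `ambiguous_boundaries` is a hypothetical helper.

```python
import re

PARA = re.compile(r"<p(?:\s[^>]*)?>(.*?)</p>", re.DOTALL)


def ambiguous_boundaries(html):
    """Return (index, first, second) for paragraph pairs a person should review."""
    paras = [p.strip() for p in PARA.findall(html)]
    flagged = []
    for i in range(len(paras) - 1):
        first, second = paras[i], paras[i + 1]
        # Closed quote in the first paragraph, new quote opening the second:
        # same speaker continuing, or a new speaker? Only a reader can tell.
        if "”" in first and second.startswith("“"):
            flagged.append((i, first, second))
    return flagged


html = (
    "<p>“Call the general,” he said.</p>\n"
    "<p>“We’re moving forward with the plan immediately.”</p>\n"
    "<p>Nobody answered.</p>"
)
result = ambiguous_boundaries(html)
for i, first, second in result:
    print(f"paragraphs {i} and {i + 1}:\n  {first}\n  {second}")
```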
Similar Threads
Thread Thread Starter Forum Replies Last Post
Unquoted dialogue BetterRed Workshop 10 08-16-2015 04:16 AM
Dialogue Questions jhempel24 Writers' Corner 20 12-10-2012 06:17 PM
Writing online dialogue mr ploppy Writers' Corner 7 05-02-2011 05:17 PM
Adding page breaks in Calibre breaks ePubcheck validation bookraft Conversion 16 03-01-2011 01:23 PM
Request Success Dialogue aidren enTourage Archive 0 04-19-2010 06:19 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.