Old 07-16-2022, 02:21 PM   #1
slantybard
my parent's oops...
Posts: 492
Karma: 1477572
Join Date: Feb 2009
Device: Vx->Handera->Clie-> Axim->505->650->KPW/Aura ->L2->iOS/CBW
User Error in Search/Replace?

When copying paragraphs of text from #blurb to #comments, Calibre is inserting <br> tags into the HTML of the text. I have attached 3 images of this issue: the first shows the regex being used to copy the text, and the other 2 show the results (normal view and HTML view). I would like Calibre *not* to insert <br> tags into the copied text.

Thanks for any thoughts.
Attached Thumbnails:
  • Screen Shot 2022-07-16 at 11.15.10 AM.jpeg
  • Screen Shot 2022-07-16 at 11.14.07 AM.jpeg
  • Screen Shot 2022-07-16 at 11.14.24 AM.jpeg
Old 07-16-2022, 02:39 PM   #2
chaley
Grand Sorcerer
Posts: 12,432
Karma: 8012664
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
S & R doesn't add any text or html tags.

Your first screen capture shows that #blurb has html markup, which is copied to comments. The end of the comments matches the end of #blurb. Have you "proved" that the <br> tags are not already in #blurb?
Old 07-16-2022, 03:21 PM   #3
slantybard
Quote:
Originally Posted by chaley
S & R doesn't add any text or html tags.

Your first screen capture shows that #blurb has html markup, which is copied to comments. The end of the comments matches the end of #blurb. Have you "proved" that the <br> tags are not already in #blurb?
Yes, I have "proved" that the <br> tags are not in the #blurb - if you look at the 3rd image, it shows the html for both comments and the blurb - the blurb html has no <br> whereas the text copied to the comments now has <br> tags. The <br> tags are being added by Calibre when doing the S&R.
Old 07-16-2022, 04:14 PM   #4
chaley
It isn't search & replace that is changing it.

Experiment: appending a custom HTML comment to the standard comment
  • Here is an image of the html of the comments field before running S&R.
    [Image: comments before changing.jpg]
  • Here is the contents of a custom HTML column that will be copied to comments.
    [Image: custom comment before changing.jpg]
  • Here is the S&R used.
    [Image: search and replace settings.jpg]
  • Here is the contents of the "Test text" for book 1, same as shown in the above image.
    Code:
    <div>
    <p>This is a line in the comment.</p>
    <p>This is another line in the comment.</p></div>
  • Here is the contents of the "Test result" for book 1. The result is the two fields concatenated. No BR tags are added.
    Code:
    <div>
    <p>{This is the only text in this comment}</p></div><div>
    <p>This is a line in the comment.</p>
    <p>This is another line in the comment.</p></div>
  • Here is an image of the contents of comments after the S&R. *Something* (not S&R) cleaned up the </div><div>, joining the two blocks together. Other than that the HTML is the same as what S&R wrote into the column. No BR tags are added.
    [Image: Contents of comments column.jpg]
To determine what is "cleaning" the HTML, I tried pasting the "after" text into a custom HTML column.
  • After the paste it looks like this:
    [Image: Experiment after paste.jpg]
  • Switch to Text mode:
    [Image: Experiment normal view.jpg]
  • Switch back to HTML mode. The internal DIVs are gone. Nothing to do with S&R.
    [Image: Experiment after reopen html.jpg]
These experiments show that the Qt RichText (HTML) widget changes the widget's contents. Nothing to do with S&R.
Old 07-16-2022, 05:45 PM   #5
slantybard
Ok, thanks so much for showing me that. I have run some tests on an empty test epub book that has "1000" Words in the #Words metadata and the Calibre workflow I have used is introducing <br> tags somehow. Can you look at what I'm doing and give your thoughts please and thanks:

Image 1: Empty epub test file with: 1000 words, no comments, test paragraphs in #blurb (no <br> tags) - I have attached it if someone wants to try this on their system

Image 2: Run S&R regex to copy "Words" count to the comment field:
Search field #wordcount
Code:
(\d+)
Replace append mode field comments
Code:
\1 Words
See Image 2 and the results of this S&R in Image 3
Notice that the results image does not have any <br> tags anywhere

Image 4: Run S&R regex to append the #blurb paragraphs to the comment field:
Search field #blurb
Code:
(.*)
or
Code:
^(.*)$
Replace append mode field comments
Code:
\1
See Image 4 and the results of this in Images 5 (normal view) and 6 (html view) where <br> tags have somehow been introduced???

If I do NOT run the 1st S&R regex copying the word count into the comments, <br> tags are not introduced by the 2nd regex copying the blurb

If I have an empty comments field OR have manually typed in the word count into the comments field first (ie again, not running the 1st regex) before running the 2nd regex, <br> tags are not introduced.

I guess my final question would be whether this is a Calibre issue or a Qt 5/6 issue?
Attached Thumbnails:
  • Image 1.jpeg
  • Image 2.jpeg
  • Image 3.jpeg
  • Image 4.jpeg
  • Image 5.jpeg
  • Image 6.jpeg
Attached Files
File Type: epub Test - test.epub (2.4 KB, 42 views)

Last edited by slantybard; 07-16-2022 at 05:58 PM.
Old 07-17-2022, 06:37 AM   #6
chaley
I tried your experiment. Jumping to the end, I found that the problem is caused by the "1000 Words" not being enclosed in HTML.

My apologies for the length of what follows. I wrote down all the steps so I could be sure I knew what I was doing/had done. The steps I ran to discover this:
  1. Empty the comments field.
  2. Use your S&R to copy the integer Words value to the comments field. After the S&R, *but before opening Edit Metadata*, comments contains
    Code:
    1000 Words
    I used a database manager to look directly at the data in the comments column.
  3. Look at the book with the Metadata Editor.
    Code:
    <div>
    <p>1000 Words</p></div>
    Some HTML has magically appeared. I checked the database. That HTML isn't actually in the column data, according to the database.
  4. Set a custom comment to the following HTML
    Code:
    <div>
    <p>This is a line in the comment.</p>
    <p>This is another line in the comment.</p></div>
    Closed the metadata editor and reopened. The text was still correct.
  5. Checked the blurb field in the database. It is the same.
  6. Ran the S&R to append the blurb to comments. S&R says it is putting the following in comments.
    Code:
    1000 Words<div>
    <p>This is a line in the comment.</p>
    <p>This is another line in the comment.</p></div>
  7. Checking the database, the comments column contains the same text.
    Code:
    1000 Words<div>
    <p>This is a line in the comment.</p>
    <p>This is another line in the comment.</p></div>
  8. Edit metadata on the book. The comments field shows
    Code:
    <div>
    <p>1000 Words</p>
    <p><br></p>
    <p>This is a line in the comment.</p>
    <p><br></p>
    <p>This is another line in the comment.</p></div>
    This is very different from what is in the database.
  9. Checking the database, it still contains the original text.
    Code:
    1000 Words<div>
    <p>This is a line in the comment.</p>
    <p>This is another line in the comment.</p></div>
    Let's look at what HTML is being used when showing comments in book details. I changed the code to show it.
  10. Here is the output being sent to book details. It has all the cruft and more.
    Code:
    <p class="description">1000 Words</p>
    <div><br/><p class="description">This is a line in the comment.</p>
    <br/><p class="description">This is another line in the comment.</p>
    </div>
    Why?

    Looking at the code, calibre does this because it decides the comments aren't already HTML: the content doesn't start with a '<'. Let's try fixing that by changing the original "1000 Words" to "<div>1000 Words</div>" so calibre treats it as HTML. After the S&R, the comments field contains
    Code:
    <div>1000 Words</div>
  11. Edit metadata shows it as
    Code:
    <div>
    <p>1000 Words</p></div>
    Not the same but not as different as above. The database hasn't changed, though.
  12. Run the second S&R to add the blurb. The comments field now contains
    Code:
    <div>1000 Words</div><div>
    <p>This is a line in the comment.</p>
    <p>This is another line in the comment.</p></div>
  13. The comments field HTML in edit metadata now shows:
    Code:
    <div>
    <p>1000 Words</p>
    <p>This is a line in the comment.</p>
    <p>This is another line in the comment.</p></div>
    This isn't what is in the database but it is much closer.
  14. The output in book details is
    Code:
    <div>1000 Words</div><div>
    <p>This is a line in the comment.</p>
    <p>This is another line in the comment.</p></div>
    This is actually what is in the database. No extra <br> or what-have-you.

    We now know how to "fix" it. Ensure that *everything* that is added to comments is correct HTML. In this case that means surrounding the "words" stuff with tags.
  15. We see that using <div> causes calibre to add <p>. For fun, try it using <p></p> around the word count instead of div. Without showing the intermediate results, we get in edit metadata:
    Code:
    <div>
    <p>1000 Words</p>
    <p>This is a line in the comment.</p>
    <p>This is another line in the comment.</p></div>
    This isn't bad!

    We see in book details:
    Code:
    <p>1000 Words</p><div>
    <p>This is a line in the comment.</p>
    <p>This is another line in the comment.</p></div>
    which is what we want.
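The "checking the database" steps above can be sketched in Python with the standard sqlite3 module. This is only an illustration: it assumes the library's metadata.db keeps the field in a table named comments with book and text columns (an assumption about calibre's schema, not something shown in this thread), and it uses an in-memory stand-in so it runs anywhere. The helper name read_raw_comment is hypothetical.

```python
import sqlite3

def read_raw_comment(conn, book_id):
    """Return the stored comments text for one book, or None if there is none."""
    # Assumed schema: a 'comments' table with 'book' and 'text' columns.
    row = conn.execute(
        "SELECT text FROM comments WHERE book = ?", (book_id,)
    ).fetchone()
    return row[0] if row else None

# In-memory stand-in for a copy of metadata.db, so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, book INTEGER, text TEXT)"
)
conn.execute("INSERT INTO comments (book, text) VALUES (?, ?)", (1, "1000 Words"))

# repr() makes it obvious whether any markup is really stored.
print(repr(read_raw_comment(conn, 1)))
```

If you try this against a real library, point sqlite3.connect at a copy of metadata.db rather than the live file, and only read from it.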
Bottom line: you must ensure that the resulting comments field contains valid HTML. As such, your first S&R should be like this:
[Image: Clipboard01.jpg]
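Going by step 15, the replace expression for the first S&R would be something like <p>\1 Words</p>; that is inferred from the steps, not read off the screenshot. A rough Python simulation of the two appends follows. It is not calibre's S&R code: the helper append_replacement is hypothetical, and the use of re.DOTALL so that (.*) spans the whole multi-line blurb is an assumption.

```python
import re

def append_replacement(existing, source, pattern, template):
    """Mimic an 'append' S&R: expand template against source, add to existing."""
    # DOTALL is an assumption so that (.*) can cover a multi-line field.
    match = re.search(pattern, source, flags=re.DOTALL)
    if match is None:
        return existing
    return existing + match.expand(template)

blurb = (
    "<div>\n"
    "<p>This is a line in the comment.</p>\n"
    "<p>This is another line in the comment.</p></div>"
)

# First S&R: wrap the word count in <p> tags so the field starts with '<'.
comments = append_replacement("", "1000", r"(\d+)", r"<p>\1 Words</p>")
# Second S&R: append the blurb unchanged.
comments = append_replacement(comments, blurb, r"(.*)", r"\1")

print(comments)
```

The combined string starts with a tag, which is the property the steps above identify as the one that matters.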

Looking at the code I thought about why calibre was adding all the stuff if the contents didn't appear to be HTML. The answer (I think) is that calibre has no idea what is in there. It must protect itself against strange and malformed html to avoid having book details get scribbled on. One could argue it is being overzealous, but Kovid arrived here after many years of experience with strange things.
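The "does it start with a '<'" test described in step 10 can be illustrated with a few lines of Python. This is a simplified sketch of the idea, not calibre's actual code: the function name looks_like_html is hypothetical, and stripping leading whitespace before the check is an assumption.

```python
def looks_like_html(text):
    """Crude check: treat the field as HTML only if it begins with a tag."""
    # Assumption: leading whitespace is ignored before looking for '<'.
    return text.lstrip().startswith("<")

# The three stored values that appear in the steps above.
plain_then_html = "1000 Words<div>\n<p>This is a line in the comment.</p></div>"
div_wrapped = "<div>1000 Words</div><div>\n<p>This is a line in the comment.</p></div>"
p_wrapped = "<p>1000 Words</p><div>\n<p>This is a line in the comment.</p></div>"

for value in (plain_then_html, div_wrapped, p_wrapped):
    print(looks_like_html(value), repr(value[:25]))
```

Only the first value fails the check, and it is the one that picked up the extra <br> markup in the thread.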
Old 07-20-2022, 02:45 PM   #7
slantybard
Thank you so much for looking into this and taking the time to explain what is happening. I really appreciate your effort!