
MobileRead Forums > E-Book Software > Calibre > Library Management

Old 07-25-2020, 01:50 PM   #1
calvin-c
Guru
Posts: 787
Karma: 1575310
Join Date: Jul 2009
Device: Moon+ Pro
Same book from two sources shows severe diff in word count

Is there a good way to compare two versions of the same book? I have a book downloaded from two different sources. I've run the Count Pages plugin on both: one shows 30,000 words, the other 37,000. I've checked the obvious (licensing pages) and didn't find the extra words there. I thought it might be organization, so I checked in the editor. One has 71 files, the other has 109. Even if Count Pages were counting words in the <head>, that still wouldn't explain it, because the one with fewer files shows as having more words.

Is there a plugin that can compare two books and highlight any text differences? Or even any differences at all? Thanks.
Old 07-25-2020, 02:44 PM   #2
JSWolf
Resident Curmudgeon
Posts: 79,740
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
The best way to do it is to convert each book to text and load them both in Notepad++. With Notepad++, you can download the compare plugin and that will show the differences.

Thing is, one copy could have a preface, an about-the-author page, a list of other books the author has written, or a review-snippet section. One could be ePub 3 and the other ePub 2, in which case the ePub 3 ToC would be counted. Or there could be a preview of some other book in one and not the other. There are a number of possible reasons for the word count difference.
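If you'd rather script the comparison than use Notepad++, here is a minimal Python sketch of the same idea. It assumes both books have already been exported to UTF-8 plain text (for example with calibre's `ebook-convert book.epub book.txt`) and uses the standard library's `difflib`. The function names are made up for illustration, and collapsing whitespace and dropping blank lines is an assumption meant to keep spacing differences from showing up as changes.

```python
import difflib
import re
from pathlib import Path


def normalized_lines(path):
    """Read a plain-text export, collapse runs of whitespace, and drop blank
    lines so that differences in spacing or paragraph gaps are ignored."""
    text = Path(path).read_text(encoding="utf-8")
    lines = []
    for raw in text.splitlines():
        line = re.sub(r"\s+", " ", raw).strip()
        if line:
            lines.append(line)
    return lines


def compare_text_exports(path_a, path_b, context=1):
    """Return a unified diff (as a list of strings) between two text exports."""
    return list(
        difflib.unified_diff(
            normalized_lines(path_a),
            normalized_lines(path_b),
            fromfile=str(path_a),
            tofile=str(path_b),
            n=context,
            lineterm="",
        )
    )
```

Something like `for line in compare_text_exports("a.txt", "b.txt"): print(line)` prints the diff; lines starting with `-` appear only in the first book and lines starting with `+` only in the second.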
Old 07-25-2020, 03:00 PM   #3
calvin-c
Guru
Thanks for the suggestion about Notepad++. I used it years ago but somehow never installed it on my new PC. I'll do that today. As for your other ideas, I manually checked both the beginning and end of the books, so there's no preface, preview of the next book, etc., unless, for some weird reason, they put it in the middle of the book. I didn't think to check for ePub 2 vs 3, though. Thanks for your suggestions.
Old 07-25-2020, 03:11 PM   #4
thiago.eec
Wizard
Posts: 1,211
Karma: 1419583
Join Date: Dec 2016
Location: Goiânia - Brazil
Device: iPad, Kindle Paperwhite, Kindle Oasis
Quote:
Originally Posted by calvin-c
Is there a good way to compare two versions of the same book? I have a book downloaded from two different sources. I've run the Count Pages plugin on both: one shows 30,000 words, the other 37,000. I've checked the obvious (licensing pages) and didn't find the extra words there. I thought it might be organization, so I checked in the editor. One has 71 files, the other has 109. Even if Count Pages were counting words in the <head>, that still wouldn't explain it, because the one with fewer files shows as having more words.

Is there a plugin that can compare two books and highlight any text differences? Or even any differences at all? Thanks.
You can check the difference with calibre Editor:

1) Open Book 01 with the Editor (in the calibre library, select it and press T)
2) In the Editor window, go to File > Compare to another book
3) Browse for Book 02
4) The differences will be displayed side by side, line by line
Old 07-25-2020, 03:16 PM   #5
JSWolf
Resident Curmudgeon
Quote:
Originally Posted by thiago.eec
You can check the difference with calibre Editor:

1) Open Book 01 with the Editor (in the calibre library, select it and press T)
2) In the Editor window, go to File > Compare to another book
3) Browse for Book 02
4) The differences will be displayed side by side, line by line
It won't work if the filenames are different, and some filenames are certain to be different given that one has more HTML files than the other. That's why I didn't suggest this.
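One way around the filename problem is to ignore the file layout entirely: read each EPUB's spine (its reading order) from the OPF, pull out the body text of every spine document, and diff the concatenated text. Below is a rough Python sketch using only the standard library. It assumes well-formed UTF-8 XHTML content documents and treats a handful of block tags as line breaks; the names are hypothetical, and an ePub 3 nav document that sits in the spine will be included like any other page.

```python
import difflib
import posixpath
import xml.etree.ElementTree as ET
import zipfile
from html.parser import HTMLParser
from urllib.parse import unquote

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}
# Tags treated as paragraph boundaries (an assumption; extend as needed).
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "br", "tr"}


class BodyText(HTMLParser):
    """Collects text inside <body>, emitting a newline at block-level tags."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.in_body:
            self.parts.append(data)


def book_lines(epub_path):
    """Return the book's non-empty text lines in spine (reading) order."""
    with zipfile.ZipFile(epub_path) as z:
        container = ET.fromstring(z.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).get("full-path")
        opf = ET.fromstring(z.read(opf_path))
        base = posixpath.dirname(opf_path)
        hrefs = {item.get("id"): item.get("href") for item in opf.find("opf:manifest", NS)}
        lines = []
        for itemref in opf.find("opf:spine", NS):
            href = unquote(hrefs[itemref.get("idref")])
            parser = BodyText()
            parser.feed(z.read(posixpath.normpath(posixpath.join(base, href))).decode("utf-8"))
            parser.close()
            for line in "".join(parser.parts).splitlines():
                if line.strip():
                    lines.append(" ".join(line.split()))  # collapse whitespace
        return lines


def diff_epubs(path_a, path_b):
    """Unified diff of two EPUBs' text, independent of how files are split."""
    return list(difflib.unified_diff(book_lines(path_a), book_lines(path_b),
                                     fromfile=str(path_a), tofile=str(path_b), lineterm=""))
```

Because the text is joined in reading order before diffing, a book split into 71 files and one split into 109 should only differ where the words themselves differ.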
Old 07-25-2020, 03:31 PM   #6
calvin-c
Guru
FYI, I think I've found the problem. One file has numerous instances of words run together without spaces. Obviously this counts what could be 3 or 4 words as a single word. There's also an issue with the textual Contents page. One has dot leaders separated, for some reason, by spaces. If I'm understanding things right, that counts each dot as a word. The other, with no dot leaders, would therefore have quite a few fewer 'words'. Thanks.
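To see why run-together words pull a count down, here is a small Python illustration. It assumes a counter that treats any run of non-whitespace as one word; that is a common approach but not necessarily what Count Pages does internally. The `long_tokens` helper is a hypothetical aid for finding run-together words: it flags unusually long tokens, which will also catch some legitimately long words.

```python
import string


def whitespace_word_count(text):
    """Count 'words' as runs of non-whitespace characters (assumed counter)."""
    return len(text.split())


def long_tokens(text, min_len=12):
    """Return tokens (surrounding punctuation stripped) of at least min_len
    characters; run-together words tend to show up here."""
    tokens = (tok.strip(string.punctuation) for tok in text.split())
    return [tok for tok in tokens if len(tok) >= min_len]


clean = "The castle was dark and silent."
joined = "The castlewasdark and silent."
print(whitespace_word_count(clean), whitespace_word_count(joined))  # 6 4
print(long_tokens(joined))  # ['castlewasdark']
```

Each missing space costs one word under this rule, so a few thousand joined words across a whole book could plausibly account for a gap of several thousand.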
Old 07-25-2020, 04:04 PM   #7
JSWolf
Resident Curmudgeon
Quote:
Originally Posted by calvin-c
FYI, I think I've found the problem. One file has numerous instances of words run together without spaces. Obviously this counts what could be 3 or 4 words as a single word. There's also an issue with the textual Contents page. One has dot leaders separated, for some reason, by spaces. If I'm understanding things right, that counts each dot as a word. The other, with no dot leaders, would therefore have quite a few fewer 'words'. Thanks.
I do not think a period would be counted as a word as it's not a word.
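Whether a lone period counts depends entirely on how the counter tokenizes text. The Python sketch below contrasts two plausible rules on a made-up dot-leader Contents line; neither is claimed to be the Count Pages plugin's actual algorithm.

```python
import re


def count_whitespace_tokens(text):
    """Every run of non-whitespace counts, so each spaced-out '.' is a 'word'."""
    return len(text.split())


def count_alphanumeric_words(text):
    """Only runs of letters/digits count (an apostrophe may join parts, as in
    "don't"), so bare punctuation is ignored."""
    return len(re.findall(r"\w+(?:'\w+)*", text))


toc_line = "Chapter 1 . . . . . . 7"
print(count_whitespace_tokens(toc_line))   # 9
print(count_alphanumeric_words(toc_line))  # 3
```

Under the first rule, a Contents page full of spaced dot leaders could add hundreds of "words"; under the second, it adds none.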
Old 07-26-2020, 09:47 AM   #8
calvin-c
Guru
You're probably right, so the difference is due to words run together. That seems like a huge difference for that alone, but the more I look into it, the more likely it seems. Looking at the code, I can tell it was produced by a conversion program. Some words have spaces between them. Others have a <span> tag applying a class to that word only, usually a class set up specifically for that word. (Although some are reused, the stylesheet still has over 500 classes.) This can go on for 10-20 words before you find a space. It's going to be work to clean up, so maybe I'll just take the version with more 'words', even if I don't like its formatting as well.
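Here is a rough Python illustration of why span-per-word markup hurts the count: once tags are stripped, back-to-back `<span>` elements with no whitespace between them collapse into one "word". The `find_joined_spans` helper and the sample class names are hypothetical; it is a regex heuristic rather than a real HTML parser, and it will also flag places where a single word is legitimately split across two spans, so review each match before changing anything.

```python
import re
from html.parser import HTMLParser

# A </span><span ...> boundary with non-whitespace on both sides.
JOINED_SPANS = re.compile(r"(?<=\S)</span><span\b[^>]*>(?=\S)")


class _TextOnly(HTMLParser):
    """Collects character data, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(html):
    """Strip tags and return the concatenated text, as a counter might see it."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def find_joined_spans(html):
    """Return every span boundary that glues two words together."""
    return JOINED_SPANS.findall(html)


sample = ('<p><span class="c12">The</span><span class="c13">old</span>'
          '<span class="c14">castle</span> stood.</p>')
print(visible_text(sample))               # Theoldcastle stood.
print(len(visible_text(sample).split()))  # 2
print(len(find_joined_spans(sample)))     # 2
```

The same four words with a space between each span would extract as "The old castle stood." and count as 4.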
Old 07-26-2020, 10:06 AM   #9
JSWolf
Resident Curmudgeon
What is the book?
Old 07-26-2020, 04:06 PM   #10
calvin-c
Guru
It's a copy I found of the first Three Investigators book, Secret of Terror Castle. I'm not entirely sure it's a legal copy, so I won't say where I found it, except that I found it on several websites that generally don't carry pirated copies. (That's how I ended up with the different versions.) I haven't found it as an ebook on any site that sells ebooks, so I have my suspicions and will probably get rid of it once I satisfy myself about the word count discrepancy. As an aside, I really don't understand somebody who puts in the work to create an ebook version and then puts it online without ever, apparently, looking at the results.
Old 07-26-2020, 05:26 PM   #11
DNSB
Bibliophagist
Posts: 46,149
Karma: 168983734
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by calvin-c
It's a copy I found of the first Three Investigators book, Secret of Terror Castle. I'm not entirely sure it's a legal copy, so I won't say where I found it, except that I found it on several websites that generally don't carry pirated copies. (That's how I ended up with the different versions.) I haven't found it as an ebook on any site that sells ebooks, so I have my suspicions and will probably get rid of it once I satisfy myself about the word count discrepancy. As an aside, I really don't understand somebody who puts in the work to create an ebook version and then puts it online without ever, apparently, looking at the results.
For what it's worth, it is unlikely to be a legal copy. What you are seeing is probably the result of a quick scan/OCR with no attention paid to how the ebook looks and a second copy where someone cared enough to do some cleanup.
Old 07-26-2020, 09:22 PM   #12
calvin-c
Guru
Strange thing is, I'm not seeing any of the word errors typical of an unproofed OCR conversion, only formatting problems. But I'm coming to agree with you: it's probably not a legal copy. So I'll end this discussion. I was more interested in figuring out the problem than in keeping the ebook anyway. Thanks.
Similar Threads
Thread Thread Starter Forum Replies Last Post
Word Count and Page Count? CrossReach Library Management 2 07-19-2018 05:44 PM
Word Count in Marvin 3? Deahna Marvin 10 10-31-2017 07:41 PM
Word Count? noirverse Marvin 0 11-11-2016 08:23 PM
word count Tanjamuse Editor 5 11-09-2014 06:31 AM
Word Count leebase Calibre 34 06-07-2011 11:53 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.