|  07-25-2020, 01:50 PM | #1 | 
| Guru            Posts: 787 Karma: 1575310 Join Date: Jul 2009 Device: Moon+ Pro | 
				
				Same book from two sources shows severe diff in word count
			 
			
			Is there a good way to compare two versions of the same book? I have a book downloaded from two different sources, and I've run the Count Pages plugin on both. One shows 30,000 words, the other 37,000. I've checked the obvious (licensing pages) and didn't find the extra words there. I thought it might be organization, so I checked in the editor: one has 71 files, the other has 109. Even if Count Pages were counting words in the <head>, that still wouldn't explain it, because the one with fewer files shows more words. Is there a plugin that can compare two books and highlight any text differences? Or even any differences at all? Thanks. | 
|   |   | 
|  07-25-2020, 02:44 PM | #2 | 
| Resident Curmudgeon            Posts: 80,594 Karma: 150249619 Join Date: Nov 2006 Location: Roslindale, Massachusetts Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3 | 
			
			The best way to do it is to convert each book to text and load them both in Notepad++. With Notepad++, you can download the Compare plugin, and that will show the differences. Thing is, there are a number of possible reasons for the word count difference: one copy could have a preface, an about-the-author page, a list of other books the author has written, or a review snippet section; one could be ePub 3 and the other ePub 2, and the ePub 3 ToC would be counted; or there could be a preview of some other book in one and not the other. | 
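The convert-to-text-and-compare approach can also be sketched in a few lines of Python. This is only a minimal illustration, assuming both books have already been converted to plain text (for example with calibre's `ebook-convert` command-line tool); the function names and the two sample strings are hypothetical stand-ins, not anything from the actual books.

```python
import difflib


def words(text):
    """Split text into whitespace-separated tokens."""
    return text.split()


def word_diff(old_text, new_text):
    """Return (tag, old_words, new_words) for every run of words that differs."""
    old, new = words(old_text), words(new_text)
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical sample text standing in for the two converted books.
book_a = "The castle stood on the hill above the town"
book_b = "The castle stoodon thehill above the town"

print(len(words(book_a)), len(words(book_b)))
for tag, old, new in word_diff(book_a, book_b):
    print(tag, old, "->", new)
```

For real books, the two strings would be read from the converted text files instead. Comparing word by word rather than line by line keeps a single re-wrapped paragraph from showing up as one giant difference.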
|   |   | 
|  07-25-2020, 03:00 PM | #3 | 
| Guru            Posts: 787 Karma: 1575310 Join Date: Jul 2009 Device: Moon+ Pro | 
			
			Thanks for the suggestion about Notepad++. I used that years ago but somehow never installed it on my new PC. I'll do that today. As for your other ideas, I manually checked both the beginning & end of the books so no preface, preview of next book, etc. Unless, for some weird reason, they put it in the middle of the book. Didn't think to check for ePub 2 vs 3 though. Thanks for your suggestions.
		 | 
|   |   | 
|  07-25-2020, 03:11 PM | #4 | |
| Wizard            Posts: 1,282 Karma: 1419583 Join Date: Dec 2016 Location: Goiânia - Brazil Device: iPad, Kindle Paperwhite, Kindle Oasis | Quote: 
 1) Open Book 01 with the Editor (in the calibre library, select it and press T)
 2) In the Editor window, go to File > Compare to another book
 3) Browse for Book 02
 4) The differences will be displayed side by side, line by line | |
|   |   | 
|  07-25-2020, 03:16 PM | #5 | |
| Resident Curmudgeon            Posts: 80,594 Karma: 150249619 Join Date: Nov 2006 Location: Roslindale, Massachusetts Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3 | Quote: 
 | |
|   |   | 
|  07-25-2020, 03:31 PM | #6 | 
| Guru            Posts: 787 Karma: 1575310 Join Date: Jul 2009 Device: Moon+ Pro | 
			
			FYI, I think I've found the problem. One file has numerous instances of words run together without spaces. Obviously this counts what could be 3 or 4 words as if they were a single word. There's also an issue with the textual Contents page. One has dot leaders separated, for some reason, by spaces. If I'm understanding things right, that counts each dot as a word. The other, with no dot leaders, would therefore have quite a few fewer 'words'. Thanks.
		 | 
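Both effects are easy to reproduce with a naive counter. The sketch below is an assumption about how a simple word counter might behave, not a description of how Count Pages actually counts; the function names and sample strings are made up for illustration. It shows a plain whitespace split next to a stricter count that ignores tokens with no letters or digits.

```python
import re


def count_whitespace_tokens(text):
    """Count every whitespace-separated token, punctuation included."""
    return len(text.split())


def count_wordlike_tokens(text):
    """Count only tokens that contain at least one letter or digit."""
    return sum(1 for tok in text.split() if re.search(r"[^\W_]", tok))


# Hypothetical Contents-page line with dot leaders separated by spaces.
spaced_leaders = "Chapter 1 . . . . . 7"
# Hypothetical sentence, once with proper spacing and once run together.
spaced_words = "The castle stood on the hill"
run_together = "Thecastlestood on the hill"

print(count_whitespace_tokens(spaced_leaders))  # every dot is a token
print(count_wordlike_tokens(spaced_leaders))    # dots are ignored
print(count_whitespace_tokens(spaced_words))
print(count_whitespace_tokens(run_together))
```

Under the plain split, the spaced dot leaders inflate the count, while the run-together words deflate it. Under the stricter count only the run-together words make a difference.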
|   |   | 
|  07-25-2020, 04:04 PM | #7 | |
| Resident Curmudgeon            Posts: 80,594 Karma: 150249619 Join Date: Nov 2006 Location: Roslindale, Massachusetts Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3 | Quote: 
 | |
|   |   | 
|  07-26-2020, 09:47 AM | #8 | 
| Guru            Posts: 787 Karma: 1575310 Join Date: Jul 2009 Device: Moon+ Pro | 
			
			You're probably right, so the difference is due to words run together. Seems a huge difference for that, but the more I look into it the more likely it is. Looking at the code, I can tell it was produced by a conversion program. Some words have spaces between them. Others have a <span> tag applying a class to that word only, and usually a class set up specifically for that word. (Although some are re-used, the stylesheet still has over 500 classes.) This can go on for 10-20 words before you find a space. Going to be work to clean up, so maybe I'll just take the version with more 'words' even if I don't like the formatting as well.
		 | 
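A small sketch of why per-word tags with no whitespace between them collapse into a single 'word' once the markup is stripped. The markup and class names below are invented for illustration (the real file's markup was not posted), and the parser is Python's standard `html.parser`, not whatever calibre or Count Pages uses internally.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content and the set of class names seen in the markup."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())

    def handle_data(self, data):
        self.parts.append(data)


def extract(markup):
    """Return (plain_text, class_names) for a fragment of HTML."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts), parser.classes


# Hypothetical fragment: three words wrapped in adjacent tags, no spaces between.
sample = (
    '<p><span class="c1">The</span><span class="c2">castle</span>'
    '<span class="c3">stood</span> on the hill.</p>'
)
text, classes = extract(sample)
print(text)
print(len(text.split()))
print(sorted(classes))
```

Six visible words come out as four tokens here, because the first three have no whitespace between their tags. The collected class set also gives a quick way to see how many one-off classes a file uses.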
|   |   | 
|  07-26-2020, 10:06 AM | #9 | 
| Resident Curmudgeon            Posts: 80,594 Karma: 150249619 Join Date: Nov 2006 Location: Roslindale, Massachusetts Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3 | 
			
			What is the book?
		 | 
|   |   | 
|  07-26-2020, 04:06 PM | #10 | 
| Guru            Posts: 787 Karma: 1575310 Join Date: Jul 2009 Device: Moon+ Pro | 
			
			It's a copy I found of the first Three Investigators book, The Secret of Terror Castle. Not entirely sure it's a legal copy, so I won't say where I found it. I did find it on several websites which, generally, don't carry pirated copies. (That's how I ended up with the different versions.) I haven't found it as an ebook on any site that sells ebooks, so I have my suspicions, and will probably get rid of it once I satisfy myself about the word count discrepancy. As an aside, I really don't understand somebody who puts in the work to create an ebook version and then puts it online without ever, apparently, looking at the results.
		 | 
|   |   | 
|  07-26-2020, 05:26 PM | #11 | |
| Bibliophagist            Posts: 47,869 Karma: 174315098 Join Date: Jul 2010 Location: Vancouver Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos | Quote: 
 | |
|   |   | 
|  07-26-2020, 09:22 PM | #12 | 
| Guru            Posts: 787 Karma: 1575310 Join Date: Jul 2009 Device: Moon+ Pro | 
			
			Strange thing is, I'm not seeing any of the word errors typical of an unproofed OCR conversion. Only formatting problems. But I'm coming to agree with you: probably not a legal copy. So I'll end this discussion. I was more interested in figuring out the problem than in keeping the ebook anyway. Thanks.
		 | 
|   |   | 