Old 02-27-2018, 08:12 PM   #1
avid01
Addict
Posts: 300
Karma: 2000410
Join Date: Jan 2012
Device: Kindle 4
Easiest way to count the occurrence of a word across a few EPUB books?

I see we have dedicated threads for a few different e-book programs, which is good. I have no idea which program's thread to ask in, so maybe I should ask here?

I want to count the occurrences of a word across a few EPUB books. What's the easiest or best way to accomplish that? The search function didn't really give a clue for this type of search.
Old 02-27-2018, 09:24 PM   #2
Ripplinger
350 Hoarder
Posts: 3,574
Karma: 8281267
Join Date: Dec 2010
Location: Midwest USA
Device: Sony PRS-350, Kobo Glo & Glo HD, PW2
I'm more familiar with Sigil since I use that most often, but in the lower right corner when you search for a word in Sigil, click the "Count all" button and it will show you the number of times the word occurs. Then just do that for any other books.

Others may have their favorite software they use for such functions.
Old 02-27-2018, 10:46 PM   #3
ilovejedd
hopeless n00b
Posts: 5,110
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
I keep plain text copies of my ebooks and just use Notepad++ "Find in Files" feature when I need to search for something.

If you're running Linux, I think there may be a Linux-specific plugin for Calibre that will do full text searches.
Old 02-27-2018, 11:00 PM   #4
ProDigit
Karmaniac
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
Well, your best bet is Kindle's X-ray.
It'll tell you per book, so if you have a kindle with X-ray, you just check each book for the word on X-ray.

Easiest solution.

Another solution is to find and download a digital version of the book online (could be txt, epub, pdf, ...), and search it with a word processor on a PC.
Old 02-28-2018, 08:11 AM   #5
avid01
Addict
Posts: 300
Karma: 2000410
Join Date: Jan 2012
Device: Kindle 4
Quote:
Originally Posted by Ripplinger View Post
I'm more familiar with Sigil since I use that most often, but in the lower right corner when you search for a word in Sigil, click the "Count all" button and it will show you the number of times the word occurs. Then just do that for any other books.
I fired up an EPUB in Sigil and even found "Count all," but it dissects the book into single HTML files, and it seems to look for my word on only one page at a time. That's how familiar I am with Sigil.

Quote:
Originally Posted by ilovejedd View Post
I keep plain text copies of my ebooks and just use Notepad++ "Find in Files" feature when I need to search for something.
It was easy to convert my EPUBs to TXT with Calibre, but the second part doesn't seem to be so straightforward, and a Google search didn't help much, either.

Quote:
Originally Posted by ProDigit View Post
Well, your best bet is Kindle's X-ray.
It'll tell you per book, so if you have a kindle with X-ray, you just check each book for the word on X-ray.

Easiest solution.
Thanks, if all else (the above) fails, I may try this one. My old Kindle doesn't have this feature built in, and my first preference is to solve this with free (as in free software) and open source tools. I'm on a PC.

Quote:
Originally Posted by ProDigit View Post
Another solution, is to find and download a digital version of the book online (could be txt, epub, pdf, ...) , and search with a word processor on a pc.
Hint: as I stated in the thread starter, I have the books in EPUB format. With a word processor? That's a little vague for a complete solution.

Thanks to all!
Old 02-28-2018, 08:49 AM   #6
sjfan
Addict
Posts: 281
Karma: 7724454
Join Date: Sep 2017
Location: Bethesda, MD, USA
Device: Kobo Aura H20, Kobo Clara HD
Quote:
Originally Posted by avid01 View Post
I fired up an EPUB in Sigil, even found "Count all," but it dissects the book into single HTML files and it seems it wants to find my word on one page only at a time.
On the "Search" dialog, change the "Mode" dropdown at the bottom from "Current File" to "All HTML Files". Then run "Count All" again.
Old 02-28-2018, 09:09 AM   #7
DiapDealer
Grand Sorcerer
Posts: 28,341
Karma: 203719646
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
For the record: Sigil doesn't dissect the book into individual html files; the ebook's creator does (because it's common practice to do so). Sigil will happily allow someone to create an epub with one monstrous html file if they like.
Old 02-28-2018, 09:42 AM   #8
Ripplinger
350 Hoarder
Posts: 3,574
Karma: 8281267
Join Date: Dec 2010
Location: Midwest USA
Device: Sony PRS-350, Kobo Glo & Glo HD, PW2
Quote:
Originally Posted by avid01 View Post
I fired up an EPUB in Sigil and even found "Count all," but it dissects the book into single HTML files, and it seems to look for my word on only one page at a time. That's how familiar I am with Sigil.
Quote:
Originally Posted by sjfan View Post
On the "Search" dialog, change the "Mode" dropdown at the bottom from "Current File" to "All HTML Files". Then run "Count All" again.
Do what sjfan said to fix that. My normal mode is "All HTML Files", so I forgot to mention that setting. It's just under the search box entries. The "Current File" option is very handy when you do need to search the current page only.
Old 02-28-2018, 10:44 AM   #9
ilovejedd
hopeless n00b
Posts: 5,110
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
Quote:
Originally Posted by avid01 View Post
It was easy to convert my EPUBs to TXT with Calibre, but the second part doesn't seem to be so straightforward, and a Google search didn't help much, either.
Install Notepad++. There's a portable version, too, if you don't want it cluttering your programs.
  1. Select Search - Find in Files
  2. Input the relevant parameters (search text, file filters, directory, case sensitivity, etc)
  3. Click on Find All

Screenshots are attached for reference.

For me, Notepad++ (or similar text editors) is the easiest option since I sometimes have to search through hundreds of different ebooks (typically for lines that I remember but didn't annotate/lost the annotation).
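
For readers on macOS, Linux, or the Windows Subsystem for Linux, the same "find in files" count can be sketched on the command line. This is only an illustration and not part of the Notepad++ steps above: the function name count_word is made up here, and it assumes the books have already been converted to plain text files.

```shell
#!/bin/bash
# count_word WORD FILE...
# Hypothetical helper: prints "file: count" for each text file, then a total.
count_word() {
  local word=$1
  shift
  local total=0 n f
  for f in "$@"; do
    # -o prints every match on its own line, so wc -l counts occurrences.
    # -i ignores case, -w matches whole words only, -F treats the word literally.
    n=$(grep -o -i -w -F -- "$word" "$f" | wc -l | tr -d ' ')
    echo "$f: $n"
    total=$((total + n))
  done
  echo "total: $total"
}
```

Calling count_word sea *.txt in the folder holding the converted books would print one line per file followed by the overall total.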
Attached Thumbnails: notepad001.png, notepad002.png, notepad003.png, notepad004.png, notepad005.png

Last edited by ilovejedd; 02-28-2018 at 10:55 AM.
Old 02-28-2018, 10:15 PM   #10
Tex2002ans
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by avid01 View Post
I want to count the occurrences of a word across a few EPUB books. What's the easiest or best way to accomplish that?
You could use EPUBMerge plugin for Calibre:

https://www.mobileread.com/forums/sh...d.php?t=169744

merge all of them into one behemoth EPUB, and then open it in Calibre's Editor and use Tools > Check Spelling (its word list shows how many times each word occurs).

AZARDI is an EPUB reader that lets you search across multiple EPUBs:

http://azardi.infogridpacific.com/

You have to right click each EPUB and "Index" it, but after that you can search across them freely.

May I ask what the use-case is? Are you trying to see how often a character is mentioned in a series?

Last edited by Tex2002ans; 02-28-2018 at 10:19 PM.
Old 03-01-2018, 01:05 PM   #11
DNSB
Bibliophagist
Posts: 44,568
Karma: 167913281
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by avid01 View Post
I fired up an EPUB in Sigil and even found "Count all," but it dissects the book into single HTML files, and it seems to look for my word on only one page at a time. That's how familiar I am with Sigil.
Hmmm... Enter the text to find in the Find: box, set Mode to Normal and All HTML Files (Up/Down doesn't matter if Wrap is checked), and click Count All.
Attached Thumbnails: sigil_search.PNG

Last edited by DNSB; 03-01-2018 at 01:10 PM. Reason: fat fingers cause typos...
Old 03-10-2018, 02:18 AM   #12
ProDigit
Karmaniac
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
Unpack the ebooks with a compression tool (an EPUB is a ZIP archive).
Search for these programs on Google:
Free tools include '7z' (free program), 'winrar' (free trial), or 'winzip' (evaluation)

Then edit using an HTML editor, or word processor, or worst case, an advanced notepad:
- Free HTML editor: NVU, Kompozer, Microsoft FrontPage
- Free office and word editor: Apache OpenOffice
- Notepads (free): Notepad++, Windows included Notepad or Wordpad.
Old 03-10-2018, 03:43 AM   #13
GERGE
Guru
Posts: 733
Karma: 5797160
Join Date: Jun 2010
Location: Istanbul
Device: Kobo Libra
Under macOS or Linux it is pretty easy (on macOS you will need GNU grep, since the stock grep does not support the -P option used below). Create this bash script and make it executable:

Code:
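#!/bin/bash
# grep-epub: print every hit of a pattern, with context, from one or more EPUB files.
PAT=${1:?"Usage: grep-epub PAT file.epub [file.epub ...]"}
shift
: "${1:?Need epub files to grep}"
# Pipeline: dump each book's markup, strip tags, keep up to 60 characters
# of context around each hit, then highlight the pattern.
for i in "$@"; do    # "$@" keeps file names containing spaces intact
  echo "$0 $i"
  unzip -p "$i" "*.htm*" "*.xml" "*.opf" |
    perl -lpe 's![<][^>]{1,200}?[>]!!g;' |
    grep -Pinaso ".{0,60}$PAT.{0,60}"    |
    grep -Pi --color "$PAT"
done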
Put it somewhere in your home directory and link it into /usr/local/bin. This is how I did it:

Code:
sudo ln -s ~/Apps/CLI/grep-epub.sh /usr/local/bin/grep-epub
Then just use it like this:

To find all occurrences of "sea" in Dubliners:

Code:
grep-epub "sea" Dubliners.epub
To find only the word "sea" (surrounded by spaces):

Code:
grep-epub " sea " Dubliners.epub
To count, pipe the output to wc, the standard word count utility (wc -l counts the output lines).

On Windows, you can use this with the Windows Subsystem for Linux.
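
One caveat when counting: the script's output includes the header line echoed for each book, and two hits that fall inside the same 60-character context window may share one output line, so a line count is only approximate. A minimal sketch of a stricter per-occurrence count follows. The function names count_in_markup and count_in_epubs are hypothetical, and the sed tag stripping assumes each tag sits on a single line.

```shell
#!/bin/bash
# count_in_markup WORD
# Reads (X)HTML on stdin, replaces tags with spaces, and prints the number
# of whole-word, case-insensitive occurrences of WORD.
count_in_markup() {
  sed -e 's/<[^>]*>/ /g' | grep -o -i -w -F -- "$1" | wc -l | tr -d ' '
}

# count_in_epubs WORD BOOK.epub...
# Prints "book: count" for each EPUB, searching only its HTML/XHTML files.
count_in_epubs() {
  local word=$1
  shift
  local book
  for book in "$@"; do
    printf '%s: ' "$book"
    unzip -p "$book" '*.htm*' | count_in_markup "$word"
  done
}
```

For example, count_in_epubs sea Dubliners.epub would print the book name followed by its count. Unlike the " sea " pattern, the -w option also catches the word when it is followed by punctuation.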

Last edited by GERGE; 03-10-2018 at 03:48 AM.
Old 03-10-2018, 10:39 AM   #14
JSWolf
Resident Curmudgeon
Posts: 78,985
Karma: 144284074
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by avid01 View Post
I see we have dedicated threads for a few different e-book programs, which is good. I have no idea which program's thread to ask in, so maybe I should ask here?

I want to count the occurrences of a word across a few EPUB books. What's the easiest or best way to accomplish that? The search function didn't really give a clue for this type of search.
The simplest way to do what you want is to use Calibre. Install the Quality Check plugin. It will allow you to search for words in the EPUB ebooks of your choice. You can select the EPUBs you want to search, and it will find those that contain your search words.
All times are GMT -4.