#1 |
Connoisseur
Posts: 64
Karma: 666
Join Date: May 2020
Location: Germany
Device: android smartphone + tablet with Moon Reader and ReadEra Apps.
|
[GUI Plugin] TextDiff
[GUI Plugin] TextDiff - Version 1.3.1 - 09-27-2025
A Calibre GUI plugin for finding text differences between two book formats.

Main features:
--------------
This plugin shows the differences between two selected book formats. The formats are first converted to text format (even if the source format is already text) with Calibre's convert utility (https://manual.calibre-ebook.com/gen...k-convert.html). If the conversion fails, either the format has no text content (as with scanned PDF files) or Calibre cannot find an appropriate conversion tool (such as Microsoft wordconv). The text files obtained this way are then read into memory and optionally edited (removing blank lines, soft hyphens, ...). The comparison is then done with Python's difflib (https://docs.python.org/3/library/difflib.html).

The ratio is a measure of the similarity of the two texts: 1.0 means the texts are identical, while a value near 0.0 means that the texts are completely different. A value near 0.0 may also occur when the source format has no text content (as with scanned PDF files). In that case, create a new text-based book format with a separate OCR process.

The detailed workflow is as follows:
1. Select a book with at least two formats, or two books with at least one format each, to compare.
2. Choose two formats.
3. Choose the output format and other comparison options.
4. Hit "Compare".
5. The formats are converted and compared, and the result is displayed in the output window. A ratio is also computed and displayed.
6. If desired, copy the comparison output to the clipboard and/or save it to a file and/or save it as a book in a suitable format (HTML or text).

If you want to compare other formats, repeat step 1 and hit the "Refresh formats" button. Then repeat steps 2 - 5.

The "Compare" dialog is modeless, which lets you move it around and interact with the Calibre screen.

Limitations:
------------
- The converted formats are stored as strings in memory, so extremely large formats may run out of memory.

Version History:

Installation:
-------------
Download the attached zip file and install the plugin as described in the plugins thread on mobileread. You need to add the calibre path to your $PATH variable.

To report bugs and suggestions:
-------------------------------
If you find any issues or have suggestions, please report them on GitHub or in the MobileRead Forum.

Last edited by feuille; 09-27-2025 at 05:46 AM. Reason: Version 1.3.1 |
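For readers who want to see roughly what the compare step amounts to, here is a minimal sketch using only Python's standard difflib. It is not the plugin's code: the helper names `normalize` and `compare` are made up for illustration, and comparing line by line (rather than character by character) is an assumption. It takes two texts that have already been converted, removes soft hyphens and blank lines, and returns the similarity ratio together with a unified diff.

```python
import difflib

SOFT_HYPHEN = "\u00ad"  # invisible hyphenation hint left behind by some conversions


def normalize(text):
    """Remove soft hyphens and blank lines; return the remaining lines."""
    text = text.replace(SOFT_HYPHEN, "")
    return [line.rstrip() for line in text.splitlines() if line.strip()]


def compare(text_a, text_b):
    """Return (ratio, unified diff lines) for two already-converted texts.

    Hypothetical helper: the real plugin may compare differently.
    """
    lines_a = normalize(text_a)
    lines_b = normalize(text_b)
    # ratio() is 2*M/T, where M counts matching lines and T all lines in both texts.
    matcher = difflib.SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    diff = list(
        difflib.unified_diff(lines_a, lines_b, "format A", "format B", lineterm="")
    )
    return matcher.ratio(), diff


if __name__ == "__main__":
    ratio, diff = compare("first line\n\nsecond line\n", "first line\nsecond li\u00adne\n")
    print(f"ratio: {ratio:.3f}")
    print("\n".join(diff) or "(no differences)")
```

Both texts are held in memory as lists of lines here, which mirrors the limitation described in the post.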
#2 | ||
Resident Curmudgeon
Posts: 80,431
Karma: 150249609
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
Quote:
Overall, I do like the idea of this plugin. Thanks. |
||
#3 | |
Custom User Title
Posts: 11,248
Karma: 77935877
Join Date: Oct 2018
Location: Canada
Device: Kobo Libra H2O, formerly Aura HD
|
Quote:
But yes, the memory limitation seems... not great. I'd use temp files. This'll be a useful plugin though. |
|
#4 | |
Resident Curmudgeon
Posts: 80,431
Karma: 150249609
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
I do think the memory usage should be fixed right away. |
|
#5 | ||
Grand Sorcerer
Posts: 12,522
Karma: 8065528
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
Quote:
I don't know how many people will be comparing 900,000-page books, or even 488,000-page books. |
||
#6 |
Connoisseur
Posts: 64
Karma: 666
Join Date: May 2020
Location: Germany
Device: android smartphone + tablet with Moon Reader and ReadEra Apps.
|
|
#7 | |
Resident Curmudgeon
Posts: 80,431
Karma: 150249609
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
#8 | |
Custom User Title
Posts: 11,248
Karma: 77935877
Join Date: Oct 2018
Location: Canada
Device: Kobo Libra H2O, formerly Aura HD
|
Quote:
While I probably wouldn't be running it specifically on those books, PDFs downloaded from the Internet Archive use some sort of layering compression that means that pdftotext can extract gigabytes of image layers into the temp folder alongside the text layer. (This happens when indexing for FTS or running the word count plugin, which should only require the text layer.) Would it try to keep all that in memory? Last edited by ownedbycats; 11-21-2022 at 06:59 AM. |
|
#9 | |
Grand Sorcerer
Posts: 12,522
Karma: 8065528
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
If I were the developer, doing the diffs using files would be low priority, especially if the underlying tool didn't directly support it. FWIW: calibre keeps the entire db in memory.
@ownedbycats: it converts to txt, which by definition contains only text. Where it gets the text from a layered pdf is another issue. |
|
#10 |
Custom User Title
Posts: 11,248
Karma: 77935877
Join Date: Oct 2018
Location: Canada
Device: Kobo Libra H2O, formerly Aura HD
|
Yeah, I just mentioned it because I've seen other things that I thought should be text-only (full-text indexing and the Count Pages plugin) extract the images into temp.
|
#11 |
Resident Curmudgeon
Posts: 80,431
Karma: 150249609
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
I've given TextDiff a try and it's pretty good. Thanks for creating this plugin.
|
#12 |
Connoisseur
Posts: 64
Karma: 666
Join Date: May 2020
Location: Germany
Device: android smartphone + tablet with Moon Reader and ReadEra Apps.
|
|
#13 |
want to learn what I want
Posts: 1,673
Karma: 7908443
Join Date: Sep 2020
Device: none
|
Thank you for this plugin! I plan to use it more in the future to compare translations of Orwell's 1984. I just gave it a try for testing purposes and noticed that the HTML output to file will be great for this use case, as it displays the text differences side by side, well synchronized.
Some quick notes:
Last edited by Comfy.n; 11-24-2022 at 07:09 AM. |
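A side-by-side HTML view like the one described can be produced with `difflib.HtmlDiff` from the standard library. The sketch below is only an illustration and may differ from how TextDiff builds its HTML output; the function name `side_by_side_html` and the column labels are assumptions.

```python
import difflib


def side_by_side_html(text_a, text_b, name_a="format A", name_b="format B"):
    """Return a complete HTML page showing both texts in aligned columns.

    Hypothetical helper, not taken from the plugin.
    """
    differ = difflib.HtmlDiff(wrapcolumn=80)  # wrap long lines so the columns stay readable
    return differ.make_file(
        text_a.splitlines(),
        text_b.splitlines(),
        fromdesc=name_a,
        todesc=name_b,
        context=True,  # show only changed lines plus a little surrounding context
        numlines=2,
    )


if __name__ == "__main__":
    page = side_by_side_html(
        "It was a bright day.\nThe street was empty.",
        "It was a bright day.\nThe street was deserted.",
        "translation 1",
        "translation 2",
    )
    with open("diff.html", "w", encoding="utf-8") as fh:
        fh.write(page)
```

For comparing translations, `context=False` would show the full text of both versions rather than just the changed regions.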
#14 |
Connoisseur
Posts: 64
Karma: 666
Join Date: May 2020
Location: Germany
Device: android smartphone + tablet with Moon Reader and ReadEra Apps.
|
Thank you, Comfy.n, for the hints! I'll try to fix that. For better feedback on the program status, I had already considered a progress bar.
|
#15 |
Connoisseur
Posts: 64
Karma: 666
Join Date: May 2020
Location: Germany
Device: android smartphone + tablet with Moon Reader and ReadEra Apps.
|
Version 1.1.0 is out
Hello Comfy.n,
I implemented two of your hints (see history). To work out how long a progress bar would need to be displayed, I measured intermediate times in the program flow. To my surprise, the HTML rendering of the text browser widget consumes 2/3 of the total runtime! In a future release I may implement a workaround such as stepwise asynchronous loading with a timer, but I need to do some research on that. |
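One way such stepwise loading could be structured is sketched below, with the GUI parts left out so that it runs on its own. Everything here is hypothetical: `iter_batches` and `StepwiseLoader` are illustrative names, and `sink` stands in for whatever call appends HTML to the text browser. In a Qt application, `step` could be connected to a timer's `timeout` signal so that the event loop stays responsive between batches; that wiring is an assumption and is not shown.

```python
def iter_batches(rows, batch_size=200):
    """Yield successive slices of rows, each holding at most batch_size items."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


class StepwiseLoader:
    """Feed HTML fragments to a sink one batch per step (hypothetical helper)."""

    def __init__(self, rows, sink, batch_size=200):
        self._batches = iter_batches(rows, batch_size)
        self._sink = sink  # e.g. a function that appends HTML to the output widget

    def step(self):
        """Deliver the next batch; return False once nothing is left."""
        batch = next(self._batches, None)
        if batch is None:
            return False
        self._sink("".join(batch))
        return True


if __name__ == "__main__":
    rendered = []
    loader = StepwiseLoader(["<p>a</p>", "<p>b</p>", "<p>c</p>"], rendered.append, batch_size=2)
    # A timer callback would call step() once per tick; a plain loop stands in for it here.
    while loader.step():
        pass
    print(rendered)
```

Smaller batches keep each step short at the cost of more steps overall.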