Old 06-16-2021, 11:26 PM   #1
xxyzz
Evangelist
Posts: 413
Karma: 2666666
Join Date: Nov 2020
Device: none
Failed to extract text from Gutenberg books

I get the following error on some Gutenberg books when I call "MobiReader.extract_text()". For example, both Kindle editions of Alice's Adventures in Wonderland at https://www.gutenberg.org/ebooks/11 trigger it.

Code:
 mobiReader.extract_text()
  File "calibre/ebooks/mobi/reader/mobi6.py", line 802, in extract_text
  File "calibre/ebooks/mobi/reader/mobi6.py", line 802, in <listcomp>
  File "calibre/ebooks/mobi/reader/mobi6.py", line 797, in text_section
  File "calibre/ebooks/mobi/reader/mobi6.py", line 787, in sizeof_trailing_entries
TypeError: ord() expected a character, but string of length 0 found
KindleUnpack doesn't have this issue. I found similar code at https://github.com/kevinhendricks/Ki...r.py#L816-L830 but I don't know how to fix the bug.
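
A note on the last traceback line: in Python 3, `ord()` raises this TypeError when it is given an empty bytes object, which is what an out-of-range slice of a record produces. The sketch below is not calibre's code. `trailing_entry_size` is a hypothetical helper that assumes trailing-entry sizes are stored as a variable-width integer read backwards from the end of a record (7 data bits per byte, high bit marking the stop byte). It only illustrates how an empty record can be rejected explicitly instead of failing inside `ord()`.

```python
def trailing_entry_size(record: bytes) -> int:
    """Decode a variable-width integer stored at the end of `record`.

    Hypothetical helper (not calibre's implementation). Bytes are read from
    the last one backwards; each contributes its low 7 bits, and a byte with
    the high bit set ends the number.
    """
    if not record:
        # Slicing an empty record yields b"", and ord(b"") raises TypeError.
        raise ValueError("empty record: no trailing entry to decode")
    result, shift, pos = 0, 0, len(record)
    while True:
        value = record[pos - 1]
        result |= (value & 0x7F) << shift
        shift += 7
        pos -= 1
        if value & 0x80 or shift >= 28 or pos == 0:
            return result


def ord_fails_on_empty_slice() -> bool:
    """Return True if ord() rejects the empty slice b"abc"[-1:0]."""
    try:
        ord(b"abc"[-1:0])
    except TypeError:
        return True
    return False
```

With this guard, a record with no data fails with a message that names the cause, which makes it easier to tell a corrupt file from a file being read with the wrong reader.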
Old 06-16-2021, 11:32 PM   #2
kovidgoyal
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Converting pg11.mobi, which uses that same code, works fine for me.
Old 06-16-2021, 11:40 PM   #3
xxyzz
I forgot to mention that I also run the following code before calling "extract_text()":

Code:
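from calibre.ebooks.metadata.mobi import MetadataUpdater

# Open read/write in binary mode: the updater changes the book file itself.
with open(book_path, 'r+b') as f:
    mu = MetadataUpdater(f)
    mu.update(mi, asin="BBJH94AM2L")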
Old 06-17-2021, 12:43 AM   #4
kovidgoyal
Doing

ebook-meta -t XXX pg11.mobi && ebook-convert pg11.mobi .epub

also works fine for me.
Old 06-17-2021, 01:18 AM   #5
xxyzz
I'm not sure which part of the plugin code causes the error, but here are the steps to reproduce it:
  1. Add the MOBI book to calibre.
  2. Install the WordDumb plugin.
  3. Use the plugin on the book.

Only `MetadataUpdater.update(mi, asin=asin)` changes the book file. I also add the "mobi-asin" identifier to the book metadata.

The plugin code:
Old 06-17-2021, 01:41 AM   #6
xxyzz
I moved the code into a standalone file:

Code:
#!/usr/bin/env python3

from calibre.library import db
from calibre.utils.logging import default_log
from calibre.ebooks.mobi.reader.mobi6 import MobiReader

lib_db = db('~/Calibre Library').new_api
alice_id = 0
for book_id in lib_db.all_book_ids():
    mi = lib_db.get_metadata(book_id)
    if mi.get('title') == "Alice's Adventures in Wonderland":
        alice_id = book_id
        break

book_path = lib_db.format_abspath(alice_id, 'MOBI')
mobiReader = MobiReader(book_path, default_log)
mobiReader.extract_text()
Running this code with calibre-debug should reproduce the error.

Last edited by xxyzz; 06-17-2021 at 01:45 AM.
Old 06-17-2021, 02:46 AM   #7
kovidgoyal
That's a joint MOBI file; you can't extract it like that. See plugins/mobi_input.py for how to do it.
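
A minimal sketch of telling the file types apart before extracting, for readers who land here with the same traceback. It assumes that calibre's `MobiReader` exposes a `kf8_type` attribute that is `None` for a plain MOBI 6 file and `'joint'` for a file that also carries a KF8 section; that attribute name is recalled from the calibre source and should be checked against plugins/mobi_input.py. `classify_reader` and `classify_file` are hypothetical helper names, not part of calibre.

```python
from types import SimpleNamespace


def classify_reader(reader) -> str:
    """Label a reader by its `kf8_type` attribute (assumed, see lead-in)."""
    kind = getattr(reader, "kf8_type", None)
    if kind is None:
        return "mobi6"   # plain MOBI 6 file
    if kind == "joint":
        return "joint"   # MOBI 6 and KF8 content in one file
    return "kf8"         # any other KF8 variant


def classify_file(book_path: str) -> str:
    """Classify a real file. Only works inside calibre, e.g. via calibre-debug."""
    # Imported lazily so this module still loads where calibre is absent.
    from calibre.ebooks.mobi.reader.mobi6 import MobiReader
    from calibre.utils.logging import default_log
    return classify_reader(MobiReader(book_path, default_log))


# Stand-in object so the branching can be checked without calibre installed.
print(classify_reader(SimpleNamespace(kf8_type="joint")))  # joint
```

For any result other than "mobi6", calling `extract_text()` directly is not the right entry point; plugins/mobi_input.py shows how calibre itself proceeds from there.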
Old 06-17-2021, 05:45 AM   #8
xxyzz
I wasn't aware of this type of book. I should read the code more carefully. Thanks!