MobileRead Forums - View Single Post - extract node text at epubcfi/last_read_position

wwfn · 06-18-2022, 11:22 PM

The epubcfi in last_read_positions is exciting metadata! I'm hoping to play around with it, starting with extracting the node/text at the last read position that the identifier points to. Is this reasonable/possible with code already in calibre?

I think I'm stuck on building the concatenated HTML from an EPUB container. I imagine there is already a container method to generate this, but I haven't found it yet. Or maybe I'm approaching it all wrong. Any pointers? (Initial attempt below.)

If that's possible, I'd also like to generate a fragment identifier given a node of an EPUB tree. Can this be done from Python? The code for that looks like it lives in the .pyj files (?)

Thanks!

Code:

import init_calibre
import calibre
from calibre.ebooks.oeb.polish.container import get_container
from calibre.ebooks.epub.cfi.parse import parser as cfi_parser, decode_cfi
from calibre.ebooks.oeb.polish.parsing import parse as parse_book


# select path from book where id = 296;
fname_epub = '/path/to/my/file296.epub'
# select cfi from last_read_positions where book = 296;
cfi_str = '/36/2/4[x9780525538332_EPUB-16]/2/6/1:46'
container = get_container(fname_epub, tweak_mode=False)
cfi = cfi_parser().parse_path(cfi_str)

# calibre/gui2/tweak_book/boss.py uses editor.get_raw_data()
# maybe combine container.mime_map and then calibre.ebooks.oeb.polish.parsing?
raw_data = ...  # ? (placeholder: this is the part I'm stuck on)
root = parse_book(
    raw_data, decoder=lambda x: x.decode('utf-8'),
    line_numbers=True, linenumber_attribute='data-lnum')

node = decode_cfi(root, cfi)
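
To make the shape of the CFI path concrete, here is a minimal sketch in plain Python that does not use calibre at all. It relies only on the EPUB CFI convention that an even step /N selects the (N/2)-th child element, while an odd step points at text and a trailing :N is a character offset. The names split_cfi, resolve_elements and cfi_for_element are hypothetical helpers, not calibre API, and the sketch uses the standard library's xml.etree.ElementTree instead of calibre's parser. It also assumes that any leading step selecting the spine item has already been stripped, so the remaining steps are relative to the element passed in as root. Treat it as an illustration under those assumptions, not as how calibre does it.

```python
import re
import xml.etree.ElementTree as ET

# One step looks like "/<number>" with an optional "[id assertion]".
STEP_RE = re.compile(r'/(\d+)(?:\[([^\]]*)\])?')
# The path may end with ":<character offset>".
OFFSET_RE = re.compile(r':(\d+)$')


def split_cfi(path):
    """Return ([(step_number, id_or_None), ...], char_offset_or_None)."""
    offset = None
    match = OFFSET_RE.search(path)
    if match:
        offset = int(match.group(1))
        path = path[:match.start()]
    steps = [(int(num), ident or None) for num, ident in STEP_RE.findall(path)]
    return steps, offset


def resolve_elements(root, steps):
    """Follow even steps down from root; stop at the first odd (text) step.

    The id assertions are ignored here; a stricter version could check them.
    """
    node = root
    for num, _ident in steps:
        if num % 2:  # odd step: refers to text, not to a child element
            break
        children = list(node)
        index = num // 2 - 1  # even step /N selects the (N/2)-th child element
        if not 0 <= index < len(children):
            raise LookupError(f'step /{num} has no matching child element')
        node = children[index]
    return node


def cfi_for_element(root, target):
    """Build the element part of a CFI path for target, relative to root."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    parts = []
    node = target
    while node is not root:
        parent = parents[node]
        step = 2 * (list(parent).index(node) + 1)
        ident = node.get('id')
        parts.append(f'/{step}[{ident}]' if ident else f'/{step}')
        node = parent
    return ''.join(reversed(parts))


# Tiny made-up document, only for illustration.
DOC = ET.fromstring(
    '<html><head><title>t</title></head>'
    '<body id="b1"><div><p>one</p><p>two</p>'
    '<p id="p3">three words here</p></div></body></html>'
)
steps, offset = split_cfi('/4[b1]/2/6[p3]/1:6')
node = resolve_elements(DOC, steps)
# The final /1 step is taken to mean the element's leading text (node.text).
print(node.get('id'), repr(node.text[offset:]))
print(cfi_for_element(DOC, node))
```

The resolver only handles the simple case where the final odd step refers to the element's leading text. Mixed content, where text sits between child elements, would need the tail text of the children as well.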
