10-12-2023, 10:06 AM   #43
KevinH
Sigil Developer
I am a little worried that this might be a calibre-related issue. I searched calibre's GitHub repository for QMimeData usage and found the following code, which strips the tables out of the HTML when building the text it copies to the clipboard.

As I am unfamiliar with the calibre code base, I have no idea if this routine is being invoked at all when copying out of calibre's Normal view.

Code:
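# These three imports are assumed from calibre's conventions; they are not part of the quoted snippet.
import os
from qt.core import QApplication, QMimeData
from calibre.constants import iswindows


def copy_all(text_browser):
    mf = getattr(text_browser, 'details', text_browser)
    c = QApplication.clipboard()
    md = QMimeData()
    # First clipboard format: the full HTML, tables included
    html = mf.toHtml()
    md.setHtml(html)
    from html5_parser import parse
    from lxml import etree
    root = parse(html)
    tables = tuple(root.iterdescendants('table'))
    # Turn the table scaffolding into plain divs
    for tag in root.iterdescendants(('table', 'tr', 'tbody')):
        tag.tag = 'div'
    parent = root
    is_vertical = getattr(text_browser, 'vertical', True)
    if not is_vertical:
        parent = tables[1]
    # Collapse each cell to a span holding only its text content
    for tag in parent.iterdescendants('td'):
        for child in tag.iterdescendants('br'):
            # Mark line breaks with a private-use character so they survive html2text
            child.tag = 'span'
            child.text = '\ue000'
        tt = etree.tostring(tag, method='text', encoding='unicode')
        tag.tag = 'span'
        for child in tuple(tag):
            tag.remove(child)
        tag.text = tt.strip()
    if not is_vertical:
        for tag in root.iterdescendants('td'):
            tag.tag = 'div'
    # Drop link targets so only the link text remains
    for tag in root.iterdescendants('a'):
        tag.attrib.pop('href', None)
    from calibre.utils.html2text import html2text
    simplified_html = etree.tostring(root, encoding='unicode')
    txt = html2text(simplified_html, single_line_break=True).strip()
    # Restore the marked line breaks as newline plus tab
    txt = txt.replace('\ue000', '\n\t')
    if iswindows:
        txt = os.linesep.join(txt.splitlines())
    # print(simplified_html)
    # print(txt)
    # Second clipboard format: plain text generated from the simplified HTML
    md.setText(txt)
    c.setMimeData(md)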
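I am not sure why anyone would want to simplify the HTML by removing tables. In any case, a copy in calibre puts two different formats on the clipboard: the original HTML (set with setHtml) and plain text generated from a simplified, table-free copy of that HTML (set with setText).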

PageEdit on Windows seems to default to the latter one based on BetterRed's testing.
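
To make the difference concrete, here is a minimal Python sketch of how a paste handler could choose between the two formats. It is not PageEdit's actual logic (PageEdit is C++ and I have not checked what it does). The names choose_paste_payload and FakeMime are hypothetical; the only thing taken from Qt is the shape of the hasHtml(), html(), hasText() and text() methods on QMimeData, and the fake class stands in so this runs without Qt.

```python
def choose_paste_payload(mime, prefer_html=True):
    """Return (kind, payload) for a QMimeData-like object, or (None, '') if it is empty."""
    if prefer_html and mime.hasHtml():
        return 'html', mime.html()
    if mime.hasText():
        return 'text', mime.text()
    if mime.hasHtml():
        # Text was preferred, but only HTML is available
        return 'html', mime.html()
    return None, ''


class FakeMime:
    """Hypothetical stand-in for QMimeData so the sketch runs without Qt."""

    def __init__(self, html=None, text=None):
        self._html = html
        self._text = text

    def hasHtml(self):
        return self._html is not None

    def html(self):
        return self._html or ''

    def hasText(self):
        return self._text is not None

    def text(self):
        return self._text or ''
```

If a handler behaved like the prefer_html=False case, the tables would already be gone by the time the content arrives, which would match what BetterRed saw.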

I will create a debug PageEdit build that lists and dumps all of the formats in the clipboard's QMimeData when Edit->Paste is invoked, just to verify.
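
For anyone who wants to check their own clipboard in the meantime, the kind of dump I have in mind looks roughly like this in Python. This is only a sketch, not the PageEdit debug code. The helper name dump_mime_formats and the FakeMimeData class are hypothetical; the helper relies only on QMimeData's formats() and data() methods, and the fake class is there so the example runs without Qt installed.

```python
def dump_mime_formats(mime, max_bytes=200):
    """Return one report line per format held by a QMimeData-like object."""
    lines = []
    for fmt in mime.formats():
        # data() returns a QByteArray in Qt; bytes() also accepts plain bytes
        raw = bytes(mime.data(fmt))
        preview = raw[:max_bytes].decode('utf-8', errors='replace')
        lines.append(f'{fmt} ({len(raw)} bytes): {preview!r}')
    return lines


class FakeMimeData:
    """Hypothetical stand-in for QMimeData, holding a dict of format -> bytes."""

    def __init__(self, payloads):
        self._payloads = payloads

    def formats(self):
        return list(self._payloads)

    def data(self, fmt):
        return self._payloads[fmt]


if __name__ == '__main__':
    fake = FakeMimeData({
        'text/html': b'<table><tr><td>A</td></tr></table>',
        'text/plain': b'A',
    })
    for line in dump_mime_formats(fake):
        print(line)
```

With a real Qt binding and a running QApplication, the object to pass in would be QApplication.clipboard().mimeData().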

Last edited by KevinH; 10-12-2023 at 10:09 AM.