MobileRead Forums - View Single Post - PageEdit-2.0.0 Released

KevinH · 10-12-2023, 10:06 AM

I am a little worried that this might be a calibre-related issue. I searched the calibre GitHub repository for QMimeData usage and found the following routine, which actually strips tables out of the HTML when copying text to the clipboard.

As I am unfamiliar with the calibre code base, I have no idea if this routine is being invoked at all when copying out of calibre's Normal view.

Code:

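# Quoted from calibre. QApplication, QMimeData, os and iswindows are
# module-level names from the original file; their imports are not shown here.
def copy_all(text_browser):
    mf = getattr(text_browser, 'details', text_browser)
    c = QApplication.clipboard()
    md = QMimeData()
    # Clipboard format 1: the unmodified HTML, tables included.
    html = mf.toHtml()
    md.setHtml(html)
    from html5_parser import parse
    from lxml import etree
    root = parse(html)
    # Remember the table elements before they are renamed below.
    tables = tuple(root.iterdescendants('table'))
    # Turn the table scaffolding into plain divs.
    for tag in root.iterdescendants(('table', 'tr', 'tbody')):
        tag.tag = 'div'
    parent = root
    is_vertical = getattr(text_browser, 'vertical', True)
    if not is_vertical:
        # In the non-vertical layout only the second table's cells are flattened.
        parent = tables[1]
    for tag in parent.iterdescendants('td'):
        # Mark each <br> with a private-use character so it survives flattening.
        for child in tag.iterdescendants('br'):
            child.tag = 'span'
            child.text = '\ue000'
        # Collapse the cell to its text content and make it a span.
        tt = etree.tostring(tag, method='text', encoding='unicode')
        tag.tag = 'span'
        for child in tuple(tag):
            tag.remove(child)
        tag.text = tt.strip()
    if not is_vertical:
        # Any cells left over (outside the second table) become divs.
        for tag in root.iterdescendants('td'):
            tag.tag = 'div'
    # Drop link targets so they do not appear in the text output.
    for tag in root.iterdescendants('a'):
        tag.attrib.pop('href', None)
    from calibre.utils.html2text import html2text
    simplified_html = etree.tostring(root, encoding='unicode')
    txt = html2text(simplified_html, single_line_break=True).strip()
    # Restore the marked line breaks as newline plus tab.
    txt = txt.replace('\ue000', '\n\t')
    if iswindows:
        txt = os.linesep.join(txt.splitlines())
    # print(simplified_html)
    # print(txt)
    # Clipboard format 2: plain text derived from the simplified HTML.
    md.setText(txt)
    c.setMimeData(md)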

I am not sure why anyone would want to simplify the HTML by removing tables. So a copy in calibre puts two different formats on the clipboard: the original HTML (via setHtml) and a plain-text version generated from the simplified, table-free HTML (via setText).

PageEdit on Windows seems to default to the latter one based on BetterRed's testing.
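To make the table flattening concrete, here is a minimal sketch of the same idea using only the Python standard library. It is not calibre's code: flatten_tables is a hypothetical name, xml.etree.ElementTree stands in for html5_parser and lxml (so the input must be well-formed), and the br and is_vertical handling is left out.

```python
import xml.etree.ElementTree as ET


def flatten_tables(xhtml):
    """Rename table scaffolding to divs and collapse each cell to a text-only span.

    Hypothetical helper for illustration; requires well-formed markup.
    """
    root = ET.fromstring(xhtml)
    # Renaming tags does not change the tree shape, so iterating is safe.
    for el in root.iter():
        if el.tag in ('table', 'tbody', 'tr'):
            el.tag = 'div'
    # Snapshot the cells first, since children are removed inside the loop.
    for td in list(root.iter('td')):
        text = ''.join(td.itertext()).strip()
        for child in list(td):
            td.remove(child)
        td.tag = 'span'
        td.text = text
    return ET.tostring(root, encoding='unicode')


sample = '<body><table><tr><td><b>Title</b></td><td>Dune</td></tr></table></body>'
print(flatten_tables(sample))
# <body><div><div><span>Title</span><span>Dune</span></div></div></body>
```

The table, row and cell structure is gone from the result, which is what a paste would receive if it picked up content derived from the simplified markup rather than the original HTML.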

I will create a debug build of PageEdit that lists and dumps every format in the clipboard's QMimeData when Edit->Paste is invoked, just to verify.
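The kind of dump I have in mind can be sketched roughly as follows. PageEdit itself is C++, so this is only an illustration in Python to match the snippet above. dump_mime_formats and FakeMimeData are hypothetical names; the function relies only on the formats() and data() methods that QMimeData provides, and it assumes the payload can be previewed as UTF-8, which may not hold for every format.

```python
def dump_mime_formats(mime_data, preview=60):
    """Return one line per format: MIME type, byte count, and a short preview.

    Works with any object exposing formats() and data(fmt), as QMimeData does.
    """
    lines = []
    for fmt in mime_data.formats():
        raw = bytes(mime_data.data(fmt))
        # Assumption: UTF-8 is good enough for a preview; bad bytes are replaced.
        text = raw.decode('utf-8', errors='replace')[:preview]
        lines.append(f'{fmt}: {len(raw)} bytes: {text!r}')
    return lines


class FakeMimeData:
    """Stand-in with the same two methods, so the sketch runs without Qt."""

    def __init__(self, payloads):
        self._payloads = payloads

    def formats(self):
        return list(self._payloads)

    def data(self, fmt):
        return self._payloads[fmt]


fake = FakeMimeData({'text/html': b'<table></table>', 'text/plain': b'x'})
for line in dump_mime_formats(fake):
    print(line)
```

In a Qt application the same function could be called as dump_mime_formats(QApplication.clipboard().mimeData()), which would show whether both text/html and text/plain arrive on Windows and what each one contains.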