#1 |
|
Enthusiast
Posts: 28
Karma: 10
Join Date: May 2010
Device: Kindle
|
Understanding the HTML input plugin
Can someone point me to the documentation or source for the HTML input plugin? I need to understand better what it is doing. Sorry for the stupid questions, but I am learning as I go. Deeply appreciative that Calibre exists and of what it does.
Fred
#2 |
|
creator of calibre
Posts: 45,626
Karma: 28549046
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Look for plugins/html_input.py in the source code.
#3 |
|
Enthusiast
|
It looks to me as though the function doing all the file rewriting is rewrite_links, which is defined in base.py and removes all absolute links. It then looks for a CSS file, which cssutils parses. Is this broadly correct?
Code:
def rewrite_links(root, link_repl_func, resolve_base_href=False):
    '''
    Rewrite all the links in the document. For each link
    ``link_repl_func(link)`` will be called, and the return value
    will replace the old link.

    Note that links may not be absolute (unless you first called
    ``make_links_absolute()``), and may be internal (e.g.,
    ``'#anchor'``). They can also be values like
    ``'mailto:email'`` or ``'javascript:expr'``.

    If the ``link_repl_func`` returns None, the attribute or
    tag text will be removed completely.
    '''
    from cssutils import parseString, parseStyle, replaceUrls, log
    log.setLevel(logging.WARN)

    if resolve_base_href:
        resolve_base_href(root)
    for el, attrib, link, pos in iterlinks(root, find_links_in_css=False):
        new_link = link_repl_func(link.strip())
        if new_link == link:
            continue
        if new_link is None:
            # Remove the attribute or element content
            if attrib is None:
                el.text = ''
            else:
                del el.attrib[attrib]
            continue
        if attrib is None:
            new = el.text[:pos] + new_link + el.text[pos+len(link):]
            el.text = new
        else:
            cur = el.attrib[attrib]
            if not pos and len(cur) == len(link):
                # Most common case
                el.attrib[attrib] = new_link
            else:
                new = cur[:pos] + new_link + cur[pos+len(link):]
                el.attrib[attrib] = new

    def set_property(v):
        if v.CSS_PRIMITIVE_VALUE == v.cssValueType and \
                v.CSS_URI == v.primitiveType:
            v.setStringValue(v.CSS_URI,
                    link_repl_func(v.getStringValue()))

    for el in root.iter():
        try:
            tag = el.tag
        except UnicodeDecodeError:
            continue

        if tag == XHTML('style') and el.text and \
                (_css_url_re.search(el.text) is not None or '@import' in
                        el.text):
            stylesheet = parseString(el.text)
            replaceUrls(stylesheet, link_repl_func)
            repl = stylesheet.cssText
            if isbytestring(repl):
                repl = repl.decode('utf-8')
            el.text = '\n'+ repl + '\n'

        if 'style' in el.attrib:
            text = el.attrib['style']
            if _css_url_re.search(text) is not None:
                try:
                    stext = parseStyle(text)
                except:
                    # Parsing errors are raised by cssutils
                    continue
                for p in stext.getProperties(all=True):
                    v = p.cssValue
                    if v.CSS_VALUE_LIST == v.cssValueType:
                        for item in v:
                            set_property(item)
                    elif v.CSS_PRIMITIVE_VALUE == v.cssValueType:
                        set_property(v)
                repl = stext.cssText.replace('\n', ' ').replace('\r',
                        ' ')
                if isbytestring(repl):
                    repl = repl.decode('utf-8')
                el.attrib['style'] = repl
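
For anyone else reading along, the callback contract described in the docstring above (return a new link to replace it, the same link to leave it alone, or None to remove it) can be tried out in isolation. What follows is only a rough sketch using the standard library, not calibre's code: `rewrite_links_sketch`, `relativize`, the `LINK_ATTRS` tuple and the `content/` prefix are all made up for illustration, and a simple regex stands in for the cssutils parsing of `style` attributes.

```python
import re
import xml.etree.ElementTree as ET

# Attributes treated as links in this sketch (an assumption; the real
# iterlinks() helper recognises many more places a link can appear).
LINK_ATTRS = ('href', 'src')

# Simplified stand-in for CSS parsing: matches url(...) with optional quotes.
CSS_URL_RE = re.compile(r'url\(\s*([\'"]?)(.*?)\1\s*\)')


def rewrite_links_sketch(root, link_repl_func):
    """Call link_repl_func on every link; a None result removes the attribute."""
    for el in root.iter():
        for attrib in LINK_ATTRS:
            link = el.get(attrib)
            if link is None:
                continue
            new_link = link_repl_func(link.strip())
            if new_link == link:
                continue                    # unchanged, nothing to do
            if new_link is None:
                del el.attrib[attrib]       # drop the link entirely
            else:
                el.set(attrib, new_link)

        style = el.get('style')
        if style and 'url(' in style:
            def repl(match):
                new = link_repl_func(match.group(2))
                # Simplification: a None result leaves the CSS url untouched.
                return match.group(0) if new is None else 'url(%s)' % new
            el.set('style', CSS_URL_RE.sub(repl, style))


def relativize(link):
    """Hypothetical policy: drop remote links, keep anchors, prefix the rest."""
    if link.startswith(('http://', 'https://')):
        return None
    if link.startswith('#'):
        return link
    return 'content/' + link


doc = ET.fromstring(
    '<html><body>'
    '<a href="http://example.com/x">remote</a>'
    '<a href="#top">anchor</a>'
    '<img src="cover.png" style="background: url(bg.png)"/>'
    '</body></html>')
rewrite_links_sketch(doc, relativize)
print(ET.tostring(doc, encoding='unicode'))
```

With this policy the remote `href` is removed, the `#top` anchor is left alone, and both `cover.png` and the `url(bg.png)` reference gain the `content/` prefix. Note that it is the callback that decides what happens to absolute links; the rewriting function itself only applies whatever the callback returns.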
#4 |
|
creator of calibre
|
That is the resolving of linked resources, yes. You are probably more interested in HTML parsing. Look at parse_utils.py and preprocess.py for that.