@pittendrigh...It wouldn't be difficult to write a Python command-line script to extract the head and html tags from each epub file. Just export all the epub xhtml files from Sigil to a directory and then run a Python script from that directory that does something like this (using Python 3.4):
Code:
from bs4 import BeautifulSoup

# get the file list (separated by spaces) as a string from the command prompt
print("\nInput html file names:\n")
files = input()
# convert the input string to a file list
file_list = files.split()
# remove the html and head tags from each file
for file in file_list:
    outfile = 'new_' + file
    outfp = open(outfile, 'wt', encoding='utf-8')
    html = open(file, 'rt', encoding='utf-8').read()
    soup = BeautifulSoup(html, 'html5lib')  # parses like a web browser
    head_tag = soup.find('head')
    if head_tag:
        head_tag.extract()  # remove <head> and everything inside it
    html_tag = soup.find('html')
    if html_tag:
        html_tag.unwrap()  # drop the <html> tags but keep their contents
    outfp.write(str(soup))
    outfp.close()
...Or you could perhaps even do it without using BeautifulSoup, like this:
Code:
# get the file list (separated by spaces) as a string from the command prompt
print("\nInput html file names:\n")
files = input()
# convert the input string to a file list
file_list = files.split()
# copy each file, skipping the lines that open or close the html and head tags
# (this assumes those tags sit on lines of their own)
for file in file_list:
    outfile = 'new_' + file
    outfp = open(outfile, 'wt', encoding='utf-8')
    with open(file, 'rt', encoding='utf-8') as infp:
        for line in infp:
            stripped = line.lstrip()
            if stripped.startswith('<html') or \
               stripped.startswith('</html>') or \
               stripped.startswith('<head') or \
               stripped.startswith('</head>'):
                continue
            outfp.write(line)
    outfp.close()
You could probably also use Tidy in a script to do exactly the same as the above.
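For instance, HTML Tidy's --show-body-only option prints only what is inside <body>, which drops the html and head wrapper in one step. A minimal sketch of calling it from Python, assuming the tidy executable is on your PATH (the function names here are mine, just for illustration):

```python
import subprocess

def build_tidy_cmd(infile, outfile):
    # -q suppresses Tidy's warning chatter; --show-body-only yes emits only
    # the contents of <body>, so the html/head wrapper disappears
    return ["tidy", "-q", "--show-body-only", "yes", "-o", outfile, infile]

def strip_wrapper(infile):
    # write the stripped markup to new_<infile>, as in the scripts above
    subprocess.run(build_tidy_cmd(infile, "new_" + infile))
```

Whether this counts as "exactly the same" depends on the file: Tidy will also normalize the markup it passes through, which the line-skipping script above does not.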