Splitting multiple html files?

nqk · 11-23-2015, 08:57 PM

Hi you guys,

I know that the Editor can split a single HTML file using XPath. It is great. But I wonder if there is a way to split all the HTML files at the same time (something like "split mark" in Sigil).

Previously I saved all the footnotes at the end of their respective HTML files; now I want to merge them into a single endnote file. I have to go to every HTML file and split and merge...

Ah, I used file_name in a Regex Function and it returns the whole HTML path (I can use regex to strip off the unwanted part), but is there a way to get only the name, without the extension? (I use it for note IDs)

jbacelar · 11-24-2015, 02:19 AM

I also use the full file_name for note IDs. But why do you need to remove the extension?

To extract the notes from all files and dump them into a specific file (notas.xhtml), since I do not know Python well enough, I do the following:
1- I create notas.xhtml
2- I use this regex function

Code:

#Searching: (<p class="nota".+?>.+?</p>)

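def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # Append each matched note to a text file; 'with' closes the file after every write.
    with open('e:/Libros/Taller/En curso/notas.txt', 'a', encoding='utf-8') as notas:
        notas.write(match.group() + '\n')
    # Returning '' removes the note from its original file.
    return ''
# Process the files in spine (reading) order so the notes stay in sequence.
replace.file_order = 'spine'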

3- I copy and paste the contents of notas.txt into notas.xhtml

And sorry for my English.
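
A possible variation on the workflow above, as a rough sketch rather than a tested recipe: instead of reopening the text file for every match, the notes can be collected in the `data` dictionary that calibre passes to `replace`, and written out once at the end. This assumes `replace.call_after_last_match = True` triggers one extra call with `match` set to `None`, as described in the calibre function-mode documentation. `OUT_PATH` and the `'notes'` key are hypothetical names chosen for illustration.

```python
import io

# Hypothetical output location; change it to wherever the notes should go.
OUT_PATH = 'notas.txt'

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    if match is None:
        # Extra call after the last match: write all collected notes in one go.
        notes = data.get('notes', [])
        with io.open(OUT_PATH, 'w', encoding='utf-8') as out:
            out.write(u'\n'.join(notes) + u'\n')
        return ''
    # Remember the matched note; 'notes' is just a key picked for this sketch.
    data.setdefault('notes', []).append(match.group())
    # Returning '' removes the note from its original file.
    return ''

# Visit the files in spine order so the notes keep their reading order.
replace.file_order = 'spine'
# Ask for one final call with match=None once all matches are processed.
replace.call_after_last_match = True
```

The search expression stays the same as in the post above. The contents of the output file still have to be pasted into notas.xhtml by hand.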

nqk · 11-24-2015, 02:23 AM

Thank you. I will try to play with that. I'm no programmer, though.

What I did is search for "#n(\d+)" (in #n1, for example)

Code:

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    text='#'
    text2= '_'
    return text + file_name + text2 + match.group(1)

And it would return:
#OEBPS/1.html_1

I would only want:
#1_1

(Of course, I could use regex to clean up the unwanted portion afterward, but it would be nicer to have it done in one regex function, and I could learn something as well.)

jbacelar · 11-25-2015, 02:19 AM

You can use:
file_name = file_name [6:len(file_name)-5]
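
The slice above works when every file sits in `OEBPS/` and ends in `.html` (6 characters off the front, 5 off the end). A slightly more general sketch, assuming the Editor passes file names with forward slashes as in `OEBPS/1.html`, lets the standard library strip the folder and the extension whatever their length. `note_id` is a hypothetical helper name, not part of calibre.

```python
import posixpath

def note_id(file_name, note_number):
    # 'OEBPS/1.html' -> '1': drop the folder first, then the extension.
    stem = posixpath.splitext(posixpath.basename(file_name))[0]
    return '#' + stem + '_' + note_number

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    # match.group(1) is the digits captured by the search "#n(\d+)".
    return note_id(file_name, match.group(1))
```

With the search `#n(\d+)`, a match of `#n1` inside `OEBPS/1.html` would become `#1_1`, and the same function keeps working for names such as `Text/chapter02.xhtml`.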

nqk · 11-26-2015, 08:38 PM

Quote:

Originally Posted by jbacelar

You can use:
file_name = file_name [6:len(file_name)-5]

Lovely.

jbacelar · 11-27-2015, 02:08 AM

You are welcome.
