MobileRead Forums - View Single Post - How do I use HTML headings as a source for metadata

Axius27 · 07-26-2023, 11:20 PM

A while ago, I made an offline archive of a website that I needed to use without an internet connection. It contains around 700 articles, each stored in its own HTML file and full of useful info that I now want to load into my eReader app. However, importing them directly doesn't work: the page title of each file is the name of the website, not the name of the article, so an unedited import produces 700 identical entries. The actual article titles are stored in the HTML files' heading tags: <h1> holds the title and <h2> holds the author's name.
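Because each file stores the title in its first <h1> and the author in its first <h2>, a script can read those values directly. Below is a minimal sketch using Python's standard-library html.parser. It assumes the first <h1> and <h2> on each page are the ones you want. The class and function names (HeadingExtractor, extract_title_author) are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser


class HeadingExtractor(HTMLParser):
    """Collects the text of the first <h1> and the first <h2> in a document."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &amp; arrives as "&"
        self.headings = {}    # tag name -> collected text, e.g. {"h1": "..."}
        self._current = None  # heading tag we are currently inside, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Start capturing only the first occurrence of each heading level.
        if tag in ("h1", "h2") and tag not in self.headings and self._current is None:
            self._current = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse newlines and repeated spaces into single spaces.
            self.headings[tag] = " ".join("".join(self._buffer).split())
            self._current = None

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a> within <h1>) is captured too.
        if self._current is not None:
            self._buffer.append(data)


def extract_title_author(html):
    """Return (title, author) from the first <h1>/<h2>; None if missing."""
    parser = HeadingExtractor()
    parser.feed(html)
    parser.close()
    return parser.headings.get("h1"), parser.headings.get("h2")


if __name__ == "__main__":
    sample = "<title>Site Name</title><h1>My Article</h1><h2>Jane Doe</h2>"
    print(extract_title_author(sample))  # ('My Article', 'Jane Doe')
```

The <title> text is ignored because the parser only buffers data while inside an <h1> or <h2>, which is exactly what sidesteps the "700 identical entries" problem.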

Is there any way to get Calibre to automatically extract these headings and use them for metadata entries, or at least change the title to the name of the article?
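One possible route is calibre's command-line tool, calibredb, whose add command accepts --title and --authors options that override the metadata of the book being added. The sketch below builds one calibredb add command per file, taking the title from the first <h1> and the author from the first <h2>. Assumptions: the files are UTF-8, calibredb is on your PATH, and the headings are simple enough for a regex (fragile on unusual markup). The function names are hypothetical. It only prints the commands unless you pass --run, and calibredb may refuse to write to a library while the calibre GUI has it open.

```python
import re
import subprocess
import sys
from html import unescape
from pathlib import Path

# Non-greedy match for the first <h1>/<h2>; assumes headings are not nested.
H1_RE = re.compile(r"<h1\b[^>]*>(.*?)</h1\s*>", re.IGNORECASE | re.DOTALL)
H2_RE = re.compile(r"<h2\b[^>]*>(.*?)</h2\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def heading_text(pattern, html):
    """Return the cleaned text of the first match, or None."""
    match = pattern.search(html)
    if not match:
        return None
    text = unescape(TAG_RE.sub("", match.group(1)))  # drop inner tags like <a>
    return " ".join(text.split()) or None


def build_add_command(path, html):
    """Build a calibredb add command that sets title/author from the headings."""
    cmd = ["calibredb", "add"]
    title = heading_text(H1_RE, html)
    author = heading_text(H2_RE, html)
    if title:
        cmd += ["--title", title]
    if author:
        cmd += ["--authors", author]
    cmd.append(str(path))
    return cmd


def main(folder, run=False):
    for path in sorted(Path(folder).glob("*.html")):
        html = path.read_text(encoding="utf-8", errors="replace")
        cmd = build_add_command(path, html)
        if run:
            subprocess.run(cmd, check=True)  # one import per article
        else:
            print(cmd)  # dry run: inspect the commands before importing


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1], run="--run" in sys.argv[2:])
```

Running it first without --run lets you spot-check a few titles and authors before anything is written to the library. Note that calibre treats "&" in --authors as a separator between multiple authors.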
