MobileRead Forums - View Single Post - Get metadata from Libgen based on file md5sum

iamashwin99 · 07-26-2020, 08:40 AM

I have quite a few books that don't have proper metadata, which makes them hard to search and maintain. I wanted to make a plugin that automatically calculates the MD5 of a given book and loads its metadata from a Libgen search.

I have already completed the code to compute the MD5 and to fetch the data from Libgen. I just want some help wrapping it up into a plugin so that I can click a button and the plugin will do the rest. I would really appreciate it if someone could guide me to the proper way to make the plugin (or point me to a similar plugin that I can hack to do my steps). I have put the code I have come up with so far over here.

Here is the main idea

Code:

import hashlib
from io import StringIO

import pandas as pd
import requests


def getmd5sum(filename):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    md5_hash = hashlib.md5()
    # "with" closes the file even if reading fails
    with open(filename, "rb") as a_file:
        for chunk in iter(lambda: a_file.read(1024 * 1024), b""):
            md5_hash.update(chunk)
    return md5_hash.hexdigest()


def getdatafrommd5(md5):
    """Look up a book on Libgen by MD5; return [title, author, publisher, series, year]."""
    url = 'http://gen.lib.rus.ec/book/index.php?md5=' + md5
    print(url)
    r = requests.get(url, timeout=30)
    # Substring test, so the check still matches if the page carries extra markup
    if "No record with such MD5 hash has been found" in r.text:
        return [-1, -1, -1, -1, -1]
    # Parse every table on the page into a list of DataFrames;
    # StringIO because newer pandas deprecates passing literal HTML strings
    df_list = pd.read_html(StringIO(r.text))
    table = df_list[0]

    def field(label):
        # Column 1 holds the field label, column 2 its value
        return table.loc[table[1] == label, 2].tolist()[0]

    return [field('Title:'), field('Author(s):'), field('Publisher:'),
            field('Series:'), field('Year:')]
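
To show how the two functions above could be chained over a whole folder before any plugin exists, here is a rough sketch. The names collect_metadata and FIELDS, and the default list of extensions, are my own assumptions for illustration and not part of any existing tool. The hashing and lookup functions are passed in as arguments so the loop can be tried with stand-ins instead of hitting Libgen.

```python
import os

# Order matches the list returned by getdatafrommd5
FIELDS = ("title", "author", "publisher", "series", "year")


def collect_metadata(folder, hash_file, lookup, extensions=(".pdf", ".epub", ".djvu")):
    """Map each book path in `folder` to a metadata dict, or None if not found.

    hash_file: callable taking a path and returning its MD5 (e.g. getmd5sum)
    lookup:    callable taking an MD5 and returning a 5-item list (e.g. getdatafrommd5)
    extensions must be a tuple, because str.endswith only accepts str or tuple.
    """
    results = {}
    for name in sorted(os.listdir(folder)):
        # Skip anything that does not look like a book file
        if not name.lower().endswith(extensions):
            continue
        path = os.path.join(folder, name)
        values = lookup(hash_file(path))
        # getdatafrommd5 signals "no record" with a list of -1 values
        not_found = all(v == -1 for v in values)
        results[path] = None if not_found else dict(zip(FIELDS, values))
    return results
```

With the real functions this would be called as collect_metadata("/path/to/books", getmd5sum, getdatafrommd5), and a plugin button would then only need to run the same loop over the selected books.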