Class BookNames

java.lang.Object
  extended by BookNames

public class BookNames
extends java.lang.Object

Used to get ISBNs for books.

Author:
Matthew Einhorn

Field Summary
static java.lang.String cmd1
          To run a command prompt from Java on Windows with rt.exec(new String[]{"", "", ""...}), the input is "cmd", "/c"...
static java.lang.String cmd2
          To run a command prompt from Java on Windows with rt.exec(new String[]{"", "", ""...}), the input is "cmd", "/c"...
static java.lang.String fISBN
          The text following the ISBN number but preceding the extension in the renamed filenames.
private static boolean going
          Used as a flag in isbnDriver().
static long maxDvdSize
          The size in bytes of the DVD used.
static int numberOfISBNsPerBook
          The maximum number of ISBNs to look for/retain per each book.
static int numberOfLines
          The maximum number of lines in a text file that is read into an array when reading the file.
static int numberOfPreviousLines
          The default number of lines that is printed before and after each ISBN found in the book.
static java.lang.String pISBN
          The text preceding the ISBN number in the renamed filenames.
 
Constructor Summary
BookNames()
           
 
Method Summary
private static int badISBN(java.lang.String[] lines, java.lang.String[][] list, java.lang.String tempPath)
          If no ISBNs were found in the book, this more desperate method is used to get anything that is remotely possibly an ISBN.
private static java.lang.String[] checkDupISBN(java.lang.String[][] list, int k, java.lang.String nome)
          This method decides, in one of several ways, whether there are any duplicates.
private static boolean compareETitle(java.lang.String a, java.lang.String b)
          Compares two book titles for equality.
private static boolean comparesToTitle(java.lang.String title, java.lang.String amazonTitle)
          Attempts to compare the title downloaded from amazon or isbndb to the filename of the current book.
static void delDuplicates(java.lang.String filesPath, java.lang.String moveToPath)
          Searches through the folder for duplicate files and moves them to another folder.
static void deleteDvdFiles(java.lang.String listFile, java.lang.String txtFilesPath, java.lang.String tag, java.lang.String movedFilesPath)
          This continues from the method prepereDVD().
static void extractCHM(java.lang.String chmOriginPath, java.lang.String txtResultPath)
          Generates a DOS cmd command for the program minetext (http://text-mining-tool.com/) to extract all the text from each chm file found in the chmOriginPath directory and save it as a text file in txtResultPath.
static java.lang.String extractISBN(java.lang.String line)
          Searches for and extracts an ISBN number from line.
static void genPDFTKcat(java.lang.String originalPath, java.lang.String pdftkOutputPath, java.lang.String isbnOutputPath)
          Generates the DOS cmd pdftk command to catenate the first 20 pages of a pdf file for all files in the folder.
static java.lang.String getISBN(java.lang.String filename, java.lang.String nome, int t, java.lang.String tempPath)
          Searches through a text file and returns the correct ISBN number for this book.
static boolean isbn1013check(java.lang.String isbn1, java.lang.String isbn2)
          Compares two ISBNs to see if one is the ISBN-13 and the other is ISBN-10 but both represent the same book.
static void isbnDriver(java.lang.String txtFilesPath, java.lang.String ext, java.lang.String pdfFilesPathOrigin, java.lang.String pdfFilesPathtarget, java.lang.String smallPDFFilePath, java.lang.String backupPath, java.lang.String txtResultPath)
          This is the main program that gets ISBNs and renames the books to their ISBNs.
static void main(java.lang.String[] args)
           
static java.io.File matchesString(java.io.File dir, java.lang.String regex)
          Tries to match regex to the file names of all files in this directory and its sub-directories.
static java.lang.String[] moveFile(java.lang.String pathOrigin, java.lang.String pathTarget, java.lang.String[] files)
          Generates a Windows DOS cmd move command to move a file or directory to a new location.
static void moveFileWithISBNTitle(java.lang.String dirPath, java.lang.String resultPath)
          Given directory dirPath it looks through all the folders and files in dirPath for a file with a legitimate ISBN in the filename.
static void moveSingleFiles(java.lang.String dirPath, java.lang.String resultPath, int fileOrDir)
          It looks through a directory for folders containing only one file or for folders without any files.
static void moveWhatever(java.lang.String dirPath, java.lang.String resultPath, java.lang.String regex)
          Given a directory dirPath it will match each of its files and files in its sub-dirs to string regex.
static java.lang.String pdftkCat(java.lang.String[] originalPDFs, java.lang.String outputFile, java.lang.String tempPath)
          Generates the DOS cmd pdftk (http://www.accesspdf.com/pdftk/) command to catenate the first 20 pages of a list of pdf files into one file.
static void pdftkWhatever(java.lang.String dirPath, java.lang.String resultPath, java.lang.String regex)
          Given a directory dirPath it will match each of its files and files in its sub-dirs to string regex.
static void prepereDVD(java.lang.String listFile, java.lang.String opfPath, java.lang.String txtFilesPath, java.lang.String tag)
          This is used to backup books in calibre to a DVD.
private static void printISBNwithLines(java.lang.String[][] isbns, java.lang.String[] lines, int numberOfLines, int k)
          Used to print the ISBNs found in the book so we can select the correct ISBN.
static java.lang.String[] removeExt(java.lang.String[] s)
          Removes extension of each file name in the array.
static java.lang.String[] renameFile(java.lang.String path, java.lang.String[] original, java.lang.String[] renamed)
          Generates a rename command for Windows DOS cmd.
private static int selectISBN(java.lang.String nome, java.lang.String[][] list, int k)
          Another method to decide which ISBN to select from the list.
static java.lang.String titleIsISBN(java.io.File dir)
          Looks for an ISBN number in the name of the file or directory.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

numberOfLines

public static final int numberOfLines
The maximum number of lines in a text file that is read into an array when reading the file. It's limited because this value is hard-coded as the array size when the arrays are created.

See Also:
Constant Field Values

numberOfPreviousLines

public static int numberOfPreviousLines
The default number of lines that is printed before and after each ISBN found in the book. See method printISBNwithLines().


numberOfISBNsPerBook

public static int numberOfISBNsPerBook
The maximum number of ISBNs to look for/retain per each book.


going

private static boolean going
Used as a flag in isbnDriver().


maxDvdSize

public static final long maxDvdSize
The size in bytes of the DVD used. See method prepereDVD().

See Also:
Constant Field Values

pISBN

public static final java.lang.String pISBN
The text preceding the ISBN number in the renamed filenames, e.g. t;4444444444444.

See Also:
Constant Field Values

fISBN

public static final java.lang.String fISBN
The text following the ISBN number but preceding the extension in the renamed filenames, e.g. 4444444444444whatever.rar.

See Also:
Constant Field Values

cmd1

public static final java.lang.String cmd1
To run a command prompt from Java on Windows with rt.exec(new String[]{"", "", ""...}), the input is "cmd", "/c"... On Linux it's "sh", "-c"... Select the correct values for your OS.

See Also:
Constant Field Values

cmd2

public static final java.lang.String cmd2
To run a command prompt from Java on Windows with rt.exec(new String[]{"", "", ""...}), the input is "cmd", "/c"... On Linux it's "sh", "-c"... Select the correct values for your OS.

See Also:
Constant Field Values
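A minimal sketch of how values like cmd1 and cmd2 are used, assuming they are the first two elements of the array passed to Runtime.exec, with the command string as the third. The class ShellCommandSketch, its methods and the os.name check are hypothetical illustrations; the fields above are meant to be set by hand for your OS.

```java
import java.util.Locale;

final class ShellCommandSketch {
    /**
     * Builds the argument array so that `command` is interpreted by the platform shell:
     * "cmd", "/c" on Windows and "sh", "-c" elsewhere (the roles of cmd1 and cmd2).
     */
    static String[] shellCommand(String command, boolean windows) {
        String cmd1 = windows ? "cmd" : "sh";
        String cmd2 = windows ? "/c" : "-c";
        return new String[] {cmd1, cmd2, command};
    }

    /** Hypothetical helper: guesses the OS from the os.name system property. */
    static boolean isWindows() {
        return System.getProperty("os.name", "").toLowerCase(Locale.ROOT).startsWith("windows");
    }

    public static void main(String[] args) throws Exception {
        String[] argv = shellCommand("echo hello", isWindows());
        // ProcessBuilder accepts the same array that Runtime.getRuntime().exec(argv) would.
        Process p = new ProcessBuilder(argv).inheritIO().start();
        System.out.println("exit code: " + p.waitFor());
    }
}
```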

Constructor Detail

BookNames

public BookNames()

Method Detail

main

public static void main(java.lang.String[] args)

removeExt

public static java.lang.String[] removeExt(java.lang.String[] s)
Removes the extension from each filename in the array. It doesn't check for filenames without an extension.

Parameters:
s - An array of filenames.
Returns:
The same array but with each extension removed from filename.

renameFile

public static java.lang.String[] renameFile(java.lang.String path,
                                            java.lang.String[] original,
                                            java.lang.String[] renamed)
Generates a rename command for Windows DOS cmd.

Parameters:
path - The full path of the directory where the original files are present.
original - The list of filenames of the original files. Only the names - excluding the path.
renamed - The new filename for the original files. Only the names - excluding the path. renamed[i] corresponds to original[i].
Returns:
A list of commands to rename these files.

moveFile

public static java.lang.String[] moveFile(java.lang.String pathOrigin,
                                          java.lang.String pathTarget,
                                          java.lang.String[] files)
Generates a Windows DOS cmd move command to move a file or directory to a new location.

Parameters:
pathOrigin - The full path of the directory where the files are present.
pathTarget - The full path of the directory where the files will be moved to.
files - The list of filenames to be moved. Only the names - excluding the path.
Returns:
A list of commands to move these files.

pdftkCat

public static java.lang.String pdftkCat(java.lang.String[] originalPDFs,
                                        java.lang.String outputFile,
                                        java.lang.String tempPath)
Generates the DOS cmd pdftk (http://www.accesspdf.com/pdftk/) command to catenate the first 20 pages of a list of pdf files into one file. Input can be at most 26 files. It extracts the first 20 pages of each file and catenates them into one file. To prepare the command properly, some information about the PDFs has to be determined, so the command prompt is run within this method; make sure the cmd1 and cmd2 variables are set correctly.

Parameters:
originalPDFs - A list of filenames including the full path, e.g. C:\myName1.pdf, which will be extracted and catenated into one pdf file.
outputFile - The full path and name of the file that the generated pdf will be saved as, e.g. C:\newFile.pdf.
tempPath - A full path to a directory where some temp files will be saved.
Returns:
The pdftk cat command, ready to run in DOS cmd.
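A sketch of the kind of command pdftkCat builds, assuming pdftk's handle syntax (A=file.pdf ... cat A1-20 ... output out.pdf). Single-letter handles A to Z would explain the 26-file limit. The class PdftkCatSketch and the pageCounts parameter are hypothetical: the real method determines the PDF information itself by running the command prompt, presumably because a range such as A1-20 fails on a file with fewer than 20 pages.

```java
import java.util.List;

final class PdftkCatSketch {
    static final int PAGES_TO_KEEP = 20;

    /**
     * Builds a pdftk "cat" command keeping the first PAGES_TO_KEEP pages of each input.
     * pageCounts.get(i) is the (already known) page count of pdfs.get(i).
     */
    static String buildCatCommand(List<String> pdfs, List<Integer> pageCounts, String outputFile) {
        if (pdfs.size() > 26) throw new IllegalArgumentException("at most 26 input files (handles A-Z)");
        if (pdfs.size() != pageCounts.size()) throw new IllegalArgumentException("one page count per file");
        StringBuilder handles = new StringBuilder();
        StringBuilder ranges = new StringBuilder();
        for (int i = 0; i < pdfs.size(); i++) {
            char handle = (char) ('A' + i);
            int last = Math.min(PAGES_TO_KEEP, pageCounts.get(i)); // never ask for pages past the end
            handles.append(' ').append(handle).append("=\"").append(pdfs.get(i)).append('"');
            ranges.append(' ').append(handle).append("1-").append(last);
        }
        return "pdftk" + handles + " cat" + ranges + " output \"" + outputFile + "\"";
    }

    public static void main(String[] args) {
        System.out.println(buildCatCommand(
                List.of("C:\\myName1.pdf", "C:\\myName2.pdf"), List.of(350, 12), "C:\\newFile.pdf"));
    }
}
```

For the two inputs in main, the second file has only 12 pages, so its range is B1-12 rather than B1-20.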

genPDFTKcat

public static void genPDFTKcat(java.lang.String originalPath,
                               java.lang.String pdftkOutputPath,
                               java.lang.String isbnOutputPath)
Generates the DOS cmd pdftk command to catenate the first 20 pages of a pdf file for all files in the folder. It also checks whether each pdf filename contains an ISBN; if it does, the file is renamed and moved to a new folder, and no pdftk command is generated for it.

Parameters:
originalPath - The directory/full path where the pdf files to be extracted are.
pdftkOutputPath - The full path where the files extracted by pdftk will be saved.
isbnOutputPath - The full path where the files with ISBNs in their names will be moved to when renamed. Three text files are saved: first, the generated pdftk command; second, for backup, a list of the files that had ISBNs, each listed as the old name followed by the new name, e.g. File3333333333333.pdf :-->: t;3333333333333.pdf; third, a list of files with ISBNs whose rename failed and for which no pdftk command was generated.

delDuplicates

public static void delDuplicates(java.lang.String filesPath,
                                 java.lang.String moveToPath)
Searches through the folder for duplicate files and moves them to another folder. Assumes that duplicates are named in the format abcd.*, abcd (1).*, abcd (2).*, abcd (3).*... The method only works for files with up to 9 duplicates per file; duplicates numbered above 9 are not detected.

Parameters:
filesPath - The full path of the directory where it should search for duplicates.
moveToPath - The full path of the directory where the duplicates will be moved. It also saves a text file listing all the duplicates that failed to be deleted.
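A sketch of detecting the duplicate naming pattern described above, assuming a numbered copy counts as a duplicate only when the unnumbered original is also present. The class DuplicateNameSketch is hypothetical, and the single-digit group in the regex reproduces the 9-duplicate limit rather than fixing it. The real method also moves the files; this sketch only finds them.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DuplicateNameSketch {
    // "name (N).ext" with a single digit N from 1 to 9, mirroring the documented limit.
    private static final Pattern COPY = Pattern.compile("^(.+) \\(([1-9])\\)(\\.[^.]*)?$");

    /** Returns the names that look like numbered copies of another name in the same list. */
    static List<String> findDuplicates(List<String> names) {
        Set<String> all = new HashSet<>(names);
        List<String> dups = new ArrayList<>();
        for (String name : names) {
            Matcher m = COPY.matcher(name);
            if (!m.matches()) continue;
            String ext = m.group(3) == null ? "" : m.group(3);
            // Only a duplicate if the original "name.ext" exists as well (assumption).
            if (all.contains(m.group(1) + ext)) dups.add(name);
        }
        return dups;
    }

    public static void main(String[] args) {
        System.out.println(findDuplicates(List.of(
                "abcd.pdf", "abcd (1).pdf", "abcd (2).pdf", "abcd (10).pdf", "other (1).pdf")));
    }
}
```

In main, "abcd (10).pdf" is not reported (two digits), and "other (1).pdf" is not reported because "other.pdf" is absent.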

isbnDriver

public static void isbnDriver(java.lang.String txtFilesPath,
                              java.lang.String ext,
                              java.lang.String pdfFilesPathOrigin,
                              java.lang.String pdfFilesPathtarget,
                              java.lang.String smallPDFFilePath,
                              java.lang.String backupPath,
                              java.lang.String txtResultPath)
This is the main program that gets ISBNs and renames the books to their ISBNs. There are three elements: the original book, the text file extracted from the original book and, for PDFs, the smaller PDF from which the text file was extracted. This assumes that, apart from the extension, all three elements have the same filename, i.e. a pdf file, its corresponding text file and its smaller pdf file all share one name.

The program is given the paths for these files. It then looks at each text file, and if it finds the ISBN in the file, all three elements are renamed to the ISBN based on the class variables pISBN and fISBN and are then moved to their corresponding folders. If a file could not be renamed and moved, the original and proposed filenames are saved to a text file named failed. Otherwise, the old and new names of the files are saved in a backup.txt file.

When the program starts running, it iterates over the text files, and there are some options for controlling it while getting the ISBNs. The options are listed at the beginning of each session.

In all the paths, never end with the path separator, i.e. never write C:\newBooks\ but always write C:\newBooks instead, since I assume that you don't.

Since all the renames and moves are done immediately, if the program crashes the only things lost would be the backup and failed files, so it's good to save frequently, as shown in the options.

Running the program on the same folder or resuming a session after a save: since the files are moved immediately, it is safe to save frequently and restart the program. However, because the backup and failed text files are always appended to the same text files, this could potentially result in the deletion of the backup file from previous sessions. I only saw this happen when the new/appended text was empty, in which case the whole text file was cleared. Perhaps the backup and failed text files should be moved or renamed after each save.

Parameters:
txtFilesPath - The full path of the directory where the extracted text files were saved to and currently are in.
ext - The extension of the original book, e.g. pdf, chm, djv, and NOT .pdf, .chm, etc.
pdfFilesPathOrigin - The full path of the directory where the original books are.
pdfFilesPathtarget - The full path of the directory where the original books will be moved to after they are renamed to their ISBNs.
smallPDFFilePath - The full path of the directory where the smaller extracted PDFs are. If there are no such files, or the book isn't a PDF, set it to null.
backupPath - The full path of the directory where the smaller sized PDFs and extracted text files will be moved to for backup purposes after the ISBN has been extracted.
txtResultPath - The full path of the directory where the text files for failed and backup will be saved to.
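A sketch of the renaming rule, assuming the new name is pISBN + ISBN + fISBN + "." + ext and that directory paths are joined with a single backslash, which is why they must not end with a separator. The class IsbnNameSketch is hypothetical, and its constant values are guesses inferred from example names such as t;4444444444444, not the real values from Constant Field Values.

```java
final class IsbnNameSketch {
    // Hypothetical values guessed from the example names; the real pISBN and fISBN may differ.
    static final String P_ISBN = "t;";
    static final String F_ISBN = "";

    /** New name for an element: pISBN + isbn + fISBN + "." + ext, with ext given without the dot. */
    static String isbnFileName(String isbn, String ext) {
        return P_ISBN + isbn + F_ISBN + "." + ext;
    }

    /** Joins a Windows directory path written WITHOUT a trailing separator and a file name. */
    static String join(String dirPath, String fileName) {
        return dirPath + "\\" + fileName;
    }

    public static void main(String[] args) {
        String isbn = "9780306406157";
        // The elements share one base name, so each gets the same ISBN-based name with its own extension.
        System.out.println(join("C:\\newBooks", isbnFileName(isbn, "pdf"))); // the original book
        System.out.println(join("C:\\backup", isbnFileName(isbn, "txt")));   // the extracted text file
    }
}
```

Writing C:\newBooks\ instead of C:\newBooks would produce a doubled backslash with this kind of joining.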

getISBN

public static java.lang.String getISBN(java.lang.String filename,
                                       java.lang.String nome,
                                       int t,
                                       java.lang.String tempPath)
Searches through a text file and returns the correct ISBN number for this book. Decisions as to which ISBN is correct are made by the user or the program. The exact user options for selecting the correct ISBN are printed at the beginning; to see them again, type help when input is required. When input is required, the following commands can be entered:
help - Get help.
z[ISBN Number] - Such as z978-0-676767876 or z987678987x. Sets the characters following the letter z as the ISBN. It's safe to include "-" among the digits.
m[any number] - Such as m5. Reprints the list of all ISBNs, with each ISBN preceded and followed by [any number] of lines of the surrounding text from the text file. This helps decide on the nature of the ISBN. The default number of lines printed is set in the class variables.
[any number] - Such as 3 or 5. Selects that ISBN for this book. Each listed ISBN is shown with a number; use that number to select the desired ISBN.
"" - I.e. just hit enter without input. Selects the ISBN that was listed last.
n - Indicates that this book should be skipped and not renamed or moved. When n is selected, the book is searched again for ISBNs, but with more depth. If n is selected a second time, the book is skipped.
s - Indicates that you wish to end this session. After entering s, you will be asked for a command again, since you still need to decide on the current book. Once that second command is entered, the program terminates and all files are saved.
For any other input, it keeps asking for a command until valid input is entered.

Parameters:
filename - The full path and filename of the txt file that potentially contains an ISBN number.
nome - The name of the book that is searched for an ISBN.
t - The number of the book in the list of books, used for printing purposes
tempPath - A directory where some temporary files will be saved to.
Returns:
An ISBN found in the book. The specific ISBN returned is determined by the program or the user. The returned ISBN contains only digits and x/X; all other characters are removed. If none is found, returns null.
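A sketch of interpreting the commands listed above. The class IsbnCommandSketch and its Kind enum are hypothetical; the real getISBN() reads the input and acts on it directly. Following the description, hyphens are removed from a z command and the remaining characters must be digits or x/X; ignoring surrounding whitespace is an added assumption.

```java
final class IsbnCommandSketch {
    enum Kind { HELP, SET_ISBN, MORE_LINES, SELECT_INDEX, SELECT_LAST, SKIP, STOP, INVALID }

    static final class Command {
        final Kind kind;
        final String value; // the ISBN for SET_ISBN, the number for MORE_LINES / SELECT_INDEX, else null

        Command(Kind kind, String value) {
            this.kind = kind;
            this.value = value;
        }
    }

    static Command parse(String input) {
        String s = input.trim(); // surrounding whitespace ignored (assumption)
        if (s.isEmpty()) return new Command(Kind.SELECT_LAST, null); // just hit enter
        if (s.equals("help")) return new Command(Kind.HELP, null);
        if (s.equals("n")) return new Command(Kind.SKIP, null);
        if (s.equals("s")) return new Command(Kind.STOP, null);
        if (s.startsWith("z")) {
            String isbn = s.substring(1).replace("-", ""); // "-" is allowed among the digits
            return isbn.matches("[0-9xX]+") ? new Command(Kind.SET_ISBN, isbn) : new Command(Kind.INVALID, null);
        }
        if (s.matches("m\\d+")) return new Command(Kind.MORE_LINES, s.substring(1));
        if (s.matches("\\d+")) return new Command(Kind.SELECT_INDEX, s);
        return new Command(Kind.INVALID, null); // caller keeps asking
    }

    public static void main(String[] args) {
        for (String in : new String[] {"help", "z978-0-676767876", "m5", "3", "", "n", "s", "what?"}) {
            Command c = parse(in);
            System.out.println("\"" + in + "\" -> " + c.kind + (c.value == null ? "" : " " + c.value));
        }
    }
}
```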

checkDupISBN

private static java.lang.String[] checkDupISBN(java.lang.String[][] list,
                                               int k,
                                               java.lang.String nome)
This method decides, in one of several ways, whether there are any duplicates. If there are, it returns the duplicate. Generally, if given the choice between a paper-book format and an ebook format, it selects the ISBN for the paper book. See above notes. It also tries to return the ISBN-13 when possible.

Parameters:
list - The matrix that contains the ISBN numbers and associated info of a single book.
k - roughly the number of ISBNs in list.
nome - The file name of the book.
Returns:
An array containing the duplicate and its associated info in the format used by getISBN(). If there is no duplicate, returns null.

selectISBN

private static int selectISBN(java.lang.String nome,
                              java.lang.String[][] list,
                              int k)
Another method to decide which ISBN to select from the list.

Parameters:
nome - The file name of the book.
list - The matrix that contains the ISBN numbers and associated info of a single book.
k - roughly the number of ISBNs in list.
Returns:
the index number in list of the ISBN selected in this method.

isbn1013check

public static boolean isbn1013check(java.lang.String isbn1,
                                    java.lang.String isbn2)
Compares two ISBNs to see if one is the ISBN-13 and the other is ISBN-10 but both represent the same book. If both are ISBN-10 or ISBN-13 it will return false even if both are identical.

Parameters:
isbn1 - One ISBN
isbn2 - Another ISBN
Returns:
true if both represent the same book AND one is ISBN-10 and the other is ISBN-13, else false.
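A sketch of the ISBN-10/ISBN-13 comparison, assuming the standard conversion: prefix 978 to the first nine ISBN-10 digits and recompute the check digit with alternating weights 1 and 3. The class Isbn1013Sketch is hypothetical. ISBN-13s with the 979 prefix have no ISBN-10 form, so they never match, and the ISBN-10 check digit itself is not validated here.

```java
final class Isbn1013Sketch {
    /** ISBN-13 for a well-formed ISBN-10: "978" + first nine digits + recomputed check digit. */
    static String toIsbn13(String isbn10) {
        String core = "978" + isbn10.substring(0, 9);
        int sum = 0;
        for (int i = 0; i < 12; i++) {
            int digit = core.charAt(i) - '0';
            sum += (i % 2 == 0) ? digit : 3 * digit; // weights 1, 3, 1, 3, ...
        }
        return core + (10 - sum % 10) % 10;
    }

    /** True only if one argument is an ISBN-10 and the other is the matching ISBN-13. */
    static boolean isbn1013check(String isbn1, String isbn2) {
        String a = isbn1.replaceAll("[^0-9xX]", "");
        String b = isbn2.replaceAll("[^0-9xX]", "");
        if (a.length() == 10 && b.length() == 13) return toIsbn13(a).equals(b);
        if (a.length() == 13 && b.length() == 10) return toIsbn13(b).equals(a);
        return false; // both ISBN-10, both ISBN-13, or malformed
    }

    public static void main(String[] args) {
        System.out.println(isbn1013check("0-306-40615-2", "978-0-306-40615-7"));
        System.out.println(isbn1013check("0306406152", "0306406152"));
    }
}
```

0-306-40615-2 and 978-0-306-40615-7 are a matching pair; two identical ISBN-10s return false, as the description above requires.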

compareETitle

private static boolean compareETitle(java.lang.String a,
                                     java.lang.String b)
Compares two book titles for equality. Since an ebook has "_-_ebbookk" appended to its title, titles are compared with "_-_ebbookk" removed, i.e. two titles differing only by "_-_ebbookk" are equal. The goal is to see whether they represent the same book.

Parameters:
a - title a
b - title b
Returns:
true if titles are for same book, else false.
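A minimal sketch of the comparison described above, assuming the marker is simply removed wherever it occurs and the remaining titles must match exactly (case-sensitive). The class ETitleSketch is hypothetical; the real method may normalize titles further.

```java
final class ETitleSketch {
    static final String EBOOK_MARKER = "_-_ebbookk";

    /** Equal if the titles match once the ebook marker is stripped from both. */
    static boolean compareETitle(String a, String b) {
        return a.replace(EBOOK_MARKER, "").equals(b.replace(EBOOK_MARKER, ""));
    }

    public static void main(String[] args) {
        System.out.println(compareETitle("Java Basics_-_ebbookk", "Java Basics"));
        System.out.println(compareETitle("Java Basics", "C Basics"));
    }
}
```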

comparesToTitle

private static boolean comparesToTitle(java.lang.String title,
                                       java.lang.String amazonTitle)
Attempts to compare the title downloaded from Amazon or isbndb to the filename of the current book. The filename is assumed to end with .txt. The equality computed here is very weak; see the code for how it is computed. A true result should not be trusted by itself; use it only as confirmation.

Parameters:
title - book filename
amazonTitle - Title downloaded from Amazon
Returns:
true if they are equal, else false.

printISBNwithLines

private static void printISBNwithLines(java.lang.String[][] isbns,
                                       java.lang.String[] lines,
                                       int numberOfLines,
                                       int k)
Used to print the ISBNs found in the book so that the correct ISBN can be selected. The output is the selected number of lines of text preceding the ISBN in the book, then the ISBN itself, then the same number of lines of text following it. The surrounding text helps decide whether the ISBN is correct.

Parameters:
isbns - The matrix representing all the ISBNs found in the book and their associated info.
lines - The whole text file of the current book.
numberOfLines - Selects the number of lines to print before and after the ISBN number.
k - An output of the calling function; roughly the number of ISBNs found in the book.
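A sketch of selecting the context lines around one ISBN, assuming the line index of each ISBN is known. The layout of the isbns matrix is not documented, so the lineIndex parameter and the class IsbnContextSketch are hypothetical. Indices are clamped so that an ISBN near the start or end of the book still prints.

```java
import java.util.Arrays;
import java.util.List;

final class IsbnContextSketch {
    /** Lines from lineIndex - n to lineIndex + n inclusive, clamped to the bounds of the text. */
    static List<String> context(String[] lines, int lineIndex, int n) {
        int from = Math.max(0, lineIndex - n);
        int to = Math.min(lines.length, lineIndex + n + 1); // exclusive end
        return Arrays.asList(lines).subList(from, to);
    }

    public static void main(String[] args) {
        String[] text = {"Title", "Publisher", "ISBN 0-306-40615-2", "Printed in", "Chapter 1"};
        for (String line : context(text, 2, 1)) System.out.println(line);
    }
}
```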

extractISBN

public static java.lang.String extractISBN(java.lang.String line)
Searches for and extracts an ISBN number from line. If line contains multiple ISBNs, only the first is extracted; to get the second, split line after the first ISBN and try again.

Parameters:
line - The string potentially containing an ISBN number.
Returns:
The ISBN, containing only digits and the letters x/X. If there was no ISBN, returns null.
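A sketch of one way extractISBN could work, using a regular expression for ISBN-10 and ISBN-13 with optional hyphens or spaces and an optional 978/979 prefix. The pattern and the class ExtractIsbnSketch are assumptions, not the original code; they accept anything formatted as an ISBN without validating the check digit. extractAll calls Matcher.find() repeatedly, which has the same effect as splitting the line after each match and trying again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExtractIsbnSketch {
    // Optional 978/979 prefix, nine digits and a check digit (0-9 or X),
    // with an optional '-' or ' ' after each digit; not embedded in a longer digit run.
    private static final Pattern ISBN = Pattern.compile(
            "(?<!\\d)(?:97[89][- ]?)?(?:\\d[- ]?){9}[\\dXx](?![\\dXx])");

    /** First ISBN-like number in line, reduced to digits and x/X, or null. */
    static String extractISBN(String line) {
        Matcher m = ISBN.matcher(line);
        return m.find() ? m.group().replaceAll("[^0-9Xx]", "") : null;
    }

    /** Every ISBN-like number in line, in order of appearance. */
    static List<String> extractAll(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = ISBN.matcher(line);
        while (m.find()) found.add(m.group().replaceAll("[^0-9Xx]", ""));
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extractISBN("ISBN: 978-0-676767876"));
        System.out.println(extractAll("ISBN 0-306-40615-2 (pbk), 978-0-306-40615-7 (ebk)"));
    }
}
```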

badISBN

private static int badISBN(java.lang.String[] lines,
                           java.lang.String[][] list,
                           java.lang.String tempPath)
If no ISBNs were found in the book, this more desperate method is used to get anything that is remotely possibly an ISBN.

Parameters:
lines - The text of the book.
list - The matrix representing all the ISBNs found in the book and their associated info.
tempPath - A directory where some temporary files will be saved to.
Returns:
The number of ISBNs found in the book; 0 means none.

extractCHM

public static void extractCHM(java.lang.String chmOriginPath,
                              java.lang.String txtResultPath)
Generates a DOS cmd command for the program minetext (http://text-mining-tool.com/) to extract all the text from each chm file found in the chmOriginPath directory and save it as a text file in txtResultPath. The resulting commands are saved to a text file in the chmOriginPath directory.

Parameters:
chmOriginPath - The directory where the chm files are present.
txtResultPath - The directory where the extracted text files should be saved to.

prepereDVD

public static void prepereDVD(java.lang.String listFile,
                              java.lang.String opfPath,
                              java.lang.String txtFilesPath,
                              java.lang.String tag)
This is used to back up books in calibre to a DVD. It looks through the calibre library folder, and when it finds a book it checks whether it is a proper book to add to the DVD; for example, a book without a publisher won't be added. If it is, then the book is tagged in calibre with tag to indicate that it should be exported and saved to a DVD. To avoid preparing more books than the DVD can hold, the class variable maxDvdSize limits the total size of the books prepared. Once the limit is reached, the method terminates.

Parameters:
listFile - A list file of all the FILES in calibre's library. It should not contain metadata.db, and each file listed should have its full path followed by the filename. "PrintFolder" can generate such a list. It should list ONLY FILES, not empty directories.
opfPath - a temporary directory where opf files are saved for the duration of the session.
txtFilesPath - a temporary directory where text files are saved for the duration of the session.
tag - The tag used in calibre to indicate that this file is on DVD x, for example DVD_Books_5. May not be completely Windows-independent. Use at your own risk, though it worked for me.
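A sketch of the size limit, assuming books are visited in order and the scan stops at the first book that would push the total past maxDvdSize, as the description says. The class DvdFillSketch is hypothetical, book sizes are passed in rather than read from calibre's folders, and the eligibility checks (such as requiring a publisher) are omitted.

```java
import java.util.ArrayList;
import java.util.List;

final class DvdFillSketch {
    /** Indices of the books to tag, stopping at the first book that would exceed maxDvdSize. */
    static List<Integer> selectForDvd(long[] bookSizes, long maxDvdSize) {
        List<Integer> chosen = new ArrayList<>();
        long total = 0;
        for (int i = 0; i < bookSizes.length; i++) {
            if (total + bookSizes[i] > maxDvdSize) break; // limit reached: terminate
            total += bookSizes[i];
            chosen.add(i);
        }
        return chosen;
    }

    public static void main(String[] args) {
        System.out.println(selectForDvd(new long[] {40, 30, 50, 10}, 100));
    }
}
```

With sizes 40, 30, 50, 10 and a limit of 100, only the first two books are selected: the third would bring the total to 120, and the scan stops there even though the fourth would still fit.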

deleteDvdFiles

public static void deleteDvdFiles(java.lang.String listFile,
                                  java.lang.String txtFilesPath,
                                  java.lang.String tag,
                                  java.lang.String movedFilesPath)
This continues from the method prepereDVD() above. After the books have been saved to the DVD, they need to be deleted. This moves the book files that were saved to the DVD from calibre's folder to another folder, and then saves there a text file listing all the books that failed to be moved. It knows which files have been saved to the DVD based on the tag given to them in calibre by the method above. Make sure the books are saved to the DVD before actually deleting them.

Parameters:
listFile - The same listFile as above.
txtFilesPath - A temporary directory where text files are saved for the duration of the session.
tag - The tag given to the book in the method above.
movedFilesPath - The target path where the books to be deleted are moved to. May not be completely Windows-independent. Use at your own risk, though it worked for me.

moveWhatever

public static void moveWhatever(java.lang.String dirPath,
                                java.lang.String resultPath,
                                java.lang.String regex)
Given a directory dirPath, it matches each of its files, and the files in its sub-directories, against the string regex. If a file directly in dirPath matches, it is moved. If a file in a sub-directory of dirPath matches, then the top sub-folder and its contents are moved. It does NOT try to match directories, only files. A text file listing the files/directories that failed to be moved is also generated.

Parameters:
dirPath - The directory where it'll look for matches.
resultPath - The directory where matching files/directories will be moved.
regex - The string regex with what to match the files.
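A sketch of how moveWhatever could decide what to move, using java.nio.file: each matching file is mapped to the entry directly inside dirPath that contains it (the file itself, or its top sub-folder). Whether the original matches the whole filename or only part of it is not documented; this sketch assumes String.matches (whole name). The class MoveWhateverSketch is hypothetical, and moveAll collects failures in a list instead of writing the text file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

final class MoveWhateverSketch {
    /** Entries directly inside dirPath to move: a matching file, or the top sub-folder holding one. */
    static Set<Path> entriesToMove(Path dirPath, String regex) throws IOException {
        Set<Path> result = new TreeSet<>();
        try (Stream<Path> paths = Files.walk(dirPath)) {
            paths.filter(Files::isRegularFile) // directory names are never matched
                 .filter(p -> p.getFileName().toString().matches(regex))
                 .forEach(p -> result.add(dirPath.resolve(dirPath.relativize(p).getName(0))));
        }
        return result;
    }

    /** Moves each entry into resultPath and returns the entries that failed to move. */
    static List<Path> moveAll(Set<Path> entries, Path resultPath) {
        List<Path> failed = new ArrayList<>();
        for (Path p : entries) {
            try {
                Files.move(p, resultPath.resolve(p.getFileName()));
            } catch (IOException e) {
                failed.add(p);
            }
        }
        return failed;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : entriesToMove(dir, ".*\\.pdf")) System.out.println("would move: " + p);
    }
}
```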

pdftkWhatever

public static void pdftkWhatever(java.lang.String dirPath,
                                 java.lang.String resultPath,
                                 java.lang.String regex)
Given a directory dirPath, it matches each of its files, and the files in its sub-directories, against the string regex. If a file directly in dirPath matches, the pdftk cat command is generated for this file. If a file in a sub-directory of dirPath matches, the pdftk cat command is also generated for it; however, the pdftk output filename will not be the input filename but the name of the top sub-directory of dirPath in which it was found. It does NOT try to match directories, only files. The point is this: if you have books that are broken into many parts in a folder, and you suspect that in each of these folders one file contains the ISBN, then you can extract that file using pdftk and get the ISBN from it instead of doing it for every single file in the folder. Since the output is named after the top sub-directory, after finding the ISBN it is easy to rename the top sub-directory to the standard ISBN name.

Parameters:
dirPath - The directory where it'll look for matches.
resultPath - The directory where pdftk will save the extracted files.
regex - The string regex with what to match the files.

moveFileWithISBNTitle

public static void moveFileWithISBNTitle(java.lang.String dirPath,
                                         java.lang.String resultPath)
Given a directory dirPath, it looks through all the folders and files in dirPath for a file with a legitimate ISBN in its filename. If a file has such a name, it is renamed to the standard ISBN name as set in the class variables. If it's a directory, then it searches through the directory for a file or sub-directory with an ISBN in its name; if one is found, it renames the top directory in dirPath to the standard ISBN name and moves it. It always saves, to a text file, a backup of the original file or folder name and the new name, as well as a list of the files whose rename failed.

Parameters:
dirPath - The full path to the directory to search for files or folders with ISBNs in their names.
resultPath - The path where files or directories with ISBNs in their names should be moved to.

matchesString

public static java.io.File matchesString(java.io.File dir,
                                         java.lang.String regex)
Tries to match regex to the filenames of all files in this directory and its sub-directories. Only filenames are matched, NOT directory names. Returns the first file that matched.

Parameters:
dir - The directory to search for a matching file.
regex - The regex used to try to match the filename.
Returns:
If it matched a filename - the first such file, else null.

moveSingleFiles

public static void moveSingleFiles(java.lang.String dirPath,
                                   java.lang.String resultPath,
                                   int fileOrDir)
Looks through a directory for folders containing only one file, or for folders without any files, as selected by fileOrDir. All the files in dirPath will be moved, and all the directories IN dirPath will be searched for emptiness. dirPath itself is not moved if empty; only its sub-directories are. It also saves a "Failed to move single files.txt" listing the files that failed to be moved.

Parameters:
dirPath - The containing directory which will have its folders and files searched through.
resultPath - The full path where the files or directory should be moved to.
fileOrDir - 0 to move all the empty folders, even if they have empty sub-folders; 1 to move the files that are the only files in their directory, i.e. in a folder containing only one file.
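A sketch of the two checks selected by fileOrDir, assuming files are counted recursively so that a folder holding only empty sub-folders counts as empty. The class SingleFilesSketch is hypothetical; it only classifies the sub-folders of dirPath and does not move anything.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

final class SingleFilesSketch {
    /** Number of regular files anywhere under dir; empty sub-folders contribute nothing. */
    static long countFiles(Path dir) throws IOException {
        try (Stream<Path> s = Files.walk(dir)) {
            return s.filter(Files::isRegularFile).count();
        }
    }

    /** The only regular file under dir; callers ensure countFiles(dir) == 1. */
    static Path onlyFile(Path dir) throws IOException {
        try (Stream<Path> s = Files.walk(dir)) {
            return s.filter(Files::isRegularFile).findFirst().get();
        }
    }

    /** fileOrDir 0: sub-folders with no files at all; 1: the lone file of each one-file sub-folder. */
    static List<Path> candidates(Path dirPath, int fileOrDir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dirPath)) {
            for (Path child : children) {
                if (!Files.isDirectory(child)) continue; // dirPath itself is never a candidate
                long n = countFiles(child);
                if (fileOrDir == 0 && n == 0) result.add(child);
                else if (fileOrDir == 1 && n == 1) result.add(onlyFile(child));
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("empty folders: " + candidates(dir, 0));
        System.out.println("single files:  " + candidates(dir, 1));
    }
}
```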

titleIsISBN

public static java.lang.String titleIsISBN(java.io.File dir)
Looks for an ISBN number in the name of the file or directory. If the input is a directory, it also searches for an ISBN in all its sub-directories and files.

Parameters:
dir - The directory or file with a potential ISBN number in their name.
Returns:
If an ISBN was found - the ISBN with all extraneous chars removed except those within the ISBN number, else null. It will return anything that is formatted as an ISBN even if it really isn't an ISBN but just a random number.