Very good, I finally got the script going.
Of course, it wasn't all plain sailing; here is the output from my first conversion:
Quote:
:~/Desktop/untitled folder/CleanMe!!!/gutlrf$ ./gutlrf.pl http://www.gutenberg.org/files/17297/17297-h.zip
... 0KBytes
Extracting files...
Book Title: British Highways And Byways From A Motor Car
Author : Thomas D Murphy
Cleaning HTML...
Wrote cleaned HTML "/tmp/17297-h/new.htm"
Converting to BBeB...
Processing u'new.htm'
Parsing HTML...
Converting to BBeB...
An error occurred while processing a table: AttributeError("'module' object has no attribute 'tt0011m_'",). Ignoring table markup.
An error occurred while processing a table: AttributeError("'module' object has no attribute 'tt0011m_'",). Ignoring table markup.
Rationalizing font sizes...
Output written to /tmp/17297-h/British Highways And Byways From A Motor Car.lrf
Segmentation fault
Died at ./gutlrf.pl line 261.