OK, the new script behaves differently with 126 and the latest gutlrf.pl: according to the LRF viewer, the contents links all point to page one.
Output:
:~/Desktop/untitled folder/CleanMe!!!/gutlrf$ ./gutlrf.pl http://www.gutenberg.org/files/17297/17297-h.zip
Extracting files...
Book Title: British Highways And Byways From A Motor Car
Author : Thomas D Murphy
Cleaning HTML...
Wrote cleaned HTML "/tmp/17297-h/new.htm"
Converting to LRF BBeB...
Processing u'new.htm'
Parsing HTML...
Converting to BBeB...
Rationalizing font sizes...
Output written to /tmp/17297-h/British Highways And Byways From A Motor Car.lrf
Segmentation fault
Died at ./gutlrf.pl line 261.