03-09-2014, 08:37 PM   #6
Tex2002ans
Wizard
Quote:
Originally Posted by Doitsu
I'd try to open the .mdb file with Excel or write a VB macro for Access that exports all pages along with the image references in an easy to handle output format.
Just an update on this.

SNP itself just seems to be a proprietary Microsoft format meant to compete with PDF. I would avoid working backwards from SNP at all costs.

EdK and I have been communicating over email. He was able to export the entire database as an Excel spreadsheet.

From there, we were able to pull out the relevant text columns. It looked to be a form of rough HTML plopped into the cells:

Quote:
<div align=center><font size=7><strong>PREFACE</strong></font> </div>_x000D_
_x000D_
<div>&nbsp;</div>_x000D_
_x000D_
<div><font size=3><strong>&quot;</strong></font><font face="Lucida Calligraphy"_x000D_
size=3><strong><u>Bonzer</u></strong></font><font size=3><strong>&quot; (definition, Macquarie Dictionary, 1985) - <em>/'bonza/</em>, adj. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Colloq. &nbsp;Excellent, attractive, pleasing. &nbsp;Also, bonza, boshter&quot;</strong></font></div>
But, with a little bit of elbow grease, you could export the cells and regex this into some clean/proper XHTML. It would still take a lot of manual work to replicate everything wanted (headings, captions, images, etc.). But at least all of the text transferred to the spreadsheet fine.
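
For anyone wanting to try the regex route, here is a rough Python sketch of the kind of cleanup I mean. It is not what we actually ran, just an illustration. The function name clean_cell is made up, and the rule "centered size=7 bold text is a chapter heading" is purely my guess from the sample above, so adjust it to whatever the real data looks like.

```python
import re

def clean_cell(raw: str) -> str:
    """Rough cleanup of one exported spreadsheet cell (hypothetical helper)."""
    # Excel shows carriage returns inside cells as the literal text _x000D_.
    text = raw.replace("_x000D_", "")
    # Tags are sometimes split across lines, so fold line breaks into spaces.
    text = re.sub(r"\s*\n\s*", " ", text)
    # ASSUMPTION: centered, size=7, bold text is a chapter heading.
    text = re.sub(
        r"<div align=center>\s*<font size=7>\s*<strong>(.*?)</strong>\s*</font>\s*</div>",
        r"<h1>\1</h1>",
        text,
        flags=re.IGNORECASE,
    )
    # Drop spacer divs that only hold non-breaking spaces.
    text = re.sub(r"<div>\s*(?:&nbsp;\s*)+</div>", "", text, flags=re.IGNORECASE)
    # Strip <font ...> wrappers; styling can be redone later in CSS.
    text = re.sub(r"</?font\b[^>]*>", "", text, flags=re.IGNORECASE)
    # Merge back-to-back <strong> runs left behind once the fonts are gone.
    text = re.sub(r"</strong>(\s*)<strong>", r"\1", text, flags=re.IGNORECASE)
    # Runs of &nbsp; were used for spacing; a single space is enough.
    text = re.sub(r"(?:&nbsp;\s*)+", " ", text)
    # Collapse leftover whitespace.
    text = re.sub(r"\s{2,}", " ", text)
    return text.strip()

sample = (
    "<div align=center><font size=7><strong>PREFACE</strong></font> </div>_x000D_\n"
    "_x000D_\n"
    "<div>&nbsp;</div>_x000D_\n"
)
print(clean_cell(sample))  # <h1>PREFACE</h1>
```

This only handles the patterns visible in the sample. Leftovers like <u> and unquoted attributes would still need their own passes before the result is valid XHTML.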