10-04-2014, 06:11 PM   #1
cyanic
File format tinkerer
VitalBook format (.vbk)

Hi everyone! I was recently introduced to VitalSource because a course I'm taking has its primary textbook available only from VitalSource. I've found Bookshelf to be quite annoying and slow, so I took some time to figure out the format.

.vbk Format
The .vbk format is, at its core, a container format. It contains all the components of the book, laid out as files in the container, much like the large resource packs you see in games these days. The file is read from the end: first there is a "magic number" that identifies the file as being in .vbk format, then the size of the header, then the header itself. The header is an XML string that contains basic information about the file, such as version, build date, file table info, sanity info, and some metadata. The rest of the container is file data, laid out as described by the filemap.
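
To illustrate the read-from-the-end idea, here is a minimal Python sketch. The byte layout is an assumption made purely for illustration: it supposes the file ends with the XML header, followed by a 4-byte big-endian header size, followed by the magic bytes. The magic value b"VBK!" and the size encoding are placeholders, not the real constants, so all of them would need to be adjusted to match an actual file.

```python
import struct
import xml.etree.ElementTree as ET

# All of these values are placeholders chosen for this sketch,
# not the real .vbk constants.
ASSUMED_MAGIC = b"VBK!"
ASSUMED_SIZE_FORMAT = ">I"  # assumed: 4-byte big-endian unsigned integer


def read_header(data: bytes) -> ET.Element:
    """Parse the XML header from the tail of a container, working backwards."""
    size_len = struct.calcsize(ASSUMED_SIZE_FORMAT)
    magic_start = len(data) - len(ASSUMED_MAGIC)
    size_start = magic_start - size_len
    if size_start < 0 or data[magic_start:] != ASSUMED_MAGIC:
        raise ValueError("trailing magic number not found")
    (header_size,) = struct.unpack(ASSUMED_SIZE_FORMAT, data[size_start:magic_start])
    header_start = size_start - header_size
    if header_start < 0:
        raise ValueError("header size is larger than the file")
    return ET.fromstring(data[header_start:size_start])


def build_sample(payload: bytes, header_xml: bytes) -> bytes:
    """Build a synthetic container in the assumed layout, for trying the reader."""
    size_field = struct.pack(ASSUMED_SIZE_FORMAT, len(header_xml))
    return payload + header_xml + size_field + ASSUMED_MAGIC
```

The point of the trailer-style layout is that a reader only needs to seek to the end of the file to find out where everything else is, which is why the sketch never looks at the payload bytes.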

There are currently four distinct categories of book formats that can be contained by the .vbk file. They are vitalbook, epubbook, picturebook, and pdfbook.

VitalBook
The vitalbook format (aka DashML) is basically an XML document, stored in VitalSource's proprietary serialization format. The document has a custom XML schema. Images and videos are stored as file entries in the .vbk container. The patent for the serialization format can be found here. Also of interest is this WIPO application, which includes an old copy of the DTD for the XML, starting on page 35. The last known DTD version is 3.4.

EPUBBook
The epubbook format is just the contents of an EPUB file put inside a .vbk container, along with search indexing data. The files can be extracted and packed to a Zip file for a valid EPUB book.
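
The repacking step can be sketched in Python as follows. This assumes the EPUB files have already been extracted from the container into a directory (extraction itself is not shown), and that any non-EPUB extras such as the search indexing data have been left out; the names of those extras are not given here, so the sketch does not try to filter them. The function name pack_epub is made up for this example. The one detail that matters for a valid EPUB is that the mimetype file is the first Zip entry and is stored uncompressed.

```python
import zipfile
from pathlib import Path


def pack_epub(extracted_dir: str, epub_path: str) -> None:
    """Zip an extracted EPUB tree so that `mimetype` comes first, uncompressed.

    `epub_path` should live outside `extracted_dir`, otherwise the output
    file would be picked up as one of its own entries.
    """
    root = Path(extracted_dir)
    mimetype = root / "mimetype"
    if not mimetype.is_file():
        raise FileNotFoundError("no mimetype file in the extracted tree")
    with zipfile.ZipFile(epub_path, "w") as zf:
        # EPUB containers expect mimetype as the first entry, stored as-is.
        zf.write(mimetype, "mimetype", compress_type=zipfile.ZIP_STORED)
        for path in sorted(root.rglob("*")):
            if path.is_file() and path != mimetype:
                zf.write(path, path.relative_to(root).as_posix(),
                         compress_type=zipfile.ZIP_DEFLATED)
```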

PictureBook
The picturebook format contains images of each page of a book. Alongside those images are a manifest, the text content of each page, a metadata file giving the positions of glyphs (for text selection), line breaks, and links, and index files. This format may be harder to convert to e-book reader formats, since the pages are just images. PDF is probably the best format to convert to.

PDFBook
The pdfbook format is basically the same as picturebook, except instead of images of pages, there is a single PDF of the book. The text content and page metadata files are still generated and included (for whatever reason). The PDF appears to be the same file as is submitted by the publisher.

DRM
The DRM used in this system is pretty straightforward. The individual streams inside the .vbk are encrypted with AES-256 in CBC mode as necessary (i.e. everything except the cover and cover thumbnail). The keys for the files are stored in a license file. Oddly, the license file is also a .vbk; its password is encrypted with RSA-2048 OAEP, and the encrypted password is stored in the .vbk's header. The license file is delivered from VitalSource's server and contains basic account information (name, email, device ID) and book information (password, print and copy limits, expiration date). The patent for the DRM system can be seen here. Interestingly, the system in practice deviates slightly from the patent by using a single, static RSA keypair instead of a unique keypair for each user.

While it appears that no .vbk is distributed without encryption, it is entirely possible to read one that is not encrypted. I tried this in Bookshelf: it refuses to open a file if it doesn't find the file's ID (derived from the file name...) in the license file, but after a rename the decrypted book worked just fine.

Next steps
What I'd like is to get VitalBook support in calibre. I've made a utility to decrypt, extract, and convert .vbk files (convert meaning vitalbook to XML, not other types to other formats, though I do have an epubbook to EPUB converter). Unfortunately, it's all written in C#, and I don't know Python, so I can't write a plugin for calibre. Would anyone be interested in bringing support for it to calibre? At its most basic, I'm thinking of extracting epubbooks and PDFs for import, and getting metadata from the .vbk header (although the metadata doesn't contain publisher name, publication date, or category info).
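
As a starting point for the metadata part, here is a rough Python sketch of turning the header XML into a flat dictionary that a plugin could then map onto its own metadata fields. The header's element names aren't listed in this post, so the function (flatten_header, a made-up name) deliberately makes no assumption about them and just records every attribute and every leaf element by its path; the sample in the usage note uses invented element names.

```python
import xml.etree.ElementTree as ET


def flatten_header(xml_text: str) -> dict:
    """Map each attribute and leaf element of an XML header to its text value.

    Keys are slash-separated element paths; attributes use `path@name`.
    If a path occurs more than once, the last occurrence wins.
    """
    fields = {}

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        for name, value in elem.attrib.items():
            fields[f"{path}@{name}"] = value
        children = list(elem)
        if not children and elem.text and elem.text.strip():
            fields[path] = elem.text.strip()
        for child in children:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return fields
```

For example, calling flatten_header on a made-up header such as `<header version="1"><meta><title>Demo</title></meta></header>` yields keys like `header@version` and `header/meta/title`, which makes it easy to inspect a real header and decide which fields are worth importing.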

Existing converters
Although there are no "offline" converters publicly available right now, there are still ways to convert certain VitalBook variants into other formats.
  • PDFBook: Use the dumping DLL to dump the embedded PDF file.
  • EPUBBook: Use Extractor.epub to convert to EPUB.
  • VitalBook and PictureBook: Use a PDF printer to convert to PDF (example procedure). Note that this may be slow, the resulting pages will include a watermark header, and the page quality may be lower.

Last edited by cyanic; 03-31-2015 at 10:13 PM. Reason: Added converters section