Hello everyone,
For the k2pdfopt GUI I'm developing, I need to find out the number of pages in a DjVu file. Currently I do this by simply counting the instances of the string "DJVU" in the file, which works well. The problem is that I have to load the whole file and iterate through it, which takes a lot of memory and time for large files.
I know there is a DIRM directory chunk at the beginning of the file which, according to the format specification, "should" contain the number of files or pages in the document.
The DIRM string should be followed by one byte of flags and then an INT16 containing the number of pages. However, I cannot find that in the file. The only candidate that makes some sense is bytes 6/7 after DIRM (or 7/8, depending on big/little endian), which contain an INT16 at least in the region of the page count (but not exactly).
Can someone tell me where my error lies?
As an example, here are the first 24 bytes, starting with the DIRM string, of a file with 57 pages:
Code:
D  I  R  M
44 49 52 4D 00 00 02 0F 81 00 3C 00 00 02 D2 00 00 1B 7A 00 00 47 30 00
As you can see, bytes 2/3 after DIRM contain the number 512 and bytes 7/8 contain the number 60 (both read as little-endian), which is close but not quite right.
I really don't understand this.
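To check the dump against the spec, here is a minimal sketch that reads the fixed part of a DIRM chunk. It assumes, following the IFF85 chunk convention DjVu files are built on, that the 4-character chunk ID is followed by a 4-byte big-endian chunk length, and only then by the flags byte, the INT16 file count and (for bundled documents) one INT32 offset per component file. DirmHeader and parse_dirm are hypothetical names, not DjVuLibre's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types/names for illustration; not DjVuLibre's API.
struct DirmHeader {
    std::uint32_t chunk_length;          // assumed IFF chunk size after "DIRM"
    std::uint8_t  flags;                 // bit 7 = bundled, bits 0-6 = version
    std::uint16_t nfiles;                // number of component files
    std::vector<std::uint32_t> offsets;  // bundled only; as many as the buffer holds
};

static std::uint32_t be32(const std::uint8_t* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}

// `p` points at the 'D' of "DIRM"; `n` is the number of bytes available.
// Returns false if the buffer is too short or does not start with "DIRM".
bool parse_dirm(const std::uint8_t* p, std::size_t n, DirmHeader& out) {
    if (n < 11 || p[0] != 'D' || p[1] != 'I' || p[2] != 'R' || p[3] != 'M')
        return false;
    out.chunk_length = be32(p + 4);                         // bytes 4..7
    out.flags        = p[8];                                // byte 8
    out.nfiles       = std::uint16_t((p[9] << 8) | p[10]);  // bytes 9..10, big-endian
    out.offsets.clear();
    if (out.flags & 0x80) {                                 // bundled: offsets follow
        for (std::size_t i = 0, pos = 11; i < out.nfiles && pos + 4 <= n; ++i, pos += 4)
            out.offsets.push_back(be32(p + pos));
    }
    return true;
}
```

Run on the 24 bytes above, this reports a chunk length of 527 (0x020F), flags 0x81 (bundled, version 1), nfiles = 60, and the first three offsets 722, 7034 and 18224. The spec calls this field the number of component files, which can include shared or thumbnail components as well as pages, so it need not equal the page count.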
You can find the format specs here:
https://github.com/barak/djvulibre/tree/master/doc
Thank you in advance, any help is greatly appreciated!
- Jens
PS: I found an example, but I'm really bad at C++. Could someone tell me what exactly this does? How do the << and + operators work on the char data type in C++?
Code:
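unsigned int
ByteStream::read16()
{
  unsigned char c[2];
  // Read exactly two bytes; throw if the stream ends first.
  if (readall((void*)c, sizeof(c)) != sizeof(c))
    G_THROW( ByteStream::EndOfFile );
  // c[0] and c[1] are promoted to int before the arithmetic: c[0]<<8 puts the
  // first byte in the high 8 bits, adding c[1] fills the low 8 bits.
  // Result: the two bytes read as a big-endian 16-bit number.
  return (c[0]<<8)+c[1];
}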
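A small standalone sketch of what (c[0]<<8)+c[1] computes (the helper names be16 and le16 are hypothetical). Before << or + is applied, each unsigned char operand is promoted to int, so shifting by 8 moves the first byte into bits 8-15 (the same as multiplying by 256) and adding the second byte fills bits 0-7. That is a big-endian read; swapping the roles of the two bytes gives a little-endian read.

```cpp
#include <cstdint>
#include <type_traits>

// unsigned char is promoted to int before shifting, so no bits are lost
// when the byte is shifted past bit 7.
static_assert(std::is_same<decltype(static_cast<unsigned char>(0) << 8), int>::value,
              "unsigned char operand is promoted to int");

// Hypothetical helpers; both take two raw bytes in file order.
// Big-endian (most significant byte first), like ByteStream::read16():
std::uint16_t be16(unsigned char first, unsigned char second) {
    return static_cast<std::uint16_t>((first << 8) + second);  // first*256 + second
}

// Little-endian (least significant byte first): the byte roles are swapped.
std::uint16_t le16(unsigned char first, unsigned char second) {
    return static_cast<std::uint16_t>((second << 8) + first);  // second*256 + first
}
```

For example, the bytes 00 3C give be16(0x00, 0x3C) = 0*256 + 60 = 60, and the bytes 02 0F give be16(0x02, 0x0F) = 2*256 + 15 = 527.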