07-06-2013, 04:29 AM | #1 |
Enthusiast
Posts: 29
Karma: 81500
Join Date: Apr 2013
Device: Kindle 4
|
DJVU: Extract number of pages
Hello everyone,
for the GUI for k2pdfopt that I'm developing, I need to find out the number of pages in a DjVu file. Currently I do this by counting the instances of the "DJVU" string in the file, which works well. The problem is that I need to load the whole file and iterate through it, which takes a lot of both memory and time for large files. I know that there is a DIRM directory at the beginning of the file which, according to the format specifications, "should" contain the number of files or pages in the document. The DIRM string should be followed by one byte of flags and then an INT16 containing the number of pages. However, I cannot see that in the file. The only occurrence that makes a little sense is bytes 6/7 (or 7/8, depending on big/little endian), which contain an int16 at least in the region of the page count (but not exactly). Can someone tell me where my error lies? As an example, here are the first 24 bytes, starting with the DIRM string, of a file with 57 pages: Code:
D I R M
44 49 52 4D 00 00 02 0F 81 00 3C 00 00 02 D2 00 00 1B 7A 00 00 47 30 00
I really don't understand this. You can find the format specs here: https://github.com/barak/djvulibre/tree/master/doc
Thank you in advance, any help is greatly appreciated!
- Jens
PS: I found an example, but I'm really terrible at C++. Could someone tell me what exactly this does? How do the << and + operators work on the char datatype in C++? Code:
unsigned int ByteStream::read16()
{
  unsigned char c[2];
  if (readall((void*)c, sizeof(c)) != sizeof(c))
    G_THROW( ByteStream::EndOfFile );
  return (c[0]<<8)+c[1];
}
Last edited by JensW; 07-06-2013 at 05:08 AM. |
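One possible reading of the dump above, offered as a sketch rather than a confirmed answer: DjVu files use IFF-style chunks, so the four bytes after the "DIRM" ID would be a big-endian chunk length, and the flags byte and the INT16 would only start at offset 8. The struct `DirmHeader`, the function `parse_dirm_header` and the array `kPostDump` are hypothetical names made up for this illustration. The count field is described in the specification as the number of component files, so it may be larger than the page count if the document also contains shared, non-page components. That this explains the 57-page file here is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// The 24 bytes quoted in the post, starting at the "DIRM" chunk ID.
static const std::uint8_t kPostDump[] = {
    0x44, 0x49, 0x52, 0x4D, 0x00, 0x00, 0x02, 0x0F,
    0x81, 0x00, 0x3C, 0x00, 0x00, 0x02, 0xD2, 0x00,
    0x00, 0x1B, 0x7A, 0x00, 0x00, 0x47, 0x30, 0x00};

// Hypothetical holder for the unencoded start of a DIRM chunk.
struct DirmHeader {
    std::uint32_t chunk_length;  // IFF chunk length (assumed 4 bytes, big-endian)
    bool bundled;                // top bit of the flags byte
    unsigned version;            // low 7 bits of the flags byte
    unsigned file_count;         // INT16 count of component files
};

// Decodes the header at `data`, which must point at the 'D' of "DIRM".
// Returns false if the buffer is too short or the chunk ID does not match.
bool parse_dirm_header(const std::uint8_t* data, std::size_t size,
                       DirmHeader& out) {
    assert(data != nullptr);
    if (size < 11 || std::memcmp(data, "DIRM", 4) != 0) {
        return false;
    }
    // Bytes 4..7: chunk length, most significant byte first.
    out.chunk_length = (std::uint32_t(data[4]) << 24) |
                       (std::uint32_t(data[5]) << 16) |
                       (std::uint32_t(data[6]) << 8) |
                       std::uint32_t(data[7]);
    // Byte 8: flags (bundled bit plus version number).
    out.bundled = (data[8] & 0x80) != 0;
    out.version = data[8] & 0x7Fu;
    // Bytes 9..10: component file count, most significant byte first.
    out.file_count = (unsigned(data[9]) << 8) | data[10];
    return true;
}
```

Applied to `kPostDump`, this reading gives a chunk length of 527, a flags byte of 0x81 (bundled, version 1) and a count of 60, which is close to but not equal to the 57 pages mentioned.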
07-06-2013, 01:21 PM | #2 |
Wizard
Posts: 2,986
Karma: 18343081
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
|
"<<8" shifts left by 8 bits, which is the same as multiplying by 256. So, the function returns 256*c[0] + c[1], i.e. the program is interpreting those two bytes as being uint16_t in big endian order.
The operands of << and + are promoted from unsigned char to int before the arithmetic happens (and the registers in which the math takes place are probably at least 32 bits wide anyway), so the temporary c[0]<<8 and c[1] values are added without being truncated back to 8 bits (the size of unsigned char) before the result is returned as an unsigned int.
Last edited by rkomar; 07-06-2013 at 06:37 PM. Reason: Got the endianness wrong :P |
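The arithmetic described above can be tried out in isolation. This is a minimal sketch, not djvulibre code: `decode_be16`, `decode_le16` and `decode_truncated` are hypothetical helper names, and they take a plain two-byte buffer instead of reading from a ByteStream.

```cpp
#include <cassert>

// Big-endian: the first byte is the high byte, so the value is 256*c[0] + c[1].
// The cast spells out the widening that integer promotion would do anyway.
unsigned int decode_be16(const unsigned char* c) {
    assert(c != nullptr);
    return (static_cast<unsigned int>(c[0]) << 8) + c[1];
}

// Little-endian, for comparison: the second byte is the high byte.
unsigned int decode_le16(const unsigned char* c) {
    assert(c != nullptr);
    return (static_cast<unsigned int>(c[1]) << 8) + c[0];
}

// What would happen if the shifted value were forced back into 8 bits:
// all bits of c[0] are shifted out, so only c[1] survives.
unsigned int decode_truncated(const unsigned char* c) {
    assert(c != nullptr);
    unsigned char high = static_cast<unsigned char>(c[0] << 8);  // always 0
    return static_cast<unsigned int>(high) + c[1];
}
```

For the byte pair 00 3C, the big-endian decode gives 60 and the little-endian decode gives 15360. For 81 00, the big-endian decode gives 33024, while the truncated variant gives 0, which is the loss the promotion to int avoids.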