07-06-2013, 04:29 AM   #1
JensW
Enthusiast
Posts: 29
Karma: 81500
Join Date: Apr 2013
Device: Kindle 4
DJVU: Extract number of pages

Hello everyone,
For the k2pdfopt GUI I am developing, I need to find out the number of pages in a DjVu file. Currently I do this by simply counting the occurrences of the string "DJVU" in the file, which works well. The problem is that I have to load the whole file and iterate through it, which takes a lot of memory and time for large files.
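One lower-memory alternative to scanning every byte for "DJVU" is to walk the IFF chunk headers and seek over the chunk bodies. The sketch below assumes the layout described in the DjVu v3 specification: the file starts with "AT&T", then a FORM chunk whose 4-byte length is big-endian; a single-page file is FORM:DJVU, and a bundled multi-page file is FORM:DJVM whose subchunks include DIRM and one FORM per component, with each page stored as FORM:DJVU. `countDjvuPages` is a hypothetical name, and indirect documents (pages kept in separate files) are not handled.

```cpp
#include <cstdint>
#include <cstring>
#include <fstream>
#include <istream>
#include <string>

// Reads a 4-byte big-endian (IFF-style) unsigned integer.
static bool readBE32(std::istream& in, std::uint32_t& value) {
    unsigned char b[4];
    if (!in.read(reinterpret_cast<char*>(b), 4)) return false;
    value = (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16) |
            (std::uint32_t(b[2]) << 8) | std::uint32_t(b[3]);
    return true;
}

// Hypothetical helper: counts FORM:DJVU components by reading only chunk
// headers and seeking over chunk bodies. Returns -1 if the file does not
// start like a DjVu file. Indirect (multi-file) documents are not handled.
long countDjvuPages(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    char magic[4], id[4], type[4];
    std::uint32_t size = 0;
    if (!in.read(magic, 4) || std::memcmp(magic, "AT&T", 4) != 0) return -1;
    if (!in.read(id, 4) || std::memcmp(id, "FORM", 4) != 0) return -1;
    if (!readBE32(in, size) || !in.read(type, 4)) return -1;
    if (std::memcmp(type, "DJVU", 4) == 0) return 1;  // single-page document
    if (std::memcmp(type, "DJVM", 4) != 0) return -1;

    // The FORM body starts at offset 12 and its length includes the "DJVM" tag.
    const std::streamoff end = 12 + std::streamoff(size);
    long pages = 0;
    for (;;) {
        const std::streamoff pos = in.tellg();
        if (pos < 0 || pos + 8 > end) break;
        if (!in.read(id, 4) || !readBE32(in, size)) break;
        std::streamoff next = pos + 8 + std::streamoff(size);
        if (next & 1) ++next;  // chunks start on even offsets (pad byte)
        if (std::memcmp(id, "FORM", 4) == 0 && size >= 4 && in.read(type, 4) &&
            std::memcmp(type, "DJVU", 4) == 0)
            ++pages;  // each page is its own FORM:DJVU component
        in.seekg(next);
    }
    return pages;
}
```

Only the 8-byte header (plus a 4-byte type for FORM chunks) of each component is read, so memory use stays constant regardless of file size.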

I know there is a DIRM directory near the beginning of the file which, according to the format specification, "should" contain the number of files or pages in the document.
The DIRM string should be followed by one byte of flags and then an INT16 containing the number of pages. However, I cannot see that in the file. The only occurrence that makes a little sense is bytes 6/7 (or 7/8, depending on big/little endian), which contain an INT16 that is at least in the region of the page count (but not exactly).

Can someone tell me where my error lies?

As an example, here are the first 24 bytes, starting with the DIRM string, of a file with 57 pages:
Code:
D  I  R  M
44 49 52 4D 00 00 02 0F 81 00 3C 00 00 02 D2 00 00 1B 7A 00 00 47 30 00
As you can see, bytes 2/3 after the DIRM contain the number 512 and bytes 7/8 contain the number 60, which is close but not quite right.
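For comparison, here is a sketch that decodes those bytes under the generic IFF chunk layout from the DjVu v3 specification (worth double-checking against the linked docs): every chunk ID, including DIRM, is followed by a 4-byte big-endian chunk length, and only then come the flags byte (high bit: bundled, low 7 bits: version), an INT16 `nfiles`, and, for bundled documents, one INT32 offset per component. `DirmHeader` and `parseDirmHeader` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Uncompressed start of a DIRM chunk under the assumed layout:
// "DIRM", INT32 length, flags byte, INT16 nfiles, then nfiles INT32 offsets.
struct DirmHeader {
    std::uint32_t chunkSize;    // length of the chunk data after the 8-byte header
    bool bundled;               // high bit of the flags byte
    unsigned version;           // low 7 bits of the flags byte
    std::uint16_t nfiles;       // number of component files (not necessarily pages)
    std::uint32_t firstOffset;  // file offset of the first component (bundled only)
};

// Combines four bytes in big-endian order.
static std::uint32_t be32(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Hypothetical helper: p points at the "DIRM" chunk ID, n bytes are available.
// Returns false if the buffer is too short or does not start with "DIRM".
bool parseDirmHeader(const unsigned char* p, std::size_t n, DirmHeader& h) {
    if (n < 11 || std::memcmp(p, "DIRM", 4) != 0) return false;
    h.chunkSize = be32(p + 4);
    h.bundled = (p[8] & 0x80) != 0;
    h.version = p[8] & 0x7F;
    h.nfiles = std::uint16_t((p[9] << 8) | p[10]);
    h.firstOffset = 0;
    if (h.bundled && h.nfiles > 0 && n >= 15) h.firstOffset = be32(p + 11);
    return true;
}
```

Fed the 24 sample bytes above, this reading gives a chunk length of 527 (00 00 02 0F), a bundled version-1 directory (81), `nfiles` = 60 (00 3C) and a first component offset of 722 (00 00 02 D2). If the spec is right that `nfiles` counts component files, non-page components such as shared DJVI includes or THUM thumbnails may account for 60 versus 57; telling them apart would require decoding the BZZ-compressed remainder of the DIRM chunk.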

I really don't understand this.

You can find the format specs here: https://github.com/barak/djvulibre/tree/master/doc

Thank you in advance, any help is greatly appreciated!

- Jens


PS: I found an example, but I'm really terrible at C++. Could someone tell me what exactly this does? How do the << and + operators work on the char data type in C++?

Code:
unsigned int 
ByteStream::read16()
{
  unsigned char c[2];
  if (readall((void*)c, sizeof(c)) != sizeof(c))
    G_THROW( ByteStream::EndOfFile );
  return (c[0]<<8)+c[1];
}

Last edited by JensW; 07-06-2013 at 05:08 AM.
07-06-2013, 01:21 PM   #2
rkomar
Wizard
Posts: 2,986
Karma: 18343081
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
"<<8" shifts left by 8 bits, which is the same as multiplying by 256. So the function returns 256*c[0] + c[1], i.e. the program interprets those two bytes as a uint16_t in big-endian order.

Before the shift and the addition, C++ promotes both unsigned char operands to int (integral promotion, in practice 32 bits wide), so the intermediate c[0]<<8 and the sum with c[1] are not truncated back to 8 bits (the size of unsigned char) before the result is returned as an unsigned int.
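To see the shift-and-add in isolation, here is a small sketch that applies the same expression as read16() to two bytes passed in directly, plus the little-endian reading for comparison. `be16` and `le16` are hypothetical names, not DjVuLibre functions.

```cpp
// Same arithmetic as ByteStream::read16(): the first byte is the high byte.
// Both unsigned char arguments are promoted to int before << and +,
// so c0 << 8 keeps all 16 bits.
unsigned int be16(unsigned char c0, unsigned char c1) {
    return (c0 << 8) + c1;  // c0 * 256 + c1
}

// Little-endian reading of the same two bytes: the second byte is the high byte.
unsigned int le16(unsigned char c0, unsigned char c1) {
    return (c1 << 8) + c0;  // c1 * 256 + c0
}
```

For the byte pair 00 3C, `be16(0x00, 0x3C)` gives 60 while `le16(0x00, 0x3C)` gives 15360 (0x3C00), and `be16(0x02, 0x0F)` gives 527.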

Last edited by rkomar; 07-06-2013 at 06:37 PM. Reason: Got the endianness wrong :P

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Number of pages left in chapter not working | 3rdDegree | Conversion | 3 | 02-22-2013 09:38 PM
Do the number of pages in an ebook differ from the number of pages in a physical book | Phoebemy | General Discussions | 12 | 07-19-2012 09:25 AM
number of pages/locations/words? | egg | Calibre | 3 | 11-25-2010 04:47 AM
Conversion - Can you keep same number of pages | goldberry | Calibre | 4 | 09-12-2010 12:11 AM
How are the page numbers/number of pages defined? | kennyc | ePub | 8 | 09-27-2009 11:23 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.