Wednesday, January 21, 2009

J2ME: Thumbnail Extraction from JPEG (Exif) Images Taken with a Mobile Phone Camera

Mobile phone cameras, being digital, save pictures in JPEG format, specifically the ‘Exif’ flavor of JPEG. Exif has been adopted as a standard for digital cameras. We are not going to discuss how JPEG images are created, i.e. the encoding of a raw image into a compressed JPEG image, which essentially involves three steps in this order: Discrete Cosine Transform (DCT), quantization and entropy (Huffman) encoding. Here we are concerned with how the data is organized, that is, the ‘FORMAT’, once a JPEG image has been created. We do, however, need a JPEG decoder to display the extracted thumbnail; there is one freely available. First, a little about the JPEG format and its Exif flavor.
A Few Words on the JPEG Format
A JPEG (jpg) file is organized as a sequence of markers, each followed by its contents. Each marker itself takes 2 bytes. The very first marker (0xFFD8) stands for Start of Image (SOI) and declares that this is a JPEG file. The second marker is an application marker (APPn), which depends on the application using JPEG; its contents therefore include an identifier naming that application. The marker can have any value from APP0 (0xFFE0) to APP15 (0xFFEF), both inclusive.
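To make the marker values above concrete, here is a minimal sketch in plain Java that checks for the SOI marker and tells whether a 2-byte value is an APPn marker. The class and method names (JpegMarkers, isJpeg, isAppMarker, appNumber) are my own, made up for illustration; they are not part of any library.

```java
// Illustrative sketch only: names are hypothetical, not from any JPEG library.
class JpegMarkers {
    static final int SOI = 0xFFD8;   // Start of Image
    static final int APP0 = 0xFFE0;  // first application marker
    static final int APP15 = 0xFFEF; // last application marker

    /** Reads the 2-byte marker stored at position pos (high byte first). */
    static int readMarker(byte[] data, int pos) {
        return ((data[pos] & 0xFF) << 8) | (data[pos + 1] & 0xFF);
    }

    /** True if the data begins with the SOI marker 0xFFD8. */
    static boolean isJpeg(byte[] data) {
        return data != null && data.length >= 2 && readMarker(data, 0) == SOI;
    }

    /** True if the marker lies in the range APP0..APP15, both inclusive. */
    static boolean isAppMarker(int marker) {
        return marker >= APP0 && marker <= APP15;
    }

    /** Returns n for an APPn marker (0 for APP0, 1 for APP1, ...). */
    static int appNumber(int marker) {
        return marker - APP0;
    }

    public static void main(String[] args) {
        // First four bytes of a hypothetical Exif file: SOI followed by APP1.
        byte[] header = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE1};
        System.out.println("Starts with SOI: " + isJpeg(header));
        int second = readMarker(header, 2);
        if (isAppMarker(second)) {
            System.out.println("Second marker is APP" + appNumber(second));
        }
    }
}
```

The masking with 0xFF is needed because Java bytes are signed; without it 0xFF would be read as -1.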
In the JPEG format, the 2 bytes immediately after each marker that carries contents hold the length of those contents, including the length field itself (SOI has no contents and therefore no length field); the APPn marker is no exception. APP0 (0xFFE0) is used by ‘JFIF’ and APP1 (0xFFE1) by ‘Exif’. (We are interested in the APP1 (0xFFE1) marker in particular, because the thumbnail is already embedded in the file within this marker.) In an APPn segment, the bytes following the length field contain the ASCII codes of the identifier name (5 bytes for ‘JFIF’ and 6 bytes for ‘Exif’). Please see the NOTE below. The Exif identifier is 45, 78, 69, 66, 00, 00 in hexadecimal (6 bytes). From there on, the Exif format is the same as the TIFF image format. For more detail on Exif (and its embedded TIFF), please see here.
Once the thumbnail offset is reached, after reading a specific number of offset bytes, the thumbnail can be in one of three formats: JPEG compressed (the most commonly used), RGB TIFF or YCbCr TIFF. (The number of offset bytes depends on the ‘byte align’ discussed below.) If it is JPEG compressed, it is just another JPEG image at a reduced scale, which is decoded for display. (I have tried this on a Sony Ericsson K800i and a Nokia 6630; both had the thumbnail in JPEG compressed format, hence our discussion pertains to this format only.)
Another important thing about the TIFF header (8 bytes) embedded in the Exif format is that its first 2 bytes tell you the byte align of the TIFF data that follows: either little endian (used by Intel) or big endian (used by Motorola). You have to check this in order to calculate offsets when reading an Exif file in general and a TIFF file in particular. JPEG itself uses big endian; Exif, however, allows both. Moreover, most of the digital cameras using the Exif format follow little endian. Also remember that all offsets in TIFF are calculated from the first byte of the TIFF header.
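The walk described above can be sketched roughly as follows: skip from segment to segment using the length field, stop at the APP1 segment that carries the Exif identifier, and then read the byte align from the TIFF header. This is only a sketch under some assumptions. The class and method names are mine; the "II" (49, 49) and "MM" (4D, 4D) byte-order codes and the 4-byte offset to the first directory at bytes 4 to 7 of the header come from the TIFF specification rather than from this post; and the code stops at the Start of Scan marker (0xFFDA) instead of handling every marker type.

```java
// Illustrative sketch only: names are hypothetical, error handling is minimal.
class ExifLocator {
    static final int UNKNOWN = 0;
    static final int LITTLE_ENDIAN = 1; // TIFF header starts with "II" (Intel)
    static final int BIG_ENDIAN = 2;    // TIFF header starts with "MM" (Motorola)

    // ASCII "Exif" followed by two zero bytes.
    private static final byte[] EXIF_ID = {0x45, 0x78, 0x69, 0x66, 0x00, 0x00};

    /** Unsigned 16-bit big-endian read; JPEG markers and lengths use this order. */
    static int u16be(byte[] d, int p) {
        return ((d[p] & 0xFF) << 8) | (d[p + 1] & 0xFF);
    }

    static boolean hasExifId(byte[] d, int p) {
        if (p + EXIF_ID.length > d.length) return false;
        for (int i = 0; i < EXIF_ID.length; i++) {
            if (d[p + i] != EXIF_ID[i]) return false;
        }
        return true;
    }

    /** Returns the index of the first byte of the TIFF header, or -1 if not found. */
    static int findTiffHeader(byte[] d) {
        if (d.length < 4 || u16be(d, 0) != 0xFFD8) return -1; // no SOI
        int pos = 2;
        while (pos + 4 <= d.length) {
            int marker = u16be(d, pos);
            // Give up on anything that is not a marker, or once image data starts.
            if ((marker >> 8) != 0xFF || marker == 0xFFDA) return -1;
            int length = u16be(d, pos + 2); // includes the 2 length bytes
            if (length < 2) return -1;
            if (marker == 0xFFE1 && hasExifId(d, pos + 4)) {
                return pos + 4 + EXIF_ID.length;
            }
            pos += 2 + length; // jump to the next marker
        }
        return -1;
    }

    /** Reads the byte align from the first 2 bytes of the TIFF header. */
    static int byteOrder(byte[] d, int tiff) {
        if (tiff < 0 || tiff + 8 > d.length) return UNKNOWN;
        if (d[tiff] == 0x49 && d[tiff + 1] == 0x49) return LITTLE_ENDIAN;
        if (d[tiff] == 0x4D && d[tiff + 1] == 0x4D) return BIG_ENDIAN;
        return UNKNOWN;
    }

    /** Unsigned 32-bit read in the given byte order. */
    static long u32(byte[] d, int p, boolean little) {
        long b0 = d[p] & 0xFF, b1 = d[p + 1] & 0xFF;
        long b2 = d[p + 2] & 0xFF, b3 = d[p + 3] & 0xFF;
        return little ? (b3 << 24) | (b2 << 16) | (b1 << 8) | b0
                      : (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
    }

    /** Absolute index of the first directory; TIFF offsets count from the header start. */
    static int firstIfdIndex(byte[] d, int tiff) {
        int order = byteOrder(d, tiff);
        if (order == UNKNOWN) return -1;
        return tiff + (int) u32(d, tiff + 4, order == LITTLE_ENDIAN);
    }

    public static void main(String[] args) {
        // Hand-made sample: SOI, APP1 (length 16), Exif id, little-endian TIFF header.
        byte[] sample = {
            (byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE1, 0x00, 0x10,
            0x45, 0x78, 0x69, 0x66, 0x00, 0x00,
            0x49, 0x49, 0x2A, 0x00, 0x08, 0x00, 0x00, 0x00
        };
        int tiff = findTiffHeader(sample);
        System.out.println("TIFF header index: " + tiff);
        System.out.println("Little endian: " + (byteOrder(sample, tiff) == LITTLE_ENDIAN));
        System.out.println("First directory index: " + firstIfdIndex(sample, tiff));
    }
}
```

Locating the thumbnail itself still requires reading the directory entries that follow, using the same byte order and the same "offset from the TIFF header" rule.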
NOTE
In the case of JFIF, the last of the 5 identifier bytes is zero, while in the case of ‘Exif’ the last two bytes are zero; for example, the JFIF identifier is 4A, 46, 49, 46, 00 (5 bytes). JFIF v1.02 and above have a JFIF extension, according to which APP0 has an extension part that again starts with an application marker, hence making two APP markers. The thumbnail may be located under either the first marker or the second marker. For more detail on JFIF, please see here.
...continued

Wednesday, January 14, 2009

JDOM vs DOM4J

Bear in mind that JDOM and DOM4J are Java APIs for XML processing. JDOM is an open-source, pure Java API for XML processing. JDOM is not an extension of, or a wrapper over, the W3C DOM model. JDOM is an object model in Java that represents an XML document, the same kind of tree model as DOM’s. JDOM does use the SAX API with an XML parser (Xerces, Crimson or any other) at the back end to build the in-memory tree-based model.
Note: JDOM uses the default parser of JAXP, since it calls the parser through JAXP; however, this can be changed to any other parser (please see the previous post).
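As a small illustration of the note above, the sketch below asks JAXP which SAX parser factory it resolves to by default. It uses only the standard javax.xml.parsers.SAXParserFactory class; the class name JaxpParserLookup is made up for this example, and the name printed will depend on the Java runtime in use.

```java
import javax.xml.parsers.SAXParserFactory;

// Illustrative sketch only: shows the JAXP lookup, not JDOM itself.
class JaxpParserLookup {
    /** Returns the class name of the SAX parser factory that JAXP selects. */
    static String defaultSaxFactory() {
        // newInstance() applies the JAXP lookup rules and returns
        // whichever implementation is configured or bundled.
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("JAXP SAX factory: " + defaultSaxFactory());
    }
}
```

One way to switch to another parser is to set the javax.xml.parsers.SAXParserFactory system property to that parser's factory class when starting the JVM, provided the parser is on the classpath.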
You can find this description at the following link. Like DOM and unlike SAX, JDOM provides random access to the contents of the document, which allows the document to be updated. JDOM can be converted to a DOM model or to SAX events and vice versa. Please see this. Unlike DOM, JDOM doesn’t work through interfaces; rather, it has concrete classes for the different components of the model, i.e. node, element, attribute etc.
DOM4J, on the other hand, has done it the other way round from JDOM and, like DOM, works through interfaces rather than concrete classes. DOM4J is also an XML processing API and follows the same tree-based object model for an XML document. That’s the major difference between DOM4J and JDOM. DOM4J makes navigating the tree easier than JDOM due to this difference. DOM4J also provides some support for XPath (which, unfortunately, I didn’t get a chance to explore).
The purpose of this and the last three posts is to give a clear understanding of what is what in the area of XML parsing. I haven’t provided any code here because there is a huge amount of code available on the web for this purpose. I’ll get back to each of them in more detail some other time (fingers crossed!!).

Friday, January 2, 2009

JAXP: Another Player in Java-XML Game

JAXP stands for ‘Java API for XML Processing’. Please don’t confuse it with other APIs like SAX, DOM, JDOM etc. JAXP is a set of APIs for XML processing that forms a rather abstract layer in the overall scheme of XML processing. SAX, DOM and JDOM are parsing APIs only, used for parsing an XML document, while JAXP has a broader scope: it lets you parse, validate and transform XML documents. Here, however, we are concerned with its parsing feature only. Remember, JAXP is not a parsing API, nor is it a new way of handling XML in Java. As a matter of fact, JAXP provides a convenient way of processing XML. As said before, JAXP exists at the abstract layer; it uses a parser behind the scenes for parsing purposes. JAXP is a bundle of some APIs and a parser. Previously, JAXP was bundled with the Crimson parser as the default one; however, JAXP provides the facility to change to some other parser without recompiling the application, which has several benefits. (Details can be found at the following link.) These parsers in turn implement SAX, DOM or some other parsing API. We are not going into much detail here; please refer to All About JAXP for detail on JAXP along with example code. This tutorial on JAXP is also quite comprehensive, highlighting its parsing and validation features.
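To show what "parsing through JAXP" looks like in practice, here is a minimal sketch that parses a small XML string into a DOM tree using only the standard javax.xml.parsers classes. The application code never names a concrete parser; JAXP picks one behind the scenes. The class name JaxpDomExample and the sample XML are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Illustrative sketch only: class name and sample XML are hypothetical.
class JaxpDomExample {
    /** Parses the XML text with whichever DOM parser JAXP resolves. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        } catch (Exception e) {
            // Wrapped so that callers in this sketch need no checked exceptions.
            throw new RuntimeException("Could not parse XML", e);
        }
    }

    /** Returns the name of the document's root element. */
    static String rootName(String xml) {
        return parse(xml).getDocumentElement().getNodeName();
    }

    public static void main(String[] args) {
        String xml = "<blog><post title=\"JAXP\"/></blog>";
        System.out.println("Root element: " + rootName(xml));
    }
}
```

Because the factory is obtained through newInstance(), swapping the underlying parser is a configuration matter and the code above stays unchanged.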