STAR JSON

Center for Satellite Applications and Research (STAR)
NOAA Center for Weather and Climate Prediction (NCWCP)
5830 University Research Court
College Park, MD 20740
Version 0.3.
Pedro Vicente

Acknowledgements
This version of STAR JSON was made with contributions from Chris Barker (NOAA) and Charlie Zender (UCI).
It is a modified version of the CF-JSON format, with the difference that attributes have the same JSON objects as datasets.
STAR JSON is a JSON schema that is used to share commonly used scientific data formats, such as HDF5 and netCDF.

It is used in client/server applications, where the JSON representation of the HDF5 file is transmitted over the network.

A client application has access to remote HDF5 or netCDF data stored by a server application. The client extracts metadata and data from the remote files, which are transmitted in STAR JSON format.
The STAR JSON file is a simplified reproduction of the netCDF/HDF5 file in JSON format. In STAR JSON format version 0.3, the reproduced elements are groups, datasets, and attributes.
The mapping of these HDF5 elements to JSON format (i.e., the JSON schema) is described below.

HDF5 to JSON mapping

HDF5 element: JSON representation
Groups: JSON object with key named "groups". The object value is another JSON object where the key is the group name.
Example: A group named "g1" with empty contents.

{
   "groups":{
      "g1":{
      }
   }
}

Datasets: JSON object with key named "variables". The object value is another JSON object where the key is the dataset name.
Example: A dataset named "var_1", where the value of object "var_1" is not specified here. See below for a list of object values for datasets.

{
  "variables":{
    "var_1":{
    }
  }
}

Attributes: JSON object with key named "attributes". The object value is another JSON object where the key is the attribute name.
Example: An attribute named "attr_1", where the value of object "attr_1" is not specified here. See below for a list of object values for attributes.

{
  "attributes":{
    "attr_1":{
    }
  }
}
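The three mappings above can appear together in one document. The following is a minimal sketch in Python, not part of the STAR JSON distribution; the helper make_group is a hypothetical name introduced only for illustration. It builds a root object with one empty group, one dataset and one attribute, and serializes it with the standard json module.

```python
import json

# Hypothetical helper (not part of STAR JSON): build a group-level object
# that holds only the entity keys that were actually supplied.
def make_group(groups=None, variables=None, attributes=None):
    obj = {}
    if groups:
        obj["groups"] = groups
    if variables:
        obj["variables"] = variables
    if attributes:
        obj["attributes"] = attributes
    return obj

# Root object with one empty group, one dataset and one attribute.
# The dataset and attribute values are left empty, as in the examples above.
root = make_group(
    groups={"g1": make_group()},
    variables={"var_1": {}},
    attributes={"attr_1": {}},
)

text = json.dumps(root, indent=2)
print(text)
```

An empty group is written as an empty JSON object, which matches the "g1" example above.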


The following sections describe this mapping in more detail.

HDF5 to JSON mapping in detail

JSON primer

JavaScript Object Notation (JSON) is a lightweight, text-based, language-independent data interchange format. JSON defines a small set of formatting rules for the portable representation of structured data.

JSON is built on two structures: an object, which is an unordered collection of key/value pairs, and an array, which is an ordered list of values.

JSON values can be: an object, an array, a number, a string, or one of the literal names true, false, or null.

A JSON text is a sequence of tokens. The set of tokens includes six structural characters, strings, numbers, and three literal names (true, false, null). The six structural characters are:

Token                                            Description       Character
Array                                            square brackets   [ ]
Objects                                          curly brackets    { }
Entry separation in arrays and objects           comma             ,
Separator between keys and values in an object   colon             :
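To make the primer concrete, the sketch below parses a small JSON text with Python's standard json module and reports the Python type chosen for each JSON value. The keys "name", "scale", "valid" and "fill" are made up for this illustration and are not STAR JSON keys.

```python
import json

# A JSON text that uses every kind of value except a nested object.
text = '{"name": "var_1", "shape": [2, 3], "scale": 0.5, "valid": true, "fill": null}'
doc = json.loads(text)

# Python's json module maps JSON values to native types:
# object -> dict, array -> list, string -> str,
# number -> int or float, true/false -> bool, null -> None.
kinds = {key: type(value).__name__ for key, value in doc.items()}
print(kinds)
```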


STAR JSON representation of HDF5

A STAR JSON document always consists of exactly one main object. This corresponds to the HDF5 data model, which has exactly one root group. An empty HDF5 file is thus represented as

{     
}

that is, an empty JSON object. Note that the root HDF5 group "/" is implicit and not represented.

A STAR JSON HDF5 document has 3 separate entities, all represented as JSON objects in the main JSON object (root group): "groups", "variables", and "attributes". These 3 entities correspond to HDF5 groups, datasets, and attributes. For netCDF there is an additional main JSON object to represent netCDF dimensions.

STAR JSON Group objects

An HDF5 group can itself contain the 3 separate entities: groups, datasets, and attributes. An example of a root group with 2 groups, "g1" (empty) and "g2" (with variables and attributes), is:

{
 "groups":{
  "g1":{
  },
  "g2":{
   "variables":{
   },
   "attributes":{
   }
  }
 }
}

An example of a root group with a group named "g1" that contains a subgroup "g11" is:

{
  "groups":{
    "g1":{
      "groups":{
        "g11":{
        }
      }
    }
  }
}

This nesting allows STAR JSON to represent the hierarchy of an HDF5 file. The HDF5 model consists of groups and datasets, and groups can contain other groups and datasets.
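Because every group object may carry its own "groups" key, the hierarchy can be traversed recursively. The following is a minimal sketch, assuming a document that follows the layout shown above; the function name group_paths is hypothetical and chosen only for this example.

```python
import json

def group_paths(group, prefix="/"):
    """Return the HDF5-style paths of all groups below `group`, depth first."""
    paths = []
    for name, child in group.get("groups", {}).items():
        path = prefix + name
        paths.append(path)
        # Recurse into the child group, extending the path with a separator.
        paths.extend(group_paths(child, path + "/"))
    return paths

# Root group with "g1" (containing subgroup "g11") and an empty "g2".
doc = json.loads('{"groups": {"g1": {"groups": {"g11": {}}}, "g2": {}}}')
paths = group_paths(doc)
print(paths)  # ['/g1', '/g1/g11', '/g2']
```

The implicit root group "/" is used only as the path prefix and is not listed itself.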

STAR JSON HDF5 dataset objects

Each dataset in the HDF5 file is represented as a JSON key/value pair (i.e., a JSON object entry). The key is the dataset name and the value is a JSON object with the following keys:

Key            Value
"shape"        1D JSON array of numbers. For netCDF, a 1D JSON array of strings, each string being a dimension name
"type"         JSON string that identifies the HDF5 numeric type
"data"         JSON array of numbers or strings
"attributes"   JSON object whose keys are attribute names. Each attribute object has the same format as the dataset object: an object with "shape", "type" and "data" keys


Example of a dataset named "var_1", that has a rank of 2, with dimensions 2 and 3, of type float, that contains a string attribute:

{
  "variables": {
    "var_1": {
      "shape": [2, 3],
      "type": "float",
      "data": [
        [1, 2, 3],
        [4, 5, 6]
      ],
      "attributes": {
        "char_att": {
          "shape": [3],
          "type": "char",
          "data": ["foo"]
        }
      }
    }
  }
}
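A client may want to confirm that the "shape" of a numeric dataset agrees with the nesting of its "data" array. The sketch below is one possible check, not a prescribed part of STAR JSON. It assumes the HDF5 form of "shape" (an array of numbers) and rectangular data, so it inspects only the first element at each nesting level; the function names are hypothetical.

```python
def nested_shape(data):
    """Return the shape of a rectangular nested list.

    Only the first element of each level is inspected, so ragged
    input is not detected.
    """
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        if not data:
            break
        data = data[0]
    return shape

def shape_matches(dataset):
    """True if the declared "shape" equals the nesting of "data"."""
    return nested_shape(dataset["data"]) == list(dataset["shape"])

var_1 = {
    "shape": [2, 3],
    "type": "float",
    "data": [[1, 2, 3], [4, 5, 6]],
}
print(shape_matches(var_1))  # True
```

The "char_att" attribute in the example stores its three characters as a single string, so this element-count check is meaningful only for numeric data.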

STAR JSON datum types

Datum     C type               HDF5 type
char      char                 H5T_NATIVE_CHAR
schar     signed char          H5T_NATIVE_SCHAR
uchar     unsigned char        H5T_NATIVE_UCHAR
short     short                H5T_NATIVE_SHORT
ushort    unsigned short       H5T_NATIVE_USHORT
int       int                  H5T_NATIVE_INT
uint      unsigned int         H5T_NATIVE_UINT
long      long                 H5T_NATIVE_LONG
ulong     unsigned long        H5T_NATIVE_ULONG
llong     long long            H5T_NATIVE_LLONG
ullong    unsigned long long   H5T_NATIVE_ULLONG
float     float                H5T_NATIVE_FLOAT
double    double               H5T_NATIVE_DOUBLE
ldouble   long double          H5T_NATIVE_LDOUBLE
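Since each datum name corresponds to a native C type, a client can look up the matching type on its own platform. The sketch below uses Python's standard ctypes module as one way to do this; the dictionary DATUM_TO_CTYPE and the function datum_size are hypothetical names, and the sketch assumes the datum name for unsigned long is "ulong", following the naming pattern of the other unsigned types.

```python
import ctypes

# Datum name -> ctypes type for the corresponding native C type.
# "ulong" is assumed as the datum name for unsigned long.
DATUM_TO_CTYPE = {
    "char": ctypes.c_char,
    "schar": ctypes.c_byte,        # signed char
    "uchar": ctypes.c_ubyte,       # unsigned char
    "short": ctypes.c_short,
    "ushort": ctypes.c_ushort,
    "int": ctypes.c_int,
    "uint": ctypes.c_uint,
    "long": ctypes.c_long,
    "ulong": ctypes.c_ulong,
    "llong": ctypes.c_longlong,
    "ullong": ctypes.c_ulonglong,
    "float": ctypes.c_float,
    "double": ctypes.c_double,
    "ldouble": ctypes.c_longdouble,
}

def datum_size(datum):
    """Size in bytes of the native C type for a datum name on this platform."""
    return ctypes.sizeof(DATUM_TO_CTYPE[datum])

for name in DATUM_TO_CTYPE:
    print(name, datum_size(name))
```

Sizes of types such as long and long double vary between platforms, which is why the sketch queries them at run time instead of hard-coding them.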

References