STAR JSON
Center for Satellite Applications and Research (STAR)
NOAA Center for Weather and Climate Prediction (NCWCP)
5830 University Research Court
College Park, MD 20740
Version 0.3.
Pedro Vicente
Acknowledgements
This version of STAR JSON was made with contributions from Chris Barker (NOAA) and Charlie Zender (UCI).
It is a modified version of the CF-JSON format, with the difference that attributes are represented with the same JSON object structure as datasets.
STAR JSON is a JSON schema used to share data stored in commonly used scientific data formats, such as HDF5 and netCDF.
It is used in client/server applications, where the JSON representation of the HDF5 file is transmitted over the network.
A client application has access to remote HDF5 or netCDF data stored by a server application. The client extracts metadata and data from
the remote files, transmitted in STAR JSON format.
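As a rough illustration of the exchange described above, the sketch below serializes a small document on the "server" side and parses it again on the "client" side. The network transport itself is omitted and the variable names are hypothetical; Python's standard json module stands in for whatever JSON library an application actually uses.

```python
import json

# A tiny STAR JSON document: a root group holding one empty group "g1".
document = {"groups": {"g1": {}}}

# Server side: serialize the document to UTF-8 bytes for transmission.
payload = json.dumps(document).encode("utf-8")

# Client side: decode the received bytes back into Python dictionaries.
received = json.loads(payload.decode("utf-8"))

print(received == document)  # True
```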
The STAR JSON file is a simplified reproduction of the netCDF/HDF5 file in JSON format.
In STAR JSON format version 0.3, the reproduced elements are:
- HDF5 file hierarchy, i.e., groups
- All HDF5 atomic datatypes for datasets and attributes, i.e., strings, signed and unsigned integers and floating point values.
The mapping of these HDF5 elements to JSON format (i.e., the JSON schema) is described below.
HDF5 to JSON mapping
HDF5 | JSON
Groups | A JSON object entry with the key "groups". Its value is another JSON object in which each key is a group name.
Example: A group named "g1" with empty contents.
{
  "groups": {
    "g1": {
    }
  }
}
Datasets | A JSON object entry with the key "variables". Its value is another JSON object in which each key is a dataset name.
Example: A dataset named "var_1", where the value of object "var_1" is not specified here. See below for a list of object values for datasets.
{
  "variables": {
    "var_1": {
    }
  }
}
Attributes | A JSON object entry with the key "attributes". Its value is another JSON object in which each key is an attribute name.
Example: An attribute named "attr_1", where the value of object "attr_1" is not specified here. See below for a list of object values for attributes.
{
  "attributes": {
    "attr_1": {
    }
  }
}
To better understand this mapping, a brief description follows.
HDF5 to JSON mapping in detail
JSON primer
JavaScript Object Notation (JSON) is a lightweight, text-based, language-independent data interchange format.
JSON defines a small set of formatting rules for the portable representation of structured data.
JSON is built on two structures:
- A collection of key/value pairs, called an object
- A list of values, called an array
JSON values can be:
- a string. A string is a sequence of characters, enclosed in double quotes.
- a number. Note that the JSON number format makes no distinction between integer and floating-point.
A JSON number is a signed decimal number that may contain a fractional part.
It is up to a particular programming language or application implementation to interpret a JSON number as desired.
- a boolean (true or false)
- null, an empty value
- an object
- an array.
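The point about numbers can be seen with a short sketch. It uses Python's standard json module as one example of "a particular programming language" interpreting JSON numbers; other parsers may choose differently.

```python
import json

# Two JSON numbers that a producer might emit for the same floating-point value.
values = json.loads("[1, 1.0]")

# Python's json module maps a literal without a fractional part to int and one
# with a fractional part to float, so the JSON text alone does not carry the
# storage type of the original data.
kinds = [type(v).__name__ for v in values]
print(kinds)  # ['int', 'float']
```

This is why a consumer cannot rely on the JSON number syntax to recover the original numeric type and needs that information from elsewhere in the document.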
A JSON text is a sequence of tokens. The set of tokens includes six structural characters, strings, numbers, and three literal names (true, false, null).
The six structural characters are:
Token | Description | Character
Array | square brackets | [ ]
Objects | curly brackets | { }
Entry separation in arrays and objects | comma | ,
Separator between keys and values in an object | colon | :
STAR JSON representation of HDF5
A STAR JSON document always consists of exactly one main object.
This corresponds to the HDF5 data model of having one and only one root group.
An empty HDF5 file is thus represented as
{
}
that is, an empty JSON object. Note that the root HDF5 group "/" is implicit and not represented.
A STAR JSON HDF5 document has 3 separate entities, all represented as JSON objects in the main JSON object (root group):
- Groups
- Datasets
- Attributes
These 3 entities map directly to the HDF5 entities of the same name.
For netCDF there is an additional main JSON object to represent netCDF dimensions.
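A reader of such a document might look up the three entities of a group as sketched below. The helper name `entities` is hypothetical, and treating a missing key as "no entities of that kind" is an assumption drawn from the empty-file example, where none of the keys is present.

```python
import json

# The three reserved keys that hold the entities of a group.
ENTITY_KEYS = ("groups", "variables", "attributes")

def entities(group):
    """Return the groups, variables and attributes objects of a group.

    Assumption: a missing key means the group has no entities of that kind.
    """
    return {key: group.get(key, {}) for key in ENTITY_KEYS}

# An empty HDF5 file is the empty JSON object.
empty_file = json.loads("{}")
print(entities(empty_file))
# {'groups': {}, 'variables': {}, 'attributes': {}}
```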
STAR JSON Group objects
An HDF5 group can itself contain the 3 separate entities: groups, datasets and attributes.
An example of a root group with 2 groups named "g1" (empty) and "g2" with variables and attributes is:
{
  "groups": {
    "g1": {
    },
    "g2": {
      "variables": {
      },
      "attributes": {
      }
    }
  }
}
An example of a root group with a group named "g1" that contains a subgroup "g11" is:
{
  "groups": {
    "g1": {
      "groups": {
        "g11": {
        }
      }
    }
  }
}
This nesting reproduces the hierarchy of an HDF5 file.
The HDF5 model consists of groups and datasets. Groups can contain other groups and datasets.
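Because groups nest through the "groups" key, the hierarchy can be recovered with a short recursive walk. The following is a minimal sketch; the function name `group_paths` is hypothetical, and the HDF5-style "/" path notation is used only for display.

```python
import json

def group_paths(group, prefix="/"):
    """Return the HDF5-style paths of a group and all of its subgroups."""
    paths = [prefix]
    for name, child in group.get("groups", {}).items():
        # Avoid a double slash when the parent is the root group.
        child_prefix = prefix + name if prefix == "/" else prefix + "/" + name
        paths.extend(group_paths(child, child_prefix))
    return paths

# Root group with a group "g1" that contains a subgroup "g11".
doc = json.loads('{"groups": {"g1": {"groups": {"g11": {}}}}}')
print(group_paths(doc))  # ['/', '/g1', '/g1/g11']
```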
STAR JSON HDF5 dataset objects
Dataset objects in the HDF5 file are represented as a JSON object with the following key/value pairs:
Key | Value
"shape" | 1D JSON array of numbers. For netCDF, a 1D JSON array of strings, each string being a dimension
"type" | JSON string that identifies the HDF5 numeric type
"data" | JSON array of numbers or strings
"attributes" | JSON object whose keys are attribute names. Each attribute object has the same format as the dataset object: objects with "shape", "type" and "data" keys
Example of a dataset named "var_1", that has a rank of 2, with dimensions 2 and 3, of type float, that contains a string attribute:
{
  "variables": {
    "var_1": {
      "shape": [2, 3],
      "type": "float",
      "data": [
        [1, 2, 3],
        [4, 5, 6]
      ],
      "attributes": {
        "char_att": {
          "shape": [3],
          "type": "char",
          "data": ["foo"]
        }
      }
    }
  }
}
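A consumer may want to confirm that the nested "data" array agrees with the declared "shape". The sketch below does this for the numeric dataset of the example; the function name `matches_shape` is hypothetical. It is not applied to the "char" attribute, where the example stores a 3-character string in a one-element array with shape [3], so a different rule would be needed there.

```python
import json

def matches_shape(data, shape):
    """Check that nested JSON arrays have exactly the lengths given in shape."""
    if not shape:
        # All dimensions consumed: what remains must be a scalar value.
        return not isinstance(data, list)
    return (
        isinstance(data, list)
        and len(data) == shape[0]
        and all(matches_shape(item, shape[1:]) for item in data)
    )

document = json.loads("""
{
  "variables": {
    "var_1": {
      "shape": [2, 3],
      "type": "float",
      "data": [[1, 2, 3], [4, 5, 6]]
    }
  }
}
""")

var_1 = document["variables"]["var_1"]
print(matches_shape(var_1["data"], var_1["shape"]))  # True
```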
STAR JSON datum types
Datum | C type | HDF5 type
char | char | H5T_NATIVE_CHAR
schar | signed char | H5T_NATIVE_SCHAR
uchar | unsigned char | H5T_NATIVE_UCHAR
short | short | H5T_NATIVE_SHORT
ushort | unsigned short | H5T_NATIVE_USHORT
int | int | H5T_NATIVE_INT
uint | unsigned int | H5T_NATIVE_UINT
long | long | H5T_NATIVE_LONG
ulong | unsigned long | H5T_NATIVE_ULONG
llong | long long | H5T_NATIVE_LLONG
ullong | unsigned long long | H5T_NATIVE_ULLONG
float | float | H5T_NATIVE_FLOAT
double | double | H5T_NATIVE_DOUBLE
ldouble | long double | H5T_NATIVE_LDOUBLE
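One way a client could interpret the "type" string is to map each datum name to the matching C type. The sketch below does so with Python's standard ctypes module; the dictionary and function names are hypothetical, and the datum name for unsigned long is taken to be "ulong", following the pattern of the other unsigned names.

```python
import ctypes

# Datum name -> ctypes type corresponding to the C type column of the table.
# "ulong" is an assumed name for the unsigned long datum.
DATUM_TO_CTYPE = {
    "char": ctypes.c_char,
    "schar": ctypes.c_byte,
    "uchar": ctypes.c_ubyte,
    "short": ctypes.c_short,
    "ushort": ctypes.c_ushort,
    "int": ctypes.c_int,
    "uint": ctypes.c_uint,
    "long": ctypes.c_long,
    "ulong": ctypes.c_ulong,
    "llong": ctypes.c_longlong,
    "ullong": ctypes.c_ulonglong,
    "float": ctypes.c_float,
    "double": ctypes.c_double,
    "ldouble": ctypes.c_longdouble,
}

def datum_size(datum):
    """Return the size in bytes of a datum type on the current platform."""
    return ctypes.sizeof(DATUM_TO_CTYPE[datum])

print(datum_size("char"))  # 1
```

The sizes of most of these types depend on the platform, just as the HDF5 native types do, so values other than the size of char should not be assumed.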
References
- [RFC7159] The JavaScript Object Notation (JSON) Data Interchange Format. https://tools.ietf.org/html/rfc7159
- HDF5 Predefined Datatypes. https://www.hdfgroup.org/HDF5/doc/RM/PredefDTypes.html