Saving Datasets and DataArrays to NetCDF

Objectives

Introduce an easy method for saving Dataset and DataArray objects to NetCDF

Introduction

Saving your Dataset and DataArray objects to NetCDF files couldn’t be simpler. The xarray module that we’ve been using to load NetCDF files also provides methods for saving your Datasets and DataArrays as NetCDF files.

Here is the manual page on the subject: http://xarray.pydata.org/en/stable/generated/xarray.Dataset.to_netcdf.html

The method .to_netcdf() is available on both Dataset and DataArray objects. So useful!

To complete this tutorial, you will need the monthly mean SSH and temperature/salinity output for March 2010 downloaded, as well as the model grid parameters file. If you have completed previous tutorials, you may have some or all of these already. The ShortNames of the required datasets are:

  • ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4

  • ECCO_L4_TEMP_SALINITY_LLC0090GRID_MONTHLY_V4R4

  • ECCO_L4_GEOMETRY_LLC0090GRID_V4R4

Syntax

your_dataset.to_netcdf('/your_filepath/your_netcdf_filename.nc')
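As a minimal end-to-end sketch of this syntax (using a small synthetic Dataset and an illustrative filename rather than ECCO output):

```python
import numpy as np
import xarray as xr

# build a small synthetic Dataset (2 time steps of a 3x4 field)
ds = xr.Dataset(
    {'temperature': (('time', 'y', 'x'), np.random.rand(2, 3, 4))},
    coords={'time': [0, 1], 'y': np.arange(3), 'x': np.arange(4)},
)

# save to NetCDF, then reopen and verify the round trip
ds.to_netcdf('example_output.nc')
ds_reopened = xr.open_dataset('example_output.nc')
print(np.allclose(ds['temperature'], ds_reopened['temperature']))  # True
ds_reopened.close()
```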

Saving an existing Dataset to NetCDF

First, let’s set up the environment and load a Dataset.

[1]:
import numpy as np
import xarray as xr
import sys
import matplotlib.pyplot as plt
import json
import glob


# indicate whether you are working in a cloud instance (True if yes, False otherwise)
incloud_access = False
[2]:
## Import the ecco_v4_py library into Python
## =========================================

#import ecco_v4_py as ecco

##    If ecco_v4_py is not installed in your local Python library,
##    tell Python where to find it.  The example below adds
##    ecco_v4_py to the user's path if it is stored in the folder
##    ECCOv4-py under the user's home directory

from os.path import join,expanduser
user_home_dir = expanduser('~')

sys.path.append(join(user_home_dir,'ECCOv4-py'))

import ecco_v4_py as ecco

Now load a single tile (model tile 2) of monthly averaged THETA for March 2010.

[3]:
## Set top-level file directory for the ECCO NetCDF files
## =================================================================

## currently set to ~/Downloads/ECCO_V4r4_PODAAC,
## the default if ecco_podaac_download was used to download dataset granules
ECCO_dir = join(user_home_dir,'Downloads','ECCO_V4r4_PODAAC')
[4]:
## if working in the AWS cloud, access datasets needed for this tutorial

ShortNames_list = ["ECCO_L4_GEOMETRY_LLC0090GRID_V4R4",\
                   "ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4",\
                   "ECCO_L4_TEMP_SALINITY_LLC0090GRID_MONTHLY_V4R4"]
if incloud_access == True:
    from ecco_s3_retrieve import ecco_podaac_s3_get_diskaware
    files_dict = ecco_podaac_s3_get_diskaware(ShortNames=ShortNames_list,\
                                              StartDate='2010-03',EndDate='2010-03',\
                                              max_avail_frac=0.5,\
                                              download_root_dir=ECCO_dir)
[5]:
# load Mar 2010 temperature/salinity monthly mean
if incloud_access == True:
    # use list comprehension to list file path(s)
    file_path = [filepath for filepath in files_dict[ShortNames_list[2]] if '_2010-03_' in filepath]
    ds_temp_sal_201003 = xr.open_dataset(file_path[0])
else:
    ds_temp_sal_201003 = xr.open_dataset(glob.glob(join(ECCO_dir,'*TEMP_SALINITY*MONTHLY*','*_2010-03_*.nc'))[0])

# select only potential temperature on tile 2 and load it into memory
# Note: the extra set of brackets around THETA creates an xarray Dataset
# (with a single data variable) rather than a DataArray
theta_dataset = ds_temp_sal_201003[['THETA']].isel(tile=2).load()
[6]:
type(theta_dataset)
[6]:
xarray.core.dataset.Dataset

Now that we’ve loaded theta_dataset, let’s save it in the current file directory with a new name.

[7]:
new_filename_1 = './test_output.nc'
print ('saving to ', new_filename_1)
theta_dataset.to_netcdf(path=new_filename_1)
print ('finished saving')
saving to  ./test_output.nc
finished saving

It’s really that simple!

Saving a new custom Dataset to NetCDF

Now let’s create a new custom Dataset with THETA, SSH, and model grid parameter variables for a few tiles.

[8]:
if incloud_access == True:
    file_path = [filepath for filepath in files_dict[ShortNames_list[1]] if '_2010-03_' in filepath]
    ds_SSH_201003 = xr.open_dataset(file_path[0])
else:
    ds_SSH_201003 = xr.open_dataset(glob.glob(join(ECCO_dir,'*SSH*MONTHLY*','*_2010-03_*.nc'))[0])

# merge SSH and THETA data variables into a new dataset, and select several tiles to load into memory
tiles_to_load = [0,1,2]
ds_SSH_THETA_201003 = xr.merge([ds_SSH_201003['SSH'],ds_temp_sal_201003['THETA']]).isel(tile=tiles_to_load).load()

# load grid parameters (for tiles 0,1,2 only)
if incloud_access == True:
    ds_grid = xr.open_dataset(files_dict[ShortNames_list[0]]).isel(tile=tiles_to_load).load()
else:
    ds_grid = xr.open_dataset(glob.glob(join(ECCO_dir,'*GEOMETRY*','*.nc'))[0]).isel(tile=tiles_to_load).load()

custom_dataset = xr.merge([ds_SSH_THETA_201003, ds_grid])

And now we can easily save it:

[9]:
new_filename_2 = './test_output_2.nc'
print ('saving to ', new_filename_2)
custom_dataset.to_netcdf(path=new_filename_2)
custom_dataset.close()
print ('finished saving')
saving to  ./test_output_2.nc
finished saving
[10]:
custom_dataset
[10]:
<xarray.Dataset> Size: 25MB
Dimensions:  (i: 90, j: 90, tile: 3, time: 1, k: 50, i_g: 90, j_g: 90, k_u: 50,
              k_l: 50, k_p1: 51, nb: 4, nv: 2)
Coordinates: (12/21)
  * i        (i) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89
  * j        (j) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89
  * tile     (tile) int32 12B 0 1 2
  * time     (time) datetime64[ns] 8B 2010-03-16T12:00:00
    XC       (tile, j, i) float32 97kB -111.6 -111.3 -110.9 ... 51.44 51.84
    YC       (tile, j, i) float32 97kB -88.24 -88.38 -88.52 ... 67.53 67.47
    ...       ...
    Zp1      (k_p1) float32 204B 0.0 -10.0 -20.0 ... -5.678e+03 -6.134e+03
    Zu       (k_u) float32 200B -10.0 -20.0 -30.0 ... -5.678e+03 -6.134e+03
    Zl       (k_l) float32 200B 0.0 -10.0 -20.0 ... -5.244e+03 -5.678e+03
    XC_bnds  (tile, j, i, nb) float32 389kB -115.0 -115.0 -107.9 ... 52.0 51.73
    YC_bnds  (tile, j, i, nb) float32 389kB -88.18 -88.32 -88.3 ... 67.5 67.56
    Z_bnds   (k, nv) float32 400B 0.0 -10.0 -10.0 ... -5.678e+03 -6.134e+03
Dimensions without coordinates: nb, nv
Data variables: (12/23)
    SSH      (time, tile, j, i) float32 97kB nan nan nan nan ... nan nan nan nan
    THETA    (time, k, tile, j, i) float32 5MB nan nan nan nan ... nan nan nan
    CS       (tile, j, i) float32 97kB 0.06158 0.06675 0.07293 ... 0.9052 0.9424
    SN       (tile, j, i) float32 97kB -0.9981 -0.9978 ... -0.4251 -0.3344
    rA       (tile, j, i) float32 97kB 3.623e+08 3.633e+08 ... 2.126e+08
    dxG      (tile, j_g, i) float32 97kB 1.558e+04 1.559e+04 ... 1.787e+04
    ...       ...
    hFacC    (k, tile, j, i) float32 5MB 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
    hFacW    (k, tile, j, i_g) float32 5MB 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
    hFacS    (k, tile, j_g, i) float32 5MB 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
    maskC    (k, tile, j, i) bool 1MB False False False ... False False False
    maskW    (k, tile, j, i_g) bool 1MB False False False ... False False False
    maskS    (k, tile, j_g, i) bool 1MB False False False ... False False False
Attributes:
    long_name:              Dynamic sea surface height anomaly
    units:                  m
    coverage_content_type:  modelResult
    standard_name:          sea_surface_height_above_geoid
    comment:                Dynamic sea surface height anomaly above the geoi...
    valid_min:              -1.8805772066116333
    valid_max:              1.4207719564437866

Verifying our new NetCDF files

To verify that to_netcdf() worked, let’s load the new files and compare them with the originals.

Compare theta_dataset with dataset_1

[11]:
# the first test dataset
dataset_1 = xr.open_dataset(new_filename_1)

# release the file handle (not necessary but generally a good idea)
dataset_1.close()

The np.allclose function does an element-by-element comparison of variables:

[12]:
# loop through the data variables in dataset_1
for key in dataset_1.keys():
    print ('checking %s ' % key)
    print ('-- identical in dataset_1 and theta_dataset : %s' % \
           np.allclose(dataset_1[key], theta_dataset[key], equal_nan=True))

# note: ``equal_nan`` means nan==nan (default nan != nan)
checking THETA
-- identical in dataset_1 and theta_dataset : True

THETA is the same in both datasets.
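As an alternative to looping over variables with np.allclose, xarray itself provides NaN-aware comparison utilities such as Dataset.equals() and xr.testing.assert_allclose(); a sketch with synthetic data:

```python
import numpy as np
import xarray as xr

a = xr.Dataset({'THETA': (('j', 'i'), np.array([[1.0, np.nan], [2.0, 3.0]]))})
b = a.copy(deep=True)

# equals() treats NaNs in the same positions as equal
print(a.equals(b))  # True

# assert_allclose raises an AssertionError if values differ beyond a tolerance
xr.testing.assert_allclose(a, b)
```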

Compare custom_dataset with dataset_2

[13]:
# our custom dataset
dataset_2 = xr.open_dataset(new_filename_2)
dataset_2.close()
print ('finished loading')
finished loading
[14]:
for key in dataset_2.keys():
    print ('checking %s ' % key)
    print ('-- identical in dataset_2 and custom_dataset : %s'\
           % np.allclose(dataset_2[key], custom_dataset[key], equal_nan=True))
checking SSH
-- identical in dataset_2 and custom_dataset : True
checking THETA
-- identical in dataset_2 and custom_dataset : True
checking CS
-- identical in dataset_2 and custom_dataset : True
checking SN
-- identical in dataset_2 and custom_dataset : True
checking rA
-- identical in dataset_2 and custom_dataset : True
checking dxG
-- identical in dataset_2 and custom_dataset : True
checking dyG
-- identical in dataset_2 and custom_dataset : True
checking Depth
-- identical in dataset_2 and custom_dataset : True
checking rAz
-- identical in dataset_2 and custom_dataset : True
checking dxC
-- identical in dataset_2 and custom_dataset : True
checking dyC
-- identical in dataset_2 and custom_dataset : True
checking rAw
-- identical in dataset_2 and custom_dataset : True
checking rAs
-- identical in dataset_2 and custom_dataset : True
checking drC
-- identical in dataset_2 and custom_dataset : True
checking drF
-- identical in dataset_2 and custom_dataset : True
checking PHrefC
-- identical in dataset_2 and custom_dataset : True
checking PHrefF
-- identical in dataset_2 and custom_dataset : True
checking hFacC
-- identical in dataset_2 and custom_dataset : True
checking hFacW
-- identical in dataset_2 and custom_dataset : True
checking hFacS
-- identical in dataset_2 and custom_dataset : True
checking maskC
-- identical in dataset_2 and custom_dataset : True
checking maskW
-- identical in dataset_2 and custom_dataset : True
checking maskS
-- identical in dataset_2 and custom_dataset : True

SSH and THETA are the same in both datasets, as are all of the grid parameters, including the masks (maskC, maskW, maskS). Since NetCDF has no native boolean type, the masks deserve a closer look. Let’s check their data type in each of the datasets:

[15]:
print('custom_dataset.maskC.dtype: ' + str(custom_dataset.maskC.dtype))
print('custom_dataset.maskW.dtype: ' + str(custom_dataset.maskW.dtype))
print('custom_dataset.maskS.dtype: ' + str(custom_dataset.maskS.dtype))
print('dataset_2.maskC.dtype: ' + str(dataset_2.maskC.dtype))
print('dataset_2.maskW.dtype: ' + str(dataset_2.maskW.dtype))
print('dataset_2.maskS.dtype: ' + str(dataset_2.maskS.dtype))
custom_dataset.maskC.dtype: bool
custom_dataset.maskW.dtype: bool
custom_dataset.maskS.dtype: bool
dataset_2.maskC.dtype: bool
dataset_2.maskW.dtype: bool
dataset_2.maskS.dtype: bool

Here the boolean (bool) fields kept their data type through the round trip to disk. With some older versions of xarray, however, these fields could be re-opened as float32 floating-point numbers. Casting them back to bool before comparing, as below, makes the check robust either way.

[16]:
dataset_2.maskC.data = dataset_2.maskC.data.astype('bool')
dataset_2.maskW.data = dataset_2.maskW.data.astype('bool')
dataset_2.maskS.data = dataset_2.maskS.data.astype('bool')

for key in dataset_2.keys():
    print ('checking %s ' % key)
    print ('-- identical in dataset_2 and custom_dataset : %s'\
           % np.allclose(dataset_2[key], custom_dataset[key], equal_nan=True))
checking SSH
-- identical in dataset_2 and custom_dataset : True
checking THETA
-- identical in dataset_2 and custom_dataset : True
checking CS
-- identical in dataset_2 and custom_dataset : True
checking SN
-- identical in dataset_2 and custom_dataset : True
checking rA
-- identical in dataset_2 and custom_dataset : True
checking dxG
-- identical in dataset_2 and custom_dataset : True
checking dyG
-- identical in dataset_2 and custom_dataset : True
checking Depth
-- identical in dataset_2 and custom_dataset : True
checking rAz
-- identical in dataset_2 and custom_dataset : True
checking dxC
-- identical in dataset_2 and custom_dataset : True
checking dyC
-- identical in dataset_2 and custom_dataset : True
checking rAw
-- identical in dataset_2 and custom_dataset : True
checking rAs
-- identical in dataset_2 and custom_dataset : True
checking drC
-- identical in dataset_2 and custom_dataset : True
checking drF
-- identical in dataset_2 and custom_dataset : True
checking PHrefC
-- identical in dataset_2 and custom_dataset : True
checking PHrefF
-- identical in dataset_2 and custom_dataset : True
checking hFacC
-- identical in dataset_2 and custom_dataset : True
checking hFacW
-- identical in dataset_2 and custom_dataset : True
checking hFacS
-- identical in dataset_2 and custom_dataset : True
checking maskC
-- identical in dataset_2 and custom_dataset : True
checking maskW
-- identical in dataset_2 and custom_dataset : True
checking maskS
-- identical in dataset_2 and custom_dataset : True

Now we can confirm that all the variables in the datasets match!
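The dtype behavior can be seen in isolation with a tiny synthetic mask (the filename is just for illustration). Whatever dtype a given xarray version hands back on reopening, casting to bool recovers the original values:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({'maskC': (('j', 'i'), np.array([[True, False], [False, True]]))})
ds.to_netcdf('mask_test.nc')

reopened = xr.open_dataset('mask_test.nc')
# depending on the xarray version, the reopened dtype may or may not be bool;
# casting makes the comparison robust either way
mask = reopened['maskC'].data.astype('bool')
print(np.array_equal(mask, ds['maskC'].data))  # True
reopened.close()
```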

Saving the results of calculations

Calculations in the form of DataArrays

Often we would like to store the results of our calculations to disk. If your operations are made at the level of DataArray objects (and not at the lower ndarray level), then you can use these same methods to save your results. All of the coordinates will be preserved (although attributes will be lost). Let’s demonstrate by making a dummy calculation on SSH:

SSH_{sq}(i) = SSH(i)^2

[17]:
SSH_sq = custom_dataset.SSH * custom_dataset.SSH

SSH_sq
[17]:
<xarray.DataArray 'SSH' (time: 1, tile: 3, j: 90, i: 90)> Size: 97kB
array([[[[       nan,        nan,        nan, ...,        nan,
                 nan,        nan],
         [       nan,        nan,        nan, ...,        nan,
                 nan,        nan],
         [       nan,        nan,        nan, ...,        nan,
                 nan,        nan],
         ...,
         [2.0671911 , 2.0848224 , 2.0841086 , ..., 1.7935177 ,
          1.7659562 , 1.739012  ],
         [1.9880351 , 2.0051208 , 2.002616  , ..., 1.7589175 ,
          1.7311752 , 1.7012385 ],
         [1.8994561 , 1.9122815 , 1.9098864 , ..., 1.7312655 ,
          1.7034038 , 1.6693714 ]],

        [[1.8020678 , 1.810736  , 1.8133253 , ..., 1.7026882 ,
          1.6756296 , 1.6379843 ],
         [1.7121627 , 1.719325  , 1.7317982 , ..., 1.6670871 ,
          1.6423833 , 1.6031213 ],
         [1.6535913 , 1.6606174 , 1.6836028 , ..., 1.6204541 ,
          1.5996994 , 1.5617636 ],
...
         [0.0358197 , 0.03125747, 0.02652959, ...,        nan,
          0.28220004, 0.31124577],
         [0.02917161, 0.02486313, 0.02061234, ...,        nan,
          0.2443076 , 0.2719567 ],
         [0.02321114, 0.01939201, 0.01575851, ...,        nan,
                 nan, 0.23898326]],

        [[0.01884312, 0.01536459, 0.01223672, ...,        nan,
                 nan, 0.2193556 ],
         [0.01608207, 0.01280527, 0.0099735 , ..., 0.151336  ,
                 nan, 0.20633157],
         [0.01495756, 0.01177618, 0.00917115, ..., 0.16071834,
          0.1699125 , 0.20213497],
         ...,
         [       nan,        nan,        nan, ...,        nan,
                 nan,        nan],
         [       nan,        nan,        nan, ...,        nan,
                 nan,        nan],
         [       nan,        nan,        nan, ...,        nan,
                 nan,        nan]]]], dtype=float32)
Coordinates:
  * i        (i) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89
  * j        (j) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89
  * tile     (tile) int32 12B 0 1 2
  * time     (time) datetime64[ns] 8B 2010-03-16T12:00:00
    XC       (tile, j, i) float32 97kB -111.6 -111.3 -110.9 ... 51.44 51.84
    YC       (tile, j, i) float32 97kB -88.24 -88.38 -88.52 ... 67.53 67.47

SSH_sq is itself a DataArray.

Before saving, let’s give our new SSH_sq variable a better name and descriptive attributes.

[18]:
SSH_sq.name = 'SSH^2'
SSH_sq.attrs['long_name'] = 'Square of Surface Height Anomaly'
SSH_sq.attrs['units'] = 'm^2'

# Let's see the result
SSH_sq
[18]:
<xarray.DataArray 'SSH^2' (time: 1, tile: 3, j: 90, i: 90)> Size: 97kB
array([[[[       nan,        nan,        nan, ...,        nan,
                 nan,        nan],
         [       nan,        nan,        nan, ...,        nan,
                 nan,        nan],
         [       nan,        nan,        nan, ...,        nan,
                 nan,        nan],
         ...,
         [2.0671911 , 2.0848224 , 2.0841086 , ..., 1.7935177 ,
          1.7659562 , 1.739012  ],
         [1.9880351 , 2.0051208 , 2.002616  , ..., 1.7589175 ,
          1.7311752 , 1.7012385 ],
         [1.8994561 , 1.9122815 , 1.9098864 , ..., 1.7312655 ,
          1.7034038 , 1.6693714 ]],

        [[1.8020678 , 1.810736  , 1.8133253 , ..., 1.7026882 ,
          1.6756296 , 1.6379843 ],
         [1.7121627 , 1.719325  , 1.7317982 , ..., 1.6670871 ,
          1.6423833 , 1.6031213 ],
         [1.6535913 , 1.6606174 , 1.6836028 , ..., 1.6204541 ,
          1.5996994 , 1.5617636 ],
...
         [0.0358197 , 0.03125747, 0.02652959, ...,        nan,
          0.28220004, 0.31124577],
         [0.02917161, 0.02486313, 0.02061234, ...,        nan,
          0.2443076 , 0.2719567 ],
         [0.02321114, 0.01939201, 0.01575851, ...,        nan,
                 nan, 0.23898326]],

        [[0.01884312, 0.01536459, 0.01223672, ...,        nan,
                 nan, 0.2193556 ],
         [0.01608207, 0.01280527, 0.0099735 , ..., 0.151336  ,
                 nan, 0.20633157],
         [0.01495756, 0.01177618, 0.00917115, ..., 0.16071834,
          0.1699125 , 0.20213497],
         ...,
         [       nan,        nan,        nan, ...,        nan,
                 nan,        nan],
         [       nan,        nan,        nan, ...,        nan,
                 nan,        nan],
         [       nan,        nan,        nan, ...,        nan,
                 nan,        nan]]]], dtype=float32)
Coordinates:
  * i        (i) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89
  * j        (j) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89
  * tile     (tile) int32 12B 0 1 2
  * time     (time) datetime64[ns] 8B 2010-03-16T12:00:00
    XC       (tile, j, i) float32 97kB -111.6 -111.3 -110.9 ... 51.44 51.84
    YC       (tile, j, i) float32 97kB -88.24 -88.38 -88.52 ... 67.53 67.47
Attributes:
    long_name:  Square of Surface Height Anomaly
    units:      m^2

Much better! Now we’ll save.

[19]:
new_filename_3 = './ssh_sq_DataArray.nc'
print ('saving to ', new_filename_3)

SSH_sq.to_netcdf(path=new_filename_3)
print ('finished saving')
saving to  ./ssh_sq_DataArray.nc
finished saving
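If you would rather not re-attach attributes by hand after every calculation, xarray can be told to propagate them through arithmetic via xr.set_options(keep_attrs=True). (Dropping them by default is arguably sensible: squaring SSH changes its units, for example.) A sketch with a synthetic DataArray:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.ones(3), dims='x', attrs={'units': 'm'})

print((da * da).attrs)   # {} -- attributes are dropped by default

# within this context, arithmetic results keep the input's attributes
with xr.set_options(keep_attrs=True):
    print((da * da).attrs)   # {'units': 'm'}
```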

Calculations in the form of numpy ndarrays

If calculations are made at the ndarray level, then the results will also be ndarrays.

[20]:
SSH_dummy_ndarray = custom_dataset.SSH.values *  custom_dataset.SSH.values

type(SSH_dummy_ndarray)
[20]:
numpy.ndarray

You’ll need to use different methods to save ndarray results to NetCDF files. One approach, using the netCDF4 library directly, is described here: http://pyhogs.github.io/intro_netcdf4.html
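Alternatively, you can wrap the ndarray back into a DataArray yourself (supplying the dimension names and any coordinates you want to keep) and then use to_netcdf() as before; a sketch with synthetic data and an illustrative filename:

```python
import numpy as np
import xarray as xr

# a synthetic stand-in for an ndarray-level result with (tile, j, i) dimensions
arr = np.random.rand(3, 90, 90).astype(np.float32)

# wrap the bare ndarray in a DataArray, naming its dimensions and variable
da = xr.DataArray(arr, dims=('tile', 'j', 'i'), name='SSH_sq')
da.attrs['units'] = 'm^2'

da.to_netcdf('ssh_sq_from_ndarray.nc')
```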

Summary

Saving Datasets and DataArrays to disk as NetCDF files is fun and easy with xarray!