Saving Datasets and DataArrays to NetCDF

Objectives

Introduce an easy method for saving Dataset and DataArray objects to NetCDF

Introduction

Saving your Dataset and DataArray objects to NetCDF files couldn’t be simpler. The xarray module that we’ve been using to load NetCDF files also provides methods for saving your Datasets and DataArrays as NetCDF files.

Here is the manual page on the subject: http://xarray.pydata.org/en/stable/generated/xarray.Dataset.to_netcdf.html

The method .to_netcdf() is available to both Dataset and DataArray objects. So useful!

Syntax

your_dataset.to_netcdf('/your_filepath/your_netcdf_filename.nc')

Saving an existing Dataset to NetCDF

First, let’s set up the environment and load a Dataset.

[1]:
import numpy as np
import xarray as xr
import sys
import matplotlib.pyplot as plt
import json
[2]:
## Import the ecco_v4_py library into Python
## =========================================

#import ecco_v4_py as ecco

## -- If ecco_v4_py is not installed in your local Python library,
##    tell Python where to find it.  For example, if your ecco_v4_py
##    files are in /Users/ifenty/ECCOv4-py/ecco_v4_py, then use:
sys.path.append('/home/ifenty/ECCOv4-py')
import ecco_v4_py as ecco

Load monthly averaged THETA fields for 2010.

[3]:
## Set top-level file directory for the ECCO NetCDF files
## =================================================================
# base_dir = '/home/username/'
base_dir = '/home/ifenty/ECCOv4-release'

## define a high-level directory for ECCO fields
ECCO_dir = base_dir + '/Release3_alt'
[4]:
## LOAD NETCDF FILE
## ================

# directory of the file
data_dir= ECCO_dir + '/nctiles_monthly/THETA/'

# filename
fname = 'THETA_2010.nc'

# load the dataset file using xarray
theta_dataset = xr.open_dataset(data_dir + fname).load()

Now that we’ve loaded theta_dataset, let’s save it in the current directory with a new name.

[5]:
new_filename_1 = './test_output.nc'
print ('saving to ', new_filename_1)
theta_dataset.to_netcdf(path=new_filename_1)
print ('finished saving')
saving to  ./test_output.nc
finished saving

It’s really that simple!

Saving a new custom Dataset to NetCDF

Now let’s create a new custom Dataset with THETA, SSH, and model grid parameter variables for a few tiles of 2010.

[6]:
data_dir= ECCO_dir + '/nctiles_monthly/'

SSH_THETA_201003 = \
    ecco.recursive_load_ecco_var_from_years_nc(data_dir, \
                                              ['SSH', 'THETA'], \
                                              tiles_to_load = [0,1,2],
                                              years_to_load = 2010)
grid_dir = ECCO_dir + '/nctiles_grid/'
grid = ecco.load_ecco_grid_nc(grid_dir, 'ECCOv4r3_grid.nc')
grid.close()

custom_dataset = xr.merge([SSH_THETA_201003, grid])
loading files of  THETA
loading files of  SSH

and now we can easily save it:

[7]:
new_filename_2 = './test_output_2.nc'
print ('saving to ', new_filename_2)
custom_dataset.to_netcdf(path=new_filename_2)
custom_dataset.close()
print ('finished saving')
saving to  ./test_output_2.nc
finished saving
[8]:
custom_dataset
[8]:
<xarray.Dataset>
Dimensions:    (i: 90, i_g: 90, j: 90, j_g: 90, k: 50, k_l: 50, k_p1: 51, k_u: 50, nv: 2, tile: 13, time: 12)
Coordinates:
  * tile       (tile) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
  * j          (j) int32 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * i          (i) int32 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * k          (k) int32 0 1 2 3 4 5 6 7 8 9 ... 40 41 42 43 44 45 46 47 48 49
    Z          (k) float32 -5.0 -15.0 -25.0 -35.0 ... -5039.25 -5461.25 -5906.25
    PHrefC     (k) float32 49.05 147.15 245.25 ... 49435.043 53574.863 57940.312
    drF        (k) float32 10.0 10.0 10.0 10.0 10.0 ... 387.5 410.5 433.5 456.5
    XC         (tile, j, i) float32 -111.60647 -111.303 ... -111.86579
    YC         (tile, j, i) float32 -88.24259 -88.382515 ... -88.07871 -88.10267
    rA         (tile, j, i) float32 362256450.0 363300960.0 ... 361119100.0
    hFacC      (tile, k, j, i) float32 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
    time_bnds  (time, nv) datetime64[ns] dask.array<chunksize=(12, 2), meta=np.ndarray>
    iter       (time) int32 dask.array<chunksize=(12,), meta=np.ndarray>
  * time       (time) datetime64[ns] 2010-01-16T12:00:00 ... 2010-12-16T12:00:00
  * i_g        (i_g) int64 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * j_g        (j_g) int64 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * k_u        (k_u) int64 0 1 2 3 4 5 6 7 8 9 ... 40 41 42 43 44 45 46 47 48 49
  * k_l        (k_l) int64 0 1 2 3 4 5 6 7 8 9 ... 40 41 42 43 44 45 46 47 48 49
  * k_p1       (k_p1) int64 0 1 2 3 4 5 6 7 8 9 ... 42 43 44 45 46 47 48 49 50
    XG         (tile, j_g, i_g) float32 -115.0 -115.0 ... -102.928925 -108.95171
    YG         (tile, j_g, i_g) float32 -88.17569 -88.31587 ... -88.02409
    CS         (tile, j, i) float32 0.06157813 0.06675376 ... -0.9983638
    SN         (tile, j, i) float32 -0.99810225 -0.9977695 ... -0.057182025
    Zp1        (k_p1) float32 0.0 -10.0 -20.0 -30.0 ... -5244.5 -5678.0 -6134.5
    Zu         (k_u) float32 -10.0 -20.0 -30.0 -40.0 ... -5244.5 -5678.0 -6134.5
    Zl         (k_l) float32 0.0 -10.0 -20.0 -30.0 ... -4834.0 -5244.5 -5678.0
    dxG        (tile, j_g, i) float32 15584.907 15589.316 ... 23142.107
    dyG        (tile, j, i_g) float32 23210.262 23273.26 ... 15595.26 15583.685
    Depth      (tile, j, i) float32 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
    rAz        (tile, j_g, i_g) float32 179944260.0 180486990.0 ... 364150620.0
    dxC        (tile, j, i_g) float32 15583.418 15588.104 ... 23406.256
    dyC        (tile, j_g, i) float32 11563.718 11593.785 ... 15578.138
    rAw        (tile, j, i_g) float32 361699460.0 362790240.0 ... 364760350.0
    rAs        (tile, j_g, i) float32 179944260.0 180486990.0 ... 364150620.0
    drC        (k_p1) float32 5.0 10.0 10.0 10.0 ... 399.0 422.0 445.0 228.25
    PHrefF     (k_p1) float32 0.0 98.1 196.2 ... 51448.547 55701.18 60179.445
    hFacW      (k, tile, j, i_g) float32 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
    hFacS      (k, tile, j_g, i) float32 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
    maskC      (k, tile, j, i) bool False False False ... False False False
    maskW      (k, tile, j, i_g) bool False False False ... False False False
    maskS      (k, tile, j_g, i) bool False False False ... False False False
    maskCtrlW  (k, tile, j, i_g) bool False False False ... False False False
    maskCtrlS  (k, tile, j_g, i) bool False False False ... False False False
    maskCtrlC  (k, tile, j, i) bool False False False ... False False False
Dimensions without coordinates: nv
Data variables:
    THETA      (time, tile, k, j, i) float32 dask.array<chunksize=(12, 13, 50, 90, 90), meta=np.ndarray>
    SSH        (time, tile, j, i) float32 dask.array<chunksize=(12, 13, 90, 90), meta=np.ndarray>

Verifying our new NetCDF files

To verify that to_netcdf() worked, load the new files and compare them with the originals.

Compare theta_dataset with dataset_1

[9]:
# the first test dataset
dataset_1 = xr.open_dataset(new_filename_1)

# release the file handle (not necessary but generally a good idea)
dataset_1.close()

The np.allclose function does an element-by-element comparison of the variables:

[10]:
# loop through the data variables in dataset_1
for key in dataset_1.keys():
    print ('checking %s ' % key)
    print ('-- identical in dataset_1 and theta_dataset : %s' % \
           np.allclose(dataset_1[key], theta_dataset[key], equal_nan=True))

# note: ``equal_nan=True`` means NaN values compare equal (by default NaN != NaN)
checking THETA
-- identical in dataset_1 and theta_dataset : True

THETA is the same in both datasets.
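xarray also ships its own comparison helpers in xarray.testing, which check dimensions and coordinates as well as values. A minimal sketch with dummy stand-ins for the tutorial’s datasets:

```python
import numpy as np
import xarray as xr

# two identical dummy datasets standing in for theta_dataset / dataset_1
a = xr.Dataset({'THETA': (('j', 'i'), np.arange(12.0).reshape(3, 4))})
b = a.copy(deep=True)

# raises an AssertionError if values, dims, or coords differ
xr.testing.assert_allclose(a, b)

# Dataset.equals() returns a bool instead of raising
print(a.equals(b))
```

These helpers are handy in automated tests, where a raised exception is more useful than a printed True/False.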

Compare custom_dataset with dataset_2

[11]:
# our custom dataset
dataset_2 = xr.open_dataset(new_filename_2)
dataset_2.close()
print ('finished loading')
finished loading
[12]:
for key in dataset_2.keys():
    print ('checking %s ' % key)
    print ('-- identical in dataset_2 and custom_dataset : %s'\
           % np.allclose(dataset_2[key], custom_dataset[key], equal_nan=True))
checking THETA
-- identical in dataset_2 and custom_dataset : True
checking SSH
-- identical in dataset_2 and custom_dataset : True

THETA and SSH are the same in both datasets.

Good news!

Saving the results of calculations

Calculations in the form of DataArrays

Often we would like to store the results of our calculations to disk. If your operations are made at the level of DataArray objects (and not the lower ndarray level), then you can use these same methods to save your results. All of the coordinates will be preserved (although attributes will be lost). Let’s demonstrate by making a dummy calculation on SSH

SSH_{sq}(i) = SSH(i)^2

[13]:
SSH_sq = custom_dataset.SSH * custom_dataset.SSH

SSH_sq
[13]:
<xarray.DataArray 'SSH' (time: 12, tile: 13, j: 90, i: 90)>
dask.array<mul, shape=(12, 13, 90, 90), dtype=float32, chunksize=(12, 13, 90, 90), chunktype=numpy.ndarray>
Coordinates:
  * tile     (tile) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
  * j        (j) int32 0 1 2 3 4 5 6 7 8 9 10 ... 80 81 82 83 84 85 86 87 88 89
  * i        (i) int32 0 1 2 3 4 5 6 7 8 9 10 ... 80 81 82 83 84 85 86 87 88 89
    XC       (tile, j, i) float32 -111.60647 -111.303 ... -105.58465 -111.86579
    YC       (tile, j, i) float32 -88.24259 -88.382515 ... -88.07871 -88.10267
    rA       (tile, j, i) float32 362256450.0 363300960.0 ... 361119100.0
    iter     (time) int32 158532 159204 159948 160668 ... 165084 165804 166548
  * time     (time) datetime64[ns] 2010-01-16T12:00:00 ... 2010-12-16T12:00:00
    CS       (tile, j, i) float32 0.06157813 0.06675376 ... -0.9983638
    SN       (tile, j, i) float32 -0.99810225 -0.9977695 ... -0.057182025
    Depth    (tile, j, i) float32 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0

SSH_sq is itself a DataArray.
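Notice that the arithmetic dropped the original SSH attributes. A minimal sketch of this default behaviour, using a dummy array rather than the tutorial data:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.ones(3), dims=['x'], attrs={'units': 'm'})

# binary operations drop attrs by default
# (unless xr.set_options(keep_attrs=True) is active)
sq = da * da
print(sq.attrs)
```

This is why the next step re-attaches a name and attributes before saving.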

Before saving, let’s give our new SSH_sq variable a better name and descriptive attributes.

[14]:
SSH_sq.name = 'SSH^2'
SSH_sq.attrs['long_name'] = 'Square of Surface Height Anomaly'
SSH_sq.attrs['units'] = 'm^2'

# Let's see the result
SSH_sq
[14]:
<xarray.DataArray 'SSH^2' (time: 12, tile: 13, j: 90, i: 90)>
dask.array<mul, shape=(12, 13, 90, 90), dtype=float32, chunksize=(12, 13, 90, 90), chunktype=numpy.ndarray>
Coordinates:
  * tile     (tile) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
  * j        (j) int32 0 1 2 3 4 5 6 7 8 9 10 ... 80 81 82 83 84 85 86 87 88 89
  * i        (i) int32 0 1 2 3 4 5 6 7 8 9 10 ... 80 81 82 83 84 85 86 87 88 89
    XC       (tile, j, i) float32 -111.60647 -111.303 ... -105.58465 -111.86579
    YC       (tile, j, i) float32 -88.24259 -88.382515 ... -88.07871 -88.10267
    rA       (tile, j, i) float32 362256450.0 363300960.0 ... 361119100.0
    iter     (time) int32 158532 159204 159948 160668 ... 165084 165804 166548
  * time     (time) datetime64[ns] 2010-01-16T12:00:00 ... 2010-12-16T12:00:00
    CS       (tile, j, i) float32 0.06157813 0.06675376 ... -0.9983638
    SN       (tile, j, i) float32 -0.99810225 -0.9977695 ... -0.057182025
    Depth    (tile, j, i) float32 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
Attributes:
    long_name:  Square of Surface Height Anomaly
    units:      m^2

Much better! Now we’ll save.

[15]:
new_filename_3 = './ssh_sq_DataArray.nc'
print ('saving to ', new_filename_3)

SSH_sq.to_netcdf(path=new_filename_3)
print ('finished saving')
saving to  ./ssh_sq_DataArray.nc
finished saving

Calculations in the form of numpy ndarrays

If calculations are made at the ndarray level, then the results will also be ndarrays.

[16]:
SSH_dummy_ndarray = custom_dataset.SSH.values * custom_dataset.SSH.values

type(SSH_dummy_ndarray)
[16]:
numpy.ndarray

You’ll need to use different methods to save these results to NetCDF files; one approach is described here: http://pyhogs.github.io/intro_netcdf4.html
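Alternatively, if you still know the dimension names, you can wrap the ndarray back into a DataArray and reuse to_netcdf(). A minimal sketch with a made-up array and filename (the dimension names here are supplied by hand and would need to match your data):

```python
import numpy as np
import xarray as xr

# a bare ndarray result, e.g. from computations on .values
result = np.random.rand(12, 13, 90, 90)

# wrap it in a DataArray, supplying dimension names, a variable
# name, and attributes ourselves (none survive at the ndarray level)
result_da = xr.DataArray(result,
                         dims=['time', 'tile', 'j', 'i'],
                         name='SSH_sq',
                         attrs={'units': 'm^2'})

result_da.to_netcdf('./ssh_sq_from_ndarray.nc')
```

The cost of working at the ndarray level is that coordinates and attributes must be rebuilt by hand, which is why staying at the DataArray level is usually preferable.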

Summary

Saving Datasets and DataArrays to disk as NetCDF files is fun and easy with xarray!