{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# The Dataset and DataArray objects used in the ECCOv4 Python package\n", "\n", "## Objectives\n", "\n", "To introduce the two high-level data structures, `Dataset` and `DataArray`, that are used by the `ecco_v4_py` Python package to load and store the ECCO v4 model grid parameters and state estimate variables.\n", "\n", "## Introduction\n", "\n", "The ECCO version 4 release 4 (v4r4) files are provided as NetCDF files. This tutorial shows you how to download and open these files using Python code, and takes a look at the structure of these files. The ECCO output is available as a number of **datasets**, each containing a few variables. Each dataset is made up of many files, and each file corresponds to a single time (a monthly mean, a daily mean, or a snapshot). Each of these single-time files is called a **granule**.\n", "\n", "In this first tutorial we will start slowly, providing detail at every step. Later tutorials will assume knowledge of some basic operations introduced here.\n", "\n", "Let's get started.\n", "\n", "## Import external packages and modules\n", "\n", "Before using Python libraries we must import them. Usually this is done at the beginning of every Python program or interactive Jupyter notebook, but a library can be imported at any point in the code. Python libraries, called **packages**, contain subroutines and/or define data structures that provide useful functionality.\n", "\n", "Before we go further, let's import some packages needed for this tutorial:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# NumPy is the fundamental package for scientific computing with Python. 
\n", "# It contains, among other things:\n", "# a powerful N-dimensional array object\n", "# sophisticated (broadcasting) functions\n", "# tools for integrating C/C++ and Fortran code\n", "# useful linear algebra, Fourier transform, and random number capabilities\n", "# http://www.numpy.org/\n", "#\n", "# make all functions from the 'numpy' module available with the prefix 'np'\n", "import numpy as np\n", "\n", "# xarray is an open source project and Python package that aims to bring the \n", "# labeled data power of pandas to the physical sciences, by providing\n", "# N-dimensional variants of the core pandas data structures.\n", "# Our approach adopts the Common Data Model for self-describing scientific \n", "# data in widespread use in the Earth sciences: xarray.Dataset is an in-memory\n", "# representation of a netCDF file.\n", "# http://xarray.pydata.org/en/stable/\n", "#\n", "# make all functions from the 'xarray' module available with the prefix 'xr'\n", "import xarray as xr\n", "\n", "# set to True if you are working in the AWS Cloud (region us-west-2)\n", "incloud_access = False" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load the ECCO Version 4 Python package\n", "\n", "*ecco_v4_py* is a Python package written specifically for working with ECCO NetCDF output.\n", "\n", "See the \"Getting Started\" page in the tutorial for instructions about installing the *ecco_v4_py* module on your machine."
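, "\n", "\n", "If you are not sure whether *ecco_v4_py* and *ecco_access* are installed, the sketch below (standard library only) checks before importing them. It relies on `importlib.util.find_spec`, which returns `None` when a top-level module cannot be found on the current Python path; the helper name `is_importable` is hypothetical, not part of either package.\n", "\n", "```python
import importlib.util


def is_importable(name):
    # Hypothetical helper: True if a top-level module called `name`
    # can be found on sys.path. The module is located, not imported.
    return importlib.util.find_spec(name) is not None


for pkg in ('numpy', 'xarray', 'ecco_v4_py', 'ecco_access'):
    print(pkg, 'found' if is_importable(pkg) else 'NOT found')
```
"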
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from os.path import expanduser,join,isdir\n", "\n", "# home directory, used below to build the download path\n", "user_home_dir = expanduser('~')\n", "\n", "## Import the ecco_v4_py library into Python\n", "## =========================================\n", "## -- If ecco_v4_py is not installed in your local Python library, \n", "## tell Python where to find it using sys.path.insert (or sys.path.append).\n", "## For example, if your ecco_v4_py files are in ~/ECCOv4-py/ecco_v4_py,\n", "## you can use:\n", "# import sys\n", "# ecco_v4_py_dir = join(user_home_dir,'ECCOv4-py')\n", "# if isdir(ecco_v4_py_dir):\n", "#     sys.path.insert(0,ecco_v4_py_dir)\n", "\n", "import ecco_v4_py as ecco\n", "import ecco_access as ea" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The syntax \n", "\n", "```Python\n", " import XYZ as ABC\n", "```\n", "\n", "lets you refer to a package (or module) that may have a long, complicated name, here `XYZ`, by a shorter, easier name, here `ABC`.\n", "\n", "Here, we import `ecco_v4_py` as `ecco` because typing `ecco` is easier than typing `ecco_v4_py` every time. Also, `ecco_v4_py` is made up of multiple Python modules, and by importing just `ecco_v4_py` we can access all of the subroutines in those modules as well. Fancy. Similarly, we import the `ecco_access` package with the shorthand `ea`: `import ecco_access as ea`.\n", "\n", "\n", "## Downloading and opening state estimate NetCDF files (datasets)\n", "\n", "You can access the ECCOv4r4 files through PO.DAAC, either by downloading them to your own machine or by downloading or opening them while working in the Amazon Web Services (AWS) Cloud. The [ecco_access package](https://ecco-access.readthedocs.io) helps with data access using a variety of modes; see the [ecco_access modes](https://ecco-access.readthedocs.io/ECCO_access_modes.html) tutorial for more information about each of these modes. 
If you are not working in the AWS Cloud, `download_ifspace` is a good mode to use, since it will not download files that would take up more than a specified fraction of your available storage (set by the `max_avail_frac` argument in the code below). As shown in the import cell above, directories can be added to your Python path using `sys.path.append` or `sys.path.insert`.\n", "\n", "To open ECCO v4's NetCDF files we will use the *open_mfdataset* command from the Python package [xarray](http://xarray.pydata.org/en/stable/index.html). `xarray` has the *open_dataset* routine, which opens a NetCDF file and represents its contents, including its metadata, as a `Dataset` object. The *open_mfdataset* routine does the same thing, but also concatenates multiple NetCDF files with compatible dimensions and coordinates, a very handy feature!\n", "\n", "Let's download and open the monthly mean temperature/salinity files for 2010." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "created download directory C:\\Users\\adelman\\Downloads\\ECCO_V4r4_PODAAC\\ECCO_L4_TEMP_SALINITY_LLC0090GRID_MONTHLY_V4R4\n", "\n", "Total number of matching granules: 12\n", "DL Progress: 100%|#########################| 12/12 [00:15<00:00, 1.29s/it]\n", "\n", "=====================================\n", "total downloaded: 208.75 Mb\n", "avg download speed: 13.48 Mb/s\n", "Time spent = 15.480218887329102 seconds\n" ] } ], "source": [ "# indicate mode of access\n", "# options are:\n", "# 'download': direct download from internet to your local machine\n", "# 'download_ifspace': like download, but only proceeds \n", "# if your machine has sufficient storage\n", "# 's3_open': access datasets in-cloud from an AWS instance\n", "# 's3_open_fsspec': use jsons generated with fsspec and \n", "# kerchunk libraries to speed up in-cloud access\n", "# 's3_get': direct download from S3 in-cloud to an AWS instance\n", "# 's3_get_ifspace': like s3_get, but only proceeds if your instance \n", "# has sufficient 
storage\n", "\n", "\n", "\n", "## Set top-level directory for the ECCO NetCDF files \n", "## (or json lookup files if using mode = 's3_open_fsspec')\n", "\n", "\n", "if incloud_access:\n", " access_mode = 's3_open_fsspec'\n", " download_root_dir = None\n", " jsons_root_dir = join(user_home_dir,'MZZ')\n", "else:\n", " access_mode = 'download_ifspace'\n", " download_root_dir = join(user_home_dir,'Downloads','ECCO_V4r4_PODAAC')\n", " jsons_root_dir = None\n", "\n", "\n", "\n", "ShortName = \"ECCO_L4_TEMP_SALINITY_LLC0090GRID_MONTHLY_V4R4\"\n", " \n", "# # Method 1: use ecco_podaac_access\n", "# # \n", "# # retrieve files\n", "# files_dict = ea.ecco_podaac_access(ShortName,\\\n", "# StartDate='2010-01',EndDate='2010-12',\\\n", "# mode=access_mode,\\\n", "# download_root_dir=download_root_dir,\\\n", "# jsons_root_dir=jsons_root_dir,\\\n", "# max_avail_frac=0.5)\n", "# # load file into workspace\n", "# ds = xr.open_mfdataset(files_dict[ShortName],parallel=True,\\\n", "# data_vars='minimal',coords='minimal',compat='override')\n", "\n", "# # Method 2: use ecco_podaac_to_xrdataset\n", "\n", "ds = ea.ecco_podaac_to_xrdataset(ShortName,\\\n", " StartDate='2010-01',EndDate='2010-12',\\\n", " mode=access_mode,\\\n", " download_root_dir=download_root_dir,\\\n", " jsons_root_dir=jsons_root_dir,\\\n", " max_avail_frac=0.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is *ds*? 
It is a `Dataset` object, a class defined somewhere deep in the `xarray` package:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "xarray.core.dataset.Dataset" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(ds)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Dataset object \n", "\n", "According to the xarray documentation, a [Dataset](http://xarray.pydata.org/en/stable/generated/xarray.Dataset.html) is a Python object designed as an \"in-memory representation of the data model from the NetCDF file format.\"\n", "\n", "What does that mean? NetCDF files are *self-describing* in the sense that they [include information about the data they contain](https://www.unidata.ucar.edu/software/netcdf/docs/faq.html). When a `Dataset` is created from a NetCDF file, it carries the same data and metadata as the file.\n", "\n", "Just as a NetCDF file can contain many variables, a `Dataset` can contain many variables. These variables are referred to as `Data Variables` in the `xarray` nomenclature.\n", "\n", "`Datasets` contain three main classes of fields:\n", "\n", "1. **Coordinates**: arrays identifying the coordinates of the data variables\n", "2. **Data Variables**: the data variable arrays and their associated coordinates\n", "3. **Attributes**: metadata describing the dataset\n", "\n", "Now that we've loaded the 2010 monthly mean files of potential temperature and salinity as the *ds* `Dataset` object, let's examine its contents. \n", "\n", "> **Note:** *You can get information about objects and their contents by typing the name of the variable and hitting **enter** in an interactive session of an IDE such as Spyder, or by executing a cell of a Jupyter notebook.*\n", "\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.Dataset> Size: 510MB\n",
       "Dimensions:    (i: 90, i_g: 90, j: 90, j_g: 90, k: 50, k_u: 50, k_l: 50,\n",
       "                k_p1: 51, tile: 13, time: 12, nv: 2, nb: 4)\n",
       "Coordinates: (12/22)\n",
       "  * i          (i) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89\n",
       "  * i_g        (i_g) int32 360B 0 1 2 3 4 5 6 7 8 ... 81 82 83 84 85 86 87 88 89\n",
       "  * j          (j) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89\n",
       "  * j_g        (j_g) int32 360B 0 1 2 3 4 5 6 7 8 ... 81 82 83 84 85 86 87 88 89\n",
       "  * k          (k) int32 200B 0 1 2 3 4 5 6 7 8 9 ... 41 42 43 44 45 46 47 48 49\n",
       "  * k_u        (k_u) int32 200B 0 1 2 3 4 5 6 7 8 ... 41 42 43 44 45 46 47 48 49\n",
       "    ...         ...\n",
       "    Zu         (k_u) float32 200B dask.array<chunksize=(50,), meta=np.ndarray>\n",
       "    Zl         (k_l) float32 200B dask.array<chunksize=(50,), meta=np.ndarray>\n",
       "    time_bnds  (time, nv) datetime64[ns] 192B dask.array<chunksize=(1, 2), meta=np.ndarray>\n",
       "    XC_bnds    (tile, j, i, nb) float32 2MB dask.array<chunksize=(13, 90, 90, 4), meta=np.ndarray>\n",
       "    YC_bnds    (tile, j, i, nb) float32 2MB dask.array<chunksize=(13, 90, 90, 4), meta=np.ndarray>\n",
       "    Z_bnds     (k, nv) float32 400B dask.array<chunksize=(50, 2), meta=np.ndarray>\n",
       "Dimensions without coordinates: nv, nb\n",
       "Data variables:\n",
       "    THETA      (time, k, tile, j, i) float32 253MB dask.array<chunksize=(1, 25, 7, 45, 45), meta=np.ndarray>\n",
       "    SALT       (time, k, tile, j, i) float32 253MB dask.array<chunksize=(1, 25, 7, 45, 45), meta=np.ndarray>\n",
       "Attributes: (12/62)\n",
       "    acknowledgement:                 This research was carried out by the Jet...\n",
       "    author:                          Ian Fenty and Ou Wang\n",
       "    cdm_data_type:                   Grid\n",
       "    comment:                         Fields provided on the curvilinear lat-l...\n",
       "    Conventions:                     CF-1.8, ACDD-1.3\n",
       "    coordinates_comment:             Note: the global 'coordinates' attribute...\n",
       "    ...                              ...\n",
       "    time_coverage_duration:          P1M\n",
       "    time_coverage_end:               2010-02-01T00:00:00\n",
       "    time_coverage_resolution:        P1M\n",
       "    time_coverage_start:             2010-01-01T00:00:00\n",
       "    title:                           ECCO Ocean Temperature and Salinity - Mo...\n",
       "    uuid:                            f4291248-4181-11eb-82cd-0cc47a3f446d
" ], "text/plain": [ " Size: 510MB\n", "Dimensions: (i: 90, i_g: 90, j: 90, j_g: 90, k: 50, k_u: 50, k_l: 50,\n", " k_p1: 51, tile: 13, time: 12, nv: 2, nb: 4)\n", "Coordinates: (12/22)\n", " * i (i) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89\n", " * i_g (i_g) int32 360B 0 1 2 3 4 5 6 7 8 ... 81 82 83 84 85 86 87 88 89\n", " * j (j) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89\n", " * j_g (j_g) int32 360B 0 1 2 3 4 5 6 7 8 ... 81 82 83 84 85 86 87 88 89\n", " * k (k) int32 200B 0 1 2 3 4 5 6 7 8 9 ... 41 42 43 44 45 46 47 48 49\n", " * k_u (k_u) int32 200B 0 1 2 3 4 5 6 7 8 ... 41 42 43 44 45 46 47 48 49\n", " ... ...\n", " Zu (k_u) float32 200B dask.array\n", " Zl (k_l) float32 200B dask.array\n", " time_bnds (time, nv) datetime64[ns] 192B dask.array\n", " XC_bnds (tile, j, i, nb) float32 2MB dask.array\n", " YC_bnds (tile, j, i, nb) float32 2MB dask.array\n", " Z_bnds (k, nv) float32 400B dask.array\n", "Dimensions without coordinates: nv, nb\n", "Data variables:\n", " THETA (time, k, tile, j, i) float32 253MB dask.array\n", " SALT (time, k, tile, j, i) float32 253MB dask.array\n", "Attributes: (12/62)\n", " acknowledgement: This research was carried out by the Jet...\n", " author: Ian Fenty and Ou Wang\n", " cdm_data_type: Grid\n", " comment: Fields provided on the curvilinear lat-l...\n", " Conventions: CF-1.8, ACDD-1.3\n", " coordinates_comment: Note: the global 'coordinates' attribute...\n", " ... 
...\n", " time_coverage_duration: P1M\n", " time_coverage_end: 2010-02-01T00:00:00\n", " time_coverage_resolution: P1M\n", " time_coverage_start: 2010-01-01T00:00:00\n", " title: ECCO Ocean Temperature and Salinity - Mo...\n", " uuid: f4291248-4181-11eb-82cd-0cc47a3f446d" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Examining the Dataset object contents\n", "\n", "Let's go through *ds* piece by piece, starting from the top.\n", "\n", "#### 1. Object type\n", "`<xarray.Dataset>`\n", "\n", "The top line tells us what type of object the variable is. *ds* is an instance of a `Dataset` defined in `xarray`.\n", "\n", "#### 2. Dimensions\n", "```Dimensions: (i: 90, i_g: 90, j: 90, j_g: 90, k: 50, k_u: 50, k_l: 50, k_p1: 51, tile: 13, time: 12, nv: 2, nb: 4)```\n", "\n", "The *Dimensions* list shows all of the different dimensions used by all of the different arrays stored in the NetCDF files (and now represented in the `Dataset` object).\n", " \n", "Arrays may use any combination of these dimensions. In this dataset we find 1D (e.g., `Zu`), 2D (e.g., `Z_bnds`), 3D (e.g., `XC`), 4D (e.g., `XC_bnds`), and 5D (e.g., `THETA`) arrays.\n", " \n", "The lengths of these dimensions are shown next to their names. There are 50 vertical levels in the ECCO v4 model grid, and `k` corresponds to the vertical dimension centered in the middle of each grid cell. `k_u`, `k_l`, and `k_p1` are also vertical dimensions, located at the bottom face, the top face, and both faces of each grid cell, respectively. (`k_p1` has length 51, one more than the others, because a stack of 50 grid cells has 51 faces.) `i` and `j` correspond to horizontal dimensions centered in the middle of each grid cell, while `i_g` and `j_g` are located on the `u` and `v` faces of each grid cell, respectively. The lat-lon-cap grid has 13 tiles. This dataset has 12 monthly-mean records for 2010. 
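\n", "\n", "The dimension lengths listed above also account for the `Size` values in the *ds* listing. As a minimal sketch, assuming 4 bytes per `float32` value and decimal units (1 MB = $10^6$ bytes, which appears to be what the listing uses), multiplying the dimension lengths of an array reproduces its reported size: 253MB for `THETA`, 2MB for `XC_bnds`, and 200B for `Zu`. The helper `nbytes_float32` is hypothetical, not part of `xarray` or `ecco_v4_py`.\n", "\n", "```python
import math

# Assumption: every array below is float32, i.e. 4 bytes per element.
BYTES_PER_FLOAT32 = 4


def nbytes_float32(shape):
    # Hypothetical helper: total number of elements times 4 bytes.
    return math.prod(shape) * BYTES_PER_FLOAT32


# Dimension lengths taken from the Dataset listing above.
theta_bytes = nbytes_float32((12, 50, 13, 90, 90))  # THETA (time, k, tile, j, i)
xc_bnds_bytes = nbytes_float32((13, 90, 90, 4))     # XC_bnds (tile, j, i, nb)
zu_bytes = nbytes_float32((50,))                    # Zu (k_u)

print(round(theta_bytes / 1e6), 'MB')    # 253 MB
print(round(xc_bnds_bytes / 1e6), 'MB')  # 2 MB
print(zu_bytes, 'B')                     # 200 B
```
", "\n", "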
The dimension `nv` has length 2 and indexes the two bounds of an interval. For `time_bnds` these are the start and end times of each monthly-mean averaging period: every month has 2 (`nv` = 2) time bounds, one for when the month started and one for when it ended. (`Z_bnds` uses `nv` in the same way for the vertical bounds of each level.) Similarly, the dimension `nb` corresponds to the horizontal corners of each grid cell, with each grid cell having 4 corners whose coordinates are given in `XC_bnds` and `YC_bnds`.\n", "\n", "> **Note:** Each tile in the llc90 grid used by ECCO v4 has 90x90 horizontal grid points. That's where the 90 in llc**90** comes from! \n", "\n", "#### 3. Coordinates\n", "\n", "Some coordinates have an asterisk **\"\\*\"** in front of their names. They are known as *dimension coordinates*: one-dimensional arrays that share their name with a dimension and whose $n$ values label the $n$ positions along that dimension.\n", "\n", "```\n", "Coordinates:\n", " * j (j) int32 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89\n", " * i (i) int32 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89\n", " * k (k) int32 0 1 2 3 4 5 6 7 8 9 ... 40 41 42 43 44 45 46 47 48 49\n", " * tile (tile) int32 0 1 2 3 4 5 6 7 8 9 10 11 12\n", " * time (time) datetime64[ns] 2010-01-16T12:00:00 ... 2010-12-16T12:00:00\n", "``` \n", " \n", "These [coordinates](http://xarray.pydata.org/en/stable/data-structures.html#coordinates) are arrays whose values *label* each grid cell in the arrays. They are used for *label-based indexing* and *alignment*.\n", "\n", "Let's look at the three primary spatial coordinates, `i`, `j`, and `k`. 
" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "grid index in x for variables at tracer and 'v' locations\n", "grid index in y for variables at tracer and 'u' locations\n", "grid index in z for tracer variables\n" ] } ], "source": [ "print(ds.i.long_name)\n", "print(ds.j.long_name)\n", "print(ds.k.long_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`i` indexes (or labels) the tracer grid cells in the `x` direction, `j` indexes the tracer grid cells in the `y` direction, and similarly `k` indexes the tracer grid cells in the `z` direction." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4. Data Variables\n", "```\n", "Data variables:\n", " THETA (time, k, tile, j, i) float32 ...\n", " SALT (time, k, tile, j, i) float32 ...\n", "``` \n", "The *Data Variables* are one or more `xarray.DataArray` objects. `DataArray` objects are labeled, multi-dimensional arrays that may also contain metadata (attributes). `DataArray` objects are very important to understand because they are container objects which store the numerical arrays of the state estimate fields. We'll investigate these objects in more detail after completing our survey of this `Dataset`.\n", "\n", "This `Dataset` has two *Data variables*, `THETA` and `SALT`, the monthly-mean potential temperature and salinity fields. Each is stored as a five-dimensional array with dimensions (**time, k, tile, j, i**). The llc grid has 13 tiles. Each tile has two horizontal dimensions (i, j), and there is one vertical dimension (k).\n", " \n", "`THETA` and `SALT` are stored here as 32-bit floating point (`float32`) values.\n", " \n", "> **Note:** The meaning of all MITgcm grid parameters can be found [here](https://mitgcm.readthedocs.io/en/latest/algorithm/horiz-grid.html).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 5. 
Attributes\n", "```\n", "Attributes:\n", " acknowledgement: This research was carried out by the Jet...\n", " author: Ian Fenty and Ou Wang\n", " cdm_data_type: Grid\n", " comment: Fields provided on the curvilinear lat-l...\n", " Conventions: CF-1.8, ACDD-1.3\n", " coordinates_comment: Note: the global 'coordinates' attribute...\n", " creator_email: ecco-group@mit.edu\n", " creator_institution: NASA Jet Propulsion Laboratory (JPL)\n", " creator_name: ECCO Consortium\n", " creator_type: group\n", " creator_url: https://ecco-group.org\n", "```\n", " \n", "The `attrs` of a `Dataset` is a Python [dictionary object](https://www.python-course.eu/dictionaries.php) containing metadata and any other auxiliary information.\n", " \n", "Metadata is presented as a set of dictionary `key-value` pairs. Here the `keys` are names such as *acknowledgement*, *author*, and *Conventions*, while the `values` are the corresponding text and non-text values. \n", " \n", "To see the metadata `value` associated with the metadata `key` called \"Conventions\", we can print the value as follows:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CF-1.8, ACDD-1.3\n" ] } ], "source": [ "print(ds.attrs['Conventions'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\"CF-1.8\" tells us that ECCO NetCDF output conforms to the [**Climate and Forecast Conventions version 1.8**](http://cfconventions.org/), and \"ACDD-1.3\" that it also follows version 1.3 of the Attribute Convention for Data Discovery. How convenient. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Map of the `Dataset` object\n", "\n", "Now that we've completed our survey, we see that a `Dataset` is really a kind of *container* made up of (actually pointing to) many other objects. 
\n", "\n", "+ dims: A `dict` that maps dimension names (keys) to dimension lengths (values)\n", "+ coords: A `dict` that maps coordinate names (keys such as **k, j, i**) to arrays that label each point along the dimensions (values) \n", "+ One or more *Data Variables* that are pointers to `DataArray` objects \n", "+ attrs: A `dict` that maps attribute names (keys) to the attributes themselves (values).\n", "\n", "![Dataset-diagram](../figures/Dataset-diagram.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The `DataArray` Object\n", "\n", "It is worth looking at the `DataArray` object in more detail because `DataArrays` hold the arrays that contain the ECCO output. Please see the [xarray documentation on the DataArray object](http://xarray.pydata.org/en/stable/data-structures.html#dataarray) for more information.\n", "\n", "`DataArrays` are actually very similar to `Datasets`. They also contain dimensions, coordinates, and attributes. The two main differences between `Datasets` and `DataArrays` are that `DataArrays` have a **name** (a string) and a single array of **values**. The **values** array is a [numpy n-dimensional array](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.array.html), an `ndarray`.\n", "\n", "### Examining the contents of a `DataArray` \n", "\n", "Let's examine the contents of *XC*, one of the coordinate `DataArrays` in *ds*. " ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.DataArray 'XC' (tile: 13, j: 90, i: 90)> Size: 421kB\n",
       "dask.array<open_dataset-XC, shape=(13, 90, 90), dtype=float32, chunksize=(13, 90, 90), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "  * i        (i) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89\n",
       "  * j        (j) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89\n",
       "  * tile     (tile) int32 52B 0 1 2 3 4 5 6 7 8 9 10 11 12\n",
       "    XC       (tile, j, i) float32 421kB dask.array<chunksize=(13, 90, 90), meta=np.ndarray>\n",
       "    YC       (tile, j, i) float32 421kB dask.array<chunksize=(13, 90, 90), meta=np.ndarray>\n",
       "Attributes:\n",
       "    long_name:              longitude of tracer grid cell center\n",
       "    units:                  degrees_east\n",
       "    coordinate:             YC XC\n",
       "    bounds:                 XC_bnds\n",
       "    comment:                nonuniform grid spacing\n",
       "    coverage_content_type:  coordinate\n",
       "    standard_name:          longitude
" ], "text/plain": [ "<xarray.DataArray 'XC' (tile: 13, j: 90, i: 90)> Size: 421kB\n", "dask.array<open_dataset-XC, shape=(13, 90, 90), dtype=float32, chunksize=(13, 90, 90), chunktype=numpy.ndarray>\n", "Coordinates:\n", " * i (i) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89\n", " * j (j) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89\n", " * tile (tile) int32 52B 0 1 2 3 4 5 6 7 8 9 10 11 12\n", " XC (tile, j, i) float32 421kB dask.array<chunksize=(13, 90, 90), meta=np.ndarray>\n", " YC (tile, j, i) float32 421kB dask.array<chunksize=(13, 90, 90), meta=np.ndarray>\n", "Attributes:\n", " long_name: longitude of tracer grid cell center\n", " units: degrees_east\n", " coordinate: YC XC\n", " bounds: XC_bnds\n", " comment: nonuniform grid spacing\n", " coverage_content_type: coordinate\n", " standard_name: longitude" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds.XC" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Examining the `DataArray`\n", "\n", "The layout of `DataArrays` is very similar to that of `Datasets`. Let's examine each part of *ds.XC*, starting from the top.\n", "\n", "#### 1. Object type\n", "`<xarray.DataArray>`\n", "\n", "This is indeed a `DataArray` object from the `xarray` package.\n", "\n", "> **Note:** You can also find the type of an object with the `type` function: `print(type(ds.XC))`" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'xarray.core.dataarray.DataArray'>\n" ] } ], "source": [ "print(type(ds.XC))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2. Object Name\n", "`XC`\n", "\n", "The top line shows the `DataArray` name, `XC`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3. Dimensions\n", "`(tile: 13, j: 90, i: 90)` \n", "\n", "Unlike *THETA*, *XC* does not have time or depth dimensions, which makes sense because the longitudes of the grid cell centers do not vary with time or depth." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4. 
The `numpy` Array\n", "````\n", "array([[[-111.60647 , -111.303 , -110.94285 , ..., 64.791115,\n", " 64.80521 , 64.81917 ],\n", " [-104.8196 , -103.928444, -102.87706 , ..., 64.36745 ,\n", " 64.41012 , 64.4524 ],\n", " [ -98.198784, -96.788055, -95.14185 , ..., 63.936497,\n", " 64.008224, 64.0793 ],\n", " ...,\n", "````\n", "\n", "In `Dataset` objects there are *Data variables*; in `DataArray` objects we find the underlying data **array**. This is either a `numpy` array or, as in the *ds.XC* output above, a lazily loaded `dask.array` that is converted to a `numpy` array when its values are requested. When printing, Python shows only a subset of the entire array. \n", "\n", "> **Note**: `DataArrays` store **only one** array while `Datasets` can store **one or more** `DataArrays`.\n", "\n", "We get the `numpy` array through the `.values` attribute of the `DataArray`." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'numpy.ndarray'>\n" ] } ], "source": [ "print(type(ds.XC.values))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The array that is returned is a numpy n-dimensional array:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "numpy.ndarray" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(ds.XC.values)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because it is a numpy array, you can apply all of the numerical operations provided by the numpy module to it.\n", "\n", "\n", "> **Note:** You may find it useful to learn about the operations that can be performed on numpy arrays. Here is a quickstart guide: \n", "https://docs.scipy.org/doc/numpy-dev/user/quickstart.html\n", "\n", "We'll learn more about how to access the values of this array in a later tutorial. For now it is sufficient to know how to access the arrays!" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 5. 
Coordinates\n", "\n", "The dimension coordinates (with the asterisks) are\n", "```\n", "Coordinates:\n", " * i (i) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89\n", " * j (j) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89\n", " * tile (tile) int32 52B 0 1 2 3 4 5 6 7 8 9 10 11 12\n", "```\n", "\n", "We find three 1D arrays with coordinate labels for **i**, **j**, and **tile**. The entries without asterisks, `XC` and `YC`, are non-dimension coordinates: 3D arrays holding the longitude and latitude of each tracer grid cell center." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Coordinates:\n", " * i (i) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89\n", " * j (j) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89\n", " * tile (tile) int32 52B 0 1 2 3 4 5 6 7 8 9 10 11 12\n", " XC (tile, j, i) float32 421kB dask.array<chunksize=(13, 90, 90), meta=np.ndarray>\n", " YC (tile, j, i) float32 421kB dask.array<chunksize=(13, 90, 90), meta=np.ndarray>" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds.XC.coords" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Two other important coordinates of *ds* are `tile` and `time`:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tile: \n", "[ 0 1 2 3 4 5 6 7 8 9 10 11 12]\n", "time: \n", "['2010-01-16T12:00:00.000000000' '2010-02-15T00:00:00.000000000'\n", " '2010-03-16T12:00:00.000000000' '2010-04-16T00:00:00.000000000'\n", " '2010-05-16T12:00:00.000000000' '2010-06-16T00:00:00.000000000'\n", " '2010-07-16T12:00:00.000000000' '2010-08-16T12:00:00.000000000'\n", " '2010-09-16T00:00:00.000000000' '2010-10-16T12:00:00.000000000'\n", " '2010-11-16T00:00:00.000000000' '2010-12-16T12:00:00.000000000']\n" ] } ], "source": [ "print('tile: ')\n", "print(ds.tile.values)\n", "print('time: ')\n", "print(ds.time.values)" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "The files we loaded contain the monthly-mean potential temperature and salinity fields for each month in 2010. 
Here the time coordinates are the midpoints of the monthly averaging periods; for example, 2010-01-16T12:00 is halfway through the 31 days of January, and 2010-02-15T00:00 is halfway through the 28 days of February." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 6. Attributes\n", "```\n", "Attributes:\n", " long_name: longitude of tracer grid cell center\n", " units: degrees_east\n", " coordinate: YC XC\n", " bounds: XC_bnds\n", " comment: nonuniform grid spacing\n", " coverage_content_type: coordinate\n", " standard_name: longitude\n", "```\n", "\n", "The `XC` variable has a `long_name` (longitude of tracer grid cell center), `units` (degrees_east), and other information. This metadata was loaded from the NetCDF file. The entire attribute dictionary is accessed using `.attrs`, and a single attribute by its key:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'long_name': 'longitude of tracer grid cell center',\n", " 'units': 'degrees_east',\n", " 'coordinate': 'YC XC',\n", " 'bounds': 'XC_bnds',\n", " 'comment': 'nonuniform grid spacing',\n", " 'coverage_content_type': 'coordinate',\n", " 'standard_name': 'longitude'}" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds.XC.attrs" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "'degrees_east'" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds.XC.attrs['units']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Map of the `DataArray` Object\n", "\n", "The `DataArray` can be mapped out with the following diagram:\n", "\n", "![DataArray-diagram](../figures/DataArray-diagram.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "Now you know the basics of the `Dataset` and `DataArray` objects that will store the ECCO v4 model grid parameters and state estimate output variables. Go back and take a look at the *ds* object that we originally loaded. It should make a lot more sense now!"
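, "\n", "\n", "As a compact recap, the sketch below builds a tiny synthetic `Dataset` with `xarray` (made-up values and shrunken dimensions, so nothing needs to be downloaded) whose names mimic those in *ds*, and then reads back its dimensions, coordinates, data variables, and attributes, plus the name, dimensions, and values of one `DataArray`. Its `time` coordinate is computed as the midpoint of the monthly bounds stored in `time_bnds`, which reproduces the first two `time` values printed for *ds*. The names `ds_demo`, `da`, `start`, and `end`, and the `units` value, are illustrative only.\n", "\n", "```python
import numpy as np
import xarray as xr

# Tiny synthetic stand-in for an ECCO monthly-mean Dataset: made-up
# values and shrunken dimensions, names chosen to mimic ds above.
nk, ntile, nj, ni = 3, 13, 4, 4

# Monthly bounds for Jan and Feb 2010; time is the midpoint of each.
start = np.array(['2010-01-01', '2010-02-01'], dtype='datetime64[ns]')
end = np.array(['2010-02-01', '2010-03-01'], dtype='datetime64[ns]')
time = start + (end - start) / 2

theta = np.zeros((time.size, nk, ntile, nj, ni), dtype=np.float32)

ds_demo = xr.Dataset(
    data_vars={'THETA': (('time', 'k', 'tile', 'j', 'i'), theta,
                         {'units': 'degree_C'})},  # illustrative units
    coords={'time': time,
            'time_bnds': (('time', 'nv'), np.stack([start, end], axis=1)),
            'k': np.arange(nk), 'tile': np.arange(ntile),
            'j': np.arange(nj), 'i': np.arange(ni)},
    attrs={'Conventions': 'CF-1.8, ACDD-1.3'},
)

da = ds_demo['THETA']            # a DataArray taken from the Dataset
print(dict(ds_demo.sizes))       # dimension names -> lengths
print(list(ds_demo.coords))      # coordinate names, incl. non-dimension ones
print(list(ds_demo.data_vars))   # data variable names
print(ds_demo.attrs['Conventions'])
print(ds_demo.time.values)       # midpoints: Jan 16 12:00 and Feb 15 00:00
print(da.name, da.dims, type(da.values))
```
"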
] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.3" } }, "nbformat": 4, "nbformat_minor": 4 }