A Pseudoproxy Experiment with GraphEM

Expected time to run through: 10 mins

This tutorial demonstrates how to get a reconstruction using GraphEM, leveraging a simple pseudoproxy dataset generated from a subset of iCESM gridded dataset. The pseudoproxy are generated based on the original iCESM simulated tas plus white noise with SNR=1, using below code block:

# gen pseudoproxy

SNR = 1

pp = np.copy(tas_sub)
nt, nlat, nlon = np.shape(pp)

k = 0
for i in range(nlat):
    for j in range(nlon):
        tas_std = np.std(tas_sub[:,i,j])
        noise_std = tas_std / SNR

        np.random.seed(k)
        pp[:,i,j] += np.random.normal(0, noise_std, size=nt)

        k += 1

Test data preparation

To go through this tutorial, please prepare test data following the steps:

Download the test case named “PPE” with this link. Create a directory named “testcases” in the same directory where this notebook sits. Put the unzipped direcotry “PPE” into “testcases”.

Below, we first load some useful packages, including our GraphEM.

[1]:

%load_ext autoreload
%autoreload 2

import LMRt
import GraphEM
import os
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as  plt

Reconstruction

[2]:

job = GraphEM.ReconJob()

[3]:

job.load_configs('./testcases/PPE/configs.yml', verbose=True)

GraphEM: job.load_configs() >>> loading reconstruction configurations from: ./testcases/PPE/configs.yml
GraphEM: job.load_configs() >>> job.configs created
GraphEM: job.load_configs() >>> job.configs["job_dirpath"] = /Users/fzhu/Github/GraphEM/docsrc/tutorial/testcases/PPE/recon
GraphEM: job.load_configs() >>> /Users/fzhu/Github/GraphEM/docsrc/tutorial/testcases/PPE/recon created
{'anom_period': [1951, 1980],
 'calib_period': [1750, 1849],
 'job_dirpath': '/Users/fzhu/Github/GraphEM/docsrc/tutorial/testcases/PPE/recon',
 'job_id': 'LMRt_quickstart',
 'obs_path': {'tas': './data/obs/iCESM_subset.nc'},
 'obs_varname': {'lat': 'lat', 'lon': 'lon', 'tas': 'tas'},
 'proxydb_path': './data/proxy/pseudoproxy_dataset.pkl',
 'ptype_list': ['coral.d18O'],
 'recon_period': [850, 1849]}

[4]:

job.load_proxydb(verbose=True)

GraphEM: job.load_proxydb() >>> job.configs["proxydb_path"] = /Users/fzhu/Github/GraphEM/docsrc/tutorial/testcases/PPE/data/proxy/pseudoproxy_dataset.pkl
GraphEM: job.load_proxydb() >>> 100 records loaded
GraphEM: job.load_proxydb() >>> job.proxydb created

[5]:

job.filter_proxydb(verbose=True)

GraphEM: job.filter_proxydb() >>> filtering proxy records according to: ['coral.d18O']
GraphEM: job.filter_proxydb() >>> 100 records remaining

[6]:

job.load_obs(verbose=True)

GraphEM: job.load_obs() >>> loading instrumental observation fields from: {'tas': '/Users/fzhu/Github/GraphEM/docsrc/tutorial/testcases/PPE/data/obs/iCESM_subset.nc'}
Time axis not overlap with the reference period [1951, 1980]; use its own time period as reference [850.00, 1849.00].
GraphEM: job.load_obs() >>> job.obs created

[7]:

print(job.obs)

Dataset Overview
-----------------------

     Name:  tas
   Source:  /Users/fzhu/Github/GraphEM/docsrc/tutorial/testcases/PPE/data/obs/iCESM_subset.nc
    Shape:  time:1000, lat:10, lon:10

Since the loaded iCESM simulation and the pseudoproxy dataset are already annualized, we can skip the .seasonalize() steps and run .prep_data() directly.

[8]:

job.prep_data(verbose=True)

GraphEM: job.prep_data() >>> job.recon_time created
GraphEM: job.prep_data() >>> job.calib_time created
GraphEM: job.prep_data() >>> job.calib_idx created
GraphEM: job.prep_data() >>> job.temp created
GraphEM: job.prep_data() >>> job.df_proxy created
GraphEM: job.prep_data() >>> job.proxy created
GraphEM: job.prep_data() >>> job.lonlat created

[9]:

job.df_proxy

[9]:

	pp_000	pp_001	pp_002	pp_003	pp_004	pp_005	pp_006	pp_007	pp_008	pp_009	...	pp_090	pp_091	pp_092	pp_093	pp_094	pp_095	pp_096	pp_097	pp_098	pp_099
850.0	300.751783	300.699385	299.718651	300.717691	299.793416	299.939352	299.472350	300.551011	299.553801	299.413664	...	299.249156	299.027787	299.182322	299.526141	299.161496	298.628304	298.802706	298.107757	298.818966	298.544758
851.0	300.496749	300.009875	300.193606	300.365040	300.347096	299.853183	300.380611	299.650771	300.502845	299.614965	...	299.103565	298.778987	299.141563	299.335989	299.410467	298.670001	298.838717	298.689713	299.370283	299.494514
852.0	300.902166	300.334244	299.602851	300.671260	300.102458	301.886292	300.641600	300.462630	299.210613	299.595454	...	299.289380	299.249218	299.134434	299.942948	298.988984	299.277950	298.905532	298.840735	299.414093	299.140887
853.0	301.729269	300.347924	301.572955	299.850196	301.104819	300.567805	300.145062	300.817268	299.709402	300.448917	...	299.573146	299.649931	298.838391	299.710779	300.040319	299.609617	299.473822	299.656169	299.019936	299.719184
854.0	301.378359	300.997108	299.769165	300.429641	300.280460	300.495473	298.998074	299.858000	298.895901	299.967372	...	300.078972	300.040279	299.551374	300.088576	298.790187	299.790875	299.441056	299.184892	299.035147	299.345002
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
1845.0	300.370235	300.113197	300.144971	299.726555	299.921271	300.382517	299.913085	299.789860	299.300107	299.261318	...	299.482236	299.063248	299.152310	298.784250	298.994037	298.300974	299.091353	298.409069	298.584221	298.305443
1846.0	299.985712	298.994364	300.207191	299.883894	300.684450	300.181971	300.745885	299.919197	298.775608	299.324020	...	299.544706	299.056843	299.350263	299.165946	299.459059	299.227588	299.156746	299.367724	299.111197	298.400678
1847.0	300.292753	300.177510	300.033955	300.236235	299.559872	299.877648	299.141349	299.497879	299.952182	299.577197	...	299.724081	300.424495	299.179294	299.375003	299.475237	299.947260	300.159204	299.223168	299.384652	299.483399
1848.0	299.823016	300.400975	300.223044	300.913630	300.261049	300.159285	299.311807	299.449942	299.660750	298.979497	...	299.829125	299.592597	300.013318	299.155958	299.107011	299.086289	299.878042	299.307766	299.593688	298.706442
1849.0	300.161352	300.159181	300.246488	299.820504	299.517483	299.951131	299.332212	299.316922	299.630661	299.881939	...	299.587589	298.878817	298.997188	298.895755	299.134630	299.284257	299.369434	299.400078	299.026855	299.096824

1000 rows × 100 columns

[10]:

print(np.shape(job.temp))
print(np.shape(job.proxy))
print(np.shape(job.lonlat))

(1000, 100)
(1000, 100)
(200, 2)

[11]:

job.save(verbose=True)

LMRt: job.save_job() >>> Prepration data saved to: /Users/fzhu/Github/GraphEM/docsrc/tutorial/testcases/PPE/recon/job.pkl
LMRt: job.save_job() >>> job.configs["prep_savepath"] = /Users/fzhu/Github/GraphEM/docsrc/tutorial/testcases/PPE/recon/job.pkl

[12]:

%%time

save_path = './testcases/PPE/recon/G.pkl'
job.run_solver(save_path=save_path, verbose=True)

Estimating graph using neighborhood method
Running GraphEM:

Iter     dXmis     rdXmis

001     0.0514     3.1516
002     0.4586     8.5151
003     0.0562     0.1136
004     0.0292     0.0566
005     0.0149     0.0287
006     0.0123     0.0235
007     0.0107     0.0203
008     0.0092     0.0174
009     0.0081     0.0152
010     0.0073     0.0137
011     0.0067     0.0126
012     0.0063     0.0117
013     0.0059     0.0110
014     0.0055     0.0103
015     0.0053     0.0098
016     0.0050     0.0094
017     0.0048     0.0090
018     0.0047     0.0087
019     0.0045     0.0084
020     0.0044     0.0082
021     0.0043     0.0079
022     0.0041     0.0077
023     0.0040     0.0075
024     0.0039     0.0072
025     0.0038     0.0070
026     0.0036     0.0068
027     0.0035     0.0065
028     0.0034     0.0063
029     0.0033     0.0061
030     0.0032     0.0059
031     0.0031     0.0057
032     0.0030     0.0055
033     0.0029     0.0053
034     0.0028     0.0051
035     0.0027     0.0050
GraphEM: job.run_solver() >>> job.G created and saved to: ./testcases/PPE/recon/G.pkl
GraphEM: job.run_solver() >>> job.recon created
CPU times: user 18.2 s, sys: 3.42 s, total: 21.6 s
Wall time: 3.45 s

[13]:

job.save_recon('./testcases/PPE/recon/recon.nc', verbose=True)

LMRt: job.save_recon() >>> Reconstruction saved to: ./testcases/PPE/recon/recon.nc

Validation

[14]:

with xr.open_dataset('./testcases/PPE/recon/recon.nc') as ds:
    print(ds)

<xarray.Dataset>
Dimensions:  (lat: 10, lon: 10, year: 1000)
Coordinates:
  * year     (year) int64 850 851 852 853 854 855 ... 1845 1846 1847 1848 1849
  * lat      (lat) float32 -4.737 -2.842 -0.9474 0.9474 ... 8.526 10.42 12.32
  * lon      (lon) float32 162.5 165.0 167.5 170.0 ... 177.5 180.0 182.5 185.0
Data variables:
    recon    (year, lat, lon) float64 ...

[15]:

target = job.obs.fields['tas'].value

Mean Statistics

[76]:

nt = np.size(ds['year'])
temp_r = job.recon.reshape((nt, -1))
V = GraphEM.solver.verif_stats(temp_r,target.reshape((nt, -1)), job.calib_idx)
print(V)

Mean MSE = 0.0206, Mean RE = 0.9000, Mean CE = 0.8999, Mean R2 = 0.9022

Map of CE

[16]:

ce = LMRt.utils.coefficient_efficiency(target, ds['recon'])
print(np.shape(ce))

(10, 10)

[73]:

import seaborn as sns
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter

fig = plt.figure(figsize=[5, 5])
ax = plt.subplot(projection=ccrs.PlateCarree(central_longitude=180))
ax.set_title('Coefficient of Efficiency')
latlon_range = [50, 300, -45, 45]
transform=ccrs.PlateCarree()

ax.set_extent(latlon_range, crs=transform)
lon_formatter = LongitudeFormatter(zero_direction_label=False)
lat_formatter = LatitudeFormatter()
ax.xaxis.set_major_formatter(lon_formatter)
ax.yaxis.set_major_formatter(lat_formatter)

lon_ticks=[60, 120, 180, 240, 300]
lat_ticks=[-90, -45, 0, 45, 90]
lon_ticks = np.array(lon_ticks)
lat_ticks = np.array(lat_ticks)
lon_min, lon_max, lat_min, lat_max = latlon_range
mask_lon = (lon_ticks >= lon_min) & (lon_ticks <= lon_max)
mask_lat = (lat_ticks >= lat_min) & (lat_ticks <= lat_max)
ax.set_xticks(lon_ticks[mask_lon], crs=ccrs.PlateCarree())
ax.set_yticks(lat_ticks[mask_lat], crs=ccrs.PlateCarree())

levels = np.linspace(0, 1, 21)
cbar_labels = np.linspace(0, 1, 11)
cbar_title = 'CE'
extend = 'both'
cmap = 'Reds'
cbar_pad=0.1
cbar_orientation='horizontal'
cbar_aspect=10
cbar_fraction=0.35
cbar_shrink=0.8
font_scale=1.5
land_color=sns.xkcd_rgb['light grey']
ocean_color=sns.xkcd_rgb['white']

ax.add_feature(cfeature.LAND, facecolor=land_color, edgecolor=land_color)
ax.add_feature(cfeature.OCEAN, facecolor=ocean_color, edgecolor=ocean_color)
ax.coastlines()
im = ax.contourf(ds['lon'].values, ds['lat'].values, ce, levels, transform=transform, cmap=cmap, extend=extend)
cbar = fig.colorbar(
    im, ax=ax, orientation=cbar_orientation, pad=cbar_pad, aspect=cbar_aspect,
    fraction=cbar_fraction, shrink=cbar_shrink)
cbar.set_ticks(cbar_labels)
cbar.ax.set_title(cbar_title, x=-0.05, y=0.1)

LMRt.showfig(fig)

../_images/tutorial_quickstart_ppe_25_0.png

Mean timeseries

[79]:

import pyleoclim as pyleo

[81]:

def geo_mean(field, lat):
    m = np.average(
        np.average(field, axis=-1), axis=-1, weights=np.cos(np.deg2rad(lat))
    )
    return m

[87]:

m_target = geo_mean(job.obs.fields['tas'].value, job.obs.fields['tas'].lat)
ts_target = pyleo.Series(time=job.obs.fields['tas'].time, value=m_target)

m_recon = geo_mean(ds['recon'].values, ds['lat'].values)
ts_recon = pyleo.Series(time=ds['year'].values, value=m_recon)

fig, ax = ts_target.plot(mute=True, label='target')
ts_recon.plot(ax=ax, label='recon')
ax.plot(ds['year'].values, m_target-m_recon, label='diff')
ax.legend()
pyleo.showfig(fig)

../_images/tutorial_quickstart_ppe_29_0.png

[ ]: