SIF Observational Record

1.1.1 Objective 1: SIF Observational Record

Our main priority in delivering a harmonized SIF observational record is to ensure consistency between satellite sensors and retrieval algorithms. The proposal devotes significant effort to evaluating and reducing retrieval biases. A harmonized SIF record also requires standards across sensors for correction factors (overpass time, orbital drift, detector degradation, and wavelength dependence) to facilitate satellite intercomparison and the evaluation of trends and anomalies between satellite records. Retrieval algorithm consistency and standards are addressed in Sec 1.2.2.1 (Obj 1A). We also seek to gap-fill and integrate calibrated satellite datasets into a single, global, spatially continuous SIF record. This requires data fusion of SIF with ancillary vegetation and environmental datasets (Sec 1.2.2.2, Obj 1B). Finally, we will produce a SIF product targeted at the specific locations and footprints of tower and airborne measurements to facilitate SIF cal/val and to provide input to external fusion products such as FLUXCOM (www.fluxcom.org) under guidance by Collaborator Jung (Sec 1.2.2.3, Obj 1C). These analyses will produce 3 unique SIF ESDRs: Orbital SIF (Obj 1A), Global SIF (Obj 1B), and Network SIF (Obj 1C).

1.1.1.1 Orbital SIF: Objective 1A

We will evaluate retrieval algorithm biases by comparing the JOINER and KÖHLER retrieval methods designed for coarse spectral resolution (GOME-1/2, SCIAMACHY) and validating them against the FRANKENBERG method designed for fine spectral resolution (GOSAT, OCO-2), which is less prone to biases. We will provide a detailed examination of the relative GOME-2 biases from the two algorithms, including the most likely underlying causes (e.g., differences in spectral fitting windows and other algorithm differences). We will examine the implementation of statistical and empirical approaches to harmonize these data sets. This analysis will use known reference targets with zero fluorescence (deserts, ice, certain ocean areas) over a wide range of total intensity (to quantify potential detector non-linearities or stray light) and solar zenith angle (to exclude potential angle-dependent effects such as rotational Raman scattering).

No single current retrieval algorithm works without corrections in post-processing, mostly using barren reference targets or time-dependent solar spectra. Differences among these correction methods (e.g., choice of targets, temporal domain) will thus be a crucial part of the inter-comparison. Again, OCO-2 will be key, as it allows us to perform calibrations on a daily basis owing to the vast number of daily observations (1M) and the fact that most data (even cloudy) can be used for bias correction. However, due to the short OCO-2 record, we will use a staggered calibration strategy to anchor calibration of the full satellite time series. Leveraging the short overlap with GOME-2 from 2014-present, we will use OCO-2 and FRANKENBERG retrievals to evaluate and calibrate JOINER and KÖHLER retrievals for GOME-2, use calibrated GOME-2 retrievals to calibrate SCIAMACHY during the overlap period of 2007-12, and GOME during the period 1996-2011.
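The staggered strategy can be illustrated with a minimal sketch on synthetic data. It assumes, purely for illustration, that inter-sensor differences are captured by a linear gain and offset fitted over collocated samples in each overlap period; the function names, the bias values, and the noise levels are hypothetical and are not taken from any of the retrieval products.

```python
import numpy as np

def fit_gain_offset(target, reference):
    """Least-squares gain and offset that map target values onto the reference."""
    gain, offset = np.polyfit(target, reference, deg=1)
    return gain, offset

def apply_calibration(values, calibration):
    """Apply a (gain, offset) pair to a record."""
    gain, offset = calibration
    return gain * values + offset

rng = np.random.default_rng(0)

# Overlap A: synthetic collocated OCO-2 (reference) and GOME-2 samples.
truth_a = rng.uniform(0.0, 2.0, size=300)
oco2_a = truth_a + rng.normal(0.0, 0.02, size=300)
gome2_a = 1.3 * truth_a + 0.2 + rng.normal(0.0, 0.05, size=300)

# Overlap B: synthetic collocated GOME-2 and SCIAMACHY samples.
truth_b = rng.uniform(0.0, 2.0, size=300)
gome2_b = 1.3 * truth_b + 0.2 + rng.normal(0.0, 0.05, size=300)
scia_b = 0.8 * truth_b - 0.1 + rng.normal(0.0, 0.05, size=300)

# Step 1: anchor GOME-2 to the OCO-2 reference.
cal_gome2 = fit_gain_offset(gome2_a, oco2_a)

# Step 2: anchor SCIAMACHY to the already calibrated GOME-2 record.
cal_scia = fit_gain_offset(scia_b, apply_calibration(gome2_b, cal_gome2))
scia_calibrated = apply_calibration(scia_b, cal_scia)
```

Each link in the chain inherits the uncertainty of the previous one, which is one reason the choice of reference targets and overlap periods matters for the older records.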

Retrieval Methods for High Spectral Resolution (< 0.2 nm): FRANKENBERG

The first SIF retrievals from space were performed independently by Joiner (2011) and Frankenberg (2011b) using data from GOSAT, which has high spectral resolution (<0.05nm). This retrieval technique uses a very small (few nm) microwindow centered around 757 or 771nm, covering Fraunhofer lines with negligible overlapping atmospheric absorption. The use of microwindows for disentangling SIF signals from the background is very robust and insensitive to atmospheric scattering (Frankenberg 2012). For OCO-2, we have validated uncertainty estimates, which compare very well with observed scatter within the dataset, and provided the first reliable and successful SIF validation against CFIS (Sun 2017). Thus, we adopt OCO-2 SIF as our reference standard. However, OCO-2 SIF is bias-corrected in postprocessing to account for time-dependent buildup of an ice layer on the detector (inducing straylight) and signal-level dependent bias (constant in time). To evaluate impacts, we will try different correction methods by varying reference targets in semi-arid, desert, and low cloud regions.
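A signal-level dependent bias correction over zero-fluorescence targets might look like the following sketch. It assumes the bias can be estimated as the mean retrieved SIF over barren reference soundings binned by continuum radiance, then interpolated and subtracted; the binning scheme, function names, and synthetic numbers are illustrative assumptions rather than the operational OCO-2 procedure.

```python
import numpy as np

def fit_signal_level_bias(radiance_ref, sif_ref, n_bins=10):
    """Mean retrieved SIF over zero-SIF targets, binned by continuum radiance."""
    edges = np.linspace(radiance_ref.min(), radiance_ref.max(), n_bins + 1)
    which = np.clip(np.digitize(radiance_ref, edges) - 1, 0, n_bins - 1)
    filled = [b for b in range(n_bins) if np.any(which == b)]
    centers = (0.5 * (edges[:-1] + edges[1:]))[filled]
    bias = np.array([sif_ref[which == b].mean() for b in filled])
    return centers, bias

def remove_signal_level_bias(radiance, sif, centers, bias):
    """Subtract the interpolated reference-target bias from retrieved SIF."""
    return sif - np.interp(radiance, centers, bias)

# Synthetic barren soundings: true SIF is zero, retrieved SIF carries a
# radiance-dependent offset plus random noise (all values hypothetical).
rng = np.random.default_rng(1)
radiance_ref = rng.uniform(20.0, 200.0, size=2000)
sif_ref = 0.002 * radiance_ref - 0.1 + rng.normal(0.0, 0.05, size=2000)

centers, bias = fit_signal_level_bias(radiance_ref, sif_ref)
sif_corrected = remove_signal_level_bias(radiance_ref, sif_ref, centers, bias)
```

Varying the set of reference soundings passed to `fit_signal_level_bias` (for example semi-arid versus desert targets) gives a simple way to test how sensitive the corrected record is to the choice of targets.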

Retrieval Methods for Coarse Spectral Resolution (~ 0.5 nm): JOINER and KÖHLER

To improve on GOSAT's coarse spatial coverage, methods were developed for retrieval of SIF from lower spectral resolution spectrometers like SCIAMACHY and GOME-1/2. For these sensors, wider spectral windows are needed, which makes the retrievals more susceptible to confounding effects from the atmosphere and surface. Cross-calibration with OCO-2 SIF observations is required to improve the consistency of the JOINER and KÖHLER data records:

JOINER: Joiner (2013) proposed an algorithm to enable SIF retrievals with moderate spectral resolution by using filling-in of Fraunhofer lines across wider spectral fitting windows than GOSAT & OCO-2. The method has since been refined and expanded to red and far-red chlorophyll emission features (Joiner 2016) and other implementations have been constructed with different technical details (KÖHLER). The retrieval relies on a data-driven statistics-based approach to separate SIF emissions from spectral features related to atmospheric absorption, scattering, and surface reflectance. Our current version (v26) differs from KÖHLER in using (1) a narrower fitting window limited by the amount of water vapor absorption; (2) a 4th order polynomial to describe surface reflectance, and (3) a fixed number of principal components (PCs) to describe atmospheric absorption and other instrumental artifacts. It adjusts for known biases (e.g., stray light and dark current) identified by Köhler (2015b) using data over oceans where SIF is negligible.

Data records of more than ten years from the GOME-2A (GOME-2 on MetOp-A) and four years from GOME-2B have been created and disseminated to the public via the Aura Data Validation Center (avdc.gsfc.nasa.gov). These data sets have been used in a number of publications (e.g., Guanter 2014; Yang 2015; Guan 2015, 2016; Yoshida 2015; Sun 2015; Jeong 2016; Ma 2016; Zhang 2016; Chang 2016; Alden 2016; Wang 2016; Wagle 2016; Berkelhammer 2017; Ichii 2017; Luus 2017). These include studies on the effects of drought, water and carbon use efficiency, and phenology; GOME-2 SIF has also been used to evaluate models.

While the algorithm accounts for some types of instrumental artifacts, no matter which version, it is sensitive to the absolute calibration of the sensor. To alleviate this, one may use a fixed solar spectrum. However, this will not compensate for all of the issues, because (1) the solar spectrum varies with time over both short- and long-term solar cycles, and (2) instrument spectral response functions may change over time. An alternative is to use a measured solar spectrum (as is done in GOME-2 v26); however, this will be sensitive to instrument degradation due to solar degradation that is a known issue for GOME-2 instruments. We have attempted to account for this degradation in v26. However, the GOME-2A and GOME-2B SIF magnitudes do not perfectly align during their overlap period. We will attempt to reconcile this mismatch for the first time.

KÖHLER: Köhler (2015b) developed a variant of JOINER where a 3rd-order polynomial in wavelength is combined with atmospheric PCs and a reference SIF emission spectrum to model low- and high-frequency components of the TOA radiance spectrum (720–758 nm) in a linear way. Consequently, the linear forward model permits a backward elimination algorithm that selects the required model parameters automatically with respect to goodness of fit balanced by model complexity. This approach avoids the arbitrary selection of a fixed number of model parameters used in similar retrieval algorithms proposed by Guanter (2013) and Joiner (2013, 2014). The backward elimination algorithm ensures stable results, regardless of how many atmospheric PCs are initially provided to the retrieval. Results suggest that (i) far fewer PCs are needed than in Joiner (2013, 2014), and (ii) noise is reduced by selecting a subset of the initial model parameters (overfitting is avoided), which positively affects retrieval accuracy and precision. The retrieval output includes a value for the second of the two characteristic peaks of the ChlF emission spectrum, at 740 nm. Results show very good agreement between SCIAMACHY and GOME-2 and only moderate cloud contamination effects. SIF values successively decrease as the cloud filter threshold is relaxed, but seasonality is maintained, in agreement with Frankenberg (2012), Guanter (2015), and Köhler (2015a). Comparison with GOME-2 SIF provided by Joiner (2014) reveals a substantial difference in absolute values. These differences will undergo detailed examination in this proposal.
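The idea of backward elimination on a linear forward model can be sketched as follows. This is a generic illustration, not the KÖHLER code: it assumes the Bayesian information criterion (BIC) as the measure of goodness of fit balanced by model complexity, and the random design-matrix columns stand in for the polynomial terms, atmospheric PCs, and SIF reference spectrum. The `protected` argument is a hypothetical convenience for columns (such as the SIF term) that should never be dropped.

```python
import numpy as np

def bic(y, X):
    """BIC of an ordinary least-squares fit of y on the columns of X."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + k * np.log(n)

def backward_elimination(y, X, protected=()):
    """Drop one column at a time for as long as the BIC keeps decreasing."""
    keep = list(range(X.shape[1]))
    best = bic(y, X[:, keep])
    while len(keep) > 1:
        candidates = [j for j in keep if j not in protected]
        if not candidates:
            break
        # BIC of the model obtained by removing each candidate column.
        scores = {j: bic(y, X[:, [c for c in keep if c != j]]) for j in candidates}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break  # no removal improves the criterion
        best = scores[j_best]
        keep.remove(j_best)
    return keep

# Synthetic example: only the first three of eight columns carry signal.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0.0, 0.1, size=200)
kept = backward_elimination(y, X)
```

Because the stopping rule depends on the data rather than on a preset number of terms, the selected subset does not change much when extra uninformative columns are supplied, which mirrors the stability property described above.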

Correction Factors for Wavelength Scaling and Daily Integration

Using a modified leaf-level gas-exchange fluorescence instrument, Magney (2017) quantified the variation in ChlF spectra across a wide range of species and conditions. Singular value decomposition (SVD) analysis indicates that >85% of the variation in ChlF spectra across 30 species and over 200 individuals can be explained by the mean spectral shape in principal component 1. This is significant, as it suggests that the shape of the ChlF emission curve is largely consistent across species, allowing us to apply the mean spectral shape from PC1 as a wavelength correction for tower, airborne, and satellite measurements where full-spectrum retrievals are not possible.
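A minimal sketch of this kind of SVD analysis is given below, using synthetic spectra rather than the Magney (2017) measurements. The two-Gaussian emission shape, the amplitude range, and the noise level are assumptions made only so the example runs; the decomposition is applied to uncentered spectra so that the first singular vector resembles the mean spectral shape.

```python
import numpy as np

def pc1_variance_fraction(spectra):
    """Fraction of total (uncentered) variance captured by the first singular vector."""
    _, s, vt = np.linalg.svd(spectra, full_matrices=False)
    fraction = s[0] ** 2 / np.sum(s ** 2)
    shape = vt[0] * np.sign(vt[0].sum())  # resolve the arbitrary sign of the vector
    return fraction, shape / shape.max()

# Synthetic ChlF-like spectra: one shared shape with red and far-red peaks,
# scaled by a per-sample amplitude, plus measurement noise (all hypothetical).
rng = np.random.default_rng(3)
wavelengths = np.linspace(650.0, 850.0, 201)
mean_shape = (0.6 * np.exp(-0.5 * ((wavelengths - 685.0) / 10.0) ** 2)
              + 1.0 * np.exp(-0.5 * ((wavelengths - 740.0) / 25.0) ** 2))
amplitudes = rng.uniform(0.5, 2.0, size=200)
spectra = amplitudes[:, None] * mean_shape + rng.normal(0.0, 0.02, size=(200, 201))

fraction, pc1_shape = pc1_variance_fraction(spectra)

# Scaling factor to convert SIF retrieved at 757 nm to 740 nm using the PC1 shape.
scale_757_to_740 = np.interp(740.0, wavelengths, pc1_shape) / np.interp(757.0, wavelengths, pc1_shape)
```

The ratio of the normalized PC1 shape at two wavelengths is the kind of factor that would be used to bring retrievals made in different spectral windows onto a common reference wavelength.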

Following Frankenberg et al. (2011b), we convert measured instantaneous SIF to daily integrals, taking into account variations in local overpass time, length of day, and solar zenith angle (SZA). Under cloud-free conditions and ignoring Rayleigh scattering and gas absorption, the downwelling solar radiation scales linearly with cos(SZA). The daily correction is applied to individual soundings and sensors, computing the integral numerically in 10-minute time steps (using PyEphem, http://rhodesmill.org/pyephem/, to compute SZA as a function of latitude, longitude, and time). Correcting for wavelength and daily integration to facilitate inter-sensor comparisons yields significantly improved agreement in the magnitude of variability across sensors and retrieval algorithms compared to measurements at overpass time and retrieved wavelength. Unresolved differences in trend, inter-annual variability, and amplitude will be addressed by retrieval algorithm evaluations.
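The daily correction can be sketched as the ratio of the 24-hour mean of cos(SZA) to cos(SZA) at overpass time, integrated in 10-minute steps. To keep the example self-contained, the sketch below replaces PyEphem with a textbook solar declination approximation and works in local solar time, so longitude and the equation of time are ignored; both simplifications, the function names, and the normalization to a daily mean are assumptions of this illustration.

```python
import math

def cos_sza(lat_deg, day_of_year, solar_hour):
    """Cosine of the solar zenith angle, set to zero when the sun is below the horizon."""
    # Approximate solar declination (radians) for the given day of year.
    decl = math.radians(-23.44) * math.cos(2.0 * math.pi * (day_of_year + 10) / 365.0)
    lat = math.radians(lat_deg)
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))
    mu = (math.sin(lat) * math.sin(decl)
          + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    return max(mu, 0.0)

def daily_correction_factor(lat_deg, day_of_year, overpass_hour, step_minutes=10):
    """Ratio of the 24 h mean of cos(SZA) to cos(SZA) at overpass time."""
    n_steps = int(24 * 60 / step_minutes)
    # Midpoint of each time step, in hours of local solar time.
    hours = [(i + 0.5) * step_minutes / 60.0 for i in range(n_steps)]
    daily_mean = sum(cos_sza(lat_deg, day_of_year, h) for h in hours) / n_steps
    mu_overpass = cos_sza(lat_deg, day_of_year, overpass_hour)
    if mu_overpass <= 0.0:
        raise ValueError("sun is below the horizon at overpass time")
    return daily_mean / mu_overpass

# Example: the same equatorial sounding near equinox seen at noon and at 13:30.
factor_noon = daily_correction_factor(0.0, 80, 12.0)
factor_1330 = daily_correction_factor(0.0, 80, 13.5)

# Daily-mean SIF estimate from a hypothetical instantaneous retrieval.
sif_daily = 1.2 * factor_1330
```

The factor is larger for overpass times further from local noon, which is why sensors with different equator crossing times need this correction before their magnitudes are compared.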

Aggregation and Ancillary Data

We will report collocated MODIS vegetation (EVI, fPAR) and MERRA-2 climate (Ta, VPD, diffuse/direct PAR) ancillary data with calibrated SIF records. The ancillary data will be key to the scientific community, as well as to Obj 1B and 1C in this proposal, for validating SIF–GPP linear relationships. We will also deliver maps of SIF and ancillary data for each satellite sensor, aggregated monthly and to spatial resolutions of 1°×1° for GOME, 1.5°×1.5° for SCIAMACHY, 0.5°×0.5° for GOME-2, 2°×2° for GOSAT, and 1°×1° for OCO-2, corresponding to typical resolutions for global satellite composites (Guanter 2015).
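The spatial aggregation step can be illustrated with a simple gridding sketch that averages all soundings falling in each latitude-longitude cell. It assumes unweighted cell means on a regular grid and that the soundings passed in have already been selected for one month and quality-filtered; the function name and the three example soundings are hypothetical.

```python
import numpy as np

def grid_mean(lat, lon, values, res_deg):
    """Unweighted mean of soundings in each res_deg x res_deg cell (NaN where empty)."""
    lat, lon, values = (np.asarray(a, dtype=float) for a in (lat, lon, values))
    n_lat = int(round(180.0 / res_deg))
    n_lon = int(round(360.0 / res_deg))
    # Cell indices, with the poles and the dateline clipped into the last cell.
    i = np.clip(((lat + 90.0) / res_deg).astype(int), 0, n_lat - 1)
    j = np.clip(((lon + 180.0) / res_deg).astype(int), 0, n_lon - 1)
    total = np.zeros((n_lat, n_lon))
    count = np.zeros((n_lat, n_lon))
    np.add.at(total, (i, j), values)
    np.add.at(count, (i, j), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = np.where(count > 0, total / count, np.nan)
    return mean, count

# Three soundings gridded at 1 degree: the first two share a cell.
mean_map, count_map = grid_mean(
    lat=[10.2, 10.7, -45.0],
    lon=[20.1, 20.9, 100.0],
    values=[1.0, 3.0, 5.0],
    res_deg=1.0,
)
```

Returning the per-cell count alongside the mean lets users screen sparsely sampled cells, and the same function applies to the ancillary variables so that all maps share one grid.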

1.1.1.2 Global SIF: Objective 1B

We seek an end-to-end SIF product with global, spatially continuous, high-resolution, biweekly coverage to be provided to the scientific community, tailored specifically for benchmarking models and as input to data fusion products. GOME-2 provides near-daily global coverage, but at coarse resolution (40×80 km prior to 2012, 40×40 km after 2012) and with significant spatial gaps that are alleviated by aggregating into 2-week windows.

We will adopt a multivariate geostatistical data fusion approach (Wackernagel 2013), which has been adapted by Co-I Vineet Yadav at JPL for use with SIF, to merge Orbital SIF records with MODIS vegetation and MERRA environmental datasets to produce a long-term, gap-filled, high-resolution (5 km), and temporally resolved (16-day) global SIF ESDR. This approach is based on modeling all covariances (e.g., Genton and Kleiber 2015) between all possible combinations of two or more variables at any sets of locations in space and time. Thus, we will model covariance between SIF, fPAR, leaf area index (LAI), and vapor pressure deficit (VPD) to estimate SIF at unknown locations in space and time. These variables are chosen because they correlate closely with spatio-temporal variations in SIF. Separate covariance models will be built for each biome and time period. A conditional approach (Mangion and Cressie 2016) will be used to construct these covariance models, whose validity conditions are easy to check.
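Once a joint covariance model is available, the prediction step reduces to conditioning on the observed values. The sketch below shows only that step, under the assumption of a jointly Gaussian model with known means and covariances; it does not build the biome-specific space-time covariance models described above, and the two-variable numbers (SIF and fPAR at a single location) are invented for illustration.

```python
import numpy as np

def conditional_mean(mu, cov, observed_idx, observed_values):
    """Mean of the unobserved entries of a Gaussian vector given the observed ones."""
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    idx_o = np.asarray(observed_idx)
    idx_u = np.setdiff1d(np.arange(len(mu)), idx_o)
    cov_oo = cov[np.ix_(idx_o, idx_o)]
    cov_uo = cov[np.ix_(idx_u, idx_o)]
    # Weights = cov_uo @ inv(cov_oo), computed with a linear solve.
    weights = np.linalg.solve(cov_oo, cov_uo.T).T
    return mu[idx_u] + weights @ (np.asarray(observed_values, dtype=float) - mu[idx_o])

# Vector ordering: [SIF, fPAR]. Standard deviations 0.2 and 0.1, correlation 0.9.
mu = [1.0, 0.5]
cov = [[0.04, 0.018],
       [0.018, 0.01]]

# SIF is missing; fPAR is observed slightly above its mean.
sif_estimate = conditional_mean(mu, cov, observed_idx=[1], observed_values=[0.6])
```

In a full application the vector would hold SIF, fPAR, LAI, and VPD at many locations and times, and the quality of the gap-filled estimate would depend on how well the cross-covariances are modeled.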

1.1.1.3 Network SIF: Objective 1C

We will deliver a network-targeted SIF product (led by Co-I Sun) at tower sites within FLUXNET (fluxnet.ornl.gov) and KISS towers at 1 km and 16-day periods, and along 1 km aggregated airborne tracks from CFIS and HyPlant campaigns. We will train Machine Learning (ML) algorithms along OCO-2 Orbital SIF tracks with MODIS and MERRA-2 datasets. The trained ML models will be applied to network locations to predict SIF using MODIS vegetation and MERRA-2 climate data (PAR, Ta, VPD) at those sites as predictors. The statistical relationships and associated parameter sets (obtained through training) will serve as predictive tools.

To build robust statistical relationships, we will first perform data screening to extract good-quality pixels. Valid MODIS pixels will be determined by quality flags that classify the levels of cloud and aerosol contamination. OCO-2 SIF pixels will be filtered according to the criteria developed by Sun et al. (2017a). We will aggregate OCO-2 SIF to 5 km for input to data fusion. Spatial aggregation reduces the random retrieval uncertainty by a factor of 1/sqrt(N), where N is the number of pixels aggregated. Thus, both resolutions will reduce the random error of the original retrieval (up to 20% of peak SIF value) via averaging but to different extents, i.e., by a factor of ~2.5 for 5 km, corresponding to theoretical errors of ~8% of peak SIF value. We will perform aggregation separately for each biome, as the ML-based statistical models will be biome-specific.
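The arithmetic behind these numbers can be checked with a short sketch. It assumes the random errors of individual soundings are independent with equal variance, which is the condition under which the 1/sqrt(N) scaling holds; the function names are hypothetical.

```python
import math

def aggregated_error(single_sounding_error, n_pixels):
    """Random error of the mean of n_pixels independent, equal-variance soundings."""
    return single_sounding_error / math.sqrt(n_pixels)

def pixels_for_reduction(reduction_factor):
    """Number of independent soundings needed to reduce the error by the given factor."""
    return reduction_factor ** 2

# A factor of ~2.5 corresponds to averaging about 6 soundings per 5 km cell,
# which takes a 20% single-sounding error to 8%.
n_equivalent = pixels_for_reduction(2.5)
error_5km = aggregated_error(20.0, n_equivalent)
```

Correlated errors, such as a shared calibration bias, do not average down in this way, which is why the bias corrections are treated separately from the random error budget.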