Testing a novel approach to quantify soil Carbon using direct measurements with the Our Sci Reflectometer and soil metadata
Previously, we highlighted the potential of soil carbon sequestration to remove significant amounts of CO2 from the atmosphere and discussed the challenges of developing pathways for soil carbon sequestration credits. One of the key challenges is that soil Carbon is difficult to measure directly. To address this, we have been developing a low-cost Reflectometer to measure soil Carbon.
Well, not to measure Carbon directly, but to correlate UV/VIS/NIR spectra with soil organic Carbon. And we are not doing it alone: we are collaborating with a number of partners. QuickCarbon (out of Yale University) is developing field methods for landscape-scale soil Carbon assessment and has been a key partner in field testing the Reflectometer and co-developing software. We have also been working with the Real Food Campaign and Michigan State University on proof-of-concept testing of the Reflectometer (also known as the Bionutrient Meter, which is currently being manufactured).
In our proof-of-concept testing we have had some successes and learned some lessons that we would like to share. Below are results from two case studies conducted with our partners using the Our Sci Reflectometer, open data sources and open source software tools.
A Quick Note on Methods
All the soil samples in both case studies were scanned with the Our Sci Reflectometer, which measures reflectance at 10 distinct wavelengths from UV (365 nm) to NIR (940 nm). The Reflectometer connects via Bluetooth to an Our Sci app (Our SciKit, QuickCarbon or RFC Collect) and uses ODK-based surveys to collect metadata, including GPS location, answers to survey-style questions, time and date of measurement, device ID and device battery level.
Carbon predictions were generated using a Random Forest, an ensemble machine learning model. The data were split so that 80% was used for training and 20% was held back to validate the model. Each version of the model was run 5 times, and we report the average coefficient of determination (r²) across those runs together with its standard deviation, written as r² (±standard deviation).
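The evaluation protocol described above (repeated 80/20 hold-out, reporting the mean and standard deviation of r²) can be sketched in plain Python. This is a minimal sketch, not the authors' code: `r_squared`, `repeated_holdout` and `fit_line` are hypothetical names, we assume each run draws a fresh random split, and we use the sample standard deviation. The `fit` argument stands in for whatever model is evaluated; here it is a simple least-squares line rather than a Random Forest, so the example runs with only the standard library.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
Predictor = Callable[[Row], float]
# Hypothetical interface: a "fit" function trains on (X, y) and returns a predictor.
FitFn = Callable[[List[Row], List[float]], Predictor]


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def repeated_holdout(X: Sequence[Row], y: Sequence[float], fit: FitFn,
                     n_runs: int = 5, test_frac: float = 0.2,
                     seed: int = 0) -> Tuple[float, float]:
    """Mean and sample standard deviation of hold-out r² over n_runs random splits."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    n_test = max(2, round(len(idx) * test_frac))  # r² needs at least 2 test points
    scores = []
    for _ in range(n_runs):
        rng.shuffle(idx)  # assumption: a new random 80/20 split for every run
        test, train = idx[:n_test], idx[n_test:]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(r_squared([y[i] for i in test], [predict(X[i]) for i in test]))
    return statistics.fmean(scores), statistics.stdev(scores)


def fit_line(X_train: List[Row], y_train: List[float]) -> Predictor:
    """Stand-in model: least-squares line on the first feature only."""
    xs = [row[0] for row in X_train]
    mx, my = statistics.fmean(xs), statistics.fmean(y_train)
    slope = (sum((x - mx) * (t - my) for x, t in zip(xs, y_train))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda row: intercept + slope * row[0]


# Usage on synthetic, perfectly linear data.
X = [[float(x)] for x in range(50)]
y = [2.0 * x + 1.0 for x in range(50)]
mean_r2, sd_r2 = repeated_holdout(X, y, fit_line)
print(f"r2 = {mean_r2:.2f} (±{sd_r2:.2f})")
```

With scikit-learn installed, `fit` could instead train a `RandomForestRegressor` and return `lambda row: model.predict([row])[0]`, which is closer to what the post describes; we have not reproduced the authors' model settings.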
Predicting soil organic Carbon for soil samples from the Real Food Campaign
This past summer we launched the Real Food Campaign lab. As part of this effort, we received 271 soil samples, taken at 2 depths (0-6 in. and 6-12 in.), from 14 counties in 6 states in the eastern and midwestern USA. The raw data used in this case study can be viewed and downloaded here. These samples were analyzed for total Carbon (using “Loss on Ignition”), soil respiration and total soil minerals (using X-Ray Fluorescence).
These samples did not come with much metadata: just state, county, depth and a few simple multiple-choice questions meant to provide a coarse characterization of the sample site (see screenshots below).
Using only spectral data from the Reflectometer, the model's r² was 0.57 (±0.06). Adding the coarse site characterization data increased the model fit by 16%, to 0.66 (±0.05). This was exciting for us because it showed that simple questions (you don’t need to be a soil scientist or expert to answer them) can noticeably improve model accuracy with very little effort. Adding state and county further increased the fit to 0.68 (±0.04).
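One common way to give multiple-choice answers like these to a Random Forest alongside the spectra is one-hot encoding: each possible answer becomes its own 0/1 column appended to the 10 reflectance values. The post does not say how the metadata were encoded, so this is an illustrative sketch; `build_vocab`, `encode_sample` and the field names (`state`, `depth`, `slope`) are hypothetical.

```python
from typing import Dict, List, Mapping, Sequence


def build_vocab(records: Sequence[Mapping[str, str]],
                fields: Sequence[str]) -> Dict[str, List[str]]:
    """Sorted list of observed answers for each categorical metadata field."""
    return {f: sorted({r[f] for r in records if f in r}) for f in fields}


def encode_sample(spectrum: Sequence[float], record: Mapping[str, str],
                  vocab: Mapping[str, List[str]]) -> List[float]:
    """Reflectance values followed by one 0/1 indicator per (field, answer) pair."""
    features = list(spectrum)
    for field, answers in vocab.items():
        value = record.get(field)  # a missing or unseen answer encodes as all zeros
        features.extend(1.0 if value == a else 0.0 for a in answers)
    return features


# Hypothetical metadata for two samples.
records = [
    {"state": "MI", "depth": "0-6 in", "slope": "flat"},
    {"state": "OH", "depth": "6-12 in", "slope": "steep"},
]
vocab = build_vocab(records, ["state", "depth", "slope"])
row = encode_sample([0.1] * 10, records[0], vocab)  # 10 spectral values + 6 indicators
```

Answers not seen when the vocabulary was built simply produce all-zero indicators for that field.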
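The percentage improvements in this post appear to be relative changes in mean r² (our reading; the post does not define them). Under that assumption, the 16% figure works out as:

$$
\frac{r^2_{\text{spectra+site}} - r^2_{\text{spectra}}}{r^2_{\text{spectra}}} = \frac{0.66 - 0.57}{0.57} \approx 0.158 \approx 16\%
$$

The same definition reproduces the 50% gain for the Michigan soybean data, (0.60 − 0.40)/0.40 = 0.50, and the 19% gain for the combined dataset, (0.69 − 0.58)/0.58 ≈ 0.19.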
Predicting soil organic Carbon on soybean farms across Michigan
For the second case study, we partnered with Dr. Sieglinde Snapp and James DeDecker at Michigan State University to predict soil organic Carbon on soybean fields in three regions of Michigan. This dataset had 111 samples, all from 0-6 inches. Using only the reflectance spectra, the model fit was poor (r² = 0.40 ±0.20). Unlike the previous case study, we had GPS coordinates, which allowed us to add more meaningful metadata about the soil samples. Using the GPS coordinates, we pulled data from the SoilGrids API (SoilGrids is an open data project providing gridded soil information globally) to include in our models. From SoilGrids we added estimates of soil texture (percent sand, silt and clay), pH, CEC and bulk density for each sample, which improved the model fit by 50% (r² = 0.60 ±0.19).
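For readers who want to try the GPS-to-SoilGrids step, here is a sketch that builds a point query against the SoilGrids v2.0 REST API and combines two depth intervals into a single 0-15 cm value (roughly the 0-6 inch sampling depth). Assumptions, flagged: the endpoint URL, property codes (`sand`, `silt`, `clay`, `phh2o`, `cec`, `bdod`) and depth labels follow the current ISRIC v2.0 service, which may differ from the API the authors used; the helper names are hypothetical; and SoilGrids values are returned in scaled units, so check the conversion factors in the ISRIC documentation before using them as model features.

```python
import json
import urllib.request
from typing import Dict, Sequence
from urllib.parse import urlencode

# ASSUMPTION: ISRIC SoilGrids v2.0 point-query endpoint; verify against current docs.
SOILGRIDS_URL = "https://rest.isric.org/soilgrids/v2.0/properties/query"
# ASSUMPTION: v2.0 codes for sand, silt, clay, pH in water, CEC and bulk density.
PROPERTIES = ("sand", "silt", "clay", "phh2o", "cec", "bdod")
# Standard SoilGrids intervals that together cover roughly 0-6 inches (0-15 cm).
DEPTHS = ("0-5cm", "5-15cm")


def soilgrids_query_url(lat: float, lon: float,
                        properties: Sequence[str] = PROPERTIES,
                        depths: Sequence[str] = DEPTHS) -> str:
    """Build a point-query URL; repeated keys carry multiple properties/depths."""
    params = {"lon": lon, "lat": lat, "property": list(properties),
              "depth": list(depths), "value": "mean"}
    return f"{SOILGRIDS_URL}?{urlencode(params, doseq=True)}"


def fetch_soilgrids(lat: float, lon: float, timeout: float = 30.0) -> dict:
    """Fetch the raw JSON response (network call; the response layout is not parsed here)."""
    with urllib.request.urlopen(soilgrids_query_url(lat, lon), timeout=timeout) as resp:
        return json.load(resp)


def thickness_weighted_mean(values_by_depth: Dict[str, float]) -> float:
    """Average per-interval values (labels like '0-5cm'), weighted by interval thickness."""
    total = weight = 0.0
    for label, value in values_by_depth.items():
        top, bottom = (float(s) for s in label.replace("cm", "").split("-"))
        total += value * (bottom - top)
        weight += bottom - top
    return total / weight
```

Weighting by thickness matters here because the 5-15 cm interval covers twice as much of the sampled soil as the 0-5 cm interval.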
Combining the datasets
The two case studies presented are based on fairly small datasets (many calibration datasets for spectroscopy include thousands of soil samples). So we wanted to see whether we could improve prediction accuracy simply by combining these two small datasets, which were developed by different research groups for different purposes. Combining them increased our sample size to 381 samples.
First, a spectra-only model on the combined dataset did not improve on the spectra-only model for the RFC soils alone (r² = 0.58 ±0.07, compared with 0.57 ±0.06). Unfortunately, because the two projects used different data collection protocols, most of the metadata did not carry over from one project to the other. The RFC project had answers to survey-style questions but no GPS coordinates for pulling SoilGrids data, while the Michigan State (Jumpstart) project had GPS coordinates but no survey-style questions characterizing the site. So the only metadata common to both projects were state and county. Adding state alone improved the model fit by 19% (r² = 0.69 ±0.04). Adding more localized data (county) increased r² to 0.73 but also increased the variability between runs (±0.10).
These results make sense in the context of how soils form. There are five soil-forming factors: time, climate, parent material (the type of rock from which the soil formed), topography and organisms. Three of those five (time, climate and parent material) are generally similar over fairly large distances, so knowing the state and county where a sample was collected provides significant insight into the soil’s condition. The other two (topography and organisms) can change dramatically over short distances, so capturing those differences should help generate better predictions.
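The harmonization problem described here (keeping only the metadata both projects share before pooling them) can be sketched as a set intersection over field names. The record layouts and field names (`spectrum`, `carbon`, `slope`, `lat`, `lon`) and the helpers `shared_fields` and `pool` are hypothetical; the post does not describe the authors' data structures.

```python
from typing import Any, Dict, List, Sequence, Set

Record = Dict[str, Any]


def shared_fields(*datasets: Sequence[Record]) -> Set[str]:
    """Field names present in every record of every dataset."""
    common: Set[str] = set()
    first = True
    for records in datasets:
        for record in records:
            common = set(record) if first else common & set(record)
            first = False
    return common


def pool(*datasets: Sequence[Record],
         required: Sequence[str] = ("spectrum", "carbon")) -> List[Record]:
    """Concatenate datasets, keeping only the fields all of them share."""
    fields = shared_fields(*datasets)
    missing = set(required) - fields
    if missing:
        raise ValueError(f"required fields not shared by all datasets: {sorted(missing)}")
    return [{f: r[f] for f in sorted(fields)} for records in datasets for r in records]


# Hypothetical records mirroring the two projects' metadata.
rfc = [{"spectrum": [0.2] * 10, "carbon": 2.1, "state": "OH",
        "county": "Wayne", "slope": "flat", "depth": "0-6 in"}]
jumpstart = [{"spectrum": [0.3] * 10, "carbon": 1.4, "state": "MI",
              "county": "Ingham", "lat": 42.7, "lon": -84.5}]
combined = pool(rfc, jumpstart)  # keeps spectrum, carbon, state, county
```

Site-specific fields such as `slope`, or anything derived from GPS coordinates, drop out of the pooled table, which matches the situation described above where only state and county remained.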
What does this mean?
So, we managed to get some decent soil Carbon predictions from a couple of datasets using the Our Sci Reflectometer. That’s great: it shows that the technology we are developing has value.
How do we maximize that value?
One of the key constraints when combining these datasets was the lack of consistent metadata. To a certain extent, that will always be the case, because different projects have different data collection needs. But could we develop a standard set of metadata that everyone collects? Not much, just enough to ensure interoperability between datasets. For example, with a standard data collection protocol that everyone used, every project could ask those simple questions (slope, soil type, etc.). Answering them takes only a few seconds, and the payoff in better predictions would far outweigh the added time. The same goes for collecting GPS coordinates, which opens the door to adding SoilGrids or other open data with minimal extra effort.
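As a concrete illustration of what "just enough" shared metadata might look like, here is a hypothetical minimal record with basic validation. The fields are our assumption, drawn from the items mentioned in this post (GPS coordinates, sampling depth, state, county, slope and soil type); `CoreSampleMetadata` is not an existing standard.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class CoreSampleMetadata:
    """Hypothetical minimal metadata every project would record per soil sample."""
    sample_id: str
    latitude: float          # decimal degrees (WGS84 assumed)
    longitude: float
    depth_top_cm: float
    depth_bottom_cm: float
    state: str
    county: str
    slope: Optional[str] = None      # e.g. answer to a "flat / gentle / steep" question
    soil_type: Optional[str] = None  # e.g. a simple texture-by-feel answer

    def __post_init__(self) -> None:
        # Reject obviously invalid coordinates and depth ranges at entry time.
        if not (-90.0 <= self.latitude <= 90.0 and -180.0 <= self.longitude <= 180.0):
            raise ValueError("GPS coordinates out of range")
        if self.depth_bottom_cm <= self.depth_top_cm:
            raise ValueError("depth_bottom_cm must be greater than depth_top_cm")


sample = CoreSampleMetadata("MI-001", 42.73, -84.48, 0.0, 15.0, "MI", "Ingham", slope="flat")
record = asdict(sample)  # plain dict, ready to export alongside the spectra
```

Keeping the required fields few and the survey answers optional is one way to make a shared protocol cheap enough that different projects would actually adopt it.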
There is a lot more development that needs to be done in this space, and we are looking forward to an exciting 2019. We’ll try to be better at keeping the updates coming, so stay tuned or sign up for email updates in the footer of this page.