Applying machine learning methods to the environmental sciences—opportunities and pitfalls
Room 002, University House 1, UVic
2489 Sinclair Rd.
Victoria, BC
Their different origins and separate cultures have led machine learning (ML) and statistics to occupy the yin and yang (i.e. dark and bright) hemispheres of data science. Environmental scientists are far more comfortable with statistics than with ML, as ML tools are often considered "black boxes". This has unfortunately led to the under-utilization of ML in the environmental sciences, where it could otherwise have made a contribution. A brief review of the advances in ML over the last decade (randomized neural networks, deep neural networks and generative adversarial networks) is given.
Data problems in the environmental sciences are not entirely similar to those in mainstream ML: the environmental sciences deal mainly with continuous variables and regression problems, whereas mainstream ML deals mainly with discrete and categorical variables and classification problems. When machine learning or statistical methods are applied to the environmental sciences, nonlinear regression (NLR) models often perform only slightly better, and occasionally worse, than linear regression (LR). The proposed reason for this conundrum is that NLR models can give predictions much worse than LR when the input data lie outside the domain used in model training, as nonlinear extrapolation is less reliable than linear extrapolation.
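The extrapolation argument can be illustrated with a small synthetic experiment. The sketch below is not taken from the talk: the "true" relationship, the noise level, the training domain of [-1, 1] and the use of a degree-5 polynomial as a stand-in for an NLR model are all assumptions chosen for illustration. It fits a straight line and the polynomial to the same training data, then compares their errors inside and outside the training domain.

```python
import numpy as np

rng = np.random.default_rng(0)


def truth(x):
    # Hypothetical "true" relationship (an assumption for this sketch):
    # mostly linear, with a weak nonlinear component.
    return x + 0.1 * np.sin(3.0 * x)


# Training inputs cover only the domain [-1, 1]; targets carry Gaussian noise.
x_train = rng.uniform(-1.0, 1.0, size=50)
y_train = truth(x_train) + rng.normal(0.0, 0.1, size=x_train.size)

# LR: straight-line fit. NLR stand-in: degree-5 polynomial fit.
lr_coef = np.polyfit(x_train, y_train, deg=1)
nlr_coef = np.polyfit(x_train, y_train, deg=5)


def rmse(coef, x):
    # Root-mean-square error of a fitted polynomial against the noise-free truth.
    return float(np.sqrt(np.mean((np.polyval(coef, x) - truth(x)) ** 2)))


x_inside = np.linspace(-1.0, 1.0, 200)   # within the training domain
x_outside = np.linspace(2.0, 3.0, 200)   # outside it: both models must extrapolate

results = {
    "LR": {"inside": rmse(lr_coef, x_inside), "outside": rmse(lr_coef, x_outside)},
    "NLR": {"inside": rmse(nlr_coef, x_inside), "outside": rmse(nlr_coef, x_outside)},
}

for name, scores in results.items():
    print(f"{name}: inside RMSE = {scores['inside']:.3f}, "
          f"outside RMSE = {scores['outside']:.3f}")
```

Inside the training domain the two fits should have errors of similar size, while outside it the high-order terms of the polynomial grow rapidly and its error should become far larger than that of the straight line. A neural network or other NLR model would extrapolate differently from a polynomial, so this shows only the general mechanism, not the behaviour of any particular method.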