EDS 232


Lesson 1

What is Machine Learning?

In this lesson


  • What machine learning is and how it differs from traditional programming


  • The relationship between AI, machine learning, and deep learning


  • Examples and limitations of machine learning applied to environmental problems

What is machine learning?




Machine learning (ML) is a branch of computer science focused on building systems that can learn from data to make predictions or decisions, without being explicitly programmed with rules.


Instead of a human writing down rules, an algorithm discovers the rules from examples.

Two approaches to the same problem


The problem: predict whether a kelp forest has been degraded, based on water temperature and sea urchin density.


Traditional programming

If temperature > 18°C and urchin density > 50/m², then: degraded
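As a minimal sketch, the hand-written rule could look like this in Python. The function and argument names are hypothetical, and the two sites are made up for illustration; only the thresholds come from the rule itself.

```python
def is_degraded(temperature_c: float, urchin_density_per_m2: float) -> bool:
    """Hand-written rule: a person chose both thresholds in advance."""
    return temperature_c > 18.0 and urchin_density_per_m2 > 50.0


# Two made-up sites, for illustration only
print(is_degraded(19.5, 72.0))  # warm water, many urchins
print(is_degraded(16.0, 72.0))  # cool water, many urchins
```

Every number in the rule had to be supplied by a person; the program only applies it.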


Machine learning

Give the algorithm many labeled examples (temperature, urchin density, and whether the forest was degraded) and let it work out the rules itself.
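A rough sketch of what "figuring out the rules" can mean, assuming the rule has the form "temperature above one cut point and urchin density above another". The data are invented, the function names are hypothetical, and the exhaustive search is a deliberately simple stand-in for a real learning algorithm.

```python
# Made-up labeled examples: (temperature in °C, urchins per m², degraded?)
examples = [
    (15.2, 12.0, False),
    (16.8, 64.0, False),
    (17.5, 30.0, False),
    (18.9, 41.0, False),
    (19.4, 58.0, True),
    (20.1, 75.0, True),
    (21.3, 66.0, True),
    (19.0, 90.0, True),
]


def count_errors(temp_cut, urchin_cut, data):
    """Number of examples the rule 'temp > cut AND urchins > cut' gets wrong."""
    return sum(
        ((temp > temp_cut) and (urchins > urchin_cut)) != label
        for temp, urchins, label in data
    )


def learn_thresholds(data):
    """Try every pair of observed values as cut points; keep the best pair."""
    temps = sorted({t for t, _, _ in data})
    urchin_counts = sorted({u for _, u, _ in data})
    candidates = [(t, u) for t in temps for u in urchin_counts]
    return min(candidates, key=lambda cut: count_errors(cut[0], cut[1], data))


temp_cut, urchin_cut = learn_thresholds(examples)
print(f"Learned rule: temperature > {temp_cut} and urchins > {urchin_cut}")
print("Training errors:", count_errors(temp_cut, urchin_cut, examples))
```

Here the cut points come from the data rather than from a person. Different examples would give a different rule.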

How does a machine learn?



At its core, a machine learning algorithm does the following:


  1. Take a set of example data (the training data) used to train the model
  2. Find a function \(\hat{f}\) that maps inputs to outputs well on those examples
  3. Use \(\hat{f}\) to make predictions on new, unseen data


“Learning” means adjusting \(\hat{f}\) to minimize error on the training data.
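The three steps can be sketched with the simplest possible model, a straight line fitted by least squares. This assumes \(\hat{f}\) is a line and that "error" means mean squared error; the numbers are made up and the function names are hypothetical.

```python
# Step 1: training examples (made-up numbers): input x, observed output y
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(xs, ys):
    """Step 2: find the slope and intercept that minimize mean squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(slope, intercept, xs, ys):
    """Average squared gap between the line's predictions and the observations."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


slope, intercept = fit_line(train_x, train_y)


def f_hat(x):
    """The learned function: maps a new input to a predicted output."""
    return slope * x + intercept


# Step 3: predict for an input that was not in the training data
print("Training error:", mean_squared_error(slope, intercept, train_x, train_y))
print("Prediction at x = 6:", f_hat(6.0))
```

Any other slope or intercept would give a larger training error, which is the sense in which this line was "learned".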

Check-in

Think of a task in environmental science that would be very hard to solve with hand-written rules, but where you might have a lot of labeled data.

What are the inputs and outputs?

AI, Machine Learning, and Deep Learning

Three related terms



Artificial intelligence (AI)

The broadest term — any technique enabling machines to mimic human intelligence, including rule-based systems and search algorithms.


Machine learning

A subset of AI — systems that learn from data rather than hand-coded rules.


Deep learning

A subset of ML — methods based on neural networks with many layers, behind breakthroughs like image recognition and large language models.

This course



We will focus on machine learning, primarily the classical methods:


Regression · Support vector machines · Decision trees · Clustering · Dimension reduction


We will probably cover some deep learning in the last two weeks.

ML in environmental science


Species distribution modeling

Algorithms like MaxEnt and random forests predict where species are likely to be found using occurrence records and environmental covariates.

Adapted from Helgen et al., 2013 for illustration purposes.
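A toy sketch of the species distribution idea. It swaps in a much simpler technique (k-nearest neighbors) for MaxEnt or random forests, and the records, covariates, scaling, and function names are all made up for illustration.

```python
import math

# Made-up records: (mean annual temperature °C, annual precipitation mm, present?)
records = [
    (8.0, 1200.0, True),
    (9.5, 1100.0, True),
    (10.0, 1300.0, True),
    (7.5, 1000.0, True),
    (18.0, 400.0, False),
    (20.0, 300.0, False),
    (17.0, 550.0, False),
    (22.0, 250.0, False),
]


def scale(temp_c, precip_mm):
    """Put both covariates on roughly comparable scales (hypothetical choice)."""
    return temp_c / 10.0, precip_mm / 500.0


def predicted_suitability(temp_c, precip_mm, data, k=3):
    """Fraction of the k most similar records where the species was present."""
    target = scale(temp_c, precip_mm)
    by_distance = sorted(
        data, key=lambda rec: math.dist(target, scale(rec[0], rec[1]))
    )
    nearest = by_distance[:k]
    return sum(present for _, _, present in nearest) / k


# Two new locations described only by their environmental covariates
print(predicted_suitability(9.0, 1150.0, records))   # cool and wet
print(predicted_suitability(19.0, 350.0, records))   # warm and dry
```

Real workflows use far more records and covariates, and usually background or pseudo-absence points in place of true absences.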

Mapping permafrost

Deep learning methods trained on high-resolution commercial satellite imagery have been used to map and track permafrost across the Arctic at sub-meter scale.



Bird detection from acoustics

Deep learning models trained on the calls of North American and European bird species enable large-scale passive acoustic monitoring that would be impossible to do by hand.

We need to be aware of ML limitations

Limitations


  • Data hungry — many methods need large amounts of data to “learn”


  • Extrapolation risk — a model trained in one region or time period may fail under novel conditions


  • Interpretability — flexible models may predict well but be hard to interpret


  • Bias in training data — unrepresentative data produces biased models


These are not reasons to avoid ML — they are reasons to use it carefully!
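Extrapolation risk can be illustrated with a small sketch. It assumes a curved "true" relationship that the model never sees directly and a straight-line model trained on a narrow range of inputs; all numbers and names are invented for illustration.

```python
def true_response(x):
    """Made-up 'true' relationship, unknown to the model: a curve."""
    return 0.5 * x ** 2


def fit_line(xs, ys):
    """Least-squares straight line through the training points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Training data only cover x between 0 and 4 (think: one region or time period)
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [true_response(x) for x in train_x]
slope, intercept = fit_line(train_x, train_y)


def abs_error(x):
    """Absolute gap between the line's prediction and the true response."""
    return abs((slope * x + intercept) - true_response(x))


print("Error inside the training range (x = 2):", abs_error(2.0))
print("Error far outside it (x = 10):", abs_error(10.0))
```

The line is a tolerable approximation where it was trained and a poor one under conditions it never saw.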

Check-in

Construct a scenario in which developing and using a machine learning model with one or more of these limitations would be particularly consequential.