Course Description
Overview of the Course
Basic Info:
Professor
Jonathan Gilligan (they/them/theirs)
Office: SC 5722 (7th Floor of Stevenson 5)
Email: jonathan.gilligan@vanderbilt.edu
Office Hours: M 10:00–11:00, T 11:00–12:00, or by appointment
Schedule
Class meetings: TR 8:00–9:15, SC
Catalog Description
**EES 4891/5891-01** Fundamentals of probability and statistics for the Earth & Environmental Sciences, with applications in R. Probability distributions, descriptive statistics, statistical testing, regression analysis, elements of time-series analysis and multivariate statistics, principal components analysis, reproducible research methods, principles of statistical computing using R.
Prerequisites
You should be comfortable with differential calculus and linear algebra. I will review basic concepts, but the course will be difficult for you if you are completely unfamiliar with these areas of math.
This course will be mathematical and will make extensive use of the R software system, but I do not assume that you already know R or advanced mathematics beyond calculus and linear algebra.
Goals for the Course
By the end of the semester, you will:
- Understand theories of probability and be familiar with the properties of discrete and continuous probability distributions
- Understand what the Normal probability distribution is, and why it plays a central role in probability and statistics
- Understand how to perform descriptive statistical analyses of data
- Know how to use statistical tests to test propositions about data, such as identifying differences between data collected from different sources
- Know how to estimate the parameters of probability distributions from observed data
- Know how to analyze time-series data
- Know how to analyze sets of multiple variables, such as measurements of multiple elements or isotopes from each of a number of samples
- Know how to organize any data analysis project using Reproducible Research methods. Journals and funding agencies increasingly require these methods so that other people can easily review and understand how an analysis was conducted and so that, if questions arise even years later, it is possible to go back and see exactly what was done.
Reading Material
There are two required textbooks:
Textbooks
- Julien Emile-Geay, Data Analysis for the Earth & Environmental Sciences, 5th Edition (USC, 2023), doi: 10.6084/m9.figshare.1014336
  This will be the main textbook for most of the semester. The book is published open-access, and you can download the PDF for free from https://doi.org/10.6084/m9.figshare.1014336, from Brightspace, or from this website at https://ees5891.jgilligan.org/files/course_files/DataAnalysisEarthEnvironmentalSci_2023.pdf
- Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund, R for Data Science, 2nd Edition (O’Reilly, 2023). ISBN 978-1492097402
  This book is the best practical introduction I have found for getting started with R and getting things done in data analysis. The first author, Hadley Wickham, is the chief data scientist at the Posit company and wrote a huge number of widely used free packages to extend and enrich R. This book follows his philosophy of how to organize data sensibly for analyzing and presenting it.
  You can buy a paper copy if you wish, but the full text is available for free online at https://r4ds.hadley.nz
Class Web Site
In addition to Brightspace, I have set up a companion web site for this course at https://ees5891.jgilligan.org, where I post the reading and homework assignments, my slides from class, and other useful material. That web site will be the central place to keep up with material for the course during the semester, and it will direct you to Brightspace if there is anything you need to find there.
Computer Software
For this class, we will work in R, and I strongly recommend that you install the free version of RStudio Desktop for working with R. All the software we will use this semester is free and can be downloaded and installed on Windows, Mac, and Linux systems. You can find details on the tools page of the course web site at ees5891.jgilligan.org.
We will also use the git revision control software as part of our Reproducible Research practice. You will use this to manage files for assignments and the semester research project.
I will spend a class explaining why we use git and how to use it effectively for your homework and other projects.
Assignments
Overview of reading assignments
I will post detailed reading assignments to the course website, ees5891.jgilligan.org, that give specific pages to read for each class and notes on important things you should understand. I expect you to complete the reading before you come to class on the day for which the reading is assigned, so you can participate in discussions of the assigned material and ask questions if there are things you don’t understand.
Graded Work
Homework
Homework assignments will be posted on the course web site, and each assignment must be submitted by the beginning of class on the day it is due.
You will submit your homework to Brightspace or GitHub, as indicated on the assignment.
Project
In the second half of the semester, you will do a research project, in which you will choose a data set that’s interesting to you and apply statistical methods to analyze it. You will present the results of your project in class during the last week of the semester and turn in a written report about your project.
You may analyze data from a research project you are working on or data from a public source that you are interested in understanding better.
Tests and Examinations
There will not be any tests or examinations in this course. Your grade will be based on class participation, homework, and the research project, including its in-class presentation and written report.
Basis for Grading
Class participation | 5%
Homework | 45%
Research Project | 50%
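As a rough illustration of how these weights combine, here is a minimal sketch in R. It assumes the course score is a simple weighted sum of the three components, which this syllabus does not spell out, and the scores are hypothetical numbers invented for the example.

```r
# Weights taken from the grading table above; they sum to 1.
weights <- c(participation = 0.05, homework = 0.45, project = 0.50)

# Hypothetical component scores on a 0-100 scale (illustration only).
scores <- c(participation = 100, homework = 90, project = 80)

# Assumption: the course score is the weighted sum of the component scores.
# Indexing by name keeps each score matched to its weight.
course_score <- sum(weights * scores[names(weights)])
print(course_score)
```

Because the project carries half the weight, a given change in the project score moves the course score by slightly more than the same change in the homework score.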
Final Note:
This is the first time I have taught this course. During the term, I will assess how things are going and may change the assignments and the sequence of readings to help you get the most out of it.