David Shean
Civil and Environmental Engineering
University of Washington
https://dshean.github.io
This course explores geospatial data processing, analysis, interpretation, and visualization techniques using Python and open-source tools/libraries. We will explore fundamental concepts and real-world data science applications involving a variety of geospatial datasets.
- Aspects of both data engineering and data science, with exploratory data analysis approach
- Learn how to programatically answer real-world remote sensing and GIS questions (and how to ask new questions)
- Query and process geospatial data on-the-fly, without manual downloads
- Limited emphasis on machine learning, but some examples scattered throughout labs (e.g., K-means clustering)
- Examples focus on Washington state and Western U.S.
Estimating snow-covered area for Mt. Rainier from Landsat-8 multi-spectral satellite imagery (module 5)
The course is organized into 10 week-long modules. Each module contains background reading assignments and Jupyter notebooks with introduction, demo, and lab exercises. The material builds on content and datasets from previous weeks.
Clicking this badge will launch the GDA image and Jupyterlab environment on mybinder.org. This will provide the same environment that was available on the course Jupyterhub during winter 2021. You can use the file browser on the left side to navigate and launch interactive notebooks in the gda_2021/modules
directory.
Note: this session is ephemeral and the hardware resources are limited (only 2 GB of RAM). Your home directory will not persist, so use this only for exploration and demos. Within the Jupterylab environment, you can always right-click on a file and download locally if you want to preserve your changes, or use git/github!
- Download all course materials:
git clone https://github.com/UW-GDA/gda_course_2021.git
- See the Week 10 materials for instructions on how to set up your local environment to run the notebooks. Or, if you're already familiar with conda, here are the environment files:
- uwgda2020 (pinned version numbers)
- Notebooks should have instructions/code to download all necessary data
https://docs.google.com/document/d/1uaEMqANMU9NlvH2ELkGtALQ3MlGY1U9-uCqNKz5JOqk/edit?usp=sharing
-
Students independently complete online reading assignments or work through tutorials prior to lab
-
One remote, synchronous 1-hour lecture on Wednesday afternoon
- Will be recorded for students who cannot attend synchronously
-
One remote, synchronous 3-hour lab session on Friday afternoon
- Initial 15 minutes for students to meet and discuss lab exercises, ask questions without instructor
- Next 15-30 minutes logistics and some discussion around lingering questions/issues
- Next 30-60 minutes for introduction and demo for new material
- Intro, review and demo will be recorded for students who cannot attend synchronously
- Remainder of lab for students to work in small groups to go through lab notebook, write code, troubleshoot, talk, and try to answer discussion questions together
- Students finish exercises (and "extra credit" challenge problems) for homework (due the following week)
-
Students report ~6-12 hours outside of the 3-hour lab required to complete reading and homework
-
See weekly workflow document in instructor and student resources for technical details
- Students propose, refine, perform and present independent or group projects
- Final deliverables: Github repository and ~10 minute presentation
- Most current resources are intended for students enrolled in the class at the University of Washington
- I am planning to prepare additional resources for students attempting independent self-study, or those who are attempting individual modules rather than the full 10-week course (see syllabus for additional thoughts on philosophy and time commitment). The reality is that the exercises each week build on skills developed in previous weeks.
- I've started compiling resources, notes and recommendations for others who are or will be teaching similar material (or using similar approaches).
- If you find this content useful, please consider contributing upstream corrections, modifications or suggestions.
- The notebooks in this public repo are the "student" versions, with many empty cells and instructions for lab exercises. The completed notebooks with my solutions are archived in a private solutions repo. Enrolled students receive access to this repo after submitting their own solutions to the lab exercises each week. I have not released the solutions publicly, as I expect future students enrolled in the course to learn "the hard way" as they work through the problems on their own. If you have independently tried to work through these notebooks and would like to compare your answers, I can potentially add you as a collaborator.
- I wish that I had a better approach for distribution, as I know that these solutions to be a useful resource for those who can't dedicate weeks to learn the material. My priority right now is to preserve the learning experience for enrolled students, and to be able to reuse similar material in the coming years (developing these notebooks requires a considerable amount of time). I am open to suggestions on strategies that will enable students to "unlock" the solutions as they incrementally make progress.
If you find errors or have suggestions for improvements, please consider creating a Github Issue or submitting a Pull Request. I view the development of this material as an open, collaborative effort. I expect to teach this course in the coming years, and will continue refining/updating. I sincerely appreciate any help that I can get on this and I will acknowledge your contributions (see below)!
The primary objective of this course is to teach geospatial analysis concepts and to provide interesting problems to engage students as they learn how to use modern, open-source tools. Several examples make simplifying assumptions and/or use older datasets for analysis. There are more rigorous ways to approach all of these problems, and I encourage you to consult the peer-reviewed literature for more information or any official purposes. Also, the tools and methods outlined here will work for many problems, but may not always be suitable for very large datasets that require more efficient, distributed computing. I hope to integrate more of this in the future, but for now the focus remains on relatively small problems, as it's easy to get lost in the details of scaling.
Many individuals have contributed to the content and infrastructure development required for this course:
- First and foremost, the brave GDA students who enrolled in this course duing winter 2019 and winter 2020 provided critical feedback, suggestions and often elegant solutions to challenging problems
- Chris Land (UW-IT) and Scott Henderson (UW eScience/ESS) provided Jupyterhub configuration and support during 2020
- Amanda Tan (UW eScience) provided Jupyterhub configuration and support during 2019
- Bill Schaefer (UW-IT) and Rob Fatland (UW-IT/eScience) provided spport and management during 2020 and 2019, respectively
- Friedrich Knuth, Shashank Bhushan, and Michelle Hu provided assistance during lab periods in 2020. Friedrich Knuth provided initial material on conda.
- Anthony Arendt and the UW eScience Geohackweek leadership team for providing a foundation and resources for interactive education and software development
The content of this repository is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, and the embedded source code is licensed under the MIT license.
If you use content or code in a publication, please cite as:
Shean, D. (2020), Geospatial Data Analysis with Python: Course material from the Winter 2020 offering at the University of Washington (CEE498/CEWA599), Zenodo, http://doi.org/10.5281/zenodo.3978778
If you learn from this material, or you use some of this material in a different course, please show your support by clicking the "Star" button in upper right corner of the repo page. Thanks!