This repository contains the material for the course "Introduction to Data Analysis and Visualization with Python", given at the University of Bern by the Science IT Support unit as part of the Transferable Skills program of the University of Bern's Vice-Rectorate for Development. This content has been developed by Guillaume Witz from the Microscopy Imaging Center and Science IT Support, University of Bern.
After a first session covering the basics of Python and programming, this lecture presents how to use Python for scientific computing and data analysis in three parts:
- The core packages NumPy and Pandas. These packages offer additional data structures necessary to do efficient numerical computations (NumPy arrays) and process mixed-type tabular data (Pandas dataframes).
- Visualization. Here we show the students how arrays and dataframes can be represented as plots (scatter, histogram etc.) and how these plots can be formatted. Both the fundamental plotting library Matplotlib and the higher-level library seaborn are presented.
- Data analysis. Here we show how numerical data can be analysed using general-purpose tools like SciPy and statsmodels. We also provide a brief introduction to more specialized packages such as scikit-learn for Machine Learning or scikit-image for Computer Vision to provide some insight into more domain-specific problems.
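As a rough illustration of the two data structures named in the first part, the sketch below builds a NumPy array and a Pandas dataframe. The variable names and sample values are made up for this example and are not taken from the course notebooks.

```python
import numpy as np
import pandas as pd

# NumPy array: homogeneous numeric data; arithmetic applies element-wise
heights_cm = np.array([172.0, 165.5, 180.2])  # hypothetical sample values
heights_m = heights_cm / 100  # vectorized division, no explicit loop

# Pandas dataframe: tabular data whose columns can have different types
people = pd.DataFrame({
    "name": ["Ana", "Ben", "Chloe"],  # text column (hypothetical names)
    "height_cm": heights_cm,          # numeric column
})

# New columns can be computed from existing ones, again without a loop
people["height_m"] = people["height_cm"] / 100
mean_height_m = people["height_m"].mean()
print(people)
print("Mean height in metres:", mean_height_m)
```

The array holds values of a single type, while the dataframe combines a text column and numeric columns in one table.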
All the course material is offered in the form of interactive notebooks that can be executed via Jupyter or its Google equivalent Colab. As Colab doesn't require any installation, participants are expected to run their notebooks from there, and no support is provided during the course for local Jupyter installations. Participants who try a local installation and encounter problems are welcome to get in touch with the course organizers.
You can open the notebooks in Colab directly by clicking on the badge at the top left of this page. Beware that you should save a copy to your Google Drive if you want to preserve your changes. You can also run all notebooks in the Jupyter environment by clicking on the Binder badge. Beware that the Binder sessions are only temporary, so download any modification you wish to keep.
To run Python and Jupyter locally, we strongly recommend installing the necessary software via conda. Conda is an environment manager that lets you create a dedicated environment on your computer for each of your projects, in which you can then install combinations of Python packages without interference between projects. If you don't have conda installed, follow these instructions to install a minimal version called miniconda. You can also install Anaconda, which on top of conda also installs a graphical interface and a long list of useful software (including non-Python software like RStudio). It does, however, take up quite a lot of disk space.
You can either download or clone this GitHub repository to your computer. To download it, use the green "Code" button at the top right of this page and then unzip the downloaded folder. If you know git, you can also type this in your terminal:
git clone https://github.com/guiwitz/DAVPy.git
Next, create a conda environment in which to install the packages needed for this course. You can do this with the provided environment.yml file. If you look into it, you will see that it lists a series of packages, including e.g. NumPy and Pandas, and creates an environment called DAVPy (see the top of the file). To create this environment, open a terminal, move to the binder folder of the downloaded repository and type:
conda env create -f environment.yml
To use the environment, you then have to activate it:
conda activate DAVPy
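Once the environment is active, a short Python check can confirm that the packages mentioned in this README are importable. This is only a sketch: the list of module names below is an assumption based on the packages named above, not a copy of environment.yml, and the helper `find_missing` is a hypothetical name introduced for this example.

```python
import importlib.util

# Import names of the packages mentioned in this README. Assumption: all of
# them are listed in environment.yml. Note that scikit-learn and scikit-image
# are imported as "sklearn" and "skimage".
EXPECTED_MODULES = [
    "numpy", "pandas", "matplotlib", "seaborn",
    "scipy", "statsmodels", "sklearn", "skimage",
]


def find_missing(modules):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(EXPECTED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All expected packages were found.")
```

You can paste this into a Python session started from the activated environment. If a package is reported as missing, check that the correct environment is active before reinstalling anything.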
Finally, you can start Jupyter by typing:
jupyter notebook