Predicting average temperatures for the state of Bahia using regression
Continentality is a measure of the difference between continental and marine climates, characterized by the wider range between daytime and night-time temperatures that occurs over land compared to water. This happens because the specific heat of water is many times (approx. 5) larger than that of land, so the temperature of water, and of the regions around it, changes more slowly.
In this regard we were asked to analyze the effects of continentality at the extremities of the state through curve fitting models.
To do so, we were provided with data from 25 of the 417 cities that make up the state of Bahia, an approximate 6% coverage. As shown below, the dataset contains latitude, longitude, altitude and average temperatures from a few meteorological stations for each month from 1999 to 2019. We therefore tried to build a regression model that could predict average temperatures for every city in the state for which we lacked data, given its altitude, latitude and longitude and a period of time (month and year).
It is fairly easy to see that learning the task accurately amounts, in some way, to verifying the hypothesis that average temperatures can be inferred from latitude, longitude and altitude. Another aspect that we wanted to verify empirically was whether prediction accuracy correlates with the distance to the nearest climate station for the cities where data was absent.
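The distance to the nearest climate station can be computed from latitude and longitude alone. The sketch below uses the haversine (great-circle) formula; the function names and the coordinates are made up for illustration and do not come from the dataset.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station_km(city, stations):
    """Distance from a (lat, lon) city to the closest (lat, lon) station."""
    return min(haversine_km(city[0], city[1], s[0], s[1]) for s in stations)


# Hypothetical coordinates, for illustration only.
stations = [(-13.0, -38.5), (-14.9, -40.8)]
city = (-12.3, -38.9)
print(round(nearest_station_km(city, stations), 1))
```

For cities with held-out measurements, pairing this distance with the absolute prediction error gives the two series whose correlation we wanted to check.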
The figure below demonstrates the dataset coverage in terms of cities in which data was available. We can see a plot of the average temperatures for the year of 2019.
The first step was to train the models to fit the data, so we chose to evaluate the Linear, Polynomial and Multivariate regression approaches. With the Linear and Polynomial approaches we could only relate the dependent variable (temperature) to one independent variable at a time, so our heuristic consisted of fitting one curve for each independent variable (e.g., Temperature vs Altitude, Temperature vs Latitude, Temperature vs Longitude and so on) and then combining them by averaging to obtain the final prediction. In this way, each regression model gave us 5 temperature predictions, whose mean we took as the final prediction.
This is a terrible heuristic, since it disregards modeling the variance of the data. That was not the case for the multivariate regression model, which goes through an optimization process that relates these variables by finding a coefficient for each of them (fitting a hyperplane to the data). Still, we evaluated both approaches by partitioning the data through stratified random sampling, i.e., dividing it into train, validation and test sets.
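A minimal sketch of this averaging heuristic for the linear case, assuming each row holds (latitude, longitude, altitude, month, year). The helper names and the sample values are invented for illustration and are not our actual code or data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def fit_per_feature(rows, temps):
    """Fit one line per feature column; rows is a list of equal-length tuples."""
    columns = list(zip(*rows))
    return [fit_line(col, temps) for col in columns]


def predict_mean(models, row):
    """Average the per-feature predictions into a single temperature."""
    preds = [a * x + b for (a, b), x in zip(models, row)]
    return sum(preds) / len(preds)


# Made-up rows: (latitude, longitude, altitude, month, year)
rows = [(-13.0, -38.5, 50.0, 1, 2018), (-14.9, -40.8, 900.0, 6, 2018),
        (-12.1, -45.0, 450.0, 3, 2019), (-11.3, -41.9, 750.0, 9, 2019)]
temps = [27.1, 19.8, 25.0, 22.4]

models = fit_per_feature(rows, temps)
print(round(predict_mean(models, (-12.5, -39.0, 200.0, 2, 2019)), 2))
```

Each fitted line sees only one variable, so the average cannot capture how the variables act together.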
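The hyperplane fit and the data partitioning can be sketched as follows, assuming NumPy is available. The data here is synthetic, the split fractions are arbitrary, and the split is a plain shuffled split rather than a stratified one, since the stratification variable is not described above.

```python
import numpy as np


def split_indices(n, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle row indices and cut them into train, validation and test parts."""
    idx = np.random.default_rng(seed).permutation(n)
    n_test = int(round(n * test_frac))
    n_val = int(round(n * val_frac))
    return idx[n_test + n_val:], idx[n_test:n_test + n_val], idx[:n_test]


def fit_hyperplane(X, y):
    """Least-squares coefficients for y = X @ w + b; the intercept b is the last entry."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients (weights followed by intercept) to new rows."""
    return np.column_stack([X, np.ones(len(X))]) @ coef


# Synthetic data standing in for (lat, lon, altitude, month, year) rows.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
true_w = np.array([0.5, -1.0, -2.0, 0.3, 0.1])
y = X @ true_w + 24.0 + rng.normal(scale=0.1, size=200)

train, val, test = split_indices(len(X))
coef = fit_hyperplane(X[train], y[train])
val_mae = np.mean(np.abs(predict(coef, X[val]) - y[val]))
print(coef.round(2), round(float(val_mae), 3))
```

Unlike the averaging heuristic, all coefficients are estimated jointly, so each one reflects the effect of its variable with the others held fixed.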
As mentioned before, continentality effects are most visible in the temperature variation over the course of a day. With the data at hand, the task proposed to us was therefore a bit trickier, as the data provided to us only contained daily average temperatures (without any variance). Besides that, continentality can also
Because of that, our idea was to train a regression model to extrapolate to the cities without data and then analyze the temperature variation as we move away from the sea and towards the continent, i.e., longitudinally.
The following figure was generated by fitting the multivariate regression model to the data, which yielded an R² score of 0.75, in contrast to 0.06 for the average of the linear regression models. In addition, the multivariate model yielded a mean absolute error of 0.93 °C, in contrast to 4.21 °C for the best polynomial model.
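For reference, the two metrics quoted here can be computed as in the sketch below. The temperature values are invented to keep the arithmetic easy to follow; they are not taken from our results.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, in the same unit as the target (here, degrees Celsius)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented values, for illustration only.
y_true = [24.0, 26.0, 22.0, 28.0]
y_pred = [25.0, 25.0, 23.0, 27.0]
print(round(r2_score(y_true, y_pred), 2), mean_absolute_error(y_true, y_pred))
```

An R² near 0 means the model does little better than always predicting the mean temperature, which is how the averaged linear models behaved.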
We gathered data from OpenStreetMap to obtain a list of the cities that belong to the state of Bahia and then crossed it with data from governmental websites to find latitude/longitude points representing each city. With our data filled in, the next step was simply to apply the trained regression model to extrapolate to these cities. We obtained the average temperature for each month of 2019 for each city and then, by averaging over the year, produced the following figure:
Comparing with the relief map below, we observed that the lower temperatures in the previous figure correlate with the higher-altitude regions. With respect to continentality, we can see that temperatures indeed increase as we move towards the center of the continent (longitudinally), as well as when we move up towards the equator.
- Test other encoding functions for temporal data (e.g., sine)
- Deploy an interactive web page.
- Try other models (e.g., MLP).
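As an illustration of the sine encoding mentioned in the first item, one common option is to map each month onto the unit circle with a sine/cosine pair, so that December and January end up adjacent. This is a sketch of the idea, not something we have evaluated; the function names are hypothetical.

```python
import math


def encode_month(month):
    """Map month 1..12 to a (sin, cos) pair on the unit circle."""
    angle = 2 * math.pi * (month - 1) / 12
    return math.sin(angle), math.cos(angle)


def circle_distance(m1, m2):
    """Euclidean distance between two encoded months."""
    (s1, c1), (s2, c2) = encode_month(m1), encode_month(m2)
    return math.hypot(s1 - s2, c1 - c2)


# December is as close to January as January is to February,
# while January and July sit on opposite sides of the circle.
print(round(circle_distance(12, 1), 3), round(circle_distance(1, 2), 3))
print(round(circle_distance(1, 7), 3))
```

With a plain integer encoding, December (12) and January (1) would be 11 units apart, which a linear model reads as the largest possible seasonal difference.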