simplify Dockerfile #113
## Overview

This PR uses updated versions of Python and `prophet` to greatly simplify the Python environment setup in the Dockerfile.

The code has been tested by creating a local Docker container, and sample outputs were written to the following tables in `moz-fx-data-bq-data-science.bochocki`:

- `tmp_desktop_kpi_forecast`
- `tmp_desktop_kpi_forecast_confidences`
- `tmp_mobile_kpi_forecast`
- `tmp_mobile_kpi_forecast_confidences`

## Additional Changes

- `.gitignore`: ignore additional file types.
- `kpi_forecasting.py`: set the confidence intervals' `target` from `config` instead of relying on a hardcoded `"desktop"`. This `target` is overwritten in `write_confidence_intervals_to_bigquery` [here](https://github.com/mozilla/docker-etl/blob/4cfbec915375343023944d1ca23f527251a5ada8/jobs/kpi-forecasting/kpi-forecasting/Utils/DBWriter.py#L116), but I think this change makes it clear that we're not unintentionally using "desktop" labels on "mobile" forecasts.
- `PosteriorSampling.py`: minor refactoring required to resolve errors and deprecation warnings that pandas now raises as a result of the package upgrades.
- `README.md`: update examples.
- `requirements.txt`: update packages to get easier-to-install versions of `prophet` and `statsforecast`.
```diff
     ]
     .groupby("{}".format(aggregation_unit_of_time))
-    .sum()
+    .sum(numeric_only=True)
```
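A minimal sketch of what `numeric_only=True` does here, assuming a small hypothetical frame with a string `type` column alongside the numeric `value` column (the column names other than `value`/`type` and the `"month"` grouping key are assumptions). Since pandas 2.0, the default for `numeric_only` in groupby reductions is `False`, so non-numeric columns are no longer dropped implicitly; passing `numeric_only=True` restores that behavior explicitly.

```python
import pandas as pd

# Hypothetical stand-in for the grouping column passed into PosteriorSampling.py.
aggregation_unit_of_time = "month"

daily = pd.DataFrame(
    {
        aggregation_unit_of_time: ["2023-01", "2023-01", "2023-02"],
        "value": [10.0, 15.0, 20.0],
        "type": ["actual", "actual", "actual"],  # non-numeric column
    }
)

# Only numeric columns are aggregated; the string "type" column is excluded.
monthly = daily.groupby(aggregation_unit_of_time).sum(numeric_only=True)
print(monthly)
```

An alternative with the same effect is to select the numeric column before aggregating, e.g. `daily.groupby(aggregation_unit_of_time)[["value"]].sum()`, which makes the intended columns explicit.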
```python
uncertainty_samples_aggregated.iloc[0, 1:] += observed_aggregated["value"].iloc[
    -1
]
```
This is the same intended logic as before, but the previous code doesn't work in new versions of pandas because `observed_aggregated.iloc[-1].value` doesn't return a single value; it returns an array of values. Using the attribute-style `.value` column access was also confusing, because at first glance it looks like a typo of `.values`, which returns a pandas column's data as a NumPy array.
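A minimal sketch of the bracket-then-`iloc` pattern, using small hypothetical frames in place of the real `observed_aggregated` and `uncertainty_samples_aggregated` (the `month`, `s0`, and `s1` column names and all values are assumptions). Selecting the column first with `["value"]` and then the position with `.iloc[-1]` yields one scalar, which is added to every sample column of the first forecast row.

```python
import pandas as pd

# Hypothetical observed history: a date column followed by the observed value.
observed_aggregated = pd.DataFrame(
    {"month": ["2023-01", "2023-02"], "value": [100.0, 120.0]}
)

# Hypothetical samples: a date column followed by one column per posterior sample.
uncertainty_samples_aggregated = pd.DataFrame(
    {"month": ["2023-03", "2023-04"], "s0": [5.0, 6.0], "s1": [7.0, 8.0]}
)

# Column first, then position: this is a single scalar (the last observed value).
last_observed = observed_aggregated["value"].iloc[-1]

# Shift every sample column (all columns after the date) in the first forecast row.
uncertainty_samples_aggregated.iloc[0, 1:] += last_observed
print(uncertainty_samples_aggregated)
```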
```diff
     columns={"y": "value"}
 ).sort_values(by="{}".format(aggregation_unit_of_time))

+observed_aggregated = observed_aggregated.astype({"value": np.float64})
```
`observed_aggregated["value"]` is being stored as `Int64Dtype`, which is pandas' nullable integer extension type. For some reason, using this type breaks the following merge on line 100:

```python
all_aggregated = pd.merge(
    observed_aggregated,
    uncertainty_samples_aggregated,
    on=["{}".format(aggregation_unit_of_time), "value", "type"],
    how="outer",
)
```

I think using `float64` instead is an okay workaround here, since the values in the confidence intervals are reported as `float64` anyway.
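A minimal sketch of the workaround in isolation, assuming hypothetical frames: `observed_aggregated` starts with a nullable `Int64` `value` column, and the forecast side already holds `float64`. After `astype({"value": np.float64})`, both merge-key `value` columns share one dtype and the outer merge combines the rows. This does not reproduce or explain the original failure; it only shows the effect of the cast.

```python
import numpy as np
import pandas as pd

aggregation_unit_of_time = "month"  # hypothetical grouping column

observed_aggregated = pd.DataFrame(
    {
        aggregation_unit_of_time: ["2023-01", "2023-02"],
        "value": pd.array([100, 120], dtype="Int64"),  # nullable integer dtype
        "type": ["actual", "actual"],
    }
)

# Hypothetical forecast rows, already float64 like the confidence intervals.
uncertainty_samples_aggregated = pd.DataFrame(
    {
        aggregation_unit_of_time: ["2023-03", "2023-03"],
        "value": [131.5, 140.25],
        "type": ["forecast", "forecast"],
    }
)

# The workaround: store observed values as float64 to match the forecast side.
observed_aggregated = observed_aggregated.astype({"value": np.float64})

# No key combinations overlap, so the outer merge keeps all four rows.
all_aggregated = pd.merge(
    observed_aggregated,
    uncertainty_samples_aggregated,
    on=[aggregation_unit_of_time, "value", "type"],
    how="outer",
)
print(all_aggregated)
```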
Very happy to see this PR. LGTM and matches the expectations I had about this work based on prior conversations we've had 👍
Checklist for reviewer:
- … referenced, the pull request should include the bug number in the title)
- … `.circleci/config.yml`) will cause environment variables (particularly credentials) to be exposed in test logs
- … telemetry-airflow responsibly.