Commit 545c6f9

simplify Dockerfile (#113)
* simplify Dockerfile

  ## Overview

  This PR uses updated versions of Python and `prophet` to greatly simplify the Python environment setup in the Dockerfile. The code has been tested by creating a local Docker container, and sample outputs were written to the following tables in `moz-fx-data-bq-data-science.bochocki`:

  - `tmp_desktop_kpi_forecast`
  - `tmp_desktop_kpi_forecast_confidences`
  - `tmp_mobile_kpi_forecast`
  - `tmp_mobile_kpi_forecast_confidences`

  ## Additional Changes

  - `.gitignore`: ignore additional filetypes
  - `kpi_forecasting.py`: set confidence intervals `target` from `config` instead of relying on hardcoded `"desktop"`. This `target` is overwritten in `write_confidence_intervals_to_bigquery` [here](https://github.com/mozilla/docker-etl/blob/4cfbec915375343023944d1ca23f527251a5ada8/jobs/kpi-forecasting/kpi-forecasting/Utils/DBWriter.py#L116), but I think this change makes it clear that we're not unintentionally using "desktop" labels on "mobile" forecasts.
  - `PosteriorSampling.py`: minor refactoring required to resolve errors and deprecation warnings that are now being raised by pandas as a result of package upgrades.
  - `README.md`: update examples
  - `requirements.txt`: updated packages to get easier-install versions of `prophet` and `statsforecast`.

* black format
* change `MAINTAINER` label
* Revert "change `MAINTAINER` label"

  This reverts commit 27229dd.
* include pytest-black
1 parent 2c751a6 commit 545c6f9

7 files changed, with 127 additions and 107 deletions


jobs/kpi-forecasting/.gitignore

Lines changed: 4 additions & 1 deletion
@@ -1,3 +1,6 @@
+.cache
 .idea
-.vscode
+.local
 .python-version
+.python_history
+.vscode

jobs/kpi-forecasting/Dockerfile

Lines changed: 2 additions & 10 deletions
@@ -1,5 +1,5 @@
-FROM python:3.8
-MAINTAINER Perry McManis <pmcmanis@mozilla.com>
+FROM python:3.10
+LABEL maintainer="Brad Ochocki <bochocki@mozilla.com>"
 
 # https://github.com/mozilla-services/Dockerflow/blob/master/docs/building-container.md
 ARG USER_ID="10001"
@@ -12,19 +12,11 @@ RUN groupadd --gid ${USER_ID} ${GROUP_ID} && \
 
 WORKDIR ${HOME}
 
-RUN apt install gcc
-RUN apt install g++
-
 RUN pip install --upgrade pip
 
-RUN pip install pystan==2.19.1.1
-RUN python3 -m pip install prophet --no-cache-dir
-
 COPY requirements.txt requirements.txt
 RUN pip install -r requirements.txt
 
-RUN pip install git+https://github.com/Nixtla/statsforecast.git
-
 COPY . .
 
 # Drop root and change ownership of the application folder to the user

jobs/kpi-forecasting/README.md

Lines changed: 2 additions & 2 deletions
@@ -21,9 +21,9 @@ pip install -r requirements.txt
 Run the scripts with:
 
 ```sh
-python kpi_forecasting.py -c yaml/desktop.yaml
+python ~/kpi-forecasting/kpi_forecasting.py -c ~/kpi-forecasting/yaml/desktop_non_cumulative.yaml
 
-python kpi_forecasting.py -c yaml/mobile.yaml
+python ~/kpi-forecasting/kpi_forecasting.py -c ~/kpi-forecasting/yaml/mobile_non_cumulative.yaml
 ```
 
 ### On SQL Queries And Preprocessing

jobs/kpi-forecasting/kpi-forecasting/Utils/AutoArimaFit.py

Lines changed: 0 additions & 1 deletion
@@ -9,7 +9,6 @@
 
 
 def run_forecast_arima(dataset: pd.DataFrame, config: dict) -> pd.DataFrame:
-
     fit_parameters = config[
         "forecast_parameters"
     ].copy()  # you must force a copy here or it assigns a reference to

jobs/kpi-forecasting/kpi-forecasting/Utils/PosteriorSampling.py

Lines changed: 9 additions & 6 deletions
@@ -31,10 +31,9 @@ def get_confidence_intervals(
             uncertainty_samples["ds"] > np.datetime64(final_observed_sample_date)
         ]
         .groupby("{}".format(aggregation_unit_of_time))
-        .sum()
+        .sum(numeric_only=True)
     )
 
-    print(samples_df_grouped.tail())
     # start the aggregated dataframe with the mean of the uncertainty samples
     uncertainty_samples_aggregated = samples_df_grouped.mean(axis=1).reset_index()
 
@@ -71,6 +70,8 @@ def get_confidence_intervals(
         columns={"y": "value"}
     ).sort_values(by="{}".format(aggregation_unit_of_time))
 
+    observed_aggregated = observed_aggregated.astype({"value": np.float64})
+
     # check if whether there are overlap in actual and forecast at the group level
     if (
         aggregation_unit_of_time == "ds_month"
@@ -83,10 +84,12 @@ def get_confidence_intervals(
         ).dayofyear
         != 1
     ):
-        uncertainty_samples_aggregated.at[0, 1:] = (
-            uncertainty_samples_aggregated.iloc[0, 1:]
-            + observed_aggregated.iloc[-1].value
-        )
+        # add observed samples from current time period to uncertainty samples for
+        # the remainder of the period.
+        uncertainty_samples_aggregated.iloc[0, 1:] += observed_aggregated["value"].iloc[
+            -1
+        ]
+
         observed_aggregated = observed_aggregated.loc[
             observed_aggregated[aggregation_unit_of_time]
             < observed_aggregated[aggregation_unit_of_time].max()
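
Note on the `PosteriorSampling.py` hunks: the following is a minimal sketch, on toy data, of the two pandas behaviours the change appears to rely on. It is not the project's code. The frame contents and the column names `s0` and `s1` are assumptions made for illustration; only `ds`, `ds_month`, `value`, `numeric_only=True`, and the `.iloc[0, 1:] +=` pattern come from the diff.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the uncertainty samples (assumed shape): a datetime column,
# a grouping column, and two hypothetical numeric sample columns.
samples = pd.DataFrame(
    {
        "ds": pd.to_datetime(["2023-05-01", "2023-05-02", "2023-06-01"]),
        "ds_month": ["2023-05", "2023-05", "2023-06"],
        "s0": [1.0, 2.0, 3.0],
        "s1": [10.0, 20.0, 30.0],
    }
)

# numeric_only=True restricts the sum to numeric columns, so the datetime
# column "ds" is left out instead of being passed to sum().
grouped = samples.groupby("ds_month").sum(numeric_only=True)
aggregated = grouped.reset_index()

# Toy observed totals; the integer column is cast to float64 before it is
# added to the float sample columns, mirroring the astype call in the diff.
observed = pd.DataFrame({"ds_month": ["2023-04", "2023-05"], "value": [100, 7]})
observed = observed.astype({"value": np.float64})

# Add the most recent observed total to every sample column of the first
# forecast row (position 0), skipping the grouping column at position 0.
aggregated.iloc[0, 1:] += observed["value"].iloc[-1]

print(aggregated)
```

Only the first row changes, because the partially observed period is the only one whose forecast needs the already-observed total added to it.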

jobs/kpi-forecasting/kpi-forecasting/kpi_forecasting.py

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ def main() -> None:
         aggregation_unit_of_time=config["confidences"],
         asofdate=predictions["ds"].max(),
         final_observed_sample_date=dataset["ds"].max(),
-        target="desktop",
+        target=config["target"],
     )
 
     write_predictions_to_bigquery(predictions, config)
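
Note on the `target` change: a small sketch of why reading the label from the config matters. The helper `confidence_interval_kwargs` and the two config dictionaries are hypothetical and stand in for the parsed YAML files; only the keys `target` and `confidences` are taken from the diff.

```python
def confidence_interval_kwargs(config: dict) -> dict:
    """Hypothetical helper: build the config-dependent keyword arguments."""
    return {
        "aggregation_unit_of_time": config["confidences"],
        # Read the label from the config instead of hardcoding "desktop".
        "target": config["target"],
    }


# Assumed shape of the parsed YAML configs; the real files carry more keys.
desktop_config = {"target": "desktop", "confidences": "ds_month"}
mobile_config = {"target": "mobile", "confidences": "ds_month"}

desktop_kwargs = confidence_interval_kwargs(desktop_config)
mobile_kwargs = confidence_interval_kwargs(mobile_config)
print(desktop_kwargs["target"], mobile_kwargs["target"])
```

With a hardcoded `"desktop"`, the mobile run would carry the wrong label until a later step overwrote it.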

jobs/kpi-forecasting/requirements.txt

Lines changed: 109 additions & 86 deletions
@@ -1,94 +1,117 @@
+adagio==0.2.4
+ansi2html==1.8.0
+antlr4-python3-runtime==4.11.1
 appdirs==1.4.4
-attrs==20.3.0
-bcrypt==3.2.0
-beautifulsoup4==4.10.0
-BigQuery-Python==1.15.0
-black==22.3.0
-cachetools==4.2.4
-certifi==2021.10.8
-cffi==1.15.0
-charset-normalizer==2.0.12
-click==8.0.4
-cmdstanpy==0.9.68
+asttokens==2.2.1
+backcall==0.2.0
+black==23.3.0
+blinker==1.6.2
+cachetools==5.3.0
+certifi==2023.5.7
+charset-normalizer==3.1.0
+click==8.1.3
+cmdstanpy==1.1.0
+comm==0.1.3
+contourpy==1.0.7
 convertdate==2.4.0
-cryptography==36.0.1
 cycler==0.11.0
-Cython==0.29.28
-ephem==4.1.3
-flake8==3.8.4
-google-api-core==1.31.5
-google-api-python-client==2.38.0
-google-auth-httplib2==0.1.0
-google-auth-oauthlib==0.5.0
-google-auth==1.35.0
-google-cloud-bigquery-storage==1.0.0
-google-cloud-bigquery==1.27.2
-google-cloud-core==1.7.2
-google-cloud-storage==1.31.0
-google-crc32c==1.3.0
-google-resumable-media==1.3.3
-google==3.0.0
-googleapis-common-protos==1.55.0
-grpcio==1.44.0
-hijri-converter==2.2.3
-holidays==0.16
-httplib2==0.20.4
-idna==3.3
-iniconfig==1.1.1
-Jinja2==2.11.2
-joblib==1.2.0
-kiwisolver==1.3.2
-korean-lunar-calendar==0.2.1
+dash==2.9.3
+dash-core-components==2.0.0
+dash-html-components==2.0.0
+dash-table==5.0.0
+db-dtypes==1.1.1
+debugpy==1.6.7
+decorator==5.1.1
+ephem==4.1.4
+exceptiongroup==1.1.1
+executing==1.2.0
+Flask==2.3.2
+fonttools==4.39.3
+fs==2.4.16
+fugue==0.8.3
+fugue-sql-antlr==0.1.6
+google-api-core==2.11.0
+google-auth==2.17.3
+google-cloud-bigquery==3.10.0
+google-cloud-core==2.3.2
+google-crc32c==1.5.0
+google-resumable-media==2.5.0
+googleapis-common-protos==1.59.0
+grpcio==1.54.0
+grpcio-status==1.54.0
+hijri-converter==2.3.1
+holidays==0.24
+idna==3.4
+iniconfig==2.0.0
+ipykernel==6.23.0
+ipython==8.13.2
+itsdangerous==2.1.2
+jedi==0.18.2
+Jinja2==3.1.2
+jupyter-dash==0.4.2
+jupyter_client==8.2.0
+jupyter_core==5.3.0
+kiwisolver==1.4.4
+korean-lunar-calendar==0.3.1
+llvmlite==0.40.0
 LunarCalendar==0.0.9
-MarkupSafe==1.1.1
-matplotlib==3.3.2
-mccabe==0.6.1
-more-itertools==8.6.0
-mypy-extensions==0.4.3
-numpy
-oauthlib==3.2.0
-packaging==21.3
-pandas-gbq==0.13.2
-pandas==1.3.5
-paramiko==2.9.2
-pathspec==0.9.0
-Pillow==9.0.1
-plotly==4.9.0
-pluggy==0.13.1
-protobuf==3.19.4
-py==1.10.0
-pyarrow==7.0.0
-pyasn1-modules==0.2.8
-pyasn1==0.4.8
-pycodestyle==2.6.0
-pycparser==2.21
-pydata-google-auth==1.3.0
-pyflakes==2.2.0
-PyMeeus==0.5.11
-PyNaCl==1.5.0
-pyparsing==2.4.7
-pytest-black==0.3.11
-pytest-flake8==1.0.6
-pytest==6.0.2
+MarkupSafe==2.1.2
+matplotlib==3.7.1
+matplotlib-inline==0.1.6
+mypy-extensions==1.0.0
+nest-asyncio==1.5.6
+numba==0.57.0
+numpy==1.24.3
+orjson==3.8.12
+packaging==23.1
+pandas==1.5.3
+parso==0.8.3
+pathspec==0.11.1
+patsy==0.5.3
+pexpect==4.8.0
+pickleshare==0.7.5
+Pillow==9.5.0
+platformdirs==3.5.0
+plotly==5.14.1
+plotly-resampler==0.8.3.2
+pluggy==1.0.0
+prompt-toolkit==3.0.38
+prophet==1.1.2
+proto-plus==1.22.2
+protobuf==4.23.0
+psutil==5.9.5
+ptyprocess==0.7.0
+pure-eval==0.2.2
+pyarrow==12.0.0
+pyasn1==0.5.0
+pyasn1-modules==0.3.0
+Pygments==2.15.1
+PyMeeus==0.5.12
+pyparsing==3.0.9
+pytest==7.3.1
+pytest-black==0.3.12
 python-dateutil==2.8.2
-pytz==2021.3
+pytz==2023.3
 PyYAML==6.0
-regex==2020.11.13
-requests-oauthlib==1.3.1
-requests==2.27.1
-retrying==1.3.3
-rsa==4.8
-setuptools-git==1.2
+pyzmq==25.0.2
+qpd==0.4.1
+requests==2.30.0
+retrying==1.3.4
+rsa==4.9
+scipy==1.10.1
 six==1.16.0
-soupsieve==2.3.1
-statsforecast==1.1.0
-statsmodels==0.13.2
-storage==0.0.4.3
-threadpoolctl==3.1.0
+sqlglot==12.2.0
+stack-data==0.6.2
+statsforecast==1.5.0
+statsmodels==0.14.0
+tenacity==8.2.2
 toml==0.10.2
-tqdm==4.63.0
-typed-ast==1.5.4
-typing-extensions==3.10.0.0
-ujson==5.1.0
-uritemplate==4.1.1
-urllib3==1.26.8
+tomli==2.0.1
+tornado==6.3.1
+tqdm==4.65.0
+trace-updater==0.0.9.1
+traitlets==5.9.0
+triad==0.8.7
+urllib3==2.0.2
+wcwidth==0.2.6
+Werkzeug==2.3.4
