diff --git a/_config.yml b/_config.yml index b42c373..0629e9e 100644 --- a/_config.yml +++ b/_config.yml @@ -36,8 +36,9 @@ html: use_repository_button: true extra_footer : This book is available under an MIT license. baseurl : "https://aeturrell.github.io/coding-for-economists" - google_analytics_id : "G-9FZQCPFXZJ" # A GA id that can be used to track book views. use_multitoc_numbering : false + analytics: + google_analytics_id : "G-9FZQCPFXZJ" # A GA id that can be used to track book views. # NB this only works if repo info filled in, and if # repo is public. See https://jupyterbook.org/interactive/launchbuttons.html launch_buttons: diff --git a/auto-research-outputs.md b/auto-research-outputs.md index 2814827..d8ce35d 100644 --- a/auto-research-outputs.md +++ b/auto-research-outputs.md @@ -116,9 +116,39 @@ reg_results = Stargazer([est, est2]) reg_results ``` -which can similarly be cast into $\LaTeX$ using `reg_results.render_latex()`. +```{code-cell} ipython3 +import numpy as np +import pandas as pd +#import pylatex as pl # for the latex table; note: not a dependency of pyfixest - needs manual installation +from great_tables import loc, style +from IPython.display import FileLink, display + +import pyfixest as pf + +data = pf.get_data() + +fit1 = pf.feols("Y ~ X1 + X2 | f1", data=data) +fit2 = pf.feols("Y ~ X1 + X2 | f1 + f2", data=data) +fit3 = pf.feols("Y2 ~ X1 + X2 | f1", data=data) +fit4 = pf.feols("Y2 ~ X1 + X2 | f1 + f2", data=data) + +pf.etable([fit1, fit2, fit3, fit4,]) +``` + +which can be cast into $\LaTeX$ using `type="tex"`. + +```{code-cell} ipython3 +tab = pf.etable( + [fit1, fit2, fit3, fit4], + digits=2, + type="tex", + print_tex=True, +) + +tab +``` -We'd like to export tables like this into files that can be picked up by our $\LaTeX$ document. We must first save it to the right place from Python. This would be +We'd like to export tables like this into files that can be picked up by our $\LaTeX$ document. We must first save it to the right place from Python. Assuming you have the folders "outputs/tables" relative to your working directory, this would be ```python from pathlib import Path @@ -131,7 +161,7 @@ in the first example, and ```python from pathlib import Path with open(Path('outputs/tables/reg_table.tex'), 'w') as f: - f.write(reg_results.render_latex()) + f.write(tab) ``` in the second. Remember that `Path` is a clever module that will find the relevant file path regardless of which operating system you happen to be using at the time. This is especially useful when you have co-authors on different systems! diff --git a/code-preliminaries.md b/code-preliminaries.md index 3822ed1..4ba0009 100644 --- a/code-preliminaries.md +++ b/code-preliminaries.md @@ -119,7 +119,7 @@ powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | ie for Windows. Hit return to execute the commands. -Once you have installed uv, you can check it's in properly by running `uv --version`. You should see a message pop up that says "uv" and then the latest version number. +Once you have installed uv, you can check it's installed properly by running `uv --version`. You should see a message pop up that says "uv" and then the latest version number. ### Installing your integrated development environment, Visual Studio Code diff --git a/craft-writing-papers.md b/craft-writing-papers.md index 2f1a66b..1b1b76c 100644 --- a/craft-writing-papers.md +++ b/craft-writing-papers.md @@ -321,9 +321,6 @@ Once you've written a draft you're happy with, there are a bunch of checks you c - Look back at your figures and tables, and be brutal. Do you need them all? Do they all convey important messages that a reader cannot get from the text? In general, you can have up to approximately 10 floats before an editor or referee may wonder if they have picked up a picture book rather than a journal article. Naturally, in some special cases—for example, if the paper is about data visualisation—you may feel warranted in having more. But a good check is whether you can tell the story in just four floats, and the most important result in just one. -![First drafts](https://quotesnhumor.com/wp-content/uploads/2018/04/Writing-memes7.jpg) -*First drafts* - ## Further Resources Two extremely good general resources on writing are {cite:t}`zinsser2006writing` and {cite:t}`white1972elements`. For a more in-depth take on writing papers (specific to applied economics papers), see {cite:t}`bellemare2020write`. diff --git a/data-analysis-quickstart.ipynb b/data-analysis-quickstart.ipynb index 4574508..f0fc1fe 100644 --- a/data-analysis-quickstart.ipynb +++ b/data-analysis-quickstart.ipynb @@ -486,7 +486,7 @@ "metadata": {}, "outputs": [], "source": [ - "table = df[[\"mass\", \"height\"]].agg([np.mean, np.std])\n", + "table = df[[\"mass\", \"height\"]].agg([\"mean\", \"std\"])\n", "table" ] }, diff --git a/data-categorical.ipynb b/data-categorical.ipynb index 7c5c4da..60849da 100644 --- a/data-categorical.ipynb +++ b/data-categorical.ipynb @@ -350,7 +350,7 @@ "outputs": [], "source": [ "time_df = pd.DataFrame(\n", - " pd.Series(pd.date_range(\"2015/05/01\", periods=5, freq=\"M\"), dtype=\"category\"),\n", + " pd.Series(pd.date_range(\"2015/05/01\", periods=5, freq=\"ME\"), dtype=\"category\"),\n", " columns=[\"datetime\"],\n", ")\n", "time_df" @@ -386,9 +386,6 @@ } ], "metadata": { - "interpreter": { - "hash": "9d7534ecd9fbc7d385378f8400cf4d6cb9c6175408a574f1c99c5269f08771cc" - }, "jupytext": { "cell_metadata_filter": "-all", "encoding": "# -*- coding: utf-8 -*-", @@ -396,7 +393,7 @@ "main_language": "python" }, "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": "codeforecon", "language": "python", "name": "python3" }, @@ -410,7 +407,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.12" + "version": "3.10.16" }, "toc-showtags": true }, diff --git a/data-databases.ipynb b/data-databases.ipynb index 8843b09..8bef2f6 100644 --- a/data-databases.ipynb +++ b/data-databases.ipynb @@ -788,7 +788,7 @@ "main_language": "python" }, "kernelspec": { - "display_name": "Python 3.10.12 ('codeforecon')", + "display_name": "codeforecon", "language": "python", "name": "python3" }, @@ -802,14 +802,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.13" + "version": "3.10.16" }, - "toc-showtags": true, - "vscode": { - "interpreter": { - "hash": "c4570b151692b3082981c89d172815ada9960dee4eb0bedb37dc10c95601d3bd" - } - } + "toc-showtags": true }, "nbformat": 4, "nbformat_minor": 5 diff --git a/data-exploratory-analysis.ipynb b/data-exploratory-analysis.ipynb index 4c36ebb..732306e 100644 --- a/data-exploratory-analysis.ipynb +++ b/data-exploratory-analysis.ipynb @@ -646,10 +646,6 @@ "source": [ "## The **ydata-profiling** package\n", "\n", - "```{warning}\n", - "The live example in this section is not currently working due to **ydata-profiling**'s dependency on an older version of **numpy**. If you need to use something from this section, you probably still can: *coding for economists* has a huge number of dependencies, much more than a normal project, and this particular problem may not affect you. If you want to help, you can show your support for a resolution on [this GitHub issue](https://github.com/ydataai/ydata-profiling/issues/1456)—though please do bear in mind that most open source libraries are run by volunteers, and you should always be constructive in your interactions. The second is to contribute to the library yourself by creating a pull request that fixes the problem.\n", - "```\n", - "\n", "The EDA we did using the built-in **pandas** functions was a bit limited and user-input heavy. The [**ydata-profiling**](https://docs.profiling.ydata.ai/) library aims to automate the legwork of EDA for you. It generates 'profile' reports from a pandas DataFrame. For each column, many statistics are computed and then relayed in an interactive HTML report.\n", "\n", "Let's generate a report on our dataset using the `minimal=True` setting (the default settings produce a lot of computationally expensive extras):\n" @@ -661,13 +657,12 @@ "metadata": {}, "outputs": [], "source": [ - "# from ydata_profiling import ProfileReport\n", - "\n", + "from ydata_profiling import ProfileReport\n", "\n", - "# profile = ProfileReport(\n", - "# df, minimal=True, title=\"Profiling Report: Grinnell House Sales\"\n", - "# )\n", - "# profile.to_notebook_iframe()" + "profile = ProfileReport(\n", + " df, minimal=True, title=\"Profiling Report: Grinnell House Sales\"\n", + ")\n", + "profile.to_notebook_iframe()" ] }, { @@ -708,7 +703,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3.10.12 ('codeforecon')", + "display_name": "codeforecon", "language": "python", "name": "python3" }, @@ -722,12 +717,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.13" - }, - "vscode": { - "interpreter": { - "hash": "c4570b151692b3082981c89d172815ada9960dee4eb0bedb37dc10c95601d3bd" - } + "version": "3.10.16" } }, "nbformat": 4, diff --git a/data-intro.ipynb b/data-intro.ipynb index bd3a297..359a293 100644 --- a/data-intro.ipynb +++ b/data-intro.ipynb @@ -1375,7 +1375,7 @@ "metadata": {}, "outputs": [], "source": [ - "index = pd.date_range(\"1/1/2000\", periods=12, freq=\"Q\")\n", + "index = pd.date_range(\"1/1/2000\", periods=12, freq=\"QE\")\n", "df = pd.DataFrame(np.random.randint(0, 10, (12, 5)), index=index, columns=list(\"ABCDE\"))\n", "df" ] diff --git a/data-joining-data.ipynb b/data-joining-data.ipynb index 527f5a0..a3da8c5 100644 --- a/data-joining-data.ipynb +++ b/data-joining-data.ipynb @@ -73,7 +73,7 @@ "source": [ "import pandas as pd\n", "\n", - "base_url = \"http://www.stata-press.com/data/r14/\"\n", + "base_url = \"https://github.com/aeturrell/coding-for-economists/raw/refs/heads/general-hygiene/data/\" # TODO change to main post merge\n", "state_codes = [\"ca\", \"il\"]\n", "end_url = \"pop.dta\"\n", "\n", @@ -231,9 +231,6 @@ } ], "metadata": { - "interpreter": { - "hash": "9d7534ecd9fbc7d385378f8400cf4d6cb9c6175408a574f1c99c5269f08771cc" - }, "jupytext": { "cell_metadata_filter": "-all", "encoding": "# -*- coding: utf-8 -*-", @@ -241,7 +238,7 @@ "main_language": "python" }, "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": "codeforecon", "language": "python", "name": "python3" }, @@ -255,7 +252,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.12" + "version": "3.10.16" }, "toc-showtags": true }, diff --git a/data-missing-values.ipynb b/data-missing-values.ipynb index 1651a1d..4d7d3ad 100644 --- a/data-missing-values.ipynb +++ b/data-missing-values.ipynb @@ -222,7 +222,7 @@ "metadata": {}, "outputs": [], "source": [ - "nan_df.fillna(method=\"ffill\")" + "nan_df.ffill()" ] }, { @@ -232,7 +232,7 @@ "metadata": {}, "outputs": [], "source": [ - "nan_df.fillna(method=\"bfill\")" + "nan_df.bfill()" ] }, { @@ -561,7 +561,7 @@ "metadata": {}, "outputs": [], "source": [ - "health_cut.groupby(\"smoker\")[\"age\"].mean()" + "health_cut.groupby(\"smoker\", observed=False)[\"age\"].mean()" ] }, { @@ -581,7 +581,7 @@ "main_language": "python" }, "kernelspec": { - "display_name": "Python 3.8.13 ('codeforecon')", + "display_name": "codeforecon", "language": "python", "name": "python3" }, @@ -595,14 +595,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.12" + "version": "3.10.16" }, - "toc-showtags": true, - "vscode": { - "interpreter": { - "hash": "caf5ac9f613b176c5984ad2a1a4525760eb7d898a3291351da4c152dc719ffa1" - } - } + "toc-showtags": true }, "nbformat": 4, "nbformat_minor": 5 diff --git a/data-numbers.ipynb b/data-numbers.ipynb index b18f715..dcd1223 100644 --- a/data-numbers.ipynb +++ b/data-numbers.ipynb @@ -108,7 +108,7 @@ "outputs": [], "source": [ "(\n", - " flights.groupby([\"dest\"])\n", + " flights.groupby([\"dest\"], observed=False)\n", " .agg(\n", " mean_delay=(\"dep_delay\", \"mean\"),\n", " count_flights=(\"dest\", \"count\"),\n", @@ -132,7 +132,7 @@ "metadata": {}, "outputs": [], "source": [ - "(flights.groupby(\"tailnum\").agg(miles=(\"distance\", \"sum\")))" + "(flights.groupby(\"tailnum\", observed=False).agg(miles=(\"distance\", \"sum\")))" ] }, { @@ -150,7 +150,11 @@ "metadata": {}, "outputs": [], "source": [ - "(flights.groupby(\"dest\").agg(n_cancelled=(\"dep_time\", lambda x: x.isnull().sum())))" + "(\n", + " flights.groupby(\"dest\", observed=False).agg(\n", + " n_cancelled=(\"dep_time\", lambda x: x.isnull().sum())\n", + " )\n", + ")" ] }, { @@ -772,9 +776,6 @@ } ], "metadata": { - "interpreter": { - "hash": "9d7534ecd9fbc7d385378f8400cf4d6cb9c6175408a574f1c99c5269f08771cc" - }, "jupytext": { "cell_metadata_filter": "-all", "encoding": "# -*- coding: utf-8 -*-", @@ -782,7 +783,7 @@ "main_language": "python" }, "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": "codeforecon", "language": "python", "name": "python3" }, @@ -796,7 +797,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.13" + "version": "3.10.16" }, "toc-showtags": true }, diff --git a/data-spreadsheets.ipynb b/data-spreadsheets.ipynb index 1fcf79e..709cdfd 100644 --- a/data-spreadsheets.ipynb +++ b/data-spreadsheets.ipynb @@ -125,7 +125,7 @@ " \"data/students.xlsx\",\n", " names=[\"student_id\", \"full_name\", \"favourite_food\", \"meal_plan\", \"age\"],\n", ")\n", - "students[\"age\"] = students[\"age\"].replace(\"five\", 5)\n", + "students[\"age\"] = students[\"age\"].replace(\"five\", \"5\").astype(float)\n", "students" ] }, @@ -422,7 +422,7 @@ "main_language": "python" }, "kernelspec": { - "display_name": "Python 3.8.13 ('codeforecon')", + "display_name": "codeforecon", "language": "python", "name": "python3" }, @@ -436,14 +436,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.13" + "version": "3.10.16" }, - "toc-showtags": true, - "vscode": { - "interpreter": { - "hash": "caf5ac9f613b176c5984ad2a1a4525760eb7d898a3291351da4c152dc719ffa1" - } - } + "toc-showtags": true }, "nbformat": 4, "nbformat_minor": 5 diff --git a/data-transformation.ipynb b/data-transformation.ipynb index 2d0ba03..9b4e3f1 100644 --- a/data-transformation.ipynb +++ b/data-transformation.ipynb @@ -286,7 +286,7 @@ "metadata": {}, "outputs": [], "source": [ - "index = pd.date_range(\"1/1/2000\", periods=10, freq=\"Q\")\n", + "index = pd.date_range(\"1/1/2000\", periods=10, freq=\"QE\")\n", "data = np.random.randint(0, 10, (10, 2))\n", "df = pd.DataFrame(data, index=index, columns=[\"values1\", \"values2\"])\n", "df[\"type\"] = np.random.choice([\"group\" + str(i) for i in range(3)], 10)\n", @@ -780,7 +780,7 @@ "metadata": {}, "outputs": [], "source": [ - "df = pd.read_csv(\"https://calmcode.io/datasets/birthdays.csv\")\n", + "df = pd.read_csv(\"https://calmcode.io/static/data/birthdays.csv\")\n", "df[\"date\"] = pd.to_datetime(df[\"date\"])\n", "df = df.set_index(\"date\")\n", "df.head()" @@ -957,11 +957,9 @@ ], "metadata": { "celltoolbar": "Tags", - "interpreter": { - "hash": "c4570b151692b3082981c89d172815ada9960dee4eb0bedb37dc10c95601d3bd" - }, "kernelspec": { - "display_name": "Python 3.10.12 64-bit ('codeforecon': conda)", + "display_name": "codeforecon", + "language": "python", "name": "python3" }, "language_info": { @@ -974,7 +972,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.13" + "version": "3.10.16" } }, "nbformat": 4, diff --git a/econmt-bayes-bambi.ipynb b/econmt-bayes-bambi.ipynb index dd4f117..a06af46 100644 --- a/econmt-bayes-bambi.ipynb +++ b/econmt-bayes-bambi.ipynb @@ -679,7 +679,7 @@ ")\n", "ax.scatter(\n", " df_sch[\"frac\"],\n", - " idata_mean.posterior.stay_mean.mean(axis=0).mean(axis=0),\n", + " idata_mean.posterior[\"p\"].mean(axis=0).mean(axis=0),\n", " label=\"posterior mean\",\n", " color=\"C1\",\n", " alpha=dot_transp,\n", @@ -704,7 +704,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3.8.13 ('codeforecon')", + "display_name": "codeforecon", "language": "python", "name": "python3" }, @@ -718,12 +718,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.12" - }, - "vscode": { - "interpreter": { - "hash": "caf5ac9f613b176c5984ad2a1a4525760eb7d898a3291351da4c152dc719ffa1" - } + "version": "3.10.16" } }, "nbformat": 4, diff --git a/econmt-diagnostics.ipynb b/econmt-diagnostics.ipynb index 08fe3ff..26bc2c2 100644 --- a/econmt-diagnostics.ipynb +++ b/econmt-diagnostics.ipynb @@ -618,7 +618,7 @@ "metadata": { "celltoolbar": "Tags", "kernelspec": { - "display_name": "Python 3.8.13 ('codeforecon')", + "display_name": "codeforecon", "language": "python", "name": "python3" }, @@ -632,12 +632,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.13" - }, - "vscode": { - "interpreter": { - "hash": "caf5ac9f613b176c5984ad2a1a4525760eb7d898a3291351da4c152dc719ffa1" - } + "version": "3.10.16" } }, "nbformat": 4, diff --git a/econmt-regression.ipynb b/econmt-regression.ipynb index 27aed4d..01c9e30 100644 --- a/econmt-regression.ipynb +++ b/econmt-regression.ipynb @@ -938,7 +938,7 @@ "metadata": { "celltoolbar": "Tags", "kernelspec": { - "display_name": "Python 3.8.13 ('codeforecon')", + "display_name": "codeforecon", "language": "python", "name": "python3" }, @@ -952,12 +952,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.13" - }, - "vscode": { - "interpreter": { - "hash": "caf5ac9f613b176c5984ad2a1a4525760eb7d898a3291351da4c152dc719ffa1" - } + "version": "3.10.16" } }, "nbformat": 4, diff --git a/environment.yml b/environment.yml index 8747b6e..78f823d 100644 --- a/environment.yml +++ b/environment.yml @@ -3,7 +3,7 @@ channels: - conda-forge dependencies: - jupyter - - numpy + - numpy<2.0.0 - pandas>=2.1.0 - pip - python=3.10 @@ -65,18 +65,19 @@ dependencies: - great_tables - polars - ibis-sqlite + - sqlmodel + - ydata_profiling + - scikit-learn==1.5.2 - pip: - specification_curve - stargazer - matplotlib-scalebar - skimpy - graphviz - - sqlmodel - binsreg - feature-engine - lets-plot>=4.3.0 - palmerpenguins - pyfixest>=0.17.0 - watermark - - ruff - pdftotext diff --git a/geo-intro.ipynb b/geo-intro.ipynb index 0312603..46f847e 100644 --- a/geo-intro.ipynb +++ b/geo-intro.ipynb @@ -1027,8 +1027,9 @@ "outputs": [], "source": [ "boros_df[[\"geometry\", \"address\"]] = gpd.tools.geocode(\n", - " boros_df.boro_name, provider=\"photon\"\n", - ")\n", + " boros_df.boro_name,\n", + " provider=\"photon\",\n", + ").set_geometry(\"geometry\")\n", "boros_df" ] }, @@ -1239,11 +1240,9 @@ ], "metadata": { "celltoolbar": "Tags", - "interpreter": { - "hash": "c4570b151692b3082981c89d172815ada9960dee4eb0bedb37dc10c95601d3bd" - }, "kernelspec": { - "display_name": "Python 3.10.12 64-bit ('codeforecon': conda)", + "display_name": "codeforecon", + "language": "python", "name": "python3" }, "language_info": { @@ -1256,7 +1255,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.13" + "version": "3.10.16" } }, "nbformat": 4, diff --git a/geo-vis.ipynb b/geo-vis.ipynb index 0b1fd23..66329f5 100644 --- a/geo-vis.ipynb +++ b/geo-vis.ipynb @@ -373,7 +373,7 @@ "outputs": [], "source": [ "collisions = gpd.read_file(gplt.datasets.get_path(\"nyc_collision_factors\"))\n", - "# gplt.quadtree(collisions, nmax=1);" + "gplt.quadtree(collisions, nmax=1);" ] }, { @@ -390,14 +390,14 @@ "outputs": [], "source": [ "boroughs = gpd.read_file(gplt.datasets.get_path(\"nyc_boroughs\"))\n", - "# gplt.quadtree(\n", - "# collisions,\n", - "# nmax=1,\n", - "# projection=gcrs.AlbersEqualArea(),\n", - "# clip=boroughs.simplify(0.001),\n", - "# facecolor=\"lightgray\",\n", - "# edgecolor=\"white\",\n", - "# );" + "gplt.quadtree(\n", + " collisions,\n", + " nmax=1,\n", + " projection=gcrs.AlbersEqualArea(),\n", + " clip=boroughs.simplify(0.001),\n", + " facecolor=\"lightgray\",\n", + " edgecolor=\"white\",\n", + ");" ] }, { @@ -413,17 +413,17 @@ "metadata": {}, "outputs": [], "source": [ - "# gplt.quadtree(\n", - "# collisions,\n", - "# nmax=1,\n", - "# agg=np.mean,\n", - "# projection=gcrs.AlbersEqualArea(),\n", - "# clip=boroughs,\n", - "# hue=\"NUMBER OF PEDESTRIANS INJURED\",\n", - "# cmap=\"plasma\",\n", - "# edgecolor=\"k\",\n", - "# legend=True,\n", - "# );" + "gplt.quadtree(\n", + " collisions,\n", + " nmax=1,\n", + " agg=np.mean,\n", + " projection=gcrs.AlbersEqualArea(),\n", + " clip=boroughs,\n", + " hue=\"NUMBER OF PEDESTRIANS INJURED\",\n", + " cmap=\"plasma\",\n", + " edgecolor=\"k\",\n", + " legend=True,\n", + ");" ] }, { @@ -641,11 +641,10 @@ "source": [ "import osmnx as ox\n", "\n", - "coffee_shops = ox.features_from_place(\n", + "coffee_shops = ox.features_from_address(\n", " \"Canary Wharf, London, UK\",\n", " tags={\"amenity\": \"cafe\"},\n", - " buffer_dist=300,\n", - " which_result=1,\n", + " dist=300,\n", ")\n", "coffee_shops = coffee_shops.to_crs(\"EPSG:3857\")" ] @@ -759,11 +758,9 @@ ], "metadata": { "celltoolbar": "Tags", - "interpreter": { - "hash": "671f4d32165728098ed6607f79d86bfe6b725b450a30021a55936f1af379a247" - }, "kernelspec": { - "display_name": "Python 3.10.12 64-bit ('codeforecon': conda)", + "display_name": "codeforecon", + "language": "python", "name": "python3" }, "language_info": { @@ -776,7 +773,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.13" + "version": "3.10.16" } }, "nbformat": 4, diff --git a/maths-numerical.ipynb b/maths-numerical.ipynb index 3805375..82a770e 100644 --- a/maths-numerical.ipynb +++ b/maths-numerical.ipynb @@ -710,12 +710,12 @@ "metadata": {}, "outputs": [], "source": [ - "from scipy.integrate import simps\n", + "from scipy.integrate import simpson\n", "\n", "x = np.arange(0, 10)\n", "f_of_x = np.arange(0, 10)\n", "\n", - "simps(f_of_x, x) - 9**2 / 2" + "simpson(f_of_x, x) - 9**2 / 2" ] }, { @@ -761,7 +761,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.12" + "version": "3.10.16" }, "vscode": { "interpreter": { diff --git a/ml-data.ipynb b/ml-data.ipynb index a1582a0..3ff4776 100644 --- a/ml-data.ipynb +++ b/ml-data.ipynb @@ -869,7 +869,7 @@ " \"City\": [\"London\", \"Manchester\", \"Liverpool\", \"Bristol\"],\n", " \"Age\": [20, 21, 19, 18],\n", " \"Marks\": [0.9, 0.8, 0.7, 0.6],\n", - " \"dob\": pd.date_range(\"2020-02-24\", periods=4, freq=\"T\"),\n", + " \"dob\": pd.date_range(\"2020-02-24\", periods=4, freq=\"min\"),\n", " }\n", ")\n", "\n", @@ -1076,7 +1076,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.13" + "version": "3.10.16" } }, "nbformat": 4, diff --git a/ml-sup.ipynb b/ml-sup.ipynb index ac9e1ea..6b98e3a 100644 --- a/ml-sup.ipynb +++ b/ml-sup.ipynb @@ -564,9 +564,7 @@ "reg_svr = SVR(kernel=\"linear\", C=10)\n", "reg_svr.fit(train_df.iloc[:, :-1], train_df[\"y\"])\n", "\n", - "mean_squared_error(\n", - " y_true=test_df[\"y\"], y_pred=reg_svr.predict(test_df.iloc[:, :-1])\n", - ").round(4)" + "mean_squared_error(y_true=test_df[\"y\"], y_pred=reg_svr.predict(test_df.iloc[:, :-1]))" ] }, { @@ -610,7 +608,7 @@ " return_std=True,\n", ")\n", "\n", - "mean_squared_error(y_true=test_df[\"y\"], y_pred=mean_predictions_gpr).round(4)" + "mean_squared_error(y_true=test_df[\"y\"], y_pred=mean_predictions_gpr)" ] }, { @@ -773,7 +771,7 @@ "metadata": { "celltoolbar": "Tags", "kernelspec": { - "display_name": "Python 3.8.13 ('codeforecon')", + "display_name": "codeforecon", "language": "python", "name": "python3" }, @@ -787,12 +785,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.13" - }, - "vscode": { - "interpreter": { - "hash": "caf5ac9f613b176c5984ad2a1a4525760eb7d898a3291351da4c152dc719ffa1" - } + "version": "3.10.16" } }, "nbformat": 4, diff --git a/text-nlp.ipynb b/text-nlp.ipynb index 49c8e59..9741499 100644 --- a/text-nlp.ipynb +++ b/text-nlp.ipynb @@ -1174,7 +1174,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3.8.13 ('codeforecon')", + "display_name": "codeforecon", "language": "python", "name": "python3" }, @@ -1188,12 +1188,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.12" - }, - "vscode": { - "interpreter": { - "hash": "c4570b151692b3082981c89d172815ada9960dee4eb0bedb37dc10c95601d3bd" - } + "version": "3.10.16" } }, "nbformat": 4, diff --git a/vis-common-plots.ipynb b/vis-common-plots.ipynb index 291de04..5362465 100644 --- a/vis-common-plots.ipynb +++ b/vis-common-plots.ipynb @@ -580,6 +580,9 @@ " .join(df.iloc[1:].reset_index(drop=True), lsuffix=\"_from\", rsuffix=\"_to\")\n", ")\n", "\n", + "min_yr = df[\"Year\"].min()\n", + "max_yr = df[\"Year\"].max()\n", + "\n", "(\n", " ggplot(df, aes(\"Unemployment\", \"Vacancies\"))\n", " + geom_segment(\n", @@ -599,7 +602,7 @@ " + geom_point(shape=21, color=\"gray\", fill=\"#c28dc3\", size=5)\n", " + geom_text(\n", " aes(label=\"Year\"),\n", - " data=df[df[\"Year\"].isin([2001, 2021])],\n", + " data=df[df[\"Year\"].isin([min_yr, max_yr])],\n", " position=position_nudge(y=0.3),\n", " )\n", " + labs(x=\"Unemployment rate, %\", y=\"Vacancy rate, %\")\n", diff --git a/vis-letsplot.ipynb b/vis-letsplot.ipynb index 9b70437..8e217cc 100644 --- a/vis-letsplot.ipynb +++ b/vis-letsplot.ipynb @@ -406,7 +406,7 @@ "list_dfs = [\n", " web.DataReader(value, \"fred\", start, end)\n", " .rename(columns={value: key})\n", - " .groupby(pd.Grouper(freq=\"AS\"))\n", + " .groupby(pd.Grouper(freq=\"YS\"))\n", " .mean()\n", " for key, value in code_dict.items()\n", "]\n", @@ -579,7 +579,7 @@ "plotted_data = (\n", " ggplot(penguins, aes(x=\"flipper_length_mm\", y=\"body_mass_g\")) + geom_point()\n", ")\n", - "ggsave(plotted_data, filename=\"penguin-plot.svg\")" + "ggsave(plotted_data, filename=\"penguin-plot.svg\");" ] }, { @@ -612,7 +612,7 @@ "metadata": { "celltoolbar": "Tags", "kernelspec": { - "display_name": "Python 3.10.12 ('codeforecon')", + "display_name": "codeforecon", "language": "python", "name": "python3" }, @@ -626,12 +626,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.12" - }, - "vscode": { - "interpreter": { - "hash": "c4570b151692b3082981c89d172815ada9960dee4eb0bedb37dc10c95601d3bd" - } + "version": "3.10.16" } }, "nbformat": 4, diff --git a/wrkflow-environments.md b/wrkflow-environments.md index da149ce..04720f6 100644 --- a/wrkflow-environments.md +++ b/wrkflow-environments.md @@ -25,10 +25,6 @@ In Python, there are multiple tools for managing different environments. Of thos If you're just getting going with coding, this book recommends that you use uv. -## Using Miniconda to Manage Python Environments - -Much of these two subsections is covered by the Miniconda documentation on [managing virtual environments](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html). - ## Using uv to Manage Environments [**uv**](https://docs.astral.sh/uv/) automatically creates a virtual environment per project folder so that each project you do gets its own distinct and independent Python installation. As well as providing virtual environments by default, it can: @@ -80,6 +76,10 @@ Visual Studio Code makes this easier though: just as with conda environments, yo uv is especially strong for reproducibility. Imagine you wish to have a co-author or colleague install everything they need for the project. If you send them the automatically generated `pyproject.toml` and `uv.lock` files then all they need is to run `uv sync --frozen`. This will install all of (*exactly* the same) packages needed to run the code! +## Using Miniconda to Manage Python Environments + +Much of these two subsections is covered by the Miniconda documentation on [managing virtual environments](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html). + ### Creating Environments If you're using Miniconda for Python, you manage and change environments on the command line (more on the command line in {ref}`wrkflow-command-line`) too. Before following these instructions, check that you have Miniconda installed and activated. You should see something like `(base) username@computername:~$` on the command line (base is the default conda environment). diff --git a/wrkflow-version-control.md b/wrkflow-version-control.md index cdeccf5..8bb7445 100644 --- a/wrkflow-version-control.md +++ b/wrkflow-version-control.md @@ -253,10 +253,6 @@ git commit -m "Add readme; first commit" `git commit` takes everything that was staged using `git add` and stores a copy permanently inside the repository's .git directory. This permanent copy is called a commit or a revision. Git gives it a unique identifier, and the first line of output from git commit displays its short identifier 0dc5442, which is the first few characters of that unique label. -The diagram below gives a flavour of the process we're going through here. - -![Diagram of how assets become tracked, from Research Software Engineering in Python.](https://merely-useful.tech/py-rse/figures/git-cmdline/staging-area.png) - ### Reviewing changes We can now look at the git log to see all of the changes we made: