Error: Can only use .dt accessor with datetimelike values #2

Open
shane-kercheval opened this issue Jul 27, 2023 · 1 comment

shane-kercheval commented Jul 27, 2023

Python 3.11; pandas 2.0.3; numpy 1.25.1


When running the first cell of Chapter 8 in Simulated-Data.ipynb, I got the following error in a few places: Can only use .dt accessor with datetimelike values

I was able to get it to work by adding pd.to_datetime in various places. I originally added pd.to_datetime to the date column itself, but that caused other issues. Not sure if there is a better way.

date = pd.date_range("2021-05-01", "2021-07-31", freq="D")
cohorts = pd.to_datetime(["2021-05-15", "2021-06-04", "2021-06-20"]).date
poss_regions = ["S", "N", "W", "E"]

reg_ps = dict(zip(poss_regions,    [.3, .6, .7, .8]))
reg_fe = dict(zip(poss_regions,    [20,  16,  8,  2]))
reg_trend = dict(zip(poss_regions, [0,  0.2,  .4,  .6]))

units = np.array(range(1, 200+1))

np.random.seed(123)

unit_reg = np.random.choice(poss_regions, len(units))
exp_trend = np.random.exponential(0.01, len(units))
treated_unit = np.random.binomial(1, np.vectorize(reg_ps.__getitem__)(unit_reg))

# staggered adoption DGP
df = pd.DataFrame(dict(
    date = np.tile(date.date, len(units)),
    city = np.repeat(units, len(date)),
    region = np.repeat(unit_reg, len(date)),
    treated_unit = np.repeat(treated_unit, len(date)),
    cohort = np.repeat(np.random.choice(cohorts, len(units)), len(date)),
    eff_heter = np.repeat(np.random.exponential(1, size=len(units)), len(date)),
    
    unit_fe = np.repeat(np.random.normal(0, 2, size=len(units)), len(date)),
    time_fe = np.tile(np.random.normal(size=len(date)), len(units)),
    week_day = np.tile(date.weekday, len(units)),
    w_seas = np.tile(abs(5-date.weekday) % 7, len(units)),
)).assign(
    reg_fe = lambda d: d["region"].map(reg_fe), 
    reg_trend = lambda d: d["region"].map(reg_trend), 
    reg_ps = lambda d: d["region"].map(reg_ps), 
    trend = lambda d: (pd.to_datetime(d["date"]) - pd.to_datetime(d["date"]).min()).dt.days,
    day = lambda d: (pd.to_datetime(d["date"]) - pd.to_datetime(d["date"]).min()).dt.days,
    cohort = lambda d: np.where(d["treated_unit"] == 1, d["cohort"], pd.to_datetime("2100-01-01")),
).assign(
    treated = lambda d: ((pd.to_datetime(d["date"]) >= d["cohort"]) & d["treated_unit"] == 1).astype(int),
).assign(
    y0 = lambda d: np.round(10 
                            + d["treated_unit"]
                            + d["reg_trend"]*d["trend"]/2
                            + d["unit_fe"] 
                            + 0.4*d["time_fe"] 
                            + 2*d["reg_fe"]
                            + d["w_seas"]/5, 0),
).assign(
#     y0 = lambda d: np.round(d["y0"] + d.groupby("city")["y0"].shift(1).fillna(0)*0.2, 0)
).assign(
    y1 = lambda d: d["y0"] + np.minimum(0.2*(np.maximum(0, (pd.to_datetime(d["date"]) - pd.to_datetime(d["cohort"])).dt.days)), 1)*d["eff_heter"]*2
).assign(
    tau = lambda d: d["y1"] - d["y0"],
    downloads = lambda d: np.where(d["treated"] == 1, d["y1"], d["y0"]) + np.random.normal(0,.7,len(d)),
#     date = lambda d: pd.to_datetime(d["date"]),
).round({"downloads": 0})

# # # df.head()

and then in the second cell I had to change

.assign(post=lambda d: (d["date"] >= d["cohort"]).astype(int))

to

.assign(post=lambda d: (pd.to_datetime(d["date"]) >= d["cohort"]).astype(int))
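
For anyone trying to reproduce this outside the notebook, here is a minimal sketch of what I think is going on. The `date` column is built from `date.date`, so it holds plain `datetime.date` objects with object dtype. My understanding (an assumption, not something I verified against the pandas changelog) is that pandas 2.x no longer infers a timedelta dtype when subtracting object-dtype Series, so `.dt` is not available on the result. The toy frame and the `days_since_start` helper are hypothetical names for illustration, not part of the book's code.

```python
import pandas as pd

# Hypothetical toy frame: `date` holds datetime.date objects (object dtype),
# mirroring np.tile(date.date, len(units)) in the first cell.
dates = pd.date_range("2021-05-01", "2021-05-05", freq="D")
toy = pd.DataFrame({"date": dates.date})

print(toy["date"].dtype)  # object dtype, not datetime64

# On pandas 2.x this is where I hit the error; older versions may not raise.
try:
    (toy["date"] - toy["date"].min()).dt.days
except AttributeError as err:
    print(f"AttributeError: {err}")


def days_since_start(date_col: pd.Series) -> pd.Series:
    """Convert to datetime64 first so the difference is a real timedelta Series."""
    as_dt = pd.to_datetime(date_col)
    return (as_dt - as_dt.min()).dt.days


print(days_since_start(toy["date"]).tolist())
```

This is the same conversion I applied inline in the `trend` and `day` lambdas above; wrapping it in a helper just avoids repeating `pd.to_datetime` twice per expression.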
@shane-kercheval
Author

Just a friendly suggestion: it would be handy for readers if the Python version you used were specified in the README and a requirements.txt with pinned package versions were provided in the repo, so we can run the code without issues.

I'm guessing the code runs in your environment and so I assume the issue is caused by different pandas versions.
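
As a rough sketch of how the pinned versions could be captured from a working environment, something like the following prints lines in requirements.txt format. The `pinned_requirements` helper is a hypothetical name, and the package list is only the two libraries I mentioned above, not the full set the notebooks need.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return 'name==version' lines for installed packages (hypothetical helper)."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Keep a visible marker instead of failing on a missing package.
            lines.append(f"# {name} is not installed")
    return lines


# Record the interpreter version as a comment, then one pinned line per package.
print(f"# Python {platform.python_version()}")
print("\n".join(pinned_requirements(["pandas", "numpy"])))
```

Running `pip freeze > requirements.txt` in the working environment would give a more complete list; the snippet above is only meant for recording a few key packages.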
