period_range() bug? #7817

Closed
ifmihai opened this issue Jul 22, 2014 · 18 comments

Comments

@ifmihai

ifmihai commented Jul 22, 2014

I did:

63 In : df = P.DataFrame(index=P.period_range('2000-1-1 10:20', '2005-1-1 12:00'))

64 In : df.index[0]
64 Out: Period('2000-01-01', 'D')

65 In : df.index[0].hour
65 Out: 0

it should have been 10, right?

why does period_range() lose time information?

I was recommended to use periods and period_range() because of pandas Timestamp limitation (nanoseconds time units)

now I'm stuck, because I need datetimes outside Timestamp range

ps.
would it be too drastic to change time units from nanoseconds to microseconds in Timestamp? so that we would be happier? :P
see also #7307

@jreback
Contributor

jreback commented Jul 22, 2014

docs are here

you need to specify the freq.

In [81]: period_range('2000-1-1 10:20','2005-1-1 12:00',freq='H')
Out[81]: 
<class 'pandas.tseries.period.PeriodIndex'>
[2000-01-01 10:00, ..., 2005-01-01 12:00]
Length: 43851, Freq: H

In [82]: period_range('2000-1-1 10:20','2005-1-1 12:00',freq='H')[0].hour
Out[82]: 10
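
For readers comparing the two calls, here is a minimal sketch of the behaviour discussed so far. It assumes a reasonably recent pandas imported as `pd`; the lowercase `"h"` alias is the spelling used by newer releases, while the version in this thread prints it as `H`.

```python
import pandas as pd

# Without freq, period_range falls back to daily periods, so the
# time-of-day in the start/end strings is discarded.
daily = pd.period_range("2000-01-01 10:20", "2000-01-03 12:00")
print(daily[0], daily[0].hour)

# With an hourly frequency the hour survives (the minutes are still
# truncated to the start of the hour).
hourly = pd.period_range("2000-01-01 10:20", "2000-01-03 12:00", freq="h")
print(hourly[0], hourly[0].hour)
```

The daily index has one entry per calendar day, which is why the 10:20 in the start string has nowhere to go.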

@jreback jreback closed this as completed Jul 22, 2014
@ifmihai
Author

ifmihai commented Jul 22, 2014

but i don't need 'H' frequency

I need daily frequency that starts at some hour
wasn't the example clear enough?

@jreback
Contributor

jreback commented Jul 22, 2014

what are you trying to do with the range? e.g. show the output frame you are expecting

@ifmihai
Author

ifmihai commented Jul 22, 2014

I don't know what normalize does, I will look into it.
it already becomes too complicated.

i need to make a dataframe with Timestamp index functionality
daily, starting at 12:00 for example
then to calculate ephemeris for it as a start
and then adding other data

when it comes to planets or geological studies, Timestamp is so limited I want to scream

@jreback
Contributor

jreback commented Jul 22, 2014

and the example I gave gives you an hourly frequency, no?

show your start and end dates and a sample of the index.

@ifmihai
Author

ifmihai commented Jul 22, 2014

what I do usually, an example that works with Timestamps:

df = P.DataFrame(index=P.date_range('1950-1-1 12:00', '2050-1-1'))
df = add_ephemeris_function(df)  # which updates the dataframe with planetary information

the problem appears when the range is too big for Timestamp
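
To make "too big for Timestamp" concrete, the sketch below prints the bounds of the nanosecond-based Timestamp and builds a daily PeriodIndex well outside them. The medieval date range is only an illustrative choice, not something taken from this thread.

```python
import pandas as pd

# Timestamp counts nanoseconds in a 64-bit integer, which limits it
# to roughly the years 1677 through 2262.
print(pd.Timestamp.min, pd.Timestamp.max)

# Daily periods are not tied to those bounds.
span = pd.period_range("1215-01-01", "1381-01-01", freq="D")
print(span[0], span[-1], len(span))
```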

@jreback
Contributor

jreback commented Jul 22, 2014

What is the point of the hour if you don't have hourly frequency?

@ifmihai
Author

ifmihai commented Jul 22, 2014

if you calculate moon position at midnight, period_range() will do I guess

but moon travels 6 degrees in 12 hours

so if I need planet positions at 12:00, daily
period_range() is useless

date_range() works, but it's limited

@jreback
Contributor

jreback commented Jul 22, 2014

so you prob want something like this: #7811
contributions are welcome

@ifmihai
Author

ifmihai commented Jul 22, 2014

i'm not sure I need that (although i didn't understand his need exactly)

In my mind,
a period is a start point + span

i don't understand why the start point has to be 00:00 for daily

I see that if I do:

102 In : P.period_range('2000-1-13', '2002-1-26', freq='M')[0]
102 Out: Period('2000-01', 'M')

103 In : P.period_range('2000-1-13', '2002-1-26', freq='M')[0].day
103 Out: 31

it doesn't make sense to me at all.

in my mind
a period has a starting point (in this case 2000-1-13)
and then we keep adding spans of one month
2000-2-13, 2000-3-13, and so on

myeah, so no PeriodIndex for me, it doesnt click for me
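
The two mental models in this comment can be put side by side. In the sketch below, a monthly Period names the whole calendar month that contains the given date, whereas stepping a Timestamp by `pd.DateOffset(months=1)` gives the "start point + span" sequence described above. The `.day` of 31 in the output above is consistent with the field being read from the end of the month, though that reading is an inference from the output rather than something stated in the thread.

```python
import pandas as pd

# A monthly Period covers the calendar month containing the date.
p = pd.Period("2000-01-13", freq="M")
print(p.start_time, p.end_time)

# "Start point + one-month spans": step a Timestamp with DateOffset.
start = pd.Timestamp("2000-01-13")
steps = [start + pd.DateOffset(months=i) for i in range(3)]
print(steps)
```

The second form keeps the day of the month, but it is built on Timestamp, so it inherits the Timestamp range limit.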

@jreback
Contributor

jreback commented Jul 22, 2014

@ifmihai you might be better off asking colleagues / mailing lists for your field on how they use pandas.

@ifmihai
Author

ifmihai commented Jul 22, 2014

@jreback it's a good point, i will do that
my main frustration is that i have a lot of code which works nice but it's based on Timestamps
I made a poor decision apparently at the beginning
I want to avoid changing all the code
plus, I don't have an alternative... yet...
except maybe mxdatetime from egenix, but it appears it doesn't play well with pandas

@jreback
Contributor

jreback commented Jul 22, 2014

@ifmihai it's unclear what is not working for you. If you supply a complete example you might get some more help. Periods do not have 100% feature compat with Timestamps, because they don't have the user base. We are working to fix that, and if something is missing (see the issue I ref above), then it will be addressed.

But, I stress that Periods are exactly what you are looking for, and they were set up to span ANY timestamp.

@ifmihai
Author

ifmihai commented Jul 22, 2014

my need is quite simple

thinking in daily frequency
having a starting datetime, like '2014-7-22 12:00'
i need to create an index (with Timestamps functionality):
2014-7-22 12:00
2014-7-23 12:00
2014-7-24 12:00
...
etc

date_range() is capable of starting from 12:00 and then add to datetimes the desired frequency
period_range() insists on changing the starting point, making it 2014-7-22 00:00

it doesn't feel right for me,
that's not a period
that's a particular period, which starts at 00:00, just because

a period, I repeat, in my mind is: starting point + freq span

I would want period_range() or any function which generates a range
to not mess with starting point, to let it be exactly as it was given

I hope I'm clear now, I didn't know it wasnt clear, sorry
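
As a point of comparison outside pandas, the requested sequence (daily steps that keep the given start time) can be sketched with the standard library alone. `daily_datetimes` is a hypothetical helper written for this illustration. Python's `datetime` covers years 1 through 9999, but the result is a plain list, so it does not provide Timestamp index functionality.

```python
from datetime import datetime, timedelta


def daily_datetimes(start, count):
    """Yield `count` datetimes one day apart, starting exactly at `start`."""
    for i in range(count):
        yield start + timedelta(days=i)


# The start year is arbitrary; it only needs to lie outside the
# Timestamp range to make the point.
noons = list(daily_datetimes(datetime(1200, 7, 22, 12, 0), 3))
for ts in noons:
    print(ts.isoformat())
```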

@jreback
Contributor

jreback commented Jul 22, 2014

You want a 12H frequency Period, which is currently under development. You can do this with timestamps, but not yet with Periods (it only allows a 1H freq atm), simply because it hasn't been done.
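
Releases newer than the one discussed here appear to accept multiplied period frequencies. Assuming such a version, a rough sketch of periods that start at noon and each span 24 hours looks like this; whether it fits the ephemeris use case would need to be checked against the installed pandas.

```python
import pandas as pd

# Each period spans 24 hours and begins at the given start time.
# Lowercase "h" is the newer spelling of the hourly alias.
noon_periods = pd.period_range("2014-07-22 12:00", periods=3, freq="24h")
print(noon_periods)
print([p.start_time for p in noon_periods])
```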

@ifmihai
Author

ifmihai commented Jul 22, 2014

good to know this
I appreciate your help

ps.
wouldn't this '12H' frequency give also the midnights in my example above?
I mean
2014-7-22 12:00
2014-7-23 00:00
2014-7-23 12:00
etc..

I'm not sure what it will give

@jreback
Contributor

jreback commented Jul 22, 2014

yes. But you can filter them out. The issue is that the underlying Period representation doesn't consider '2013-1-1 12', freq='D' (it simply drops the time). not 100% sure if this is changeable (easily) or not.

a '24H' freq might also work for you. Would have to experiment. (but again that's the same issue).
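
On the Timestamp side, the "generate at 12H and filter" suggestion can be sketched as follows, next to a plain daily `date_range` that starts at noon. Both are limited to the Timestamp range; the example only shows that the two routes yield the same noon values.

```python
import pandas as pd

# Every 12 hours from noon: noon, midnight, noon, ...
half_days = pd.date_range("2014-07-22 12:00", periods=5, freq="12h")

# Keep only the noon entries.
noons_only = half_days[half_days.hour == 12]

# date_range keeps the start time when stepping by whole days.
daily_from_noon = pd.date_range("2014-07-22 12:00", periods=3, freq="D")

print(noons_only)
print(daily_from_noon)
```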

@7starsea

The outputs of pd.period_range('09:30:00', '10:00:00', freq='3S') in v1.1.1 and in v0.25.2 are different, and the output in v0.25.2 is what I expected.

while the outputs of pd.period_range('2001-01-01 09:30:00', '2001-01-01 10:00:00', freq='3S') in v1.1.1 and in v0.25.2 are the same.
