python - How to implement non-overlapping rolling functionality on a MultiIndex DataFrame
So far I've found this question, but it doesn't solve my problem because:
- I have a MultiIndex DataFrame
- the inner level has a different amount of data for each outer level, so I can't use len()
I have the following DataFrame:

    outer  inner      value
    a      1       2.000000
    a      2       4.000000
    a      3       6.000000
    a      4       8.000000
    b      1       3.000000
    b      2       6.000000
    b      3       9.000000
    b      4      12.000000
    b      5      15.000000
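I want to sum the last 2 values of each outer group in a non-overlapping manner. For a, I want to sum the values at inner 3 + 4, then 1 + 2. For b, I want to sum the values at inner 4 + 5, then 2 + 3. Note that the pairwise sum is supposed to start from the last value, so the first row of b is left out. This should result in: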

    outer  inner      value
    a      2       6.000000
    a      4      14.000000
    b      3      15.000000
    b      5      27.000000
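
For reference, the frame described in the question can be rebuilt as in the sketch below. It assumes the index levels are named outer and inner, as in the printed tables; the group_sizes name is only illustrative. Printing the group sizes shows why a single fixed length does not work for every group.

```python
import pandas as pd

# Rebuild the frame from the question with (outer, inner) as a MultiIndex.
df = pd.DataFrame({
    'outer': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],
    'inner': [1, 2, 3, 4, 1, 2, 3, 4, 5],
    'value': [2.0, 4.0, 6.0, 8.0, 3.0, 6.0, 9.0, 12.0, 15.0],
}).set_index(['outer', 'inner'])

# Number of rows per outer label; the groups are not the same size.
group_sizes = df.groupby(level='outer').size()
print(group_sizes)
```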
groupby with a custom resample function
You need custom resampling for this. It's a little hacky, but it might work.
- Remove the MultiIndex so that you deal with regular columns.
- groupby() on 'outer', then .apply() a custom function to each group.
- The custom function takes a group and:
  - determines the length of the group
  - selects the largest even length, counting backwards from the end
  - turns the index into seconds
  - resamples the DataFrame every 2 samples with resample(...).sum()
  - resamples the inner column every 2 samples with resample(...).last() to preserve the original index numbers
  - converts the index back to 'inner'
- Even though the MultiIndex was removed, a MultiIndex is still returned by groupby(...).apply().
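
The "select an even length, counting backwards" step can be sketched on its own. The helper name keep_even_tail is hypothetical, and slicing from len(g) - even_length is a small assumption added here so that a one-row group yields an empty frame instead of the whole group.

```python
import pandas as pd

def keep_even_tail(g):
    """Keep the largest even number of rows, counted from the end of g."""
    even_length = 2 * (len(g) // 2)
    # Slice by start position so that even_length == 0 gives an empty frame.
    return g.iloc[len(g) - even_length:]

# Group b from the question: 5 rows, so the first row is dropped.
g = pd.DataFrame({'inner': [1, 2, 3, 4, 5],
                  'value': [3.0, 6.0, 9.0, 12.0, 15.0]})
tail = keep_even_tail(g)
print(tail)
```

The remaining rows (inner 2 to 5) then split cleanly into the pairs 2 + 3 and 4 + 5.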
Note: there is an issue with rolling: it slides through the values instead of stepping through them (in a non-overlapping manner). Using resample allows this. resample is time based, so the index needs to be represented as seconds.
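
The sliding behaviour can be seen on a single group. The sketch below uses the b values from the question. Thinning the rolling result by taking every second row from the end is an alternative that is not part of the answer; it is shown only to make the difference between sliding and stepping concrete.

```python
import pandas as pd

# Values of group b, indexed by inner.
s = pd.Series([3.0, 6.0, 9.0, 12.0, 15.0], index=[1, 2, 3, 4, 5])

# rolling(2) slides one row at a time, so consecutive windows overlap.
overlapping = s.rolling(2).sum()

# Keeping every second window, counted from the last row, leaves only
# the non-overlapping pairs; the incomplete first window is NaN and is dropped.
non_overlapping = overlapping.iloc[::-2].iloc[::-1].dropna()
print(non_overlapping)
```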
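Example

    import math
    import pandas as pd

    df = pd.DataFrame({
        'outer': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],
        'inner': [1, 2, 3, 4, 1, 2, 3, 4, 5],
        'value': [2.00, 4.00, 6.00, 8.00, 3.00, 6.00, 9.00, 12.00, 15.00]
    })

    def f(g):
        # Largest even number of rows that fits in the group
        even_length = int(2.0 * math.floor(len(g) / 2.0))
        # Keep that many rows, counted backwards from the end of the group
        every_two_backwards = g.iloc[-even_length:][['inner', 'value']].copy()
        # Treat the integer row labels as seconds so that resample() can be used
        every_two_backwards.index = pd.to_timedelta(every_two_backwards.index, unit='s')
        # Sum each pair of values
        resample_via_sum = every_two_backwards[['value']].resample('2s').sum().dropna()
        # Label each pair with the last 'inner' it contains
        resample_via_sum['inner'] = every_two_backwards['inner'].resample('2s').last()
        resample_via_sum = resample_via_sum.set_index('inner')
        return resample_via_sum

    resampled_df = df.groupby(['outer']).apply(f)
    print(resampled_df)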

                 value
    outer inner
    a     2.0      6.0
          4.0     14.0
    b     3.0     15.0
          5.0     27.0
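
If the time-based index feels too hacky, the same pairing can be sketched with plain integer arithmetic. This is a different technique from the resample approach in the answer: each row is numbered from the end of its group and rows are bucketed two at a time. The function name pair_sums_from_end and the window parameter are hypothetical, and the sketch assumes the index levels are named outer and inner.

```python
import pandas as pd

df = pd.DataFrame({
    'outer': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],
    'inner': [1, 2, 3, 4, 1, 2, 3, 4, 5],
    'value': [2.0, 4.0, 6.0, 8.0, 3.0, 6.0, 9.0, 12.0, 15.0],
}).set_index(['outer', 'inner'])

def pair_sums_from_end(frame, window=2):
    """Sum non-overlapping windows per outer group, starting from the last row."""
    flat = frame.reset_index()
    # Position of each row counted from the end of its group (0 = last row).
    from_end = flat.groupby('outer').cumcount(ascending=False)
    # Rows sharing the same bucket number belong to the same window.
    flat['pair'] = from_end // window
    out = flat.groupby(['outer', 'pair']).agg(
        inner=('inner', 'last'),   # label each window by its last inner value
        value=('value', 'sum'),
        n=('value', 'count'),
    )
    # Drop the incomplete window at the start of odd-length groups.
    out = out[out['n'] == window].reset_index()
    out = out.sort_values(['outer', 'inner'])
    return out.set_index(['outer', 'inner'])[['value']]

result = pair_sums_from_end(df)
print(result)
```

For the frame in the question this gives the sums 6 and 14 for a and 15 and 27 for b, labelled by the last inner value of each pair.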