python - How to implement non-overlapping rolling functionality on a MultiIndex DataFrame


So far I've found this question, but it doesn't solve my problem because:

  1. I have a MultiIndex DataFrame
  2. The inner level has a different amount of data for each outer level, so I can't use len()

I have the following DataFrame:

    outer inner    value
    a     1         2.000000
    a     2         4.000000
    a     3         6.000000
    a     4         8.000000
    b     1         3.000000
    b     2         6.000000
    b     3         9.000000
    b     4        12.000000
    b     5        15.000000

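I want to sum the last 2 values of each outer group in a non-overlapping manner. For a, I want the sums of inner's 3 + 4 and 1 + 2. For b, I want the sums of inner's 4 + 5 and 2 + 3. Note that the pairwise sum is supposed to start from the last value, so the first row of b is left out. The result should be: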

    outer inner    value
    a     2         6.000000
    a     4        14.000000
    b     3        15.000000
    b     5        27.000000
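
The pairing rule can be pinned down with a minimal plain-Python sketch. The helper name pair_sums_from_end and its size parameter are hypothetical, used only to illustrate the expected arithmetic; this is not a pandas function and not the solution being asked for.

```python
def pair_sums_from_end(inner, values, size=2):
    """Sum `values` in non-overlapping chunks of `size`, counted from the end.

    Returns (label, total) tuples, where label is the last `inner` entry of
    each chunk. Leading values that do not fill a whole chunk are dropped.
    """
    start = len(values) % size  # number of leading values to skip
    return [
        (inner[i + size - 1], sum(values[i:i + size]))
        for i in range(start, len(values), size)
    ]


a = pair_sums_from_end([1, 2, 3, 4], [2.0, 4.0, 6.0, 8.0])
b = pair_sums_from_end([1, 2, 3, 4, 5], [3.0, 6.0, 9.0, 12.0, 15.0])
print(a)  # [(2, 6.0), (4, 14.0)]
print(b)  # [(3, 15.0), (5, 27.0)]
```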

Groupby with a custom resample function

You need a custom resampling function for this. It is a little hacky, but it might work.

  1. Remove the MultiIndex so you only deal with regular column groupby()s
  2. groupby() on 'outer', and .apply() a custom function to each group
  3. The custom function takes a group
    1. Determine the largest even length that fits in the group
    2. Select that many rows, counting backwards from the end
    3. Turn the index into seconds
    4. Resample the DataFrame every 2 samples with resample(...).sum()
    5. Resample the inner column every 2 samples with resample(...).last() to preserve the original index numbers
    6. Convert the index back to 'inner'
  4. Even though we removed the MultiIndex, a MultiIndex is still returned by groupby(...).apply()

Note: the issue with rolling is that it slides through the values instead of stepping through them in a non-overlapping manner. Using resample allows the stepping. resample is time based, so the index needs to be represented as a time index, here in seconds.
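
A small sketch of the difference between sliding and stepping, on made-up values. It assumes each row can be treated as one second apart, which is only a device for getting a time-based index.

```python
import pandas as pd

values = pd.Series([3.0, 6.0, 9.0, 12.0])

# rolling(2) slides one row at a time, so consecutive windows overlap.
rolled = values.rolling(2).sum()
print(rolled.tolist())  # [nan, 9.0, 15.0, 21.0]

# resample() needs a time-based index: treat each position as one second.
timed = values.copy()
timed.index = pd.timedelta_range(start='0s', periods=len(timed), freq='s')

# Each '2s' bin holds two rows, and the bins do not overlap.
stepped = timed.resample('2s').sum()
print(stepped.tolist())  # [9.0, 21.0]
```

The rolling result contains the overlapping sum 6 + 9 = 15, while the resampled result only contains the two disjoint pairs.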

Example

    import pandas as pd

    df = pd.DataFrame({
        'outer': ['a','a','a','a','b','b','b','b','b'],
        'inner': [1,2,3,4,1,2,3,4,5],
        'value': [2.00,4.00,6.00,8.00,3.00,6.00,9.00,12.00,15.00]
    })

    def f(g):
        # Largest even number of rows, counted back from the end of the group.
        even_length = 2 * (len(g) // 2)
        every_two_backwards = g.iloc[len(g) - even_length:][['inner', 'value']].copy()
        # resample() needs a time-based index: one row per second.
        every_two_backwards.index = pd.timedelta_range(
            start='0s', periods=even_length, freq='s')
        pairs = every_two_backwards.resample('2s')
        # Sum each pair of values; keep the pair's last 'inner' as its label.
        resample_via_sum = pairs[['value']].sum()
        resample_via_sum['inner'] = pairs['inner'].last()
        return resample_via_sum.set_index('inner')

    resampled_df = df.groupby('outer').apply(f)

    print(resampled_df)

                 value
    outer inner
    a     2.0      6.0
          4.0     14.0
    b     3.0     15.0
          5.0     27.0
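
For comparison, here is a sketch of a different technique that avoids the time index altogether. It is not the resample approach described above: it numbers the rows from the end of each group with cumcount(ascending=False) and integer-divides by the window size to label each pair. It assumes the rows of each group are already in the desired order; the names size, from_end, chunk and complete are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'outer': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],
    'inner': [1, 2, 3, 4, 1, 2, 3, 4, 5],
    'value': [2.0, 4.0, 6.0, 8.0, 3.0, 6.0, 9.0, 12.0, 15.0],
})

size = 2  # window length (assumed; the question uses pairs)
grouped = df.groupby('outer')

# Position of each row counted from the end of its group: 0 is the last row.
from_end = grouped.cumcount(ascending=False)
# Rows sharing a chunk number belong to the same non-overlapping window.
chunk = from_end // size
# Keep only rows in complete windows (drops the leftover first row of 'b').
group_len = grouped['inner'].transform('size')
complete = chunk < group_len // size

result = (
    df[complete]
    .assign(chunk=chunk[complete])
    .groupby(['outer', 'chunk'])
    .agg(inner=('inner', 'last'), value=('value', 'sum'))
    .reset_index()
    .sort_values(['outer', 'inner'])
    .set_index(['outer', 'inner'])[['value']]
)
print(result)
```

The sums should match the table above: 6 and 14 for a, 15 and 27 for b, labelled by the last inner value of each pair.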
