python - Pandas: create dataframe using value_counts


I have this data:

age
32 16 39 39 23 36 29 26 43 34 35 50 29 29 31 42 53

I need to get something like in the image. I can do

df.age.value_counts()

and

100. * df.age.value_counts() / len(df.age)

but how can I union them and give names to the columns?

You can use cut with agg:

import numpy as np
import pandas as pd

# the ages from the question
df = pd.DataFrame({'age': [32, 16, 39, 39, 23, 36, 29, 26, 43, 34, 35, 50, 29, 29, 31, 42, 53]})

# helper df with min and max ages; it is necessary to add the category 'total'
df1 = pd.DataFrame({'g': ['14 yo and younger', '15-19', '20-24', '25-29', '30-34',
                          '35-39', '40-44', '45-49', '50-54', '55-59', '60-64', '65+', 'total'],
                    'max': [14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 120, np.nan],
                    'min': [0, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, np.nan]})

print(df1)
                    g    max   min
0   14 yo and younger   14.0   0.0
1               15-19   19.0  15.0
2               20-24   24.0  20.0
3               25-29   29.0  25.0
4               30-34   34.0  30.0
5               35-39   39.0  35.0
6               40-44   44.0  40.0
7               45-49   49.0  45.0
8               50-54   54.0  50.0
9               55-59   59.0  55.0
10              60-64   64.0  60.0
11                65+  120.0  65.0
12              total    NaN   NaN
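
# bin edges: the lowest min followed by every max
# (df1['min'] / df1['max'] in brackets, because df1.min and df1.max are DataFrame methods)
# the 'total' row has no bounds (NaN), so its upper edge becomes inf to keep the edges increasing;
# only ages above 120 could ever fall into that bin
cutoff = np.hstack([df1['min'].iloc[0], df1['max'].fillna(np.inf).values])
labels = df1['g'].values

df['groups'] = pd.cut(df.age, bins=cutoff, labels=labels, right=True, include_lowest=True)

print(df)
    age groups
0    32  30-34
1    16  15-19
2    39  35-39
3    39  35-39
4    23  20-24
5    36  35-39
6    29  25-29
7    26  25-29
8    43  40-44
9    34  30-34
10   35  35-39
11   50  50-54
12   29  25-29
13   29  25-29
14   31  30-34
15   42  40-44
16   53  50-54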
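
To see how right=True and include_lowest=True treat ages that sit exactly on a cutoff, here is a small sketch. The boundary ages and the shortened set of bins and labels are made up for illustration and are not part of the data above.

```python
import pandas as pd

# Hypothetical ages chosen to land exactly on the bin edges.
ages = pd.Series([0, 14, 15, 19, 20, 120])

# Shortened, illustrative cutoffs: the lowest min, then each group's max.
bins = [0, 14, 19, 24, 120]
labels = ['14 and younger', '15-19', '20-24', '25+']

# right=True closes every bin on its upper edge, so 14 stays in the first
# group and 19 stays in '15-19'; include_lowest=True also keeps age 0.
groups = pd.cut(ages, bins=bins, labels=labels, right=True, include_lowest=True)

print(pd.DataFrame({'age': ages, 'groups': groups}))
```

With these settings each upper bound belongs to its own group, which is why the helper frame can list the max of every range as the next edge.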
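
# count and percentage per group; observed=False keeps the empty age groups
# (renaming through a nested dict in agg and the .ix indexer were removed in pandas 1.0)
out = (df.groupby('groups', observed=False)['groups']
         .agg([len, lambda x: len(x) / df.shape[0] * 100])
         .rename(columns={'len': 'n', '<lambda>': '%'}))

# put both columns under a top-level 'total' header
out.columns = pd.MultiIndex.from_product([['total'], out.columns])

# last total row
out.loc['total'] = out.sum()

print(out)
                  total
                      n           %
groups
14 yo and younger   0.0    0.000000
15-19               1.0    5.882353
20-24               1.0    5.882353
25-29               4.0   23.529412
30-34               3.0   17.647059
35-39               4.0   23.529412
40-44               2.0   11.764706
45-49               0.0    0.000000
50-54               2.0   11.764706
55-59               0.0    0.000000
60-64               0.0    0.000000
65+                 0.0    0.000000
total              17.0  100.000000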

Edit 1:

A solution with size scales better:

# df is the frame with the 'groups' column created by pd.cut
df1 = df.groupby('groups', observed=False).size().to_frame()
df1.columns = pd.MultiIndex.from_tuples([('total', 'n')])
df1[('total', '%')] = 100 * df1[('total', 'n')] / df.shape[0]
# .loc replaces the removed .ix indexer
df1.loc['total'] = df1.sum()

print(df1)
                  total
                      n           %
groups
14 yo and younger   0.0    0.000000
15-19               1.0    5.882353
20-24               1.0    5.882353
25-29               4.0   23.529412
30-34               3.0   17.647059
35-39               4.0   23.529412
40-44               2.0   11.764706
45-49               0.0    0.000000
50-54               2.0   11.764706
55-59               0.0    0.000000
60-64               0.0    0.000000
65+                 0.0    0.000000
total              17.0  100.000000
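
If no age groups are needed, the two expressions from the question can also be joined directly. This is only a minimal sketch: it assumes pd.concat with keys is an acceptable way to name the columns, and the names n and % are borrowed from the output above.

```python
import pandas as pd

# Ages copied from the question.
df = pd.DataFrame({'age': [32, 16, 39, 39, 23, 36, 29, 26, 43,
                           34, 35, 50, 29, 29, 31, 42, 53]})

counts = df['age'].value_counts()          # occurrences of each distinct age
percent = 100. * counts / len(df['age'])   # the same counts as a share of all rows

# axis=1 aligns both Series on the age index; keys become the column names.
result = pd.concat([counts, percent], axis=1, keys=['n', '%'])

print(result)
```

Both Series share the same index (the distinct ages), so the concat lines them up row by row without any extra merge step.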
