python - Pandas: create dataframe using value_counts
I have this data:
age 32 16 39 39 23 36 29 26 43 34 35 50 29 29 31 42 53
I can compute the counts with
    df.age.value_counts()
and the percentages with
    100. * df.age.value_counts() / len(df.age)
but how can I combine them into one DataFrame and give names to the columns?
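For the ungrouped case asked about here, one possible sketch is to concatenate the two Series column-wise and name them through `keys`. The column names `n` and `%` and the variable `summary` are illustrative choices, not part of the original post.

```python
import pandas as pd

ages = [32, 16, 39, 39, 23, 36, 29, 26, 43, 34, 35, 50, 29, 29, 31, 42, 53]
df = pd.DataFrame({"age": ages})

counts = df["age"].value_counts()      # absolute count per distinct age
percent = 100.0 * counts / len(df)     # shares the same index as counts

# Concatenate along columns; `keys` supplies the column names (assumed names)
summary = pd.concat([counts, percent], axis=1, keys=["n", "%"])
print(summary)
```

Because both Series share the same index, the rows line up without any extra merging step.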
    import numpy as np
    import pandas as pd

    # helper df with min and max ages; it is necessary to add the category total
    df1 = pd.DataFrame({'g': ['14 yo and younger', '15-19', '20-24', '25-29', '30-34',
                              '35-39', '40-44', '45-49', '50-54', '55-59', '60-64', '65+', 'total'],
                        'min': [0, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, np.nan],
                        'max': [14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 120, np.nan]})
    print (df1)

                        g    max   min
    0   14 yo and younger   14.0   0.0
    1               15-19   19.0  15.0
    2               20-24   24.0  20.0
    3               25-29   29.0  25.0
    4               30-34   34.0  30.0
    5               35-39   39.0  35.0
    6               40-44   44.0  40.0
    7               45-49   49.0  45.0
    8               50-54   54.0  50.0
    9               55-59   59.0  55.0
    10              60-64   64.0  60.0
    11                65+  120.0  65.0
    12              total    NaN   NaN
    # bin edges: the lowest minimum followed by every maximum
    # (bracket access is required because df1.min and df1.max are DataFrame methods)
    cutoff = np.hstack([np.array(df1['min'][0]), df1['max'].values])
    labels = df1['g'].values
    df['groups'] = pd.cut(df.age, bins=cutoff, labels=labels, right=True, include_lowest=True)
    print (df)

        age groups
    0    32  30-34
    1    16  15-19
    2    39  35-39
    3    39  35-39
    4    23  20-24
    5    36  35-39
    6    29  25-29
    7    26  25-29
    8    43  40-44
    9    34  30-34
    10   35  35-39
    11   50  50-54
    12   29  25-29
    13   29  25-29
    14   31  30-34
    15   42  40-44
    16   53  50-54
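To see how `right=True` and `include_lowest=True` treat values that sit exactly on a bin edge, here is a small standalone sketch. It leaves out the `total` row and its NaN edge and writes the edges as a plain list. The names `edges`, `sample` and `groups` are illustrative only.

```python
import pandas as pd

# 13 edges define 12 right-closed intervals: [0, 14], (14, 19], ..., (64, 120]
edges = [0, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 120]
labels = ["14 yo and younger", "15-19", "20-24", "25-29", "30-34", "35-39",
          "40-44", "45-49", "50-54", "55-59", "60-64", "65+"]

# Ages chosen to fall on or next to the interval boundaries
sample = pd.Series([0, 14, 15, 29, 30, 65])
groups = pd.cut(sample, bins=edges, labels=labels, right=True, include_lowest=True)
print(groups.tolist())
```

With `right=True` each upper edge belongs to its own bin, so 14 stays in the first group and 15 starts the next one. `include_lowest=True` is what lets an age of exactly 0 fall into the first group.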
    # note: renaming through a dict inside agg was removed in pandas 1.0,
    # so this call needs an older version
    df = (df.groupby('groups')['groups']
            .agg({'total': [len, lambda x: len(x) / df.shape[0] * 100]})
            .rename(columns={'len': 'n', '<lambda>': '%'}))

    # last total row
    df.loc['total'] = df.sum()
    print (df)

                      total
                          n           %
    groups
    14 yo and younger   0.0    0.000000
    15-19               1.0    5.882353
    20-24               1.0    5.882353
    25-29               4.0   23.529412
    30-34               3.0   17.647059
    35-39               4.0   23.529412
    40-44               2.0   11.764706
    45-49               0.0    0.000000
    50-54               2.0   11.764706
    55-59               0.0    0.000000
    60-64               0.0    0.000000
    65+                 0.0    0.000000
    total              17.0  100.000000
edit1:
A solution with size scales better:
    # df is the frame with the 'groups' column, before the aggregation above
    df1 = df.groupby('groups').size().to_frame()
    df1.columns = pd.MultiIndex.from_arrays([['total'], ['n']])
    df1[('total', '%')] = 100 * df1[('total', 'n')] / df.shape[0]
    df1.loc['total'] = df1.sum()
    print (df1)

                      total
                          n           %
    groups
    14 yo and younger   0.0    0.000000
    15-19               1.0    5.882353
    20-24               1.0    5.882353
    25-29               4.0   23.529412
    30-34               3.0   17.647059
    35-39               4.0   23.529412
    40-44               2.0   11.764706
    45-49               0.0    0.000000
    50-54               2.0   11.764706
    55-59               0.0    0.000000
    60-64               0.0    0.000000
    65+                 0.0    0.000000
    total              17.0  100.000000
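Since `.ix` was removed in pandas 1.0, here is a self-contained sketch of the size-based idea for current pandas. It makes a few simplifying assumptions that differ from the answer above: the bin edges are written as a plain list, the columns are flat (`n` and `%`) rather than nested under `total`, and the total row is appended after converting the index to strings instead of reserving a `total` category. The names `edges`, `n` and `table` are illustrative.

```python
import pandas as pd

ages = [32, 16, 39, 39, 23, 36, 29, 26, 43, 34, 35, 50, 29, 29, 31, 42, 53]
df = pd.DataFrame({"age": ages})

edges = [0, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 120]
labels = ["14 yo and younger", "15-19", "20-24", "25-29", "30-34", "35-39",
          "40-44", "45-49", "50-54", "55-59", "60-64", "65+"]
df["groups"] = pd.cut(df["age"], bins=edges, labels=labels, include_lowest=True)

# observed=False keeps empty age bands in the result with a count of 0
n = df.groupby("groups", observed=False).size()

table = pd.DataFrame({"n": n, "%": 100 * n / len(df)})
# Plain string index so that a new 'total' label can be appended
table.index = table.index.astype(str)
table.loc["total"] = table.sum()
print(table)
```

Computing the percentage column from the counts avoids calling a Python function once per group, which is the likely reason the size-based variant scales better.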