pandas - calculating top 10 products for a given query based on its priority in python -
suppose given dataframe like:
query productid priority index 0 3ds 2125233 0.018946 1 rca 2009324 0.027599 2 nook 1517163 0.009443 3 rca 2877125 0.012054 4 rca 2877134 0.005557 5 flatscreentvs 2416092 0.011961 6 macbook 3108172 0.010459 7 3ds 2264036 0.165948 8 rca 8280834 0.004006 9 memorycard 2740208 0.013744 10 acpowercord 2584273 0.006865 11 zaggiphone 1230537 0.136073 12 watchthethrone 3168067 0.104679 13 remotecontrolextender 7997055 0.113058 14 camcorder 2009041 0.017809 15 3ds 1988047 0.031711 16 3ds 1686079 0.043783 17 wirelessheadphones 3770439 0.014714 18 wirelessheadphones 2602403 0.008525 19 samsung40 2126065 0.018066
i want find top 2 product_ids
on basis of priority respect given query.
for eg. if have query=3ds
top 2 products should be:
1. 1988047 2. 1686079
this equivalent oracle's row_number() analytic function:
in [172]: df.assign(rn=df.sort_values('priority', ascending=0).groupby('query').cumcount() + 1).query('rn < 3').sort_values(['query','rn']) out[172]: query productid priority rn index 7 3ds 2264036 0.165948 1 16 3ds 1686079 0.043783 2 10 acpowercord 2584273 0.006865 1 14 camcorder 2009041 0.017809 1 5 flatscreentvs 2416092 0.011961 1 6 macbook 3108172 0.010459 1 9 memorycard 2740208 0.013744 1 2 nook 1517163 0.009443 1 1 rca 2009324 0.027599 1 3 rca 2877125 0.012054 2 13 remotecontrolextender 7997055 0.113058 1 19 samsung40 2126065 0.018066 1 12 watchthethrone 3168067 0.104679 1 17 wirelessheadphones 3770439 0.014714 1 18 wirelessheadphones 2602403 0.008525 2 11 zaggiphone 1230537 0.136073 1
show productid
selected query
:
in [180]: (df.assign(rn=df.sort_values('priority', ascending=0).groupby('query').cumcount() + 1) .....: .query('query=="3ds" , rn < 3')['productid'] .....: ) out[180]: index 7 2264036 16 1686079 name: productid, dtype: int64
Comments
Post a Comment