Python Pandas self join for merge cartesian product to produce all combinations and sum -


i brand new python, seems has lot of flexibility , faster traditional rdbms systems.

working on simple process create random fantasy teams. come rdbms background (oracle sql) , not seem optimal data processing.

i made dataframe using pandas read csv file , have simple dataframe 2 columns -- player, salary:

`                    name  salary 0              jason day   11700 1         dustin johnson   11600 2           rory mcilroy   11400 3          jordan spieth   11100 4         henrik stenson   10500 5         phil mickelson   10200 6            justin rose    9800 7             adam scott    9600 8          sergio garcia    9400 9          rickie fowler    9200` 

what trying via python (pandas) produce combinations of 6 players salary between amount 45000 -- 50000.

in looking python options, found itertools combination interesting, result massive list of combinations without filtering sum of salary.

in traditional sql, massive merge cartesian join w/ sum, players in different spots..

such a, b, c then, c, b, a..

my traditional sql doesn't work enough this:

` select distinct one.name "1",    two.name "2",     three.name "3",       four.name "4",    five.name "5",    six.name "6",    sum(one.salary + two.salary + three.salary + four.salary + five.salary + six.salary) salary      nl.pgachamp2 one,nl.pgachamp2 two,nl.pgachamp2 three, nl.pgachamp2 four,nl.pgachamp2 five,nl.pgachamp2 6  one.name != two.name  , one.name != three.name  , one.name != four.name  , one.name != five.name  , two.name != three.name  , two.name != four.name  , two.name != five.name  , two.name != six.name  , three.name != four.name  , three.name != five.name  , three.name != six.name  , five.name != six.name  , four.name != six.name  , four.name != five.name  , one.name != six.name  group one.name, two.name, three.name, four.name, five.name, six.name` 

is there way in pandas/python?

any documentation can pointed great!

here's non-pandas solution using simple algorithm. generates combinations recursively list of players sorted salary. lets skip generating combinations exceed salary cap.

as pirsquared mentions, there no teams of 6 fall within salary limits stated in question, chose limits generate small number of teams.

#!/usr/bin/env python3  ''' limited combinations      generate combinations of players combined salaries fall within given limits      see http://stackoverflow.com/q/38636460/4014959      written pm 2ring 2016.07.28 '''  data = '''\ 0              jason day   11700 1         dustin johnson   11600 2           rory mcilroy   11400 3          jordan spieth   11100 4         henrik stenson   10500 5         phil mickelson   10200 6            justin rose    9800 7             adam scott    9600 8          sergio garcia    9400 9          rickie fowler    9200 '''  data = [s.split() s in data.splitlines()] all_players = [(' '.join(u[1:-1]), int(u[-1])) u in data] all_players.sort(key=lambda t: t[1]) i, row in enumerate(all_players):     print(i, row) print('- '*40)  def choose_teams(free, num, team=(), value=0):     num -= 1     i, p in enumerate(free):         salary = all_players[p][1]         newvalue = value + salary         if newvalue <= hi:             newteam = team + (p,)             if num == 0:                 if newvalue >= lo:                     yield newteam, newvalue             else:                 yield choose_teams(free[i+1:], num, newteam, newvalue)         else:             break  #salary limits lo, hi = 55000, 60500  #indices of players can chosen team  free = tuple(range(len(all_players)))  i, (t, s) in enumerate(choose_teams(free, 6), 1):     team = [all_players[p] p in t]     names, sals = zip(*team)     assert sum(sals) == s     print(i, t, names, s) 

output

0 ('rickie fowler', 9200) 1 ('sergio garcia', 9400) 2 ('adam scott', 9600) 3 ('justin rose', 9800) 4 ('phil mickelson', 10200) 5 ('henrik stenson', 10500) 6 ('jordan spieth', 11100) 7 ('rory mcilroy', 11400) 8 ('dustin johnson', 11600) 9 ('jason day', 11700) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  1 (0, 1, 2, 3, 4, 5) ('rickie fowler', 'sergio garcia', 'adam scott', 'justin rose', 'phil mickelson', 'henrik stenson') 58700 2 (0, 1, 2, 3, 4, 6) ('rickie fowler', 'sergio garcia', 'adam scott', 'justin rose', 'phil mickelson', 'jordan spieth') 59300 3 (0, 1, 2, 3, 4, 7) ('rickie fowler', 'sergio garcia', 'adam scott', 'justin rose', 'phil mickelson', 'rory mcilroy') 59600 4 (0, 1, 2, 3, 4, 8) ('rickie fowler', 'sergio garcia', 'adam scott', 'justin rose', 'phil mickelson', 'dustin johnson') 59800 5 (0, 1, 2, 3, 4, 9) ('rickie fowler', 'sergio garcia', 'adam scott', 'justin rose', 'phil mickelson', 'jason day') 59900 6 (0, 1, 2, 3, 5, 6) ('rickie fowler', 'sergio garcia', 'adam scott', 'justin rose', 'henrik stenson', 'jordan spieth') 59600 7 (0, 1, 2, 3, 5, 7) ('rickie fowler', 'sergio garcia', 'adam scott', 'justin rose', 'henrik stenson', 'rory mcilroy') 59900 8 (0, 1, 2, 3, 5, 8) ('rickie fowler', 'sergio garcia', 'adam scott', 'justin rose', 'henrik stenson', 'dustin johnson') 60100 9 (0, 1, 2, 3, 5, 9) ('rickie fowler', 'sergio garcia', 'adam scott', 'justin rose', 'henrik stenson', 'jason day') 60200 10 (0, 1, 2, 3, 6, 7) ('rickie fowler', 'sergio garcia', 'adam scott', 'justin rose', 'jordan spieth', 'rory mcilroy') 60500 11 (0, 1, 2, 4, 5, 6) ('rickie fowler', 'sergio garcia', 'adam scott', 'phil mickelson', 'henrik stenson', 'jordan spieth') 60000 12 (0, 1, 2, 4, 5, 7) ('rickie fowler', 'sergio garcia', 'adam scott', 'phil mickelson', 'henrik stenson', 'rory mcilroy') 60300 13 (0, 1, 2, 4, 5, 8) ('rickie fowler', 'sergio garcia', 'adam scott', 'phil mickelson', 'henrik stenson', 'dustin johnson') 60500 14 (0, 1, 3, 4, 5, 6) ('rickie fowler', 'sergio garcia', 'justin rose', 'phil mickelson', 'henrik stenson', 'jordan spieth') 60200 15 (0, 1, 3, 4, 5, 7) ('rickie fowler', 'sergio garcia', 'justin rose', 'phil mickelson', 'henrik stenson', 'rory mcilroy') 60500 16 (0, 2, 3, 4, 5, 6) ('rickie fowler', 'adam scott', 'justin rose', 'phil mickelson', 'henrik stenson', 'jordan spieth') 60400 

if you're using older version of python doesn't support yield from syntax can replace

yield choose_teams(free[i+1:], num, newteam, newvalue) 

with

for t, v in choose_teams(free[i+1:], num, newteam, newvalue):     yield t, v 

Comments

Popular posts from this blog

magento2 - Magento 2 admin grid add filter to collection -

Android volley - avoid multiple requests of the same kind to the server? -

Combining PHP Registration and Login into one class with multiple functions in one PHP file -