Python Pandas self join for merge cartesian product to produce all combinations and sum -
i brand new python, seems has lot of flexibility , faster traditional rdbms systems.
working on simple process create random fantasy teams. come rdbms background (oracle sql) , not seem optimal data processing.
i made dataframe using pandas read csv file , have simple dataframe 2 columns -- player, salary:
` name salary 0 jason day 11700 1 dustin johnson 11600 2 rory mcilroy 11400 3 jordan spieth 11100 4 henrik stenson 10500 5 phil mickelson 10200 6 justin rose 9800 7 adam scott 9600 8 sergio garcia 9400 9 rickie fowler 9200`
what trying via python (pandas) produce combinations of 6 players salary between amount 45000 -- 50000.
in looking python options, found itertools combination interesting, result massive list of combinations without filtering sum of salary.
in traditional sql, massive merge cartesian join w/ sum, players in different spots..
such a, b, c then, c, b, a..
my traditional sql doesn't work enough this:
` select distinct one.name "1", two.name "2", three.name "3", four.name "4", five.name "5", six.name "6", sum(one.salary + two.salary + three.salary + four.salary + five.salary + six.salary) salary nl.pgachamp2 one,nl.pgachamp2 two,nl.pgachamp2 three, nl.pgachamp2 four,nl.pgachamp2 five,nl.pgachamp2 6 one.name != two.name , one.name != three.name , one.name != four.name , one.name != five.name , two.name != three.name , two.name != four.name , two.name != five.name , two.name != six.name , three.name != four.name , three.name != five.name , three.name != six.name , five.name != six.name , four.name != six.name , four.name != five.name , one.name != six.name group one.name, two.name, three.name, four.name, five.name, six.name`
is there way in pandas/python?
any documentation can pointed great!
here's non-pandas solution using simple algorithm. generates combinations recursively list of players sorted salary. lets skip generating combinations exceed salary cap.
as pirsquared mentions, there no teams of 6 fall within salary limits stated in question, chose limits generate small number of teams.
#!/usr/bin/env python3 ''' limited combinations generate combinations of players combined salaries fall within given limits see http://stackoverflow.com/q/38636460/4014959 written pm 2ring 2016.07.28 ''' data = '''\ 0 jason day 11700 1 dustin johnson 11600 2 rory mcilroy 11400 3 jordan spieth 11100 4 henrik stenson 10500 5 phil mickelson 10200 6 justin rose 9800 7 adam scott 9600 8 sergio garcia 9400 9 rickie fowler 9200 ''' data = [s.split() s in data.splitlines()] all_players = [(' '.join(u[1:-1]), int(u[-1])) u in data] all_players.sort(key=lambda t: t[1]) i, row in enumerate(all_players): print(i, row) print('- '*40) def choose_teams(free, num, team=(), value=0): num -= 1 i, p in enumerate(free): salary = all_players[p][1] newvalue = value + salary if newvalue <= hi: newteam = team + (p,) if num == 0: if newvalue >= lo: yield newteam, newvalue else: yield choose_teams(free[i+1:], num, newteam, newvalue) else: break #salary limits lo, hi = 55000, 60500 #indices of players can chosen team free = tuple(range(len(all_players))) i, (t, s) in enumerate(choose_teams(free, 6), 1): team = [all_players[p] p in t] names, sals = zip(*team) assert sum(sals) == s print(i, t, names, s)
output
0 ('rickie fowler', 9200) 1 ('sergio garcia', 9400) 2 ('adam scott', 9600) 3 ('justin rose', 9800) 4 ('phil mickelson', 10200) 5 ('henrik stenson', 10500) 6 ('jordan spieth', 11100) 7 ('rory mcilroy', 11400) 8 ('dustin johnson', 11600) 9 ('jason day', 11700) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 1 (0, 1, 2, 3, 4, 5) ('rickie fowler', 'sergio garcia', 'adam scott', 'justin rose', 'phil mickelson', 'henrik stenson') 58700 2 (0, 1, 2, 3, 4, 6) ('rickie fowler', 'sergio garcia', 'adam scott', 'justin rose', 'phil mickelson', 'jordan spieth') 59300 3 (0, 1, 2, 3, 4, 7) ('rickie fowler', 'sergio garcia', 'adam scott', 'justin rose', 'phil mickelson', 'rory mcilroy') 59600 4 (0, 1, 2, 3, 4, 8) ('rickie fowler', 'sergio garcia', 'adam scott', 'justin rose', 'phil mickelson', 'dustin johnson') 59800 5 (0, 1, 2, 3, 4, 9) ('rickie fowler', 'sergio garcia', 'adam scott', 'justin rose', 'phil mickelson', 'jason day') 59900 6 (0, 1, 2, 3, 5, 6) ('rickie fowler', 'sergio garcia', 'adam scott', 'justin rose', 'henrik stenson', 'jordan spieth') 59600 7 (0, 1, 2, 3, 5, 7) ('rickie fowler', 'sergio garcia', 'adam scott', 'justin rose', 'henrik stenson', 'rory mcilroy') 59900 8 (0, 1, 2, 3, 5, 8) ('rickie fowler', 'sergio garcia', 'adam scott', 'justin rose', 'henrik stenson', 'dustin johnson') 60100 9 (0, 1, 2, 3, 5, 9) ('rickie fowler', 'sergio garcia', 'adam scott', 'justin rose', 'henrik stenson', 'jason day') 60200 10 (0, 1, 2, 3, 6, 7) ('rickie fowler', 'sergio garcia', 'adam scott', 'justin rose', 'jordan spieth', 'rory mcilroy') 60500 11 (0, 1, 2, 4, 5, 6) ('rickie fowler', 'sergio garcia', 'adam scott', 'phil mickelson', 'henrik stenson', 'jordan spieth') 60000 12 (0, 1, 2, 4, 5, 7) ('rickie fowler', 'sergio garcia', 'adam scott', 'phil mickelson', 'henrik stenson', 'rory mcilroy') 60300 13 (0, 1, 2, 4, 5, 8) ('rickie fowler', 'sergio garcia', 'adam scott', 'phil mickelson', 'henrik stenson', 'dustin johnson') 60500 14 (0, 1, 3, 4, 5, 6) ('rickie fowler', 'sergio garcia', 'justin rose', 'phil mickelson', 'henrik stenson', 'jordan spieth') 60200 15 (0, 1, 3, 4, 5, 7) ('rickie fowler', 'sergio garcia', 'justin rose', 'phil mickelson', 'henrik stenson', 'rory mcilroy') 60500 16 (0, 2, 3, 4, 5, 6) ('rickie fowler', 'adam scott', 'justin rose', 'phil mickelson', 'henrik stenson', 'jordan spieth') 60400
if you're using older version of python doesn't support yield from
syntax can replace
yield choose_teams(free[i+1:], num, newteam, newvalue)
with
for t, v in choose_teams(free[i+1:], num, newteam, newvalue): yield t, v
Comments
Post a Comment