I want to create a model that predicts a player's performance in a match based on past performances.
I grouped the past performances into two date periods. Date period 1 contains match performances from 0 to 40 days ago, and date period 2 contains match performances from 40 to 200 days ago.
The expectation is that more recent performances should be weighted more heavily when predicting a player's performance in his next match.
However, not all players play an equal number of games over the two periods. Sometimes a player might have played only 0 or 1 games within the past 40 days. In that case it seems plausible that the predictive model should give more weight to the average performance from games played 40 to 200 days ago.
But in cases where a player has played a lot of recent matches (within the past 40 days), it would make sense for the model to look almost exclusively at the performances from that period (at least that is the hypothesis).
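To make the hypothesis concrete, here is a minimal sketch of the kind of weighting I have in mind. The function name `blended_expectation` and the smoothing constant `k` are hypothetical and chosen only for illustration; in a real model the weighting would have to be learned from the data.

```python
def blended_expectation(dp1_played, dp1_avg, dp2_avg, k=5.0):
    """Blend the recent and older averages.

    k is a hypothetical smoothing constant: the larger it is, the more
    recent games are needed before the recent average dominates.
    """
    # Weight on the recent period grows from 0 toward 1 with more recent games.
    w_recent = dp1_played / (dp1_played + k)
    return w_recent * dp1_avg + (1.0 - w_recent) * dp2_avg


# No recent games: the result relies entirely on the older period.
print(blended_expectation(0, 0, 200))
# Many recent games: the result is pulled toward the recent average.
print(blended_expectation(10, 400, 200))
```

With this form, a player with 0 recent games gets all of the weight on the 40 to 200 day average, and the weight shifts toward the recent average as `dp1_played` grows.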
Below is a short example table showing a player's grouped previous performances along with his performance value in a specific match.
(DP = Date period, avg_perf = average past performance during the date period, DP1_played shows games played from 0 to 40 days ago)
Player_id  Match_Perf  DP1_played  DP1_avg_perf  DP2_played  DP2_avg_perf
1          300         10          400           40          200
2          125         2           60            35          100
1          250         11          380           41          200
Assume we have tons of rows like these, with different players from different games. What is the best way to set up something like a multiple regression model to predict the match performance?
(I am using Python.)
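For reference, this is roughly the kind of setup I am picturing: an ordinary least-squares fit with interaction terms, so that the coefficient on each period's average can change with the number of recent games played. It is only a sketch under my own assumptions (the choice of features and the helper name `design_matrix` are made up for illustration), and the three rows from the table are far too few to estimate anything meaningful.

```python
import numpy as np

# Rows from the table above:
# Match_Perf, DP1_played, DP1_avg_perf, DP2_played, DP2_avg_perf
data = np.array([
    [300, 10, 400, 40, 200],
    [125,  2,  60, 35, 100],
    [250, 11, 380, 41, 200],
], dtype=float)

y = data[:, 0]
dp1_played, dp1_avg, dp2_avg = data[:, 1], data[:, 2], data[:, 4]


def design_matrix(dp1_played, dp1_avg, dp2_avg):
    """Build features: intercept, both averages, and two interaction terms."""
    return np.column_stack([
        np.ones_like(dp1_avg),   # intercept
        dp1_avg,                 # recent average
        dp2_avg,                 # older average
        dp1_played * dp1_avg,    # recent average scaled by recent games played
        dp1_played * dp2_avg,    # older average scaled by recent games played
    ])


X = design_matrix(dp1_played, dp1_avg, dp2_avg)

# Least-squares fit; with only 3 rows and 5 features the system is
# underdetermined, so lstsq returns the minimum-norm solution.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ coef
print(coef)
print(pred)
```

If the hypothesis holds, I would expect the fitted coefficient on `dp1_played * dp1_avg` to be positive and the one on `dp1_played * dp2_avg` to be negative once there is enough data, but I do not know whether this is the right way to model it.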