According to the first model I've ever built, Aday Mara will be the steal of the first round and Cameron Boozer is the best player in this draft. Thanks for this, can't wait to keep playing with it.
The model uses a Gradient Boosting Regressor trained on 719 NCAA prospects (2010–2025) with 18 features, including per-100-possession shooting splits, playmaking, age, height, strength of schedule, and recruit rank percentile. It predicts per-possession NBA impact, then converts that to probabilities across 4 outcome tiers.
Cameron Boozer: +2.15 impact, 23.8% superstar probability
Aday Mara: +1.07, projects as a likely starter with an all-star ceiling
Allen Graves & Caleb Wilson: solid starter projections
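For anyone curious how a single predicted impact number can become tier probabilities, here is a minimal sketch of one possible approach. It is not necessarily what the model above does: it assumes the prediction error is roughly Gaussian, and the tier cut points, the `residual_sd` value, and the "bench" tier name are all hypothetical placeholders.

```python
import math

# Hypothetical tier boundaries on the impact scale (assumed, not from the model above).
TIER_CUTS = [0.0, 1.5, 3.0]
TIER_NAMES = ["bench", "starter", "all-star", "superstar"]


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def tier_probabilities(predicted_impact, residual_sd=1.5):
    """Spread a point prediction over tiers, assuming Gaussian prediction error.

    residual_sd is an assumed value; in practice it could be estimated
    from held-out residuals of the regressor.
    """
    cdfs = [normal_cdf(cut, predicted_impact, residual_sd) for cut in TIER_CUTS]
    edges = [0.0] + cdfs + [1.0]
    # Probability of each tier is the mass between consecutive cut points.
    return {name: edges[i + 1] - edges[i] for i, name in enumerate(TIER_NAMES)}


probs = tier_probabilities(2.15)
for name, p in probs.items():
    print(f"{name}: {p:.1%}")
```

With these placeholder settings the numbers will not match the percentages quoted above; the point is only that a higher predicted impact shifts probability mass toward the upper tiers.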
Great work.
I hope the article and the data were helpful for your project.
How are you thinking about the timeline of the target variable? At first pass, I figure it makes sense to try to predict career peaks or totals of whatever the target is, but from a drafting team's perspective, the decision-relevant window is probably relatively narrower, right?
Would you consider limiting the target to 3rd or 4th season impact (or something) to reflect the rookie extension decision point that teams have to make? Any drawbacks you've come across here?
I simply look at the (age-adjusted) career average.
It's less noisy than looking at only the 3rd or 4th season, and the correlation between the two must be fairly high anyway.
And looking at peak windows would require choosing an arbitrary set of years, I assume(?)
I understand that players switch teams after their rookie contract. But I'd much rather try building a second model that tries to predict future salary.
Having said that, it would be interesting to know how many players extend with the teams that drafted them. If the number is extremely low, I should maybe reconsider.
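To make the two target definitions concrete, here is a rough sketch that computes both from per-season records. Everything in it is an assumption for illustration: the `AGE_CURVE` offsets are invented, the possession weighting is just one reasonable choice, and the example player is made up.

```python
# Hypothetical age curve: expected impact relative to peak at each age.
# These offsets are illustrative only, not estimated from data.
AGE_CURVE = {19: -2.0, 20: -1.5, 21: -1.0, 22: -0.6, 23: -0.3, 24: -0.1}


def age_adjusted_career_average(seasons):
    """Possession-weighted average of impact with the age-curve offset removed.

    seasons: list of (age, impact, possessions) tuples, in career order.
    Ages missing from AGE_CURVE are treated as peak years (offset 0).
    """
    total_poss = sum(poss for _, _, poss in seasons)
    if total_poss == 0:
        return None
    adjusted = sum((impact - AGE_CURVE.get(age, 0.0)) * poss
                   for age, impact, poss in seasons)
    return adjusted / total_poss


def window_average(seasons, first_season=3, last_season=4):
    """Possession-weighted raw impact over a fixed window of seasons (1-indexed)."""
    window = seasons[first_season - 1:last_season]
    total_poss = sum(poss for _, _, poss in window)
    if total_poss == 0:
        return None
    return sum(impact * poss for _, impact, poss in window) / total_poss


# Made-up player: (age, impact per 100 possessions, possessions played)
example = [(20, -1.0, 2000), (21, -0.5, 3000), (22, 0.5, 4000), (23, 1.0, 4000)]
career_target = age_adjusted_career_average(example)
window_target = window_average(example)
print(career_target, window_target)
```

The career average draws on every season, which is where the noise reduction comes from. The window target uses only two seasons, so one injury-shortened year can move it a lot.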
>Note that some of these statistics have to be normalized – for example, "z-scored"
Why would you choose to not normalize a non-categorical feature for this task?
I *am* suggesting that you normalize. But it doesn't necessarily have to be by z-scoring. One could also use, e.g., MinMax scaling to map each feature to a range such as [0, 1].
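As a quick illustration of the difference, here is a small sketch of both options in plain Python. The function names and the sample heights are made up for the example.

```python
import statistics


def z_score(values):
    """Center on the mean and scale by the population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("cannot z-score a constant feature")
    return [(v - mean) / sd for v in values]


def min_max(values, low=0.0, high=1.0):
    """Linearly rescale values so the minimum maps to low and the maximum to high."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot min-max scale a constant feature")
    return [low + (v - v_min) * (high - low) / (v_max - v_min) for v in values]


heights_in = [74, 77, 80, 83, 86]  # made-up player heights in inches
z = z_score(heights_in)
mm = min_max(heights_in)
print(z)
print(mm)
```

Both are linear rescalings, so the ordering of players is preserved either way. The z-score is unbounded and centered at 0, while MinMax is bounded but its endpoints are set by the most extreme values in the sample.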