6 Comments
AM's avatar
Apr 17 (edited)

According to the first model I have ever built, Aday Mara will be the steal of the first round and Cameron Boozer is the best player in this draft. Thanks for this; I can’t wait to keep playing with it.

The model uses a Gradient Boosting Regressor trained on 719 NCAA prospects (2010–2025) with 18 features, including per-100-possession shooting splits, playmaking, age, height, strength of schedule, and recruit rank percentile. It predicts per-possession NBA impact, then converts that prediction into probabilities across four outcome tiers.

Cameron Boozer: +2.15 impact, 23.8% superstar probability

Aday Mara: +1.07 impact, projects as a likely starter with an all-star ceiling

Allen Graves & Caleb Wilson: solid starter projections
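
For readers wondering how a single predicted impact number can become tier probabilities, here is a minimal sketch of one possible approach. It is not necessarily what the model above does. It assumes the prediction error is roughly normal around the predicted impact and integrates that distribution between tier cutoffs. The cutoff values, the `residual_sd` value, and the tier name "bench" are hypothetical placeholders chosen for illustration; only "starter", "all-star", and "superstar" appear in the comment.

```python
import math

# Hypothetical tier names and impact cutoffs (illustrative assumptions only).
TIERS = ["bench", "starter", "all-star", "superstar"]
CUTOFFS = [0.0, 1.5, 3.0]  # boundaries between consecutive tiers


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def tier_probabilities(predicted_impact, residual_sd=1.5):
    """Split a normal error distribution around the prediction into tiers.

    residual_sd is an assumed spread of prediction errors; in practice it
    could be estimated from out-of-sample residuals.
    """
    edges = [-math.inf] + CUTOFFS + [math.inf]
    probs = {}
    for name, lo, hi in zip(TIERS, edges[:-1], edges[1:]):
        probs[name] = (normal_cdf(hi, predicted_impact, residual_sd)
                       - normal_cdf(lo, predicted_impact, residual_sd))
    return probs


if __name__ == "__main__":
    for impact in (2.15, 1.07):
        rounded = {k: round(v, 3) for k, v in tier_probabilities(impact).items()}
        print(impact, rounded)
```

With these placeholder settings the numbers will not match the percentages quoted above. The point is only that a higher predicted impact shifts probability mass toward the upper tiers.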

Jeremias Engelmann's avatar

Great work.

I hope the article and the data were helpful for your project

Luke McCartney's avatar

How are you thinking about the timeline of the target variable? At first pass, I figure it makes sense to try to predict career peaks or totals of whatever the target is, but from a drafting team’s perspective, the decision-relevant window is probably narrower, right?

Would you consider limiting the target to 3rd or 4th season impact (or something) to reflect the rookie extension decision point that teams have to make? Any drawbacks you’ve come across here?

Jeremias Engelmann's avatar

I simply look at the (age-adjusted) career average.

It's less noisy than looking at 3/4 years, and the correlation between these must be fairly high, anyway.

And looking at peak windows would require choosing an arbitrary set of years, I assume(?)

I understand that players switch teams after their rookie contract. But I'd much rather try building a second model that tries to predict future salary.

Having said that, it would be interesting to know how many players extend with the teams that drafted them. If the number is extremely low, I should maybe reconsider.

Sharps Research's avatar

> Note that some of these statistics have to be normalized, for example “z-scored”

Why would you choose to not normalize a non-categorical feature for this task?

Jeremias Engelmann's avatar

I *am* suggesting normalizing. But it doesn't necessarily have to be by z-scoring. One can also use, for example, MinMax scaling with a target range such as [0, 1].
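
To make the two options concrete, here is a small sketch of both transforms using only the Python standard library. The function names and the sample heights are made up for illustration, and the z-score uses the population standard deviation, which is one of two common conventions.

```python
from statistics import mean, pstdev


def z_score(values):
    """Center on the mean and scale by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max(values, lo=0.0, hi=1.0):
    """Linearly rescale so the smallest value maps to lo and the largest to hi."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # A constant feature has no range to stretch; map it to lo.
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


if __name__ == "__main__":
    heights_cm = [185.0, 198.0, 206.0, 213.0]  # made-up sample values
    print([round(z, 3) for z in z_score(heights_cm)])
    print([round(m, 3) for m in min_max(heights_cm)])
```

Both are monotone linear rescalings, so the ordering of players on a feature is unchanged. They differ in that z-scores are unbounded and centered at zero, while MinMax output is bounded and its scale is set entirely by the two most extreme values.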