Tagged datascience


On "Meritocracy" and Machine Learning


The bourgeoisie naturally conceives the world in which it is supreme to be the best - The Communist Manifesto

thinking about how it's pretty common to critique the total failure of systems that are naïvely described as "meritocratic" to actually elevate the most meritorious people, when in fact the whole idea of meritocracy is born of the false belief that people can have an absolute "merit" (let alone that it determines how much power they should be able to exert) - nex3

A key moment in my professional and personal development came while I was reading a ton of data science / machine learning articles for work. I found one in which the author outlined a very solid approach to comparing model performance, then (intentionally) undermined it completely with a paragraph on the inadequacy of "merit" as a concept.

He wasn't simply attacking overly simplistic ML assumptions; he tied the difficulty of defining "merit" in a sterile computational system to the difficulty of defining it in professional, social, and political ones. If we're losing sleep over objectively defining the quality of a statistical construct, how could we ever be comfortable applying that logic to the messy systems of human interaction and governance?

It was a relatively short aside in a fairly unimportant article, but the comfort with which the author pivoted from technical to fiercely political was genuinely inspiring to me, and still informs a lot of my thinking.


Extremely high-brow beef occurring in the Slay the Spire community


I'm not about to recount the whole thing because that's not really the point, but it's resulted in one of the best videos around on statistics and the idea of comparing one's own performance in a videogame to that of others.

jorbs on YouTube


Applying statistics to videogames


Related to my last post, I've been wondering about the idea of modelling a speedrun as a Markov chain or some other stochastic model.

It's always how I've thought about them: a series of steps, each with a few possible (sometimes recursive) outcomes and a set amount of time associated with each outcome.

At best the idea may have legs as an educational tool, but honestly I'm really interested in using it to compare high- and low-risk strategies across many attempts.
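A minimal sketch of what that could look like in Python, assuming each attempt at a step is independent of earlier attempts (the Markov property) and using made-up steps, probabilities, and times purely for illustration. The names `Outcome`, `simulate_run`, `summarise`, `SAFE`, and `RISKY` are all hypothetical. An outcome that points back to its own step models retrying a trick after a failure, and running many simulated attempts gives a mean, a spread, and the share of runs that beat a target time.

```python
import math
import random
import statistics
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Outcome:
    """One possible result of attempting a step (all numbers hypothetical)."""
    probability: float        # chance of this outcome on a single attempt
    seconds: float            # time spent when this outcome happens
    next_step: Optional[str]  # step to move to; None ends the run


# Each step maps to its possible outcomes. Pointing back at the same step
# models a retry, which is the "sometimes recursive" case.
Chain = Dict[str, List[Outcome]]


def validate(chain: Chain) -> None:
    """Check outcome probabilities sum to 1 and every target step exists."""
    for step, outcomes in chain.items():
        total = sum(o.probability for o in outcomes)
        if not math.isclose(total, 1.0):
            raise ValueError(f"probabilities for {step!r} sum to {total}, not 1")
        for o in outcomes:
            if o.next_step is not None and o.next_step not in chain:
                raise ValueError(f"{step!r} points to unknown step {o.next_step!r}")


def simulate_run(chain: Chain, start: str, rng: random.Random,
                 max_transitions: int = 10_000) -> float:
    """Walk the chain from `start` until an outcome ends the run; return total seconds."""
    step: Optional[str] = start
    total = 0.0
    transitions = 0
    while step is not None:
        if transitions >= max_transitions:
            raise RuntimeError("run did not finish; check for inescapable loops")
        outcomes = chain[step]
        outcome = rng.choices(outcomes, weights=[o.probability for o in outcomes])[0]
        total += outcome.seconds
        step = outcome.next_step
        transitions += 1
    return total


def summarise(chain: Chain, start: str, attempts: int, target: float,
              seed: int = 0) -> Dict[str, float]:
    """Simulate many attempts and summarise the distribution of run times."""
    validate(chain)
    rng = random.Random(seed)
    times = [simulate_run(chain, start, rng) for _ in range(attempts)]
    return {
        "mean": statistics.mean(times),
        "stdev": statistics.pstdev(times),
        "share_under_target": sum(t < target for t in times) / attempts,
    }


# Low-risk route: always take the slow, guaranteed path through "trick".
SAFE: Chain = {
    "start": [Outcome(1.0, 30.0, "trick")],
    "trick": [Outcome(1.0, 45.0, "finish")],
    "finish": [Outcome(1.0, 20.0, None)],
}

# High-risk route: a fast skip that can be missed (retry) or fail badly (retry).
RISKY: Chain = {
    "start": [Outcome(1.0, 30.0, "trick")],
    "trick": [
        Outcome(0.6, 10.0, "finish"),  # skip lands
        Outcome(0.3, 8.0, "trick"),    # small miss, try again
        Outcome(0.1, 50.0, "trick"),   # big mistake, try again
    ],
    "finish": [Outcome(1.0, 20.0, None)],
}

if __name__ == "__main__":
    for name, chain in [("safe", SAFE), ("risky", RISKY)]:
        print(name, summarise(chain, "start", attempts=20_000, target=90.0))
```

With these invented numbers the safe route always takes 95 seconds, while the risky route is faster on average (its expected time works out to about 72.3 seconds) but varies from run to run. Looking at the spread and the share of runs under a target alongside the mean is what makes the high- versus low-risk comparison interesting.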


On Spreadsheets


My first boss in the stats/data-science space loved spreadsheets. We basically never used them in our work (we did our crimes in R), but he always talked about how good they were at letting a normal person sift through and comprehend massive tables of data like one of us weirdos.

It meant that when we worked with Big datasets that wouldn't fit in a spreadsheet, execs had a frame of reference to (over)value what we were doing, especially when we could squeeze the results into an xlsx. Basically, it gave us a job for life.

Since then I've done a ton of spreadsheet crimes myself, so I respect them in a different way, but that point of view stuck with me.