Entries Tagged 'Statistics' ↓

iTunes playlists and randomness

From Slashdot I found this article: How Much Does iTunes Like My Five-Star Songs? In it the author tackles the subject of iTunes and how it picks songs. It’s an interesting read and it pretty much confirmed a lot of what I thought about how iTunes works.

I do disagree with one assertion about the hypothetical playlist, though: the author claims that most songs in a person’s collection will be rated at 3 stars, and that the distribution of rankings will follow a bell curve. In my collection that is most definitely not the case. The majority of my songs are in the 4 and 5 star range, with a smattering of 3 stars, very few 2 stars, and a growing number of 1 stars.

For me, I want to hear songs I like (4 and 5 star) more often, so when I hear an unrated song I like, I’m more likely to rate that song. For songs that are ok, I’ll eventually rate them 3 stars. Songs I really don’t care for get the 1 star. Since I imported all of my MP3s with no rankings, I’ve got a mountain of unrated songs. Perhaps this is the wildcard: once I get the majority of my songs rated, perhaps they will fall into a bell curve, but I am doubtful of this because of my tendency to get rid of songs/albums/artists I don’t like. That is, if I can’t rate a single song on an album over 1 or 2 stars, then why bother keeping it?

But that’s just me.

Randomness

The MAA website is a constant source of interesting bits of information. Today while reading Ivar’s new column I saw a link to an older article called Random Home Runs where he discusses the topic of "Do baseball hitters have streaks?". I highly recommend the article; it’s very enlightening.

Tags: Statistics

Statistics in the news

Today while looking at the Mathematical Association of America website (they have really great columns there every month) I saw this article by Keith Devlin where he talks about some statistics that were reported in a newspaper. His point is that statistics, when read incorrectly, can be used to mislead people in a big way. I really wish more people would read articles like this. That way, when confronted with newspaper articles reporting a 95% increase in <insert bad thing here>, they would be able to look objectively at the numbers and decide if what is being reported is probable or if it is meant to scare.

A lot of people don’t realize that just because something is possible, it does not mean it is probable.

Tags: probability, possibility, stats

Baseball and probability

Today while I was checking out the MAA columns I came across a very interesting article about statistics and baseball. It turns out there is this book called Moneyball that talks about how the Oakland A’s used statistical analysis to look at players in a new light. Doing this allowed them to leverage their small budget (compared to other teams) to acquire players that didn’t fit the mold of “superstar baseball players”.

And this is the kicker: Apparently, once they started doing this, the team started doing better. It seems the analysis they did allowed them to see opportunities that conventional wisdom (and baseball statistical wisdom) would normally overlook.

The article is a very good read. I highly recommend reading it.

Also, there is a line in there that was a minor revelation to me. It said that predicting a win for a football team is hard to do when you are looking at an individual’s stats. Football is such a team-oriented sport that it doesn’t matter if you have the greatest QB that ever lived: if the rest of the team doesn’t support him, the team loses. In baseball, almost the opposite is the case: a single stellar player (or a stellar performance by a single player) can benefit the entire team.

This was a minor revelation for me because I’ve been looking at player stats thinking they would be an indicator of success. And while it is true that in my fantasy football league we base our score off of individual performances, whether a player is part of a winning team should still influence that player’s value.

Very interesting. I have much thinking to do. Especially since my team lost its first matchup this week. :)

Football, Fixed Width records, and Python

Football season will soon be here (American football that is) and this year I am determined to be better prepared. And to that end I have decided to gather player and team statistics so that when I’m setting up my fantasy team this year, they won’t fall apart after the first month. Hopefully.

If you are looking for player stats, I found a great site that gives every player’s stats on a week by week basis. This is exactly what I was looking for. Most stat sheets I have used in the past gave only the totals per player per year (or only career totals). Quickstats has everything on a week by week basis. Good stuff. And a great price too. Check ‘em out if you are interested in football stats at all.

After getting some of the stats files, I found that they are in a fixed width record format. (Thankfully there is no HTML to have to parse through, which is what I had been planning to do…) My goal is to take the stats that I have and put them into a spreadsheet. If I could convert the records from fixed width to a delimited format, I could import them with no problem….

Python to the rescue! This recipe is just what I needed. Now that I have the column widths and Python’s struct module, I can reformat the records with ease. Pretty slick!
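For the curious, here is a rough sketch of the idea, not the recipe itself. The column widths and the sample player line below are made up for illustration; the real widths depend on the layout of the stats file. It uses struct to slice each record into fields and the csv module to write them back out comma-delimited.

```python
import csv
import io
import struct

# Hypothetical layout: player name, team, week, yards.
# The real widths have to come from the stats file itself.
FIELD_WIDTHS = (20, 4, 3, 5)

# Builds the format "20s4s3s5s": four fixed-size byte strings per record.
RECORD = struct.Struct("".join(f"{width}s" for width in FIELD_WIDTHS))


def split_record(line):
    """Split one fixed-width line into a list of stripped fields."""
    # Pad short lines with spaces and trim long ones so unpack() always
    # receives exactly RECORD.size bytes.
    raw = line.encode("ascii").ljust(RECORD.size)[:RECORD.size]
    return [field.decode("ascii").strip() for field in RECORD.unpack(raw)]


def to_csv(lines):
    """Convert fixed-width lines into comma-delimited text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        if line.strip():  # skip blank lines
            writer.writerow(split_record(line.rstrip("\n")))
    return out.getvalue()


if __name__ == "__main__":
    # Made-up sample record, not real data.
    sample = "Smith, John".ljust(20) + "OAK".ljust(4) + "3".rjust(3) + "112".rjust(5)
    print(to_csv([sample]), end="")
```

The csv module takes care of quoting fields that contain a comma themselves (like the name above), which is the main reason to use it instead of a plain join.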

Now if only I could actually use the data I’ve gathered to field a winning team… :)

Probability and the Lottery

The other night on the news there was a story about a local man who has a “sure-fire” system for winning the lottery. (Unfortunately I can’t find a link to the story.) Basically his method was to study the numbers that have come up in past drawings and use this, combined with some other techniques, to figure out which numbers were more likely to be drawn. Using this method he won a little over a million dollars a few years ago.

And of course the piece also featured a statistics professor from school talking about how this strategy wasn’t going to make anyone rich. She had all of the usual arguments about randomness and probability, which of course were glossed over because they don’t make as good a news story as someone who has found a way to “beat” the system.

But the professor did have one great line that I really liked. It was something like: “The best that these systems can do is lower your probability of losing.” I chuckled when I heard that because if the odds against winning are 40 million to 1, then lowering your odds of losing by a little bit still leaves the odds against winning the money at several million (probably 39,999,999) to 1.
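To put rough numbers on that, here is a small sketch. It assumes that “40 million to 1” means one winning combination out of 40,000,000 equally likely ones, and that the only thing a player controls is how many distinct combinations they hold. Both are my assumptions, not anything from the news story.

```python
from fractions import Fraction

# Assumption: one winning combination out of 40,000,000 equally likely ones.
TOTAL_COMBINATIONS = 40_000_000


def win_probability(tickets):
    """Chance of winning when holding `tickets` distinct combinations."""
    return Fraction(tickets, TOTAL_COMBINATIONS)


def odds_against(tickets):
    """Odds against winning, as the N in 'N to 1'."""
    p = win_probability(tickets)
    return (1 - p) / p


print(odds_against(1))    # 39999999
print(odds_against(100))  # 399999
```

Even holding 100 different combinations leaves the odds against winning at roughly 400,000 to 1.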

So I had to laugh a little bit on that one. I wondered how many people who saw that story went out and picked up one of the many books about lotto strategies that the story featured. But the one question the piece never asked is: “If these systems work so well, how come there aren’t more millionaires?” ;)

My thoughts exactly…

I try not to get into wars over why one technology is better than the other, but I have to point out two links that made me laugh out loud when I read them: Robert Alsina’s comments about Perl programmers, and JWZ’s thoughts on people who spend too much time in database land.

They both hit the nail on the head with their respective topics. Additionally, they made me think about how I use software tools (e.g. Python versus Perl, Oracle versus SQL Server). It is important to remember to use the right tool for the right job. The other day I was trying like mad to figure out a way to use Python to parse an HTML page so that I could take the data and shove it into a database. But after stopping and thinking about it for a while I decided that it will probably be better to get the stats some other way (like getting a CD from one of the stat websites). If nothing else, trying to design a DB schema to store that stuff will take me just shy of forever, and the draft for my fantasy football league isn’t that far away….

But I do think that I’ll be using Python to access and mutilate (er, manipulate) the data that I get out of the DB… ;)

A coin tossing simulator

I was thinking about the coin flipping topic again. I decided to code up a quick little Python program to simulate X number of coin tosses. I ran it for several values of X (I think the largest was 100000) and most of the time it came out close to 50-50.

Now a quick disclaimer: I have no idea how random the randrange function is, so this program should be taken with a grain of salt.

How it works: You give it the number of coin tosses you want it to do, say 100. For each toss it calls random.randrange() with a minimum of 1 and a maximum of 100 million (note that the minimum is an odd number and the maximum is an even number). It then takes the result (somewhere in that range; strictly speaking, randrange never returns the maximum itself) and does a modulus 2 operation on it to see if it is even. If it is, it increments the heads counter; if it isn’t, the tails counter gets incremented.

At the end of the tosses it reports back the numbers for heads and tails. It’s a decent simulation I think, and somewhat representative of what you would see if you did this in real life. A more accurate simulation would probably take into account which side of the coin was facing up when it was first flipped (see my previous posting for details on that), and it would also probably make sure the random number generator was properly seeded before every toss…. But like I said, this is a quick little demo program. Feel free to modify it.
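In case the download ever goes missing, here is a minimal sketch that follows the description above. It is a reconstruction, not necessarily identical to coinflip.py; the function name toss_coins and the optional rng argument are my own additions for illustration.

```python
import random


def toss_coins(tosses, rng=None):
    """Simulate `tosses` coin flips and return (heads, tails)."""
    if rng is None:
        rng = random.Random()  # seeded automatically by Python
    heads = tails = 0
    for _ in range(tosses):
        # randrange excludes the upper bound, so this yields 1..99,999,999.
        number = rng.randrange(1, 100_000_000)
        if number % 2 == 0:
            heads += 1  # even number counts as heads
        else:
            tails += 1  # odd number counts as tails
    return heads, tails


if __name__ == "__main__":
    heads, tails = toss_coins(100_000)
    print(f"heads: {heads}  tails: {tails}")
```

Passing in a random.Random created with a fixed seed makes a run repeatable, which is handy when comparing results.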

coinflip.py (Please note you need to save it with a .py extension; I had to put the .txt on there to get it to load onto GeoCities.)

So maybe it isn’t 50-50…

After a conversation at work about the flipping of a coin (and whether it really is a 50-50 chance of heads or tails), I came across this article today. It seems that there are some catches to the statement I made yesterday.

Research has found that there does seem to be a bias when flipping a coin: it seems that a coin is more likely to land on the same face it started out on. So if it’s heads up when you flip it, it is more likely that it will be heads up when it lands. I’ll be sure to remember that next time someone asks if I want to flip for something…

Flippin’ pennies:

This weekend I finished reading the memoirs of Dr. Edward Teller (he helped invent the atomic and hydrogen bombs). In his later years Dr. Teller was involved with a lot of foundations to try and help get people into the field of “Applied Science” (think hands-on practical science instead of pure theory).

He recounted an interesting story (pg. 487) about one of the interview questions he used to ask applicants: If you flipped a coin and each time bet a penny on the outcome of the flip (e.g. 1 penny that it would land on heads), how much money would you have at the end of 1,000 tosses?

He said a lot of people had trouble with this one. The answer is close to 0. Why? Each toss of the coin has a 50% chance of landing on heads (or tails). If you toss the coin enough times, you will start to see that you have tossed almost an equal number of heads and tails. Betting a penny on each toss means that you would come out almost even in the end.

Why almost even? Each coin toss has nothing to do with any past or future toss. It is possible to toss 100 tails in a row, but this is unlikely (i.e. it has a low probability). 10 in a row is possible, as is 3, and each of those is more likely than the previous one. So when you get to the 998th toss, if you have tossed exactly 500 heads and 498 tails, it is possible you could flip 2 more heads in a row, giving you a net profit of 2 cents (assuming you bet on heads). Neat, huh?
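Here is a quick sketch for trying the bet yourself. It assumes a perfectly fair coin and that you bet on heads every time; the function name penny_bets is made up for this example. For a fair coin the standard deviation of the net result after 1,000 tosses is the square root of 1,000, or about 32 cents, so “close to 0” in practice means ending up within a few dozen cents either way.

```python
import random


def penny_bets(tosses=1000, rng=None):
    """Net winnings in cents after betting one penny on heads each toss."""
    if rng is None:
        rng = random.Random()
    net = 0
    for _ in range(tosses):
        # Assumption: a fair coin, so heads comes up with probability 0.5.
        if rng.random() < 0.5:
            net += 1  # heads: win a penny
        else:
            net -= 1  # tails: lose a penny
    return net


if __name__ == "__main__":
    results = [penny_bets() for _ in range(10)]
    print("net cents over ten runs of 1,000 tosses:", results)
```

Every run of 1,000 tosses ends on an even number of cents, because heads minus tails has to be even whenever heads plus tails is 1,000.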