ArrowModel Help for Newbies

Agile Scoring






What is scoring?

After building a statistical model to predict an event (that is, a binary variable), one usually assigns each individual, or each case, a score that is highly correlated with the event. This process is called scoring.
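As a minimal sketch of scoring, assume a model has already been built and has produced the coefficients below. The coefficient values, the variable names, and the `score` function are hypothetical illustrations, not ArrowModel output. Each individual receives a score between 0 and 1, and the individuals are then ranked by it.

```python
import math

# Hypothetical coefficients from an already-fitted model (illustrative only).
INTERCEPT = -3.0
COEFFICIENTS = {"age": 0.04, "income_k": 0.01}

def score(individual):
    """Return a score in (0, 1); a higher score means the event is more likely."""
    linear = INTERCEPT + sum(
        COEFFICIENTS[name] * individual[name] for name in COEFFICIENTS
    )
    # Squash the linear combination into the (0, 1) range.
    return 1.0 / (1.0 + math.exp(-linear))

# Made-up individuals to score.
individuals = [
    {"id": "A", "age": 25, "income_k": 30},
    {"id": "B", "age": 60, "income_k": 90},
]

# Rank individuals from the highest score to the lowest.
ranked = sorted(individuals, key=score, reverse=True)
for person in ranked:
    print(person["id"], round(score(person), 3))
```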

What is a (statistical) model?

The important branch of Statistics where you want to predict a target variable is called modeling, or model building. Recently, managers have begun using the expression "what are the drivers of ...?" to mean the same thing. The goal of modeling is to find a trustworthy formula which uses the predictor variables to compute the target variable. When the target is a continuous variable, the most common formula is a linear combination of the predictors, and the widely used technique for finding it is called multiple regression.
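To make the idea of "a linear combination of the predictors" concrete, here is a small sketch of multiple regression in plain Python. The function names `fit_linear` and `predict` and the data are made up for illustration; the fit solves the least-squares normal equations directly, which is fine for a toy example but is not how a production package would necessarily do it.

```python
def fit_linear(rows, targets):
    """Least-squares fit of target = intercept + linear combination of predictors."""
    # Prepend a constant 1.0 so the first coefficient acts as the intercept.
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Build the normal equations (X'X) b = X'y as an augmented matrix.
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(coefs, row):
    """Apply the fitted formula to one row of predictor values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

# Made-up data generated exactly from: target = 1 + 2*x1 + 3*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
targets = [9, 8, 19, 18, 26]

coefs = fit_linear(rows, targets)
print([round(c, 6) for c in coefs])
print(round(predict(coefs, (6, 2)), 6))
```

Because the toy data contains no noise, the fitted coefficients recover the generating formula; with real data they would only approximate the underlying relationship.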

What about predicting an event?

Sometimes you want to predict an event which either does or does not occur, i.e. a binary variable. This may seem simpler than predicting a continuous variable, but it is much harder. The main techniques for doing that are logistic regression, decision trees, and neural networks. They are very different and each has its champions in the profession, but luckily for the user, their results are similar in terms of predictive ability.
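As an illustration of the first of these techniques, the sketch below fits a one-predictor logistic regression by plain gradient ascent on the log-likelihood. The data, the function names, the learning rate, and the number of steps are all assumptions chosen for the example; real packages typically use faster and more robust fitting procedures.

```python
import math

def sigmoid(z):
    """Map any real number to a probability-like value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit P(event) = sigmoid(a + b*x) by gradient ascent on the log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the average log-likelihood with respect to a and b.
        grad_a = sum(y - sigmoid(a + b * x) for x, y in zip(xs, ys)) / n
        grad_b = sum((y - sigmoid(a + b * x)) * x for x, y in zip(xs, ys)) / n
        a += learning_rate * grad_a
        b += learning_rate * grad_b
    return a, b

# Made-up data: the event (1) becomes more frequent as x grows, with some overlap.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

a, b = fit_logistic(xs, ys)
for x in (1, 4, 8):
    print(x, round(sigmoid(a + b * x), 3))
```

The printed values are estimated probabilities of the event for three values of the predictor; they increase with x because the event is more common among the larger values in the toy data.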

What kind of data is necessary to build a model?

You need a matrix of data where each row corresponds to an individual or to a case for which you have information. The columns of the matrix are called variables, for example age, sex, income, education level, number of children, etc. One of these variables should be the one that you want to predict.

In general, you need many more rows than you have columns, by several orders of magnitude.

Because of the "Garbage In, Garbage Out" principle, you must clean your data: almost all the professional statisticians we know spend a huge amount of time cleaning data.
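The layout described above can be sketched in a few lines of Python. The variable names, the values, the `responded` target column, and the cleaning rule in `is_clean` are all hypothetical; a real data set would have vastly more rows and would need cleaning rules specific to its own quirks.

```python
# Hypothetical raw data: each dict is one row (an individual), each key a variable.
raw_rows = [
    {"age": 34, "sex": "F", "income": 52000, "children": 2, "responded": 1},
    {"age": 51, "sex": "M", "income": None, "children": 0, "responded": 0},
    {"age": 27, "sex": "F", "income": 31000, "children": 1, "responded": 0},
    {"age": -5, "sex": "M", "income": 44000, "children": 3, "responded": 1},
]

# The one variable we want to predict.
TARGET = "responded"

def is_clean(row):
    """Keep rows with no missing values and a plausible age (assumed rule)."""
    return all(v is not None for v in row.values()) and 0 <= row["age"] <= 120

# Cleaning: drop the row with a missing income and the row with a negative age.
clean_rows = [r for r in raw_rows if is_clean(r)]

# Separate the predictor columns from the target column.
predictors = [{k: v for k, v in r.items() if k != TARGET} for r in clean_rows]
targets = [r[TARGET] for r in clean_rows]

print(len(clean_rows), "clean rows,", len(predictors[0]), "predictor columns")
```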

How do models work?

Of course, there's no magic involved. We simply use the idea that the past can help us predict the future. You do need a lot of historical data, hopefully related to what you want to predict, and voilà: most programs use classical statistical methods and will take care of the rest.

Do models always work?

No. If you only have short-term data, for instance, models certainly cannot help you detect the end of a bubble.

But in general, if you have enough clean data, how well you do in the end will depend on two factors: whether or not the information you have is related to the variable you want to predict, and whether or not you have the right tools at your disposal.

When your target variable is binary, we believe that ArrowModel is the right tool for the job.