Modeling concentration and dispersion in multiple regression

2005

We consider concepts and models that are useful for measuring how strongly the distribution of a positive response Y is concentrated near a value y0 > 0, with a focus on how concentration varies as a function of covariates. We combine ideas from statistics, economics and reliability theory. Lorenz introduced a device for measuring inequality in the distribution of incomes that indicates how much the incomes below the u-th quantile fall short of the egalitarian situation where everyone has the same income. Gini introduced an index that is the average over u of the difference between the Lorenz curve and its values in the egalitarian case. More generally, we can think of the Lorenz and Gini concepts as measures of concentration that apply to other response variables in addition to incomes, e.g., wealth, sales, dividends, taxes, test scores, precipitation, and crop yield. In this paper we propose modified versions of the Lorenz and Gini measures of concentration that we relate to statistical concepts of dispersion. Moreover, we consider the situation where the measures of concentration/dispersion are functions of covariates. We consider the estimation of these functions for parametric models and for a semiparametric model involving regression coefficients and an unknown baseline distribution. In this semiparametric model, which combines ideas from Pareto, Lehmann and Cox, we find partial likelihood estimates of the regression coefficients and the baseline distribution that can be used to construct estimates of the various measures of concentration/dispersion.
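
As a point of reference for the classical Lorenz and Gini concepts mentioned above, the following is a minimal sketch of their empirical versions for a sample of positive responses. It is not the modified measures or the semiparametric estimator proposed in the paper. The function names `lorenz_curve` and `gini_index` are hypothetical, and the Gini index is assumed here to use the common normalization of twice the area between the egalitarian line and the Lorenz curve.

```python
import numpy as np


def lorenz_curve(y):
    """Empirical Lorenz curve of a positive sample (illustrative helper).

    Returns (u, L), where L[k] is the share of the total held by the
    k smallest observations and u[k] = k / n.
    """
    y = np.sort(np.asarray(y, dtype=float))
    if y.size == 0 or np.any(y <= 0):
        raise ValueError("expects a non-empty sample of positive values")
    n = y.size
    u = np.arange(n + 1) / n
    L = np.concatenate(([0.0], np.cumsum(y))) / y.sum()
    return u, L


def gini_index(y):
    """Classical Gini index (assumed normalization: twice the area
    between the egalitarian line L(u) = u and the Lorenz curve)."""
    u, L = lorenz_curve(y)
    # Trapezoid rule; exact here because the empirical curve is piecewise linear.
    area_under_lorenz = np.sum((L[1:] + L[:-1]) / 2.0 * np.diff(u))
    return 1.0 - 2.0 * area_under_lorenz
```

In the egalitarian case, where all responses are equal, the Lorenz curve coincides with the diagonal and the index is zero; the index grows as the total becomes concentrated in fewer observations.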

Keywords: Spread, concentration, Lorenz curve, Gini index, Lehmann model, Cox regression, Pareto model.

Statistics Norway, Research Department

Discussion Papers; No. 412