Modeling concentration and dispersion in multiple regression
Working paper
Date: 2005
Collections: Discussion Papers
Abstract
We consider concepts and models that are useful for measuring how strongly the distribution of a
positive response Y is concentrated near a value y0 > 0, with a focus on how concentration varies as
a function of covariates. We combine ideas from statistics, economics and reliability theory. Lorenz
introduced a device for measuring inequality in the distribution of incomes that indicates how much the
incomes below the u-th quantile fall short of the egalitarian situation where everyone has the same
income. Gini introduced an index that is the average over u of the difference between the Lorenz
curve and its value in the egalitarian case. More generally, we can think of the Lorenz and Gini
concepts as measures of concentration that apply to other response variables in addition to
incomes, e.g. wealth, sales, dividends, taxes, test scores, precipitation, and crop yield. In this paper
we propose modified versions of the Lorenz and Gini measures of concentration that we relate to
statistical concepts of dispersion. Moreover, we consider the situation where the measures of
concentration/dispersion are functions of covariates. We consider the estimation of these functions
for parametric models and a semiparametric model involving regression coefficients and an unknown
baseline distribution. In this semiparametric model, which combines ideas from Pareto, Lehmann and
Cox, we find partial likelihood estimates of the regression coefficients and the baseline distribution
that can be used to construct estimates of the various measures of concentration/dispersion.
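To make the classical quantities concrete, here is a minimal Python sketch of the empirical Lorenz curve and Gini index. It implements only the textbook (unmodified) definitions, not the modified measures proposed in this paper. The Gini index is computed as twice the area between the egalitarian line L(u) = u and the Lorenz curve. To illustrate a concentration measure that depends on a covariate, the sketch also assumes a fully parametric model in which a Pareto baseline with lower bound y0 is raised to a Lehmann-type power, P(Y > y | x) = (y0 / y)^(alpha0 * exp(beta * x)). This is an illustrative assumption, not the paper's semiparametric estimator, which leaves the baseline distribution unspecified. Under this assumption Y given x is Pareto with shape a = alpha0 * exp(beta * x), whose Gini index is 1 / (2a - 1) for a > 1. All function names are hypothetical.

```python
import math
import random


def lorenz_curve(values):
    """Empirical Lorenz curve evaluated at u = 0, 1/n, ..., 1.

    Entry k is the share of the total held by the k smallest observations.
    """
    ys = sorted(values)
    if not ys or ys[0] <= 0:
        raise ValueError("need a non-empty sample of positive values")
    total = sum(ys)
    points = [0.0]
    running = 0.0
    for y in ys:
        running += y
        points.append(running / total)
    return points


def gini_index(values):
    """Gini index: twice the area between the diagonal L(u) = u and the
    piecewise-linear empirical Lorenz curve (trapezoid rule)."""
    lc = lorenz_curve(values)
    n = len(lc) - 1
    # Area under the Lorenz curve, using trapezoids of width 1/n.
    area = sum((lc[k] + lc[k + 1]) / (2 * n) for k in range(n))
    # 2 * (1/2 - area), since the area under the diagonal is 1/2.
    return 1.0 - 2.0 * area


def pareto_lehmann_gini(x, alpha0, beta):
    """Gini index of Y given covariate x under the ASSUMED model
    P(Y > y | x) = (y0 / y) ** (alpha0 * exp(beta * x)) for y >= y0.
    Y given x is then Pareto with shape a, whose Gini index is 1 / (2a - 1)."""
    a = alpha0 * math.exp(beta * x)
    if a <= 1:
        raise ValueError("the Gini index needs a finite mean (shape > 1)")
    return 1.0 / (2.0 * a - 1.0)


def sample_pareto(n, y0, a, rng):
    """Draw n Pareto(y0, a) values by inverse-CDF sampling:
    Y = y0 * U ** (-1/a), with U = 1 - rng.random() in (0, 1]."""
    return [y0 * (1.0 - rng.random()) ** (-1.0 / a) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha0, beta = 3.0, 0.5  # illustrative parameter values only
    for x in (0.0, 1.0):
        a = alpha0 * math.exp(beta * x)
        ys = sample_pareto(50_000, 1.0, a, rng)
        print(x, round(gini_index(ys), 3), round(pareto_lehmann_gini(x, alpha0, beta), 3))
```

Under the assumed model, a positive beta raises the Pareto shape as x grows, so the Gini index given x falls and the response becomes less dispersed. The simulated and closed-form values printed for each x should agree up to sampling error.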
Keywords: Spread, concentration, Lorenz curve, Gini index, Lehmann model, Cox regression,
Pareto model.