dc.contributor.author: Langsrud, Øyvind
dc.date.accessioned: 2020-03-23T11:10:55Z
dc.date.available: 2020-03-23T11:10:55Z
dc.date.created: 2019-07-03T14:16:24Z
dc.date.issued: 2019-01
dc.identifier.citation: Statistics and computing. 2019, 29 (5), 965-976. [en_US]
dc.identifier.issn: 0960-3174
dc.identifier.uri: https://hdl.handle.net/11250/2648071
dc.description.abstract: This paper presents a unified framework for regression-based statistical disclosure control for microdata. A basic method, known as information preserving statistical obfuscation (IPSO), produces synthetic data that preserve variances, covariances and fitted values. The data are then generated conditionally according to the multivariate normal distribution. Generalizations of the IPSO method are described in the literature, and these methods aim to generate data more similar to the original data. This paper describes these methods in a concise and interpretable way, which is close to efficient implementation. Decomposing the residual data into orthogonal scores and corresponding loadings is an essential part of the framework. Both QR decomposition (Gram–Schmidt orthogonalization) and singular value decomposition (principal components) may be used. Within this framework, new and generalized methods are presented. In particular, a method is described by means of which the correlations to the original principal component scores can be controlled exactly. It is shown that a suggested method of random orthogonal matrix masking can be implemented without generating an orthogonal matrix. Generalized methodology for hierarchical categories is presented within the context of microaggregation. Some information can then be preserved at the lowest level and more information at higher levels. The presented methodology is also applicable to tabular data. One possibility is to replace the content of primary and secondary suppressed cells with generated values. It is proposed to replace suppressed cell frequencies with decimal numbers, and it is argued that this can be a useful method. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Springer [en_US]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/deed.no [*]
dc.subject: Microdata [en_US]
dc.subject: Anonymization [en_US]
dc.subject: Microaggregation [en_US]
dc.subject: Official statistics [en_US]
dc.subject: Synthetic data [en_US]
dc.subject: Mikrodata [en_US]
dc.subject: Anonymisering [en_US]
dc.subject: Statistikkproduksjon [en_US]
dc.title: Information preserving regression-based tools for statistical disclosure control [en_US]
dc.type: Peer reviewed [en_US]
dc.type: Journal article [en_US]
dc.description.version: acceptedVersion [en_US]
dc.rights.holder: Springer [en_US]
dc.subject.nsi: VDP::Matematikk og Naturvitenskap: 400::Matematikk: 410::Statistikk: 412 [en_US]
dc.source.pagenumber: 965-976 [en_US]
dc.source.volume: 29 [en_US]
dc.source.journal: Statistics and computing [en_US]
dc.source.issue: 5 [en_US]
dc.identifier.doi: 10.1007/s11222-018-9848-9
dc.identifier.cristin: 1709826
cristin.ispublished: true
cristin.fulltext: postprint
cristin.qualitycode: 2
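
Illustration of the basic IPSO idea summarized in the abstract: the following is a minimal sketch, not the paper's implementation. It assumes NumPy, a regressor matrix x that includes an intercept column and has full column rank, and more rows than regressors plus responses. The function name ipso_synthetic and the simulated demo data are hypothetical. The sketch decomposes the residuals by QR into orthogonal scores and loadings, then replaces the scores with simulated normal scores that are orthogonal to x, so that fitted values, variances and covariances are unchanged.

```python
import numpy as np


def ipso_synthetic(y, x, rng):
    """Hypothetical helper: basic IPSO-style synthetic responses.

    y   : (n, k) array of confidential variables
    x   : (n, p) array of regressors, including an intercept column
    rng : numpy.random.Generator used to simulate the new residuals
    """
    qx, _ = np.linalg.qr(x)            # orthonormal basis of the column space of x
    fitted = qx @ (qx.T @ y)           # least-squares fitted values
    resid = y - fitted                 # original residual data
    _, loadings = np.linalg.qr(resid)  # resid = scores @ loadings
    z = rng.standard_normal(y.shape)   # simulated multivariate normal data
    z -= qx @ (qx.T @ z)               # remove the part explained by x
    new_scores, _ = np.linalg.qr(z)    # orthonormal simulated scores
    return fitted + new_scores @ loadings


# Demo on simulated data (all values here are made up for illustration).
rng = np.random.default_rng(0)
n = 50
x = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
coef = np.array([[1.0, 0.5], [2.0, -1.0], [0.0, 3.0]])
y = x @ coef + rng.standard_normal((n, 2))
y_syn = ipso_synthetic(y, x, rng)

# Regression coefficients and the covariance matrix agree up to rounding.
beta = np.linalg.lstsq(x, y, rcond=None)[0]
beta_syn = np.linalg.lstsq(x, y_syn, rcond=None)[0]
print(np.allclose(beta, beta_syn))
print(np.allclose(np.cov(y, rowvar=False), np.cov(y_syn, rowvar=False)))
```

The QR route is used here because the abstract names it as one of two options; a singular value decomposition of the residuals could be substituted for the loadings step.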


Unless otherwise stated, this item is licensed under Attribution-NonCommercial-NoDerivatives 4.0 International.