Consistently estimating network statistics using Aggregated Relational Data
August 26, 2019 · Declared Dead · Proceedings of the National Academy of Sciences of the United States of America
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Emily Breza, Arun G. Chandrasekhar, Shane Lubold, Tyler H. McCormick, Mengjie Pan
arXiv ID
1908.09881
Category
stat.ME
Cross-listed
cs.SI,
stat.AP
Citations
7
Venue
Proceedings of the National Academy of Sciences of the United States of America
Last Checked
2 months ago
Abstract
Collecting complete network data is expensive, time-consuming, and often infeasible. Aggregated Relational Data (ARD), which capture information about a social network by asking a respondent questions of the form "How many people with trait X do you know?", provide a low-cost option when collecting complete network data is not possible. Rather than asking about connections between each pair of individuals directly, ARD collect the number of contacts the respondent knows with a given trait. Despite widespread use and a growing literature on ARD methodology, there is still no systematic understanding of when and why ARD should accurately recover features of the unobserved network. This paper provides such a characterization by deriving conditions under which statistics about the unobserved network (or functions of these statistics like regression coefficients) can be consistently estimated using ARD. We do this by first providing consistent estimates of network model parameters for three commonly used probabilistic models: the beta-model with node-specific unobserved effects, the stochastic block model with unobserved community structure, and latent geometric space models with unobserved latent locations. A key observation behind these results is that cross-group link probabilities for a collection of (possibly unobserved) groups identify the model parameters, meaning ARD are sufficient for parameter estimation. With these estimated parameters, it is possible to simulate graphs from the fitted distribution and analyze the distribution of network statistics. We can then characterize conditions under which the simulated networks based on ARD will allow for consistent estimation of the unobserved network statistics, such as eigenvector centrality, or response functions by or of the unobserved network, such as regression coefficients.
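The estimate-then-simulate idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a stochastic block model in which group membership is observed and coincides with the ARD traits, and in which every node is a respondent; the paper also covers unobserved groups. All function names, the block matrix `P`, and the group sizes are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_sbm(labels, P, rng):
    """Draw a symmetric 0/1 adjacency matrix from a stochastic block model."""
    n = len(labels)
    probs = P[np.ix_(labels, labels)]            # n x n link probabilities
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T).astype(int)         # no self-loops

def ard_counts(A, labels, K):
    """ARD responses: how many contacts each respondent has in each group."""
    membership = np.eye(K, dtype=int)[labels]    # n x K one-hot matrix
    return A @ membership

def estimate_block_probs(Y, labels, K):
    """Estimate cross-group link probabilities from ARD counts alone.

    Assumes observed group labels and at least two nodes per group.
    """
    sizes = np.bincount(labels, minlength=K)
    P_hat = np.zeros((K, K))
    for a in range(K):
        total = Y[labels == a].sum(axis=0)       # links from group a to each group
        for b in range(K):
            # Within a group, each edge is reported by both endpoints,
            # so divide by the number of ordered pairs.
            pairs = sizes[a] * (sizes[a] - 1) if a == b else sizes[a] * sizes[b]
            P_hat[a, b] = total[b] / pairs
    return P_hat

rng = np.random.default_rng(0)
K = 3
labels = np.repeat(np.arange(K), 200)            # hypothetical: 200 nodes per group
P = np.array([[0.10, 0.02, 0.01],
              [0.02, 0.08, 0.03],
              [0.01, 0.03, 0.12]])               # hypothetical true block matrix

A = simulate_sbm(labels, P, rng)                 # the "unobserved" network
Y = ard_counts(A, labels, K)                     # what the survey would collect
P_hat = estimate_block_probs(Y, labels, K)

# Simulate graphs from the fitted model and compare one network statistic.
true_mean_degree = A.sum(axis=1).mean()
sim_mean_degree = np.mean(
    [simulate_sbm(labels, P_hat, rng).sum(axis=1).mean() for _ in range(20)]
)
```

In this sketch the statistic compared is mean degree; swapping in another statistic, such as eigenvector centrality, would only change the last step.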
Similar Papers
In the same crypt: stat.ME
R.I.P.
👻
Ghosted
R.I.P.
👻
Ghosted
Performance Metrics (Error Measures) in Machine Learning Regression, Forecasting and Prognostics: Properties and Typology
R.I.P.
👻
Ghosted
External Validity: From Do-Calculus to Transportability Across Populations
R.I.P.
👻
Ghosted
Least Ambiguous Set-Valued Classifiers with Bounded Error Levels
R.I.P.
👻
Ghosted
Doubly Robust Policy Evaluation and Optimization
R.I.P.
👻
Ghosted
Comparison of Bayesian predictive methods for model selection
Died the same way: 👻 Ghosted
R.I.P.
👻
Ghosted
Language Models are Few-Shot Learners
R.I.P.
👻
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
👻
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
👻
Ghosted