This is the first draft of my first video on linear regression
https://www.youtube.com/watch?v=9zn-dWZHPa4
Social Networks ZH
This blog aims to publish short posts on ideas I have presented in manuscripts, lectures, or conferences.
Monday, September 28, 2015
Tuesday, August 25, 2015
Statistical analysis for cross-sectional network data (ERGM) - Part I.
Anyone in the social sciences, economics, or any other field that involves empirical work is familiar with linear regression and logistic regression. For instance, logistic regression models are particularly important when studying consumer choice behaviour, e.g. Modeling Household Purchase Behavior. However, agents' choice decisions are in many circumstances interdependent, and the independence assumption of logistic (and linear) regression models is violated. Classical examples of this interdependence arise in social networks, where agents' relational variables are not independent. Concepts such as reciprocity (e.g. the tendency for students to form mutual friendships) or transitivity (e.g. the tendency to become a friend of a friend's friend) are examples of dependency structures that might exist in relational data.
Which class of statistical models can I use for social network data?
Exponential random graph models (ERGMs) are a class of models that were developed to account for the dependency structures observed in social networks. They have been applied in organisational studies, political science, educational settings, etc.
What are ERGMs?
Understanding exponential random graphs requires some mathematical background. In particular, concepts such as Markov random fields, the Gibbs distribution, joint distributions, stationary distributions, ergodicity, and mixing time are the minimum required for a full understanding of ERGMs.
It is not my intention to write a formal derivation of ERGMs in my first post, so I will give an intuitive idea here and leave the formalism for a future post.
Intuitive idea:
Let us imagine an infinite set of schools, and suppose that at time zero a homogeneous population of young students is randomly partitioned into the schools, so that each school has exactly n students.
Let us assume that at times t greater than zero, students add and delete friendship relations with other students in the same school according to certain "social mechanisms", which are represented as local dependencies between the relational variables. We call this process the link formation process, and we assume that it is the same across schools.
Figure 2.
Now, if two research teams collect friendship relations between students in two different schools at the same time t1 (each team observes a network), then it is quite likely that the two observed networks differ. It is also likely that if the same research teams collect friendship relations in the same schools at a different point in time (t2), the networks they observe will differ from those observed first. These differences are due to the randomness of the linking process.
What does this mean?
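The thought experiment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the linking process is represented by a Metropolis chain that proposes toggling one friendship tie at a time, with just two social mechanisms (a baseline tendency to form ties and a tendency to reciprocate). The function name `simulate_school` and the parameter values are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def simulate_school(n, theta_edges, theta_mutual, steps, rng):
    """Run a toggle-one-tie Metropolis chain on a directed friendship network.

    x[i, j] = 1 means student i names student j as a friend.
    theta_edges and theta_mutual are illustrative (hypothetical) weights for
    the number of ties and the number of reciprocated pairs.
    """
    x = np.zeros((n, n), dtype=int)  # time zero: no friendships yet
    for _ in range(steps):
        # Pick an ordered pair (i, j) with i != j uniformly at random.
        i = int(rng.integers(n))
        j = int((i + 1 + rng.integers(n - 1)) % n)
        sign = 1 - 2 * x[i, j]  # +1 if the tie would be added, -1 if deleted
        # Toggling i -> j changes the tie count by sign and the number of
        # reciprocated pairs by sign * x[j, i].
        delta = sign * (theta_edges + theta_mutual * x[j, i])
        if rng.random() < np.exp(min(0.0, delta)):
            x[i, j] = 1 - x[i, j]
    return x

# Two schools, same process and same observation time, different randomness.
school_a = simulate_school(20, -2.0, 1.5, 5000, np.random.default_rng(1))
school_b = simulate_school(20, -2.0, 1.5, 5000, np.random.default_rng(2))
print("ties in school A:", school_a.sum())
print("ties in school B:", school_b.sum())
print("entries that differ:", int((school_a != school_b).sum()))
```

Both schools follow exactly the same rules, yet the two observed networks differ, which is the point of the thought experiment.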
What are the limitations of ERGMs?
If you are familiar with ERGMs or statistical network analysis, you might have noticed that analyses are quite often performed on a single network. Unfortunately, in a working paper (see below), I showed that ERGM parameters are not constant as a function of network size. This is a consequence of the fact that the probability of relating to any given agent is inversely proportional to the number of agents in the network (n), which forces the parameter for the number of links to converge to minus infinity as n tends to infinity; and if reciprocity is constant in n, then the parameter for the number of reciprocated pairs of links converges to infinity as n tends to infinity.
These observations make it almost impossible to compare estimated parameters across studies, or even to carry out statistical analysis on multiple networks.
The estimated parameters for links are well approximated by a linear function of log n with negative slope, and the estimated parameters for reciprocated pairs of links and transitive triangles can be approximated by a linear function of log n with positive slope; see Figure 2.
If we add the number of reciprocated pairs of links, the number of two-paths, and the number of transitive triangles to the network statistics, we obtain the following results.
These observations have a theoretical foundation: if reciprocity is constant in n, then the functional form in n of the estimated parameters for links is equal to minus the functional form in n of the estimated parameters for reciprocated pairs of links.
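The log n behaviour can be checked numerically in the simplest setting. The sketch below is not the analysis from the manuscript: it assumes a model with only two statistics (links and reciprocated pairs), in which pairs of students are independent of each other, and it assumes that each student names on average c friends and that a fraction r of ties is reciprocated, whatever the network size. The values of c and r and the function name `dyad_parameters` are hypothetical.

```python
import numpy as np

def dyad_parameters(n, c, r):
    """Exact parameters of a links-plus-reciprocity model with independent dyads.

    Assumes (hypothetically) an expected out-degree c and a reciprocation
    probability r that do not change with the network size n.
    """
    tie = c / (n - 1)            # probability that i names j
    p_mutual = r * tie           # both i -> j and j -> i are present
    p_asym = (1 - r) * tie       # i -> j is present, j -> i is absent
    p_null = 1 - 2 * p_asym - p_mutual
    theta_links = np.log(p_asym / p_null)
    theta_mutual = np.log(p_mutual * p_null / p_asym**2)
    return theta_links, theta_mutual

sizes = np.array([20, 40, 80, 160, 320, 640])
thetas = np.array([dyad_parameters(n, c=4.0, r=0.5) for n in sizes])

# Least-squares slope of each parameter against log n.
slope_links = np.polyfit(np.log(sizes), thetas[:, 0], 1)[0]
slope_mutual = np.polyfit(np.log(sizes), thetas[:, 1], 1)[0]
print("slope of links parameter in log n:", slope_links)
print("slope of reciprocity parameter in log n:", slope_mutual)
```

Under these assumptions the links parameter decreases roughly linearly in log n while the reciprocity parameter increases at the same rate.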
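A short sketch of why this can hold, in the simplest case only. It assumes a model whose only statistics are links and reciprocated pairs, so that pairs of agents are independent; the argument in the manuscript may be more general. The notation is introduced here for illustration: p_0, p_a and p_m are the probabilities that a pair of agents has no link, one given directed link, or a reciprocated pair of links; r = p_m / (p_a + p_m) is the reciprocity; and \theta_L and \theta_R are the parameters for links and for reciprocated pairs.

```latex
\begin{align*}
\theta_L(n) &= \log\frac{p_a}{p_0}, &
\theta_R(n) &= \log\frac{p_m\,p_0}{p_a^{2}}, \\
\theta_L(n) + \theta_R(n) &= \log\frac{p_m}{p_a} = \log\frac{r}{1-r}.
\end{align*}
```

If r does not depend on n, the sum is constant, so in this simple case \theta_R(n) equals minus \theta_L(n) up to an additive constant.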
Figure 1. A. 84 networks, B. 75 networks, C. 36 networks, and D. 19 networks.
Link to the manuscript