antfarm: A toolkit for social scientists based on Latour's Actor-Network Theory
On Sat, Oct 13, 2012 at 8:52 AM, Eric Purdy epurdy@uchicago.edu wrote:

Here are some ideas for a new sort of game theory based on Item Response Theory (http://en.wikipedia.org/wiki/Item_response_theory). I would like to implement them in a Python library. Is anybody on this list interested in being part of that? I will probably just post it on GitHub.
* Mathematical Idea
We imagine that we have actors playing a simple game. The game has only one turn; each player has a list of possible moves and randomly chooses whether to play each move they are allowed to play. The choices are independent.
We assume that every possible move is played by every player with probability
P(move M | player A) = sigmoid( a_1 * g_1(M) + a_2 * g_2(M) + ... + a_k * g_k(M) )
where the a_i are the private and public variables of the actor A, and the g_i are functions of the move M specified by the rules of the game.
This is actually very, very clever! We are using the mathematical trick (duality) used in the definition of a Nash equilibrium, but we are thinking of it in a Bayesian framework, rather than in a linear programming optimization framework. Fixed points of the computational process described below should correspond to social equilibria of a more general class of games than is treated by current game theory.
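As a concrete sketch of the formula above (the function and parameter names here are illustrative, not from the actual library), the move probability can be computed like this in Python:

```python
import math

def sigmoid(x):
    # Logistic function: maps a real-valued score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def move_probability(actor_params, move_features):
    """P(move M | player A) = sigmoid(sum_i a_i * g_i(M)).

    actor_params  -- the a_i: private and public variables of actor A
    move_features -- the g_i(M): rule-defined functions evaluated at move M
    """
    score = sum(a * g for a, g in zip(actor_params, move_features))
    return sigmoid(score)

# Small example: score = 0.5*2.0 + (-1.0)*1.0 = 0.0, so the probability is 0.5.
p = move_probability([0.5, -1.0], [2.0, 1.0])
```

Because the score is linear in the a_i for fixed g_i, and linear in the g_i for fixed a_i, each can be re-fit by ordinary logistic regression while the other is held fixed, which is what the sleep and training cycles below exploit.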
* Model Components and Class Hierarchy

Actor
- plans: a distribution over strategies
- private: some set of parameters
- public: some set of parameters

Strategy: a distribution over roles

Role: a distribution over moves

Game: model of actors and networks
- moves: generator of all possible moves
- rules: functions that specify how each move interacts with the private and public parameters of the players involved (if we assume perfect information, then the private parameters can be thought of as all other actors' collective model of the actor in question)
- player model: IRT-style model of how players choose a move

Move: atomic interactions in the game

Network: an instance of a played game
- players: actors
- turns: each is a list of moves and potential moves (the potential moves correspond to unobserved data)
- IRT model
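One possible skeleton for this hierarchy, using Python dataclasses; the field names follow the outline above, but the concrete types (lists of pairs for distributions, dicts for parameter sets) are my guesses, not the actual antfarm code:

```python
from dataclasses import dataclass, field

@dataclass
class Move:
    """An atomic interaction in the game."""
    name: str

@dataclass
class Role:
    """A distribution over moves, stored as (Move, probability) pairs."""
    moves: list = field(default_factory=list)

@dataclass
class Strategy:
    """A distribution over roles, stored as (Role, probability) pairs."""
    roles: list = field(default_factory=list)

@dataclass
class Actor:
    plans: list = field(default_factory=list)    # (Strategy, probability) pairs
    private: dict = field(default_factory=dict)  # hidden parameters (some of the a_i)
    public: dict = field(default_factory=dict)   # observable parameters

@dataclass
class Network:
    """An instance of a played game."""
    players: list = field(default_factory=list)  # Actors
    turns: list = field(default_factory=list)    # each turn: list of (Move, played?) pairs

@dataclass
class Game:
    """Model of actors and networks."""
    moves: list = field(default_factory=list)    # all possible Moves
    rules: dict = field(default_factory=dict)    # move name -> feature functions (the g_i)
```

A turn stores (Move, played?) pairs rather than just the played moves, so that non-moves and hidden moves can be represented as data points too, as the training procedure below requires.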
* Simulation and training
We assume that we can only observe part of the network and part of the actor structure. That is, we assume that we can observe some moves, but not all, and that we can observe some actor variables, but not all.
We alternate between sleep, waking, and training.
We start with an initialization phase. For each move or non-move in the training network, we randomly decide whether or not to "hide" it from the algorithm with a certain probability. This lets us get an independent check of the model, since we can evaluate it by its ability to correctly guess the hidden data. (We will call the hidden data "holdout data".)
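A minimal sketch of this initialization phase, hiding each observation independently with probability `p_hide` (the function name and signature are illustrative):

```python
import random

def split_holdout(observations, p_hide=0.2, seed=0):
    """Randomly hide each (move, played?) observation with probability p_hide.

    Returns (visible, holdout): the model trains on `visible` and is
    evaluated by how well it guesses the entries in `holdout`.
    """
    rng = random.Random(seed)
    visible, holdout = [], []
    for obs in observations:
        (holdout if rng.random() < p_hide else visible).append(obs)
    return visible, holdout

visible, holdout = split_holdout([("greet", True), ("trade", False), ("snub", True)])
```

Fixing the seed makes the split reproducible, so repeated runs of the sleep/waking/training cycle are scored against the same holdout set.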
During sleep, we reinfer the variables of each actor based on the moves played and not played during the previous cycle. This is just IRT/logistic regression on the variables a_i, given the observed moves.
During waking, we try to reconstruct the network from scratch by sampling from the strategies of the actors. We assume that all moves are played simultaneously, and that the probability of each move being played is independent of all other moves. This is just sampling from the distribution P(M|A).
During training, we update the rules and models of the game to reinforce good move guesses (places where the waking cycle agreed with the training data) and to eliminate bad move guesses (places where the waking cycle disagreed with the training data). This is just IRT/logistic regression on the variables g_i. We also use this cycle to update the model's score on the held-out data.
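Putting the phases together, here is one hypothetical shape for the main loop. Only the waking step is spelled out, since it is just independent Bernoulli sampling from P(M|A); the sleep and training steps are standard logistic-regression fits and are left as comments:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def wake(actors, move_features, rng):
    """Waking: rebuild the network by sampling every move independently.

    actors        -- list of parameter vectors (the a_i for each actor)
    move_features -- dict: move name -> feature vector (the g_i(M))
    """
    network = []
    for a in actors:
        for move, g in move_features.items():
            p = sigmoid(sum(ai * gi for ai, gi in zip(a, g)))
            network.append((move, rng.random() < p))  # Bernoulli(P(M|A))
    return network

def training_loop(actors, move_features, n_cycles=10, seed=0):
    rng = random.Random(seed)
    net = []
    for _ in range(n_cycles):
        net = wake(actors, move_features, rng)
        # Sleep: re-infer each actor's a_i by logistic regression on `net`,
        # holding the g_i fixed (omitted here).
        # Training: re-fit the g_i the same way, holding the a_i fixed,
        # and update the score on the held-out data (omitted here).
    return net
```

A fixed point of this alternation, where neither the a_i nor the g_i change, would be the "social equilibrium" referred to above.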
On Sat, Oct 20, 2012 at 9:56 AM, Eric Purdy epurdy@uchicago.edu wrote:

George and Max: I've been trying to code this up, but I've been having trouble giving the code a nice modular organization. The two of you have worked on modeling social networks; can you give me some advice on what the code should look like?
On Sun, Oct 21, 2012 at 9:27 PM, Michael Bishop michaelbish@gmail.com wrote:

Cosma Shalizi: At the NYU Stern School Statistics Research Seminar (http://www.stern.nyu.edu/experience-stern/about/departments-centers-initiatives/academic-departments/ioms/events/statistics/index.htm), "Consistency under Sampling of Exponential Random Graph Models"

Abstract: The growing availability of network data and of scientific interest in distributed systems has led to the rapid development of statistical models of network structure. Typically, however, these are models for the entire network, while the data consists only of a sampled sub-network. Parameters for the whole network, which is what is of interest, are estimated by applying the model to the sub-network. This assumes that the model is consistent under sampling, or, in terms of the theory of stochastic processes, that it defines a projective family. Focussing on the popular class of exponential random graph models (ERGMs), we show that this apparently trivial condition is in fact violated by many popular and scientifically appealing models, and that satisfying it drastically limits ERGMs' expressive power. These results are actually special cases of more general ones about exponential families of dependent random variables, which we also prove. Using such results, we offer easily checked conditions for the consistency of maximum likelihood estimation in ERGMs, and discuss some possible constructive responses. (Joint work with Alessandro Rinaldo, http://www.stat.cmu.edu/~arinaldo/; arxiv:1111.3054, http://arxiv.org/abs/1111.3054)

Time and place: 11:30--12:30 on Friday, 19 October 2012, in 5-80 Kaufmann Management Center (KMC)
Both talks are free, but I don't know if you'd have to show university ID to get into the buildings.
"LICORS" is pronounced "liquors". "Consistency under Sampling of Exponential Random Graph Models" is pronounced "Really, all we wanted to do was prove that maximum likelihood works when the data is a big graph" (http://bactra.org/weblog/837.html).
OK, here is a bare-bones initial version of the code I was talking about. Hopefully we can make it totally sweet.
https://github.com/epurdy/antfarm
--
-Eric

_______________________________________________
Math mailing list
Math@moomers.org
http://mailman.moomers.org/mailman/listinfo/math
participants (2)
- Eric Purdy
- Michael Bishop