antfarm: A toolkit for social scientists based on Latour's Actor-Network Theory
On Sat, Oct 13, 2012 at 8:52 AM, Eric Purdy epurdy@uchicago.edu wrote:

Here are some ideas for a new sort of game theory based on Item Response Theory (http://en.wikipedia.org/wiki/Item_response_theory). I would like to implement them in a Python library. Is anybody on this list interested in being part of that? I will probably just post it on GitHub.
* Mathematical Idea
We imagine that we have actors playing a simple game. The game has only one turn; each player has a list of possible moves and randomly chooses whether to play each move they are allowed to play. The choices are independent.
We assume that every possible move is played by every player with probability
P(move M | player A) = sigmoid( a_1 * g_1(M) + a_2 * g_2(M) + ... + a_k * g_k(M) )
where the a_i are the private and public variables of the actor A, and the g_i are functions of the move M specified by the rules of the game.
This is actually very, very clever! We are using the mathematical trick (duality) used in the definition of a Nash equilibrium, but we are thinking of it in a Bayesian framework, rather than in a linear programming optimization framework. Fixed points of the computational process described below should correspond to social equilibria of a more general class of games than is treated by current game theory.
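As a concrete sketch of the formula above (the function and parameter names here are illustrative, not from the actual library), the move probability can be computed like this in Python:

```python
import math

def sigmoid(x):
    # Logistic function: maps a real-valued score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def move_probability(actor_params, move_features):
    """P(move M | player A) = sigmoid(sum_i a_i * g_i(M)).

    actor_params  -- the a_i: private and public variables of actor A
    move_features -- the g_i(M): rule-defined functions evaluated at move M
    """
    score = sum(a * g for a, g in zip(actor_params, move_features))
    return sigmoid(score)

# Small example: score = 0.5*2.0 + (-1.0)*1.0 = 0.0, so the probability is 0.5.
p = move_probability([0.5, -1.0], [2.0, 1.0])
```

Because the score is linear in the a_i for fixed g_i, and linear in the g_i for fixed a_i, each can be re-fit by ordinary logistic regression while the other is held fixed, which is what the sleep and training cycles below exploit.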
* Model Components and Class Hierarchy

Actor
- plans: a distribution over strategies
- private: some set of parameters
- public: some set of parameters

Strategy: a distribution over roles

Role: a distribution over moves

Game: model of actors and networks
- moves: generator of all possible moves
- rules: functions that specify how each move interacts with the private and public parameters of the players involved (if we assume perfect information, then the private parameters can be thought of as all other actors' collective model of the actor in question)
- player model: IRT-style model of how players choose a move

Move: atomic interactions in the game

Network: an instance of a played game
- players: actors
- turns: each is a list of moves and potential moves (the potential moves correspond to unobserved data)
- IRT model
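One possible skeleton for this hierarchy, using Python dataclasses; the field names follow the outline above, but the concrete types (lists of pairs for distributions, dicts for parameter sets) are my guesses, not the actual antfarm code:

```python
from dataclasses import dataclass, field

@dataclass
class Move:
    """An atomic interaction in the game."""
    name: str

@dataclass
class Role:
    """A distribution over moves, stored as (Move, probability) pairs."""
    moves: list = field(default_factory=list)

@dataclass
class Strategy:
    """A distribution over roles, stored as (Role, probability) pairs."""
    roles: list = field(default_factory=list)

@dataclass
class Actor:
    plans: list = field(default_factory=list)    # (Strategy, probability) pairs
    private: dict = field(default_factory=dict)  # hidden parameters (some of the a_i)
    public: dict = field(default_factory=dict)   # observable parameters

@dataclass
class Network:
    """An instance of a played game."""
    players: list = field(default_factory=list)  # Actors
    turns: list = field(default_factory=list)    # each turn: list of (Move, played?) pairs

@dataclass
class Game:
    """Model of actors and networks."""
    moves: list = field(default_factory=list)    # all possible Moves
    rules: dict = field(default_factory=dict)    # move name -> feature functions (the g_i)
```

A turn stores (Move, played?) pairs rather than just the played moves, so that non-moves and hidden moves can be represented as data points too, as the training procedure below requires.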
* Simulation and training
We assume that we can only observe part of the network and part of the actor structure. That is, we assume that we can observe some moves, but not all, and that we can observe some actor variables, but not all.
We alternate between sleep, waking, and training.
We start with an initialization phase. For each move or non-move in the training network, we randomly decide whether or not to "hide" it from the algorithm with a certain probability. This lets us get an independent check of the model, since we can evaluate it by its ability to correctly guess the hidden data. (We will call the hidden data "holdout data".)
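A minimal sketch of this initialization phase, hiding each observation independently with probability `p_hide` (the function name and signature are illustrative):

```python
import random

def split_holdout(observations, p_hide=0.2, seed=0):
    """Randomly hide each (move, played?) observation with probability p_hide.

    Returns (visible, holdout): the model trains on `visible` and is
    evaluated by how well it guesses the entries in `holdout`.
    """
    rng = random.Random(seed)
    visible, holdout = [], []
    for obs in observations:
        (holdout if rng.random() < p_hide else visible).append(obs)
    return visible, holdout

visible, holdout = split_holdout([("greet", True), ("trade", False), ("snub", True)])
```

Fixing the seed makes the split reproducible, so repeated runs of the sleep/waking/training cycle are scored against the same holdout set.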
During sleep, we reinfer the variables of each actor based on the moves played and not played during the previous cycle. This is just IRT/logistic regression on the variables a_i, given the observed moves.
During waking, we try to reconstruct the network from scratch by sampling from the strategies of the actors. We assume that all moves are played simultaneously, and that the probability of each move being played is independent of all other moves. This is just sampling from the distribution P(M|A).
During training, we update the rules and models of the game to reinforce good move guesses (places where the waking cycle agreed with the training data) and to eliminate bad move guesses (places where the waking cycle disagreed with the training data). This is just IRT/logistic regression on the variables g_i. We also use this cycle to update the model's score on the held-out data.
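Putting the phases together, here is one hypothetical shape for the main loop. Only the waking step is spelled out, since it is just independent Bernoulli sampling from P(M|A); the sleep and training steps are standard logistic-regression fits and are left as comments:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def wake(actors, move_features, rng):
    """Waking: rebuild the network by sampling every move independently.

    actors        -- list of parameter vectors (the a_i for each actor)
    move_features -- dict: move name -> feature vector (the g_i(M))
    """
    network = []
    for a in actors:
        for move, g in move_features.items():
            p = sigmoid(sum(ai * gi for ai, gi in zip(a, g)))
            network.append((move, rng.random() < p))  # Bernoulli(P(M|A))
    return network

def training_loop(actors, move_features, n_cycles=10, seed=0):
    rng = random.Random(seed)
    net = []
    for _ in range(n_cycles):
        net = wake(actors, move_features, rng)
        # Sleep: re-infer each actor's a_i by logistic regression on `net`,
        # holding the g_i fixed (omitted here).
        # Training: re-fit the g_i the same way, holding the a_i fixed,
        # and update the score on the held-out data (omitted here).
    return net
```

A fixed point of this alternation, where neither the a_i nor the g_i change, would be the "social equilibrium" referred to above.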
On Sat, Oct 20, 2012 at 9:56 AM, Eric Purdy epurdy@uchicago.edu wrote:

George and Max: I've been trying to code this up, but I've been having trouble giving the code a nice modular organization. The two of you have worked on modeling social networks; can you give me some advice on what the code should look like?
On Sun, Oct 21, 2012 at 9:27 PM, Michael Bishop michaelbish@gmail.com wrote:

Cosma Shalizi: At the NYU Stern School Statistics Research Seminar (http://www.stern.nyu.edu/experience-stern/about/departments-centers-initiatives/academic-departments/ioms/events/statistics/index.htm), "Consistency under Sampling of Exponential Random Graph Models"

Abstract: The growing availability of network data and of scientific interest in distributed systems has led to the rapid development of statistical models of network structure. Typically, however, these are models for the entire network, while the data consists only of a sampled sub-network. Parameters for the whole network, which is what is of interest, are estimated by applying the model to the sub-network. This assumes that the model is consistent under sampling, or, in terms of the theory of stochastic processes, that it defines a projective family. Focussing on the popular class of exponential random graph models (ERGMs), we show that this apparently trivial condition is in fact violated by many popular and scientifically appealing models, and that satisfying it drastically limits ERGMs' expressive power. These results are actually special cases of more general ones about exponential families of dependent random variables, which we also prove. Using such results, we offer easily checked conditions for the consistency of maximum likelihood estimation in ERGMs, and discuss some possible constructive responses. (Joint work with Alessandro Rinaldo, http://www.stat.cmu.edu/~arinaldo/; arxiv:1111.3054, http://arxiv.org/abs/1111.3054)

Time and place: 11:30--12:30 on Friday, 19 October 2012, in 5-80 Kaufmann Management Center (KMC)
Both talks are free, but I don't know if you'd have to show university ID to get into the buildings.
"LICORS" is pronounced "liquors". "Consistency under Sampling of Exponential Random Graph Models" is pronounced "Really, all we wanted to do was prove that maximum likelihood works when the data is a big graph" (http://bactra.org/weblog/837.html).
OK, here is a bare-bones initial version of the code I was talking about. Hopefully we can make it totally sweet.
https://github.com/epurdy/antfarm
--
-Eric

_______________________________________________
Math mailing list
Math@moomers.org
http://mailman.moomers.org/mailman/listinfo/math
participants (2)
- Eric Purdy
- Michael Bishop