Projects

How do social networks among anti-government actors affect the decision of ruling authorities to challenge their opposition? Current literature focuses on the dyadic relationship between the government and potential challengers. We shift the focus toward exploring how network structures affect the strategic behavior of political actors. We derive and examine testable hypotheses using latent space analysis to infer actors' positions vis-à-vis each other in the network. We use the resulting network structure to test our hypotheses with data on conflicts in Thailand from 1997 to 2010, showing the influential role of network stability in generating conflictual behavior.

The gravity model, long the empirical workhorse for modeling international trade, ignores network dependencies in bilateral trade data, instead assuming that dyadic trade is independent conditional on a hierarchy of covariates over country, time, and dyad. We argue that there are both theoretical and empirical reasons to expect network dependencies in international trade, and that standard gravity models are consequently empirically inadequate. We combine a gravity model specification with "latent space" networks to develop a dynamic mixture model for real-valued directed graphs that incorporates network dependencies in trade incidence and trade volume simultaneously. We estimate this model using bilateral trade data from 1990 to 2008. The model substantially outperforms standard accounts on both in-sample and out-of-sample predictive heuristics. We illustrate the model's usefulness by tracking trading propensities between the USA and China.
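The standard (dependence-free) gravity specification that this project critiques can be sketched in a few lines. The data below are synthetic and the coefficient values are purely illustrative, not estimates from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # synthetic dyads, for illustration only

# Covariates: log GDP of exporter and importer, log bilateral distance.
log_gdp_i = rng.normal(10, 1, n)
log_gdp_j = rng.normal(10, 1, n)
log_dist = rng.normal(7, 0.5, n)

# Simulate log trade from the gravity specification plus independent noise
# (the conditional-independence assumption the paper argues against).
log_trade = (1.0 + 0.8 * log_gdp_i + 0.7 * log_gdp_j
             - 1.1 * log_dist + rng.normal(0, 0.3, n))

# OLS fit of the log-linear gravity model.
X = np.column_stack([np.ones(n), log_gdp_i, log_gdp_j, log_dist])
beta, *_ = np.linalg.lstsq(X, log_trade, rcond=None)
print(np.round(beta, 2))  # roughly recovers [1.0, 0.8, 0.7, -1.1]
```

Because the noise here really is independent across dyads, OLS recovers the coefficients; the paper's point is that real trade data violate that independence, which is what the latent-space mixture model addresses.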

Quantitative International Relations scholarship has focused on analysis of the so-called dyad. Few studies have given serious thought to the definition of a dyad or to the implications that follow from such a conceptualization. This piece argues that dyadic analysis is necessarily incomplete: even when rigorously pursued, it gives incomplete and at times incoherent pictures of the ebb and flow of interactions among actors in global politics and economics. In prior scholarship, much of this myopia could be attributed to the paucity of data, a defense that is no longer plausible.

For more than two decades, political scientists have created statistical models aimed at generating out-of-sample predictions of the popular vote in presidential elections. This exercise typically aims to develop the "best" model. Our approach is different. Rather than creating the best model or theory, we create an ensemble of predictions from the top ten models and use that ensemble to predict the current election, weighting each of the ten models by how accurate it has previously been. Our results point to a very close election, at least in terms of the popular vote, with the incumbent gaining only 50.3% of the popular vote.
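The idea of weighting component forecasts by past accuracy can be sketched minimally as follows. The numbers are hypothetical and the simple inverse-error weights are a stand-in, not the paper's exact weighting scheme:

```python
import numpy as np

# Hypothetical past absolute errors (points of vote share) for three
# forecasting models; both the errors and forecasts are illustrative.
past_errors = np.array([1.2, 2.5, 0.8])
forecasts = np.array([50.1, 51.0, 50.4])  # each model's current forecast

# Weight each model inversely to its historical error, then normalize
# so the weights sum to one.
weights = 1.0 / past_errors
weights /= weights.sum()

# The ensemble forecast is the weighted average of the components.
ensemble = float(weights @ forecasts)
print(round(ensemble, 2))  # ≈ 50.4
```

Historically accurate models pull the ensemble toward their forecasts, while weak models still contribute a small share of the prediction.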

We present a visual method for assessing the predictive power of models with binary outcomes; the accompanying separation plot package (current version at http://cran.r-project.org/) allows users to visually compare the fits of binary models, both in and out of sample. This technique allows the analyst to evaluate model fit based upon the model's ability to consistently match high-probability predictions to actual occurrences of the event of interest, and low-probability predictions to nonoccurrences of the event of interest. Unlike existing methods for assessing predictive power for logit and probit models, such as Percent Correctly Predicted statistics, Brier scores, and the ROC plot, our "separation plot" has the advantage of producing a visual display that is informative and easy to explain to a general audience, while also remaining insensitive to the often arbitrary probability thresholds that are used to distinguish between predicted events and nonevents. We demonstrate the effectiveness of this technique in building predictive models in a number of different areas of political research.
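The core construction behind a separation plot can be sketched in a few lines. This is a toy text rendering of the idea, not the R package's implementation, and the data are invented:

```python
import numpy as np

# Toy predicted probabilities and actual binary outcomes (illustrative).
probs = np.array([0.9, 0.1, 0.4, 0.8, 0.2, 0.7, 0.3, 0.6])
actual = np.array([1, 0, 0, 1, 0, 0, 1, 1])

# Core of the separation plot: order observations by predicted
# probability, then display the actual outcomes in that order.
# A well-fitting model pushes the events to the right-hand side.
order = np.argsort(probs)
sorted_events = actual[order]

# Text rendering: '#' marks an event, '.' a non-event.
print("".join("#" if y else "." for y in sorted_events))  # ..#.#.##
```

Note that no probability cutoff appears anywhere in the construction, which is why the display is insensitive to arbitrary thresholds.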

Monthly forecast report for August 2014, generated using the CRISP software.

Though weak states are associated with civil war, terrorism, and other threats to humanity, the social sciences provide scant insight into why states vary in their capacity to govern across territory. This paper seeks to understand why states govern where they do in post-civil-war settings, where leaders face stark geographic choices about extending state capacity across territory in the face of resource constraints. We propose hypotheses derived from the distributive politics literature and test them using satellite data in six countries (Burundi, Côte d'Ivoire, Kenya, Liberia, Sierra Leone, and Uganda). Contrary to several well-established theories, we find that state builders do not reward core supporters or target swing districts. They do focus benefits on capital cities, but this does not generalize to other urban settings. Instead, state leaders focus their efforts on areas that have a history of violence.

We consider ensemble Bayesian model averaging (EBMA) in the context of small-n prediction tasks with high rates of missing component forecasts. With a large number of observations to calibrate ensembles and low rates of missing values for each component model, the standard approach to calibrating ensembles introduced by Raftery et al. (2005) performs well. However, data in the social sciences generally do not fulfill these requirements. The number of outcomes being predicted tends to be relatively small, and missing predictions are neither random nor rare. In these circumstances, EBMA models may overweight components with low rates of missingness and those that perform well on the limited calibration sample. This can seriously undermine the advantages of the ensemble approach to prediction. We demonstrate this problem and provide a solution that diminishes these undesirable outcomes by introducing a "wisdom of the crowds" parameter to the standard EBMA framework. We show that this solution improves predictive accuracy of EBMA forecasts in both political and economic applications.
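One schematic reading of the "wisdom of the crowds" fix is shrinkage of the fitted component weights toward equal weights. The sketch below uses invented weight values and is not the paper's exact estimator:

```python
import numpy as np

# Component-model weights from a small calibration sample (illustrative);
# sparse calibration data can concentrate weight on a few models.
raw_weights = np.array([0.70, 0.20, 0.05, 0.05])

def shrink_weights(w, c):
    """Blend fitted EBMA-style weights with equal weights.

    c = 0 reproduces the raw weights; c = 1 is a flat average across
    all component models (pure "wisdom of the crowds").
    """
    uniform = np.full_like(w, 1.0 / len(w))
    return (1 - c) * w + c * uniform

# Halfway shrinkage pulls the dominant model back toward the crowd.
print(shrink_weights(raw_weights, 0.5))  # [0.475, 0.225, 0.15, 0.15]
```

The shrunken weights still sum to one, so the ensemble remains a proper weighted average while no longer betting so heavily on components that merely looked good on a thin calibration sample.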

Monthly forecast report for June 2014, generated using the CRISP software.

Prediction is an important goal in the study of international conflict, but a large body of research has found that existing statistical models generally have disappointing predictive abilities. We show that most efforts build on models unlikely to be helpful for prediction. Many models essentially ignore the origins of conflict; studies look either at invariant structural features believed to affect the opportunities for conflict, or at factors believed to reduce the baseline risk of conflict, without attempting to identify the potential motivations and contentious issues over which conflicts typically arise. Researchers who have considered how contentious issues may motivate conflict and how these can be managed, using the Issues Correlates of War (ICOW) data, have not considered how these features may inform prediction. We assess the risk of dyadic interstate conflict based on the presence of specific contentious issues and conflict management events that may change the conflict potential of these contentious issues. We evaluate to what extent incorporating contentious issues and conflict management can help improve out-of-sample forecasting, as well as advance our understanding of conflict dynamics. Our results provide strong support for the idea that taking contentious issues into account can inform and improve out-of-sample forecasting.

Monthly forecast report for November 2014, generated using the CRISP software.

Monthly forecast report for September 2014, generated using the CRISP software.

Abstract

In conflict research, the importance of predictive analysis is self-evident, yet it has long received insufficient attention. We argue that prediction not only offers substantive value for public policy, but can also be used to test existing theoretical models, avoid statistical overfitting, and reduce confirmation bias, thereby producing more reliable conflict forecasts. In this article, we review the progress scholars have made in conflict forecasting and find that advances in data collection and computing power over the past fifty years have enabled researchers to undertake predictive work that was previously out of reach. In particular, automated coding procedures make it possible to rapidly collect digitized news reports, allowing conflict researchers to use disaggregated event data at daily, weekly, and monthly intervals to produce timely forecasts of the activities of governments and opposition groups below the country level.

To illustrate the major advances in conflict research over the past few years, we revisit Fearon and Laitin (2003), a foundational study in the field, in order to compare and highlight recent progress in predictive analysis. We find that although many of the explanatory variables in Fearon and Laitin's study are statistically significant, the model's out-of-sample predictive accuracy is low: building models with statistically significant variables from observational data cannot answer the predictive questions policymakers care about, such as when and where civil wars will occur.

Building on this critique of Fearon and Laitin, we use sub-annual, temporally disaggregated event data to construct a conflict forecasting model, and employ a hierarchical model to track how estimated coefficients vary across clusters of country attributes. Specifically, we use the CRISP event database to build a monthly conflict model for 1997 to 2011 that predicts civil war onsets recorded in the UCDP database. Two exhibits in the text briefly illustrate the model's performance. Figure 1 uses separation plots to show in-sample and out-of-sample model fit; the two panels show the range and dispersion of the predicted probabilities, and how the actual events are distributed across them. Country-months are ordered from the lowest predicted probability of civil war on the left to the highest on the right, with the black line in the middle tracing that probability; country-months with actual civil wars are shaded red, and those without are white. Red lines on the left represent false negatives, while white space on the right represents false positives; a well-fitting model should concentrate the red (actual events) on the right side of the plot. The figure shows that (1) the actual civil war events are those with relatively high predicted probabilities, and (2) as expected, out-of-sample fit is slightly worse than in-sample fit, though the model still fits the data quite well even out of sample.

Table 5 presents the standard performance statistics for binary model fit (using 0.5 as the cutoff). Out-of-sample fit is again slightly worse than in-sample fit across all individual estimates, but aside from underestimating the actual number of civil wars, the estimates remain quite good. In-sample performance is very accurate: all of the cases with the highest predicted probabilities actually experienced civil wars, while the countries with the lowest predicted probabilities experienced none.
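The standard binary performance statistics at a 0.5 cutoff can be computed as follows; the probabilities and outcomes below are toy numbers, not the paper's data:

```python
import numpy as np

# Toy predicted probabilities and actual binary outcomes (illustrative).
probs = np.array([0.9, 0.8, 0.6, 0.4, 0.3, 0.1])
actual = np.array([1, 1, 0, 1, 0, 0])

# Classify each case as a predicted event at the 0.5 cutoff.
pred = (probs >= 0.5).astype(int)

# Confusion-matrix cells.
tp = int(((pred == 1) & (actual == 1)).sum())  # true positives
fp = int(((pred == 1) & (actual == 0)).sum())  # false positives
fn = int(((pred == 0) & (actual == 1)).sum())  # false negatives
tn = int(((pred == 0) & (actual == 0)).sum())  # true negatives

accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(accuracy, precision, recall)
```

Underestimating the number of civil wars, as the out-of-sample estimates do, shows up in this framework as false negatives, i.e. as lower recall.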

A common criticism of the social sciences holds that complex social phenomena like international conflict cannot be predicted by any method. But precisely because the internal logic of political conflict is so complex, we should seek out the underlying explanatory mechanisms in order to explain and predict it. This article highlights the usefulness of conflict models for understanding political conflict across different national contexts. Statistical models of civil war can be highly accurate both in and out of sample, and in a world where new data continuously emerge, we can use data from outside the model-building sample to test a forecasting model's reliability; this gold standard of model evaluation also makes a lasting contribution to the discipline's statistical methodology.

Monthly forecast report for April 2014, generated using the CRISP software.


Monthly forecast report for March 2014, generated using the CRISP software.

Monthly forecast report for February 2014, generated using the CRISP software.

GDELT and ICEWS are arguably the largest event data collections in social science at the moment. During their brief existence they have also been among the most influential data sets in terms of their impact on academic research and policy advice. Yet we know little to date about how these two repositories of event data compare to each other. We undertake such a comparison for fighting in Syria, and for protest behavior in Egypt and Turkey, from 2011 to the present. You can view the visualizations here.

The gold-standard approaches to missing data imputation are complicated and computationally expensive. We present a principled alternative, using Copula distributions from which missing data may be quickly drawn. We compare this approach to other imputation techniques and show that it performs at least as well as less efficient approaches. Our results demonstrate that most applied researchers can achieve substantial speed improvements by implementing a Copula-based imputation approach, while still maintaining the performance of other approaches to multiple imputation.
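The simplest case of the idea can be sketched with normal margins, where the Gaussian copula reduces to a bivariate normal and missing values are drawn from their conditional distribution. This is a minimal illustration on synthetic data, not the paper's general method, which handles arbitrary margins:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic bivariate data with roughly 20% of y missing (illustration).
n = 1000
x = rng.normal(0, 1, n)
y = 0.8 * x + rng.normal(0, 0.6, n)
missing = rng.random(n) < 0.2
y_obs = np.where(missing, np.nan, y)

# Estimate the dependence structure from the complete cases.
obs = ~missing
rho = np.corrcoef(x[obs], y_obs[obs])[0, 1]
mu_x, mu_y = x[obs].mean(), y_obs[obs].mean()
sd_x, sd_y = x[obs].std(), y_obs[obs].std()

# Draw each missing y from its conditional normal distribution given x;
# drawing (rather than plugging in the mean) preserves the variance
# needed for proper multiple imputation.
cond_mean = mu_y + rho * sd_y / sd_x * (x[missing] - mu_x)
cond_sd = sd_y * np.sqrt(1 - rho ** 2)
y_filled = y_obs.copy()
y_filled[missing] = rng.normal(cond_mean, cond_sd)
```

Because the draws are closed-form, the whole imputation is a handful of vectorized operations, which is the source of the speed advantage over iterative gold-standard methods.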

This study investigates de facto states' internal legitimacy: people's confidence in the entity itself, its regime, and its institutions. Using original data from a 2010 survey in Abkhazia, we operationalize this concept through respondents' perceptions of security, welfare, and democracy. Our findings suggest that internal legitimacy is shaped by the key Weberian state-building function of a monopoly on the legitimate use of force, as well as by these entities' ability to fulfill other aspects of the social contract.
