Essay: Complex Social Systems – You’ll Need More Than Just Big Data

What makes a “complex system” so vexing is that its collective characteristics cannot easily be predicted from underlying components: the whole is greater than, and often significantly different from, the sum of its parts. A city is much more than its buildings and people. Our bodies are more than the totality of our cells. This quality, called emergent behavior, is characteristic of economies, financial markets, urban communities, companies, organisms, the Internet, galaxies and the health care system.

–Geoffrey West, Santa Fe Institute

The ability to collect and pin to a board all of the insects that live in the garden does little to lend insight into the ecosystem contained therein.

–John H. Miller and Scott E. Page

Defining and Understanding Complexity

Miller and Page (2007) initially punt on defining complexity by invoking Justice Stewart’s definition of pornography: “I know it when I see it.” Many definitions of complexity and complex systems have been suggested, yet none is widely accepted. Often, simply describing the features of complexity seems the modus operandi in explaining complexity.

Key features of a complex system include:

Constituent parts that interact in such ways as to give rise to non-linear and often unanticipated or unpredictable outcomes, outcomes particularly unexpected if the system were examined in a reductionist fashion
Feedback loops between parts and levels of the system and often the system and the environment (Simon, 1969)
Self-organization of constituent parts, adaptation, evolution
And, possibly unique to complex human social systems, the possibility for second-order emergence – emergence of reflexive social institutions based on human collective action (Gilbert & Troitzsch, 2005)

Gilbert and Troitzsch (2005) offer the idea that emergence requires new descriptions that are not required to describe the behavior of underlying components: an individual atom has no temperature, but the interaction of atoms in motion gives rise to temperature. Complexity scholars have also distinguished complication from complexity as a means of explaining complexity. Miller and Page (2007) suggest that removing a seat from a car makes it less complicated while removing the timing belt makes it less complex. Santa Fe Institute president David Krakauer noted in an August 2015 interview, “A watch is complicated…your family is complex,” suggesting that we understand how all of the constituent parts of a watch work together to make a functioning timepiece, but we do not fully understand the various forces that make a family function or not function. Removing a specific part from a watch has a predictable, known consequence. Removing–or adding–a family member changes the interactions in the family’s social system in unknown and unpredictable ways.

The crux of complexity in social systems, then, is how the interactions between individuals in the system give rise to new, emergent properties of the system that cannot be understood by studying each individual alone, as captured by the poetic if macabre Miller and Page (2007) quote about pinning insects to a board. Perhaps one of the best-known examples in computational social science (CSS) of macro-level emergence from the interaction of agents in a complex social system is Thomas Schelling’s (1971) model of segregation. Schelling demonstrated that when individuals choose where to live based on even a very slight preference for having some neighbors who look similar to them, a tremendous degree of residential segregation results, akin to that observed in many American cities, without any governmental or other top-down organizing schema. Likewise, Simon (1969) recounts teaching urban land use to architectural students who had difficulty accepting that land-use patterns in medieval cities arose from cumulative individual decisions over time rather than top-down guidance from a central planner or designer.
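To make the mechanism concrete, here is a minimal sketch of a Schelling-style model in Python. It is not Schelling's original procedure: the grid size, the share of empty cells, the 30% similarity threshold, the random relocation rule, and every function name below are assumptions chosen for illustration. The point is only that a mild individual preference, applied repeatedly, tends to leave agents with a much higher share of like neighbors than they started with.

```python
import random


def make_grid(size, empty_frac, rng):
    """Fill a size x size board with two agent types ('A', 'B') and empty cells (None)."""
    n_cells = size * size
    n_empty = int(n_cells * empty_frac)
    n_a = (n_cells - n_empty) // 2
    n_b = n_cells - n_empty - n_a
    cells = ["A"] * n_a + ["B"] * n_b + [None] * n_empty
    rng.shuffle(cells)
    return [cells[r * size:(r + 1) * size] for r in range(size)]


def neighbors(grid, r, c):
    """Return the agents in the up-to-eight cells around (r, c); edges do not wrap."""
    size = len(grid)
    found = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < size and 0 <= cc < size and grid[rr][cc] is not None:
                found.append(grid[rr][cc])
    return found


def similar_share(grid, r, c):
    """Share of an agent's neighbors that match its type (1.0 if it has no neighbors)."""
    nbrs = neighbors(grid, r, c)
    if not nbrs:
        return 1.0
    return sum(n == grid[r][c] for n in nbrs) / len(nbrs)


def mean_similarity(grid):
    """Average like-neighbor share over all agents: a crude segregation measure."""
    size = len(grid)
    shares = [similar_share(grid, r, c)
              for r in range(size) for c in range(size)
              if grid[r][c] is not None]
    return sum(shares) / len(shares)


def step(grid, threshold, rng):
    """Move each agent that was unhappy at the start of the step to a random empty cell."""
    size = len(grid)
    cells = [(r, c) for r in range(size) for c in range(size)]
    unhappy = [(r, c) for r, c in cells
               if grid[r][c] is not None and similar_share(grid, r, c) < threshold]
    empties = [(r, c) for r, c in cells if grid[r][c] is None]
    if not empties:
        return 0
    rng.shuffle(unhappy)
    for r, c in unhappy:
        i = rng.randrange(len(empties))
        er, ec = empties[i]
        grid[er][ec], grid[r][c] = grid[r][c], None
        empties[i] = (r, c)  # the vacated cell becomes available
    return len(unhappy)


def run(size=20, empty_frac=0.2, threshold=0.3, max_steps=100, seed=0):
    """Return mean like-neighbor share before and after the agents relocate."""
    rng = random.Random(seed)
    grid = make_grid(size, empty_frac, rng)
    before = mean_similarity(grid)
    for _ in range(max_steps):
        if step(grid, threshold, rng) == 0:
            break  # nobody wants to move: the board has settled
    return before, mean_similarity(grid)


if __name__ == "__main__":
    before, after = run()
    print(f"mean like-neighbor share: {before:.2f} before, {after:.2f} after")
```

No agent in this sketch asks for a segregated neighborhood; each only objects to being in a small minority. The gap between the starting and ending similarity is the macro-level pattern that no single rule describes.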

Miller and Page (2007) suggest that innate features of social systems tend to produce complexity: social agents are “enmeshed in a web of connections with one another and, through a variety of adaptive processes, they must successfully navigate through their world” (p. 10). Part of agents’ navigation of the world necessarily involves making decisions and undertaking behaviors either in response to the decisions and behaviors of others, or, importantly, in anticipation of what others will do. The number and disparate types of connections result in non-linear behavior and an inability to reduce the system to its constituent parts without losing the emergent properties of the system (Miller & Page, 2007). Torrens (2010) notes that self-organization and the propagation of information back and forth across scales – notable features of human social systems – embody emergence, a hallmark of complexity.

“Big Data” and Data Science – Not Enough for a Science of Complex Social Systems

Like complexity, definitions of “big data” can seem difficult to pin down, particularly depending on perspective. Technical perspectives approach big data in terms of the “3Vs”: volume, velocity, and variety of data. This perspective is concerned with factors like storage space, transmission networks, and sensors. Another perspective is that of the scientist and researcher: instead of data collection as an expensive, painstaking, time-consuming process that nevertheless results in small samples and woefully inadequate statistical power, it is now possible in some disciplines to quite literally download data that can plausibly be used for research by writing just a small amount of code and tapping the API of a site like Twitter.

Cioffi-Revilla (2014) has described computational social science (CSS) as an “instrument-enabled discipline.” Inasmuch as CSS utilizes computation to investigate complex social systems, big data (and even bigger “computers”) are perhaps an extension of this paradigm: an improvement to our scientific instruments for the study of social complexity. A fascinating example is the controversial research on massive-scale emotional contagion through the social network Facebook. The research team investigated a phenomenon in which individuals are affected by the emotional expressions of others and, in turn, affect others through their own expression or withholding of emotion. In their paper, they noted that the minuscule but statistically significant effect could only have been detected in a sample as large as that available to the Facebook Data Science team (Kramer, Guillory, & Hancock, 2014). In the context of complex social systems, then, big data represents improved measurement possibilities. At one time, measures of length were imprecise at best – the width of a man’s thumb, length of his foot, the breadth of his outstretched arms – these were the original, inconsistent measures of inch, foot, and yard. Measurement certainly became more precise and more accurate tools were propagated, but more or better data did not change the underlying construct of human height, though it may have helped improve the ability to study it.
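The link between a tiny effect and an enormous sample can be illustrated with a standard back-of-the-envelope power calculation. The sketch below uses the normal-approximation formula for comparing two group means, n per group ≈ 2(z₁₋α/₂ + z_power)² / d², where d is the standardized effect size. The effect sizes shown are illustrative assumptions, not the values reported by Kramer et al., and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size needed to detect a standardized
    mean difference d in a two-sided, two-sample test (normal approximation)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


if __name__ == "__main__":
    # Illustrative effect sizes only: conventionally "medium", "small", and very small.
    for d in (0.5, 0.1, 0.02):
        print(f"d = {d}: about {n_per_group(d):,} participants per group")
```

Because the required sample grows with 1/d², shrinking the effect by a factor of ten raises the required sample roughly a hundredfold, which is why effects this small are out of reach for conventionally sized studies.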

Big data isn’t required to appreciate or understand social complexity, however. Returning to Schelling’s work on residential segregation, it is noteworthy that his initial investigation required little more than coins placed on a checkerboard that were then moved according to a series of simple rules. Schelling did not possess or even generate big data, but the modeled social system contained all of the features of a complex system: interacting agents, feedback, adaptation, and emergence. It is also the case that enormous datasets might reveal nothing about complexity; a computer is a complicated machine capable of generating enormous amounts of data on CPU and memory cycles as it operates, but this is not complexity: it is merely executing code, as designed and instructed. A computer, then, is a vastly more complicated watch.

In 2008, WIRED Editor-in-Chief Chris Anderson proclaimed that the deluge of data spelled “the end of theory” and made the scientific method obsolete. Anderson argued that we’ve moved beyond needing to seek causation when we find correlation, that “correlation is enough” with big data. The question – perhaps best left to philosophers of science – is how to define “enough”? Anderson pointed to Google’s success at solving tasks algorithmically, by throwing more data at more computational power, without the need to even understand the underlying data. Surely one can think of examples in which “enough” might pass the sniff test for a profit-motivated entity, but perhaps not for the scientist driven by intellectual curiosity. An enormous dataset of measures of sky color all over the earth would establish a strong correlation between midday and a blue sky, yet this tells us nothing about why the sky appears blue to us. Likewise, human beings, owing to our bounded rationality and limited cognition (Simon, 1969, 1976), are fairly terrible sensors in comparison to the satellites and robots NASA might send to Mars. Yet preparation and training for a manned Mars mission are earnestly underway. Why? Arguably, because human curiosity transcends merely knowing “good enough” correlation. Simon described “the vivid new perspective we gained of our place in the universe when we first viewed our own pale, fragile planet from space” (1969). Enormous data on the tremendous number of stars and planetary bodies hadn’t taught that lesson; it required space travel, an enormous feat of collective action in a complex society (Cioffi-Revilla, 2014).

While the “big data” buzzword declined in the first decade of this century, at least according to Google’s Ngram Viewer (see embedded chart at top of post), it is still a paradigm taken seriously by complexity scholars and computational scientists. SFI’s Geoffrey West sees a role for big data in enabling large-scale simulations and models of complex social systems – if, he asserts, we determine a “big theory” to guide which questions we ask and which data we use (2013). In the Manifesto of Computational Social Science, Conte et al. (2012) likewise suggest that big data will play an important role in investigating important questions of human social complexity, but only when coupled with the core principles and concepts of CSS: psychology and the human mind, uncertainty, social change and adaptation, networks, and non-linear and non-equilibrium dynamics, to name but a few.

Pietsch (2013) also takes a highly integrative perspective, using philosophy of science to answer the charge that big data spells “the end of science.” Calling big data “the new science of complexity,” he rejects the notion that big data is not concerned with causality in complex social systems, and in fact suggests that big data will allow for a “contextualization of science” at the level of complex systems rather than attempting to model causality by reducing a phenomenon through the “dubious simplifications” common in techniques like structural equation modeling used in social science (Pietsch, 2013).

There is little doubt that big data offers exciting new prospects for the study of complex social systems, perhaps in validating complex social system models like Robert Axtell’s 1:1 model of the U.S. economy (Axtell, 2016) or providing more reliable and robust datasets on agent interaction through the sensors contained in smartphones and other so-called “wearables.” Big data advocates who proclaim the end of the scientific method, however, would do well to keep the complexity hallmark of emergence in mind, since emergent behavior is by nature unpredictable. If the emergent property of a complex social system has not yet emerged, there may be nothing in the data – regardless of size – that can describe or predict what’s yet to come. Moreover, the adaptation to feedback that is characteristic of complex social systems also suggests the possibility that big data itself becomes part of the environmental landscape, feedback to which our existing complex social systems and the agents therein will adapt and evolve!

Conte et al. (2012) see a role for big data in the modeling stage when investigating complex social systems; that is, data can reveal statistical features of the system to be studied, and these features can be incorporated in a complex social system model, or the emergence of such features may become the object of study. Caution should be exercised in “forcing” big data into simulation models (Conte et al., 2012), and highly detailed predictions of complex social systems, even with big data, may never be possible (West, 2013).

In sum, complexity in social systems is present with or without “big data”; simply observing three preschoolers as they interact, communicating with each other via the linguistic symbol system that emerged to transcend individual human cognitive limitations and with each preschooler predicting and reacting to what each other says or does, can very well lead to highly unpredictable and emergent behavior! At the same time, enormous data can exist from very complicated machines that are not, themselves, complex because they fail the hallmark tests of complexity: self-organization, feedback, emergence. From a methodological perspective, big data technologies and techniques represent new possibilities for how complex social systems might be studied in the discipline of computational social science (e.g., Conte et al., 2012). The fact that computational social science is generative – i.e., can you grow it? (Epstein, 1999) – at times invites the dubious if well-meaning “But where did the data in your model come from?” question, as if actual data generated by human beings – regardless of how or under what circumstances – somehow trumps even the most elegant and effective model. CSS must continue to expand its interdisciplinary toolbox of scientific instruments (Cioffi-Revilla, 2014) and embrace big data as yet another tool to improve our models, our understanding, and our explanations of the complexity inherent in social systems.

REFERENCES

Axtell, R. L. (2016, May). 120 million agents self-organize into 6 million firms: a model of the US private sector. In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems (pp. 806-816). International Foundation for Autonomous Agents and Multiagent Systems.

Cioffi-Revilla, C. (2014). Introduction to computational social science: principles and applications. London: Springer.

Conte, R., Gilbert, N., Bonelli, G., Cioffi-Revilla, C., Deffuant, G., Kertesz, J., … Helbing, D. (2012). Manifesto of computational social science. The European Physical Journal Special Topics, 214(1), 325–346. http://doi.org/10.1140/epjst/e2012-01697-8

Epstein, J. M. (1999). Agent-based computational models and generative social science. Complexity, 4(5), 41–60.

Gilbert, G. N., & Troitzsch, K. G. (2005). Simulation for the social scientist (2nd ed.). Maidenhead, England; New York, NY: Open University Press.

Kramer, A. D. I., Guillory, J. E., & Hancock, J. T. (2014). Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences, 111(24), 8788–8790. http://doi.org/10.1073/pnas.1320040111

Miller, J. H., & Page, S. E. (2007). Complex adaptive systems: an introduction to computational models of social life. Princeton, N.J: Princeton University Press.

Pietsch, W. (2013). Big Data–The New Science of Complexity. Retrieved from http://philsci-archive.pitt.edu/9944/

Schelling, T. C. (1971). Dynamic models of segregation. Journal of Mathematical Sociology, 1(2), 143–186.

Simon, H. A. (1969). The sciences of the artificial (3rd ed., reprint). Cambridge, Mass.: MIT Press.

Simon, H. A. (1976). Administrative behavior: a study of decision-making processes in administrative organization (3rd ed.). New York: Free Press.

Torrens, P. M. (2010). Geography and computational social science. GeoJournal, 75(2), 133–148.

West, G. (2013). Big data needs a big theory to go with it. Scientific American, May, 15.

Tom Briggs, PhD

