Campbell’s Law: Why your metric will be gamed

June 14, 2022 (updated June 16, 2022) | Tom Briggs | design of experiments, management, measurement, organizations, performance, research methods

The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.

–Donald Campbell, 1979

Campbell, originally an experimental psychologist and trained in experimental method as was customary in his field, soon realized that true experiments could not be done in any of the social sciences because no one would let social scientists treat human beings the way laboratory scientists treated rats and other experimental animals. You couldn’t manipulate people that way because they were free enough to reinterpret the conditions of any experiment and because the institutions where experiments were done were sensitive to the public relations, if not always the moral, issues involved.

[…]

An experimenter might choose a condition for the social program to be tested, but the subjects of the experiments–organizations and the people responsible for them–inevitably and quickly understood how the numbers their actions piled up could be used in ways that might help or hurt their interests. And so, just as routinely, did their best to make sure that the numbers came out the way that gave the best outcome from them, manipulating them in ways their organizational positions and knowledge made available to them. Who knew better how to do that? And that’s been a robust finding. It’s what people [and] organizations do, if they can (and usually they can).

–Howard S. Becker, Writing for Social Scientists

Why Management Science Fails to Perform, according to Peter Drucker

March 30, 2021 | Tom Briggs | management, organizations, systems

Parts exist in contemplation of the whole.

There is one fundamental insight underlying all management science. It is that the business enterprise is a system of the highest order: a system whose parts are human beings contributing voluntarily of their knowledge, skill, and dedication to a joint venture. And one thing characterizes all genuine systems, whether they be mechanical like the control of a missile, biological like a tree, or social like the business enterprise: it is interdependence. The whole of a system is not necessarily improved if one particular function or part is improved or made more efficient. In fact, the system may well be damaged thereby, or even destroyed. In some cases the best way to strengthen the system may be to weaken a part–to make it less precise or less efficient. For what matters in any system is the performance of the whole; this is the result of growth and of dynamic balance, adjustment, and integration, rather than of mere technical efficiency.

Primary emphasis on the efficiency of parts in management sciences is therefore bound to do damage. It is bound to optimize precision of the tool at the expense of the health and performance of the whole.

Landmarks of Tomorrow
Management: Tasks, Responsibilities, Practices

Evicted: Matthew Desmond’s Pulitzer Prize-Winning Ethnography of Tenants, Landlords, and Eviction in an American City

August 3, 2019 | Tom Briggs | books, ethnography, research methods, Social Science, sociology, survey research

I’ve always felt that my first duty as an ethnographer was to make sure my work did not harm those who invited me into their lives. But this can be a complicated and delicate matter because it is not always obvious at first what does harm.

With all the talk of data science, big data, and computational modeling, it’s increasingly important to highlight exceptional examples of rigorous research employing different methods, as these methods are no less important in our quest to better understand human social systems.

Perhaps the best book I read in 2018 was Matthew Desmond’s Pulitzer Prize-winning ethnographic study of tenants and landlords, Evicted: Poverty and Profit in the American City.

In “About This Project,” Desmond details how his ethnographic study ultimately led to a mixed-methods research inquiry. He describes designing a survey, the Milwaukee Area Renters Study (MARS), to assess the experience of renters in the Milwaukee rental market. He notes that his measurement (i.e., survey) items were greatly influenced by what he learned during his ethnography, which, in my experience, is a critical feature of good survey research: qualitative inquiry driving quantitative measurement (and vice versa). He notes, for example, that simply asking “Have you ever been evicted?” is likely to undercount evictions, since for many respondents “eviction” connotes sheriffs and courts; a better item would ask about the loss of a rental home due to nonpayment or for other reasons.

The multiple methods and different data sources used in this book informed one another in important ways. I began this project with a set of questions to pursue, but lines of inquiry flexed and waned as my fieldwork progressed. Some would not have sprung to mind had I never set foot in the field. But it was only after analyzing court records and survey data that I was able to see the bigger picture, grasping the magnitude of eviction in poor neighborhoods, identifying disparities, and cataloguing consequences of displacement. My quantitative endeavors also allowed me to assess how representative my observations were. Whenever possible, I subjected my ground-level observations to a kind of statistical check, which determined whether what I was seeing on the ground was also detectable within a larger population.

Desmond also stresses that, when conducting qualitative research, information should be verified whenever possible against other sources, particularly official records such as those kept by social services and the courts.

I analyzed two years’ worth of nuisance property citations from the Milwaukee Police Department; obtained records of more than a million 911 calls in Milwaukee; and collected rent rolls, legal transcripts, public property records, school files, and psychological evaluations.

The two surveys that Desmond designed following his fieldwork both achieved very respectable response rates: 84 percent for the MARS survey and 66 percent for the Milwaukee Eviction Court Study.

Desmond was also candid, in multiple places, about his own involvement in the events he was studying. He describes two instances in particular: paying for the rental of a U-Haul truck, and lending a mother money to buy a stove and refrigerator ahead of an anticipated visit by Child Protective Services. He also explains that he occasionally provided transportation for people looking for housing.

Researchers, particularly those working in field settings (organization scientists among them), rarely seem to do as good a job as Desmond of examining potential biases introduced by the researcher’s mere presence. In survey methodology training, we’re explicitly taught how the presentation and administration of measurement items can affect response data: whether the survey is incentivized and, if so, with what type of incentive (overincentivizing survey participation, for example, will generally reduce the quality of the data); whether surveys are presented electronically, on paper, or by a field interviewer; and even the colors and fonts used to present items to respondents.

Given what we know about survey measurement, disentangling and fully understanding the bias introduced by a researcher doing ethnographic fieldwork is a tall order, so I was pleased that Desmond tackled it in Evicted, and in an accessible and highly engaging way (in “About This Project”).

Quantitative social scientists could learn a great deal from our colleagues with more experience using qualitative methods and inquiry.

Desmond practices open science and promotes re-use of his data:

I have made all survey data publicly available through the Harvard Dataverse Network.

He also urges other researchers to test whether his extensive findings hold in other geographic areas:

That said, it is ultimately up to future researchers to determine whether what I found in Milwaukee is true in other places. A thousand questions remain unanswered. We need a robust sociology of housing that reaches beyond a narrow focus on policy and public housing. We need a new sociology of displacement that documents the prevalence, causes, and consequences of eviction. And perhaps most important, we need a committed sociology of inequality that includes a serious study of exploitation and extractive markets.

Yet Desmond also asks what we actually mean when we talk of “generalizing” findings or replicating them elsewhere, in the context of a human socioeconomic system like the one linking landlords and tenants:

Still, I wonder sometimes what we are asking when we ask if findings apply elsewhere. Is it that we really believe that something could happen in Pittsburgh but never in Albuquerque, in Memphis but never in Dubuque? The weight of the evidence is in the other direction, especially when it comes to problems as big and as widespread as urban poverty and unaffordable housing. This study took place in the heart of a major American city, not in an isolated Polish village or a brambly Montana town or on the moon.

Finally, Desmond describes the power of storytelling in conveying research:

Ethnographers shrink themselves in the field but enlarge themselves on the page because first-person accounts convey experience—and experience, authority.

Evicted, the product of Matthew Desmond’s extensive ethnographic fieldwork, follow-up research, and synthesis, stands on its own and should be read by every social scientist in the U.S. I cannot do better than to close with Desmond’s own words at the end of his methodological documentation, which reveal the intense interplay between researcher and subjects in any ethnography:

The harder feat for any fieldworker is not getting in; it’s leaving. And the more difficult ethical dilemma is not how to respond when asked to help but how to respond when you are given so much. I have been blessed by countless acts of generosity from the people I met in Milwaukee. Each one reminds me how gracefully they refuse to be reduced to their hardships. Poverty has not prevailed against their deep humanity.

I highly recommend Matthew Desmond’s Evicted: Poverty and Profit in the American City.

Organization Design: “All the elements interact in a system”

February 20, 2019 | Tom Briggs | leadership, management, organizations, quote, systems

Organizations, like individuals, can avoid identity crises by deciding what it is they wish to be and then pursuing it with a healthy obsession.

Some organizations do indeed achieve and maintain an internal consistency. But then they find that it is designed for an environment the organization is no longer in. To have a nice, neat machine bureaucracy in a dynamic industry calling for constant innovation or, alternately, a flexible adhocracy in a stable industry calling for minimum cost makes no sense. Remember that these are configurations of situation as well as structure. Indeed, the very notion of configuration is that all the elements interact in a system. One element does not cause another; instead, all influence each other interactively. Structure is no more designed to fit the situation than situation is selected to fit the structure.

The way to deal with the right structure in the wrong environment may be to change the environment, not the structure. Often, in fact, it is far easier to shift industries or retreat to a suitable niche in an industry than to undo a cohesive structure.

Essentially, the organization has two choices. It can adapt continuously to the environment at the expense of internal consistency—that is, steadily redesign its structure to maintain external fit. Or it can maintain internal consistency at the expense of a gradually worsening fit with its environment, at least until the fit becomes so bad that it must undergo sudden structural redesign to achieve a new internally consistent configuration. In other words, the choice is between evolution and revolution, between perpetual mild adaptation, which favors external fit over time, and infrequent major realignment, which favors internal consistency over time.

–Henry Mintzberg, “Organization Design: Fashion or Fit?” (1981)

When is a system complex?

January 20, 2019 | Tom Briggs | complexity, complexity science, systems

“Flocking birds, weather patterns, commercial organisations, swarming robots… Increasingly, many of the systems that we want to engineer or understand are said to be ‘complex’. But what does this mean? How do these so-called ‘complex systems’ differ from the more easily understood systems that we are familiar with?”

See http://complexityprimer.eng.cam.ac.uk for more on complexity and modularity.

Review: The Half-Life of Facts: Why Everything We Know Has an Expiration Date, by Samuel Arbesman

November 25, 2018 | Tom Briggs | bibliometrics, book review, computational social science, long-tailed distribution, measurement, Moore's Law, network science, science, scientometrics, social networks

“What we study is not always what is actually out there. It is often what we’re interested in, or what’s easiest to discover.” –Samuel Arbesman

Samuel Arbesman, a mathematician and network scientist at Harvard, begins his fun romp through the science of science, The Half-Life of Facts: Why Everything We Know Has an Expiration Date, with a few cheeky examples of scientific “facts” that have changed over time. In the first half of the 20th century, it was widely accepted that a human cell has 48 chromosomes; in the latter half, it became widely known that a human cell has, in fact, only 46.

In Chapter 1, Arbesman walks through the mathematical regularities in the growth and decay of knowledge, aptly using the half-life of radioactive material as a metaphor. In Chapter 2, “The Pace of Discovery,” he provides an enjoyable introduction to scientometrics, beginning with a story about Derek J. de Solla Price, considered the founder of scientometrics, or the “science of science.” Price stacked every issue of a British scientific journal chronologically against the wall of his apartment and realized in an idle moment that the heights of the stacked volumes traced a specific shape: an exponential curve. This discovery led Price to focus his research on scientometrics and ultimately to publish Little Science, Big Science, in which he calculates doubling times (the time it takes an exponentially growing quantity to double) for various components of science and technology.

Price found, for example:

Domain	Doubling Time (in years)
Number of entries in a dictionary of national biography	100
Number of universities	50
Number of important discoveries; number of chemical elements known; accuracy of instruments	20
Number of scientific journals; number of chemical compounds known; membership of scientific institutes	15
Number of asteroids known; number of engineers in the United States	10
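
Doubling times translate directly into growth factors: a quantity that doubles every T years grows by a factor of 2^(t/T) over t years, which implies a constant annual growth rate of 2^(1/T) - 1. The minimal Python sketch below assumes clean exponential growth, which real series only approximate; the function names are illustrative, and the 15-year input is simply Price's figure for scientific journals from the table above.

```python
import math


def growth_factor(years: float, doubling_time: float) -> float:
    """Multiplicative growth over `years` for a quantity that doubles every `doubling_time` years."""
    return 2 ** (years / doubling_time)


def annual_growth_rate(doubling_time: float) -> float:
    """Constant yearly growth rate implied by a given doubling time."""
    return 2 ** (1 / doubling_time) - 1


def doubling_time_from_rate(rate: float) -> float:
    """Inverse of annual_growth_rate: doubling time implied by a constant yearly rate."""
    return math.log(2) / math.log(1 + rate)


# Scientific journals doubling every 15 years:
print(growth_factor(45, 15))                   # 8.0: three doublings in 45 years
print(round(annual_growth_rate(15) * 100, 1))  # 4.7 (percent per year)
```

A 15-year doubling time therefore corresponds to growth of under 5 percent a year, which is why such growth can feel gradual year to year while compounding dramatically over decades.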

Arbesman also cites Harvey Lehman, who attempted, in an article in the journal Social Forces, to count major contributions in different fields of study; Arbesman provides the following expanded table:

Field	Doubling Time (in years)
Medicine and hygiene	87
Philosophy	77
Mathematics	63
Geology	46
Entomology	39
Chemistry	35
Genetics	32
Grand opera	20

Chapter 2 also introduces some of the hallmarks of bibliometrics, the h-index and the journal impact factor, as well as the study of scientific discoveries, which Arbesman calls “eurekometrics.” Some of the names mentioned in Chapter 2 include: Jorge Hirsch, Harriet Zuckerman, Arthur C. Clarke, Nicholas Christakis, Tyler Cowen, Galileo, Isaac Newton, Stanley Milgram.
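
Both bibliometric measures are simple to compute. The sketch below uses the standard definitions as I understand them: the h-index is the largest h such that h of a researcher's papers have at least h citations each, and the two-year impact factor divides the citations a journal receives in a given year to items it published in the previous two years by the number of citable items it published in those two years. The function names and all input numbers are hypothetical.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # every later paper has even fewer citations
    return h


def impact_factor(citations_this_year: int, citable_items_prev_two_years: int) -> float:
    """Two-year impact factor: this year's citations to items from the previous two
    years, divided by the number of citable items published in those two years."""
    return citations_this_year / citable_items_prev_two_years


# Hypothetical researcher with five papers:
print(h_index([3, 0, 6, 1, 5]))  # 3
# Hypothetical journal: 600 citations this year to 200 items from the previous two years
print(impact_factor(600, 200))   # 3.0
```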

Chapter 3, “The Asymptote of Facts,” tackles the decay of knowledge. Primarily using citation analysis, Arbesman examines how long it takes for papers to become “out of date.” This approach makes it possible to determine the half-life of a field: the time it takes for others to stop citing half of the literature in that field. He uses a variety of examples to illustrate the “long tail” of knowledge as it decays.

Arbesman cites a 2008 study by Rong Tang of scholarly books in different fields, which found the following half-lives:

Field	Half-life (in years)
Physics	13.07
Economics	9.38
Math	9.17
Psychology	7.15
History	7.13
Religion	8.76
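
If decay is modeled as exponential, as the radioactive half-life metaphor implies, the share of a literature still being cited after t years is (1/2)^(t/h) for a half-life h, and h can be estimated from two observations. This is a simplifying assumption for illustration only: Arbesman's emphasis on a “long tail” suggests real citation decay is messier. Only the physics half-life comes from the table above; the function names and citation counts are hypothetical.

```python
import math


def fraction_still_cited(years: float, half_life: float) -> float:
    """Share of a field's literature still cited after `years`, assuming exponential decay."""
    return 0.5 ** (years / half_life)


def half_life_from_counts(initial: float, remaining: float, years: float) -> float:
    """Estimate a half-life from two observations of an exponentially decaying quantity."""
    return years * math.log(2) / math.log(initial / remaining)


print(round(fraction_still_cited(13.07, 13.07), 2))  # 0.5: one physics half-life
print(round(fraction_still_cited(26.14, 13.07), 2))  # 0.25: two half-lives
# Hypothetical: 1,000 papers cited at first, 400 still cited ten years later
print(round(half_life_from_counts(1000, 400, 10), 2))
```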

“[W]hen people thought the earth was flat, they were wrong. When people thought the earth was spherical, they were wrong. But if you think that thinking the earth is spherical is just as wrong as thinking the earth is flat, then your view is wronger than both of them put together.” –Isaac Asimov

Some of the names mentioned in Chapter 3 include: Marjory Courtenay-Latimer, John Hughlings Jackson, Sean Carroll, Kevin Kelly.

Chapter 4, “Moore’s Law of Everything,” explores how technological progress, as captured by Moore’s Law, intersects with human knowledge. The chapter is key to understanding how the field of computational social science emerged and why scientometrics has grown: it is now possible to do computationally what was once an incredibly painstaking manual process. The advent of computation and the exponential growth in computing power have changed what humans are able to know. Topics in Chapter 4 include: carrying capacity, logistic curves, information transformation, innovation, scientific prefixes as evidence of progress, actuarial escape velocity, population growth, and travel distances.
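
The logistic curve shows why exponential growth cannot continue forever: growth looks exponential early on, then slows as it approaches a carrying capacity K, with the inflection point at K/2. A minimal sketch follows; the parameter values are purely illustrative and do not come from the book.

```python
import math


def exponential(t: float, n0: float, r: float) -> float:
    """Unbounded exponential growth from n0 at continuous rate r."""
    return n0 * math.exp(r * t)


def logistic(t: float, k: float, r: float, t_mid: float) -> float:
    """Logistic growth toward carrying capacity k; at t == t_mid the value is k / 2."""
    return k / (1 + math.exp(-r * (t - t_mid)))


K, R, T_MID = 1000, 0.5, 20  # illustrative values only
start = logistic(0, K, R, T_MID)
for t in (0, 10, 20, 30, 40):
    # The two curves nearly coincide early, then diverge as the logistic one levels off.
    print(t, round(exponential(t, start, R), 1), round(logistic(t, K, R, T_MID), 1))
```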

“Technological growth facilitates changes in facts, sometimes rapidly, in many areas: sequencing new genomes…; finding new asteroids (often done using sophisticated computer algorithms that can detect objects moving in space); even proving new mathematical theorems through increasing computer power.” –Samuel Arbesman

Some of the names mentioned in Chapter 4 include: Clayton Christensen, Rodney Brooks, Jonathan Cole, Henry Petroski, Aubrey de Grey, Bryan Caplan, Michael Kremer, Thomas Malthus, Robert Merton.

In Chapter 5, “The Spread of Facts,” Arbesman gently introduces network science and mentions some of his work with Nicholas Christakis, explaining how behaviors (e.g., health behaviors) and information have been shown empirically to move through networks. The spread of information through networks introduces the possibility, or perhaps the inevitability, of fact-transmission errors or even outright fabrications. Arbesman uses the children’s game of “telephone” to illustrate how easily a piece of knowledge can be distorted as it moves through a network. Without stealing his thunder, I’ll also note that Arbesman chooses some fantastic examples of misinformation spread in this chapter: Popeye the Sailor and dinosaurs! More mundane, but relevant to scientometrics, is the problem of inaccurate citations “entering the wild,” only to be replicated and spread to an almost unbelievable degree. Some of the names mentioned in Chapter 5 include: Gottfried Leibniz, Jukka-Pekka Onnela, Jeremiah Dittmar, Mark Granovetter, James Fallows, David Liben-Nowell, Jon Kleinberg.
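
The telephone game suggests a back-of-the-envelope model: if every retelling independently garbles a fact with probability p, the chance it survives n retellings intact is (1 - p)^n. The sketch below assumes independent errors that are never corrected, which is far simpler than real networks; the function names, error rates, and example message are hypothetical.

```python
import random
import string


def survival_probability(p_error: float, hops: int) -> float:
    """Chance a fact passes through `hops` retellings untouched when each retelling
    independently garbles it with probability p_error."""
    return (1 - p_error) ** hops


def play_telephone(message: str, hops: int, p_error: float, rng: random.Random) -> str:
    """Pass a message down a chain of `hops` people; at each hop, with probability
    p_error, one randomly chosen character is replaced by a random lowercase letter."""
    chars = list(message)
    for _ in range(hops):
        if chars and rng.random() < p_error:
            i = rng.randrange(len(chars))
            chars[i] = rng.choice(string.ascii_lowercase)
    return "".join(chars)


print(round(survival_probability(0.1, 10), 3))  # 0.349
rng = random.Random(42)
print(play_telephone("humans have 46 chromosomes", hops=10, p_error=0.3, rng=rng))
```

Even a modest 10 percent error rate per retelling leaves only about a one-in-three chance that a fact arrives intact after ten hops.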

In Chapter 6, “Hidden Knowledge,” Arbesman goes deeper into network science and computational techniques for studying networks: he introduces random graphs, preferential attachment, evolutionary programming, meta-analysis, and the academic citation and networking product Mendeley, along with several other software products. Some of the names mentioned in Chapter 6 include: Albert-László Barabási, Réka Albert, Herbert Simon, William Shakespeare.
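
Preferential attachment, the mechanism Barabási and Albert proposed, can be sketched in a few lines: each new node links to existing nodes with probability proportional to their current degree, so well-connected nodes tend to become better connected. This is a minimal illustrative implementation, not code from the book; the starting graph, `m`, and the seed are assumptions, and libraries such as NetworkX provide tested generators for serious use.

```python
import random
from collections import Counter


def preferential_attachment(n: int, m: int, seed: int = 0) -> list[tuple[int, int]]:
    """Grow a network to n nodes; each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a complete graph on m + 1 nodes so every node has degree >= 1.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears here once per edge endpoint, so sampling uniformly
    # from this list is sampling proportional to degree.
    targets = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        chosen: set[int] = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for old in chosen:
            edges.append((new, old))
            targets.extend((new, old))
    return edges


edges = preferential_attachment(n=2000, m=2, seed=1)
degree = Counter(node for edge in edges for node in edge)
print(len(edges))             # 3997
print(degree.most_common(3))  # the best-connected hubs, typically early nodes
```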

In Chapter 7, “Fact Phase Transitions,” Arbesman describes in greater detail the use of mathematical tools from physics to investigate the underlying regularity in the change of knowledge. Topics include Ising models, Fermat’s Last Theorem, and human space exploration.

Chapter 8, “Mount Everest and the Discovery of Error,” explores one of my personal favorites as an applied social scientist: measurement, and its importance to all of science, human knowledge, and understanding. Arbesman uses the changing “fact” of Mt. Everest’s height during the 20th century, as measurement techniques improved or, in the case of GPS, were invented and deployed, to illustrate yet another source of the decay of knowledge. Measures of length have likewise been inconsistent, until scientists finally settled on using the speed of light to define the meter.

“As our measurements become more precise, the speed of light doesn’t change; instead, the definition of a meter does.”

Arbesman defines precision and accuracy (or reliability and validity for the psychologists and social scientists among us). Precision refers to the consistency of measurements over time; accuracy refers to how similar measurements are to the real value. Importantly, he also discusses error: “all methods are neither perfectly precise nor perfectly accurate; they are characterized by a mixture of imprecision and inaccuracy. But we can keep trying to improve our measurement methods. When we do, changes in precision and accuracy affect the facts we know, and sometimes cause a more drastic overhaul in our facts.” [emphasis mine]
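
The distinction between precision and accuracy is easy to see in simulation: an instrument with a systematic offset can be highly precise yet inaccurate, while an unbiased but noisy instrument is accurate on average yet imprecise. In this minimal sketch, the function names, bias, and noise levels are made up, and the “true” value (roughly Everest's height in metres) is used only for flavor.

```python
import random
import statistics


def measure(true_value: float, bias: float, noise_sd: float, n: int,
            rng: random.Random) -> list[float]:
    """Simulate n readings from an instrument with a systematic offset (bias)
    and random scatter (noise_sd)."""
    return [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]


def summarize(readings: list[float], true_value: float) -> tuple[float, float]:
    """Return (offset, spread): how far the average reading sits from the truth
    (inaccuracy) and how much readings scatter around their own mean (imprecision)."""
    return statistics.fmean(readings) - true_value, statistics.stdev(readings)


rng = random.Random(0)
TRUE_VALUE = 8848.0  # roughly Everest's height in metres; illustrative only
precise_but_biased = measure(TRUE_VALUE, bias=5.0, noise_sd=0.1, n=1000, rng=rng)
accurate_but_noisy = measure(TRUE_VALUE, bias=0.0, noise_sd=5.0, n=1000, rng=rng)
print(summarize(precise_but_biased, TRUE_VALUE))  # offset near 5, spread near 0.1
print(summarize(accurate_but_noisy, TRUE_VALUE))  # offset near 0, spread near 5
```

More readings shrink the uncertainty in the noisy instrument's average, but no number of readings removes the biased instrument's offset; only a better measurement method does.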

“Statistics is the science that lets you do twenty experiments a year and publish one false result in Nature.” –John Maynard Smith

Next, Arbesman jumps into a discussion of probability, its importance in science, and the woefully misunderstood p-value. He discusses the problem of publishing false results, issues of replication in science, and the dangers of poor measurement.
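
The John Maynard Smith quip has simple arithmetic behind it. When a null hypothesis is true, a valid p-value is uniformly distributed, so a test at a significance level of 0.05 comes up “significant” by chance about once in twenty tries. The sketch below assumes twenty independent experiments in which every null hypothesis is true; both that setup and the function names are illustrative.

```python
import random


def prob_at_least_one_false_positive(alpha: float, n_tests: int) -> float:
    """Chance that n independent tests of true null hypotheses yield at least one
    'significant' result (the family-wise error rate)."""
    return 1 - (1 - alpha) ** n_tests


def simulate_false_positives(alpha: float, n_tests: int, rng: random.Random) -> int:
    """Count chance 'discoveries': under a true null, p is uniform on [0, 1],
    so each test falls below alpha with probability alpha."""
    return sum(rng.random() < alpha for _ in range(n_tests))


print(20 * 0.05)                                             # 1.0 expected false positive
print(round(prob_at_least_one_false_positive(0.05, 20), 3))  # 0.642
rng = random.Random(7)
years = [simulate_false_positives(0.05, 20, rng) for _ in range(10_000)]
print(sum(y >= 1 for y in years) / len(years))               # close to 0.642
```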

“When you cannot measure, your knowledge is meager and unsatisfactory.” –Lord Kelvin

“If you can measure it, it can also be measured incorrectly.” –Samuel Arbesman’s corollary to Lord Kelvin

The penultimate chapter, Chapter 9, “The Human Side of Facts,” discusses the human aspects that so often contribute to getting facts wrong: cognitive bias, self-serving bias, representativeness bias, theory-induced blindness, change blindness, and language change, to name a few. Some of the names mentioned in this chapter include: Daniel Kahneman, John Maynard Keynes, Michael Chabon, Thomas Kuhn, John McWhorter, Isaac Newton, and Henry Kissinger.

Finally, in Chapter 10, “At the Edge of What We Know,” Arbesman discusses the pace of information change and whether, and how, humans can cope. Arbesman argues, essentially, “quite well,” and points to a variety of examples where humans seem to be doing mostly okay despite our limits, such as Dunbar’s Number of “150 to 200 people we can know and have meaningful social ties,” which surprisingly still seems to hold, considering that the average number of Facebook friends was 190 at the time of writing. Arbesman also points to a variety of technologies that will enhance our capabilities as we bump against our limits. Some of the names mentioned in this chapter include: Kathryn Schulz, Carl Linnaeus, Robin Dunbar, Chris Magee, Jonathan Franzen.

The Half-Life of Facts: Why Everything We Know Has an Expiration Date includes almost 20 pages of endnotes and citations for those wishing to dig deeper. It’s a particularly enjoyable read for science and measurement geeks, or for anyone wanting to know more about the science of science and why what we know is always in flux. I particularly recommend Chapter 8, since measurement is so fundamental to science and to our knowledge of the world.

Essay: Complex Social Systems – You’ll Need More Than Just Big Data

November 14, 2018 (updated February 7, 2019) | Tom Briggs | complexity, complexity science, computational modeling, computational social science, data science, measurement, systems

What makes a “complex system” so vexing is that its collective characteristics cannot easily be predicted from underlying components: the whole is greater than, and often significantly different from, the sum of its parts. A city is much more than its buildings and people. Our bodies are more than the totality of our cells. This quality, called emergent behavior, is characteristic of economies, financial markets, urban communities, companies, organisms, the Internet, galaxies and the health care system.

–Geoffrey West, Santa Fe Institute

The ability to collect and pin to a board all of the insects that live in the garden does little to lend insight into the ecosystem contained therein.

–John H. Miller and Scott E. Page

Defining and Understanding Complexity

Miller and Page (2007) initially punt on defining complexity by invoking Justice Stewart’s definition of pornography: “I know it when I see it.” Many definitions of complexity and complex systems have been proposed, yet none is widely accepted. Often, the modus operandi for explaining complexity is simply to describe its features.

Key features of a complex system include:

Constituent parts that interact in such ways as to give rise to non-linear and often unanticipated or unpredictable outcomes, outcomes particularly unexpected if the system were examined in a reductionist fashion
Feedback loops between parts and levels of the system and often the system and the environment (Simon, 1969)
Self-organization of constituent parts, adaptation, evolution
And, possibly unique to complex human social systems, the possibility for second-order emergence – emergence of reflexive social institutions based on human collective action (Gilbert & Troitzsch, 2005)

Gilbert and Troitzsch (2005) suggest that emergence calls for new descriptions that are unnecessary for describing the behavior of the underlying components: an individual atom has no temperature, but the interaction of atoms in motion gives rise to temperature. Complexity scholars have also explained complexity by distinguishing it from complication. Miller and Page (2007) suggest that removing a seat from a car makes it less complicated, while removing the timing belt makes it less complex. Santa Fe Institute president David Krakauer noted in an August 2015 interview, “A watch is complicated…your family is complex,” suggesting that we understand how all of the constituent parts of a watch work together to make a functioning timepiece, but we do not fully understand the various forces that make a family function or fail to function. Removing a specific part from a watch has a predictable, known consequence. Removing, or adding, a family member changes the interactions in the family’s social system in unknown and unpredictable ways.

The crux of complexity in social systems, then, is how interactions between individuals give rise to new, emergent properties of the system that cannot be understood by studying each individual alone, as the poetic if macabre Miller and Page (2007) quote about pinning insects suggests. Perhaps the best-known example in computational social science (CSS) of macro-level emergence from the interaction of agents in a complex social system is Thomas Schelling’s model of segregation (1971). Schelling demonstrated that when individuals choose where to live based on even a very slight preference for having some neighbors who look like them, a high degree of residential segregation, akin to that observed in many American cities, results without any governmental or other top-down organizing scheme. Likewise, Simon (1969) recounts teaching urban land use to architecture students who had difficulty accepting that land-use patterns in medieval cities arose from cumulative individual decisions over time rather than from the top-down guidance of a central planner or designer.

Miller and Page (2007) suggest that innate features of social systems tend to produce complexity: social agents are “enmeshed in a web of connections with one another and, through a variety of adaptive processes, they must successfully navigate through their world” (p. 10). Part of agents’ navigation of the world necessarily involves making decisions and undertaking behaviors either in response to the decisions and behaviors of others or, importantly, in anticipation of what others will do. The number and disparate types of connections result in non-linear behavior and an inability to reduce the system to its constituent parts without losing its emergent properties (Miller & Page, 2007). Torrens (2010) notes that self-organization and the propagation of information back and forth across scales, both notable features of human social systems, embody emergence, a hallmark of complexity.

“Big Data” and Data Science – Not Enough for a Science of Complex Social Systems

Like complexity, “big data” can be difficult to define, particularly because definitions depend on perspective. Technical perspectives approach big data in terms of the “3Vs”: the volume, velocity, and variety of data. This perspective is concerned with factors like storage space, transmission networks, and sensors. Another perspective is that of the scientist and researcher: where data collection was once an expensive, painstaking, time-consuming process that nevertheless yielded small samples and woefully inadequate statistical power, in some disciplines it is now possible to quite literally download data that can plausibly be used for research by writing a small amount of code against the API of a site like Twitter.

Cioffi-Revilla (2014) has described computational social science (CSS) as an “instrument-enabled discipline.” Inasmuch as CSS uses computation to investigate complex social systems, big data (and even bigger “computers”) are perhaps an extension of this paradigm: an improvement to our scientific instruments for the study of social complexity. A fascinating example is the controversial research on massive-scale emotional contagion through the social network Facebook. The research team sought to investigate a phenomenon in which individuals are affected by the emotional expressions of others and, in turn, affect others through their own expression or withholding of emotion; in their paper, they noted that the minuscule but statistically significant effect size could only have been detected in a sample as large as the one available to the Facebook Data Science team (Kramer, Guillory, & Hancock, 2014). In the context of complex social systems, then, big data represents improved measurement possibilities. At one time, measures of length were imprecise at best: the width of a man’s thumb, the length of his foot, and the breadth of his outstretched arms were the original, inconsistent measures of the inch, foot, and yard. Measurement became more precise as more accurate tools spread, but more or better data did not change the underlying construct being measured, such as human height, though it may have improved our ability to study it.
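
Why a tiny effect demands an enormous sample follows from a standard power calculation. Under the usual normal approximation for comparing two group means, the sample needed per group is roughly 2 * (z_alpha + z_beta)^2 / d^2, where d is the standardized effect size, z_alpha is the critical value for a two-sided test, and z_beta corresponds to the desired power; halving d quadruples the required sample. The effect sizes below are illustrative and are not the values Kramer, Guillory, and Hancock reported; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size for a two-sided, two-sample comparison of
    means with standardized effect size d (normal approximation)."""
    z = NormalDist()                     # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # about 0.84 for 80 percent power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_d ** 2)


for d in (0.5, 0.1, 0.01, 0.001):
    print(d, n_per_group(d))
```

A "medium" effect of d = 0.5 needs only a few dozen people per group, while effects a hundred times smaller need samples in the hundreds of thousands, the kind of scale only platforms like Facebook can offer.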

Big data isn’t required to appreciate or understand social complexity, however. Returning to Schelling’s work on residential segregation, it is noteworthy that his initial investigation required little more than coins placed on a checkerboard that were then moved according to a series of simple rules. Schelling did not possess or even generate big data, but the modeled social system contained all of the features of a complex system: interacting agents, feedback, adaptation, and emergence. It is also the case that enormous datasets might reveal nothing about complexity; a computer is a complicated machine capable of generating enormous amounts of data on CPU and memory cycles as it operates, but this is not complexity: it is merely executing code, as designed and instructed. A computer, then, is a vastly more complicated watch.
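
Schelling's checkerboard procedure translates naturally into a small agent-based model. The sketch below is one common variant, not Schelling's exact rules: agents of two groups sit on a 20 x 20 grid with 10 percent of cells empty, an agent is unhappy if fewer than 30 percent of its occupied neighboring cells hold its own group, and unhappy agents move to random empty cells. All parameter values and function names are assumptions chosen for illustration.

```python
import random

EMPTY = 0  # empty cell; the two groups are labelled 1 and 2


def neighbours(grid, r, c):
    """Values in the occupied cells of the 8-cell neighbourhood around (r, c)."""
    size = len(grid)
    found = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr or dc) and 0 <= rr < size and 0 <= cc < size and grid[rr][cc] != EMPTY:
                found.append(grid[rr][cc])
    return found


def same_share(grid, r, c):
    """Fraction of occupied neighbouring cells holding the same group (1.0 if none)."""
    nbrs = neighbours(grid, r, c)
    if not nbrs:
        return 1.0
    return sum(n == grid[r][c] for n in nbrs) / len(nbrs)


def schelling(size=20, empty_frac=0.1, threshold=0.3, steps=50, seed=0):
    """Run up to `steps` rounds, moving every unhappy agent to a random empty cell."""
    rng = random.Random(seed)
    n_empty = int(size * size * empty_frac)
    n_agents = size * size - n_empty
    cells = [1] * (n_agents // 2) + [2] * (n_agents - n_agents // 2) + [EMPTY] * n_empty
    rng.shuffle(cells)
    grid = [cells[i * size:(i + 1) * size] for i in range(size)]
    for _ in range(steps):
        unhappy = [(r, c) for r in range(size) for c in range(size)
                   if grid[r][c] != EMPTY and same_share(grid, r, c) < threshold]
        if not unhappy:
            break  # everyone is content: the pattern has settled
        empties = [(r, c) for r in range(size) for c in range(size) if grid[r][c] == EMPTY]
        rng.shuffle(unhappy)
        for r, c in unhappy:
            er, ec = empties.pop(rng.randrange(len(empties)))
            grid[er][ec], grid[r][c] = grid[r][c], EMPTY
            empties.append((r, c))  # the vacated cell becomes available
    return grid


def mean_similarity(grid):
    """Average same-group share across all agents: a simple segregation index."""
    size = len(grid)
    shares = [same_share(grid, r, c) for r in range(size) for c in range(size)
              if grid[r][c] != EMPTY]
    return sum(shares) / len(shares)


initial = schelling(steps=0)  # same seed, so this is the starting arrangement
final = schelling(steps=50)
print(round(mean_similarity(initial), 2), round(mean_similarity(final), 2))
```

Comparing the two printed values typically shows the average same-group share rising well above the roughly 50 percent expected from a random arrangement, even though no agent in this sketch demands a majority of like neighbors.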

In 2008, WIRED Editor-in-Chief Chris Anderson proclaimed that the deluge of data spelled “the end of theory” and made the scientific method obsolete. Anderson argued that with big data we no longer need to seek causation when we find correlation: “correlation is enough.” The question, perhaps best left to philosophers of science, is how to define “enough.” Anderson pointed to Google’s success at solving tasks algorithmically, by throwing more data at more computational power, without needing to understand the underlying data. One can surely think of examples in which “enough” might pass the sniff test for a profit-motivated entity, but perhaps not for a scientist driven by intellectual curiosity. An enormous dataset of sky-color measurements from all over the earth would establish a strong correlation between midday and a blue sky, yet it tells us nothing about why the sky appears blue to us. Likewise, human beings, owing to our bounded rationality and limited cognition (Simon, 1969, 1976), are fairly poor sensors compared with the satellites and robots NASA might send to Mars. Yet preparation and training for a manned Mars mission is earnestly underway. Why? Arguably, because human curiosity is not satisfied by merely knowing a “good enough” correlation. Simon described “the vivid new perspective we gained of our place in the universe when we first viewed our own pale, fragile planet from space” (1969). Enormous data on the vast number of stars and planetary bodies had not taught that lesson; it required space travel, an enormous feat of collective action in a complex society (Cioffi-Revilla, 2014).

While the “big data” buzzword declined in the first decade of this century (at least according to Google’s Ngram Viewer; see embedded chart at top of post), it remains a paradigm taken seriously by complexity scholars and computational scientists. The Santa Fe Institute’s Geoffrey West sees a role for big data in enabling large-scale simulations and models of complex social systems, but only if, he asserts, we develop a “big theory” to guide which questions we ask and which data we use (2013). In the Manifesto of Computational Social Science, Conte et al. (2012) likewise suggest that big data will play an important role in investigating questions of human social complexity, but only when coupled with the core principles and concepts of CSS: psychology and the human mind, uncertainty, social change and adaptation, networks, and non-linear and non-equilibrium dynamics, to name but a few.

Pietsch (2013) also takes a highly integrative perspective, using philosophy of science to answer the charge that big data spells “the end of science.” Calling big data “the new science of complexity,” he rejects the notion that big data is unconcerned with causality in complex social systems, and suggests instead that big data will allow for a “contextualization of science” at the level of complex systems, rather than modeling causality by reducing a phenomenon through the “dubious simplifications” common in social-science techniques such as structural equation modeling (Pietsch, 2013).

There is little doubt that big data offers exciting new prospects for the study of complex social systems, perhaps in validating complex social system models like Robert Axtell’s 1:1 model of the U.S. economy (Axtell, 2016), or in providing more reliable and robust datasets on agent interaction through the sensors in smartphones and other so-called “wearables.” Big data advocates who proclaim the end of the scientific method, however, would do well to keep the complexity hallmark of emergence in mind, since emergent behavior is by nature unpredictable. If the emergent property of a complex social system has not yet emerged, there may be nothing in the data, regardless of its size, that can describe or predict what is yet to come. Moreover, the adaptation to feedback that characterizes complex social systems suggests that big data itself may become part of the environmental landscape: feedback to which our existing complex social systems, and the agents therein, will adapt and evolve!

Conte et al. (2012) see a role for big data in the modeling stage of investigating complex social systems: data can reveal statistical features of the system under study, and these features can either be incorporated into a complex social system model or become, through their emergence, the object of study. Caution should be exercised in “forcing” big data into simulation models (Conte et al., 2012), and highly detailed predictions of complex social systems, even with big data, may never be possible (West, 2013).
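What incorporating a statistical feature into a model might look like in practice: a minimal sketch, assuming the feature of interest is a heavy-tailed size distribution (firm sizes are a commonly cited example, but the data here are synthetic and every parameter value is hypothetical). It estimates a power-law exponent from “observed” data with the standard maximum-likelihood estimator for a continuous power law, then uses that estimate to initialize agents in a simulation. It is not Conte et al.’s procedure, nor Axtell’s model.

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Draw n values from a continuous power law p(x) ~ x^(-alpha), x >= x_min,
    by inverse-transform sampling."""
    return [x_min * (1 - rng.random()) ** (-1 / (alpha - 1)) for _ in range(n)]


def fit_alpha(data, x_min):
    """Maximum-likelihood estimate of a continuous power-law exponent
    for values >= x_min: alpha = 1 + n / sum(ln(x / x_min))."""
    tail = [x for x in data if x >= x_min]
    return 1 + len(tail) / sum(math.log(x / x_min) for x in tail)


if __name__ == "__main__":
    rng = random.Random(7)
    # Stand-in for "big data": hypothetical firm sizes with true exponent 2.0.
    observed = sample_power_law(10_000, alpha=2.0, x_min=1.0, rng=rng)
    alpha_hat = fit_alpha(observed, x_min=1.0)
    # Feed the estimated feature into a model: initial sizes for 500 simulated firms.
    initial_firm_sizes = sample_power_law(500, alpha_hat, x_min=1.0, rng=rng)
    print(f"estimated alpha = {alpha_hat:.3f}; "
          f"largest simulated firm = {max(initial_firm_sizes):.0f}")
```

The alternative use Conte et al. mention would reverse the direction: start agents with equal sizes and ask whether the heavy tail emerges from their interactions.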

In sum, complexity in social systems is present with or without “big data.” Simply observing three preschoolers as they interact can very well reveal highly unpredictable and emergent behavior: they communicate through a linguistic symbol system that emerged to transcend individual human cognitive limitations, and each predicts and reacts to what the others say or do! At the same time, very complicated machines can produce enormous data without themselves being complex, because they fail the hallmark tests of complexity: self-organization, feedback, emergence. From a methodological perspective, big data technologies and techniques represent new possibilities for how complex social systems might be studied in computational social science (e.g., Conte et al., 2012). Because computational social science is generative (i.e., can you grow it? Epstein, 1999), it at times invites the dubious if well-meaning question “But where did the data in your model come from?”, as if actual data generated by human beings, regardless of how or under what circumstances, somehow trumps even the most elegant and effective model. CSS must continue to expand its interdisciplinary toolbox of scientific instruments (Cioffi-Revilla, 2014) and embrace big data as yet another tool to improve our models, our understanding, and our explanations of the complexity inherent in social systems.

REFERENCES

Axtell, R. L. (2016, May). 120 million agents self-organize into 6 million firms: a model of the US private sector. In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems (pp. 806-816). International Foundation for Autonomous Agents and Multiagent Systems.

Cioffi-Revilla, C. (2014). Introduction to computational social science: principles and applications. London: Springer.

Conte, R., Gilbert, N., Bonelli, G., Cioffi-Revilla, C., Deffuant, G., Kertesz, J., … Helbing, D. (2012). Manifesto of computational social science. The European Physical Journal Special Topics, 214(1), 325–346. http://doi.org/10.1140/epjst/e2012-01697-8

Epstein, J. M. (1999). Agent-based computational models and generative social science. Generative Social Science: Studies in Agent-Based Computational Modeling, 4(5), 4–46.

Gilbert, G. N., & Troitzsch, K. G. (2005). Simulation for the social scientist (2nd ed.). Maidenhead, England; New York, NY: Open University Press.

Kramer, A. D. I., Guillory, J. E., & Hancock, J. T. (2014). Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences, 111(24), 8788–8790. http://doi.org/10.1073/pnas.1320040111

Miller, J. H., & Page, S. E. (2007). Complex adaptive systems: an introduction to computational models of social life. Princeton, NJ: Princeton University Press.

Pietsch, W. (2013). Big Data–The New Science of Complexity. Retrieved from http://philsci-archive.pitt.edu/9944/

Schelling, T. C. (1971). Dynamic models of segregation. Journal of Mathematical Sociology, 1(2), 143–186.

Simon, H. A. (1969). The sciences of the artificial (3rd ed., reprint). Cambridge, MA: MIT Press.

Simon, H. A. (1976). Administrative behavior: a study of decision-making processes in administrative organization (3rd ed.). New York: Free Press.

Torrens, P. M. (2010). Geography and computational social science. GeoJournal, 75(2), 133–148.

West, G. (2013). Big data needs a big theory to go with it. Scientific American, May, 15.

They’ll have your attention…

November 12, 2018 (updated November 14, 2018) · Tom Briggs · cognition, performance, quote, systems

In an information-rich world, the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of its recipients. Hence, a wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information resources that might consume it.

In an information-rich world, most of the cost of information is the cost incurred by the recipient. It is not enough to know how much it costs to produce and transmit information: we must also know how much it costs, in terms of scarce attention, to receive it. I have tried bringing this argument home to my friends by suggesting that they calculate how much the [news] costs them, including the costs of [consuming] it. Making the calculation usually causes them some alarm, but not enough for them to cancel their subscriptions.

–Herbert A. Simon
“Designing Organizations for an Information-Rich World” (1971)
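Simon’s suggested exercise becomes simple arithmetic once a value is put on an hour of attention. A back-of-the-envelope sketch: the subscription price, daily reading time, and hourly value below are hypothetical, and pricing attention at an hourly rate is an assumption for illustration, not Simon’s method.

```python
def annual_news_cost(subscription, minutes_per_day, hourly_value, days=365):
    """Return (total, attention) annual cost of a news habit.

    attention = hours spent reading per year * assumed value of an hour of attention."""
    attention = minutes_per_day / 60 * hourly_value * days
    return subscription + attention, attention


if __name__ == "__main__":
    # Hypothetical inputs: $200/year subscription, 45 minutes a day, $40/hour.
    total, attention = annual_news_cost(200, 45, 40)
    print(f"attention: ${attention:,.0f}; total: ${total:,.0f}")
```

With these inputs the script prints `attention: $10,950; total: $11,150`: the attention cost dwarfs the subscription price, which is Simon’s point.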

Do children and money really bring happiness?

October 15, 2018 (updated December 15, 2018) · Tom Briggs · books, cognition, complexity, psychology, quote, systems

The belief-transmission game is rigged so that we must believe that children and money bring happiness, regardless of whether such beliefs are true.

This doesn’t mean that we should all now quit our jobs and abandon our families. Rather, it means that while we believe we are raising children and earning paychecks to increase our share of happiness, we are actually doing these things for reasons beyond our ken.

We are nodes in a social network that arises and falls by a logic of its own, which is why we continue to toil, continue to mate, and continue to be surprised when we do not experience all the joy we so gullibly anticipated.

—Daniel Gilbert
Stumbling on Happiness

Measurement: Validity, Reliability, Accuracy (The Basics)

September 28, 2018 (updated February 20, 2019) · Tom Briggs · measurement, research methods, survey research

Validity. Data have validity if they accurately measure the phenomenon they are supposed to represent.

Reliability. Data have reliability if similar results would be produced if the same measurement or procedure were performed multiple times on the same population.

Accuracy. Data are accurate if estimates from the data do not widely deviate from the true population value.

So basic, but so important.

From the National Science Foundation – Science & Engineering Indicators 2018 Methodology.
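Reliability and accuracy can come apart, and a small simulation makes the difference concrete. A minimal sketch with two hypothetical instruments measuring a known true value: one gives nearly identical readings that are all off by the same amount, the other scatters widely around the truth. Validity is deliberately left out: no amount of repetition can show whether an instrument measures the phenomenon it is supposed to represent.

```python
import random
from statistics import mean, stdev

TRUE_VALUE = 100.0  # hypothetical true population value


def take_readings(bias, noise_sd, repeats, rng):
    """Repeat the same measurement procedure `repeats` times."""
    return [TRUE_VALUE + bias + rng.gauss(0, noise_sd) for _ in range(repeats)]


def summarize(readings):
    """Return (spread, error): spread across repeats speaks to reliability,
    error of the averaged estimate speaks to accuracy."""
    return stdev(readings), abs(mean(readings) - TRUE_VALUE)


if __name__ == "__main__":
    rng = random.Random(42)
    for label, bias, noise_sd in [("consistent but biased", 8.0, 0.5),
                                  ("unbiased but noisy", 0.0, 8.0)]:
        spread, error = summarize(take_readings(bias, noise_sd, 1_000, rng))
        print(f"{label}: spread = {spread:.2f}, error of estimate = {error:.2f}")
```

The first instrument is highly reliable yet inaccurate; the second is unreliable reading by reading, but its averaged estimate lands close to the true value.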

Tom Briggs, PhD

Improve performance. Make work better.