Micro- and macro-scale investigation of information flows

Alan Turing asked “Can machines think?” We ask: “Is the collective thinking on the Web a machine?” To answer this we study how information diffuses on the Web. We seek to capture the socio-technical algorithms that emerge from the interplay of manual human interactions with the technical capabilities and topological features of the infrastructure.

Emergent phenomena on the Web feature human input that determines which systems are orchestrated and where information flows. At this point we ask whether it is possible to derive a formal model that represents the computational power of human participants acting purposefully on the Web. In other words: does the traceable human behavior on the Web form a machine?

The working definition of Social Machines says: “Social machines are Web-based socio-technical systems in which the human and technological elements play the role of participant machinery with respect to the mechanistic realisation of system-level processes.” Our micro- and macro-scale investigation of information flows on the Web contributes insight into these “system-level processes” by extending existing research on information cascades in online communities.

What are information cascades?

Information cascades are network representations that reflect the propagation of information in socio-technical systems through explicit or implicit unique references (e.g. threads, URIs, hashtags, quotations, memes). Nodes in these networks are unique content elements (e.g. individual Web pages or microposts). Edges are links between any two temporally consecutive content elements that share such a pattern and for which evidence exists that the older element caused the generation of the more recent one. Cascades can branch and merge when multiple patterns appear in content elements. Information cascades are thus an established means of representing, for example, information transmission or personal influence, and they have most notably been applied to eCommerce scenarios, political campaigns and citizen journalism.
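
The definition above can be sketched in code. The following is a minimal illustration under stated assumptions, not the authors' implementation: the ContentElement type, the build_cascade function and the caused_by predicate are hypothetical names, and how evidence of causation is established is left entirely to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class ContentElement:
    """A node: one unique content element, e.g. a Web page or micropost."""
    element_id: str
    timestamp: float            # creation time, in any consistent unit
    patterns: FrozenSet[str]    # identifying references found in the content


def build_cascade(
    elements: List[ContentElement],
    caused_by: Callable[[ContentElement, ContentElement], bool],
) -> List[Tuple[str, str, str]]:
    """Return edges as (older_id, newer_id, shared_pattern).

    For every pattern, an element is linked to the most recent earlier
    element carrying the same pattern, but only if the caller-supplied
    caused_by(older, newer) predicate reports evidence of causation.
    """
    ordered = sorted(elements, key=lambda e: e.timestamp)
    last_seen: Dict[str, ContentElement] = {}  # pattern -> latest carrier
    edges: List[Tuple[str, str, str]] = []
    for element in ordered:
        for pattern in sorted(element.patterns):
            older = last_seen.get(pattern)
            if older is not None and caused_by(older, element):
                edges.append((older.element_id, element.element_id, pattern))
            last_seen[pattern] = element
    return edges


# Assumed toy data: "b" carries two patterns, so the cascade branches
# at "b" and merges again at "d".
elements = [
    ContentElement("a", 1.0, frozenset({"#x"})),
    ContentElement("b", 2.0, frozenset({"#x", "#y"})),
    ContentElement("c", 3.0, frozenset({"#y"})),
    ContentElement("d", 4.0, frozenset({"#x", "#y"})),
]
edges = build_cascade(elements, caused_by=lambda older, newer: True)
```

With the always-true predicate used here, element "b" has two outgoing edges (a branch) and element "d" has two incoming edges (a merge), which is the branching and merging behaviour described above.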

Given the abstract aim of understanding the growth and emergence of the Web, we need to devise methods that can examine the digital traces of human-machine interaction at different levels of granularity. This allows us to examine the flows of information within and between social machines, revealing a micro and a macro level of human and machine interaction.

Information Cascades in Social Machines - A Citizen Science Case Study

Online citizen science is a prominent example of Social Machines and also a blueprint of a hybrid approach, coupling state-of-the-art artificial intelligence with human-based computation, to tackle problems that are impossible to solve in a purely computational fashion. The Zooniverse is the world’s largest multi-project citizen science platform, with over one million volunteers contributing to projects from various domains such as astrophysics or digital humanities.

While well known for its size and success, the Zooniverse is also explicitly recognised as the source of a number of serendipitous citizen-led discoveries that had real scientific impact beyond what the system was originally designed for. Via the forums of the platform, participants began to share information about things they perceived as interesting or strange while performing the original task of a project. Over time the information provided became more and more comprehensive and gained the attention of the professional scientists behind the respective projects. They decided to complement the amateurs’ hypothesising with their established scientific methods, which finally led to the approval of the participants’ discoveries.

Supporting this highly engaged, domain-specific sharing and exchange of information is still at the core of the Zooniverse system, and it is particularly important in discovery-oriented projects. The example from the Planet Hunters project shown in the figure below illustrates how participants employ multiple self-constructed and emergent content patterns to test hypotheses about objects of interest. Some of these patterns refer to content within the system (e.g. the APH[number] reference or hashtags); some are the output of remote processing (e.g. alternative light curve images); others are identifiers of remote systems (e.g. the KID[number] reference).

To generate information cascades from these identifying patterns, we adapt the typical approach so that the only constraint for an edge between content elements sharing one or more identifiers is their temporal order. We do this for the following reason: the Zooniverse does not feature any explicit binary relation between users, such as following or friendship. Consequently, there is no explicit social network graph that could be consulted to compute the probability that an identifying content pattern is used in direct response to another user, or to predict how cascades will evolve, as is common practice. Instead, we assume that participants use those patterns independently of each other, inspired only by the features of the object they are currently processing as part of the core task. This assumption is supported by a recent study on content and community dynamics within the Zooniverse ecosystem, which showed that more than 90% of all forum messages are object-related microposts that a user is stimulated to add immediately after or even during the completion of the task. The study also shows that in discovery-oriented projects participants actually talk to each other in only a small fraction of cases, while overall the community collaboratively talks about common things in a structurally and linguistically converging fashion.
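
The adapted approach, in which temporal order is the only constraint on an edge, could look roughly as follows. This is a hedged sketch rather than the procedure actually used for the Zooniverse data: the regular expression assumes that APH and KID references are written as the prefix directly followed by digits and that hashtags start with "#", and the function names and the (post_id, timestamp, text) tuple layout are hypothetical. Patterns such as light curve images are not covered.

```python
import re
from collections import defaultdict

# Assumed surface forms of the identifiers; the real syntax may differ.
IDENTIFIER_RE = re.compile(r"\b(?:APH|KID)\d+\b|#\w+")


def extract_identifiers(text):
    """Return the set of identifying patterns mentioned in a forum post."""
    return set(IDENTIFIER_RE.findall(text))


def link_by_temporal_order(posts):
    """Build cascade edges from (post_id, timestamp, text) tuples.

    Two posts are linked when they share an identifier and are
    consecutive in time among the posts carrying that identifier.
    No social graph or reply structure is consulted.
    """
    occurrences = defaultdict(list)  # identifier -> [(timestamp, post_id)]
    for post_id, timestamp, text in posts:
        for identifier in extract_identifiers(text):
            occurrences[identifier].append((timestamp, post_id))

    edges = []
    for identifier, seen in occurrences.items():
        seen.sort()  # oldest first
        for (_, older), (_, newer) in zip(seen, seen[1:]):
            edges.append((older, newer, identifier))
    return sorted(edges)


# Invented example posts, for illustration only.
posts = [
    ("p1", 1, "Odd dip in APH123 #transit"),
    ("p2", 2, "APH123 looks like KID456"),
    ("p3", 3, "KID456 again #transit"),
]
edges = link_by_temporal_order(posts)
```

Because no causal evidence is required, every shared identifier yields a chain of posts ordered purely by time, which matches the independence assumption made above.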


The infographic was created by Markus Luczak-Roesch and Ramine Tinati. The brain image was adapted (greyscale, transparent background) from an original published by Giulia Forsythe (Brain Colouring Book, http://goo.gl/6hpZ0W), Attribution-NonCommercial-ShareAlike 2.0 Generic (CC BY-NC-SA 2.0, https://creativecommons.org/licenses/by-nc-sa/2.0/)