Complex data processing and integration pipelines, which often incorporate components from several sources, are commonplace in the analysis of social media data. Traceability of data across analytics pipelines is therefore important from both the data provider's and the consumer's perspective. While the parties involved in the data analytics process are interested in assessing the reproducibility and quality of the outcome of an analysis, the data providers, e.g. users of a social network, may also want some guarantees about how their data is being used.
In this project, we have explored a series of techniques for generating and analysing provenance in various settings. We draw attention to the general problem of combining provenance collected across systems, and we show how a basic method of augmenting the provenance of communicating systems may allow us to answer questions about the movement of data and the attribution of responsibility to agents.