Provenance

Complex data processing and integration pipelines, which often incorporate components from several sources, are commonplace in the analysis of social media data. Traceability of data across analytics pipelines is therefore important from both the data provider's and the consumer's perspective. While the parties involved in the data analytics process are interested in assessing the quality and reproducibility of the outcome of an analysis, the data providers, e.g. users of a social network, may also want some guarantees about how their data is being used.

This project has explored a series of techniques for generating and analysing provenance in various settings. We draw attention to the general problem of combining provenance collected across systems. We show how a basic method of augmenting the provenance of communicating systems may allow us to answer questions about the movement of data and the attribution of responsibility to agents.

Publications
Moreau, L., Batlajery B. Victor, Huynh T. Dong, Michaelides D., & Packer H. S. (2017). A templating system to generate provenance. IEEE Transactions on Software Engineering.
Buneman, P., Gascon A., Moreau L., & Murray-Rust D. (2017). Provenance Composition in PROV.
Moreau, L. (2017). A canonical form for PROV documents and its application to equality, signature, and validation. ACM Transactions on Internet Technology. 20.
Moreau, L., & Groth P. (2015). Provenance of Publications: A PROV style for latex. Seventh USENIX Workshop on the Theory and Practice of Provenance (TAPP'15).
Moreau, L., Groth P., Cheney J., Lebo T., & Miles S. (2015). The rationale of PROV. Web Semantics: Science, Services and Agents on the World Wide Web. 35, 235–257.
Dragan, L., Luczak-Rösch M., Simperl E., Packer H., & Moreau L. (2015). A-posteriori Provenance-enabled Linking of Publications and Datasets Via Crowdsourcing. D-Lib Magazine. 21, in print.
Moreau, L. (2015). Aggregation by Provenance Types: A Technique for Summarising Provenance Graphs. Electronic Proceedings in Theoretical Computer Science. 181, 129–144.
Dragan, L., Luczak-Roesch M., Simperl E., Berendt B., & Moreau L. (2014). Crowdsourcing data citation graphs using provenance. Provenance Analytics. 4.
Moreau, L., Huynh T. Dong, & Michaelides D. (2014). An Online Validator for Provenance: Algorithmic Design, Testing, and API. Proceedings of the 17th International Conference on Fundamental Approaches to Software Engineering - Volume 8411. 291–305.
Packer, H. S., Moreau L., & Dragan L. (2014). An auditable reputation service for collective adaptive systems.
Buneman, P. (2014). The Providence of Provenance.
Missier, P., Moreau L., Cheney J., Lebo T., & Soiland-Reyes S. (2013). PROV-Dictionary: Modeling Provenance for Dictionary Data Structures. (De Nies, T., & Coppens S., Ed.).
Moreau, L., & Groth P. (2013). PROV-Overview: An Overview of the PROV Family of Documents.
Huynh, T. Dong, Groth P., & Zednik S. (2013). PROV Implementation Report.
Moreau, L., & Groth P. (2013). Provenance: An Introduction to PROV.
Moreau, L., & Lebo T. (2013). Linking Across Provenance Bundles.
Moreau, L., & Missier P. (2013). PROV-DM: The PROV Data Model.
Cheney, J., Missier P., Moreau L., & De Nies T. (2013). Constraints of the PROV Data Model.
Moreau, L., & Missier P. (2013). PROV-N: The Provenance Notation.
De Nies, T., Coppens S., Verborgh R., Vander Sande M., Mannens E., Van de Walle R., et al. (2013). Easy Access to Provenance: An Essential Step Towards Trust on the Web. 2013 IEEE 37th Annual Computer Software and Applications Conference Workshops.

Investigators

Prof. Peter Buneman

Dr. Trung Dong Huynh

Prof. Luc Moreau