The second day of the 23rd International World Wide Web Conference (WWW2014) hosted the 2nd Web Observatory Workshop (WOW2014). Attracting a number of new faces as well as those who participated the previous year, the morning session was opened by Professor Dame Wendy Hall, who gave a general overview of the current state of development and future aspirations of the Web Observatory project. In contrast to last year, there was a much broader consensus not only on what a Web Observatory entails (both technically and socially), but also on the need for one. Whilst last year there was specific emphasis on defining what an observatory ‘could’ be and what it ‘could’ provide for the research community, this year there was much greater focus on showing what is now out there in terms of instantiations of Observatories, and on the social and technical challenges to be faced in the coming months.
One of the most important aspects of the opening keynote was not only showing what has been done in terms of infrastructure building, analytics, and visualisations, but also where the challenges lie next. Perhaps the most prominent and pressing task is providing ways to uniformly describe Web Observatory data, projects, and tools, which reflects one of the fundamental principles of listing data within the Web Observatory: describe your data.
As a means to reflect the broad range of topics which were covered in the workshop, the rest of this post will provide a synthesis of the major themes which underpinned a number of the papers and posters presented.
Describing your Web Observatory, Data, and your workflow
In line with the opening keynote, a prominent theme shared across many of the papers was the agreement on, and work towards, a common language to describe the entities that will be shared across different Web Observatories. Presented by Jim Hendler, but also mentioned within Kristina’s presentation, was the current work on using Schema.org to provide the necessary common vocabulary to describe datasets, projects and tools. As discussed during the presentations, the challenge lies in finding appropriate ways not only to describe the data, but also to describe what it means to be a Web Observatory project, which may also include the methodologies and tools that underpin the project. There is also the challenge of providing a way to describe the workflow that connects the visualisations and the analytics with the original data sources that the work was derived from. Essentially, we want to provide a full trace of the research process that has been applied to the data, something like that shown in the figure below.
However, as pointed out, rather than running the risk of over-engineering this and forcing data descriptions onto Web Observatory members, the consensus is that if members describe their data with the same additional metadata as others, and importantly, using RDF, then it will be possible to link between datasets and, ultimately, between Web Observatory instances.
Another point which will require consideration in the future is the discoverability of the Web Observatories and the entities that exist within them. As this aims to be a distributed approach, we must ensure that we are using techniques that enable this to be achieved. As discussed with fellow workshop attendees, making sure we use linked data techniques to describe, and thus discover, Observatories offers a solution to this, and with implementations such as SQIN (thanks Markus - @MLuczak for this), we are able to achieve this at scale.
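To make the idea of shared descriptions a little more concrete, here is a minimal sketch of how a dataset, an analysis and a visualisation might each be described with Schema.org terms as JSON-LD, and how the links between them give a trace back to the original data source. This is only an illustration under stated assumptions, not the vocabulary the Web Observatory community has settled on: the `example.org` identifiers, the entity names, and the helper functions `describe` and `trace` are all hypothetical, and using the Schema.org `isBasedOn` property to record derivation is my own choice for the example.

```python
import json

SCHEMA_CONTEXT = "https://schema.org"


def describe(entity_type, identifier, name, based_on=None):
    """Return a minimal JSON-LD description using Schema.org terms.

    Hypothetical helper for illustration only.
    """
    description = {
        "@context": SCHEMA_CONTEXT,
        "@type": entity_type,
        "@id": identifier,
        "name": name,
    }
    if based_on is not None:
        # isBasedOn is a Schema.org CreativeWork property; using it to
        # record what an entity was derived from is an assumption here.
        description["isBasedOn"] = {"@id": based_on}
    return description


def trace(identifier, catalogue):
    """Follow isBasedOn links from an entity back to its original source."""
    chain = []
    seen = set()
    while identifier in catalogue and identifier not in seen:
        seen.add(identifier)  # guard against circular descriptions
        entry = catalogue[identifier]
        chain.append(entry["name"])
        identifier = entry.get("isBasedOn", {}).get("@id")
    return chain


# Hypothetical identifiers and names, not real Web Observatory entries.
DATASET = "https://example.org/observatory/dataset/1"
ANALYSIS = "https://example.org/observatory/analysis/1"
VISUALISATION = "https://example.org/observatory/visualisation/1"

entries = [
    describe("Dataset", DATASET, "Example social media stream"),
    describe("CreativeWork", ANALYSIS, "Example analysis", based_on=DATASET),
    describe("CreativeWork", VISUALISATION, "Example visualisation", based_on=ANALYSIS),
]
catalogue = {entry["@id"]: entry for entry in entries}

print(json.dumps(catalogue[VISUALISATION], indent=2))
print(trace(VISUALISATION, catalogue))
```

Because every description carries its own identifier and points at the identifier of what it was derived from, descriptions held in different Observatories could in principle be linked in the same way, without any one Observatory having to hold all of them.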
Working Observatories
From the live lab of the Zooniverse Citizen Science platform, the development of the ArchiveHub, to the social media platform NeXT developed by NUS, the workshop was host to a collection of papers discussing working Web Observatories.
Presentations showcased new platforms to enable data exploration, providing a window into the ecosystem of human interaction and community formation. One of these makes use of the extensive set of data resources that are now available on the Web, specifically the work of the Internet Archive, which has harvested the Web since 1996. This data was previously only available via “The Wayback Machine”; the ArchiveHub, a project led by assistant professor Matt Webber at Rutgers University, is developing a platform to make it possible to explore it in a simple yet powerful manner. ArchiveHub was presented as a platform to enable individuals across disciplines to query, store and investigate archival Web data. Consequently, the ArchiveHub will provide a unique resource for those interested in researching evolution and change in the Web, and will reduce the skill set required to do so, which is essential for critical World Wide Web research.
The NeXT Project, an Observatory platform developed by NUS and led by Tat-Seng Chua, also demonstrated the capabilities of harvesting multiple real-time streams of Web data, and showed how a mixture of data sources can be enriched in order to provide insights into various forms of human activity. NeXT, which can harvest streams of data from social networks, has been developed in line with Big Data techniques and technologies.
Analysis with Observatories
WOW2014 also invited authors to present the application of Web Observatories from an analytical perspective. Building upon last year’s programme, WOW2014 featured two papers which focused on methodological and theoretical approaches for social machines analysis. Demonstrating the application of methodologies from outside the computer science domain, Mizuku Oka et al. presented their work on identifying burst patterns (or events) in social media streams by using techniques adopted from Physics. This was a great example of how the kind of data sources that the Web Observatory aims to provide can be analysed beyond traditional network analysis methods. This was further emphasised by the research presented by Gareth Beeston et al., who described their recent study of Weibo and the ‘Salt Crisis’, using theoretical frameworks to describe the observations made. Both cases showed the application of interdisciplinary methods and, furthermore, how the Web Observatory can be used to support these investigations.
Beyond just describing their research, the presentations led to an agreement that the Web Observatory goes beyond solely being a platform to list datasets: it provides a socio-technical environment in which the academic community (as well as others) can perform research. It provides individuals with the opportunity to reproduce, replicate, refine (the list goes on, I think 23 ‘R’s of data had been formed by the end of the workshop), and most importantly, trace the process of the research conducted. We are not only talking about a platform that can list data, but one that can support the progress of science.
Discussion panel – Web Observatories interoperability and standards
Wrapping up the workshop was a panel concerning the interoperability and standards of Web Observatories, chaired by Professor Dave De Roure. This was a great opportunity to draw together the different themes discussed during the day, as well as to discuss the next steps towards disseminating the concept of the Web Observatory beyond the immediate community. Each panellist (Professor Jim Hendler, Professor Tat-Seng Chua, Dr Thanassis Tiropanis) was given a short slot to describe their view on how to drive the Web Observatory further, and all agreed that the next steps will require a set of minimum standards for what it takes to be a Web Observatory. Although this needs to be a ground-up movement, these standards will help define aspects such as the data, and the entity that represents the Web Observatory.
The panel and the audience raised an important point regarding the growth of the Web Observatory: it needs to be considered as a socio-technical system. Its growth reflects many of the same characteristics as other social machines on the Web. It requires a diverse, willing and enrolled community, and the technology must accommodate the needs of the individuals and be flexible enough to enable adoption over time.
Drawing upon some of the research I did during my PhD on the development of Web communities, the growth of the Web Observatory as a socio-technical activity is something that requires both bottom-up (grassroots) involvement and the help of a lightweight top-down framework that can support and guide the community with standards and best practices. Reflecting on the current progress of the Web Observatory community, the W3C community group, the commitment of the various academic institutions and the ongoing work of the individuals exhibit the characteristics of a translating Web activity, driven by common goals and incentives. As a Web activity, there appears to be strong social buy-in from a variety of stakeholders within the community. Given this, it is important to ensure that the technologies developed to support it reflect the community’s needs and are flexible enough to adapt to the changes that will occur.