The virtual ProvenanceWeek 2021 event will be held from July 19th to July 22nd, 2021. Authors of accepted TaPP and IPAW papers will be given a full presentation slot. There will be two poster / demo sessions with break-out rooms for each poster and demo where participants can interact with the presenter. All times are in Central Time.

Join the event

We are using Zoom. The Zoom link will be sent to all registered participants.

Free Online Access to IPAW + Joint Posters and Demos LNCS Proceedings

Springer is providing 4 weeks of free online access to the IPAW LNCS proceedings for ProvenanceWeek participants.

Detailed Schedule

Monday - July 19th

8:45 am - 9:00 am - Introduction by the Chairs

9:00 am - 10:00 am - IPAW Keynote: Paolo Missier - “Quo vadis, provenancer? Cui prodest? (our own trajectory: provenance of data science pipelines)”

Bio: Paolo Missier is Professor of Scalable Data Analytics with the School of Computing at Newcastle University and currently a Fellow (2018-2020) of the Alan Turing Institute, the UK’s National Institute for Data Science and Artificial Intelligence.
With a background in traditional databases and data management, since around 2000 his research has touched on Data and Information Quality, web semantics, workflow-based infrastructure for e-science, and data provenance. He has been actively involved in the specification of the W3C PROV data model for provenance (2011-2013). His recent work in this area includes an exploration into the management and exploitation of fine-grained provenance of data science pipelines, and into the connection between data pre-processing and the fairness of the resulting machine learning models.

Paolo leads the School of Computing’s post-graduate academic teaching on Big Data Analytics, and he is Sr. Associate Editor for the ACM Journal on Data and Information Quality (JDIQ).

Abstract: The provenance community has come a long way over more than twenty years, and when measured in number of publications, some very influential, its research footprint is counted in the thousands. Large efforts have also gone into building numerous prototypes as well as mature provenance lifecycle management toolkits, some tied to important infrastructure. Having had an opportunity to reflect on some of these achievements while preparing an “impact statement” for the recent UK Research Excellence Framework exercise, I will try and share some thoughts, hopefully provocative in a positive way, on our actual impact (“cui prodest?”) and the next green fields (“quo vadis?”) as gleaned through a simple bibliometric exercise. I would then like to present some of our latest efforts in one of the directions we find promising, namely collecting very fine-grained Data Provenance of data processing pipelines for Data Science applications (DP4DS).

10:15 am - 11:50 am - IPAW - Session 1 - Provenance Representation

Session chair: Seokki Lee

  • A Delayed Instantiation Approach to Template-driven Provenance for Electronic Health Record Phenotyping (Elliot Fairweather, Martin Chapman, Vasa Curcin)

  • Provenance Supporting Hyperparameter Analysis in Deep Neural Networks (Débora Pina, Liliane Kunstmann, Daniel de Oliveira, Patrick Valduriez, Marta Mattoso)

  • Evidence Graphs: Supporting Transparent and FAIR Computation, with Defeasible Reasoning on Data, Methods and Results (Sadnan Al Manir, Justin Niestroy, Maxwell Levinson, Timothy Clark)

  • The PROV-JSONLD Serialization (Luc Moreau, Trung Dong Huynh)

11:50 am - 1:25 pm - IPAW - Session 2 - Security, Reliability and Trustworthiness

Session chair: Paolo Missier

  • Proactive Provenance Policies for Automatic Cryptographic Data Centric Security (Shamaria Engram, Tyler Kaczmarek, Alice Lee, David Bigelow)

  • Provenance-based Security Audits and its Application to COVID-19 Contact Tracing Apps (Andreas Schreiber, Tim Sonnekalb, Thomas Heinze, Lynn von Kurnatowski, Jesus M. Gonzalez-Barahona, Heather Packer)

  • A Model and System for Querying Provenance from Data Cleaning Workflows (Nikolaus N. Parulian, Bertram Ludaescher, Timothy McPhillips)

  • Non-repudiable Provenance for Clinical Decision Support Systems (Elliot Fairweather, Rudolf Wittner, Martin Chapman, Petr Holub, Vasa Curcin)

1:25 pm - 2:45 pm - Lunch break

2:45 pm - 4:00 pm - IPAW - Session 3 - Provenance Types, Inference, Queries and Summarization

Session chair: Paul Groth

  • Notebook Archaeology: Inferring Provenance from Computational Notebooks (David Koop)

  • Efficient Computation of Provenance for Query Result Exploration (Murali Mani, Naveenkumar Singaraj, Zhenyan Liu)

  • Incremental Inference of Provenance Types (David Kohan Marzago, Trung Dong Huynh, Luc Moreau)

Tuesday - July 20th

9:15 am - 10:15 am - TaPP - Session 1 - Performance and Scalability of Provenance Systems

Session chair: Peter Alvaro

  • Discrepancy Detection in Whole Network Provenance (Raza Ahmad (SRI International), Eunjin Jung (University of San Francisco), Carolina de Senne Garcia (Ecole Polytechnique), Hassaan Irshad (SRI International), Ashish Gehani (SRI International))

  • Provenance expressiveness benchmarking on non-deterministic executions (Sheung Chi Chan, James Cheney, Pramod Bhatotia)

  • Observed vs Possible Provenance (Tom Blount, Adriane Chapman, Michael Johnson, Bertram Ludascher)

10:15 am - 11:30 am - TaPP Keynote: Hazeline Asuncion - “Finding Connections: Software Traceability & Data Provenance”

Abstract: Software and data are increasingly at the heart of virtually every scientific and engineering discipline. In software engineering, the ability to efficiently connect related data is crucial for large projects. In scientific research, the ability to collect, retrieve, and understand both raw and processed data is necessary to advance research. This talk discusses our research investigations in connecting related data in software engineering, referred to as software traceability, and in eScience, referred to as data provenance. We discuss challenges encountered and our contributions thus far in both domains.

Bio: Hazeline Asuncion is an Associate Professor at the University of Washington Bothell. Her research focuses on traceability of data that may be found in different file types, locations, and owner groups. In the domain of software engineering, software traceability aids in various development tasks, such as system comprehension, system debugging, and communication between various stakeholders. In the domain of eScience, tracing how a dataset arrived at its current state, referred to as data provenance, is necessary in assessing a dataset’s integrity and in supporting repeatability of analyses or experiments. She has published over 30 peer-reviewed papers spanning these two topics. Her work has been funded by the National Science Foundation, including NSF REUs and an NSF Career. She received her Ph.D., M.S., and B.S., in Information and Computer Science from the University of California, Irvine.

11:30 am - 12:45 pm - TaPP - Session 2 - Provenance Analytics

Session chair: Khalid Belhajjame

  • A first-principles algebraic approach to data transformations in data cleaning: understanding provenance from the ground up (Santiago Núñez-Corrales (iSchool and NCSA, UIUC), Lan Li (iSchool, UIUC), Bertram Ludäscher (iSchool and NCSA, UIUC))

  • Efficient Provenance Alignment in Reproduced Executions (Tanu Malik (DePaul University), Yuta Nakamura (DePaul University), Ashish Gehani (SRI International))

  • A Generic Explainability Framework for Function Circuits (Sylvain Hallé (Université du Québec à Chicoutimi), Hugo Tremblay (Université du Québec à Chicoutimi))

  • Provenance-integrated parameter selection and optimization in numerical simulations (Julia Kühnert, Dominik Göddeke, Melanie Herschel)

12:45 pm - 1:30 pm - Lunch break

1:30 pm - 2:30 pm - TaPP - Session 3 - Provenance Data Model and Queries

Session chair: Ashish Gehani

  • An Event-based Data Model for Granular Information Flow Tracking (Joud Khoury (Raytheon BBN Technologies), Timothy Upthegrove (Raytheon BBN Technologies), Armando Caro (Raytheon BBN Technologies), Brett Benyo (Raytheon BBN Technologies), Derrick Kong (Raytheon BBN Technologies))

  • Data Provenance for Attributes: Attribute Lineage (Dennis Dosso (University of Padua), Susan B. Davidson (University of Pennsylvania), Gianmaria Silvello (University of Padua))

  • A declarative query language for Data Provenance (Argyro Avgoustaki, Giorgos Flouris, Dimitris Plexousakis)

2:45 pm - 4:00 pm - TaPP - Session 4 - Security

Session chair: Zhen Huang

  • Integrity checking and abnormality detection of provenance records (Sheung Chi Chan (University of Edinburgh), James Cheney (University of Edinburgh), Ashish Gehani (SRI International), Hassaan Irshad (SRI International))

  • Using Provenance to Evaluate Risk and Benefit of Data Sharing (Taeho Jung, Seokki Lee, Wenyi Tang)

  • Practical Provenance Privacy Protection (Ashish Gehani, Raza Ahmad, Hassaan Irshad)

Wednesday - July 21st

9:00 am - 12:45 pm - ProvViz Workshop

  • 09:00 – Opening introduction
  • 09:15 – Keynote: Bill Howe, “Viziometrics: Comprehending Visualization Use in the Scientific Literature”
    • Abstract: The use of visualization in the scientific literature has a significant effect on impact and communicability, but has been largely ignored in metascience studies. We aim to develop a comprehensive set of techniques and technologies for analyzing the visual literature to provide new insights for discovery, communication, and teaching within and across scientific boundaries. We develop models to classify figures from the literature to measure how the use of various visualization techniques relate to scientific impact, and how their use varies across fields and over time. For example, we find that explanatory diagrams and rich quantitative plots are associated with higher impact papers, reinforcing the importance of visualization in scientific communication. But their impact within-discipline is lower than across-discipline, suggesting opportunities to customize exposition for certain audiences. To understand how usage varies over time and across disciplines, we fine-tune ResNet-50 on scientific figures and use the latent space of visualizations to describe qualitative differences in time periods, disciplines, and other categories. We consider whether papers include a “key figure” that can be used as a graphical abstract, and whether these key figures improve discoverability and understanding. I’ll also discuss new methods we developed as part of this project, including algorithms for multi-modal learning (e.g., figures and captions) and hierarchical classification.
    • Bio: Bill Howe is Associate Professor in the Information School and Adjunct Associate Professor in the Allen School of Computer Science & Engineering and the Department of Electrical Engineering. His research interests are in data management, machine learning, and visualization, particularly as applied in the physical and social sciences. As Founding Associate Director of the UW eScience Institute, Dr. Howe played a leadership role in the Moore-Sloan Data Science Environment program through a $32.8 million grant awarded jointly to UW, NYU, and UC Berkeley, and founded UW’s Data Science for Social Good Program. With support from the MacArthur Foundation, NSF, and Microsoft, Howe directs UW’s participation in the Cascadia Urban Analytics Cooperative. He founded the UW Data Science Masters Degree, serving as its inaugural Program Chair, and created a first MOOC on data science that attracted over 200,000 students. His research has been featured in the Economist and Nature News, and he has authored award-winning papers in conferences across data management, machine learning, and visualization. He has a Ph.D. in Computer Science from Portland State University and a Bachelor’s degree in Industrial & Systems Engineering from Georgia Tech.
  • 10:15 – Discussion abstract presentations
  • 10:45 – Proposal of other topics from workshop attendees
  • 11:00 – Break and group selection
  • 11:15 – Discussion session 1
  • 11:45 – Discussion session 2 (changing groups is optional)
  • 12:15 – Report back and closing
  • 12:45 – Workshop end

12:45 pm - 2:45 pm - Poster and Demo Session with Breakout Rooms

  • ProvViz: An Intuitive Prov Editor and Visualiser (Ben Werner, Luc Moreau) Materials – Zoom Breakout Room #1

  • Curating Covid-19 data in Links (Vashti Galpin, James Cheney) Materials – Zoom Breakout Room #2

  • Towards a provenance management system for astronomical observatories (Mathieu Servillat, François Bonnarel, Catherine Boisson, Mireille Louys, Jose Enrique Ruiz, Michèle Sanguillon) Materials – Zoom Breakout Room #3

  • Towards Provenance Integration for Field Devices in Industrial IoT systems (Iori Mizutani, Jonas Brütsch, Simon Mayer) Materials – Zoom Breakout Room #4

  • COVID-19 Analytics in Jupyter: Intuitive Provenance Integration using ProvIt (Martin Chapman, Elliot Fairweather, Asfand Khan, Vasa Curcin) Materials – Zoom Breakout Room #5

  • CPR - A Comprehensible Provenance Record for Verification Workflows in Whole Tale (Timothy M. McPhillips, Thomas Thelen, Craig Willis, Kacper Kowalik, Matthew Jones, Bertram Ludäscher) Materials – Zoom Breakout Room #6

2:45 pm - 4:00 pm - TaPP - Session 5 - Applications of Provenance

Session chair: Qing Liu

  • Provenance-Based Interpretation of Multi-Agent Information Analysis (Scott Friedman (SIFT), Jeff Rye (SIFT), David LaVergne (SIFT), Dan Thomsen (SIFT), Matthew Allen (Raytheon BBN Technologies), Kyle Tunis (Raytheon BBN Technologies))

  • GitLab2PROV—Provenance of Software Projects hosted on GitLab (Andreas Schreiber, Claas de Boer, Lynn von Kurnatowski)

  • Detailed Provenance Metadata from Statistical Analysis Software (George Alter, Jack Gager, Pascal Heus, Carson Hunter, Sanda Ionescu, Jeremy Iverson, H V Jagadish, Jared Lyle, Alexander Mueller, Sigve Nordgaard, Ørnulf Risnes, Dan Smith, Jie Song, Bertram Ludaescher, Timothy McPhillips, Thomas Thelen.)

  • Astronomical Pipeline Provenance: A Use Case Evaluation (Michael Johnson, Kristen Lackeos, Hans-Rainer Klöckner, Sirko Schindler, Marcus Paradies, David Champion, Marta Dembska)

Thursday - July 22nd

9:00 am - 12:45 pm - T7 Workshop on Provenance for Transparent Research

  • Principles of Transparent Research: Implementation Challenges (Keynote, Lars Vilhuber) [slides]

  • Automated screening of COVID-19 preprints - Are we helping authors improve transparency and reproducibility? (Automated Screening Working Group, Anita Bandrowski) [abstract]

  • Persistent IDs and W3C PROV, or, Is W3C PROV FAIR? (Sadnan Al Manir, Justin Niestroy, Maxwell Adam Levinson, Timothy Clark) [abstract]

  • Improving the traceability of (meta)data through semantically enriched nanopublications (Matheus Pedra Puime Feijoó, Rodrigo Jardim, Sergio Manuel Serra da Cruz, Maria Luiza Campos) [abstract]

  • Enabling Trustworthy and Traceable Research by Non-repudiable Opaque Provenance in a Distributed Environment (Rudolf Wittner, Jörg Geiger, Cecilia Mascia, Francesca Frexia, Heimo Muller, Petr Holub) [abstract]

  • Contemporary and Established Provenance Issues in Natural History Collections (Laurence Livermore, Ben Scott, Mathias Dillen) [abstract]

  • Auditable spreadsheets (Laura Waltersdorfer, Fajar Ekaputra, Tomasz Miksa) [abstract]

  • From Traceability to Transparency of Bioinformatics Data Analyses: Round Trip? (Sarah Cohen-Boulakia) [abstract]

12:45 pm - 2:45 pm - Poster and Demo Session with Breakout Rooms

  • ReproduceMeGit: A Visualization Tool for Analyzing Reproducibility of Jupyter Notebooks (Sheeba Samuel, Birgitta König-Ries) Materials – Zoom Breakout Room #1

  • Mapping Trusted Paths to VGI (Bernard Roper, Adriane Chapman, David Martin, Stefano Cavazzi) Materials – Zoom Breakout Room #2

  • Querying Data Preparation Modules Using Data Examples (Khalid Belhajjame, Mahmoud Barhamgi) Materials – Zoom Breakout Room #3

  • Privacy Aspects of Provenance Queries (Tanja Auge, Nic Scharlau, Andreas Heuer) Materials – Zoom Breakout Room #4

  • ISO 23494: Biotechnology – Provenance Information Model for Biological Specimen and Data (Rudolf Wittner, Petr Holub, Heimo Müller, Joerg Geiger, Carole Goble, Stian Soiland-Reyes, Luca Pireddu, Francesca Frexia, Cecilia Mascia, Elliot Fairweather, Jason R. Swedlow, Josh Moore, Caterina Strambio, David Grunwald, Hiroki Nakae) Materials – Zoom Breakout Room #5

  • Machine Learning Pipelines: Provenance, Reproducibility and FAIR Data Principles (Sheeba Samuel, Frank Löffler, Birgitta König-Ries) Materials – Zoom Breakout Room #1

  • Explaining and Replaying Containers Using Provenance (Raza Ahmad, Madeline Deeds, Yuta Nakamura, Naga Nithin Manne, Tanu Malik) Materials – Zoom Breakout Room #6

2:45 pm - Provenance Week Townhall

ProvenanceWeek 2021

Following successful past ProvenanceWeek events, ProvenanceWeek 2021 will again co-locate the IPAW and TaPP workshops as well as several satellite events that focus on novel directions for provenance.