FDSI Community Needs Assessment (Evolving Draft)

Summary

The primary activity of our Kickoff Workshop, held in Boulder on May 15-16, was a series of breakout discussions. The discussion topics were crowd-sourced from the pool of questions that had been posted to the GitHub site: participants dynamically signed up for topics, so the topics discussed were not dictated by the organizers but were voted up by the community. This led to lively discussions among the people most interested in each topic. Each breakout discussion had a scribe who captured the discussion and a reporter who reported back to the workshop participants during the reporting sessions. Recordings of the reporting sessions and the full scribe reports are posted for remote access. Below, we summarize these scribe reports in a single, self-contained document. Our hope in providing this document is to re-engage the discussion in advance of, and especially AT, the next workshop. The advance part can be done using the GitHub issue tracker. Not only do we welcome more discussion of existing issues, but we also encourage you to start new issues for any topic that you feel is not already addressed by the current list. We have also started a general discussion forum for the FDSI community.

Issue: A-3, A-5, A-6 What are the greatest obstacles to extracting insight from experimental and computational databases?

Description of Problem: Because the goals of data analysis are problem-dependent, abstraction and standardization of data are critical for broader implementation. Recurring questions include who will host, and pay for, long-term storage and sharing of data, and how individuals can best tap into the collective expertise of the whole community.
Potential Solutions: 1) Need to think about and organize/store/load data in a way that is accessible to both humans and machines, 2) need to incentivize the proper sharing and implementation of data between groups, 3) need to develop infrastructure and tools for accessing (large) data sets, and 4) need to engage more people from the computer science community.
Potential Role of FDSI: FDSI can: 1) promote community engagement across disparate fields, 2) provide minimal metadata standards and help to develop standardized interfaces, 3) facilitate better collaboration between experimentalists and software developers (e.g. provide examples, training and vetting of software tools), and 4) help to assure proper usage of data.

Issue: A-1 What are the greatest obstacles to comparing experimental and computational data sets?

Description of Problem: The group outlined many obstacles and differences between experimental and computational data sets that hinder collaboration, including differences in spatial and temporal resolution, in the characterization of the data set and instrumentation, in error/uncertainty models, and in the size and accessibility of the data sets.
Potential Solutions: 1) Better lock-step collaboration between simulations and experiments is needed throughout the course of a study (we need to stop siloing efforts), 2) a better understanding of the relative advantages and limitations of each is needed, and 3) better data assimilation techniques are needed to combine computational and experimental data.
Potential Role of FDSI: FDSI can help to build a framework and standards for: 1) sharing data sets (without needing to host them), 2) integrated data analysis tools, and 3) the design of integrated studies including both computational and laboratory/field data.

Issue: C-1 How do we foster a community around FDSI (previously CFDSI) that recognizes all fluid dynamics needs equally, without disproportionate emphasis on CFD?

Description of Problem: This is a really important question, and the central issue is what is meant by the word “software.” A related question is how this software can serve as a “Gateway” for community constituents.
Potential Solutions: 1) Drop the “C” from “CFDSI,” 2) Surveys should be carried out broadly across the community, 3) develop a clear slogan or message: “Software infrastructure for the acceleration of analysis and prediction of fluid dynamics,” and 4) focus outreach or a sub-community workshop towards experimental fluid dynamics software.
Role of FDSI: FDSI can: 1) bring experimental and CFD researchers together and promote collaborative challenge problems and 2) potentially broaden access for underrepresented groups to participate at a high level in fluid dynamics research and education.

Issue: C-2 How can we distinguish modeling and discretization errors in data sets produced by simulation?

Description of Problem: The issue arises in particular when comparing simulation and experimental data, for instance for model validation, because the simulation data are often subject to modeling errors (or unknowns) that are not present in the experimental data. Examples include any discrepancy in boundary and initial conditions between the actual experiment and the simulation model, or cases where the physics of the problem at hand is not fully understood.
Potential Solutions: 1) Because these errors cannot be determined after the simulation is complete, the provenance of discretization error should be incorporated in the simulation data, and 2) in certain scenarios, e.g., RANS on simple flows, discretization errors can be quantified with a grid convergence study; the errors that remain are then modeling errors.
Potential Role of FDSI: FDSI can 1) develop software tools to identify the discretization error, 2) facilitate comparisons between codes (shorter runs with specified initial conditions), 3) provide recommendations and software for tracking and reporting discretization error estimates, 4) serve as a conduit for infusing the method of manufactured solutions into the broader community, 5) support data-driven models, 6) encourage better dialog with experimentalists to design experiments that expose the assumptions behind modeling errors and the modeling hierarchy, and 7) define standards so that each simulation exports its data in a form accessible through a standard, unified query.
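For the grid convergence study mentioned above, one widely used approach (not one prescribed by the breakout group) is Richardson extrapolation: from three solutions on systematically refined grids, estimate the observed order of accuracy p and the discretization error on the finest grid. The Python sketch below assumes a constant refinement ratio r, monotone convergence in the asymptotic range, and a scalar quantity of interest; the function name richardson_estimate is hypothetical.

```python
import math


def richardson_estimate(f_fine, f_medium, f_coarse, r):
    """Estimate discretization error from three grid levels (hypothetical helper).

    f_fine, f_medium, f_coarse: a scalar quantity of interest computed on grids
    with spacing h, r*h and r*r*h (constant refinement ratio r > 1).
    Returns (observed order p, extrapolated value, error estimate on the fine grid).
    """
    ratio = (f_coarse - f_medium) / (f_medium - f_fine)
    if ratio <= 0:
        # Oscillatory or divergent behaviour: Richardson extrapolation does not apply.
        raise ValueError("solutions are not converging monotonically")
    p = math.log(ratio) / math.log(r)
    f_extrap = f_fine + (f_fine - f_medium) / (r**p - 1.0)
    return p, f_extrap, f_fine - f_extrap


def f(h):
    # Synthetic second-order "solver output" whose exact (h -> 0) value is 1.
    return 1.0 + 0.5 * h**2


p, f_ext, err = richardson_estimate(f(0.1), f(0.2), f(0.4), r=2.0)
print(f"observed order ~ {p:.3f}, extrapolated ~ {f_ext:.6f}, fine-grid error ~ {err:.2e}")
```

In the group's framing, once the fine-grid discretization error is estimated this way, the remaining discrepancy with the reference data is attributed to modeling error. The estimate is only meaningful when all three grids lie in the asymptotic range.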

Issue: C-3 Prediction and Calibration -- standardizing protocols for using data from experiments and observations for robust prediction.

Description of Problem: At the core of the difficulties associated with calibrating computational models for the purpose of prediction are that 1) calibration is a statistical inference problem that is hard to solve and understand, 2) the actual meaning of data may not be clear (data semantics), 3) one often has to deal with recursive thinking, i.e., one may run into something that cannot be predicted, which then necessitates rethinking or re-calibrating the model, and 4) models are often used for predictions in regimes for which they have not been calibrated.
Potential Solutions: 1) Establish protocols for provenance and calibration, 2) Use common language to describe and quantify errors/uncertainty among experiments and computation, 3) Provide more complete reporting of error, elevate the researchers who do this, and provide software to make it happen, 4) Incentivize reproducibility for experiments and software, and 5) Stamp data each time they are processed (a “data passport”).
In addition, the notion of a “data quality hierarchy” was proposed, with the following levels: 1) Baseline: data are posted on the website with some metadata, 2) Next tier: some information about data processing is provided, and 3) Higher tier: good characterization of uncertainty, with an open-source pipeline all the way back to the “raw” sources.
Potential Role of FDSI: FDSI should consider 1) bringing together experimental and computational collaborations, 2) curating tools for processing data and establishing provenance standards, and 3) devising a common language to describe and quantify errors/uncertainty among experiments and computation.
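The “data passport” and the higher tier of the data quality hierarchy both amount to an unbroken provenance chain from raw data to the published product. A minimal Python sketch, assuming each dataset can be serialized to bytes and fingerprinted with SHA-256; the class DataPassport, its methods, and its fields are hypothetical and not an agreed FDSI format.

```python
import hashlib
import json
from dataclasses import dataclass, field


def digest(data: bytes) -> str:
    """SHA-256 fingerprint of a serialized dataset."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class DataPassport:
    """Hypothetical 'data passport': one stamp per processing step."""
    raw_hash: str
    stamps: list = field(default_factory=list)

    def stamp(self, step: str, params: dict, data_in: bytes, data_out: bytes) -> None:
        # Record what was done, with which parameters, plus input/output fingerprints.
        self.stamps.append({
            "step": step,
            "params": params,
            "input": digest(data_in),
            "output": digest(data_out),
        })

    def traces_to_raw(self) -> bool:
        # The chain is unbroken only if each step consumed exactly what the
        # previous step produced, starting from the raw data.
        expected = self.raw_hash
        for s in self.stamps:
            if s["input"] != expected:
                return False
            expected = s["output"]
        return True

    def to_json(self) -> str:
        return json.dumps({"raw": self.raw_hash, "stamps": self.stamps}, indent=2)


raw = b"u: 1.00, 2.00, 3.00"
corrected = b"u: 1.10, 2.10, 3.10"
passport = DataPassport(raw_hash=digest(raw))
passport.stamp("offset_correction", {"offset": 0.10}, raw, corrected)
print(passport.traces_to_raw())
```

A passport whose chain traces back to the raw source addresses the provenance part of the higher tier; characterizing uncertainty would require additional fields.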

Issue: C-4 What are the best topics for the sub-community workshops for FDSI (previously CFDSI)?

Description of Problem: The vision of FDSI is so broad that the sub-communities could focus in many different directions, so clearly defined boundaries are needed. In addition, the outcome of the NSF grant should be a definition of community needs, not a proposal for an institute.
Potential Solutions and Role of FDSI: Identify the “potential” services that FDSI can provide to the community: 1) Software Development & Sustainability, 2) Education, Training & Outreach, 3) Software for Data Creation, 4) Software for Data Exchange, and 5) Software for Data Inquiry, Insight, & Discovery.

Issue: B-1 Can common software be developed for data analysis of fields from higher order simulations?

Description of Problem: High-order methods span a wide range of technologies with differing geometric and analysis descriptions, so there has been no universal “standardization” for pre- and post-processing.
Potential Solutions and Role of FDSI: 1) Standardizing inputs for pre-/post-processing software, 2) standardizing or at least cataloguing language, 3) developing interpreter software for “function standardization” rather than “data standardization”, and 4) developing standardized descriptions of high-order field data. Additionally, educational modules can be developed to help young researchers get started with and understand high-order methods.
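One reading of “function standardization” is that each code exposes how to evaluate its field at a reference point inside an element, instead of everyone agreeing on one storage layout. A minimal Python sketch under that assumption: the HighOrderField interface, its method names, and the 1D modal Legendre example are all hypothetical; a nodal or spectral-element code would implement the same evaluate method with its own basis.

```python
from typing import Protocol, Sequence

import numpy as np
from numpy.polynomial import legendre


class HighOrderField(Protocol):
    """Hypothetical interface a solver would implement for post-processing tools."""

    def num_elements(self) -> int: ...

    def evaluate(self, element: int, xi: np.ndarray) -> np.ndarray: ...


class LegendreModalField1D:
    """Example 1D field storing modal Legendre coefficients per element, xi in [-1, 1]."""

    def __init__(self, coeffs: Sequence[Sequence[float]]):
        self.coeffs = [np.asarray(c, dtype=float) for c in coeffs]

    def num_elements(self) -> int:
        return len(self.coeffs)

    def evaluate(self, element: int, xi: np.ndarray) -> np.ndarray:
        # Sum of c_k * P_k(xi); the post-processor never needs to know the basis.
        return legendre.legval(xi, self.coeffs[element])


def sample_uniform(hofield: HighOrderField, points_per_element: int) -> np.ndarray:
    """Generic post-processing step: sample every element at equispaced reference points."""
    xi = np.linspace(-1.0, 1.0, points_per_element)
    return np.stack([hofield.evaluate(e, xi) for e in range(hofield.num_elements())])


# Element 0 holds the constant P0 = 1; element 1 holds P1(xi) = xi.
example_field = LegendreModalField1D([[1.0], [0.0, 1.0]])
print(sample_uniform(example_field, 3))
```

The design choice is that tools such as sample_uniform depend only on the interface, so supporting a new high-order code means writing one adapter rather than a new converter for each tool.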

Issue: B-2 How can we/funding agencies better incentivize code authors to invest in code dissemination and community building?

Description of Problem: There are three major obstacles to code dissemination: (i) lack of resources (it costs money to share/maintain code), (ii) non-supportive culture (there are not good mechanisms for recognizing and rewarding shared code), and (iii) lack of training.
Potential Solutions: 1) Incentives, including means of publishing about software and supplemental funding, 2) cultural shift, including listing and citing software in both publications and grant proposals, and 3) dedicated help, such as the creation of a dedicated entity which could collect community-useful software and provide training.
Role of FDSI: FDSI can: 1) recommend that NSF start asking people to list open-source software and data sets, 2) provide training in modern software packages, 3) collect community-useful software, and 4) hire full-time software engineers to catalyze/assist shared code sustainability.

Issue: B-4 How do we create and sustain a library of benchmark problems of varying complexity and which span the range of model, simulation, prediction, and UQ for public comparison of methods?

Description of Problem: The discussions around this topic were primarily focused on issues associated with (i) identification of benchmark topics to attract high level of contributions, (ii) the precise definition of benchmark problems, and (iii) rigorous comparison of codes, data analysis methods, etc. In particular, the following problems were identified and there was a strong agreement among the contributors of this group that the institute should address them: 1) Comparing experiments and CFD is often challenging due to potential differences in, for instance, boundary conditions, 2) Lack of adequate funding to work on benchmark problems, 3) Metric for comparison, i.e., accuracy and/or performance?, and 4) How should software (not necessarily physics) benchmark problems be defined?
Potential Solutions: Several potential directions and solutions were discussed that the group thought could address some of the challenges associated with creating and sustaining a library of benchmark problems. These included: 1) Identify gaps and shortcomings in existing benchmarks, 2) Identify a broad set of benchmark problems that spans the audience we want to reach, 3) Identify benchmark problems (or problem sets) with increasing physical complexity, 4) Test codes, data analysis methods, etc. on the same benchmark problems to allow for proper comparison, and 5) Consider benchmark problems lending themselves to models with multiple fidelity levels (for optimization and UQ).
Potential Role of FDSI: Several avenues were identified through which the FDSI could impact the definition of benchmark problems. In particular, the group felt that the FDSI could: 1) Encourage the community at large to cultivate benchmarks focused on answering questions of interest, 2) Identify the specific questions that benchmarks are attempting to answer, 3) Catalogue and put benchmarks in a coherent structure, 4) Provide guidelines to measure the performance of software, in terms of computational efficiency and accuracy, 5) Encourage open source software, and 6) Develop software benchmark routines to apply to “physics” benchmark problems.

Issue: B-7 The 3 Rs: Robustness, Reliability, and Reproducibility in CFD

Description of Problem: There are several issues associated with the three Rs: 1) reliability, typically seen as convergence, may be difficult to measure, 2) reproducibility is machine, compiler, and library dependent, and 3) metrics of reproducibility are problem dependent.
Potential Solutions: 1) Comprehensive regression test suites and 2) storage of CFD archival data along with the code that generated it.
Role of FDSI: FDSI can: 1) enforce good practices for storing data, 2) post data along with software and run-time parameters used to generate the data, 3) in the case when data cannot be stored, post restart files, and 4) establish a set of rules for admitting a data set.
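Because bitwise reproducibility depends on the machine, compiler, and libraries, a regression test suite for CFD is usually written against a tolerance rather than exact equality. A minimal Python sketch, assuming archived reference arrays and a per-case tolerance (reflecting that reproducibility metrics are problem dependent); the relative L2 norm, the function names, and the case names are illustrative choices, not an FDSI standard.

```python
import numpy as np


def relative_l2_error(result, reference) -> float:
    """||result - reference||_2 / ||reference||_2 for arrays of equal shape."""
    result = np.asarray(result, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.linalg.norm(result - reference) / np.linalg.norm(reference))


def run_regression_suite(results: dict, references: dict, tolerances: dict) -> dict:
    """Return {case: (passed, error)}; a case with no result counts as a failure."""
    report = {}
    for case, reference in references.items():
        if case not in results:
            report[case] = (False, float("inf"))
            continue
        err = relative_l2_error(results[case], reference)
        report[case] = (err <= tolerances[case], err)
    return report


# Hypothetical archived references and fresh results from a new build.
references = {"case_a": [1.0, 2.0, 2.0], "case_b": [1.0, 2.0, 2.0]}
tolerances = {"case_a": 1e-10, "case_b": 1e-3}
results = {"case_a": [1.0, 2.0, 2.0 + 1e-13], "case_b": [1.0, 2.0, 2.3]}
report = run_regression_suite(results, references, tolerances)
for case, (passed, err) in report.items():
    print(f"{case}: {'PASS' if passed else 'FAIL'} (relative L2 error {err:.1e})")
```

Storing the reference arrays and tolerances alongside the code and run-time parameters is what lets such a suite be rerun on a different machine later.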

Issue: C-5 Scientific visualization of large fluid dynamics datasets should play a key role (“seeing is believing”) not only as a research tool but also as a component of education and outreach.

Description of Problem: There are several issues with visualization: 1) there is often a technological and educational gap between simulation and visualization tools, 2) there is a lack of standardization of outputs that promote visualization, 3) funding agencies and centers are compute centric not visualization centric, and 4) visualization presents opportunities for outreach that are often not taken advantage of.
Potential Solutions: Bring together simulation scientists with data scientists (visualization experts) to ask the right questions for standardization.
Role of FDSI: FDSI can: 1) build a community for the purposes of education, sharing techniques and software, and building a catalogue, 2) build interfaces between CFD codes and visualization tools, 3) serve as a community interface between the solver and visualization communities, 4) share visualization outreach success stories, and 5) encourage open-source distributions of visualization tools.

Issue: D-1 How can we facilitate better integration of modeling and simulation with experimental studies, especially to facilitate UQ?

Description of Problem: Traditionally, there are several barriers to bringing modeling/simulation and experimental studies together. These stem primarily from the following: 1) simulation outputs are chosen with different concepts in mind, 2) simulation and experimental outputs are both probabilistic but have a different character, 3) the experimental and computational communities operate separately, so processing steps and errors may not be reported at the level required for successful UQ, 4) issues that seem obvious to report in one community may not be obvious to the other, and 5) objectives may differ, i.e., scientific observations vs. engineering models, which highlights the need for common incentives.
Potential Solutions: 1) Better communication among the experimental and computational FD researchers, 2) Comparison of nominally identical designs - experimental and computational - investigate sensitivity to tolerance, etc., and 3) Create benchmark problems.
Role of FDSI: FDSI can: 1) Define benchmark problems (through a summer program) and encourage broad participation from both sides; this could include analyzing synthetically altered datasets (perhaps one computational and one experimental) with added noise, subsampling, differing accuracy of observation, etc., 2) Encourage/facilitate model-informed experimental design collaborations, 3) Organize UQ workshops that are attended by both experimental and computational researchers, and 4) Facilitate the creation of a formal course (possibly a MOOC) covering such topics as model calibration, validation, and UQ.

Issue: D-4 Aside from providing a benchmark data library, what are the other critical software challenges faced by experimentalists within the fluid dynamics community?

Description of Problem: There is currently no repository of software tools, including open-source ones, and experimental software is typically developed on an as-needed basis during post-processing. Moreover, much existing software is specific to a measurement type or instrument. We should identify: (i) Are there general categories of algorithms that support different types of diagnostics? (ii) Is there a catalog of measurement techniques, for each of which one could develop a sustainable piece of software? (iii) What is the role of artificial signals (e.g., simulation data) in evaluating experimental techniques?
Potential Solutions: We can use synthetic (e.g., computational or simulation) data to benchmark experimental software tools (for verification). We should also begin to consider elements of measurement ‘tools’ as ‘software’. As part of this reconception, we should encourage the creation of experimentally focused open-source software tools to supplement the commercial tools that are available. Similarly, experimentalists should be willing to share their software/post-processing techniques with other experimentalists and computationalists for clarity and to best replicate the presented results. The full processing chain (including data corrections) and measurement parameters should also be incorporated into the meta-data of archival data.
Role of FDSI: The institute can facilitate sharing and dissemination of software between different groups, as well as vetting of the software. Synthetic data can be provided to aid the development and reliability of the software used for processing experimental data. The institute can also outline good software development practices, including the requirement that metadata include not only the full measurement parameters but also the post-processing code used.
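Using synthetic data to verify experimental processing software can be sketched as follows, under stated assumptions: a known analytic signal stands in for simulation output, a measurement is mimicked by adding Gaussian noise and subsampling, and the tool under test (a hypothetical estimator of the mean and a noise-corrected RMS fluctuation) is checked against the known answer. All function names and parameter values are illustrative.

```python
import numpy as np


def synthetic_signal(n, mean, amplitude, freq, dt):
    """Known 'truth': a steady mean plus a sinusoidal fluctuation."""
    t = np.arange(n) * dt
    return mean + amplitude * np.sin(2.0 * np.pi * freq * t)


def mimic_measurement(signal, noise_std, stride, rng):
    """Degrade the truth as an instrument might: additive Gaussian noise, coarser sampling."""
    noisy = signal + rng.normal(0.0, noise_std, size=signal.shape)
    return noisy[::stride]


def mean_and_rms(samples, noise_std=0.0):
    """Tool under test: mean and RMS fluctuation, subtracting a known noise variance."""
    variance = samples.var() - noise_std**2
    return samples.mean(), np.sqrt(max(variance, 0.0))


rng = np.random.default_rng(seed=0)
truth = synthetic_signal(n=100_000, mean=5.0, amplitude=1.0, freq=3.0, dt=1e-3)
measured = mimic_measurement(truth, noise_std=0.2, stride=10, rng=rng)

mean, rms = mean_and_rms(measured, noise_std=0.2)
_, rms_uncorrected = mean_and_rms(measured)
exact_rms = 1.0 / np.sqrt(2.0)  # RMS of a unit-amplitude sinusoid
print(f"mean {mean:.3f} (exact 5.0), RMS {rms:.3f} vs uncorrected {rms_uncorrected:.3f} "
      f"(exact {exact_rms:.3f})")
```

Because the exact answer is known, the comparison shows directly whether a processing step (here the noise correction) brings the estimate closer to the truth, which is the verification role the group described for synthetic data.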

Issue: B-5 What is the center's goal towards developing/extending a standard for CFD and experimental FD data?

Description of Problem: Currently, data are produced and shared in many different formats, and it is not clear at what level standardization should take place (e.g., format, interface, tools). Moreover, it will be difficult to account for the extreme variability of experimental data types, as opposed to the relative uniformity of CFD data. Even within CFD, however, we must find a way to handle volumes, elements, structured, unstructured, and adaptive mesh data. We also need to define a metadata standard and educate the community on what is needed in metadata, the utility of different formats, etc.
Potential Solutions: Converter codes could be developed to produce one format from many different data types, as well as improved documentation for different data types. Metadata should always be included to document what has been done and to identify data corrections that may not be obvious. If corrections have been applied, we should publish raw data, corrected data, and details of correction.
Role of FDSI: We should first explore what standard data formats already exist and what their relative benefits are. From this, we will create a document that gives the pros and cons of different formats (the educational aspect), and identify a standard data format, possibly reached after conversion. Most importantly, FDSI should facilitate the discussion of the level at which we standardize: a single format for shared data, many possible formats served by a uniform set of tools, etc. If conversion tools are used, the institute can create, test, and host those tools.

Issue: D-7 CFD ecosystem contains -- private research tools, open source research software, open source software and commercial tools. How will CFDSI facilitate each of these sectors to support and sustain each other?

Description of Problem: How does one solve problems through a mix of private, open-source, and commercial software tools, and how can the institute help when using various sources of software? How do we determine what tools are available and how can we make them work together? Students don’t typically know how to best integrate libraries, so instead they write software from scratch that is not as good as it could be.
Potential Solutions: Education of researchers and students is critical, as are documentation of available software, guidance for selecting libraries and software tools based on the task, and APIs.
Role of FDSI: The institute can promote standardized APIs that ease the burden and complexity of integrating different software tools and can clearly define open interfaces (not necessarily open-source) for interoperability of individual tools. The institute should promote communication among labs, industry, and academia and educate students and researchers in how to develop and work with APIs. We can also provide overviews/catalogues of available software, as well as guidance for how to combine different tools to solve problems. Institute members should become members of hardware and middleware standardization committees, as well as language standardization committees.

Issue: D-2 What new science and broader impacts will FDSI enable?

Description of Problem: Public enthusiasm for fluids is largely untapped, and there is potential for advances in education at the undergraduate and graduate levels. Curriculum development for professionals (i.e., industry) is also lacking.
Potential Solutions: It is essential to ensure uniform access to resources, since this can enhance socio-economic mobility for under-represented groups. We must also increase exposure to fluid dynamics for non-traditional and underrepresented participants, as well as lower the barrier to entry for newcomers to the research field. Efforts should be made to popularize fluid dynamics in other disciplines (e.g., through the media and science relations committee) and to highlight unconventional applications (e.g., collaborations with other fields). It would also be helpful to create organized tutorials for fluid dynamics.
Role of FDSI: Support underrepresented institutions with infrastructure (inclusiveness). Connect with the APS-DFD Media and Science Relations Committee. Combine experiments and simulations, and better harmonize them in education. Apply FD techniques to broader topics in other disciplines. Enable more complexity. Enable impacts in other disciplines: geophysical flows, material properties, weather, climate, drones, personal air mobility, energy systems, biomimicry, patient-specific modeling, fog water harvesting, microfluidics, human impact on the environment, and national security. Enable everyone to access high-fidelity experiments and simulations, blurring lines and removing artificial barriers to collaboration. Include the biofluids and geofluids (meteorology, oceanography) communities. Fuse different types of data to make an impact. Offer a wide range of inverse problems to be solved.
