The Government is requesting information on current capabilities and critical gaps related to artificial intelligence test and evaluation.
The Chief Digital and Artificial Intelligence Office (CDAO)
Test and Evaluation (T&E) Directorate supports the testing of a variety of
Artificial Intelligence (AI) and Machine Learning (ML) applications throughout
the Department of Defense (DoD). To enable and accelerate AI testing throughout
the DoD, CDAO T&E is funded to develop the Joint AI Test Infrastructure
Capability (JATIC), a suite of interoperable software tools for comprehensive
AI model T&E.
In order to begin this work, CDAO is calling on subject matter
experts to provide input on priorities and gaps for AI T&E in both industry
and the government, as well as existing products and solutions (particularly
open-source software) that the Department can leverage for the development of
this capability. Information gained from this RFI will be used to inform the
requirements and directions of work for JATIC contracts in FY23.
The focus of this RFI is the T&E of AI/ML models for
Computer Vision (CV) classification and object detection problems. All
questions below refer exclusively to T&E of AI models for CV classification
and object detection. Other areas of T&E, such as systems integration,
human-machine, and operational T&E, as well as T&E of other AI
modalities, such as autonomous agents and natural language processing, are
pressing and complex issues, but are out of scope for this RFI.
If more information is needed, CDAO T&E will follow up
on specific responses with further requests.
General Question:
What is your business type and size, including your specific classification
(e.g., SDVOSB) if you are a small business?
CDAO Questions:
The JATIC effort has identified the following five priority dimensions for AI
T&E:
- Performance Measures: given a labeled test set, compute standard measures of
  performance, including:
  - Accuracy, precision, recall and other CV metrics to assess how the model
    performs on the prediction task
  - Expected calibration error, reliability diagrams and other probability
    calibration metrics to assess the reliability of the model's measure of
    predictive uncertainty
  - Model throughput, resource usage and other similar metrics to assess the
    model's efficiency and computational needs
- Robustness to Natural Shifts in Data: assess how performance changes as data
  in the original test set is corrupted using natural perturbations, including:
  - Pre-sensor, environmental or physical corruptions (e.g., fog, snow, rain,
    changes in target shape or dimensions)
  - Sensory corruptions (e.g., out-of-focus, glare, blur)
  - Post-sensor, in-silico corruptions (e.g., Gaussian noise, digital
    compression)
- Robustness to Adversarial Attacks: assess how performance changes as data in
  the original test set is corrupted using adversarial strategies, with
  characteristics of the attack described by varying dimensions, such as:
  - White-box vs. black-box attacks
  - Pre-defined vs. adaptive attacks
  - Lp norm-constrained vs. physically realizable perturbations
  - Empirical vs. certified attacks
- Model Analysis: facilitate deeper insight into model performance, such as:
  - Reporting performance on known partitions of the dataset given available
    metadata
  - Automated clustering of data inputs that result in similar outputs, to
    understand potential sources of error or high-performing regions
- Dataset Analysis: evaluate the quality of a dataset, including:
  - Testing for class imbalance or biases in the dataset
  - Quantifying the similarity between two datasets
  - Assessing the sufficiency (e.g., number of samples and variation) of a
    dataset
  - Detecting outliers, anomalies, label errors, or data poisoning
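As one illustration of the probability calibration metrics named under
Performance Measures, the sketch below computes expected calibration error
(ECE) with equal-width confidence bins. It is a minimal example and not part
of JATIC; the function name, its parameters, and the ten-bin default are
assumptions made for illustration only.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE: sum over bins of (n_b / N) * |acc_b - conf_b|.

    confidences: the model's confidence in its predicted class, in [0, 1].
    correct: whether each prediction matched the label.
    (Hypothetical interface, chosen for illustration.)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if len(confidences) == 0:
        raise ValueError("at least one prediction is required")
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")

    # Per-bin running totals: sample count, summed confidence, correct count.
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hits = [0] * n_bins

    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
        # Map the confidence to a bin index; conf == 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        counts[idx] += 1
        conf_sums[idx] += conf
        hits[idx] += int(ok)

    total = len(confidences)
    ece = 0.0
    for n, conf_sum, hit in zip(counts, conf_sums, hits):
        if n == 0:
            continue  # Empty bins contribute nothing to the weighted sum.
        accuracy = hit / n
        mean_conf = conf_sum / n
        ece += (n / total) * abs(accuracy - mean_conf)
    return ece
```

A value near zero indicates that stated confidence tracks observed accuracy
on the test set. The result depends on the binning scheme, so the bin count
should be reported alongside any ECE figure.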
The above five AI T&E dimensions are by no means the only dimensions of AI
T&E. We strongly encourage you to add up to three other AI T&E dimensions in
which you believe there is a critical capability gap. For any added AI T&E
dimensions, please provide a short description of the dimension.
For each of the above dimensions on which your company possesses expertise
(including any added dimensions), please provide the following information:
- Please briefly describe your understanding of the maturity of research in
  this dimension.
- What are existing software products and capabilities in this dimension?
  Which, if any, are open-source capabilities?
- Do gaps exist in currently available capabilities? If so, what are the gaps?
- Where or how could value be provided to the existing state-of-practice in
  this dimension?
- If you have a product or expertise that could provide a solution that falls
  within this dimension, please describe it.
This is a Call for Information. This is a request for information only, not a
solicitation for proposals, quotes, or bids. Information received as a result
of this Call for Info will be used for market research purposes. No award will
result from the Call at this time. Responses are not offers and cannot be
accepted by the Government to form a binding contract. No classified,
confidential, or sensitive information shall be included in your response.
Proprietary information, if any, should be clearly marked.
Any statement submitted in response to this Call should be no
more than eight (8) pages (not including a cover sheet), single-spaced, using
Times New Roman 12-point font.