❗Submission Form is now open! ❗
Objective
We are collecting challenging questions that will test AI systems’ ability to surface and reason over key information necessary for developing personalized therapeutic approaches for patients with genetic disease.
Your submission will contribute to a benchmark for evaluating AI capabilities in therapeutic actionability. If your question is selected for inclusion, your name will be associated with it in the dataset, and you will be invited as an author on the corresponding paper.
Submission Process
- Develop Your Question: Create a question in English focused on therapeutic actionability for genetic diagnoses. This could involve identifying appropriate treatments, assessing clinical trial eligibility, interpreting genetic information relevant to treatment, or applying principles of drug development to rare diseases. Review the contest description including the Frequently Asked Questions subsection to understand scope.
- Test with Current AI Systems: We recommend testing your question with available AI systems to gauge its difficulty level. Questions that stump current models or reveal common LLM “hallucinations” (false trial IDs, numbers, etc.) are especially valuable for our benchmark.
- Provide a Comprehensive Solution: Include a detailed yet concise explanation of the correct answer, including reasoning steps and necessary information sources.
- Submit for Review: After submission, your question will undergo expert review to ensure quality, accuracy, and alignment with the challenge objectives.
- Publication and Attribution: Selected questions will be included in the dataset with proper attribution to you as the author. Contributors with more accepted questions will be listed earlier in the author list of the resulting paper. Note that although your question may be kept private for detecting AI memorization, you will still receive credit.
Please review our example questions to understand the expected format and challenge level.
Guidelines
1. Original & Interesting
- Questions must be original and authored by you (not copied from other sources).
- Consider submitting multiple questions that differ only in minor details (e.g., specific variant, patient age) yet have different answers.
- Write questions that you would be impressed to see an automated system answer consistently and accurately. Consider testing current LLMs to confirm your questions present meaningful challenges!
2. Medically Relevant
- Questions should reflect realistic therapeutic decision points.
- Focus on information that would help clinicians and researchers make treatment decisions.
- Questions should span the full spectrum of therapeutic strategies (approved therapeutics, clinical trials, off-label drug repurposing opportunities, and personalized therapeutic feasibility).
3. LLM-Challenging
The following suggestions are based on known LLM failure modes. Try to design questions that:
- Require integration or comparison of information from different sources (e.g., public databases that may not be fully indexed by web crawlers).
- Necessitate multi-step reasoning about the potential for variant pathogenicity, genetic mechanisms of action, or clinical applications.
- Have uncommon correct answers (e.g., from peer-reviewed publications or GeneReviews) that contradict popular or common misinformation.
- Involve different identifiers, such as a preclinical drug name that differs from an approved drug name.
4. Objective & Close-Ended
- Questions must have answers that other experts in the relevant field would accept.
- All necessary context and definitions must be included within the input JSON object.
- Good: Answers should be specific, unambiguous, and concise (e.g., “Yes”/“No”, gene name, numeric values).
- Bad: Questions asking to “explain,” “discuss,” or “describe” are unsuitable.
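As a rough illustration of why short, close-ended answers suit automated evaluation, the sketch below compares a model response with an expected answer by normalized exact match. The function names and normalization rules are assumptions made for illustration; they are not the benchmark's actual scoring method.

```python
import string

def normalize(answer: str) -> str:
    """Trim surrounding whitespace, quotes, and punctuation, then lowercase.

    These normalization rules are illustrative assumptions only.
    """
    strip_chars = string.punctuation + "“”‘’" + string.whitespace
    return answer.strip(strip_chars).lower()

def is_correct(model_answer: str, expected_answer: str) -> bool:
    """Hypothetical scorer: exact match after normalization."""
    return normalize(model_answer) == normalize(expected_answer)

print(is_correct(" “No.” ", "No"))  # True
print(is_correct("Yes", "No"))      # False
```

A comparison like this only works when the expected answer is a single short string, which is one reason “explain” or “discuss” questions are unsuitable. Note that this naive normalization would also strip a leading minus sign from a numeric answer, so a real scorer would need type-aware handling.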
Categories
Questions will be crafted to elicit short, scorable answers suitable for automated evaluation. Example questions are below.
Category | Tasks |
---|---|
Established, Targeted Therapies | Accurately retrieve and apply key information regarding established therapeutics for this genetic diagnosis. |
Established, Supportive Therapies | Accurately retrieve and apply key information regarding established therapeutics for this genetic diagnosis. |
Clinical Trials | Identify ongoing clinical trials and assess patient eligibility based on available patient information. |
Drug Development and Repurposing | Reason over drugs, targets, pathways, and assays. |
Variant Assessment | Apply molecular biology reasoning, with a focus on questions relevant to determining the feasibility of precision therapeutics. |
Examples
Example 1: Eligibility for OTC-HOPE Trial Based on Gestational Age
- Patient Genotype:
  - Gene: OTC
  - Transcript: NM_000531.5
  - Variant (cDNA): c.386G>A
  - Variant (protein): p.Arg129His
  - Zygosity: hemizygous
- Patient Phenotype: 3-month-old male infant with hyperammonemia, lethargy, and poor feeding requiring hospitalization shortly after birth. The patient was born at 35 weeks’ gestation and was diagnosed with OTC deficiency following genetic testing. Currently on protein-restricted diet, arginine supplementation, and alternative pathway therapy.
- Question:
  - Category: Clinical Trials
  - Prompt: Based on this patient’s gestational age at birth and genetic status, do they meet the inclusion criteria for the OTC-HOPE trial (NCT06255782) testing ECUR-506 gene therapy? Answer “Yes” or “No”.
  - Expected Answer: No
- Primary Data Source:
  - ClinicalTrials.gov
  - Literature on OTC deficiency
  - Other (Specific URL): https://clinicaltrials.gov/ct2/show/NCT06255782
- Required Resources: ClinicalTrials.gov is essential for accessing current trial eligibility criteria.
- Answer Explanation: Although the patient has a pathogenic variant in OTC and a phenotype consistent with OTC deficiency (OTCD), they do not meet a key inclusion criterion for the OTC-HOPE trial. Specifically, the trial requires a gestational age of ≥37 weeks at birth, and this patient was born at 35 weeks’ gestation. Answering correctly requires combining knowledge of the specific trial inclusion criteria with interpretation of the genetic variant and patient history.
Note: This example shows how a small change in the patient details (gestational age at birth) would alter the eligibility answer. A companion question with a patient born at term (≥37 weeks) would have the opposite answer.
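The companion-question idea can be sketched in code. The snippet below models only the gestational-age criterion quoted in the answer explanation (≥37 weeks at birth). The class, field names, and function are hypothetical, and the trial's other eligibility criteria are deliberately not modeled, so a “Yes” here means only that this one criterion does not exclude the patient.

```python
from dataclasses import dataclass, replace

# Threshold taken from the answer explanation above (gestational age >= 37 weeks).
MIN_GESTATIONAL_AGE_WEEKS = 37

@dataclass(frozen=True)
class PatientCase:
    """Hypothetical record holding a few fields from Example 1."""
    gene: str
    variant_cdna: str
    zygosity: str
    gestational_age_weeks: int

def gestational_age_answer(case: PatientCase) -> str:
    """Return "Yes" or "No" for the gestational-age criterion only."""
    return "Yes" if case.gestational_age_weeks >= MIN_GESTATIONAL_AGE_WEEKS else "No"

original = PatientCase("OTC", "c.386G>A", "hemizygous", gestational_age_weeks=35)
# Companion question: same patient details, but born at term.
companion = replace(original, gestational_age_weeks=38)

print(gestational_age_answer(original))   # No
print(gestational_age_answer(companion))  # Yes
```

Changing a single field while holding the rest constant makes it easy to generate question pairs whose answers differ for one identifiable reason.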