Industry-Leading Accuracy and Comprehensiveness
Scoring highest across multiple dimensions
Methodology
We conducted a blind, randomized study with biomedical researchers and clinicians, recruiting participants via User Interviews between October 15 and 29, 2023. Each subject-matter expert was assigned a specific set of tasks aligned with their expertise and was asked to evaluate two randomly selected syntheses: one generated by System and the other by OpenAI's GPT-4. Each synthesis was rated on the following dimensions:
Accuracy: Do the summaries contain factual errors, and do they provide accurate information on the topic?
Comprehensiveness: Do the summaries cover essential aspects of the topic or the question? Is there any key information missing from the summaries?
Relevance: Are the summaries relevant to what you expect to see for the topic?
Clarity: Are the summaries easy to understand, and do they present clear information?
Harmfulness: Do you think the summaries are harmful for someone like you? Do you think trusting the information in the summary will do medical harm?

Researchers and clinicians prefer System Pro's synthesis
Taking accuracy, completeness, relevance, helpfulness, and clarity into account, 70% of experts prefer System Pro’s synthesis over other AI-assisted research tools.
System: 68%
Commercial Product #1: 32%

Methodology
We conducted a randomized single-blind study with researchers and clinicians. Users were recruited through User Interviews between October 1-15, 2023. Each subject-matter expert was assigned a set of tasks relevant to their domain of expertise. For each task, users were asked to compare two randomly assigned syntheses: one generated by System and the other by another commercial product. They were then instructed to choose the better synthesis, taking into account multiple dimensions (accuracy, completeness, clarity, relevance, and helpfulness). After each selection, users were required to provide a reason for their choice. Prior to data collection, a statistical power analysis was conducted to estimate the amount of survey data needed.
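The methodology mentions a power analysis without describing it. As a rough illustration only, the sketch below estimates how many pairwise choices a two-sided test of a single proportion would need, using the standard normal approximation. The 50% null preference, the 65% assumed true preference, the 5% significance level, and the 80% power target are all assumptions chosen for this example, not figures from the study, and the function name `required_responses` is hypothetical. The sketch also treats every response as independent, which is a simplification when one participant answers several tasks.

```python
import math
from statistics import NormalDist


def required_responses(p_null: float, p_alt: float,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of pairwise choices needed to detect a true
    preference rate p_alt against a null rate p_null (two-sided test,
    normal approximation to the binomial).

    All default values are illustrative assumptions, not study parameters.
    """
    if not (0 < p_null < 1 and 0 < p_alt < 1) or p_null == p_alt:
        raise ValueError("rates must lie in (0, 1) and differ from each other")

    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the target power

    # Standard deviations of a single choice under the null and the alternative
    sd_null = math.sqrt(p_null * (1 - p_null))
    sd_alt = math.sqrt(p_alt * (1 - p_alt))

    n = ((z_alpha * sd_null + z_beta * sd_alt) / (p_alt - p_null)) ** 2
    return math.ceil(n)
```

Calling `required_responses(0.5, 0.65)` gives the estimate under those assumed inputs. A smaller assumed gap between the two rates increases the number of responses required.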
The results presented are based on 144 responses from 33 unique participants.

The most accurate, comprehensive, and relevant research synthesis on the market

Methodology
Fifty search queries run by System Pro users between June and October 2023 were used to create a dataset of syntheses from System, Commercial Product #1, and Commercial Product #2.

A new gold standard for explainability in AI-assisted research
The most citations
The most depth
The most breadth
Methodology
We created a representative sample of 50 searches conducted by System Pro users between May and September 2023. To compare System Pro with Commercial Product #1, we ran the same search query on both and recorded the resulting summary and citations. Searches were done in September 2023.

Commercial Product #2 does not directly synthesize search results, because it relies on a question to generate an answer. To make a direct comparison, we used the sections of System’s synthesis for a specific search query (for example, for the user query “SLE and b-cell depletion”, System Pro generated the following sections: “Overview”, “Role of B-cells in SLE”, “B-cell depletion therapies”, “Efficacy of B-cell depletion in SLE”). We generated a question for each section using OpenAI's GPT-4 and asked Commercial Product #2 that question (in the example above, for the section called “B-cell depletion therapies”, GPT-4 generated the following question: “What are the different B-cell depletion therapies used in the treatment of SLE?”). We then saved the resulting summary and articles. On average, it took 4.9 searches on Commercial Product #2 to generate a comparable summary.
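The section-by-section procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code used in the study: `make_question` stands in for the GPT-4 question-generation step and `ask_product` for a query to Commercial Product #2. Both are hypothetical callables supplied by the caller, as are the names `SectionAnswer`, `build_comparable_summary`, and `average_searches`.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class SectionAnswer:
    section: str   # section title from the reference synthesis
    question: str  # question generated for that section
    summary: str   # answer returned by the question-answering product


def build_comparable_summary(
    query: str,
    sections: List[str],
    make_question: Callable[[str, str], str],  # hypothetical: (query, section) -> question
    ask_product: Callable[[str], str],         # hypothetical: question -> summary text
) -> List[SectionAnswer]:
    """Ask one question per section, so that the collected answers cover
    the same ground as a multi-section synthesis of the original query."""
    answers = []
    for section in sections:
        question = make_question(query, section)
        answers.append(SectionAnswer(section, question, ask_product(question)))
    return answers


def average_searches(results: Dict[str, List[SectionAnswer]]) -> float:
    """Mean number of searches (one per section) across all queries."""
    return mean(len(answers) for answers in results.values())
```

Passing the two steps in as callables keeps the sketch independent of any particular model or product API. In this framing, the figure of 4.9 searches per comparable summary corresponds to what `average_searches` would compute over the full set of queries.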