ARISEKit
Medical AI benchmark toolkit

From clinical question to validated benchmark

ARISEKit is a toolkit that helps clinicians, researchers, and hospital teams design, plan, and build rigorous medical AI tools without starting from a blank file. A clinician with an idea (“can ChatGPT tell me which of my patients need ACL repair?”) and a de-identified dataset should be able to finish with a runnable, MAST-compatible benchmark, plus the human-rater plan needed to validate it.

What ARISEKit covers

Three pieces of the stack, one workflow

Three pieces fit together. Claude provides reasoning, the FHIR MCP layer gives the agent eyes and hands on patient data, and ARISEKit evaluates the output.

01. Design

Author the product and the agent

Start from a clinical or admin archetype, customize the UI, and write the Claude Skill that defines how your agent behaves, all without leaving the browser.

02. Tools

Connect to FHIR data

Wire MCP-style tools so your agent can read and act on patient and operational data. The catalog is pre-built for workshop speed; the patterns transfer to production.
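
To make "MCP-style tool" concrete, here is a minimal sketch in plain Python. It is not ARISEKit's catalog and does not use any MCP SDK. The tool name `get_patient`, the `TOOLS` registry, the `call_tool` dispatcher, and the sample record are all hypothetical. The only borrowed conventions are the FHIR Patient field names (`name`, `gender`, `birthDate`) and the idea that a tool carries a name, a description, and a JSON Schema for its input.

```python
import json
from typing import Any

# Made-up, in-memory stand-in for a FHIR server (assumption for illustration).
# Field names follow the FHIR Patient resource; the record itself is invented.
PATIENTS = {
    "example-1": {
        "resourceType": "Patient",
        "id": "example-1",
        "name": [{"family": "Doe", "given": ["Jane"]}],
        "gender": "female",
        "birthDate": "1980-04-02",
    }
}


def get_patient(patient_id: str) -> dict[str, Any]:
    """Hypothetical tool handler: return a compact summary of one Patient."""
    patient = PATIENTS.get(patient_id)
    if patient is None:
        return {"error": f"Patient {patient_id} not found"}
    name = patient["name"][0]
    return {
        "id": patient["id"],
        "display_name": " ".join(name["given"] + [name["family"]]),
        "gender": patient["gender"],
        "birth_date": patient["birthDate"],
    }


# Hypothetical registry: each tool has a description, an input schema,
# and a handler the agent runtime can invoke.
TOOLS = {
    "get_patient": {
        "description": "Look up a patient by id and return a short summary.",
        "input_schema": {
            "type": "object",
            "properties": {"patient_id": {"type": "string"}},
            "required": ["patient_id"],
        },
        "handler": get_patient,
    }
}


def call_tool(name: str, arguments: dict[str, Any]) -> str:
    """Dispatch a tool call and return the result as a JSON string."""
    tool = TOOLS.get(name)
    if tool is None:
        return json.dumps({"error": f"Unknown tool {name}"})
    missing = [k for k in tool["input_schema"]["required"] if k not in arguments]
    if missing:
        return json.dumps({"error": f"Missing arguments: {missing}"})
    return json.dumps(tool["handler"](**arguments))


result = call_tool("get_patient", {"patient_id": "example-1"})
```

Returning errors as data rather than raising lets the agent read the failure and retry with corrected arguments, which is one reasonable design choice for tool-calling loops.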

03. Evaluation

Score before you ship

Turn rubrics, test cases, and a judge model into a quantitative eval. Know whether your agent is actually good, not just impressive in a demo.
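
As a rough illustration of how rubrics, test cases, and a judge can combine into a single number, here is a small Python sketch. It is an assumption-laden example, not ARISEKit's scoring method: `Criterion`, `TestCase`, `keyword_judge`, and `run_eval` are hypothetical names, the cases are invented, and the judge is a trivial keyword check standing in for a real judge model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    name: str          # keyword the stand-in judge looks for
    weight: float      # relative importance within the rubric
    description: str   # what a human or model judge would assess


@dataclass
class TestCase:
    case_id: str
    prompt: str
    agent_output: str


# A judge maps (criterion, test case) to a score in [0, 1].
Judge = Callable[[Criterion, TestCase], float]


def keyword_judge(criterion: Criterion, case: TestCase) -> float:
    """Stand-in for a judge model: 1.0 if the criterion keyword appears."""
    return 1.0 if criterion.name.lower() in case.agent_output.lower() else 0.0


def score_case(rubric: list[Criterion], case: TestCase, judge: Judge) -> float:
    """Weighted average of the judge's per-criterion scores for one case."""
    total_weight = sum(c.weight for c in rubric)
    return sum(c.weight * judge(c, case) for c in rubric) / total_weight


def run_eval(rubric: list[Criterion], cases: list[TestCase], judge: Judge) -> dict:
    """Score every case and report per-case scores plus the mean."""
    per_case = {case.case_id: score_case(rubric, case, judge) for case in cases}
    mean = sum(per_case.values()) / len(per_case)
    return {"per_case": per_case, "mean": mean}


# Invented rubric and cases, for illustration only.
RUBRIC = [
    Criterion("referral", 2.0, "Recommends an appropriate referral"),
    Criterion("imaging", 1.0, "Mentions confirming imaging"),
]
CASES = [
    TestCase("case-1", "Knee instability after a pivot injury",
             "Suggest orthopedic referral after MRI imaging."),
    TestCase("case-2", "Knee instability after a pivot injury",
             "Recommend rest and follow-up."),
]

report = run_eval(RUBRIC, CASES, keyword_judge)
```

Because the judge is passed in as a function, the keyword stand-in could be swapped for a call to a judge model without touching the scoring logic.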

Hands-on workshop

Ready to build something?

The ARISEKit workshop walks you through the full loop in 90–120 minutes: design, tooling, eval, and take-home artifacts. No local setup required.

Enter the workshop
v0.3.28