Exemplar Reports

| Name | ID | Type | Description | Report |
| --- | --- | --- | --- | --- |
| 400-Word Essay on the Wars of the Roses | wars_of_roses_essay | knowledge | Tests the model's ability to write a concise, informative historical essay with proper structure on a well-known historical topic. | View Report |
| Free-Form Definition of 'Granite' | granite_definition | linguistic | Tests the model's ability to provide freeform word definitions, including translations and examples. | View Report |
| JSON-Schema Comprehensive Word Definition | comprehensive_definition | linguistic | Tests the model's ability to provide structured-JSON word definitions, including example sentences and translation. | View Report |
| Sonnet about Daffodils in Spring | daffodil_sonnet | creative | Tests the model's ability to generate structured poetry following formal constraints while conveying specific imagery and themes. | View Report |
| Generate and Score Poker Hands | poker_hand_scorer | coding | Tests the model's ability to create a basic algorithm involving playing cards, as well as knowledge of the common game of poker. | View Report |
| Firefighter Break Room Dialogue | firefighter_conversation | creative | Tests the model's ability to create authentic dialogue. | View Report |
| Neighborhood Logic Puzzle | neighborhood_puzzle | reasoning | Tests the model's ability to solve a complex logic puzzle by tracking multiple constraints and making deductions. | View Report |