| Name | ID | Type | Description | Report |
| --- | --- | --- | --- | --- |
| 400-Word Essay on the Wars of the Roses | wars_of_roses_essay | knowledge | Tests the model's ability to write a concise, informative historical essay with proper structure on a well-known historical topic. | View Report |
| Free-Form Definition of 'Granite' | granite_definition | linguistic | Tests the model's ability to provide freeform word definitions, including translations and examples. | View Report |
| JSON-Schema Comprehensive Word Definition | comprehensive_definition | linguistic | Tests the model's ability to provide structured-JSON word definitions, including example sentences and translation. | View Report |
| Sonnet about Daffodils in Spring | daffodil_sonnet | creative | Tests the model's ability to generate structured poetry following formal constraints while conveying specific imagery and themes. | View Report |
| Generate and Score Poker Hands | poker_hand_scorer | coding | Tests the model's ability to create a basic algorithm involving playing cards, as well as knowledge of the common game of poker. | View Report |
| Firefighter Break Room Dialogue | firefighter_conversation | creative | Tests the model's ability to create authentic dialogue. | View Report |
| Neighborhood Logic Puzzle | neighborhood_puzzle | reasoning | Tests the model's ability to solve a complex logic puzzle by tracking multiple constraints and making deductions. | View Report |