Benchmark Run Details

System Prompt

You are taking a vocabulary test. Your task is to select the word that best matches 
a given definition from a list of choices. Respond with only the correct word, nothing else.

Run Summary

Model: gemini-2.5-flash-preview-04-17
Benchmark: 0020_definitions
Normed Score: 100
Run Timestamp: 2025-04-24 19:21:03

Question-Level Details

Question ID Score Evaluation Time (ms) Debug Info
0020_definitions:batch_gemma2_9b:0 100 2033 { "response": "valley", "expected": "valley", "is_correct": true }
0020_definitions:batch_gemma2_9b:1 100 2209 { "response": "opal", "expected": "opal", "is_correct": true }
0020_definitions:batch_gemma2_9b:10 100 2157 { "response": "echo", "expected": "echo", "is_correct": true }
0020_definitions:batch_gemma2_9b:11 100 2127 { "response": "strength", "expected": "strength", "is_correct": true }
0020_definitions:batch_gemma2_9b:12 100 2330 { "response": "swift", "expected": "swift", "is_correct": true }
0020_definitions:batch_gemma2_9b:13 100 2594 { "response": "proud", "expected": "proud", "is_correct": true }
0020_definitions:batch_gemma2_9b:14 100 2239 { "response": "magic", "expected": "magic", "is_correct": true }
0020_definitions:batch_gemma2_9b:15 100 2981 { "response": "adventure", "expected": "adventure", "is_correct": true }
0020_definitions:batch_gemma2_9b:16 100 2649 { "response": "zeal", "expected": "zeal", "is_correct": true }
0020_definitions:batch_gemma2_9b:17 100 2222 { "response": "monster", "expected": "monster", "is_correct": true }
0020_definitions:batch_gemma2_9b:18 100 1882 { "response": "energy", "expected": "energy", "is_correct": true }
0020_definitions:batch_gemma2_9b:19 100 1624 { "response": "justice", "expected": "justice", "is_correct": true }
0020_definitions:batch_gemma2_9b:2 100 2326 { "response": "gentle", "expected": "gentle", "is_correct": true }
0020_definitions:batch_gemma2_9b:20 100 1744 { "response": "knight", "expected": "knight", "is_correct": true }
0020_definitions:batch_gemma2_9b:21 100 2454 { "response": "honor", "expected": "honor", "is_correct": true }
0020_definitions:batch_gemma2_9b:22 100 1128 { "response": "queen", "expected": "queen", "is_correct": true }
0020_definitions:batch_gemma2_9b:23 100 2043 { "response": "ignite", "expected": "ignite", "is_correct": true }
0020_definitions:batch_gemma2_9b:24 100 2516 { "response": "zeal", "expected": "zeal", "is_correct": true }
0020_definitions:batch_gemma2_9b:3 100 3594 { "response": "glory", "expected": "glory", "is_correct": true }
0020_definitions:batch_gemma2_9b:4 100 1891 { "response": "fighter", "expected": "fighter", "is_correct": true }
0020_definitions:batch_gemma2_9b:5 100 2512 { "response": "belief", "expected": "belief", "is_correct": true }
0020_definitions:batch_gemma2_9b:6 100 2510 { "response": "justice", "expected": "justice", "is_correct": true }
0020_definitions:batch_gemma2_9b:7 100 1374 { "response": "scared", "expected": "scared", "is_correct": true }
0020_definitions:batch_gemma2_9b:8 100 1655 { "response": "honest", "expected": "honest", "is_correct": true }
0020_definitions:batch_gemma2_9b:9 100 1834 { "response": "village", "expected": "village", "is_correct": true }
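
The rows above can be checked mechanically. The following is a minimal sketch, not the benchmark's own tooling: it assumes each row has the layout "question ID, score, evaluation time in ms, debug info as JSON", and it assumes an aggregate score can be approximated as the mean of the per-question scores (the report does not state how the Normed Score is actually computed). The names `parse_row` and `summarize` are hypothetical.

```python
import json
import re
from statistics import mean

# Assumed row layout, inferred from the table:
# <question id> <score> <evaluation time (ms)> <debug info as a JSON object>
ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s+(\{.*\})\s*$")


def parse_row(line):
    """Parse one result row into a dict; return None for any other line."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    qid, score, ms, debug = match.groups()
    return {
        "id": qid,
        "score": int(score),
        "ms": int(ms),
        "debug": json.loads(debug),
    }


def summarize(lines):
    """Aggregate parsed rows; lines that are not result rows are skipped."""
    rows = [row for row in map(parse_row, lines) if row is not None]
    return {
        "questions": len(rows),
        # Assumption: the mean per-question score stands in for the run score.
        "mean_score": mean(row["score"] for row in rows),
        "mean_ms": mean(row["ms"] for row in rows),
        # Sanity check: is_correct should agree with response == expected.
        "consistent": all(
            (row["debug"]["response"] == row["debug"]["expected"])
            == row["debug"]["is_correct"]
            for row in rows
        ),
    }


# Two rows copied from the table, plus the header line, which is skipped.
sample = [
    "Question ID Score Evaluation Time (ms) Debug Info",
    '0020_definitions:batch_gemma2_9b:0 100 2033 '
    '{ "response": "valley", "expected": "valley", "is_correct": true }',
    '0020_definitions:batch_gemma2_9b:1 100 2209 '
    '{ "response": "opal", "expected": "opal", "is_correct": true }',
]
summary = summarize(sample)
print(summary)
```

Feeding all 25 rows through `summarize` would give a quick cross-check of the reported score against the per-question results, under the averaging assumption stated above.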