Language Model Benchmark Results

Last updated: April 24, 2025

"Exemplar" results

Llama 3.2 1B

Type: Local

Model: lmstudio/mlx-community/llama-3.2-1b-instruct

Launch: 2024-09-25

Size: 710 MB

License: Llama 3.2 License

Gemma 3 Small

Type: Local

Model: gemma3:1b:Q4_K_M

Launch: 2025-03-12

Size: 815 MB

License: Gemma License

QWEN 2.5 Small

Type: Local

Model: qwen2.5:1.5b:Q4_K_M

Launch: 2024-09-15

Size: 986 MB

License: Apache License

Gemma2 Small

Type: Local

Model: gemma2:2b:Q4_0

Launch: 2024-06-07

Size: 1600 MB

License: Gemma License

SmolLM 2

Type: Local

Model: smollm2:1.7b:Q8_0

Launch: 2024-10-31

Size: 1800 MB

License: Apache License

Llama 3.2

Type: Local

Model: llama3.2:3b:Q4_K_M

Launch: 2024-09-25

Size: 2000 MB

License: Llama 3.2 License

QWEN3 4B

Type: Local

Model: qwen3:4b:Q4_K_M

Launch: 2025-04-28

Size: 2600 MB

License: Apache License

Gemma 3

Type: Local

Model: gemma3:4b:Q4_K_M

Launch: 2025-03-12

Size: 4300 MB

License: Gemma License

QWEN 2.5

Type: Local

Model: qwen2.5:7b:Q4_K_M

Launch: 2024-09-15

Size: 4700 MB

License: Apache License

Granite 3.3

Type: Local

Model: granite3.3:8b:Q4_K_M

Launch: 2025-04-16

Size: 4900 MB

License: Apache License

Granite 3.3 8B

Type: Local

Model: lmstudio/granite-3.3-8b-instruct

Launch: 2025-04-17

Size: 4940 MB

License: Apache License

Gemma 2

Type: Local

Model: gemma2:9b:Q4_0

Launch: 2024-06-27

Size: 5400 MB

License: Gemma License

Gemma3 12B QAT

Type: Local

Model: lmstudio/lmstudio-community/gemma-3-12b-it-qat-gguf/gemma-3-12b-it-qat-q4_0.gguf

Launch: 2025-04-18

Size: 7740 MB

License: Gemma License

Phi 4

Type: Local

Model: phi4:14b:Q4_K_M

Launch: 2025-01-08

Size: 9100 MB

License: MIT License

Claude 3.5 Haiku

Type: Remote

Model: claude-3-5-haiku-20241022

Launch: 2024-10-22

Size: 11919 MB

License: Closed Model

GPT-4.1 nano

Type: Remote

Model: gpt-4.1-nano-2025-04-14

Launch: 2025-04-14

Size: 12102 MB

License: Closed Model

GPT-4.1 mini

Type: Remote

Model: gpt-4.1-mini-2025-04-14

Launch: 2025-04-14

Size: 16382 MB

License: Closed Model

GPT-4o-mini

Type: Remote

Model: gpt-4o-mini-2024-07-18

Launch: 2024-07-18

Size: 16383 MB

License: Closed Model

Gemini 2.5 Flash

Type: Remote

Model: gemini-2.5-flash-preview-04-17

Launch: 2025-04-17

Size: 18101 MB

License: Closed Model

Word Length
Category: Language

ID: 0011_word_length

Description: A benchmark to evaluate a model's ability to count the total number of letters in a given word.
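A minimal Python sketch of how a reference answer for this task might be computed. It assumes that only alphabetic characters count as letters, so hyphens, apostrophes and digits are skipped; `word_length` is a hypothetical helper, not the benchmark's own code.

```python
def word_length(word: str) -> int:
    """Count the letters in a word, skipping hyphens, apostrophes and digits.

    Assumption: "letters" means characters for which str.isalpha() is True.
    """
    return sum(1 for ch in word if ch.isalpha())


if __name__ == "__main__":
    for w in ["strawberry", "mother-in-law", "don't"]:
        print(w, word_length(w))
```

Under this assumption "mother-in-law" counts as 11 letters rather than its 13 characters; a benchmark that counts every character would expect 13.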

Score: 7/100; Avg Time: 254.3ms; Run ID: 110
Score: 52/100; Avg Time: 289.6ms; Run ID: 112
Score: 5/100; Avg Time: 500.7ms; Run ID: 111
Score: 32/100; Avg Time: 380.7ms; Run ID: 104
Score: 60/100; Avg Time: 500.3ms; Run ID: 106
Score: 75/100; Avg Time: 1507.2ms; Run ID: 246
Score: 77/100; Avg Time: 542.2ms; Run ID: 109
Score: 72/100; Avg Time: 996.0ms; Run ID: 107
Score: 20/100; Avg Time: 1386.0ms; Run ID: 232
Score: 17/100; Avg Time: 1193.5ms; Run ID: 254
Score: 70/100; Avg Time: 1358.8ms; Run ID: 105
Score: 75/100; Avg Time: 2601.2ms; Run ID: 281
Score: 100/100; Avg Time: 2018.5ms; Run ID: 108
Score: 100/100; Avg Time: 1061.0ms; Run ID: 137
Score: 100/100; Avg Time: 430.6ms; Run ID: 174
Score: 100/100; Avg Time: 906.3ms; Run ID: 175
Score: 85/100; Avg Time: 623.3ms; Run ID: 113
Score: 100/100; Avg Time: 1101.4ms; Run ID: 201

Letter Count
Category: Language

ID: 0012_letter_count

Description: A benchmark to evaluate a model's ability to count how many times a specific letter appears in a word.
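A comparable sketch for letter counting, assuming matching is case-insensitive (so "B" and "b" both count). `letter_count` is a hypothetical name used for illustration only.

```python
def letter_count(word: str, letter: str) -> int:
    """Count how often `letter` appears in `word`, ignoring case (assumed rule)."""
    if len(letter) != 1:
        raise ValueError("letter must be a single character")
    return word.casefold().count(letter.casefold())


if __name__ == "__main__":
    print(letter_count("strawberry", "r"))   # 3
    print(letter_count("Mississippi", "s"))  # 4
```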

Score: 26/100; Avg Time: 292.3ms; Run ID: 90
Score: 34/100; Avg Time: 289.9ms; Run ID: 92
Score: 20/100; Avg Time: 514.9ms; Run ID: 91
Score: 10/100; Avg Time: 311.4ms; Run ID: 84
Score: 50/100; Avg Time: 415.0ms; Run ID: 86
Score: 48/100; Avg Time: 1500.3ms; Run ID: 247
Score: 16/100; Avg Time: 723.3ms; Run ID: 89
Score: 30/100; Avg Time: 859.5ms; Run ID: 87
Score: 42/100; Avg Time: 1999.7ms; Run ID: 233
Score: 44/100; Avg Time: 2756.1ms; Run ID: 255
Score: 40/100; Avg Time: 1067.4ms; Run ID: 85
Score: 52/100; Avg Time: 4226.2ms; Run ID: 282
Score: 38/100; Avg Time: 2209.5ms; Run ID: 88
Score: 44/100; Avg Time: 937.7ms; Run ID: 138
Score: 44/100; Avg Time: 643.6ms; Run ID: 176
Score: 68/100; Avg Time: 739.8ms; Run ID: 177
Score: 52/100; Avg Time: 621.2ms; Run ID: 93
Score: 96/100; Avg Time: 1246.1ms; Run ID: 202

Spell Check
Category: Language

ID: 0015_spell_check

Description: A benchmark to evaluate a model's ability to identify misspelled words in a sentence and provide their correct spelling.
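One plausible way to score a spell-check item is to compare the model's (misspelling → correction) pairs against an answer key. The sketch below assumes that representation and gives credit per correctly fixed word; the function name and the scoring rule are assumptions, not the benchmark's documented method.

```python
def spell_check_score(expected: dict[str, str], predicted: dict[str, str]) -> float:
    """Fraction of expected (misspelling -> correction) pairs the model got right.

    Assumptions: comparison is case-insensitive, and extra words the model
    flags are ignored (a stricter scorer might penalise them).
    """
    if not expected:
        return 1.0 if not predicted else 0.0
    pred = {k.lower(): v.lower() for k, v in predicted.items()}
    hits = sum(
        1 for wrong, right in expected.items()
        if pred.get(wrong.lower()) == right.lower()
    )
    return hits / len(expected)


if __name__ == "__main__":
    key = {"recieve": "receive", "definately": "definitely"}
    answer = {"recieve": "receive", "definately": "definitly"}
    print(spell_check_score(key, answer))  # 0.5
```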

Score: 69/100; Avg Time: 413.7ms; Run ID: 19
Score: 75/100; Avg Time: 481.4ms; Run ID: 21
Score: 85/100; Avg Time: 805.3ms; Run ID: 20
Score: 54/100; Avg Time: 653.0ms; Run ID: 13
Score: 96/100; Avg Time: 847.5ms; Run ID: 15
Score: 92/100; Avg Time: 2396.5ms; Run ID: 248
Score: 74/100; Avg Time: 1036.4ms; Run ID: 18
Score: 96/100; Avg Time: 1674.2ms; Run ID: 16
Score: 81/100; Avg Time: 3253.5ms; Run ID: 234
Score: 78/100; Avg Time: 2036.8ms; Run ID: 256
Score: 96/100; Avg Time: 2157.9ms; Run ID: 14
Score: 98/100; Avg Time: 3443.7ms; Run ID: 17
Score: 100/100; Avg Time: 1020.6ms; Run ID: 139
Score: 99/100; Avg Time: 631.1ms; Run ID: 178
Score: 99/100; Avg Time: 934.9ms; Run ID: 179
Score: 100/100; Avg Time: 687.3ms; Run ID: 22
Score: 99/100; Avg Time: 1409.0ms; Run ID: 203

Antonym Identification
Category: Language

ID: 0016_antonym

Description: Tests a model's ability to identify the correct antonym of a given word from a list of options.
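Because answers come from a fixed list of options, grading depends on mapping the model's free-form reply to exactly one option. A hedged sketch, assuming options are labelled A, B, C and so on in order; `extract_choice` is hypothetical and not the benchmark's actual parser.

```python
import re
from typing import List, Optional


def extract_choice(response: str, options: List[str]) -> Optional[str]:
    """Map a reply to one option, or return None if it is missing or ambiguous.

    Accepts a bare label such as "B", "(b)" or "B." (labels assumed to be
    A, B, C, ...), or a reply that mentions exactly one option as a whole word.
    """
    if not options:
        return None
    labels = "ABCDEFGHIJ"[: len(options)]
    m = re.fullmatch(rf"\s*\(?([{labels}])[).:]?\s*", response, flags=re.IGNORECASE)
    if m:
        return options[labels.index(m.group(1).upper())]
    # Fall back to looking for option text; reject replies naming several options.
    found = [
        opt for opt in options
        if re.search(rf"\b{re.escape(opt)}\b", response, flags=re.IGNORECASE)
    ]
    return found[0] if len(found) == 1 else None


if __name__ == "__main__":
    opts = ["hot", "cold", "warm", "cool"]
    print(extract_choice("The antonym is cold.", opts))
```

The same mapping would apply to the other option-based tasks on this page, such as the multiple-choice geography questions.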

Score: 0/100; Avg Time: 1609.6ms; Run ID: 285
Score: 85/100; Avg Time: 262.3ms; Run ID: 30
Score: 90/100; Avg Time: 309.4ms; Run ID: 32
Score: 100/100; Avg Time: 581.4ms; Run ID: 31
Score: 82/100; Avg Time: 420.8ms; Run ID: 24
Score: 90/100; Avg Time: 470.0ms; Run ID: 26
Score: 97/100; Avg Time: 832.0ms; Run ID: 243
Score: 80/100; Avg Time: 554.8ms; Run ID: 29
Score: 100/100; Avg Time: 928.9ms; Run ID: 27
Score: 77/100; Avg Time: 1227.5ms; Run ID: 229
Score: 50/100; Avg Time: 2215.5ms; Run ID: 252
Score: 100/100; Avg Time: 1388.8ms; Run ID: 25
Score: 100/100; Avg Time: 2714.7ms; Run ID: 278
Score: 100/100; Avg Time: 2041.7ms; Run ID: 28
Score: 100/100; Avg Time: 981.4ms; Run ID: 134
Score: 100/100; Avg Time: 612.7ms; Run ID: 168
Score: 100/100; Avg Time: 1107.7ms; Run ID: 169
Score: 100/100; Avg Time: 580.7ms; Run ID: 33
Score: 100/100; Avg Time: 1594.8ms; Run ID: 198

Definitions
Category: Language

ID: 0020_definitions

Description: A benchmark to evaluate a model's ability to identify the correct definition of words.

Score: 48/100; Avg Time: 261.0ms; Run ID: 286
Score: 88/100; Avg Time: 118.4ms; Run ID: 70
Score: 100/100; Avg Time: 185.5ms; Run ID: 72
Score: 100/100; Avg Time: 338.3ms; Run ID: 71
Score: 72/100; Avg Time: 204.9ms; Run ID: 64
Score: 96/100; Avg Time: 414.8ms; Run ID: 66
Score: 100/100; Avg Time: 21229.2ms; Run ID: 244
Score: 100/100; Avg Time: 444.0ms; Run ID: 69
Score: 100/100; Avg Time: 961.2ms; Run ID: 67
Score: 96/100; Avg Time: 699.6ms; Run ID: 230
Score: 96/100; Avg Time: 635.5ms; Run ID: 253
Score: 100/100; Avg Time: 858.1ms; Run ID: 65
Score: 100/100; Avg Time: 1054.2ms; Run ID: 279
Score: 100/100; Avg Time: 2294.7ms; Run ID: 68
Score: 96/100; Avg Time: 705.8ms; Run ID: 135
Score: 100/100; Avg Time: 618.1ms; Run ID: 170
Score: 100/100; Avg Time: 860.8ms; Run ID: 171
Score: 100/100; Avg Time: 665.8ms; Run ID: 73
Score: 100/100; Avg Time: 2185.1ms; Run ID: 199

Unit Conversion
Category: Language

ID: 0022_unit_conversion

Description: A benchmark to evaluate a model's ability to accurately convert between different units of measurement.
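Unit-conversion answers are numeric, so a grader typically compares them against a reference value within a tolerance. The sketch assumes a 1% relative tolerance and covers only linear length and mass conversions; the factors are the exact international definitions, but the table, the tolerance and the helper names are illustrative assumptions. Temperature would need an affine formula (for example °F = °C × 9/5 + 32) rather than a single factor.

```python
import math

# Illustrative table: unit -> (dimension, factor to the SI base unit).
UNITS = {
    "m": ("length", 1.0),
    "km": ("length", 1000.0),
    "cm": ("length", 0.01),
    "in": ("length", 0.0254),
    "ft": ("length", 0.3048),
    "mi": ("length", 1609.344),
    "kg": ("mass", 1.0),
    "g": ("mass", 0.001),
    "lb": ("mass", 0.45359237),
    "oz": ("mass", 0.028349523125),
}


def convert(value: float, src: str, dst: str) -> float:
    """Convert `value` from unit `src` to unit `dst` via the base unit."""
    src_dim, src_factor = UNITS[src]
    dst_dim, dst_factor = UNITS[dst]
    if src_dim != dst_dim:
        raise ValueError(f"cannot convert {src} ({src_dim}) to {dst} ({dst_dim})")
    return value * src_factor / dst_factor


def is_correct(answer: float, value: float, src: str, dst: str, rel_tol: float = 0.01) -> bool:
    """Accept an answer within `rel_tol` (assumed 1%) of the exact conversion."""
    return math.isclose(answer, convert(value, src, dst), rel_tol=rel_tol)


if __name__ == "__main__":
    print(convert(5, "mi", "km"))
    print(is_correct(8.05, 5, "mi", "km"))
```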

Score: 0/100; Avg Time: 430.2ms; Run ID: 80
Score: 55/100; Avg Time: 463.1ms; Run ID: 82
Score: 27/100; Avg Time: 826.3ms; Run ID: 81
Score: 42/100; Avg Time: 659.7ms; Run ID: 74
Score: 50/100; Avg Time: 720.3ms; Run ID: 76
Score: 75/100; Avg Time: 872.2ms; Run ID: 257
Score: 35/100; Avg Time: 943.5ms; Run ID: 79
Score: 90/100; Avg Time: 1590.5ms; Run ID: 77
Score: 82/100; Avg Time: 2668.3ms; Run ID: 235
Score: 80/100; Avg Time: 1637.6ms; Run ID: 258
Score: 90/100; Avg Time: 2140.1ms; Run ID: 75
Score: 100/100; Avg Time: 3927.1ms; Run ID: 78
Score: 100/100; Avg Time: 1008.0ms; Run ID: 140
Score: 97/100; Avg Time: 556.6ms; Run ID: 180
Score: 100/100; Avg Time: 1354.6ms; Run ID: 181
Score: 97/100; Avg Time: 604.8ms; Run ID: 196
Score: 100/100; Avg Time: 1348.3ms; Run ID: 197

Part of Speech
Category: Language

ID: 0032_part_of_speech

Description: A benchmark to evaluate a model's ability to identify the part of speech of a specific word in a sentence.
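Models name parts of speech in many spellings ("Noun", "NN", "adj."), so a grader usually normalises labels before comparing. A sketch under that assumption; the alias table mixes full names, Penn Treebank tags and Universal Dependencies tags, and it is hypothetical rather than the benchmark's accepted label set.

```python
from typing import Optional

# Hypothetical alias table: full names, Penn Treebank tags, Universal Dependencies tags.
# Penn "IN" is left out because it covers both prepositions and subordinating conjunctions.
POS_ALIASES = {
    "noun": "noun", "n": "noun", "nn": "noun",
    "verb": "verb", "v": "verb", "vb": "verb",
    "adjective": "adjective", "adj": "adjective", "jj": "adjective",
    "adverb": "adverb", "adv": "adverb", "rb": "adverb",
    "pronoun": "pronoun", "pron": "pronoun", "prp": "pronoun",
    "preposition": "preposition", "prep": "preposition", "adp": "preposition",
    "conjunction": "conjunction", "conj": "conjunction", "cc": "conjunction",
    "determiner": "determiner", "det": "determiner", "dt": "determiner",
    "interjection": "interjection", "intj": "interjection", "uh": "interjection",
}


def normalize_pos(label: str) -> Optional[str]:
    """Map a free-form POS label to a canonical name, or None if unknown."""
    key = label.strip().strip(".").lower()
    return POS_ALIASES.get(key)


def pos_correct(answer: str, expected: str) -> bool:
    """True when both labels normalise to the same known part of speech."""
    canon = normalize_pos(answer)
    return canon is not None and canon == normalize_pos(expected)
```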

Score: 95/100; Avg Time: 316.2ms; Run ID: 100
Score: 85/100; Avg Time: 298.0ms; Run ID: 102
Score: 90/100; Avg Time: 590.2ms; Run ID: 101
Score: 90/100; Avg Time: 517.2ms; Run ID: 94
Score: 100/100; Avg Time: 560.8ms; Run ID: 96
Score: 100/100; Avg Time: 916.2ms; Run ID: 259
Score: 95/100; Avg Time: 623.4ms; Run ID: 99
Score: 100/100; Avg Time: 1135.7ms; Run ID: 97
Score: 90/100; Avg Time: 2392.6ms; Run ID: 236
Score: 95/100; Avg Time: 1653.3ms; Run ID: 260
Score: 100/100; Avg Time: 1777.3ms; Run ID: 95
Score: 100/100; Avg Time: 2284.1ms; Run ID: 98
Score: 100/100; Avg Time: 969.7ms; Run ID: 141
Score: 100/100; Avg Time: 468.1ms; Run ID: 182
Score: 100/100; Avg Time: 1079.2ms; Run ID: 183
Score: 95/100; Avg Time: 756.0ms; Run ID: 103
Score: 95/100; Avg Time: 1215.2ms; Run ID: 204

Lemma Identification
Category: Language

ID: 0033_lemma

Description: A benchmark to evaluate a model's ability to identify the lemma (base form) of a given word. The lemma is the dictionary form:
- For nouns: the singular form (e.g., "cats" → "cat")
- For verbs: the infinitive form without "to" (e.g., "running" → "run")
- For adjectives: the positive form (e.g., "better" → "good")
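Since the expected verb lemma omits "to", a grader may normalise replies such as "To run" before comparing. A minimal sketch, assuming exact matching after lower-casing and stripping quotes and trailing periods; `normalize_lemma` and `lemma_correct` are hypothetical helpers.

```python
def normalize_lemma(answer: str) -> str:
    """Lower-case, strip quotes/periods, and drop a leading infinitive marker "to "."""
    text = answer.strip().strip("\"'.").strip().lower()
    if text.startswith("to "):
        text = text[3:].strip()
    return text


def lemma_correct(answer: str, expected: str) -> bool:
    """Exact comparison after normalising both sides."""
    return normalize_lemma(answer) == normalize_lemma(expected)


if __name__ == "__main__":
    print(lemma_correct("To run", "run"))   # True
    print(lemma_correct('"Cat".', "cat"))   # True
    print(lemma_correct("better", "good"))  # False
```

Normalisation alone cannot supply irregular lemmas such as "better" → "good"; those have to come from the model itself or from a dictionary.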

Score: 62/100; Avg Time: 221.7ms; Run ID: 152
Score: 95/100; Avg Time: 230.9ms; Run ID: 154
Score: 85/100; Avg Time: 449.8ms; Run ID: 153
Score: 95/100; Avg Time: 383.9ms; Run ID: 146
Score: 97/100; Avg Time: 395.1ms; Run ID: 148
Score: 90/100; Avg Time: 686.5ms; Run ID: 261
Score: 97/100; Avg Time: 527.7ms; Run ID: 151
Score: 97/100; Avg Time: 877.0ms; Run ID: 149
Score: 97/100; Avg Time: 1857.1ms; Run ID: 237
Score: 92/100; Avg Time: 1243.8ms; Run ID: 262
Score: 97/100; Avg Time: 1277.3ms; Run ID: 147
Score: 100/100; Avg Time: 1677.4ms; Run ID: 150
Score: 100/100; Avg Time: 779.8ms; Run ID: 156
Score: 97/100; Avg Time: 451.4ms; Run ID: 184
Score: 100/100; Avg Time: 1028.3ms; Run ID: 185
Score: 100/100; Avg Time: 688.8ms; Run ID: 155
Score: 100/100; Avg Time: 913.6ms; Run ID: 205

Translation (EN → FR)
Category: Language

ID: 0050_translation_en_fr

Description: Tests a model's ability to translate English (EN) words into French (FR), with answers validated by multiple choice.
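When a translated word is checked against answer options, the same visible string can be encoded differently: "é" may be one code point (U+00E9) or "e" followed by a combining accent (U+0301). A hedged sketch of option matching that applies Unicode NFC normalisation and case folding but deliberately keeps accents significant; `matches_option` is a hypothetical helper, not the benchmark's validator.

```python
import unicodedata
from typing import List, Optional


def canonical(text: str) -> str:
    """Trim, case-fold, and NFC-normalise so equivalent encodings compare equal."""
    return unicodedata.normalize("NFC", text.strip().casefold())


def matches_option(answer: str, options: List[str]) -> Optional[str]:
    """Return the option equal to `answer` after normalisation, else None."""
    target = canonical(answer)
    for opt in options:
        if canonical(opt) == target:
            return opt
    return None


if __name__ == "__main__":
    options = ["\u00e9t\u00e9", "hiver", "printemps"]  # "été" in precomposed form
    decomposed = "e\u0301te\u0301"                      # "été" with combining accents
    print(matches_option(decomposed, options))
```

Accents are kept because "ete" and "été" are different answers in French; stripping them would over-credit.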

Score: 84/100; Avg Time: 338.2ms; Run ID: 40
Score: 98/100; Avg Time: 359.5ms; Run ID: 42
Score: 88/100; Avg Time: 638.0ms; Run ID: 41
Score: 56/100; Avg Time: 457.8ms; Run ID: 34
Score: 94/100; Avg Time: 511.9ms; Run ID: 36
Score: 100/100; Avg Time: 741.9ms; Run ID: 263
Score: 98/100; Avg Time: 697.2ms; Run ID: 39
Score: 98/100; Avg Time: 948.5ms; Run ID: 37
Score: 98/100; Avg Time: 2166.8ms; Run ID: 238
Score: 88/100; Avg Time: 1566.3ms; Run ID: 264
Score: 98/100; Avg Time: 1298.7ms; Run ID: 35
Score: 98/100; Avg Time: 2874.6ms; Run ID: 38
Score: 98/100; Avg Time: 957.4ms; Run ID: 142
Score: 98/100; Avg Time: 552.9ms; Run ID: 186
Score: 98/100; Avg Time: 989.4ms; Run ID: 187
Score: 98/100; Avg Time: 586.2ms; Run ID: 43
Score: 100/100; Avg Time: 1410.6ms; Run ID: 206

Translation (EN → ZH)
Category: Language

ID: 0050_translation_en_zh

Description: Tests a model's ability to translate English (EN) words into Chinese (ZH), with answers validated by multiple choice.

Score: 92/100; Avg Time: 332.7ms; Run ID: 50
Score: 92/100; Avg Time: 364.2ms; Run ID: 52
Score: 96/100; Avg Time: 624.3ms; Run ID: 51
Score: 68/100; Avg Time: 471.9ms; Run ID: 44
Score: 60/100; Avg Time: 844.6ms; Run ID: 46
Score: 98/100; Avg Time: 745.1ms; Run ID: 265
Score: 100/100; Avg Time: 813.4ms; Run ID: 49
Score: 96/100; Avg Time: 1595.3ms; Run ID: 47
Score: 96/100; Avg Time: 2228.2ms; Run ID: 239
Score: 98/100; Avg Time: 1419.2ms; Run ID: 266
Score: 98/100; Avg Time: 1736.3ms; Run ID: 45
Score: 98/100; Avg Time: 4027.9ms; Run ID: 48
Score: 100/100; Avg Time: 964.5ms; Run ID: 143
Score: 100/100; Avg Time: 546.4ms; Run ID: 188
Score: 100/100; Avg Time: 919.3ms; Run ID: 189
Score: 100/100; Avg Time: 473.1ms; Run ID: 53
Score: 100/100; Avg Time: 1362.7ms; Run ID: 207

Translation (SW → KO)
Category: Language

ID: 0050_translation_sw_ko

Description: Tests a model's ability to translate Swahili (SW) words into Korean (KO), with answers validated by multiple choice.

Score: 9/100; Avg Time: 392.6ms; Run ID: 60
Score: 19/100; Avg Time: 398.9ms; Run ID: 62
Score: 21/100; Avg Time: 757.5ms; Run ID: 61
Score: 3/100; Avg Time: 613.0ms; Run ID: 54
Score: 25/100; Avg Time: 630.7ms; Run ID: 56
Score: 23/100; Avg Time: 763.5ms; Run ID: 267
Score: 39/100; Avg Time: 894.3ms; Run ID: 59
Score: 23/100; Avg Time: 1954.3ms; Run ID: 57
Score: 41/100; Avg Time: 2460.0ms; Run ID: 240
Score: 29/100; Avg Time: 2254.3ms; Run ID: 268
Score: 66/100; Avg Time: 1433.9ms; Run ID: 55
Score: 66/100; Avg Time: 3836.5ms; Run ID: 58
Score: 78/100; Avg Time: 922.4ms; Run ID: 144
Score: 80/100; Avg Time: 481.6ms; Run ID: 190
Score: 80/100; Avg Time: 822.0ms; Run ID: 191
Score: 82/100; Avg Time: 473.6ms; Run ID: 63
Score: 98/100; Avg Time: 1761.5ms; Run ID: 208

Pinyin Letter Count
Category: Language

ID: 0051_pinyin_letters

Description: A benchmark to evaluate a model's ability to count how many times a specific letter appears in the Pinyin representation of a Chinese sentence.
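Pinyin carries tone marks (ā, á, ǎ, à), so counting a letter first requires stripping them. A minimal sketch, assuming the sentence has already been converted to tone-marked Pinyin and that a marked vowel counts as its base letter. The decomposition also turns "ü" into "u", which is itself an assumption about the benchmark's counting rules; the helper names are hypothetical.

```python
import unicodedata


def strip_tones(pinyin: str) -> str:
    """Remove tone marks (and the diaeresis on ü) via NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def pinyin_letter_count(pinyin: str, letter: str) -> int:
    """Count case-insensitive occurrences of `letter` after removing tone marks."""
    return strip_tones(pinyin).lower().count(letter.lower())


if __name__ == "__main__":
    print(strip_tones("Wǒ ài nǐ"))               # Wo ai ni
    print(pinyin_letter_count("Wǒ ài nǐ", "i"))  # 2
```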

Score: 52/100; Avg Time: 234.8ms; Run ID: 120
Score: 47/100; Avg Time: 290.6ms; Run ID: 122
Score: 36/100; Avg Time: 537.1ms; Run ID: 121
Score: 31/100; Avg Time: 481.8ms; Run ID: 114
Score: 36/100; Avg Time: 546.9ms; Run ID: 116
Score: 5/100; Avg Time: 1634.5ms; Run ID: 245
Score: 42/100; Avg Time: 623.5ms; Run ID: 119
Score: 15/100; Avg Time: 1061.6ms; Run ID: 117
Score: 47/100; Avg Time: 1307.7ms; Run ID: 231
Score: 26/100; Avg Time: 1309.6ms; Run ID: 115
Score: 36/100; Avg Time: 2778.9ms; Run ID: 280
Score: 42/100; Avg Time: 2188.6ms; Run ID: 118
Score: 52/100; Avg Time: 1278.3ms; Run ID: 136
Score: 31/100; Avg Time: 453.2ms; Run ID: 172
Score: 47/100; Avg Time: 1002.1ms; Run ID: 173
Score: 42/100; Avg Time: 654.9ms; Run ID: 123
Score: 84/100; Avg Time: 1851.6ms; Run ID: 200

English to IPA
Category: Language

ID: 0061_english_to_ipa

Description: A benchmark to evaluate a model's ability to convert English words to their IPA (International Phonetic Alphabet) pronunciation.
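Exact string matching is harsh for IPA, where transcriptions differ in delimiters, stress marks and small vowel choices. One possible lenient scorer, sketched here as an assumption rather than the benchmark's method, strips slashes, brackets, stress and syllable marks, then scores by normalised Levenshtein similarity. Lengths are measured in code points, so a combining diacritic counts as a separate edit.

```python
def normalize_ipa(s: str) -> str:
    """Drop /slashes/, [brackets], primary/secondary stress, syllable dots and spaces."""
    drop = set("/[]ˈˌ. ")
    return "".join(ch for ch in s if ch not in drop)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def ipa_similarity(answer: str, expected: str) -> float:
    """1.0 for an exact match after normalisation, falling toward 0.0."""
    a, b = normalize_ipa(answer), normalize_ipa(expected)
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    print(ipa_similarity("/ˈkæt/", "[kæt]"))  # 1.0
```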

Score: 5/100; Avg Time: 276.1ms; Run ID: 163
Score: 2/100; Avg Time: 314.6ms; Run ID: 165
Score: 12/100; Avg Time: 519.7ms; Run ID: 164
Score: 7/100; Avg Time: 530.0ms; Run ID: 157
Score: 12/100; Avg Time: 513.2ms; Run ID: 159
Score: 12/100; Avg Time: 838.4ms; Run ID: 269
Score: 17/100; Avg Time: 695.4ms; Run ID: 162
Score: 25/100; Avg Time: 988.8ms; Run ID: 160
Score: 15/100; Avg Time: 2809.1ms; Run ID: 241
Score: 12/100; Avg Time: 2967.0ms; Run ID: 270
Score: 15/100; Avg Time: 1340.8ms; Run ID: 158
Score: 45/100; Avg Time: 2100.2ms; Run ID: 161
Score: 50/100; Avg Time: 999.4ms; Run ID: 167
Score: 30/100; Avg Time: 727.6ms; Run ID: 192
Score: 37/100; Avg Time: 1215.8ms; Run ID: 193
Score: 52/100; Avg Time: 618.8ms; Run ID: 166
Score: 50/100; Avg Time: 1950.2ms; Run ID: 209

Geography Knowledge
Category: Language

ID: 0120_geography

Description: A benchmark to evaluate a model's knowledge of world geography through multiple-choice questions about countries, capitals, physical features, and other geographical information.

Score: 87/100; Avg Time: 313.1ms; Run ID: 130
Score: 100/100; Avg Time: 388.9ms; Run ID: 132
Score: 100/100; Avg Time: 778.4ms; Run ID: 131
Score: 80/100; Avg Time: 495.2ms; Run ID: 124
Score: 100/100; Avg Time: 403.4ms; Run ID: 126
Score: 100/100; Avg Time: 893.2ms; Run ID: 271
Score: 97/100; Avg Time: 1184.8ms; Run ID: 129
Score: 100/100; Avg Time: 1154.6ms; Run ID: 127
Score: 95/100; Avg Time: 2554.2ms; Run ID: 242
Score: 100/100; Avg Time: 2425.7ms; Run ID: 272
Score: 100/100; Avg Time: 1486.2ms; Run ID: 125
Score: 100/100; Avg Time: 3340.2ms; Run ID: 128
Score: 100/100; Avg Time: 981.8ms; Run ID: 145
Score: 100/100; Avg Time: 441.9ms; Run ID: 194
Score: 100/100; Avg Time: 918.3ms; Run ID: 195
Score: 100/100; Avg Time: 559.6ms; Run ID: 133
Score: 100/100; Avg Time: 1471.2ms; Run ID: 210