Last updated:

| Name | Model | Launch Date | Size (MB) | License |
|---|---|---|---|---|
| Gemma 3 Small | gemma3:1b:Q4_K_M | 2025-03-12 | 815 | Gemma License |
| Qwen 2.5 Small | qwen2.5:1.5b:Q4_K_M | 2024-09-15 | 986 | Apache License |
| Gemma 2 Small | gemma2:2b:Q4_0 | 2024-06-07 | 1600 | Gemma License |
| SmolLM 2 | smollm2:1.7b:Q8_0 | 2024-10-31 | 1800 | Apache License |
| Llama 3.2 | llama3.2:3b:Q4_K_M | 2024-09-25 | 2000 | Llama 3.2 License |
| GPT-4o-mini | gpt-4o-mini-2024-07-18 | 2024-07-18 | 2047 | Closed Model |
| Gemma 3 | gemma3:4b:Q4_K_M | 2025-03-12 | 4300 | Gemma License |
| Qwen 2.5 | qwen2.5:7b:Q4_K_M | 2024-09-15 | 4700 | Apache License |
| Gemma 2 | gemma2:9b:Q4_0 | 2024-06-27 | 5400 | Gemma License |
| Phi 4 | phi4:14b:Q4_K_M | 2025-01-08 | 9100 | MIT License |

| Benchmark | Benchmark ID | Description |
|---|---|---|
| Word Length | 0011_word_length | A benchmark to evaluate a model's ability to count the total number of letters in a given word. |
| Letter Count | 0012_letter_count | A benchmark to evaluate a model's ability to count how many times a specific letter appears in a word. |
| Spell Check | 0015_spell_check | A benchmark to evaluate a model's ability to identify misspelled words in a sentence and provide their correct spelling. |
| Antonym Identification | 0016_antonym | Tests the ability to identify the correct antonym from a list of options. |
| Definitions | 0020_definitions | A benchmark to evaluate a model's ability to identify the correct definition of words. |
| Unit Conversion | 0022_unit_conversion | A benchmark to evaluate a model's ability to accurately convert between different units of measurement. |
| Part of Speech | 0032_part_of_speech | A benchmark to evaluate a model's ability to identify the part of speech of a specific word in a sentence. |
| Translation (EN → FR) | 0050_translation_en_fr | Tests the ability to translate EN words to FR, validated with multiple-choice questions. |
| Translation (EN → ZH) | 0050_translation_en_zh | Tests the ability to translate EN words to ZH, validated with multiple-choice questions. |
| Translation (SW → KO) | 0050_translation_sw_ko | Tests the ability to translate SW words to KO, validated with multiple-choice questions. |
| Pinyin Letter Count | 0051_pinyin_letters | A benchmark to evaluate a model's ability to count how many times a specific letter appears in the Pinyin representation of a Chinese sentence. |
| Geography Knowledge | 0120_geography | A benchmark to evaluate a model's knowledge of world geography through multiple-choice questions about countries, capitals, physical features, and other geographical information. |
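
The Word Length and Letter Count benchmarks have answers that can be computed mechanically, so a rough sketch of how such items might be scored is given here. This page does not describe the actual harness; the function names, the "take the last integer in the response" parsing rule, and the case-insensitive counting are all assumptions made for illustration.

```python
import re
from typing import List, Optional


def word_length(word: str) -> int:
    """Reference answer for a word-length item: count alphabetic characters."""
    return sum(1 for ch in word if ch.isalpha())


def letter_count(word: str, letter: str) -> int:
    """Reference answer for a letter-count item (assumed case-insensitive)."""
    return word.lower().count(letter.lower())


def extract_integer(response: str) -> Optional[int]:
    """Hypothetical parser: take the last integer found in a free-text reply."""
    matches = re.findall(r"\d+", response)
    return int(matches[-1]) if matches else None


def accuracy(expected: List[int], responses: List[str]) -> float:
    """Fraction of responses whose parsed integer equals the reference answer."""
    if not expected:
        return 0.0
    correct = sum(
        1 for want, reply in zip(expected, responses)
        if extract_integer(reply) == want
    )
    return correct / len(expected)


if __name__ == "__main__":
    expected = [word_length("strawberry"), letter_count("strawberry", "r")]
    replies = ["The word has 10 letters.", "The letter r appears 2 times."]
    print(expected)                     # reference answers
    print(accuracy(expected, replies))  # share of replies that match
```

A real harness would likely need stricter answer extraction (for example, a required output format), since taking the last integer can misread replies that mention several numbers.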