New LLM benchmark: llmtester

themarkymark

Published: 07 Apr 2026. Updated: 07 Apr 2026.

I just published a new LLM benchmark tool called llmtester.

Over 50,000 tests across 13 categories!

This is the easiest way to benchmark LLMs and see where they go wrong.

Interactive CLI - Keyboard-driven benchmark selection and configuration
Multi-Provider Support - OpenAI, Anthropic, Together.ai, Groq, Fireworks AI, Perplexity, OpenRouter, and any OpenAI-compatible API
LLM-as-Judge - Optional secondary model evaluation for code, math, SQL, bash, and truthfulness benchmarks
Progress Tracking - Resume interrupted evaluations from where you left off
Result Explorer - Built-in TUI to browse past results, filter by pass/fail, and inspect individual responses
Config Persistence - Saves provider, endpoint, and model settings between runs
Shuffle & Sampling - Run a percentage of each benchmark, with optional shuffling so the sample is drawn from across the whole test set
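To make the Shuffle & Sampling idea concrete, here is a minimal Python sketch of running a percentage of a benchmark with optional shuffling. It is an illustration only, not llmtester's actual code: the function name `sample_benchmark` and its `percent`, `shuffle`, and `seed` parameters are assumptions made up for this example.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_benchmark(
    cases: Sequence[T],
    percent: float,
    shuffle: bool = True,
    seed: Optional[int] = None,
) -> List[T]:
    """Return roughly `percent` percent of `cases` (hypothetical helper)."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")

    items = list(cases)
    if shuffle:
        # Shuffle a copy so the sample is spread over the whole benchmark
        # instead of always taking the first few cases.
        random.Random(seed).shuffle(items)

    if not items:
        return []

    # Keep at least one case so a tiny percentage still runs something.
    count = max(1, round(len(items) * percent / 100))
    return items[:count]


if __name__ == "__main__":
    all_cases = [f"case-{i}" for i in range(200)]
    subset = sample_benchmark(all_cases, percent=10, seed=42)
    print(len(subset), "cases selected")
```

Passing a fixed `seed` in this sketch makes the shuffled subset reproducible, which is handy when comparing two models on the same sample.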

Run the latest version without installing anything: npx llmtester

What makes llmtester so awesome?

50,000+ tests, a second LLM that judges results on the more complex tests, and the ability to fully explore past test results.

I got tired of doing everything manually, so I built a full test runner.

It includes tests in many domains, from grade school math and advanced math to reasoning, programming, and even SQL. Some of these tests are impossible to grade without a judge, and llmtester handles all of this for you!
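For readers new to the LLM-as-judge pattern, the following Python sketch shows the general shape of it: a second model is given the question, a reference answer, and the candidate answer, and replies with a verdict. This is a generic illustration under stated assumptions, not how llmtester is implemented. The prompt template, `judge_answer`, and the `fake_judge` stand-in (used here so the example runs without any API key) are all hypothetical.

```python
from typing import Callable

# Hypothetical grading prompt; a real judge prompt would likely be more detailed.
JUDGE_TEMPLATE = (
    "You are grading an answer.\n"
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Candidate answer: {candidate}\n"
    "Reply with exactly PASS or FAIL."
)


def judge_answer(
    question: str,
    reference: str,
    candidate: str,
    ask_judge: Callable[[str], str],
) -> bool:
    """Ask a judge model for a verdict and return True only on a clear PASS."""
    prompt = JUDGE_TEMPLATE.format(
        question=question, reference=reference, candidate=candidate
    )
    verdict = ask_judge(prompt).strip().upper()
    # Anything other than a reply starting with PASS counts as a failure.
    return verdict.startswith("PASS")


def fake_judge(prompt: str) -> str:
    """Stand-in for a real model call; it only recognises the answer '4'."""
    return "PASS" if "Candidate answer: 4\n" in prompt else "FAIL"


if __name__ == "__main__":
    print(judge_answer("What is 2 + 2?", "4", "4", fake_judge))
    print(judge_answer("What is 2 + 2?", "4", "5", fake_judge))
```

In practice `ask_judge` would wrap a call to whichever provider hosts the judge model. Treating any unclear reply as a failure is a conservative design choice in this sketch, so a rambling judge cannot inflate scores.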

You can find the package on npm and GitHub.

Written by Marky
