New LLM benchmark: llmtester
I just published a new LLM benchmark tool called llmtester.
Over 50,000 tests across 13 categories!
This is the easiest way to benchmark LLMs and see where they go wrong.
- Interactive CLI - Keyboard-driven benchmark selection and configuration
- Multi-Provider Support - OpenAI, Anthropic, Together.ai, Groq, Fireworks AI, Perplexity, OpenRouter, and any OpenAI-compatible API
- LLM-as-Judge - Optional secondary model evaluation for code, math, SQL, bash, and truthfulness benchmarks
- Progress Tracking - Resume interrupted evaluations from where you left off
- Result Explorer - Built-in TUI to browse past results, filter by pass/fail, and inspect individual responses
- Config Persistence - Saves provider, endpoint, and model settings between runs
- Shuffle & Sampling - Run a percentage of each benchmark, with optional shuffling so the sample is drawn from across the whole set
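To make the Shuffle & Sampling idea concrete, here is a minimal sketch of how taking a percentage of a benchmark with optional shuffling could work. This is not llmtester's actual code: `sampleTests` and its parameters are hypothetical names chosen for illustration, and rounding up to at least one test is an assumption.

```typescript
// Hypothetical helper, not taken from llmtester: select a percentage of a
// benchmark's test cases, optionally shuffling before the cut.
function sampleTests<T>(
  tests: readonly T[],
  percent: number,
  shuffle: boolean,
  rand: () => number = Math.random, // injectable so runs can be made repeatable
): T[] {
  if (percent <= 0 || percent > 100) {
    throw new RangeError("percent must be greater than 0 and at most 100");
  }
  const pool = [...tests]; // copy so the caller's array is left untouched
  if (shuffle) {
    // Fisher-Yates shuffle: every ordering is equally likely for a uniform rand.
    for (let i = pool.length - 1; i > 0; i--) {
      const j = Math.floor(rand() * (i + 1));
      const tmp = pool[i] as T;
      pool[i] = pool[j] as T;
      pool[j] = tmp;
    }
  }
  // Assumption: round up, so a small percentage still runs at least one test.
  const count = Math.min(pool.length, Math.ceil((pool.length * percent) / 100));
  return pool.slice(0, count);
}
```

Without shuffling, a 30% run always takes the first 30% of the file, which can over-represent whatever the benchmark author put first. Shuffling before the cut spreads the sample over the whole set.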
Run the latest version without installing anything: npx llmtester
What makes llmtester so awesome?
50,000+ tests, a second LLM that judges results on the more complex tests, and the ability to fully explore past test results.
I got tired of doing everything manually, so I built a full test runner.
It includes tests in many domains, from grade school math and advanced math to reasoning, programming, and even SQL. Some of these tests are impossible to grade without a judge, and llmtester handles all of that for you!
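For readers unfamiliar with the LLM-as-judge idea, the sketch below shows the general shape: a second model is given the question, a reference answer, and the candidate answer, and its reply is parsed into a verdict. This is an illustration under stated assumptions, not llmtester's implementation. The prompt wording, the `JudgeFn` type, and every function name here are hypothetical.

```typescript
// Hypothetical sketch of LLM-as-judge grading; not llmtester's actual code.
type Verdict = "pass" | "fail" | "unparsed";

interface JudgeInput {
  question: string;
  reference: string; // known-good answer from the benchmark
  candidate: string; // answer produced by the model under test
}

// Assumption: the judge is any function that sends a prompt to a second
// model and resolves with its text reply (for example, a thin wrapper
// around an OpenAI-compatible chat endpoint).
type JudgeFn = (prompt: string) => Promise<string>;

function buildJudgePrompt(input: JudgeInput): string {
  return [
    "You are grading a model's answer against a reference answer.",
    "Reply with PASS or FAIL on the first line, then a one-sentence reason.",
    "",
    `Question: ${input.question}`,
    `Reference answer: ${input.reference}`,
    `Model answer: ${input.candidate}`,
  ].join("\n");
}

// Only the first line is inspected; anything else is reported as "unparsed"
// rather than guessed, so a confused judge does not silently count as a pass.
function parseVerdict(reply: string): Verdict {
  const firstLine = (reply.trim().split("\n")[0] ?? "").trim().toUpperCase();
  if (firstLine.startsWith("PASS")) return "pass";
  if (firstLine.startsWith("FAIL")) return "fail";
  return "unparsed";
}

async function gradeWithJudge(input: JudgeInput, judge: JudgeFn): Promise<Verdict> {
  const reply = await judge(buildJudgePrompt(input));
  return parseVerdict(reply);
}
```

A judge matters most where exact string matching breaks down: two SQL queries or two bash one-liners can be written differently and still be equally correct.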
You can find the package on npm and GitHub.