Verdict
0.2.0
API Reference
GitHub
Discord
Whitepaper
Paper Implementations
Title (Venue)
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment (EMNLP 2023)
Large Language Models are not Fair Evaluators (ACL 2024)
LLM Evaluators Recognize and Favor Their Own Generations (NeurIPS 2024)
Debating with More Persuasive LLMs Leads to More Truthful Answers (ICML 2024)
On scalable oversight with weak LLMs judging strong LLMs (NeurIPS 2024)
LMUnit: Fine-grained Evaluation with Natural Language Unit Tests (2024)