Research | Kigex

Kigex AI

Riemann-Bench: Evaluating Moonshot Mathematical Reasoning

A private set of research-level problems designed to evaluate long-form theorem work that goes beyond contest-style shortcuts.

Kigex AI

We show how dense professional simulations can improve tool use, planning, and transfer to unseen workplace tasks.

Kigex AI

An empirical framework for measuring whether agents can plan, adapt, stay grounded, and recover from realistic ambiguity.

Meta x Kigex AI

A benchmark for reward models that must judge image editing, interleaved generation, and multimodal reasoning together.

Kigex AI x Meridian Lab

A study of how expert disagreement changes when rubrics ask evaluators to weigh correctness, usefulness, and judgment.

Kigex AI

A controlled evaluation of whether models preserve facts when prompts include misleading dates, names, and citations.