Peripamo · Benchmarks

Agent-first Harness Engineering

We ran five frontier models through the Peripamo engines, then through the generic stack an LLM would otherwise reach for.

Dimensions: Document Understanding · Quantitative Finance · Token Efficiency · Speed · Auditability
Peripamo engines vs. generic AI stack
/01  ·  Benchmark 01 · Compliance & audit queries

Peripamo Fakta Engine vs Google Drive MCP

How compliance, legal, and internal audit teams work a policy stack day to day: precise clause retrieval, cross-document reconciliation, and threshold verification across jurisdictions. The benchmark includes cross-policy trap questions that catch models prone to confident summarization without sourcing.

Peripamo Fakta Engine (OpenAI, Anthropic, Gemini, DeepSeek, Kimi models + Google Drive) vs. Google Drive MCP (LLM + generic document retrieval). Composite scores:

Model            Vendor     Fakta Engine   Google Drive MCP
GPT-5.5          OpenAI     0.899          0.565
DeepSeek v4 Pro  DeepSeek   0.811          0.256
Opus 4.7         Anthropic  0.770          0.648
Gemini 3.1 Pro   Google     0.731          0.442
Kimi K2.6        Moonshot   0.675          0.263
/01.b Full results · all 10 runs
Rank  Model            System            Accuracy  Token Eff.  Time Eff.  Composite
01    GPT-5.5          Fakta Engine      0.900     1.000       0.689      0.899
02    DeepSeek v4 Pro  Fakta Engine      0.900     0.696       0.414      0.811
03    Opus 4.7         Fakta Engine      0.750     0.727       1.000      0.770
04    Gemini 3.1 Pro   Fakta Engine      0.700     0.942       0.530      0.731
05    Kimi K2.6        Fakta Engine      0.700     0.828       0.196      0.675
06    Opus 4.7         Google Drive MCP  0.650     0.511       0.906      0.648
07    GPT-5.5          Google Drive MCP  0.450     0.844       0.816      0.565
08    Gemini 3.1 Pro   Google Drive MCP  0.400     0.494       0.628      0.442
09    Kimi K2.6        Google Drive MCP  0.200     0.529       0.168      0.263
10    DeepSeek v4 Pro  Google Drive MCP  0.200     0.385       0.385      0.256
/02  ·  Benchmark 02 · Portfolio & risk analysis

Peripamo Risk Engine vs LLM + Tool Use

How portfolio managers, risk analysts, and treasury teams actually work their books: VaR, Greeks, duration, convexity, and multi-step portfolio attribution across asset classes. The benchmark includes sign-trap questions that catch the long/short and bullish/bearish errors generalist AI confidently produces, even when armed with web search, code execution, and live market data.
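A minimal illustration of the sign convention these trap questions probe, using textbook Black-Scholes put delta (the position size and market inputs are hypothetical, not benchmark data): a put's delta is negative, so a short put position carries net long, bullish exposure, and a model that drops either minus sign flips the answer.

```python
from math import erf, log, sqrt

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_put_delta(S: float, K: float, T: float, r: float, sigma: float) -> float:
    """Black-Scholes delta of a European put: N(d1) - 1, always negative."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    return norm_cdf(d1) - 1.0

# Short 10 put contracts (100 shares each): the quantity is negative.
qty = -10 * 100
delta = bs_put_delta(S=100, K=95, T=0.25, r=0.03, sigma=0.2)  # ≈ -0.26
position_delta = qty * delta  # two negatives: positive, i.e. net LONG exposure
```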

Peripamo Risk Engine (OpenAI, Anthropic, Gemini, DeepSeek, Kimi models + Google Search + Python + Yahoo Finance) vs. LLM + Tool Use (Web Search + Python + Yahoo Finance). Composite scores:

Model            Vendor     Risk Engine   LLM + Tool Use
GPT-5.5          OpenAI     0.979         0.317
DeepSeek v4 Pro  DeepSeek   0.914         0.304
Opus 4.7         Anthropic  0.818         0.310
Kimi K2.6        Moonshot   0.778         0.268
Gemini 3.1 Pro   Google     0.763         0.499
/02.b Full results · all 10 runs · token & time telemetry
Rank  Model            Substrate                            Accuracy  Token Eff.  Time Eff.  Composite  Score    Tokens     Time (s)
01    GPT-5.5          Risk Engine                          1.000     0.941       0.908      0.9789     20 / 20  40,086     250
02    DeepSeek v4 Pro  Risk Engine                          1.000     0.724       0.695      0.9143     20 / 20  52,054     327
03    Opus 4.7         Risk Engine                          0.900     0.440       1.000      0.8181     18 / 20  85,643     227
04    Kimi K2.6        Risk Engine                          0.800     1.000       0.176      0.7776     16 / 20  37,705     1,291
05    Gemini 3.1 Pro   Risk Engine                          0.700     0.917       0.895      0.7628     14 / 20  41,138     254
06    Gemini 3.1 Pro   Web Search + Python + Yahoo Finance  0.400     0.738       0.714      0.4991     8 / 20   51,081     318
07    GPT-5.5          Web Search + Python + Yahoo Finance  0.400     0.034       0.298      0.3165     8 / 20   1,123,841  762
08    Opus 4.7         Web Search + Python + Yahoo Finance  0.400     0.008       0.279      0.3096     8 / 20   4,582,286  814
09    DeepSeek v4 Pro  Web Search + Python + Yahoo Finance  0.400     0.032       0.174      0.3038     8 / 20   1,179,506  1,306
10    Kimi K2.6        Web Search + Python + Yahoo Finance  0.300     0.178       0.226      0.2682     6 / 20   211,642    1,008
Run it on your data.

A two-week pilot against your real corpus. Same models, your queries, your benchmark.

See how the Peripamo engines are wired.

The agentic architecture behind Fakta, Risiko, AIDA, and the rest of the stack.