Peripamo · Benchmarks

Agent-first Harness Engineering

We ran five frontier models through the Peripamo engines, then through the generic stack an LLM would otherwise reach for.

Dimensions: Document Understanding · Quantitative Finance · Token Efficiency · Speed · Auditability
Peripamo engines vs. generic AI stack
/01  ·  Benchmark 01 · Compliance & audit queries

Peripamo Fakta Engine vs Google Drive MCP

How compliance, legal, and internal audit teams work a policy stack day to day: precise clause retrieval, cross-document reconciliation, and threshold verification across jurisdictions. The benchmark includes cross-policy trap questions that catch models prone to confident summarization without sourcing.

Peripamo Fakta Engine (OpenAI, Anthropic, Gemini, DeepSeek, Kimi models + Google Drive) vs. Google Drive MCP (LLM + generic document retrieval). Composite scores:

Model            Vendor     Fakta Engine   Google Drive MCP
GPT-5.5          OpenAI     0.899          0.565
DeepSeek v4 Pro  DeepSeek   0.811          0.256
Opus 4.7         Anthropic  0.770          0.648
Gemini 3.1 Pro   Google     0.731          0.442
Kimi K2.6        Moonshot   0.675          0.263
/01.b Full results · all 10 runs
Rank  Model            System            Accuracy  Token Eff.  Time Eff.  Composite
01    GPT-5.5          Fakta Engine      0.900     1.000       0.689      0.899
02    DeepSeek v4 Pro  Fakta Engine      0.900     0.696       0.414      0.811
03    Opus 4.7         Fakta Engine      0.750     0.727       1.000      0.770
04    Gemini 3.1 Pro   Fakta Engine      0.700     0.942       0.530      0.731
05    Kimi K2.6        Fakta Engine      0.700     0.828       0.196      0.675
06    Opus 4.7         Google Drive MCP  0.650     0.511       0.906      0.648
07    GPT-5.5          Google Drive MCP  0.450     0.844       0.816      0.565
08    Gemini 3.1 Pro   Google Drive MCP  0.400     0.494       0.628      0.442
09    Kimi K2.6        Google Drive MCP  0.200     0.529       0.168      0.263
10    DeepSeek v4 Pro  Google Drive MCP  0.200     0.385       0.385      0.256
/02  ·  Benchmark 02 · Portfolio & risk analysis

Peripamo Risk Engine vs LLM + Tool Use

How portfolio managers, risk analysts, and treasury teams actually work their books: VaR, Greeks, duration, convexity, and multi-step portfolio attribution across asset classes. The benchmark includes sign-trap questions that catch the long/short and bullish/bearish errors generalist AI confidently produces, even when armed with web search, code execution, and live market data.
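A minimal illustration of the sign convention these trap questions probe, using textbook Black-Scholes put delta (the position size and market inputs are hypothetical, not benchmark data): a put's delta is negative, so a short put position carries net long, bullish exposure, and a model that drops either minus sign flips the answer.

```python
from math import erf, log, sqrt

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_put_delta(S: float, K: float, T: float, r: float, sigma: float) -> float:
    """Black-Scholes delta of a European put: N(d1) - 1, always negative."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    return norm_cdf(d1) - 1.0

# Short 10 put contracts (100 shares each): the quantity is negative.
qty = -10 * 100
delta = bs_put_delta(S=100, K=95, T=0.25, r=0.03, sigma=0.2)  # ≈ -0.26
position_delta = qty * delta  # two negatives: positive, i.e. net LONG exposure
```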

Peripamo Risk Engine (OpenAI, Anthropic, Gemini, DeepSeek, Kimi models + Google Search + Python + Yahoo Finance) vs. LLM + Tool Use (Web Search + Python + Yahoo Finance). Composite scores:

Model            Vendor     Risk Engine   LLM + Tool Use
GPT-5.5          OpenAI     0.979         0.317
DeepSeek v4 Pro  DeepSeek   0.914         0.304
Opus 4.7         Anthropic  0.818         0.310
Kimi K2.6        Moonshot   0.778         0.268
Gemini 3.1 Pro   Google     0.763         0.499
/02.b Full results · all 10 runs · token & time telemetry
Rank  Model            Substrate                            Accuracy  Token Eff.  Time Eff.  Composite  Score    Tokens     Time (s)
01    GPT-5.5          Risk Engine                          1.000     0.941       0.908      0.9789     20 / 20  40,086     250
02    DeepSeek v4 Pro  Risk Engine                          1.000     0.724       0.695      0.9143     20 / 20  52,054     327
03    Opus 4.7         Risk Engine                          0.900     0.440       1.000      0.8181     18 / 20  85,643     227
04    Kimi K2.6        Risk Engine                          0.800     1.000       0.176      0.7776     16 / 20  37,705     1,291
05    Gemini 3.1 Pro   Risk Engine                          0.700     0.917       0.895      0.7628     14 / 20  41,138     254
06    Gemini 3.1 Pro   Web Search + Python + Yahoo Finance  0.400     0.738       0.714      0.4991     8 / 20   51,081     318
07    GPT-5.5          Web Search + Python + Yahoo Finance  0.400     0.034       0.298      0.3165     8 / 20   1,123,841  762
08    Opus 4.7         Web Search + Python + Yahoo Finance  0.400     0.008       0.279      0.3096     8 / 20   4,582,286  814
09    DeepSeek v4 Pro  Web Search + Python + Yahoo Finance  0.400     0.032       0.174      0.3038     8 / 20   1,179,506  1,306
10    Kimi K2.6        Web Search + Python + Yahoo Finance  0.300     0.178       0.226      0.2682     6 / 20   211,642    1,008
Run it on your data.

A two-week pilot against your real corpus. Same models, your queries, your benchmark.

See how the Peripamo engines are wired.

The agentic architecture behind Fakta, Risiko, AIDA, and the rest of the stack.