Eval
The agent’s “let me check” primitive: run a short PHP snippet in a sandboxed subprocess against the project’s live container and get a structured result back. Hypothesis to validation in 200ms, no temporary scripts, no manual bootstrap.
Composer: univeros/eval
Namespace: Altair\Eval
Introduction
When an agent (or a human) forms a hypothesis ("does UserRepository::findByEmail return null or throw when there is no match?", "what does FormatNegotiator::getContentTypeByFormat('json') actually return?"), the cheapest way to validate it is to run a few lines of PHP. Doing that by hand means writing a temp script, knowing the autoloader path, bootstrapping the container, and remembering to clean up. Agents do not do that well.
Eval collapses that loop into one command. bin/altair eval 'return container(SomeRepo::class)->count();' spawns a fresh PHP subprocess, requires the host’s Composer autoloader, resolves the host container from config/container.php (the skeleton convention) or an explicit bootstrap path, runs the snippet inside a closure that captures the return value, and emits a typed JSON shape: result type and value, captured stdout, exception with class and stack trace, duration, peak memory. The subprocess runs under disable_functions (no exec / shell_exec / proc_open / popen / assert, plus the network primitives unless --network), open_basedir confinement to the project root, a hard memory cap, and a wall-clock budget that the parent enforces by SIGTERM-then-SIGKILL. Each evaluation is one-shot: a fresh subprocess, no state carried across calls.
This is the most dangerous tool in the agent’s palette: eval is eval. The package is honest about what its ini-level guards can and cannot enforce. disable_functions blocks the function-based escape hatches (exec, proc_open, the network primitives), open_basedir blocks filesystem writes outside the project tree, and memory_limit plus the parent’s wall-clock kill enforce the resource budget. What ini-level sandboxing cannot do: block eval() (a language construct, not a function), block the network at the kernel level (the function-based network calls are disabled, but not raw sockets the OS can open), or enforce read-only database access (an ALTAIR_EVAL_ALLOW_WRITES env var is passed for the host to honour cooperatively, for example in its persistence config). The subprocess sandbox bounds the damage even when those soft guards fail to stop a malicious snippet. The --unsafe flag lifts every ini-level guard at once and emits a kind=eval event into .altair/events.jsonl, so a “we let it write” decision leaves an audit trail.
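The parent-enforced wall-clock budget mentioned above can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: runWithBudget, its parameters, and the 100 ms grace period are hypothetical, and the signal numbers assume a POSIX host.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of univeros/eval): run a command under a
 * wall-clock budget, escalating from SIGTERM to SIGKILL.
 *
 * @param list<string> $command argv-style command
 * @return array{exit_code: int, timed_out: bool}
 */
function runWithBudget(array $command, int $timeoutMs, int $graceMs = 100): array
{
    // Empty descriptor spec: the child inherits the parent's stdio.
    $process = proc_open($command, [], $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start subprocess.');
    }

    $deadline = hrtime(true) + $timeoutMs * 1_000_000; // nanoseconds
    $exitCode = -1;
    $timedOut = false;

    while (true) {
        $status = proc_get_status($process);
        if (!$status['running']) {
            $exitCode = $status['exitcode'];
            break;
        }
        if (hrtime(true) >= $deadline) {
            $timedOut = true;
            proc_terminate($process, 15); // SIGTERM: ask politely
            usleep($graceMs * 1000);      // short grace period
            if (proc_get_status($process)['running']) {
                proc_terminate($process, 9); // SIGKILL: no more waiting
            }
            break;
        }
        usleep(5_000); // poll every 5 ms
    }
    proc_close($process);

    // 124 follows the GNU timeout convention the documentation refers to.
    return ['exit_code' => $timedOut ? 124 : $exitCode, 'timed_out' => $timedOut];
}

// Usage: a runaway loop is cut off; a well-behaved command keeps its own exit code.
$runaway = runWithBudget([PHP_BINARY, '-r', 'while (true) {}'], 200);
$clean   = runWithBudget([PHP_BINARY, '-r', 'exit(3);'], 5000);
```

Polling with proc_get_status keeps the sketch dependency-free; a real runner would also need to drain the child's output streams while it waits.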
Installation
Standalone:

```sh
composer require --dev univeros/eval
```

You will usually want this as a dev dependency, as it executes arbitrary code at runtime, which is exactly what a production tree should never offer. If you install the full framework, composer require univeros/framework already bundles it.
It depends on univeros/cli, univeros/configuration, univeros/container, and univeros/events (for the --unsafe audit event). It needs no additional PHP extension; the parent process only requires the standard proc_open function to be available.
Quick start
The “does this return null?” check:

```sh
bin/altair eval 'return container(App\User\UserRepository::class)->findByEmail("nobody@example.com");'
```

```
✓ null duration=12ms memory=8192 KB exit=0
```

Capture an exception cleanly:

```sh
bin/altair eval 'throw new RuntimeException("boom");'
```

```
✗ RuntimeException: boom at /srv/app/.altair/eval/.../wrapper.php:53 duration=0ms memory=4096 KB exit=1
```

Emit JSON for an agent or a CI step:

```sh
bin/altair eval 'return ["a" => 1, "b" => 2];' --format=json
```

```json
{
  "ok": true,
  "result": {
    "type": "array",
    "is_list": false,
    "count": 2,
    "value": {
      "a": { "type": "int", "value": 1 },
      "b": { "type": "int", "value": 2 }
    }
  },
  "stdout": "",
  "stderr": "",
  "exception": null,
  "duration_ms": 0,
  "memory_peak_bytes": 4194304,
  "exit_code": 0,
  "timed_out": false
}
```

Read the snippet from a file (useful for multi-line snippets and editor workflows):

```sh
bin/altair eval --file=tools/probe.php
```

Permit writes (host-cooperative; the host must honour ALTAIR_EVAL_ALLOW_WRITES=1) or network egress:

```sh
bin/altair eval --writes 'container(EntityManager::class)->flush();'
bin/altair eval --network 'return file_get_contents("https://api.example.com/health");'
```

The wall-clock budget is enforced by the parent: a runaway snippet gets SIGTERM with a short grace period, then SIGKILL. Exit code is 124 (the GNU timeout convention) and timed_out=true:

```sh
bin/altair eval 'while (true) {}' --timeout-ms=300
```

```
✗ Timed out after 405ms. duration=405ms memory=0 KB exit=124
```

Concepts
The subprocess is the security boundary. Every guard is enforced at the process level: php -d disable_functions=..., php -d open_basedir=..., php -d memory_limit=..., parent-managed SIGTERM/SIGKILL.
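To make the process-level guards concrete, here is a hedged sketch of how such `php -d` flags might be assembled. The function name sandboxFlags and its parameters are hypothetical, and the function lists are abbreviated from the names this page mentions rather than taken from the package source.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: build the `php -d` arguments for a sandbox profile.
 * The disabled-function lists are illustrative and deliberately incomplete.
 *
 * @return list<string>
 */
function sandboxFlags(string $projectRoot, int $memoryMb, bool $allowNetwork, bool $unsafe): array
{
    if ($unsafe) {
        return []; // --unsafe: every ini-level guard is lifted
    }

    $disabled = ['exec', 'shell_exec', 'proc_open', 'popen', 'assert', 'symlink', 'link'];
    if (!$allowNetwork) {
        // Function-based network primitives named in the documentation.
        $disabled = array_merge($disabled, ['curl_exec', 'fsockopen', 'stream_socket_client']);
    }

    $flags = [
        '-d', 'disable_functions=' . implode(',', $disabled),
        '-d', 'open_basedir=' . $projectRoot,
        '-d', 'memory_limit=' . $memoryMb . 'M',
    ];
    if (!$allowNetwork) {
        $flags[] = '-d';
        $flags[] = 'allow_url_fopen=0';
    }

    return $flags;
}

// Usage: compare the default profile with a --network run.
$default = sandboxFlags('/srv/app', 128, false, false);
$network = sandboxFlags('/srv/app', 128, true, false);
```

Keeping the profile as a plain list of arguments makes it easy to test the flag matrix without spawning a subprocess.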
The snippet is delivered as a separate file, not embedded into the wrapper source. The wrapper requires the snippet from <projectRoot>/.altair/eval/<uniqid>/snippet.php, which keeps the snippet inside its own file scope, so a })(); payload in the snippet cannot inject statements into the wrapper’s file scope and therefore cannot, for example, race the wrapper’s own file_put_contents(resultFile, ...) by symlink()-ing the result file path. Combined with symlink/link being in disable_functions, this closes the wrapper-injection class of attack the early design was vulnerable to.
Result delivery is out-of-band. Stdout is reserved for the snippet’s own echo / print output (captured via ob_start) and stderr is reserved for PHP fatals / warnings. The structured payload (encoded return value, encoded exception, peak memory, duration) is written to a dedicated result file inside <projectRoot>/.altair/eval/<uniqid>/result.json that the parent reads after the subprocess exits. So a chatty snippet, a hard fatal, and a clean run are all distinguishable by their stream contents.
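The out-of-band idea can be illustrated with a small sketch. It is an assumption-laden simplification: runAndRecord is a hypothetical name, the payload stores the raw return value instead of the typed encoding, and the exception shape is reduced to class and message.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical wrapper core: run a callable, capture what it echoes,
 * and write the structured outcome to a separate result file.
 */
function runAndRecord(callable $snippet, string $resultFile): void
{
    $start = hrtime(true);
    $payload = ['result' => null, 'exception' => null];

    ob_start(); // everything the snippet echoes is buffered here
    try {
        $payload['result'] = $snippet();
    } catch (Throwable $e) {
        $payload['exception'] = ['class' => get_class($e), 'message' => $e->getMessage()];
    }
    $stdout = (string) ob_get_clean();

    $payload['duration_ms'] = intdiv(hrtime(true) - $start, 1_000_000);
    $payload['memory_peak_bytes'] = memory_get_peak_usage(true);

    // The structured payload goes to the file, never to stdout.
    file_put_contents($resultFile, json_encode($payload, JSON_THROW_ON_ERROR));

    echo $stdout; // the snippet's own output still travels over stdout
}

// Usage: "hello" reaches stdout, the return value lands in the result file.
$resultFile = tempnam(sys_get_temp_dir(), 'eval-result-');
ob_start();
runAndRecord(static function () { echo 'hello'; return 42; }, $resultFile);
$printed = ob_get_clean();
$decoded = json_decode((string) file_get_contents($resultFile), true);
```

Because the payload never shares a stream with the snippet's output, the parent does not have to parse or delimit anything in stdout.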
The container helper resolves the host’s container, not a bare framework one. The wrapper looks for, in order: an explicit --bootstrap=<path> argument, the ALTAIR_EVAL_BOOTSTRAP env var, and finally the config/container.php skeleton convention (a file that returns a fully-configured Container). When all three miss, it falls back to a fresh new Container() so container() at least returns something without crashing, but the typical host gets its real bindings simply by being a real host. The bootstrap file must live inside the project root: open_basedir is exactly what blocks a snippet from pointing the bootstrap at /tmp/exfil.php.
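The lookup order above might be sketched like this. resolveBootstrap is a hypothetical function, and the explicit project-root check stands in for what open_basedir enforces in the real subprocess.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical resolver mirroring the documented order: explicit path,
 * ALTAIR_EVAL_BOOTSTRAP, then config/container.php. Returns null when
 * all three miss, so the caller can fall back to a bare container.
 */
function resolveBootstrap(?string $explicit, string $projectRoot): ?string
{
    $envValue = getenv('ALTAIR_EVAL_BOOTSTRAP');
    $candidates = [
        $explicit,
        ($envValue === false || $envValue === '') ? null : $envValue,
        $projectRoot . '/config/container.php',
    ];

    $root = realpath($projectRoot);
    foreach ($candidates as $candidate) {
        if ($candidate === null) {
            continue;
        }
        $real = realpath($candidate);
        if ($real === false || !is_file($real)) {
            continue; // this candidate misses; try the next one
        }
        // Assumption: an explicit check in place of open_basedir confinement.
        if ($root === false || !str_starts_with($real, $root . DIRECTORY_SEPARATOR)) {
            throw new InvalidArgumentException("Bootstrap must live inside the project root: {$candidate}");
        }
        return $real;
    }

    return null;
}

// Usage: a throwaway project that only has the skeleton convention file.
putenv('ALTAIR_EVAL_BOOTSTRAP'); // make sure the env var is unset
$demoRoot = sys_get_temp_dir() . '/eval-demo-' . uniqid();
mkdir($demoRoot . '/config', 0777, true);
file_put_contents($demoRoot . '/config/container.php', "<?php return null;\n");

$resolved = resolveBootstrap(null, $demoRoot);
$missing  = resolveBootstrap(null, $demoRoot . '/config'); // no config/container.php below this root
```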
Each evaluation is one-shot. No persistent container, no state carried across calls, no REPL session. If you want a REPL, use univeros/tinker: it is an in-process, human-driven tool that lives in the same memory as the host. Eval is the out-of-process, agent-driven complement.
--unsafe is the escape hatch with an audit trail. It lifts every ini-level guard (no disable_functions, no open_basedir, no enforced memory cap) and emits a kind=eval event to the mutation log so the decision is recoverable from bin/altair events:tail. Use it when a sandboxed run cannot answer the question (e.g. the snippet legitimately needs to spawn a subprocess); never use it by default.
CLI surface
| Flag | Default | Effect |
|---|---|---|
| `<snippet>` (positional) | (none) | PHP code to execute. Omit and use `--file=…` instead. |
| `--file=<path>` | (none) | Read the snippet from disk. |
| `--timeout-ms=<n>` | 5000 | Wall-clock budget; clamped to [100, 60000]. |
| `--memory-mb=<n>` | 128 | memory_limit; clamped to [16, 512]. |
| `--writes` | off | Set ALTAIR_EVAL_ALLOW_WRITES=1 (host-cooperative). |
| `--network` | off | Permit outbound HTTP/sockets; otherwise the network function calls are in disable_functions and allow_url_fopen=0. |
| `--unsafe` | off | DANGEROUS: lift every ini guard; emit an audit event. |
| `--bootstrap=<path>` | (none) | Override container bootstrap (must be inside the project root). |
| `--format=human\|json` | human | Output format. |
Exit code: 0 on a clean run, 1 on snippet exception / timeout / non-zero subprocess exit, 2 on usage error.
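The clamping of --timeout-ms and --memory-mb listed in the table amounts to a bounded min/max. A minimal sketch, assuming a hypothetical clampOption helper:

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: force a numeric option into its documented range. */
function clampOption(int $value, int $min, int $max): int
{
    return max($min, min($max, $value));
}

$timeoutMs = clampOption(250000, 100, 60000); // 60000: above the ceiling
$memoryMb  = clampOption(8, 16, 512);         // 16: below the floor
$unchanged = clampOption(5000, 100, 60000);   // 5000: already in range
```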
MCP tool
univeros/mcp exposes one tool:

| Tool | Input | Returns |
|---|---|---|
| `framework__eval` | `{ snippet: string (required), timeout_ms?: int, allow_writes?: bool, allow_network?: bool }` | The full EvalResult JSON (ok, result, stdout, stderr, exception, duration_ms, memory_peak_bytes, exit_code, timed_out). |
--unsafe is deliberately not exposed to MCP: lifting every ini-level guard is a CLI-only, human-or-explicitly-audited action.
The return-value shape
Encoder\ValueEncoder produces a small, bounded payload:

```php
['type' => 'null', 'value' => null]
['type' => 'bool', 'value' => true]
['type' => 'int', 'value' => 42]
['type' => 'float', 'value' => 3.14] // 'NaN' / 'Infinity' / '-Infinity' as strings
['type' => 'string', 'value' => '...'] // truncated past 10000 chars (length + truncated:true added)
['type' => 'array', 'is_list' => true, 'count' => 3, 'value' => [/* recursive */]]
['type' => 'object', 'class' => 'App\\…', 'id' => 7, 'properties' => [/* recursive */]] // __debugInfo() preferred
['type' => 'iterable', 'class' => 'Generator', 'preview' => [/* first 50 items */], 'exhausted' => false]
['type' => 'reference', 'class' => 'App\\…', 'id' => 7] // emitted when an object cycle is detected
```

Recursion stops at three levels of nesting; iterables yield at most fifty items into preview (and report exhausted so an infinite generator never overruns); strings beyond ten thousand characters are tail-truncated. So a pathological return value (a deep graph, an infinite generator, a megabyte of binary) still produces a small, agent-readable payload.
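A bounded encoder along these lines could look like the sketch below. It covers only scalars, strings, and arrays. The function name encodeValue, the byte-based length check, and the truncated marker emitted at the depth cap are assumptions, not the behaviour of the real ValueEncoder.

```php
<?php
declare(strict_types=1);

const SKETCH_MAX_DEPTH = 3;       // nesting levels expanded before summarising
const SKETCH_MAX_STRING = 10000;  // measured in bytes here (an assumption)

/** Hypothetical bounded encoder for scalars, strings, and arrays only. */
function encodeValue(mixed $value, int $depth = 0): array
{
    if ($value === null) {
        return ['type' => 'null', 'value' => null];
    }
    if (is_bool($value)) {
        return ['type' => 'bool', 'value' => $value];
    }
    if (is_int($value)) {
        return ['type' => 'int', 'value' => $value];
    }
    if (is_float($value)) {
        if (is_nan($value)) {
            return ['type' => 'float', 'value' => 'NaN'];
        }
        if (is_infinite($value)) {
            return ['type' => 'float', 'value' => $value > 0 ? 'Infinity' : '-Infinity'];
        }
        return ['type' => 'float', 'value' => $value];
    }
    if (is_string($value)) {
        $length = strlen($value);
        if ($length <= SKETCH_MAX_STRING) {
            return ['type' => 'string', 'value' => $value];
        }
        // Keep the head, drop the tail, and say so.
        return [
            'type' => 'string',
            'value' => substr($value, 0, SKETCH_MAX_STRING),
            'length' => $length,
            'truncated' => true,
        ];
    }
    if (is_array($value)) {
        $encoded = ['type' => 'array', 'is_list' => array_is_list($value), 'count' => count($value)];
        if ($depth >= SKETCH_MAX_DEPTH) {
            return $encoded + ['truncated' => true]; // assumed depth-cap marker
        }
        // array_map on a single array preserves string keys.
        $encoded['value'] = array_map(
            static fn (mixed $item): array => encodeValue($item, $depth + 1),
            $value
        );
        return $encoded;
    }

    // Objects and iterables are out of scope for this sketch.
    return ['type' => get_debug_type($value)];
}

// Usage: a small map, an oversized string, and an over-deep array.
$map  = encodeValue(['a' => 1]);
$long = encodeValue(str_repeat('x', 10001));
$deep = encodeValue([[[[1]]]]);
```

Every branch returns a fixed-size envelope, which is what keeps the payload bounded regardless of the input.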
Exceptions encode as {class, message, file, line, code, stack_trace: [...], previous: [...]} with the trace capped at 30 frames and the previous chain at 10 wraps.
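As a rough illustration of the caps described above, the following sketch encodes an exception with a bounded trace and a bounded previous chain. encodeException is hypothetical. The frame format (lines of getTraceAsString()) and the reduced shape of each previous entry are assumptions.

```php
<?php
declare(strict_types=1);

/** Hypothetical exception encoder with a capped trace and previous chain. */
function encodeException(Throwable $e, int $maxFrames = 30, int $maxPrevious = 10): array
{
    $previous = [];
    for ($p = $e->getPrevious(); $p !== null && count($previous) < $maxPrevious; $p = $p->getPrevious()) {
        // Assumed shape: only class and message for each wrapped exception.
        $previous[] = ['class' => get_class($p), 'message' => $p->getMessage()];
    }

    return [
        'class' => get_class($e),
        'message' => $e->getMessage(),
        'file' => $e->getFile(),
        'line' => $e->getLine(),
        'code' => $e->getCode(),
        // One string per frame, cut off after $maxFrames entries.
        'stack_trace' => array_slice(explode("\n", $e->getTraceAsString()), 0, $maxFrames),
        'previous' => $previous,
    ];
}

// Usage: a wrapped exception keeps its cause in the previous chain.
$encoded = encodeException(new RuntimeException('outer', 7, new LogicException('inner')));
```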
Programmatically
The Evaluator is the top-level orchestrator:

```php
use Altair\Eval\Evaluator;
use Altair\Eval\EvalRequest;

$result = (new Evaluator())->evaluate(new EvalRequest(
    snippet: 'return container(App\User\UserRepository::class)->count();',
    projectRoot: getcwd(),
    timeoutMs: 5000,
    memoryLimitMb: 128,
    allowWrites: false,
    allowNetwork: false,
    bootstrap: null, // null → config/container.php convention
));

$result->ok();            // true if clean run + exit 0 + no timeout
$result->result;          // ['type' => 'int', 'value' => 7] | null on exception
$result->exception;       // encoded exception | null
$result->stdout;          // snippet's echo/print
$result->stderr;          // PHP fatals / warnings
$result->durationMs;
$result->memoryPeakBytes;
$result->timedOut;        // true ⇒ exitCode 124, parent-killed
```

Writing a host bootstrap
The skeleton-generated config/container.php already does this: it builds a Container, applies the host’s Configuration chain, and returns the container. No special “eval bootstrap” file is needed; eval reuses the same convention.

```php
<?php
declare(strict_types=1);

use Altair\Configuration\Contracts\ConfigurationInterface;
use Altair\Container\Container;

$container = new Container();

/** @var list<ConfigurationInterface> $configurations */
$configurations = require __DIR__ . '/configurations.php';
foreach ($configurations as $configuration) {
    $configuration->apply($container);
}

return $container;
```

A snippet running through eval can then call container(App\Foo::class) and get the real, fully-wired service.
Configuration
The eval CLI command builds a default Evaluator when none is bound, so no Container wiring is required to use it. EvalConfiguration is for hosts that want a specific PHP binary (e.g. an environment where php does not resolve to the CLI binary):

```php
use Altair\Eval\Configuration\EvalConfiguration;

(new EvalConfiguration(
    phpBinary: '/usr/local/bin/php8.3',
))->apply($container);
```

Testing
The published tests under tests/Eval/ are real subprocess tests; they prove the sandbox actually sandboxes, not just that the flags are present:
- tests/Eval/Encoder/ValueEncoderTest.php: golden tests for every encoded shape, including object cycles, infinite generators, and depth-cap truncation.
- tests/Eval/Encoder/ExceptionEncoderTest.php: frame rendering and chain-walking.
- tests/Eval/Runner/SecurityProfileTest.php: the `php -d` flag matrix for default / `--network` / `--unsafe`.
- tests/Eval/Runner/WrapperBuilderTest.php: generated source contains the snippet verbatim and passes `php -l`.
- tests/Eval/EvaluatorTest.php: full subprocess integration. Asserts that `disable_functions` blocks `exec`, that `open_basedir` blocks writes outside the project root, that a runaway loop is killed at the wall-clock deadline, and that `container()` resolves bindings from an explicit bootstrap file.
- tests/Eval/Cli/EvalCommandTest.php: every command path including `--unsafe` event recording and `--file=…`.
When you add a guard or a security policy, mirror this pattern: write a snippet that would break it, run it through the real Evaluator, and assert the guard held.
Related packages
- univeros/tinker: the in-process REPL. Use Tinker when you want a session against the live container; use Eval when you want a one-shot, structured, audit-loggable check.
- univeros/events: the append-only mutation log. `--unsafe` records a `kind=eval` event there so a “we let it write” decision is recoverable.
- univeros/mcp: exposes `framework__eval` (without `--unsafe`) so shell-less agents share the primitive.
- univeros/container: what `container()` resolves; the same Container the rest of the framework uses.
Limitations
- `eval()` the language construct cannot be disabled via `disable_functions`. It is not a function. The subprocess sandbox (`open_basedir`, `memory_limit`, the wall-clock kill) still bounds the damage, but a snippet that calls `eval('…')` can run further PHP that the same sandbox covers.
- Network blocking is best-effort. `disable_functions` blocks `curl_exec` / `fsockopen` / `stream_socket_client`, and `allow_url_fopen=0` blocks the http:// stream wrappers. A determined snippet that opens raw sockets via an unblocked path is out of scope for an ini-level sandbox; kernel-level firewalling is the host’s job.
- `--writes` is host-cooperative. PHP has no generic “this PDO connection is read-only” toggle from outside the host’s wiring. The flag sets `ALTAIR_EVAL_ALLOW_WRITES=0|1` for the host’s persistence Configuration to honour. A host that ignores the env var will run writes regardless.
- The bootstrap file must live inside the project root: that is the `open_basedir` confinement, working as intended. Point `--bootstrap` at a file under the root, or rely on the `config/container.php` convention.
- One-shot only. No persistent state, no REPL semantics, no shared scope across calls. If you want a session, use `univeros/tinker`.
- `--unsafe` is unsafe. It lifts every ini-level guard. The audit event in `.altair/events.jsonl` is the safety net, not a substitute. Treat it as a deliberate, time-bounded action.
- The snippet inherits the parent’s environment variables. That is by design (a snippet using `container(EntityManager::class)` needs `DB_*` to be set), but it also means a snippet has read access to every secret the parent process can see (`DB_PASSWORD`, `APP_KEY`, etc.) via `getenv()`. The function-level network block makes exfiltration hard but not impossible; treat secrets in the parent process environment as exposed to the snippet’s author.
- Exception payloads leak internal file paths. `ExceptionEncoder` includes `getFile()` / `getLine()` and the stack trace, including absolute paths into framework and vendor source. Useful for debugging; intentional path disclosure for an agent context.
- `--file` is not confined. The CLI’s `--file=<path>` reads any path readable by the operator (no project-root check). This is a CLI surface (the operator is trusted) and is not exposed via the MCP tool; the MCP tool only accepts an inline `snippet`.
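Since the write guard is host-cooperative, the host has to read the env var itself. A minimal sketch of one way to do that follows. evalWritesAllowed and guardWrite are hypothetical names, and treating an unset variable as "writes allowed" (the normal, non-eval case) is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical host-side check. Assumes the variable is only set when the
 * process was started by eval, so an unset variable means a normal run.
 */
function evalWritesAllowed(): bool
{
    $flag = getenv('ALTAIR_EVAL_ALLOW_WRITES');

    return $flag === false || $flag === '1';
}

/** Hypothetical guard a persistence layer could call before mutating data. */
function guardWrite(string $operation): void
{
    if (!evalWritesAllowed()) {
        throw new RuntimeException("Write blocked during eval: {$operation}");
    }
}

// Usage: simulate a sandboxed run without --writes, then one with it.
putenv('ALTAIR_EVAL_ALLOW_WRITES=0');
$blocked = !evalWritesAllowed();

putenv('ALTAIR_EVAL_ALLOW_WRITES=1');
$allowed = evalWritesAllowed();
```

A host that never calls such a guard gets no protection from the flag, which is exactly the limitation the list above describes.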