Benchmarks

Tulpar’s tagline is “as easy as Python, as fast as C”. On a CPU-bound benchmark it takes roughly 1.4–1.9× as long as C, which is Rust/Go territory. On HTTP, its multi-core listen_pool server out-throughputs Go’s net/http (~36k vs ~30k req/sec) and handles roughly 10× the request rate of a single-worker FastAPI, all from a single self-contained binary with no runtime to ship.

The CPU benchmark, benchmarks/fib.tpr, uses the typed AOT path (an explicit : int return type) so that LLVM emits native i64 code. The recursion depth is read from an environment variable so LLVM can’t partially evaluate the result at compile time.
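
For comparison, a C baseline that applies the same trick could look like the sketch below. This is not the repository's actual C source: the variable name FIB_N, the helper read_depth and the upper bound of 90 are assumptions made for illustration. The point it shows is that the input only becomes known at run time, so the optimiser cannot fold the whole computation into a constant.

```c
#include <stdint.h>
#include <stdlib.h>

/* Naive doubly recursive Fibonacci: the call-heavy workload being timed. */
int64_t fib(int64_t n) {
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

/* Read the input from the environment so the compiler cannot know it at
 * build time. The variable name passed in (e.g. "FIB_N") is hypothetical.
 * Falls back to `fallback` if the variable is unset, empty, not a plain
 * decimal number, or outside 0..90 (fib(90) still fits in an int64_t). */
int64_t read_depth(const char *var, int64_t fallback) {
    const char *s = getenv(var);
    if (s == NULL || *s == '\0') return fallback;
    char *end = NULL;
    long long v = strtoll(s, &end, 10);
    if (*end != '\0' || v < 0 || v > 90) return fallback;
    return (int64_t)v;
}
```

A driver would call fib(read_depth("FIB_N", 35)) and print the result, so that the call cannot be discarded as dead code.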

Environment                                 | C      | Tulpar AOT | Ratio
Windows 11 / MinGW64, fib(35), best-of-3    | 83 ms  | 114 ms     | 1.37× C
Linux WSL2 / gcc 15 -O2, fib(40), best-of-5 | 140 ms | 261 ms     | 1.86× C

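The ratio depends on the C compiler, not just on Tulpar. Tulpar compiles ahead of time through LLVM to native code on both platforms; the gap is wider against gcc 15 -O2, which optimises very aggressively, than against MinGW’s older GCC. The two rows also use different inputs (fib(35) vs fib(40)), so compare the ratios across rows, not the milliseconds. Either way Tulpar lands in the ~1.4–1.9× C band, the same neighbourhood as Rust and Go, and orders of magnitude ahead of interpreted Python.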

The HTTP benchmark uses benchmarks/loadtest, a native C load generator. It hammers each server with GET / requests over keep-alive connections, and every server returns the JSON body {"hello":"world"}. Concurrency is swept from 1 to 12, kept under the box’s core count so the load generator never starves the server, with 4 s per level; the best run is confirmed by a second pass. Box: 14-vCPU WSL2. Each runtime runs in its recommended single-process config.

Server             | req/sec | p50 latency | Configuration
Tulpar listen_pool | ~36k    | 0.32 ms     | all 14 cores, 1 process
Go net/http        | ~30k    | 0.38 ms     | all cores (default), 1 process
Node.js http       | ~8.7k   | 1.06 ms     | 1 thread (default)
FastAPI (uvicorn)  | ~3.5k   | 3.31 ms     | 1 worker (default)
Tulpar listen      | ~4–4.7k | 0.22 ms     | 1 thread, serial accept loop
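
The p50 column is the median of the per-request latencies. How benchmarks/loadtest computes it is not shown here, so the following is only a minimal sketch of one common approach, the nearest-rank percentile. The function name percentile_ms and the sample values are illustrative assumptions.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Nearest-rank percentile over latency samples in milliseconds.
 * Sorts `samples` in place, then returns the value at rank
 * ceil(pct/100 * n), clamped to 1..n. Returns 0.0 for an empty input.
 * `pct` is a whole percentage, e.g. 50 for p50 or 99 for p99. */
double percentile_ms(double *samples, size_t n, unsigned pct) {
    if (n == 0) return 0.0;
    qsort(samples, n, sizeof samples[0], cmp_double);
    size_t rank = (n * pct + 99) / 100;  /* integer ceiling of n * pct / 100 */
    if (rank < 1) rank = 1;
    if (rank > n) rank = n;
    return samples[rank - 1];
}
```

With the five samples {0.40, 0.22, 0.31, 1.10, 0.35}, percentile_ms(samples, 5, 50) picks rank 3 of the sorted array, which is 0.35 ms. The single slow request (1.10 ms) does not move the median, which is why tail percentiles are usually reported alongside p50.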

Threading models differ and are labelled above: listen_pool and Go’s net/http use every core out of the box, while Node and a single uvicorn worker default to one. Tulpar’s single-thread listen() is a serial accept loop. It has the lowest per-request latency (0.22 ms p50) but serves keep-alive connections one after another, so for throughput use listen_pool (or listen_async). Notably, single-thread listen() trails single-thread Node on throughput here (~4–4.7k vs ~8.7k req/sec); Tulpar’s lead comes from listen_pool scaling cleanly across cores (p50 stays at 0.32 ms at ~36k req/sec, with a clean sub-millisecond tail).

Versus FastAPI specifically, Tulpar also wins decisively on latency (~10× lower p50) and footprint — see the dedicated Wings vs FastAPI writeup (p50 0.31 ms vs 28 ms under load, 6.7 MB vs 54 MB RSS, 2 MB self-contained binary vs Python + ~50 MB of deps).

Hot-path optimisations applied:

  • call(handler_name) dlsym cache (256-slot FNV-1a hash): eliminates the symbol-table walk per request.
  • TCP_NODELAY on accept: disables Nagle’s algorithm, removing the ~40 ms stall it can cause on small JSON responses, typically through its interaction with delayed ACKs (+13% req/sec).
  • Static thread-local recv buffer: drops a 64 KB malloc/free pair per keep-alive request.
  • Per-request arena reset + per-request malloc region: keeps memory bounded on long-running servers, with no leaks.
  • Thread-local scratch buffers in built-ins: plain (non-thread-local) statics raced under listen_pool (a toString buffer caused ~1.1% spurious 404s until fixed).
  • Real codegen for break / continue: both were silently compiled as no-ops before. Emitting real branches also keeps LLVM from producing suboptimal phi nodes around induction variables.
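
To make the first item concrete, here is a minimal sketch of a 256-slot, direct-mapped name cache keyed by a 32-bit FNV-1a hash. It is not Tulpar's runtime code. The names cached_lookup, resolve_fn and demo_resolve, the 64-byte name limit and the overwrite-on-collision policy are all assumptions. In a real runtime the slow path would be a dlsym call; here a counting stand-in is used so the sketch has no dynamic-linking dependency. The table is thread-local, in the spirit of the scratch-buffer fix in the list above.

```c
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 256  /* the 256-slot figure quoted in the list above */

/* Slow-path resolver; a stand-in for a dlsym-style symbol lookup. */
typedef void *(*resolve_fn)(const char *name);

struct slot { char name[64]; void *addr; };

/* One table per thread, so pool workers never race on it (assumption). */
static _Thread_local struct slot cache[CACHE_SLOTS];

/* 32-bit FNV-1a: XOR each byte into the hash, then multiply by the prime. */
uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;           /* FNV offset basis */
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;                 /* FNV prime */
    }
    return h;
}

/* Return the cached address for `name`, calling `slow` only on a miss. */
void *cached_lookup(const char *name, resolve_fn slow) {
    struct slot *e = &cache[fnv1a(name) % CACHE_SLOTS];
    if (e->addr != NULL && strcmp(e->name, name) == 0)
        return e->addr;                 /* hit: no symbol-table walk */
    void *addr = slow(name);
    if (addr != NULL && strlen(name) < sizeof e->name) {
        strcpy(e->name, name);          /* direct-mapped: collisions overwrite */
        e->addr = addr;
    }
    return addr;
}

/* Hypothetical resolver for demonstration: counts how often it is hit. */
int slow_calls = 0;
void *demo_resolve(const char *name) {
    static int dummy_handler;
    (void)name;
    slow_calls++;
    return &dummy_handler;
}
```

Calling cached_lookup("handle_root", demo_resolve) twice invokes demo_resolve only once; the second call is served from the table after a single hash and one string compare.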

Trade-offs we explicitly skipped (analysed, low value):

  • Object-key inline caching: lookups such as req["method"] account for ~0.3% of the HTTP path.
  • String concat coalescing: chains such as a + b + c account for ~0.1% of the HTTP path.

Methodology:

  • CPU: gcc -O2 (Linux gcc 15 / Windows MinGW64). Best-of-N wall-clock. The workload input is opaque to the compiler (read at runtime) so -O2 can’t constant-fold it, and N is large enough that process startup is negligible.
  • HTTP: native benchmarks/loadtest, single box, keep-alive, concurrency kept ≤ core count so the load generator and server don’t fight for CPU. Toolchains: Go 1.23, Node 22, Python 3.14 + FastAPI/uvicorn. Servers run in their default single-process configuration (core usage labelled per row).
  • Absolute numbers are box-specific; treat the ratios and latency as the portable signal. Reproduce CPU via benchmarks/run_benchmarks.sh; the HTTP servers + driver used here are minimal equivalents returning the same JSON.
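
A best-of-N wall-clock measurement can be sketched as follows. This is an illustration of the idea, not the contents of benchmarks/run_benchmarks.sh: the names best_of_n_ms, work_fn and spin are hypothetical, and the sketch times a function in-process rather than a whole program. It uses C11 timespec_get for portability; on POSIX systems a monotonic clock such as clock_gettime(CLOCK_MONOTONIC) would be the more robust choice.

```c
#include <time.h>

typedef void (*work_fn)(void *arg);

/* Current wall-clock time in milliseconds (C11 timespec_get, TIME_UTC). */
static double now_ms(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec * 1e3 + (double)ts.tv_nsec / 1e6;
}

/* Run work(arg) `runs` times and return the fastest run in milliseconds.
 * Taking the minimum filters out scheduler noise and cold caches.
 * Returns -1.0 if runs <= 0. */
double best_of_n_ms(work_fn work, void *arg, int runs) {
    double best = -1.0;
    int have_best = 0;
    for (int i = 0; i < runs; i++) {
        double t0 = now_ms();
        work(arg);
        double dt = now_ms() - t0;
        if (!have_best || dt < best) {
            best = dt;
            have_best = 1;
        }
    }
    return best;
}

/* Example workload: sums 0..n-1 through a volatile accumulator so the
 * compiler cannot delete the loop. */
void spin(void *arg) {
    long n = *(long *)arg;
    volatile long acc = 0;
    for (long i = 0; i < n; i++) acc += i;
}
```

With long n = 100000000, best_of_n_ms(spin, &n, 5) returns the fastest of five runs, the same shape as the best-of-5 figure used for the Linux row.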