
Benchmarks

Tulpar’s tagline is “as easy as Python, as fast as C”. The CPU side of that claim is verified on every commit; the HTTP side lands in Node.js territory while staying single-threaded.

benchmarks/loopsum.tpr and benchmarks/fib.tpr use the typed AOT path (explicit : int return type) for native LLVM i64 codegen.

Benchmark            C (gcc -O2)   Tulpar AOT   Ratio
loopsum (10M sum)    67 ms         88 ms        1.31× C
fib(35) recursive    83 ms         114 ms       1.37× C

Best-of-3 wall-clock times on Windows 11 + MinGW64. Tulpar AOT lands in the 1.3–1.4× C range, the same neighbourhood as Rust and Go on identical hardware.

benchmarks/http_bench.py — 5000 GET requests over 4 keep-alive TCP connections, single-threaded servers, JSON {"hello":"world"} body.

Server                 Wall (s)   req/sec    Notes
Tulpar Wings           0.193      ~26 000    Single-thread listen(), NODELAY on accept, dlsym handler cache.
Node.js http           0.184      ~27 200    http.createServer, Node 22.x.
Python ThreadingHTTP   0.354      ~14 100    ThreadingHTTPServer, single CPython process.

listen_async() enables a multi-threaded variant — handler dispatch still serialises under _wings_handler_mu until LLVM thread-local globals land, but parallel recv / send lifts throughput on keep-alive workloads where many connections sit idle.

Hot-path optimisations applied:

  • call(handler_name) dlsym cache (256-slot FNV-1a hash) — eliminates the symbol-table walk per request.
  • TCP_NODELAY on accept — removes Nagle’s 40ms batching delay on small JSON responses (+13% req/sec).
  • Static thread-local recv buffer — drops a 64 KB malloc/free pair per keep-alive request.
  • Per-request arena reset — bounded memory on long-running servers without leaking.
  • break / continue real codegen — previously a silent no-op; the fix also prevents LLVM from emitting suboptimal phi nodes around induction variables.

Trade-offs we explicitly skipped (analysed, low value):

  • Object-key inline caching (req["method"] is ~0.3 % of the HTTP path).
  • String concat coalescing (a + b + c is ~0.1 % of the HTTP path).

Methodology:

  • Native compilers: gcc -O2. Rust: rustc -C opt-level=3. Go: go build defaults.
  • HTTP results use single-threaded servers everywhere; running Node with cluster or Tulpar with a thread-pool mode is a separate measurement.
  • Numbers regenerated via benchmarks/run_benchmarks.sh (CPU) and python benchmarks/http_bench.py (HTTP).