# Benchmarks
Tulpar’s tagline is “as easy as Python, as fast as C”. The CPU side of that claim is verified on every commit; on the HTTP side, a single-threaded server lands within a few percent of Node.js.
## CPU benchmarks

`benchmarks/loopsum.tpr` and `benchmarks/fib.tpr` use the typed AOT path (an explicit `: int` return type) for native LLVM `i64` codegen.
| Benchmark | C (gcc -O2) | Tulpar AOT | Ratio |
|---|---|---|---|
| loopsum (10M sum) | 67 ms | 88 ms | 1.31× C |
| fib(35) recursive | 83 ms | 114 ms | 1.37× C |
Tulpar AOT lands in the 1.3–1.4× C range, the same neighbourhood as Rust and Go on identical hardware. Times are best-of-3 wall-clock on Windows 11 + MinGW64.
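The best-of-3 wall-clock methodology can be sketched in a few lines of Python. The binary paths in the comment are hypothetical stand-ins for whatever `run_benchmarks.sh` actually invokes:

```python
import subprocess
import time

def best_of_3(cmd):
    """Run cmd three times, return the fastest wall-clock time in ms."""
    times = []
    for _ in range(3):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        times.append((time.perf_counter() - start) * 1000)
    return min(times)

# Hypothetical binaries -- substitute the paths your build produces.
# c_ms      = best_of_3(["./loopsum_c"])
# tulpar_ms = best_of_3(["./loopsum_tpr"])
# print(f"ratio: {tulpar_ms / c_ms:.2f}x C")
```

Taking the minimum of three runs rather than the mean filters out one-off noise from the OS scheduler and filesystem cache warm-up.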
## HTTP throughput

`benchmarks/http_bench.py` sends 5000 GET requests over 4 keep-alive TCP connections to single-threaded servers returning a JSON `{"hello":"world"}` body.
| Server | Wall (s) | req/sec | Notes |
|---|---|---|---|
| Tulpar Wings | 0.193 | ~26 000 | Single-threaded `listen()`, `TCP_NODELAY` on accept, `dlsym` handler cache. |
| Node.js http | 0.184 | ~27 200 | `http.createServer`, Node 22.x. |
| Python ThreadingHTTP | 0.354 | ~14 100 | `ThreadingHTTPServer`, single CPython process. |
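`benchmarks/http_bench.py` itself is not reproduced here, but the client pattern it describes (requests round-robined over a few keep-alive sockets) can be sketched as follows; the single-`recv` read is an assumption that holds for small JSON responses:

```python
import socket

def bench(host, port, total_requests, connections=4):
    """Spread GET requests over keep-alive connections, round-robin."""
    socks = []
    for _ in range(connections):
        s = socket.create_connection((host, port))
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        socks.append(s)
    request = (f"GET / HTTP/1.1\r\nHost: {host}\r\n"
               "Connection: keep-alive\r\n\r\n").encode()
    for i in range(total_requests):
        s = socks[i % connections]
        s.sendall(request)
        s.recv(65536)  # assumes the whole response fits one read
    for s in socks:
        s.close()
```

Setting `TCP_NODELAY` on the client side too keeps the tiny request frames from being batched by Nagle's algorithm, which would otherwise distort the measurement.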
`listen_async()` enables a multi-threaded variant. Handler dispatch still serialises under `_wings_handler_mu` until LLVM thread-local globals land, but parallel `recv`/`send` lifts throughput on keep-alive workloads where many connections sit idle.
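The serialised-dispatch shape described above can be sketched with a shared lock: each connection thread runs its own receive and send, but the handler call itself sits under one mutex (an analogue of `_wings_handler_mu`). All names here are illustrative, not the actual Wings internals:

```python
import threading

handler_mu = threading.Lock()  # analogue of _wings_handler_mu

def handle_connection(conn_recv, conn_send, handler):
    """Per-connection loop: I/O runs in parallel, dispatch is serialized."""
    while True:
        request = conn_recv()      # concurrent across connection threads
        if request is None:
            break                  # connection closed
        with handler_mu:           # only the handler call is serialized
            response = handler(request)
        conn_send(response)        # concurrent again
```

This is why the lock hurts least on keep-alive workloads: idle connections spend their time blocked in `conn_recv`, outside the critical section.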
## What got us here

Hot-path optimisations applied:
- `call(handler_name)` dlsym cache (256-slot FNV-1a hash) — eliminates the per-request symbol-table walk.
- `TCP_NODELAY` on accept — removes Nagle’s 40 ms batching delay on small JSON responses (+13% req/sec).
- Static thread-local recv buffer — drops a 64 KB malloc/free pair per keep-alive request.
- Per-request arena reset — bounds memory on long-running servers without leaking.
- Real `break`/`continue` codegen — these were silently no-op’d before; real codegen prevents LLVM from emitting suboptimal phi nodes around induction variables.
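The dlsym cache in the first bullet can be sketched as a fixed-size, FNV-1a-indexed table. The direct-mapped collision policy shown here is an assumption; the source only states the slot count and the hash:

```python
FNV_OFFSET = 0xcbf29ce484222325
FNV_PRIME = 0x100000001b3

def fnv1a(name: str) -> int:
    """64-bit FNV-1a hash of a handler name."""
    h = FNV_OFFSET
    for byte in name.encode():
        h = ((h ^ byte) * FNV_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h

class SymbolCache:
    """256-slot direct-mapped cache: hash picks the slot, collisions evict."""
    def __init__(self, resolve):
        self.slots = [None] * 256   # (name, address) pairs
        self.resolve = resolve      # fallback, e.g. a dlsym-style lookup

    def lookup(self, name):
        slot = fnv1a(name) & 0xFF
        entry = self.slots[slot]
        if entry is not None and entry[0] == name:
            return entry[1]         # hit: no symbol-table walk
        addr = self.resolve(name)   # miss: walk once, then cache
        self.slots[slot] = (name, addr)
        return addr
```

A direct-mapped table keeps the hit path to one hash, one index, and one string compare, which is what matters when it runs once per request.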
Trade-offs we explicitly skipped (analysed, low value):
- Object-key inline caching (`req["method"]` is ~0.3 % of the HTTP path).
- String concat coalescing (`a + b + c` is ~0.1 % of the HTTP path).
## Methodology

- Native compilers: C with `gcc -O2`; Rust with `rustc -C opt-level=3`; Go with `go build` defaults.
- HTTP results use single-threaded servers everywhere; running Node with cluster or Tulpar with a thread-pool mode is a separate measurement.
- Numbers are regenerated via `benchmarks/run_benchmarks.sh` (CPU) and `python benchmarks/http_bench.py` (HTTP).