# Performance Tuning

Vulnerability-Lookup is a multi-layered system. Tuning each layer appropriately
is important for production deployments handling large volumes of vulnerability data.

This guide covers the main components: Gunicorn (web server), Kvrocks (primary storage),
Valkey/Redis (cache), PostgreSQL (user accounts and metadata), and optional PgBouncer
(connection pooling).

## Gunicorn (Web Server)

Gunicorn serves the Flask application. Its settings are controlled via
`config/generic.json` and applied in `bin/start_website.py`.

### Workers

The `website_workers` setting in `config/generic.json` controls the number of
Gunicorn worker processes. A common starting point:

```text
workers = (2 × CPU_CORES) + 1
```

For I/O-bound workloads (typical for Vulnerability-Lookup) you can go higher, since the
`gevent` async worker class lets each worker serve many concurrent requests. Monitor
memory usage, though: each worker is a separate process.
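As a sketch, the starting-point formula can be computed directly (the function name here is illustrative, not part of the project):

```python
# Baseline Gunicorn worker count: (2 × CPU_CORES) + 1.
# With the gevent worker class, I/O-bound deployments can often go higher.
import os

def suggested_workers(cpu_cores=None):
    """Return the (2 * cores) + 1 starting point for website_workers."""
    cores = cpu_cores or os.cpu_count() or 1
    return 2 * cores + 1

print(suggested_workers(4))  # 4-core machine -> 9
```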

### Timeouts

The default request timeout is **300 seconds** (5 minutes). This accommodates large API
responses (e.g., full vulnerability exports). The graceful shutdown timeout is **2 seconds**.

Key Gunicorn flags used:

```text
--worker-class gevent       # async worker for concurrent I/O
--timeout 300               # max request processing time
--graceful-timeout 2        # grace period on shutdown
--reuse-port                # SO_REUSEPORT for load balancing across workers
--proxy-protocol            # preserve client IPs behind a reverse proxy
```

## Kvrocks (Primary Storage)

Kvrocks is the primary data store for all vulnerability records. Its configuration
is in `storage/kvrocks.conf`.

### Worker Threads

```ini
workers 8
```

`workers` sets the number of I/O threads that process client commands. Increase it on
machines with more CPU cores and high concurrency; a reasonable value is the number of
CPU cores available to the Kvrocks process.

### Connection Limits

```ini
maxclients 10000
tcp-backlog 511
```

`maxclients` sets the maximum number of concurrent client connections. Ensure the
OS file descriptor limit (`ulimit -n`) is higher than this value.
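The headroom check can be automated; a minimal sketch, assuming `MAXCLIENTS` mirrors the value in `storage/kvrocks.conf`:

```python
# Verify that the process soft file-descriptor limit leaves headroom
# above the maxclients value configured for Kvrocks.
import resource

MAXCLIENTS = 10000  # mirror of maxclients in storage/kvrocks.conf

def fd_limit_ok(soft_limit, maxclients=MAXCLIENTS):
    """True when the soft nofile limit is strictly above maxclients."""
    return soft_limit > maxclients

soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft nofile limit {soft}: {'ok' if fd_limit_ok(soft) else 'too low'}")
```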

For `tcp-backlog`, also verify the kernel setting:

```bash
sysctl net.core.somaxconn        # should be >= tcp-backlog
sysctl net.ipv4.tcp_max_syn_backlog
```

### RocksDB Tuning

Kvrocks uses RocksDB as its storage engine. Key parameters:

```ini
rocksdb.block_cache_size 4096           # block cache in MB (default 4 GB)
rocksdb.max_open_files 8096             # file descriptors for SST files
rocksdb.write_buffer_size 64            # memtable size in MB
rocksdb.max_write_buffer_number 4       # concurrent memtables
rocksdb.max_background_jobs 4           # compaction and flush threads
rocksdb.target_file_size_base 128       # SST file target size in MB
rocksdb.max_total_wal_size 512          # WAL size limit in MB
```

**Recommendations for large instances:**

- **block_cache_size**: Increase to fit the hot dataset in memory. On a dedicated server with
  plenty of RAM, allocate 25-50% of available memory.
- **max_open_files**: Set to `-1` to keep all files open (avoids open/close overhead).
- **write_buffer_size**: Increase for write-heavy workloads (e.g., initial bulk import).
  Larger buffers reduce write amplification.
- **max_background_jobs**: Increase on multi-core systems to speed up compaction.
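The block-cache sizing guideline above can be expressed as a small helper (names are illustrative; the 25-50% range comes from the recommendation above):

```python
def block_cache_mb(total_ram_gb, fraction=0.25):
    """rocksdb.block_cache_size (in MB) as a fraction of total RAM."""
    return int(total_ram_gb * 1024 * fraction)

print(block_cache_mb(64))       # 25% of 64 GB -> 16384 MB
print(block_cache_mb(64, 0.5))  # 50% of 64 GB -> 32768 MB
```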

### Disk I/O Throttling

```ini
max-io-mb 0                 # 0 = no limit on flush/compaction write rate
max-replication-mb 0        # 0 = no limit on replication rate
```

Set `max-io-mb` to a non-zero value if compaction I/O impacts foreground latency.

## Valkey/Redis (Cache Layer)

The cache layer uses a Unix domain socket for lowest-latency local communication.
Its configuration is in `cache/cache.conf`.

```ini
port 0                      # TCP disabled — Unix socket only
unixsocket cache.sock
unixsocketperm 700
tcp-keepalive 300
```

Since the cache runs locally on a Unix socket (no TCP overhead), it is already
optimized for latency. The main tuning concern is memory:

- Monitor memory usage with `redis-cli -s cache/cache.sock INFO memory`.
- Set `maxmemory` and an eviction policy (e.g., `allkeys-lru`) if the cache
  should not grow unbounded.
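For automated monitoring, the `INFO memory` reply (colon-separated `key:value` lines) is easy to parse; a sketch with an illustrative sample reply:

```python
# Parse a Redis/Valkey `INFO memory` reply into a dict so fields like
# used_memory can be tracked over time.
def parse_info(reply):
    """Return a dict of key:value fields, skipping blanks and # comments."""
    fields = {}
    for line in reply.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields

sample_reply = """# Memory
used_memory:1048576
used_memory_human:1.00M
maxmemory:0
"""
print(parse_info(sample_reply)["used_memory_human"])  # -> 1.00M
```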

## PostgreSQL

PostgreSQL stores user accounts, comments, bundles, sightings, and watchlists.
Tune based on your expected user base and concurrent connections.

```ini
# Connection / memory
max_connections = 200                     # keep moderate when using PgBouncer
shared_buffers = 64GB                     # ~25% of RAM
work_mem = 64MB                           # per sort/hash operation; one query may use several
maintenance_work_mem = 4GB

# WAL / checkpoints
wal_buffers = 16MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB
min_wal_size = 1GB

# Autovacuum
autovacuum = on
autovacuum_max_workers = 10
autovacuum_naptime = 10s
autovacuum_vacuum_cost_limit = 2000

# Logging
log_connections = on
log_disconnections = on
log_min_duration_statement = 2000         # log slow queries > 2s
```

:::{note}
The `shared_buffers` value should be adjusted based on your actual server RAM.
A common guideline is ~25% of available memory.
:::

## PgBouncer (Optional)

PgBouncer is a connection pooler that sits between the application and PostgreSQL.
It reduces the overhead of establishing new connections and manages a pool of
persistent connections. This is recommended when running many Gunicorn workers
(each worker may hold its own database connections).

```ini
[databases]
vulnlookup = host=127.0.0.1 port=5432 dbname=vulnlookup

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
admin_users = vuln-lookup

pool_mode = transaction                     # best for web apps
default_pool_size = 200                     # server connections per database/user pair
min_pool_size = 50
reserve_pool_size = 50
reserve_pool_timeout = 5.0                  # seconds to wait for reserve
max_client_conn = 5000

# Logging / stats
logfile = /var/log/pgbouncer/pgbouncer.log
log_connections = 1
log_disconnections = 1
```

:::{note}
When using PgBouncer, point `DB_CONFIG_DICT` in `config/website.py` to the
PgBouncer port (6432) instead of PostgreSQL directly (5432).
:::

:::{note}
Here `default_pool_size` (200) equals PostgreSQL's `max_connections` (200).
Consider reserving a few connections for direct admin access by either increasing
`max_connections` or slightly reducing `default_pool_size`.
:::

## SQLAlchemy Connection Pool

The SQLAlchemy engine options in `config/website.py` control the application-level
connection pool to PostgreSQL:

**Without PgBouncer** (default):

```python
SQLALCHEMY_ENGINE_OPTIONS = {
    "pool_size": 100,               # persistent connections in the pool
    "max_overflow": 50,             # extra connections during traffic spikes
    "pool_timeout": 30,             # seconds to wait for a connection
    "pool_recycle": 1800,           # recycle connections every 30 minutes
    "pool_pre_ping": True,          # verify connection liveness before use
}
```

:::{warning}
Each Gunicorn worker process creates its own connection pool, so the application can
open up to `(pool_size + max_overflow) × website_workers` connections in total. For
example, with `pool_size=100` and 49 workers, persistent connections alone could reach
4,900, far exceeding PostgreSQL's `max_connections`. Without PgBouncer, reduce
`pool_size` so that the total stays well below `max_connections`.
:::
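The arithmetic behind this warning can be sketched as two helpers (function names and the `reserved` margin are illustrative):

```python
def max_app_connections(pool_size, max_overflow, workers):
    """Worst-case PostgreSQL connections opened across all Gunicorn workers."""
    return (pool_size + max_overflow) * workers

print(max_app_connections(100, 0, 49))  # pool_size=100, 49 workers -> 4900

def safe_pool_size(max_connections, workers, reserved=10):
    """Largest per-worker pool_size keeping the total below max_connections,
    leaving `reserved` connections for admin access."""
    return max(1, (max_connections - reserved) // workers)

print(safe_pool_size(200, 49))  # -> 3
```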

**With PgBouncer** (let PgBouncer manage the pool):

```python
SQLALCHEMY_ENGINE_OPTIONS = {
    "pool_size": 0,                 # 0 = no local pool size limit; PgBouncer caps connections
    "max_overflow": 0,
    "pool_pre_ping": True,
    "pool_timeout": 30,
    "pool_recycle": 3600,
}
```

## Logging

Reducing log verbosity in production decreases disk I/O and improves throughput:

- `config/generic.json`: set `loglevel` to `WARNING` or `ERROR`
- `config/website.py`: set `LOG_LEVEL` to `WARNING`
- `storage/kvrocks.conf`: `log-level warning` (default)
- Per-feeder log levels in `config/modules.cfg`: change `level = DEBUG` to
  `level = WARNING` for production

The logging configuration in `config/logging.json` uses rotating file handlers
(1 MB per file, 5 backups) which bounds disk usage.
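A programmatic equivalent of that rotating-file setup looks like the following sketch (the filename and logger name are illustrative; the authoritative values live in `config/logging.json`):

```python
import logging
from logging.handlers import RotatingFileHandler

# Rotate at ~1 MB per file and keep 5 backups, bounding disk usage.
handler = RotatingFileHandler(
    "vulnerability-lookup.log",  # illustrative path
    maxBytes=1024 * 1024,
    backupCount=5,
    delay=True,                  # do not open the file until the first record
)
handler.setLevel(logging.WARNING)  # production verbosity

logger = logging.getLogger("vulnlookup-example")
logger.addHandler(handler)
```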

## Operating System Tuning

### File Descriptors

Kvrocks, Valkey, and Gunicorn all benefit from high file descriptor limits:

```bash
# /etc/security/limits.conf  (or systemd unit override)
* soft nofile 65536
* hard nofile 65536
```

### Network Stack

For high-concurrency deployments:

```bash
sysctl -w net.core.somaxconn=4096
sysctl -w net.ipv4.tcp_max_syn_backlog=4096
sysctl -w net.core.netdev_max_backlog=4096
sysctl -w vm.overcommit_memory=1              # recommended for Redis/Kvrocks
```

### Transparent Huge Pages (THP)

Disable THP for Kvrocks and Valkey to avoid latency spikes:

```bash
echo never > /sys/kernel/mm/transparent_hugepage/enabled
```

## Reverse Proxy (Nginx)

When running behind Nginx, a minimal performance-oriented configuration:

```nginx
upstream vulnlookup {
    server 127.0.0.1:10001;
    keepalive 64;
}

server {
    listen 443 ssl http2;
    server_name vulnerability.example.org;

    client_max_body_size 10m;

    location / {
        proxy_pass http://vulnlookup;
        proxy_http_version 1.1;
        proxy_set_header Connection "";   # required for upstream keepalive
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        proxy_connect_timeout 10;
        proxy_read_timeout 300;           # match Gunicorn timeout
        proxy_send_timeout 300;
    }
}
```

:::{note}
Gunicorn is started with `--proxy-protocol`. If your reverse proxy supports
the PROXY protocol, enable it. Otherwise, rely on `X-Forwarded-For` headers
and consider removing `--proxy-protocol` from `bin/start_website.py`.
:::

## Summary

| Layer | Key Settings | Tune When |
|-------|-------------|-----------|
| Gunicorn | `website_workers`, `--timeout`, `--worker-class gevent` | High concurrent users, slow responses |
| Kvrocks | `workers`, `maxclients`, `rocksdb.block_cache_size` | Large dataset, slow queries, high write load |
| Valkey/Redis | `maxmemory`, eviction policy | Cache memory growing unbounded |
| PostgreSQL | `max_connections`, `shared_buffers`, `work_mem` | Slow user/account queries, high concurrency |
| PgBouncer | `pool_mode`, `default_pool_size`, `max_client_conn` | Many workers exhausting PostgreSQL connections |
| SQLAlchemy | `pool_size`, `max_overflow`, `pool_pre_ping` | Connection errors, stale connections |
| OS | `nofile`, `somaxconn`, THP | Connection refused errors, latency spikes |
