GHSA-6PR9-RP53-2PMC

Vulnerability from github – Published: 2026-06-17 14:06 – Updated: 2026-06-17 14:06
VLAI
Summary
vLLM: OOM Denial of Service via Audio Decompression Bomb
Details

Summary

vLLM's /v1/audio/transcriptions endpoint limits compressed upload size but not decoded PCM output. A 25MB OPUS file expands to ~14.9GB of float32 PCM at decode time. Tested on vLLM v0.19.0.

Details

SpeechToTextProcessor rejects uploads over VLLM_MAX_AUDIO_CLIP_FILESIZE_MB (default 25MB) based on compressed byte length, but the audio decoder in audio.py accumulates all decoded frames into memory with no size limit before returning:

# speech_to_text.py L184-189
if len(audio_data) / 1024 ** 2 > self.max_audio_filesize_mb:
    raise VLLMValidationError(...)
y, sr = load_audio(buf, sr=self.asr_config.sample_rate)  # decoded size unchecked

# audio.py L77-107
chunks: list[npt.NDArray] = []
for frame in container.decode(stream):
    chunks.append(frame.to_ndarray())
audio = np.concatenate(chunks, axis=-1).astype(np.float32)  # single contiguous allocation

A 25MB OPUS file at 6kbps encodes ~8.7 hours of audio. Decoding produces ~5.7GB of float32 PCM (232x amplification), and np.concatenate then allocates a second contiguous array, bringing peak RSS to ~14.9GB from a single request. SpeechToTextConfig.max_audio_clip_s (default 30s) applies only after the full decode and does not prevent the allocation.

Impact

An unauthenticated attacker can exhaust server memory with a small number of concurrent requests, each a valid upload within the documented size limit. Severity was assessed with reference to prior OOM vulnerability reports in vLLM.

Fix

A fix for this vulnerability was merged here: https://github.com/vllm-project/vllm/pull/44970

Show details on source website

{
  "affected": [
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "vllm"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0"
            },
            {
              "last_affected": "0.23.0"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    }
  ],
  "aliases": [
    "CVE-2026-54233"
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-409"
    ],
    "github_reviewed": true,
    "github_reviewed_at": "2026-06-17T14:06:22Z",
    "nvd_published_at": null,
    "severity": "MODERATE"
  },
  "details": "### Summary\nvLLM\u0027s `/v1/audio/transcriptions` endpoint limits compressed upload size but not decoded PCM output. A 25MB OPUS file expands to ~14.9GB of float32 PCM at decode time. Tested on vLLM v0.19.0.\n\n### Details\n`SpeechToTextProcessor` rejects uploads over `VLLM_MAX_AUDIO_CLIP_FILESIZE_MB` (default 25MB) based on compressed byte length, but the audio decoder in `audio.py` accumulates all decoded frames into memory with no size limit before returning:\n\n```python\n# speech_to_text.py L184-189\nif len(audio_data) / 1024 ** 2 \u003e self.max_audio_filesize_mb:\n    raise VLLMValidationError(...)\ny, sr = load_audio(buf, sr=self.asr_config.sample_rate)  # decoded size unchecked\n\n# audio.py L77-107\nchunks: list[npt.NDArray] = []\nfor frame in container.decode(stream):\n    chunks.append(frame.to_ndarray())\naudio = np.concatenate(chunks, axis=-1).astype(np.float32)  # single contiguous allocation\n```\n\nA 25MB OPUS file at 6kbps encodes ~8.7 hours of audio. Decoding produces ~5.7GB of float32 PCM (232x amplification), and `np.concatenate` then allocates a second contiguous array, bringing peak RSS to ~14.9GB from a single request. `SpeechToTextConfig.max_audio_clip_s` (default 30s) applies only after the full decode and does not prevent the allocation.\n\n### Impact\nAn unauthenticated attacker can exhaust server memory with a small number of concurrent requests, each a valid upload within the documented size limit. Severity was assessed with reference to prior OOM vulnerability reports in vLLM.\n\n### Fix\n\nA fix for this vulnerability was merged here: https://github.com/vllm-project/vllm/pull/44970",
  "id": "GHSA-6pr9-rp53-2pmc",
  "modified": "2026-06-17T14:06:22Z",
  "published": "2026-06-17T14:06:22Z",
  "references": [
    {
      "type": "WEB",
      "url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-6pr9-rp53-2pmc"
    },
    {
      "type": "WEB",
      "url": "https://github.com/vllm-project/vllm/pull/44970"
    },
    {
      "type": "WEB",
      "url": "https://github.com/vllm-project/vllm/commit/1b1359c33269446f13c05da9a90c25174cbea590"
    },
    {
      "type": "PACKAGE",
      "url": "https://github.com/vllm-project/vllm"
    },
    {
      "type": "WEB",
      "url": "https://github.com/vllm-project/vllm/releases/tag/v0.23.1rc0"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
      "type": "CVSS_V3"
    }
  ],
  "summary": "vLLM: OOM Denial of Service via Audio Decompression Bomb"
}


Log in or create an account to share your comment.




Tags
Taxonomy of the tags.


Loading…

Loading…

Loading…

Forecast uses a logistic model when the trend is rising, or an exponential decay model when the trend is falling. Fitted via linearized least squares.

Sightings

Author Source Type Date Other

Nomenclature

  • Seen: The vulnerability was mentioned, discussed, or observed by the user.
  • Confirmed: The vulnerability has been validated from an analyst's perspective.
  • Published Proof of Concept: A public proof of concept is available for this vulnerability.
  • Exploited: The vulnerability was observed as exploited by the user who reported the sighting.
  • Patched: The vulnerability was observed as successfully patched by the user who reported the sighting.
  • Not exploited: The vulnerability was not observed as exploited by the user who reported the sighting.
  • Not confirmed: The user expressed doubt about the validity of the vulnerability.
  • Not patched: The vulnerability was not observed as successfully patched by the user who reported the sighting.

Loading…

Detection rules are retrieved from Rulezet.

Loading…

Loading…