Vulnerability-Lookup

CVE-2026-44223 (GCVE-0-2026-44223)

Vulnerability from cvelistv5 – Published: 2026-05-12 19:58 – Updated: 2026-06-22 21:49

Title

vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters

Summary

vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.

Severity

6.5 (Medium) - CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
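
As a worked check of the 6.5 score, the sketch below recomputes the base score from the vector string using the CVSS v3.1 base-score equations. The function and table names are illustrative, the metric weights are taken from the CVSS v3.1 specification, and only the Scope: Unchanged case (the one this vector uses) is handled.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, following CVSS v3.1 Appendix A."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a CVSS:3.1 vector with unchanged scope."""
    # Skip the "CVSS:3.1" prefix and split the rest into metric/value pairs.
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    if parts["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][parts[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H")
print(score)
```

The only non-zero impact term is availability (A:H), so the whole score comes from the denial-of-service effect; the low-privilege requirement (PR:L) is what keeps it below the High band.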

SSVC

Exploitation: poc Automatable: no Technical Impact: partial

CISA Coordinator (v2.0.3)

CWE

CWE-131 - Incorrect Calculation of Buffer Size
CWE-704 - Incorrect Type Conversion or Cast

Assigner

GitHub_M

References

2 references

URL	Tags
https://github.com/vllm-project/vllm/security/adv…	x_refsource_CONFIRM
https://github.com/vllm-project/vllm/pull/38610	x_refsource_MISC

Impacted products

1 product

Vendor	Product	Version
vllm-project	vllm	Affected: >= 0.18.0, < 0.20.0
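
To make the affected range concrete, here is a minimal sketch of a version check against the half-open interval stated above (0.18.0 inclusive, 0.20.0 exclusive). The helper names are hypothetical, and the parser deliberately ignores pre-release and build suffixes, so a string such as a release candidate of 0.20.0 would be treated as 0.20.0; that simplification is an assumption of this sketch, not something the advisory addresses.

```python
import re

AFFECTED_FROM = (0, 18, 0)  # inclusive lower bound from the advisory
FIXED_IN = (0, 20, 0)       # exclusive upper bound (first fixed release)


def parse_release(version: str) -> tuple:
    """Parse the leading 'X.Y[.Z]' of a version string into a tuple of ints."""
    match = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def is_affected(version: str) -> bool:
    """True if the version falls in the range >= 0.18.0, < 0.20.0."""
    return AFFECTED_FROM <= parse_release(version) < FIXED_IN


for candidate in ("0.17.0", "0.18.0", "0.19.1", "0.20.0"):
    print(candidate, is_affected(candidate))
```

In a real environment the installed version could be obtained with `importlib.metadata.version("vllm")` from the standard library. A version inside the range is only exposed when the deployment is also configured to use extract_hidden_states speculative decoding.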

JSON

{
  "containers": {
    "adp": [
      {
        "metrics": [
          {
            "other": {
              "content": {
                "id": "CVE-2026-44223",
                "options": [
                  {
                    "Exploitation": "poc"
                  },
                  {
                    "Automatable": "no"
                  },
                  {
                    "Technical Impact": "partial"
                  }
                ],
                "role": "CISA Coordinator",
                "timestamp": "2026-05-15T14:44:05.012494Z",
                "version": "2.0.3"
              },
              "type": "ssvc"
            }
          }
        ],
        "providerMetadata": {
          "dateUpdated": "2026-05-15T14:46:25.695Z",
          "orgId": "134c704f-9b21-4f2e-91b3-4a467353bcc0",
          "shortName": "CISA-ADP"
        },
        "references": [
          {
            "tags": [
              "exploit"
            ],
            "url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw"
          },
          {
            "tags": [
              "exploit"
            ],
            "url": "https://github.com/vllm-project/vllm/pull/38610"
          }
        ],
        "title": "CISA ADP Vulnrichment"
      }
    ],
    "cna": {
      "affected": [
        {
          "product": "vllm",
          "vendor": "vllm-project",
          "versions": [
            {
              "status": "affected",
              "version": "\u003e= 0.18.0, \u003c 0.20.0"
            }
          ]
        }
      ],
      "descriptions": [
        {
          "lang": "en",
          "value": "vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., \"repetition_penalty\": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0."
        }
      ],
      "metrics": [
        {
          "cvssV3_1": {
            "attackComplexity": "LOW",
            "attackVector": "NETWORK",
            "availabilityImpact": "HIGH",
            "baseScore": 6.5,
            "baseSeverity": "MEDIUM",
            "confidentialityImpact": "NONE",
            "integrityImpact": "NONE",
            "privilegesRequired": "LOW",
            "scope": "UNCHANGED",
            "userInteraction": "NONE",
            "vectorString": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
            "version": "3.1"
          }
        }
      ],
      "problemTypes": [
        {
          "descriptions": [
            {
              "cweId": "CWE-131",
              "description": "CWE-131: Incorrect Calculation of Buffer Size",
              "lang": "en",
              "type": "CWE"
            }
          ]
        },
        {
          "descriptions": [
            {
              "cweId": "CWE-704",
              "description": "CWE-704: Incorrect Type Conversion or Cast",
              "lang": "en",
              "type": "CWE"
            }
          ]
        }
      ],
      "providerMetadata": {
        "dateUpdated": "2026-06-22T21:49:24.277Z",
        "orgId": "a0819718-46f1-4df5-94e2-005712e83aaa",
        "shortName": "GitHub_M"
      },
      "references": [
        {
          "name": "https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw",
          "tags": [
            "x_refsource_CONFIRM"
          ],
          "url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw"
        },
        {
          "name": "https://github.com/vllm-project/vllm/pull/38610",
          "tags": [
            "x_refsource_MISC"
          ],
          "url": "https://github.com/vllm-project/vllm/pull/38610"
        }
      ],
      "source": {
        "advisory": "GHSA-83vm-p52w-f9pw",
        "discovery": "UNKNOWN"
      },
      "title": "vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters"
    }
  },
  "cveMetadata": {
    "assignerOrgId": "a0819718-46f1-4df5-94e2-005712e83aaa",
    "assignerShortName": "GitHub_M",
    "cveId": "CVE-2026-44223",
    "datePublished": "2026-05-12T19:58:40.862Z",
    "dateReserved": "2026-05-05T15:42:40.518Z",
    "dateUpdated": "2026-06-22T21:49:24.277Z",
    "state": "PUBLISHED"
  },
  "dataType": "CVE_RECORD",
  "dataVersion": "5.2",
  "vulnerability-lookup:meta": {
    "epss": {
      "cve": "CVE-2026-44223",
      "date": "2026-07-27",
      "epss": "0.00367",
      "percentile": "0.29231"
    },
    "nvd": "{\"cve\":{\"id\":\"CVE-2026-44223\",\"sourceIdentifier\":\"security-advisories@github.com\",\"published\":\"2026-05-12T20:16:43.293\",\"lastModified\":\"2026-06-22T22:16:45.507\",\"vulnStatus\":\"Modified\",\"cveTags\":[],\"descriptions\":[{\"lang\":\"en\",\"value\":\"vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., \\\"repetition_penalty\\\": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.\"}],\"affected\":[{\"source\":\"security-advisories@github.com\",\"affectedData\":[{\"vendor\":\"vllm-project\",\"product\":\"vllm\",\"versions\":[{\"version\":\"\u003e= 0.18.0, \u003c 0.20.0\",\"status\":\"affected\"}]}]}],\"metrics\":{\"cvssMetricV31\":[{\"source\":\"security-advisories@github.com\",\"type\":\"Secondary\",\"cvssData\":{\"version\":\"3.1\",\"vectorString\":\"CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H\",\"baseScore\":6.5,\"baseSeverity\":\"MEDIUM\",\"attackVector\":\"NETWORK\",\"attackComplexity\":\"LOW\",\"privilegesRequired\":\"LOW\",\"userInteraction\":\"NONE\",\"scope\":\"UNCHANGED\",\"confidentialityImpact\":\"NONE\",\"integrityImpact\":\"NONE\",\"availabilityImpact\":\"HIGH\"},\"exploitabilityScore\":2.8,\"impactScore\":3.6}],\"ssvcV203\":[{\"source\":\"134c704f-9b21-4f2e-91b3-4a467353bcc0\",\"ssvcData\":{\"timestamp\":\"2026-05-15T14:44:05.012494Z\",\"id\":\"CVE-2026-44223\",\"options\":[{\"exploitation\":\"poc\"},{\"automatable\":\"no\"},{\"technicalImpact\":\"partial\"}],\"role\":\"CISA 
Coordinator\",\"version\":\"2.0.3\"}}]},\"weaknesses\":[{\"source\":\"security-advisories@github.com\",\"type\":\"Secondary\",\"description\":[{\"lang\":\"en\",\"value\":\"CWE-131\"},{\"lang\":\"en\",\"value\":\"CWE-704\"}]}],\"configurations\":[{\"nodes\":[{\"operator\":\"OR\",\"negate\":false,\"cpeMatch\":[{\"vulnerable\":true,\"criteria\":\"cpe:2.3:a:vllm:vllm:*:*:*:*:*:*:*:*\",\"versionStartIncluding\":\"0.18.0\",\"versionEndExcluding\":\"0.20.0\",\"matchCriteriaId\":\"443F32C2-B323-4FA0-AB97-F6445C91C89E\"}]}]}],\"references\":[{\"url\":\"https://github.com/vllm-project/vllm/pull/38610\",\"source\":\"security-advisories@github.com\",\"tags\":[\"Issue Tracking\",\"Patch\"]},{\"url\":\"https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw\",\"source\":\"security-advisories@github.com\",\"tags\":[\"Mitigation\",\"Vendor Advisory\"]},{\"url\":\"https://github.com/vllm-project/vllm/pull/38610\",\"source\":\"134c704f-9b21-4f2e-91b3-4a467353bcc0\",\"tags\":[\"Issue Tracking\",\"Patch\"]},{\"url\":\"https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw\",\"source\":\"134c704f-9b21-4f2e-91b3-4a467353bcc0\",\"tags\":[\"Mitigation\",\"Vendor Advisory\"]}]}}",
    "vulnrichment": {
      "containers": "{\"adp\": [{\"title\": \"CISA ADP Vulnrichment\", \"metrics\": [{\"other\": {\"type\": \"ssvc\", \"content\": {\"id\": \"CVE-2026-44223\", \"role\": \"CISA Coordinator\", \"options\": [{\"Exploitation\": \"poc\"}, {\"Automatable\": \"no\"}, {\"Technical Impact\": \"partial\"}], \"version\": \"2.0.3\", \"timestamp\": \"2026-05-15T14:44:05.012494Z\"}}}], \"references\": [{\"url\": \"https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw\", \"tags\": [\"exploit\"]}, {\"url\": \"https://github.com/vllm-project/vllm/pull/38610\", \"tags\": [\"exploit\"]}], \"providerMetadata\": {\"orgId\": \"134c704f-9b21-4f2e-91b3-4a467353bcc0\", \"shortName\": \"CISA-ADP\", \"dateUpdated\": \"2026-05-15T14:43:40.735Z\"}}], \"cna\": {\"title\": \"vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters\", \"source\": {\"advisory\": \"GHSA-83vm-p52w-f9pw\", \"discovery\": \"UNKNOWN\"}, \"metrics\": [{\"cvssV3_1\": {\"scope\": \"UNCHANGED\", \"version\": \"3.1\", \"baseScore\": 6.5, \"attackVector\": \"NETWORK\", \"baseSeverity\": \"MEDIUM\", \"vectorString\": \"CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H\", \"integrityImpact\": \"NONE\", \"userInteraction\": \"NONE\", \"attackComplexity\": \"LOW\", \"availabilityImpact\": \"HIGH\", \"privilegesRequired\": \"LOW\", \"confidentialityImpact\": \"NONE\"}}], \"affected\": [{\"vendor\": \"vllm-project\", \"product\": \"vllm\", \"versions\": [{\"status\": \"affected\", \"version\": \"\u003e= 0.18.0, \u003c 0.20.0\"}]}], \"references\": [{\"url\": \"https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw\", \"name\": \"https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw\", \"tags\": [\"x_refsource_CONFIRM\"]}, {\"url\": \"https://github.com/vllm-project/vllm/pull/38610\", \"name\": \"https://github.com/vllm-project/vllm/pull/38610\", \"tags\": [\"x_refsource_MISC\"]}], \"descriptions\": [{\"lang\": \"en\", 
\"value\": \"vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., \\\"repetition_penalty\\\": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.\"}], \"problemTypes\": [{\"descriptions\": [{\"lang\": \"en\", \"type\": \"CWE\", \"cweId\": \"CWE-131\", \"description\": \"CWE-131: Incorrect Calculation of Buffer Size\"}]}, {\"descriptions\": [{\"lang\": \"en\", \"type\": \"CWE\", \"cweId\": \"CWE-704\", \"description\": \"CWE-704: Incorrect Type Conversion or Cast\"}]}], \"providerMetadata\": {\"orgId\": \"a0819718-46f1-4df5-94e2-005712e83aaa\", \"shortName\": \"GitHub_M\", \"dateUpdated\": \"2026-06-22T21:49:24.277Z\"}}}",
      "cveMetadata": "{\"cveId\": \"CVE-2026-44223\", \"state\": \"PUBLISHED\", \"dateUpdated\": \"2026-06-22T21:49:24.277Z\", \"dateReserved\": \"2026-05-05T15:42:40.518Z\", \"assignerOrgId\": \"a0819718-46f1-4df5-94e2-005712e83aaa\", \"datePublished\": \"2026-05-12T19:58:40.862Z\", \"assignerShortName\": \"GitHub_M\"}",
      "dataType": "CVE_RECORD",
      "dataVersion": "5.2"
    }
  }
}

FKIE_CVE-2026-44223

Vulnerability from fkie_nvd - Published: 2026-05-12 20:16 - Updated: 2026-06-22 22:16

Severity

6.5 (Medium) - CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

Summary

References

Source	URL	Tags
security-advisories@github.com	https://github.com/vllm-project/vllm/pull/38610	Issue Tracking, Patch
security-advisories@github.com	https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw	Mitigation, Vendor Advisory
134c704f-9b21-4f2e-91b3-4a467353bcc0	https://github.com/vllm-project/vllm/pull/38610	Issue Tracking, Patch
134c704f-9b21-4f2e-91b3-4a467353bcc0	https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw	Mitigation, Vendor Advisory

Impacted products

Vendor	Product	Version
vllm	vllm	>= 0.18.0, < 0.20.0

JSON

{
  "affected": [
    {
      "affectedData": [
        {
          "product": "vllm",
          "vendor": "vllm-project",
          "versions": [
            {
              "status": "affected",
              "version": "\u003e= 0.18.0, \u003c 0.20.0"
            }
          ]
        }
      ],
      "source": "security-advisories@github.com"
    }
  ],
  "configurations": [
    {
      "nodes": [
        {
          "cpeMatch": [
            {
              "criteria": "cpe:2.3:a:vllm:vllm:*:*:*:*:*:*:*:*",
              "matchCriteriaId": "443F32C2-B323-4FA0-AB97-F6445C91C89E",
              "versionEndExcluding": "0.20.0",
              "versionStartIncluding": "0.18.0",
              "vulnerable": true
            }
          ],
          "negate": false,
          "operator": "OR"
        }
      ]
    }
  ],
  "cveTags": [],
  "descriptions": [
    {
      "lang": "en",
      "value": "vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., \"repetition_penalty\": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0."
    }
  ],
  "id": "CVE-2026-44223",
  "lastModified": "2026-06-22T22:16:45.507",
  "metrics": {
    "cvssMetricV31": [
      {
        "cvssData": {
          "attackComplexity": "LOW",
          "attackVector": "NETWORK",
          "availabilityImpact": "HIGH",
          "baseScore": 6.5,
          "baseSeverity": "MEDIUM",
          "confidentialityImpact": "NONE",
          "integrityImpact": "NONE",
          "privilegesRequired": "LOW",
          "scope": "UNCHANGED",
          "userInteraction": "NONE",
          "vectorString": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
          "version": "3.1"
        },
        "exploitabilityScore": 2.8,
        "impactScore": 3.6,
        "source": "security-advisories@github.com",
        "type": "Secondary"
      }
    ],
    "ssvcV203": [
      {
        "source": "134c704f-9b21-4f2e-91b3-4a467353bcc0",
        "ssvcData": {
          "id": "CVE-2026-44223",
          "options": [
            {
              "exploitation": "poc"
            },
            {
              "automatable": "no"
            },
            {
              "technicalImpact": "partial"
            }
          ],
          "role": "CISA Coordinator",
          "timestamp": "2026-05-15T14:44:05.012494Z",
          "version": "2.0.3"
        }
      }
    ]
  },
  "published": "2026-05-12T20:16:43.293",
  "references": [
    {
      "source": "security-advisories@github.com",
      "tags": [
        "Issue Tracking",
        "Patch"
      ],
      "url": "https://github.com/vllm-project/vllm/pull/38610"
    },
    {
      "source": "security-advisories@github.com",
      "tags": [
        "Mitigation",
        "Vendor Advisory"
      ],
      "url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw"
    },
    {
      "source": "134c704f-9b21-4f2e-91b3-4a467353bcc0",
      "tags": [
        "Issue Tracking",
        "Patch"
      ],
      "url": "https://github.com/vllm-project/vllm/pull/38610"
    },
    {
      "source": "134c704f-9b21-4f2e-91b3-4a467353bcc0",
      "tags": [
        "Mitigation",
        "Vendor Advisory"
      ],
      "url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw"
    }
  ],
  "sourceIdentifier": "security-advisories@github.com",
  "vulnStatus": "Modified",
  "weaknesses": [
    {
      "description": [
        {
          "lang": "en",
          "value": "CWE-131"
        },
        {
          "lang": "en",
          "value": "CWE-704"
        }
      ],
      "source": "security-advisories@github.com",
      "type": "Secondary"
    }
  ]
}

GHSA-83VM-P52W-F9PW

Vulnerability from github – Published: 2026-05-06 21:45 – Updated: 2026-06-08 19:52

Summary

vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters

Details

Summary

The extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty).

A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. The crash is deterministic and immediate — no concurrency, race condition, or special workload is required.

Details

In vLLM v0.17.0, the extract_hidden_states proposer's propose() method returned sampled_token_ids.unsqueeze(-1), producing a tensor of shape (batch_size, 1).

In PR #37013 (first released in v0.18.0), the KV connector interface was refactored out of propose(). The return type changed from tuple[Tensor, KVConnectorOutput | None] to Tensor, and the .unsqueeze(-1) call was removed along with the KV connector output:

# Before (v0.17.0):
return sampled_token_ids.unsqueeze(-1), kv_connector_output  # shape (batch_size, 1)

# After (v0.18.0+):
return sampled_token_ids  # shape (batch_size, 2) after first decode step

The refactor missed that sampled_token_ids changed semantics between the first and subsequent decode steps. After the first decode step, the rejection sampler allocates its output as (batch_size, max_spec_len + 1). With num_speculative_tokens=1, this produces shape (batch_size, 2) instead of the expected (batch_size, 1), causing a broadcast shape mismatch during penalty application.
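
The shape arithmetic in the paragraph above can be illustrated without vLLM or PyTorch. The following is a minimal sketch using plain tuples: both helper names are hypothetical, the guard is an illustration rather than vLLM code, and it assumes max_spec_len equals num_speculative_tokens, which is consistent with the (batch_size, 2) example given in the advisory.

```python
def rejection_sampler_output_shape(batch_size: int, max_spec_len: int) -> tuple:
    """Shape the advisory attributes to the rejection sampler's output."""
    return (batch_size, max_spec_len + 1)


def check_proposer_output(shape: tuple, batch_size: int) -> None:
    """Hypothetical guard: downstream code expects exactly one token column."""
    expected = (batch_size, 1)
    if shape != expected:
        raise ValueError(f"expected shape {expected}, got {shape}")


batch_size = 4
# With one speculative token, the sampler output has two columns.
returned = rejection_sampler_output_shape(batch_size, max_spec_len=1)

try:
    check_proposer_output(returned, batch_size)
    mismatch = None
except ValueError as exc:
    mismatch = str(exc)

print(returned, mismatch)
```

The point of the sketch is only that the column count grows with the number of speculative tokens, so returning the sampler output unchanged cannot satisfy a consumer that expects a single column.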

Impact

Any vLLM deployment between v0.18.0 and v0.19.1 (inclusive) configured with extract_hidden_states speculative decoding is affected. A single API request containing any penalty parameter immediately and permanently crashes the EngineCore process, resulting in complete loss of service availability.

Patches

Fixed in PR #38610, first included in vLLM v0.20.0. The fix slices the return value to sampled_token_ids[:, :1], ensuring the correct (batch_size, 1) shape regardless of the rejection sampler's output dimensions.
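
The effect of the slice can be mimicked with nested Python lists standing in for a 2-D tensor. This is a sketch only: `first_column` is a hypothetical list-based analogue of the tensor expression `sampled_token_ids[:, :1]`, and the token values are made up for illustration.

```python
def shape(rows: list) -> tuple:
    """Return (number of rows, number of columns) for a nested list."""
    return (len(rows), len(rows[0]) if rows else 0)


def first_column(rows: list) -> list:
    """List-based analogue of tensor[:, :1]: keep column 0 and stay 2-D."""
    return [row[:1] for row in rows]


one_column = [[11], [27], [35]]
two_columns = [[11, 90], [27, 41], [35, 8]]

narrow = first_column(one_column)
wide = first_column(two_columns)

print(shape(narrow), shape(wide))
```

Because the slice keeps the second dimension, the result has one column for any input width, which is the property the fix relies on.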

Workarounds

- Upgrade to vLLM v0.20.0 or later.
- If upgrading is not possible, avoid using extract_hidden_states as the speculative decoding method on affected versions.
- Alternatively, reject or strip penalty parameters (repetition_penalty, frequency_penalty, presence_penalty) from incoming requests at an API gateway before they reach vLLM.
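
The gateway workaround could look roughly like the sketch below. It assumes the penalty parameters appear as top-level keys of a JSON request body, as in the advisory's `"repetition_penalty": 1.1` example; parameters carried in any other location would not be caught. The function names are hypothetical, and wiring them into a particular proxy or gateway is left out.

```python
import json

PENALTY_PARAMS = ("repetition_penalty", "frequency_penalty", "presence_penalty")


def strip_penalty_params(body: bytes) -> bytes:
    """Return the JSON request body with top-level penalty parameters removed."""
    payload = json.loads(body)
    if isinstance(payload, dict):
        for name in PENALTY_PARAMS:
            payload.pop(name, None)
    return json.dumps(payload).encode("utf-8")


def has_penalty_params(body: bytes) -> bool:
    """True if the request sets any penalty parameter (use this to reject)."""
    payload = json.loads(body)
    return isinstance(payload, dict) and any(
        name in payload for name in PENALTY_PARAMS
    )


request_body = b'{"model": "m", "prompt": "hi", "repetition_penalty": 1.1}'
cleaned_body = strip_penalty_params(request_body)
print(cleaned_body.decode("utf-8"))
```

Stripping silently changes sampling behaviour for the caller, while rejecting (for example with an HTTP 400 when `has_penalty_params` returns true) makes the restriction visible; which is preferable depends on the deployment.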

Severity

6.5 (Medium) - CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

JSON

{
  "affected": [
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "vllm"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0.18.0"
            },
            {
              "fixed": "0.20.0"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    }
  ],
  "aliases": [
    "CVE-2026-44223"
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-131",
      "CWE-704"
    ],
    "github_reviewed": true,
    "github_reviewed_at": "2026-05-06T21:45:51Z",
    "nvd_published_at": "2026-05-12T20:16:43Z",
    "severity": "MODERATE"
  },
  "details": "### Summary\n\nThe `extract_hidden_states` speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a `RuntimeError` that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (`repetition_penalty`, `frequency_penalty`, or `presence_penalty`).\n\nA single request with a penalty parameter (e.g., `\"repetition_penalty\": 1.1`) is sufficient to crash the server. The crash is deterministic and immediate \u2014 no concurrency, race condition, or special workload is required.\n\n### Details\n\nIn vLLM v0.17.0, the `extract_hidden_states` proposer\u0027s `propose()` method returned `sampled_token_ids.unsqueeze(-1)`, producing a tensor of shape `(batch_size, 1)`.\n\nIn [PR #37013](https://github.com/vllm-project/vllm/pull/37013) (first released in v0.18.0), the KV connector interface was refactored out of `propose()`. The return type changed from `tuple[Tensor, KVConnectorOutput | None]` to `Tensor`, and the `.unsqueeze(-1)` call was removed along with the KV connector output:\n\n```python\n# Before (v0.17.0):\nreturn sampled_token_ids.unsqueeze(-1), kv_connector_output  # shape (batch_size, 1)\n\n# After (v0.18.0+):\nreturn sampled_token_ids  # shape (batch_size, 2) after first decode step\n```\n\nThe refactor missed that `sampled_token_ids` changed semantics between the first and subsequent decode steps. After the first decode step, the rejection sampler allocates its output as `(batch_size, max_spec_len + 1)`. With `num_speculative_tokens=1`, this produces shape `(batch_size, 2)` instead of the expected `(batch_size, 1)`, causing a broadcast shape mismatch during penalty application.\n\n### Impact\n\nAny vLLM deployment between v0.18.0 and v0.19.1 (inclusive) configured with `extract_hidden_states` speculative decoding is affected. 
A single API request containing any penalty parameter immediately and permanently crashes the EngineCore process, resulting in complete loss of service availability.\n\n### Patches\n\nFixed in [PR #38610](https://github.com/vllm-project/vllm/pull/38610), first included in vLLM v0.20.0. The fix slices the return value to `sampled_token_ids[:, :1]`, ensuring the correct `(batch_size, 1)` shape regardless of the rejection sampler\u0027s output dimensions.\n\n### Workarounds\n\n- Upgrade to vLLM v0.20.0 or later.\n- If upgrading is not possible, avoid using `extract_hidden_states` as the speculative decoding method on affected versions.\n- Alternatively, reject or strip penalty parameters (`repetition_penalty`, `frequency_penalty`, `presence_penalty`) from incoming requests at an API gateway before they reach vLLM.",
  "id": "GHSA-83vm-p52w-f9pw",
  "modified": "2026-06-08T19:52:35Z",
  "published": "2026-05-06T21:45:51Z",
  "references": [
    {
      "type": "WEB",
      "url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw"
    },
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2026-44223"
    },
    {
      "type": "WEB",
      "url": "https://github.com/vllm-project/vllm/pull/38610"
    },
    {
      "type": "WEB",
      "url": "https://github.com/pypa/advisory-database/tree/main/vulns/vllm/PYSEC-2026-145.yaml"
    },
    {
      "type": "PACKAGE",
      "url": "https://github.com/vllm-project/vllm"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
      "type": "CVSS_V3"
    }
  ],
  "summary": "vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters"
}

PYSEC-2026-145

Vulnerability from pysec - Published: 2026-05-12 20:16 - Updated: 2026-05-20 09:19

Details

vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.

Severity

6.5 (Medium) - CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

Impacted products

Name	purl
vllm	pkg:pypi/vllm

Aliases

CVE-2026-44223, GHSA-83vm-p52w-f9pw

JSON

{
  "affected": [
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "vllm",
        "purl": "pkg:pypi/vllm"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0.18.0"
            },
            {
              "fixed": "0.20.0"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ],
      "versions": [
        "0.18.0",
        "0.18.1",
        "0.19.0",
        "0.19.1"
      ]
    }
  ],
  "aliases": [
    "CVE-2026-44223",
    "GHSA-83vm-p52w-f9pw"
  ],
  "details": "vLLM is an inference and serving engine for large language models (LLMs). From  to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., \"repetition_penalty\": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.",
  "id": "PYSEC-2026-145",
  "modified": "2026-05-20T09:19:21.596358Z",
  "published": "2026-05-12T20:16:43.293Z",
  "references": [
    {
      "type": "ADVISORY",
      "url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw"
    },
    {
      "type": "FIX",
      "url": "https://github.com/vllm-project/vllm/pull/38610"
    }
  ],
  "severity": [
    {
      "score": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
      "type": "CVSS_V3"
    }
  ]
}

WID-SEC-W-2026-1299

Vulnerability from csaf_certbund - Published: 2026-04-28 22:00 - Updated: 2026-05-12 22:00

Summary

vllm: Vulnerability allows denial of service

Severity

Medium

Notes

The BSI, as provider, is responsible under general law for its own content made available for use. Users are nevertheless responsible for carefully checking, case by case, the use and/or implementation of the information provided with that content.

Product description: Open Source vLLM is an open-source library for fast and efficient inference of large language models (LLMs).

Attack: A remote, authenticated attacker can exploit a vulnerability in vllm to carry out a denial-of-service attack.

Affected operating systems: Linux, UNIX

CVE-2026-44223

Affected products

Known affected 1 product

Product	Identifier	Version	Remediation
Open Source vllm <0.20.0 Open Source / vllm		<0.20.0

References

3 references

URL	Category
https://wid.cert-bund.de/.well-known/csaf/white/2…	self
https://wid.cert-bund.de/portal/wid/securityadvis…	self
https://github.com/vllm-project/vllm/security/adv…	external

JSON

{
  "document": {
    "aggregate_severity": {
      "text": "mittel"
    },
    "category": "csaf_base",
    "csaf_version": "2.0",
    "distribution": {
      "tlp": {
        "label": "WHITE",
        "url": "https://www.first.org/tlp/"
      }
    },
    "lang": "de-DE",
    "notes": [
      {
        "category": "legal_disclaimer",
        "text": "Das BSI ist als Anbieter f\u00fcr die eigenen, zur Nutzung bereitgestellten Inhalte nach den allgemeinen Gesetzen verantwortlich. Nutzerinnen und Nutzer sind jedoch daf\u00fcr verantwortlich, die Verwendung und/oder die Umsetzung der mit den Inhalten bereitgestellten Informationen sorgf\u00e4ltig im Einzelfall zu pr\u00fcfen."
      },
      {
        "category": "description",
        "text": "Open Source vLLM ist eine Open-Source-Bibliothek f\u00fcr schnelle und effiziente Inferenz von Large Language Models (LLMs).",
        "title": "Produktbeschreibung"
      },
      {
        "category": "summary",
        "text": "Ein entfernter, authentisierter Angreifer kann eine Schwachstelle in vllm ausnutzen, um einen Denial of Service Angriff durchzuf\u00fchren.",
        "title": "Angriff"
      },
      {
        "category": "general",
        "text": "- Linux\n- UNIX",
        "title": "Betroffene Betriebssysteme"
      }
    ],
    "publisher": {
      "category": "other",
      "contact_details": "csaf-provider@cert-bund.de",
      "name": "Bundesamt f\u00fcr Sicherheit in der Informationstechnik",
      "namespace": "https://www.bsi.bund.de"
    },
    "references": [
      {
        "category": "self",
        "summary": "WID-SEC-W-2026-1299 - CSAF Version",
        "url": "https://wid.cert-bund.de/.well-known/csaf/white/2026/wid-sec-w-2026-1299.json"
      },
      {
        "category": "self",
        "summary": "WID-SEC-2026-1299 - Portal Version",
        "url": "https://wid.cert-bund.de/portal/wid/securityadvisory?name=WID-SEC-2026-1299"
      },
      {
        "category": "external",
        "summary": "GitHub Security Advisory GHSA-83vm-p52w-f9pw vom 2026-04-28",
        "url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw"
      }
    ],
    "source_lang": "en-US",
    "title": "vllm: Schwachstelle erm\u00f6glicht Denial of Service",
    "tracking": {
      "current_release_date": "2026-05-12T22:00:00.000+00:00",
      "generator": {
        "date": "2026-05-13T06:21:20.185+00:00",
        "engine": {
          "name": "BSI-WID",
          "version": "1.5.0"
        }
      },
      "id": "WID-SEC-W-2026-1299",
      "initial_release_date": "2026-04-28T22:00:00.000+00:00",
      "revision_history": [
        {
          "date": "2026-04-28T22:00:00.000+00:00",
          "number": "1",
          "summary": "Initiale Fassung"
        },
        {
          "date": "2026-05-12T22:00:00.000+00:00",
          "number": "2",
          "summary": "CVE erg\u00e4nzt"
        }
      ],
      "status": "final",
      "version": "2"
    }
  },
  "product_tree": {
    "branches": [
      {
        "branches": [
          {
            "branches": [
              {
                "category": "product_version_range",
                "name": "\u003c0.20.0",
                "product": {
                  "name": "Open Source vllm \u003c0.20.0",
                  "product_id": "T053386"
                }
              },
              {
                "category": "product_version",
                "name": "0.20.0",
                "product": {
                  "name": "Open Source vllm 0.20.0",
                  "product_id": "T053386-fixed",
                  "product_identification_helper": {
                    "cpe": "cpe:/a:vllm:vllm:0.20.0"
                  }
                }
              }
            ],
            "category": "product_name",
            "name": "vllm"
          }
        ],
        "category": "vendor",
        "name": "Open Source"
      }
    ]
  },
  "vulnerabilities": [
    {
      "cve": "CVE-2026-44223",
      "product_status": {
        "known_affected": [
          "T053386"
        ]
      },
      "release_date": "2026-04-28T22:00:00.000+00:00",
      "title": "CVE-2026-44223"
    }
  ]
}

Sightings

Author	Source	Type	Date	Other

Nomenclature

Seen: The vulnerability was mentioned, discussed, or observed by the user.
Confirmed: The vulnerability has been validated from an analyst's perspective.
Published Proof of Concept: A public proof of concept is available for this vulnerability.
Exploited: The vulnerability was observed as exploited by the user who reported the sighting.
Patched: The vulnerability was observed as successfully patched by the user who reported the sighting.
Not exploited: The vulnerability was not observed as exploited by the user who reported the sighting.
Not confirmed: The user expressed doubt about the validity of the vulnerability.
Not patched: The vulnerability was not observed as successfully patched by the user who reported the sighting.

