CVE-2026-44223 (GCVE-0-2026-44223)
Vulnerability from cvelistv5 – Published: 2026-05-12 19:58 – Updated: 2026-06-22 21:49
VLAI
Title
vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters
Summary
vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.
Severity
6.5 (Medium)
SSVC
Exploitation: poc
Automatable: no
Technical Impact: partial
CISA Coordinator (v2.0.3)
Assigner
References
2 references
| URL | Tags |
|---|---|
| https://github.com/vllm-project/vllm/security/adv… | x_refsource_CONFIRM |
| https://github.com/vllm-project/vllm/pull/38610 | x_refsource_MISC |
Impacted products
1 product
| Vendor | Product | Version | |
|---|---|---|---|
| vllm-project | vllm |
Affected:
>= 0.18.0, < 0.20.0
|
{
"containers": {
"adp": [
{
"metrics": [
{
"other": {
"content": {
"id": "CVE-2026-44223",
"options": [
{
"Exploitation": "poc"
},
{
"Automatable": "no"
},
{
"Technical Impact": "partial"
}
],
"role": "CISA Coordinator",
"timestamp": "2026-05-15T14:44:05.012494Z",
"version": "2.0.3"
},
"type": "ssvc"
}
}
],
"providerMetadata": {
"dateUpdated": "2026-05-15T14:46:25.695Z",
"orgId": "134c704f-9b21-4f2e-91b3-4a467353bcc0",
"shortName": "CISA-ADP"
},
"references": [
{
"tags": [
"exploit"
],
"url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw"
},
{
"tags": [
"exploit"
],
"url": "https://github.com/vllm-project/vllm/pull/38610"
}
],
"title": "CISA ADP Vulnrichment"
}
],
"cna": {
"affected": [
{
"product": "vllm",
"vendor": "vllm-project",
"versions": [
{
"status": "affected",
"version": "\u003e= 0.18.0, \u003c 0.20.0"
}
]
}
],
"descriptions": [
{
"lang": "en",
"value": "vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., \"repetition_penalty\": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0."
}
],
"metrics": [
{
"cvssV3_1": {
"attackComplexity": "LOW",
"attackVector": "NETWORK",
"availabilityImpact": "HIGH",
"baseScore": 6.5,
"baseSeverity": "MEDIUM",
"confidentialityImpact": "NONE",
"integrityImpact": "NONE",
"privilegesRequired": "LOW",
"scope": "UNCHANGED",
"userInteraction": "NONE",
"vectorString": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
"version": "3.1"
}
}
],
"problemTypes": [
{
"descriptions": [
{
"cweId": "CWE-131",
"description": "CWE-131: Incorrect Calculation of Buffer Size",
"lang": "en",
"type": "CWE"
}
]
},
{
"descriptions": [
{
"cweId": "CWE-704",
"description": "CWE-704: Incorrect Type Conversion or Cast",
"lang": "en",
"type": "CWE"
}
]
}
],
"providerMetadata": {
"dateUpdated": "2026-06-22T21:49:24.277Z",
"orgId": "a0819718-46f1-4df5-94e2-005712e83aaa",
"shortName": "GitHub_M"
},
"references": [
{
"name": "https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw",
"tags": [
"x_refsource_CONFIRM"
],
"url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw"
},
{
"name": "https://github.com/vllm-project/vllm/pull/38610",
"tags": [
"x_refsource_MISC"
],
"url": "https://github.com/vllm-project/vllm/pull/38610"
}
],
"source": {
"advisory": "GHSA-83vm-p52w-f9pw",
"discovery": "UNKNOWN"
},
"title": "vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters"
}
},
"cveMetadata": {
"assignerOrgId": "a0819718-46f1-4df5-94e2-005712e83aaa",
"assignerShortName": "GitHub_M",
"cveId": "CVE-2026-44223",
"datePublished": "2026-05-12T19:58:40.862Z",
"dateReserved": "2026-05-05T15:42:40.518Z",
"dateUpdated": "2026-06-22T21:49:24.277Z",
"state": "PUBLISHED"
},
"dataType": "CVE_RECORD",
"dataVersion": "5.2",
"vulnerability-lookup:meta": {
"epss": {
"cve": "CVE-2026-44223",
"date": "2026-06-26",
"epss": "0.00367",
"percentile": "0.28508"
},
"nvd": "{\"cve\":{\"id\":\"CVE-2026-44223\",\"sourceIdentifier\":\"security-advisories@github.com\",\"published\":\"2026-05-12T20:16:43.293\",\"lastModified\":\"2026-05-15T15:16:52.560\",\"vulnStatus\":\"Modified\",\"cveTags\":[],\"descriptions\":[{\"lang\":\"en\",\"value\":\"vLLM is an inference and serving engine for large language models (LLMs). From to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., \\\"repetition_penalty\\\": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.\"}],\"metrics\":{\"cvssMetricV31\":[{\"source\":\"security-advisories@github.com\",\"type\":\"Secondary\",\"cvssData\":{\"version\":\"3.1\",\"vectorString\":\"CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H\",\"baseScore\":6.5,\"baseSeverity\":\"MEDIUM\",\"attackVector\":\"NETWORK\",\"attackComplexity\":\"LOW\",\"privilegesRequired\":\"LOW\",\"userInteraction\":\"NONE\",\"scope\":\"UNCHANGED\",\"confidentialityImpact\":\"NONE\",\"integrityImpact\":\"NONE\",\"availabilityImpact\":\"HIGH\"},\"exploitabilityScore\":2.8,\"impactScore\":3.6}]},\"weaknesses\":[{\"source\":\"security-advisories@github.com\",\"type\":\"Secondary\",\"description\":[{\"lang\":\"en\",\"value\":\"CWE-131\"},{\"lang\":\"en\",\"value\":\"CWE-704\"}]}],\"configurations\":[{\"nodes\":[{\"operator\":\"OR\",\"negate\":false,\"cpeMatch\":[{\"vulnerable\":true,\"criteria\":\"cpe:2.3:a:vllm:vllm:*:*:*:*:*:*:*:*\",\"versionStartIncluding\":\"0.18.0\",\"versionEndExcluding\":\"0.20.0\",\"matchCriteriaId\":\"443F32C2-B323-4FA0-AB97-F6445C91C89E\"}]}]}],\"references\":[{\"url\":\"https://github.com/vllm-project/vllm/pull/38610\",\"source\":\"security-advisories@github.com\",\"tags\":[\"Issue Tracking\",\"Patch\"]},{\"url\":\"https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw\",\"source\":\"security-advisories@github.com\",\"tags\":[\"Mitigation\",\"Vendor Advisory\"]},{\"url\":\"https://github.com/vllm-project/vllm/pull/38610\",\"source\":\"134c704f-9b21-4f2e-91b3-4a467353bcc0\",\"tags\":[\"Issue Tracking\",\"Patch\"]},{\"url\":\"https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw\",\"source\":\"134c704f-9b21-4f2e-91b3-4a467353bcc0\",\"tags\":[\"Mitigation\",\"Vendor Advisory\"]}]}}",
"vulnrichment": {
"containers": "{\"adp\": [{\"title\": \"CISA ADP Vulnrichment\", \"metrics\": [{\"other\": {\"type\": \"ssvc\", \"content\": {\"id\": \"CVE-2026-44223\", \"role\": \"CISA Coordinator\", \"options\": [{\"Exploitation\": \"poc\"}, {\"Automatable\": \"no\"}, {\"Technical Impact\": \"partial\"}], \"version\": \"2.0.3\", \"timestamp\": \"2026-05-15T14:44:05.012494Z\"}}}], \"references\": [{\"url\": \"https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw\", \"tags\": [\"exploit\"]}, {\"url\": \"https://github.com/vllm-project/vllm/pull/38610\", \"tags\": [\"exploit\"]}], \"providerMetadata\": {\"orgId\": \"134c704f-9b21-4f2e-91b3-4a467353bcc0\", \"shortName\": \"CISA-ADP\", \"dateUpdated\": \"2026-05-15T14:43:40.735Z\"}}], \"cna\": {\"title\": \"vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters\", \"source\": {\"advisory\": \"GHSA-83vm-p52w-f9pw\", \"discovery\": \"UNKNOWN\"}, \"metrics\": [{\"cvssV3_1\": {\"scope\": \"UNCHANGED\", \"version\": \"3.1\", \"baseScore\": 6.5, \"attackVector\": \"NETWORK\", \"baseSeverity\": \"MEDIUM\", \"vectorString\": \"CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H\", \"integrityImpact\": \"NONE\", \"userInteraction\": \"NONE\", \"attackComplexity\": \"LOW\", \"availabilityImpact\": \"HIGH\", \"privilegesRequired\": \"LOW\", \"confidentialityImpact\": \"NONE\"}}], \"affected\": [{\"vendor\": \"vllm-project\", \"product\": \"vllm\", \"versions\": [{\"status\": \"affected\", \"version\": \"\u003e= 0.18.0, \u003c 0.20.0\"}]}], \"references\": [{\"url\": \"https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw\", \"name\": \"https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw\", \"tags\": [\"x_refsource_CONFIRM\"]}, {\"url\": \"https://github.com/vllm-project/vllm/pull/38610\", \"name\": \"https://github.com/vllm-project/vllm/pull/38610\", \"tags\": [\"x_refsource_MISC\"]}], \"descriptions\": [{\"lang\": \"en\", \"value\": \"vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., \\\"repetition_penalty\\\": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.\"}], \"problemTypes\": [{\"descriptions\": [{\"lang\": \"en\", \"type\": \"CWE\", \"cweId\": \"CWE-131\", \"description\": \"CWE-131: Incorrect Calculation of Buffer Size\"}]}, {\"descriptions\": [{\"lang\": \"en\", \"type\": \"CWE\", \"cweId\": \"CWE-704\", \"description\": \"CWE-704: Incorrect Type Conversion or Cast\"}]}], \"providerMetadata\": {\"orgId\": \"a0819718-46f1-4df5-94e2-005712e83aaa\", \"shortName\": \"GitHub_M\", \"dateUpdated\": \"2026-06-22T21:49:24.277Z\"}}}",
"cveMetadata": "{\"cveId\": \"CVE-2026-44223\", \"state\": \"PUBLISHED\", \"dateUpdated\": \"2026-06-22T21:49:24.277Z\", \"dateReserved\": \"2026-05-05T15:42:40.518Z\", \"assignerOrgId\": \"a0819718-46f1-4df5-94e2-005712e83aaa\", \"datePublished\": \"2026-05-12T19:58:40.862Z\", \"assignerShortName\": \"GitHub_M\"}",
"dataType": "CVE_RECORD",
"dataVersion": "5.2"
}
}
}
Loading…
Loading…
Experimental. This forecast is provided for visualization only and may change without notice. Do not use it for operational decisions.
Forecast uses a logistic model when the trend is rising, or an exponential decay model when the trend is falling. Fitted via linearized least squares.
Sightings
| Author | Source | Type | Date | Other |
|---|
Nomenclature
- Seen: The vulnerability was mentioned, discussed, or observed by the user.
- Confirmed: The vulnerability has been validated from an analyst's perspective.
- Published Proof of Concept: A public proof of concept is available for this vulnerability.
- Exploited: The vulnerability was observed as exploited by the user who reported the sighting.
- Patched: The vulnerability was observed as successfully patched by the user who reported the sighting.
- Not exploited: The vulnerability was not observed as exploited by the user who reported the sighting.
- Not confirmed: The user expressed doubt about the validity of the vulnerability.
- Not patched: The vulnerability was not observed as successfully patched by the user who reported the sighting.
Loading…
Loading…