ghsa-g9c5-q45g-cc74
Vulnerability from github
In the Linux kernel, the following vulnerability has been resolved:
device-dax: correct pgoff align in dax_set_mapping()
pgoff should be aligned using ALIGN_DOWN() instead of ALIGN(). Otherwise, vmf->address not aligned to fault_size will be aligned to the next alignment, that can result in memory failure getting the wrong address.
It's a subtle situation that only can be observed in page_mapped_in_vma() after the page is page fault handled by dev_dax_huge_fault. Generally, there is little chance to perform page_mapped_in_vma in dev-dax's page unless in specific error injection to the dax device to trigger an MCE - memory-failure. In that case, page_mapped_in_vma() will be triggered to determine which task is accessing the failure address and kill that task in the end.
We used self-developed dax device (which is 2M aligned mapping) , to perform error injection to random address. It turned out that error injected to non-2M-aligned address was causing endless MCE until panic. Because page_mapped_in_vma() kept resulting wrong address and the task accessing the failure address was never killed properly:
[ 3783.719419] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3784.049006] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3784.049190] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3784.448042] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3784.448186] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3784.792026] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3784.792179] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3785.162502] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3785.162633] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3785.461116] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3785.461247] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3785.764730] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3785.764859] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3786.042128] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3786.042259] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3786.464293] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3786.464423] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3786.818090] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3786.818217] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3787.085297] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3787.085424] Memory failure: 0x200c9742: recovery action for dax page: Recovered
It took us several weeks to pinpoint this problem, but we eventually used bpftrace to trace the page fault and mce address and successfully identified the issue.
Joao added:
; Likely we never reproduce in production because we always pin : device-dax regions in the region align they provide (Qemu does : similarly with prealloc in hugetlb/file backed memory). I think this : bug requires that we touch unpinned device-dax regions unaligned to : the device-dax selected alignment (page size i.e. 4K/2M/1G)
{ "affected": [], "aliases": [ "CVE-2024-50022" ], "database_specific": { "cwe_ids": [], "github_reviewed": false, "github_reviewed_at": null, "nvd_published_at": "2024-10-21T20:15:15Z", "severity": "MODERATE" }, "details": "In the Linux kernel, the following vulnerability has been resolved:\n\ndevice-dax: correct pgoff align in dax_set_mapping()\n\npgoff should be aligned using ALIGN_DOWN() instead of ALIGN(). Otherwise,\nvmf-\u003eaddress not aligned to fault_size will be aligned to the next\nalignment, that can result in memory failure getting the wrong address.\n\nIt\u0027s a subtle situation that only can be observed in\npage_mapped_in_vma() after the page is page fault handled by\ndev_dax_huge_fault. Generally, there is little chance to perform\npage_mapped_in_vma in dev-dax\u0027s page unless in specific error injection\nto the dax device to trigger an MCE - memory-failure. In that case,\npage_mapped_in_vma() will be triggered to determine which task is\naccessing the failure address and kill that task in the end.\n\n\nWe used self-developed dax device (which is 2M aligned mapping) , to\nperform error injection to random address. It turned out that error\ninjected to non-2M-aligned address was causing endless MCE until panic.\nBecause page_mapped_in_vma() kept resulting wrong address and the task\naccessing the failure address was never killed properly:\n\n\n[ 3783.719419] Memory failure: 0x200c9742: recovery action for dax page: \nRecovered\n[ 3784.049006] mce: Uncorrected hardware memory error in user-access at \n200c9742380\n[ 3784.049190] Memory failure: 0x200c9742: recovery action for dax page: \nRecovered\n[ 3784.448042] mce: Uncorrected hardware memory error in user-access at \n200c9742380\n[ 3784.448186] Memory failure: 0x200c9742: recovery action for dax page: \nRecovered\n[ 3784.792026] mce: Uncorrected hardware memory error in user-access at \n200c9742380\n[ 3784.792179] Memory failure: 0x200c9742: recovery action for dax page: \nRecovered\n[ 3785.162502] mce: Uncorrected hardware memory error in user-access at \n200c9742380\n[ 3785.162633] Memory failure: 0x200c9742: recovery action for dax page: \nRecovered\n[ 3785.461116] mce: Uncorrected hardware memory error in user-access at \n200c9742380\n[ 3785.461247] Memory failure: 0x200c9742: recovery action for dax page: \nRecovered\n[ 3785.764730] mce: Uncorrected hardware memory error in user-access at \n200c9742380\n[ 3785.764859] Memory failure: 0x200c9742: recovery action for dax page: \nRecovered\n[ 3786.042128] mce: Uncorrected hardware memory error in user-access at \n200c9742380\n[ 3786.042259] Memory failure: 0x200c9742: recovery action for dax page: \nRecovered\n[ 3786.464293] mce: Uncorrected hardware memory error in user-access at \n200c9742380\n[ 3786.464423] Memory failure: 0x200c9742: recovery action for dax page: \nRecovered\n[ 3786.818090] mce: Uncorrected hardware memory error in user-access at \n200c9742380\n[ 3786.818217] Memory failure: 0x200c9742: recovery action for dax page: \nRecovered\n[ 3787.085297] mce: Uncorrected hardware memory error in user-access at \n200c9742380\n[ 3787.085424] Memory failure: 0x200c9742: recovery action for dax page: \nRecovered\n\nIt took us several weeks to pinpoint this problem,\u00a0 but we eventually\nused bpftrace to trace the page fault and mce address and successfully\nidentified the issue.\n\n\nJoao added:\n\n; Likely we never reproduce in production because we always pin\n: device-dax regions in the region align they provide (Qemu does\n: similarly with prealloc in hugetlb/file backed memory). I think this\n: bug requires that we touch *unpinned* device-dax regions unaligned to\n: the device-dax selected alignment (page size i.e. 4K/2M/1G)", "id": "GHSA-g9c5-q45g-cc74", "modified": "2024-10-25T15:31:26Z", "published": "2024-10-21T21:30:53Z", "references": [ { "type": "ADVISORY", "url": "https://nvd.nist.gov/vuln/detail/CVE-2024-50022" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/7fcbd9785d4c17ea533c42f20a9083a83f301fa6" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/9c4198dfdca818c5ce19c764d90eabd156bbc6da" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/b822007e8db341d6f175c645ed79866db501ad86" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/e877427d218159ac29c9326100920d24330c9ee6" } ], "schema_version": "1.4.0", "severity": [ { "score": "CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H", "type": "CVSS_V3" } ] }
Sightings
Author | Source | Type | Date |
---|
Nomenclature
- Seen: The vulnerability was mentioned, discussed, or seen somewhere by the user.
- Confirmed: The vulnerability is confirmed from an analyst perspective.
- Exploited: This vulnerability was exploited and seen by the user reporting the sighting.
- Patched: This vulnerability was successfully patched by the user reporting the sighting.
- Not exploited: This vulnerability was not exploited or seen by the user reporting the sighting.
- Not confirmed: The user expresses doubt about the veracity of the vulnerability.
- Not patched: This vulnerability was not successfully patched by the user reporting the sighting.