fkie_cve-2024-40918
Vulnerability from fkie_nvd
Published
2024-07-12 13:15
Modified
2024-11-21 09:31
Severity ?
Summary
In the Linux kernel, the following vulnerability has been resolved:
parisc: Try to fix random segmentation faults in package builds
PA-RISC systems with PA8800 and PA8900 processors have had problems
with random segmentation faults for many years. Systems with earlier
processors are much more stable.
Systems with PA8800 and PA8900 processors have a large L2 cache which
needs per page flushing for decent performance when a large range is
flushed. The combined cache in these systems is also more sensitive to
non-equivalent aliases than the caches in earlier systems.
The majority of random segmentation faults that I have looked at
appear to be memory corruption in memory allocated using mmap and
malloc.
My first attempt at fixing the random faults didn't work. On
reviewing the cache code, I realized that there were two issues
which the existing code didn't handle correctly. Both relate
to cache move-in. Another issue is that the present bit in PTEs
is racy.
1) PA-RISC caches have a mind of their own and they can speculatively
load data and instructions for a page as long as there is a entry in
the TLB for the page which allows move-in. TLBs are local to each
CPU. Thus, the TLB entry for a page must be purged before flushing
the page. This is particularly important on SMP systems.
In some of the flush routines, the flush routine would be called
and then the TLB entry would be purged. This was because the flush
routine needed the TLB entry to do the flush.
2) My initial approach to trying the fix the random faults was to
try and use flush_cache_page_if_present for all flush operations.
This actually made things worse and led to a couple of hardware
lockups. It finally dawned on me that some lines weren't being
flushed because the pte check code was racy. This resulted in
random inequivalent mappings to physical pages.
The __flush_cache_page tmpalias flush sets up its own TLB entry
and it doesn't need the existing TLB entry. As long as we can find
the pte pointer for the vm page, we can get the pfn and physical
address of the page. We can also purge the TLB entry for the page
before doing the flush. Further, __flush_cache_page uses a special
TLB entry that inhibits cache move-in.
When switching page mappings, we need to ensure that lines are
removed from the cache. It is not sufficient to just flush the
lines to memory as they may come back.
This made it clear that we needed to implement all the required
flush operations using tmpalias routines. This includes flushes
for user and kernel pages.
After modifying the code to use tmpalias flushes, it became clear
that the random segmentation faults were not fully resolved. The
frequency of faults was worse on systems with a 64 MB L2 (PA8900)
and systems with more CPUs (rp4440).
The warning that I added to flush_cache_page_if_present to detect
pages that couldn't be flushed triggered frequently on some systems.
Helge and I looked at the pages that couldn't be flushed and found
that the PTE was either cleared or for a swap page. Ignoring pages
that were swapped out seemed okay but pages with cleared PTEs seemed
problematic.
I looked at routines related to pte_clear and noticed ptep_clear_flush.
The default implementation just flushes the TLB entry. However, it was
obvious that on parisc we need to flush the cache page as well. If
we don't flush the cache page, stale lines will be left in the cache
and cause random corruption. Once a PTE is cleared, there is no way
to find the physical address associated with the PTE and flush the
associated page at a later time.
I implemented an updated change with a parisc specific version of
ptep_clear_flush. It fixed the random data corruption on Helge's rp4440
and rp3440, as well as on my c8000.
At this point, I realized that I could restore the code where we only
flush in flush_cache_page_if_present if the page has been accessed.
However, for this, we also need to flush the cache when the accessed
bit is cleared in
---truncated---
References
Impacted products
Vendor | Product | Version |
---|
{ cveTags: [], descriptions: [ { lang: "en", value: "In the Linux kernel, the following vulnerability has been resolved:\n\nparisc: Try to fix random segmentation faults in package builds\n\nPA-RISC systems with PA8800 and PA8900 processors have had problems\nwith random segmentation faults for many years. Systems with earlier\nprocessors are much more stable.\n\nSystems with PA8800 and PA8900 processors have a large L2 cache which\nneeds per page flushing for decent performance when a large range is\nflushed. The combined cache in these systems is also more sensitive to\nnon-equivalent aliases than the caches in earlier systems.\n\nThe majority of random segmentation faults that I have looked at\nappear to be memory corruption in memory allocated using mmap and\nmalloc.\n\nMy first attempt at fixing the random faults didn't work. On\nreviewing the cache code, I realized that there were two issues\nwhich the existing code didn't handle correctly. Both relate\nto cache move-in. Another issue is that the present bit in PTEs\nis racy.\n\n1) PA-RISC caches have a mind of their own and they can speculatively\nload data and instructions for a page as long as there is a entry in\nthe TLB for the page which allows move-in. TLBs are local to each\nCPU. Thus, the TLB entry for a page must be purged before flushing\nthe page. This is particularly important on SMP systems.\n\nIn some of the flush routines, the flush routine would be called\nand then the TLB entry would be purged. This was because the flush\nroutine needed the TLB entry to do the flush.\n\n2) My initial approach to trying the fix the random faults was to\ntry and use flush_cache_page_if_present for all flush operations.\nThis actually made things worse and led to a couple of hardware\nlockups. It finally dawned on me that some lines weren't being\nflushed because the pte check code was racy. This resulted in\nrandom inequivalent mappings to physical pages.\n\nThe __flush_cache_page tmpalias flush sets up its own TLB entry\nand it doesn't need the existing TLB entry. As long as we can find\nthe pte pointer for the vm page, we can get the pfn and physical\naddress of the page. We can also purge the TLB entry for the page\nbefore doing the flush. Further, __flush_cache_page uses a special\nTLB entry that inhibits cache move-in.\n\nWhen switching page mappings, we need to ensure that lines are\nremoved from the cache. It is not sufficient to just flush the\nlines to memory as they may come back.\n\nThis made it clear that we needed to implement all the required\nflush operations using tmpalias routines. This includes flushes\nfor user and kernel pages.\n\nAfter modifying the code to use tmpalias flushes, it became clear\nthat the random segmentation faults were not fully resolved. The\nfrequency of faults was worse on systems with a 64 MB L2 (PA8900)\nand systems with more CPUs (rp4440).\n\nThe warning that I added to flush_cache_page_if_present to detect\npages that couldn't be flushed triggered frequently on some systems.\n\nHelge and I looked at the pages that couldn't be flushed and found\nthat the PTE was either cleared or for a swap page. Ignoring pages\nthat were swapped out seemed okay but pages with cleared PTEs seemed\nproblematic.\n\nI looked at routines related to pte_clear and noticed ptep_clear_flush.\nThe default implementation just flushes the TLB entry. However, it was\nobvious that on parisc we need to flush the cache page as well. If\nwe don't flush the cache page, stale lines will be left in the cache\nand cause random corruption. Once a PTE is cleared, there is no way\nto find the physical address associated with the PTE and flush the\nassociated page at a later time.\n\nI implemented an updated change with a parisc specific version of\nptep_clear_flush. It fixed the random data corruption on Helge's rp4440\nand rp3440, as well as on my c8000.\n\nAt this point, I realized that I could restore the code where we only\nflush in flush_cache_page_if_present if the page has been accessed.\nHowever, for this, we also need to flush the cache when the accessed\nbit is cleared in\n---truncated---", }, { lang: "es", value: "En el kernel de Linux, se ha resuelto la siguiente vulnerabilidad: parisc: intenta corregir fallas de segmentación aleatoria en compilaciones de paquetes. Los sistemas PA-RISC con procesadores PA8800 y PA8900 han tenido problemas con fallas de segmentación aleatoria durante muchos años. Los sistemas con procesadores anteriores son mucho más estables. Los sistemas con procesadores PA8800 y PA8900 tienen una gran caché L2 que necesita un vaciado por página para un rendimiento decente cuando se vacía un rango grande. La caché combinada en estos sistemas también es más sensible a alias no equivalentes que las cachés de sistemas anteriores. La mayoría de los fallos de segmentación aleatoria que he observado parecen ser daños en la memoria asignada mediante mmap y malloc. Mi primer intento de solucionar los fallos aleatorios no funcionó. Al revisar el código de caché, me di cuenta de que había dos problemas que el código existente no manejaba correctamente. Ambos se relacionan con la entrada de caché. Otro problema es que la actualidad en las PTE es picante. 1) Los cachés PA-RISC tienen mente propia y pueden cargar especulativamente datos e instrucciones para una página siempre que haya una entrada en el TLB para la página que permita la entrada. Los TLB son locales para cada CPU. Por lo tanto, la entrada TLB de una página debe eliminarse antes de eliminarla. Esto es particularmente importante en los sistemas SMP. En algunas de las rutinas de vaciado, se llamaría a la rutina de vaciado y luego se purgaría la entrada TLB. Esto se debía a que la rutina de lavado necesitaba la entrada TLB para realizar el lavado. 2) Mi enfoque inicial para intentar solucionar las fallas aleatorias fue intentar usar Flush_cache_page_if_present para todas las operaciones de descarga. En realidad, esto empeoró las cosas y provocó un par de bloqueos de hardware. Finalmente me di cuenta de que algunas líneas no se estaban borrando porque el código de verificación del pte era picante. Esto resultó en asignaciones aleatorias no equivalentes a páginas físicas. La descarga tmpalias __flush_cache_page configura su propia entrada TLB y no necesita la entrada TLB existente. Siempre que podamos encontrar el puntero pte de la página vm, podemos obtener el pfn y la dirección física de la página. También podemos purgar la entrada TLB de la página antes de realizar el vaciado. Además, __flush_cache_page utiliza una entrada TLB especial que inhibe la entrada de caché. Al cambiar las asignaciones de páginas, debemos asegurarnos de que las líneas se eliminen del caché. No es suficiente simplemente borrar las líneas de la memoria, ya que pueden volver. Esto dejó en claro que necesitábamos implementar todas las operaciones de vaciado necesarias utilizando rutinas tmpalias. Esto incluye vaciados para páginas de usuario y de kernel. Después de modificar el código para usar tmpalias vaciados, quedó claro que los errores de segmentación aleatoria no se resolvieron por completo. La frecuencia de fallas fue peor en sistemas con 64 MB L2 (PA8900) y sistemas con más CPU (rp4440). La advertencia que agregué a Flush_cache_page_if_present para detectar páginas que no se podían vaciar se activaba con frecuencia en algunos sistemas. Helge y yo miramos las páginas que no se podían vaciar y descubrimos que la PTE estaba vacía o era una página de intercambio. Ignorar las páginas que se intercambiaron parecía estar bien, pero las páginas con PTE borradas parecían problemáticas. Miré rutinas relacionadas con pte_clear y noté ptep_clear_flush. La implementación predeterminada simplemente vacía la entrada TLB. Sin embargo, era obvio que en París también necesitamos vaciar la página de caché. Si no limpiamos la página de caché, quedarán líneas obsoletas en la caché y provocarán daños aleatorios. Una vez que se borra una PTE, no hay forma de encontrar la dirección física asociada con la PTE y borrar la página asociada más adelante. ---truncado---", }, ], id: "CVE-2024-40918", lastModified: "2024-11-21T09:31:51.417", metrics: {}, published: "2024-07-12T13:15:14.863", references: [ { source: "416baaa9-dc9f-4396-8d5f-8c081fb06d67", url: "https://git.kernel.org/stable/c/5bf196f1936bf93df31112fbdfb78c03537c07b0", }, { source: "416baaa9-dc9f-4396-8d5f-8c081fb06d67", url: "https://git.kernel.org/stable/c/72d95924ee35c8cd16ef52f912483ee938a34d49", }, { source: "416baaa9-dc9f-4396-8d5f-8c081fb06d67", url: "https://git.kernel.org/stable/c/d66f2607d89f760cdffed88b22f309c895a2af20", }, { source: "af854a3a-2127-422b-91ae-364da2661108", url: "https://git.kernel.org/stable/c/5bf196f1936bf93df31112fbdfb78c03537c07b0", }, { source: "af854a3a-2127-422b-91ae-364da2661108", url: "https://git.kernel.org/stable/c/72d95924ee35c8cd16ef52f912483ee938a34d49", }, { source: "af854a3a-2127-422b-91ae-364da2661108", url: "https://git.kernel.org/stable/c/d66f2607d89f760cdffed88b22f309c895a2af20", }, ], sourceIdentifier: "416baaa9-dc9f-4396-8d5f-8c081fb06d67", vulnStatus: "Awaiting Analysis", }
Log in or create an account to share your comment.
Security Advisory comment format.
This schema specifies the format of a comment related to a security advisory.
Title of the comment
Description of the comment
Loading…
Loading…
Loading…
Sightings
Author | Source | Type | Date |
---|
Nomenclature
- Seen: The vulnerability was mentioned, discussed, or seen somewhere by the user.
- Confirmed: The vulnerability is confirmed from an analyst perspective.
- Exploited: This vulnerability was exploited and seen by the user reporting the sighting.
- Patched: This vulnerability was successfully patched by the user reporting the sighting.
- Not exploited: This vulnerability was not exploited or seen by the user reporting the sighting.
- Not confirmed: The user expresses doubt about the veracity of the vulnerability.
- Not patched: This vulnerability was not successfully patched by the user reporting the sighting.