Over the past several years, graphics processing unit (GPU) technology has rapidly gained ground in scientific computing due to its outstanding performance and programmability. GPU implementations of Monte Carlo radiation transport for dose calculations have been reported by many investigators. The majority of these studies adopted the single-precision floating-point format because GPUs deliver higher peak floating-point operations per second (FLOPS) in single precision than in double precision. Single-precision calculation is known to be more prone to numerical round-off errors, especially when a single tally value is accumulated 'atomically' and repeatedly by thousands of GPU threads. To mitigate this problem, the least intrusive solution in theory is to replace the single-precision atomic-add tally function with a double-precision version. However, the complication is that some GPUs (Nvidia GPUs prior to the Pascal generation and all current AMD GPUs) do not readily offer such a double-precision function at the hardware level, and that software emulation is too slow to use if not optimized properly. This paper discusses several atomic-add tally methods with reduced numerical errors used throughout ARCHER development. The original software-based compare-and-swap method (CAS) was shown to be inefficient due to high intra-warp thread contention, whereas the improved software-based warp-aggregated method (WAG) and Kahan summation method (KAS) eliminated the thread contention and performed very well on Kepler and Maxwell GPUs, being more than 13 times faster than CAS in our tests.
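The round-off problem described above, and the idea behind Kahan summation, can be illustrated with a small serial sketch. This is not ARCHER's GPU tally code: it only emulates single-precision accumulation on the CPU by rounding every intermediate result to IEEE-754 single precision. The helper names `f32`, `naive_sum_f32` and `kahan_sum_f32`, and the choice of 200,000 deposits of 0.1, are assumptions made for illustration.

```python
import math
import struct


def f32(x: float) -> float:
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_sum_f32(values):
    """Accumulate in emulated single precision with no error compensation."""
    total = 0.0
    for v in values:
        total = f32(total + f32(v))
    return total


def kahan_sum_f32(values):
    """Accumulate in emulated single precision with Kahan compensation."""
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = f32(f32(v) - comp)          # apply the correction to the new term
        t = f32(total + y)              # low-order bits of y may be lost here
        comp = f32(f32(t - total) - y)  # recover what was lost in the addition
        total = t
    return total


# Hypothetical tally: many small, equal deposits into one accumulator.
values = [0.1] * 200_000
exact = math.fsum(f32(v) for v in values)  # double-precision reference

print("naive float32 error:", abs(naive_sum_f32(values) - exact))
print("Kahan float32 error:", abs(kahan_sum_f32(values) - exact))
```

In this sketch the compensated sum stays close to the double-precision reference, while the naive single-precision sum drifts as the accumulator grows much larger than each deposit. On a GPU the accumulation is additionally concurrent, which is where the atomic-add methods discussed in the abstract come in.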
Publication Date: Wed Nov 25
Research Org.: Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Affiliations: Univ. of Tennessee, Knoxville, TN (United States), Electrical Engineering and Computer Science; Oak Ridge National Lab., Computer Science and Mathematics Division
Sponsoring Org.: USDOE Office of Science (SC); Department of Energy; NVIDIA
OSTI Identifier: 1787013
Grant/Contract Number: EP/P020720/1; 17-SC-20-SC
Resource Type: Journal Article: Accepted Manuscript
Journal Name: Proceedings of the Royal Society. Mathematical, Physical and Engineering Sciences
Additional Journal Information: Journal Volume: 476; Journal Issue: 2243; Journal ID: ISSN 1364-5021
Publisher: The Royal Society Publishing
Country of Publication: United States
Language: English
Subject: 97 MATHEMATICS AND COMPUTING; half precision arithmetic; mixed precision solvers; LU factorization; iterative refinement; GMRES; GPU