Title: A Hierarchical Approach to Modeling and Improving the Performance of Scientific Applications on the KSR1
Authors: Boyd, Eric L.; Azeem, Waqar; Lee, Hsien-Hsin S.; Shih, Tien-Pao; Hung, Shih-Hao
Year: 1994
Type: conference paper
ISSN: 0190-3918
DOI: 10.1109/ICPP.1994.30 (https://doi.org/10.1109/ICPP.1994.30)
Scopus ID: 2-s2.0-84904330840
Scopus record: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84904330840&doi=10.1109%2fICPP.1994.30&partnerID=40&md5=256d04abc703d3c42e4222a16de29038
Repository handle: https://scholars.lib.ntu.edu.tw/handle/123456789/489694
Record date: 2020-05-04

Abstract: We have developed a hierarchical performance bounding methodology that attempts to explain the performance of loop-dominated scientific applications on particular systems. The Kendall Square Research KSR1 is used as a running example. We model the throughput of key hardware units that are common bottlenecks in concurrent machines. The four units currently used are: memory port, floating-point, instruction issue, and a loop-carried dependence pseudo-unit. We propose a workload characterization, and derive upper bounds on the performance of specific machine-workload pairs. Comparing delivered performance with bounds focuses attention on areas for improvement and indicates how much improvement might be attainable. We delineate a comprehensive approach to modeling and improving application performance on the KSR1. Application of this approach is being automated for the KSR1 with a series of tools including K-MA and K-MACSTAT (which enable the calculation of the MACS hierarchy of performance bounds), K-Trace (which allows parallel code to be instrumented to produce a memory reference trace), and K-Cache (which simulates inter-cache communications based on a memory reference trace). © 1994 IEEE.

Keywords: Hierarchical systems; Application performance; Hardware units; Hierarchical approach; Loop-carried dependence; Memory reference traces; Performance bounds; Scientific applications; Workload characterization; Cache memory
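
The bounding idea in the abstract can be illustrated with a minimal sketch. It assumes that each of the four modeled units (memory port, floating-point, instruction issue, loop-carried dependence) needs some number of cycles per loop iteration, and that the slowest unit sets a lower bound on iteration time, which is equivalent to an upper bound on performance. The class names, function names, and all numeric values below are hypothetical and chosen for illustration only; they are not KSR1 parameters and not the authors' K-MA or K-MACSTAT tools.

```python
from dataclasses import dataclass


@dataclass
class LoopWorkload:
    """Per-iteration operation counts for one inner loop (hypothetical)."""
    memory_ops: int            # loads plus stores
    flops: int                 # floating-point operations
    instructions: int          # all instructions issued
    dependence_cycles: float   # latency of the longest loop-carried chain


@dataclass
class MachineModel:
    """Assumed peak throughput of each unit, in operations per cycle."""
    memory_ops_per_cycle: float
    flops_per_cycle: float
    instructions_per_cycle: float


def unit_cycles(work: LoopWorkload, machine: MachineModel) -> dict:
    """Minimum cycles per iteration demanded by each modeled unit."""
    return {
        "memory port": work.memory_ops / machine.memory_ops_per_cycle,
        "floating-point": work.flops / machine.flops_per_cycle,
        "instruction issue": work.instructions / machine.instructions_per_cycle,
        "loop-carried dependence": work.dependence_cycles,
    }


def performance_bound(work: LoopWorkload, machine: MachineModel):
    """Return (bottleneck unit, lower bound on cycles per iteration)."""
    cycles = unit_cycles(work, machine)
    bottleneck = max(cycles, key=cycles.get)
    return bottleneck, cycles[bottleneck]


def attainable_speedup(measured_cycles: float, bound_cycles: float) -> float:
    """Ratio of delivered time to the bound: the most this model says could be gained."""
    return measured_cycles / bound_cycles


# Illustrative numbers only.
work = LoopWorkload(memory_ops=3, flops=2, instructions=5, dependence_cycles=1.0)
machine = MachineModel(memory_ops_per_cycle=1.0,
                       flops_per_cycle=1.0,
                       instructions_per_cycle=2.0)

bottleneck, bound = performance_bound(work, machine)
speedup = attainable_speedup(measured_cycles=4.5, bound_cycles=bound)
print(f"bottleneck: {bottleneck}, bound: {bound} cycles/iteration, "
      f"attainable speedup: {speedup}x")
```

With these made-up inputs the memory port is the limiting unit, and comparing a measured 4.5 cycles per iteration against the bound suggests how much improvement might be attainable. A hierarchy of bounds, as the abstract describes, would repeat this calculation with progressively more detailed operation counts.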