Server and high performance
It’s a challenge, but it’s also vitally important. Optimizing the performance of these systems translates directly into dollars. In e-commerce, it’s well known that even small increases in user latency reduce ad clicks and the probability that the user will complete a transaction.
And the sheer size of HPC systems means that scaling and performance optimization have a significant impact on capital cost. The classic example is the non-fatal bug that caused 25% of Google’s entire disk fleet to run significantly more slowly than expected for three years. Fixing that behavior, which was subtle and perhaps not even serious enough to be classed as a bug, improved the performance of 25% of the company’s servers: literally millions of compute nodes.
There is also the question of “long tail performance”. In scale-out environments, a single “outlier” event is statistically likely to affect many routes through the system. A task that fans out to many processes must wait for the slowest of them, so the longest response time gates the entire response: if one process in a hundred is impacted, most tasks at the system level will end up with 99th-percentile worst-case performance.
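The arithmetic behind the long-tail effect can be sketched in a few lines of Python. This is a minimal illustration, not a model of any particular system: it assumes each sub-request is independently slow with probability 1 in 100, and the function name and the fan-out values are hypothetical.

```python
def prob_task_hits_outlier(p_slow: float, fan_out: int) -> float:
    """Probability that at least one of `fan_out` sub-requests is slow.

    Assumes sub-requests are slow independently of one another, each with
    probability `p_slow` (an illustrative simplification).
    """
    # A task is fast only if every sub-request is fast.
    return 1.0 - (1.0 - p_slow) ** fan_out


if __name__ == "__main__":
    # Hypothetical fan-out sizes; 0.01 matches "one process in a hundred".
    for fan_out in (1, 10, 100, 500):
        share = prob_task_hits_outlier(0.01, fan_out)
        print(f"fan-out {fan_out:>3}: {share:.1%} of tasks wait on an outlier")
```

Under these assumptions, a task that touches 100 processes is gated by at least one slow process about 63% of the time, which is why a 1-in-100 event becomes the common case at the system level.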
UltraSoC’s embedded analytics IP has been used to solve these problems since our earliest days. Our first publicly-announced customer, PMC-Sierra, used our monitoring and analytics architecture to implement in-field monitoring within its disk drive controllers. Embedding non-intrusive, wire-speed smart monitors within the chip allows collection of high-granularity data on the real-world behavior, not just of the chip, but also of the wider system. Conditional capture and pre-processing of the data means that only relevant information need be collected.
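As a rough software analogy for conditional capture and pre-processing, the sketch below filters a stream of observed transactions against a latency threshold and keeps only a small histogram of the ones that qualify. Real monitors do this in hardware at wire speed; the `BusEvent` type, the `capture` function, the threshold and the bucket size are all hypothetical names and values chosen for illustration, not part of any UltraSoC interface.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BusEvent:
    """Hypothetical record of one observed transaction."""
    initiator: str
    latency_cycles: int


def capture(events, threshold_cycles, bucket_cycles=100):
    """Count slow events per (initiator, latency bucket).

    Events below `threshold_cycles` are dropped (conditional capture);
    the rest are reduced to counts rather than stored raw (pre-processing).
    """
    histogram = Counter()
    for ev in events:
        if ev.latency_cycles < threshold_cycles:
            continue  # uninteresting traffic is never collected
        bucket = (ev.latency_cycles // bucket_cycles) * bucket_cycles
        histogram[(ev.initiator, bucket)] += 1
    return histogram


# Illustrative input: five transactions, three of them slow.
events = [
    BusEvent("cpu0", 40),
    BusEvent("cpu1", 450),
    BusEvent("dma", 470),
    BusEvent("cpu1", 420),
    BusEvent("cpu0", 35),
]
summary = capture(events, threshold_cycles=400)
```

Here five raw events reduce to two histogram entries, which is the point of filtering and summarizing at the source: the volume of data leaving the monitor scales with the number of interesting events, not with total traffic.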
This approach makes it substantially easier to home in on performance issues than is possible with traditional solutions such as sampling profilers or application- and system-level instrumentation, and has the added benefit of being entirely non-intrusive. The hardware-based approach can detect hard-to-identify issues that impact performance, for example affinity management policies, contention and cache coherence.
For SoC manufacturers, it provides a major point of differentiation: products that allow customers and even end users to refine and optimize the performance of the server infrastructure into which the chips are built, while that infrastructure is running.
Server optimization presentation at Linley Spring Processor Conference
Embedded monitoring and analytics hardware within an SoC allows collection of high-granularity data on the real-world behavior of both the chip and the wider system. This presentation outlines the use of such hardware-based infrastructures to support performance optimization in high-performance computing environments.
The hardware-based approach can detect hard-to-identify issues such as contention and cache coherence, and makes it substantially easier to home in on non-fatal bugs than is possible with traditional solutions such as sampling profilers or application- and system-level instrumentation.
Please click on the link above or visit our Resource area to download a copy of the presentation.