Ensuring SoC reliability via on-chip analytics and monitoring

You may have read the news a couple of weeks ago that we’re working with Moortec Semiconductor on next-generation intelligent process, voltage and temperature (PVT) sensor systems. We’ve combined UltraSoC’s digital monitoring and optimization capabilities with Moortec’s leading PVT products to enable real improvements in SoC performance and reliability.

Intelligent on-chip PVT monitoring

Moortec is a leading supplier of physical IP for PVT sensors that are embedded into chips to sense analog values and report them back to the main system, for example to trigger an over-temperature alert or to help tune voltage in dynamic voltage and frequency scaling (DVFS) schemes. It offers in-chip monitoring blocks for most of the commonly used process technologies in the industry.
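
As a rough illustration of the over-temperature case, the sketch below turns a stream of temperature readings into alert states. It is not Moortec's or UltraSoC's interface: the `OverTempAlert` class, the 110 °C trip point, the 100 °C release point and the sample readings are all assumptions made up for this example. The gap between trip and release (hysteresis) is simply a common way to stop an alert from chattering around a single threshold.

```python
from dataclasses import dataclass


@dataclass
class OverTempAlert:
    """Hypothetical over-temperature detector with hysteresis."""
    trip_c: float = 110.0     # assumed trip point, degrees C
    release_c: float = 100.0  # assumed release point, degrees C
    active: bool = False

    def update(self, temp_c: float) -> bool:
        """Feed one sensor reading; return True while the alert is raised."""
        if not self.active and temp_c >= self.trip_c:
            self.active = True
        elif self.active and temp_c <= self.release_c:
            self.active = False
        return self.active


alert = OverTempAlert()
readings_c = [95.0, 111.0, 105.0, 99.0]  # made-up sample readings
states = [alert.update(t) for t in readings_c]
print(states)  # [False, True, True, False]
```

The alert stays raised at 105 °C because the reading has not yet fallen to the release point, and clears only at 99 °C.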

What UltraSoC brings to the partnership is our message-based on-chip connectivity infrastructure and a range of “smart” blocks that make it much easier to make practical use of the information gathered by the PVT sensors.

Long-term SoC reliability monitoring

The combination has some very interesting applications, particularly in long-term SoC reliability monitoring. For example, we can create a system that builds a picture of long-term temperature stress, so that cumulative effects can be accounted for even if the on-chip temperature never exceeds an acceptable threshold. That’s important because managing the long-term thermal picture is key to heading off harmful effects such as electromigration.
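
One way to picture cumulative temperature stress is to convert each logged interval into an equivalent number of hours at a reference temperature. The sketch below does this with the Arrhenius relation, a widely used model for temperature-activated wear-out (it is also the temperature term in Black's equation for electromigration). The post does not say which model is actually used, so treat the model choice, the function names, the 85 °C reference temperature and the 0.7 eV activation energy as placeholder assumptions, not as a description of the product.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def acceleration_factor(temp_c, ref_temp_c=85.0, activation_energy_ev=0.7):
    """Arrhenius acceleration at temp_c relative to ref_temp_c (degrees C).

    The default reference temperature and activation energy are
    illustrative assumptions only.
    """
    temp_k = temp_c + 273.15
    ref_k = ref_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / ref_k - 1.0 / temp_k)
    return math.exp(exponent)


def equivalent_stress_hours(samples, ref_temp_c=85.0, activation_energy_ev=0.7):
    """Sum (temp_c, duration_hours) samples into equivalent hours at ref_temp_c."""
    return sum(
        hours * acceleration_factor(temp_c, ref_temp_c, activation_energy_ev)
        for temp_c, hours in samples
    )


# Made-up log: every reading sits below a typical alert threshold,
# yet the hotter intervals still count for more than the cooler ones.
log = [(60.0, 100.0), (85.0, 10.0), (95.0, 10.0)]
total = equivalent_stress_hours(log)
```

An hour at the reference temperature counts as exactly one equivalent hour, hotter hours count for more and cooler hours for less, so the running total keeps growing even when no single reading would trigger an alert.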

It’s also important to remember that this monitoring takes place on a per-chip basis. So we can build smart on-chip monitoring and analytics systems that account for both process variability in the fab and the exact use case that each device experiences in the field. Both are among the top concerns for SoC designers today.

These SoC reliability issues are a particularly big concern in applications like automotive, where standards like ISO 26262 for functional safety have become an integral part of the development cycle.

Although the current ISO 26262 standard, released in 2011, doesn’t include a section on semiconductors, the next release – Edition 2, which is expected to be published by 2018 – will do so in some considerable detail.

SoC load balancing and power management

Another application of this cumulative on-chip temperature monitoring capability is keeping track of activity across the various CPUs within a multi-core SoC. Again, because we are working on a per-chip basis, the data we gather becomes actionable information in a very real sense. We can use it to load-balance between processors on-chip, extending product lifetimes and ensuring continuing SoC performance.
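
To make the load-balancing idea concrete, here is a minimal sketch that steers new work towards whichever core has accumulated the least stress so far. Everything in it is hypothetical: the per-core stress figures, the `pick_core` and `assign_tasks` helpers, and the assumption that each task adds a known amount of stress. A real scheduler would weigh many other factors as well.

```python
def pick_core(stress_by_core):
    """Return the id of the core with the lowest accumulated stress."""
    return min(stress_by_core, key=stress_by_core.get)


def assign_tasks(stress_by_core, task_costs):
    """Greedily place each task on the currently least-stressed core.

    task_costs holds the stress each task is expected to add, in the same
    (hypothetical) units as stress_by_core. Returns a list of
    (task_index, core_id) pairs and leaves the input dict untouched.
    """
    stress = dict(stress_by_core)  # work on a copy
    placement = []
    for index, cost in enumerate(task_costs):
        core = pick_core(stress)
        placement.append((index, core))
        stress[core] += cost
    return placement


# Made-up per-core history, for example equivalent stress hours.
history = {"cpu0": 120.0, "cpu1": 80.0, "cpu2": 95.0}
placement = assign_tasks(history, [10.0, 10.0, 10.0])
print(placement)  # [(0, 'cpu1'), (1, 'cpu1'), (2, 'cpu2')]
```

The first two tasks land on `cpu1` because it starts with the lowest figure; once it has caught up with `cpu2`, the third task goes there instead, while the most heavily used core, `cpu0`, is left to rest.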

There are also important implications for energy and power management: intelligently gathering information about a chip’s electrical performance in real time makes it possible to substantially improve efficiency. The obvious market for this power management capability is portable devices, and mobile phones in particular, where improved battery life is a key selling point for customers. But minimizing power consumption is equally important in larger-scale deployments like server farms, where cooling can add substantially to equipment cost and power is one of the biggest elements of ongoing operating cost.

These practical applications illustrate a wider point about SoC design, particularly at advanced process nodes. The days are over when it was possible to assume that SoCs are ideal digital devices. Today’s devices are complex quasi-analog systems, and performance can vary greatly from chip to chip. So, more than ever, SoC architects and designers need to understand the real-world behavior of their devices, and our partnership with Moortec allows them to do just that.