Groq, a fast-growing startup that previously hired ten executives from Google to work on chip architecture, has announced a new architecture called the Tensor Streaming Processor, which it says can perform 1 peta operations per second (1 PetaOp/s) on a single chip.
According to Groq, the Tensor Streaming Processor (TSP) is the world’s first architecture to reach 1 peta operations per second, that is, 1 quadrillion or 1e15 ops/s. The architecture can also perform up to 250 trillion floating-point operations per second (250 teraFLOPS).
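To put the two headline figures side by side, the short sketch below converts them to a common unit. The constant and function names are illustrative only, and the comparison simply restates the numbers quoted above; the announcement does not say which data types the 1 peta operations figure counts.

```python
# Headline figures quoted in the article, expressed in operations per second.
PETA = 10**15
TERA = 10**12

peak_ops_per_s = 1 * PETA     # 1 quadrillion operations per second
peak_flops = 250 * TERA       # 250 trillion floating-point operations per second


def ratio_ops_to_flops(ops: int, flops: int) -> float:
    """Return how many times larger the ops figure is than the FLOPS figure."""
    return ops / flops


if __name__ == "__main__":
    print(f"{peak_ops_per_s:.0e} ops/s")                    # 1e+15 ops/s
    print(f"{peak_flops / TERA:.0f} TFLOPS")                # 250 TFLOPS
    print(ratio_ops_to_flops(peak_ops_per_s, peak_flops))   # 4.0
```

In other words, the quoted peak operation rate is four times the quoted peak floating-point rate.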
The TSP architecture is built on a software-first mindset and takes a new approach to accelerated computation.
The white paper describing the architecture reads: “In Groq’s architecture, the compiler choreographs the operation of the hardware. All execution planning happens in software, freeing up valuable silicon space for additional processing capabilities.”
Because the compiler controls execution this tightly, the company says, the architecture delivers fast and predictable performance on current and future workloads.
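One way to picture compiler-choreographed execution is a toy model in which a "compiler" function assigns every operation a fixed start cycle and functional unit ahead of time, so the "hardware" only replays the plan. This is an illustrative sketch under stated assumptions: the unit names, the operation list, the latencies, and the `compile_schedule` and `run` functions are all hypothetical and do not reflect Groq's actual compiler or instruction set.

```python
from collections import defaultdict

# Hypothetical ops: (name, functional unit, latency in cycles, dependencies).
OPS = [
    ("load_a", "mem", 2, []),
    ("load_b", "mem", 2, []),
    ("matmul", "mxm", 4, ["load_a", "load_b"]),
    ("relu", "vec", 1, ["matmul"]),
    ("store", "mem", 2, ["relu"]),
]


def compile_schedule(ops):
    """Assign each op a fixed start cycle at 'compile time'.

    An op starts once its dependencies have finished and its unit is free.
    Ops must be listed in dependency order; units are assumed non-pipelined.
    """
    finish = {}                    # op name -> cycle at which its result is ready
    unit_free = defaultdict(int)   # unit -> first cycle at which it is idle
    schedule = []
    for name, unit, latency, deps in ops:
        ready = max((finish[d] for d in deps), default=0)
        start = max(ready, unit_free[unit])
        finish[name] = start + latency
        unit_free[unit] = finish[name]
        schedule.append((start, unit, name))
    return sorted(schedule), max(finish.values())


def run(schedule):
    """The 'hardware' replays the plan; it makes no scheduling decisions."""
    return [f"cycle {start}: {unit} <- {name}" for start, unit, name in schedule]


if __name__ == "__main__":
    plan, total_cycles = compile_schedule(OPS)
    print("\n".join(run(plan)))
    print("total cycles:", total_cycles)
```

Because every start cycle is fixed before execution, the total cycle count is known in advance and is identical on every run, which is the sense in which a statically scheduled design can be called predictable.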
The company says the new architecture is designed to deliver both massive parallelism and flexibility without the limitations and bottlenecks of traditional CPU and GPU architectures.
The TSP architecture supports both traditional and new machine learning models, and it can be deployed on both x86 and non-x86 systems.
According to Dennis Abts, Chief Architect at Groq, “Its [the TSP’s] performance, coupled with its simplicity, makes it an ideal platform for any high-performance, data- or compute-intensive workload.”
We hope to see the TSP in action soon in the machines around us.