As the demand for artificial intelligence (AI) computing continues to rise, traditional electronic processors are increasingly constrained by speed and energy consumption. In particular, the bottlenecks of data movement and parallel processing limit the scalability of conventional hardware in handling high-throughput visual tasks. Photonics, with its inherent parallelism and low energy consumption, is emerging as a compelling alternative for post-Moore computing.
Over the past decade, silicon photonics has pioneered chip-scale optical computing, mainly based on two-dimensional integrated waveguides and programmable interferometer networks. These platforms have successfully demonstrated on-chip matrix–vector multiplications and are compatible with CMOS fabrication. However, the two-dimensional physical layout inherently restricts both device count and throughput.
By contrast, diffractive neural networks (DNNs) offer massive parallelism, ultralow latency, and scalability thanks to their three-dimensional free-space physical architecture. Yet, conventional free-space systems suffer from bulky footprints, lack chip-scale integration, and operate at relatively low computational frequencies, limiting their practical deployment.
In a new paper published in eLight, a research team from the University of Shanghai for Science and Technology, led by Professor Min Gu, reports a breakthrough that bridges this divide: the first vertically integrated photonic chip, named Gezhi, that realizes three-dimensional free-space optical computing in a compact, chip-scale form factor. The architecture vertically stacks an addressable vertical-cavity surface-emitting laser (VCSEL) array, a mutually incoherent diffractive neural network (MI-DNN) chip, and detectors into a hand-held system.
Unlike traditional DNNs that rely on coherent light sources, the VCSEL array generates mutually incoherent light fields. In this framework, the MI-DNNs exploit the individual coherence of each VCSEL for element-wise multiplication, while leveraging the mutual incoherence between emitters for addition, fundamentally redefining the computational paradigm of optical DNNs. This hybrid approach not only preserves the advantages of coherent-light-based DNNs but also achieves higher diffraction efficiency (up to 26.02%) and enhanced robustness through direct operation on spatially incoherent light.
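The multiply-by-coherence, add-by-incoherence paradigm can be sketched numerically. The toy model below is not the authors' code: the emitter count, pixel grid, and random fields are illustrative assumptions. It contrasts incoherent intensity summation (what mutually incoherent sources do at a detector) with the complex-field summation of a fully coherent system:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sources = 4   # hypothetical number of VCSEL emitters
n_pixels = 8    # hypothetical detector pixels

# Each VCSEL is individually coherent, so a diffractive layer can
# modulate its complex field element-wise (the multiplication step).
fields = (rng.standard_normal((n_sources, n_pixels))
          + 1j * rng.standard_normal((n_sources, n_pixels)))
weights = rng.standard_normal((n_sources, n_pixels))
modulated = weights * fields  # element-wise multiplication per source

# Different VCSELs are mutually incoherent, so their *intensities*
# add at the detector (the addition step).
incoherent_intensity = np.sum(np.abs(modulated) ** 2, axis=0)

# A fully coherent system would instead sum complex fields first,
# producing interference cross-terms:
coherent_intensity = np.abs(np.sum(modulated, axis=0)) ** 2
```

The incoherent readout is free of inter-source interference terms, which is one intuition for the robustness claim above.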
The Gezhi chip demonstrated inference of 1,000 images in just 40 microseconds, corresponding to 25 million frames per second. On benchmark datasets such as MNIST, it achieved classification accuracy as high as 98.6%. Remarkably, it sustained computation at ultra-low light levels, consuming only 3.52 aJ/μm² of optical energy per frame, significantly outperforming state-of-the-art electronic accelerators in energy efficiency.
Beyond classification, the chip also functions as a versatile image processing kernel capable of high-resolution tasks such as edge extraction and denoising (Fig. 1b). Its planar input configuration further enables the parallel operation of multiple kernels.
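For reference, here is what such a convolution kernel computes, sketched digitally with a Laplacian edge-extraction filter. This is an illustrative choice: the paper's kernels are realized optically and their coefficients are not given here.

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid-mode 2D convolution via an explicit sliding window.

    (The Laplacian kernel is symmetric, so kernel flipping is omitted.)
    """
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A standard Laplacian kernel: responds at intensity edges,
# and is zero over flat regions.
laplacian = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)

# Synthetic image: a bright square on a dark background.
img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0

edges = convolve2d(img, laplacian)
```

The response is zero inside the uniform square and nonzero along its border, which is the edge map such a kernel extracts.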
“This vertically integrated photonic chip offers a new path for three-dimensional optical computing, distinct from silicon-based planar platforms, and holds enormous potential for large-scale expansion and high-performance applications,” the authors noted. “Leveraging the planar nature of VCSEL arrays, ultra-large-scale light source arrays with tens of thousands of units are expected to be realized, supporting even larger distributed optical computing systems.”
They added: “The current speed of 25 million frames per second is still far below the physical limits of the chip. With optimized drive circuits, processing rates of hundreds of millions of frames per second are within reach, meeting the massive data demands of the AI era. Its convolution computing capabilities also provide a foundation for broader applications across AI models. Looking ahead, this technology is expected to play a transformative role in areas such as autonomous driving, smart healthcare, machine vision, and the acceleration of large language models.”
Paper: "High-throughput optical neuromorphic graphic processing at millions of images per second", published in eLight.