Research and Technology Frontiers

What Is a Camera Sensor's "Process Node," Really: A Layered Guide for Photographers

Photography forums throw around a confusing set of terms when discussing camera sensors. Someone says a sensor is “still stuck on 28nm.” Another says a new model “has been upgraded to 14nm.” Flagships talk about being “stacked,” “three-layer stacked,” or “partially stacked.” The numbers look like a size competition: smaller nodes sound more advanced, more layers sound stronger. Look more closely, though, and the smartphone-SoC way of thinking about process nodes doesn’t carry over to image sensors: Sony’s Alpha 1 uses a 90nm process for its light-capturing layer and 40nm only for its readout logic; a stacked sensor Samsung published at ISCAS 2020 uses 65nm for the pixel layer and 14nm for the logic layer. The Alpha 1 was a 2021 flagship, and the Samsung part was the most recent stacked CIS of its generation. Neither looks like a laggard.

The problem comes from a hidden assumption: people picture a camera sensor as one planar chip, so it should have one process-node number. In reality it’s two systems stacked together. One captures light, the other reads data. These two layers have never been chasing the same goal, and there is no reason they should share a “smaller node is better” yardstick.

After walking through the layers below, the terms on a spec sheet — BSI, stacked, three-layer stacked, 2-layer transistor pixel, partial stacked — should map cleanly onto which layer they describe, what problem they target, what the cost is, and which parts manufacturers choose not to disclose.

Layer 1: The layer that captures light, and the layer that reads data

Forget the process-node numbers for a moment. Picture two very different things happening inside one sensor.

The first is capturing light. Each pixel catches incoming photons, converts them into charge, and holds that charge briefly until it is read out. This is where photographers’ concerns live: how much light the pixel can catch at a given exposure, whether highlights clip, how much noise shows up in low light.

The second is reading data. After the shutter fires, tens of millions of pixels must be read out, digitized, and packaged within tens of milliseconds. This is a data-throughput job, closer in nature to the high-speed circuitry inside a CPU or SoC. Burst rates, video frame rates, and rolling-shutter distortion all hinge on how fast this part is.

The physics governing these two jobs is fundamentally different. The light-capturing layer is bound by optics: the wavelength of light is fixed, and color filters, microlenses, and inter-pixel isolation don’t scale down with process nodes. Once pixels approach the wavelength of visible light, diffraction and crosstalk become significant. Stanford’s EE392B lecture notes put it clearly: CIS scaling does not follow the standard “equal-scaling” rules of logic chips. Semiconductor Engineering’s review documents the same thing: front-side illuminated pixels hit a wall around 1.4μm, and the industry had to change structures (backside illumination or BSI, backside deep trench isolation or DTI, new microlens stacks) to keep shrinking.

The readout layer is a circuit-density race: a more advanced node lets you pack more channels into the same area, run faster timing, and use less power, which translates directly into shorter readout times and wider bandwidth. The “smaller is more advanced” story from the smartphone SoC world applies here.

So a sensor is really two systems stacked on top of each other: one catches photons (a hard physics problem), one moves data (a circuit-density problem). They have always been optimizing for different things, and there is no reason to force them onto a shared “smaller node is better” ruler. This is what stacking enables: the two layers no longer share one piece of silicon, and each can pick the process best suited to its own job.

Every experience a photographer notices behind the lens maps onto one of these two layers. Burst and video depend on whether the readout layer is fast enough. Highlight recovery depends on how many electrons the light-capturing layer can store. Low-light clarity depends on inter-pixel crosstalk in the light-capturing layer. Whether a camera can sustain extreme frame rates depends on the bandwidth channel between the two layers, sometimes requiring an extra buffer layer. With this division in mind, every term that follows is just an answer to “which layer, solving what.”

Layer 2: What a process-node number actually means inside a sensor

With the two-layer division in place, “process node” splits into two distinct meanings, one for each layer.

On the readout layer, it works just like it does for an SoC: smaller nodes are more advanced. A smaller node means more channels per unit area, faster timing, and lower power — all of which translate into shorter readout times and wider bandwidth. Samsung published a very direct comparison: after upgrading the readout layer from 28nm to 14nm, the same sensor’s total power dropped 29%, with 12 megapixels running at 120fps (ISCAS 2020 paper). Sony’s Alpha 1 uses 40nm for its readout layer (ISSCC 2021 presentation), which is the level required to support its high-speed interface and 50-megapixel column-parallel AD conversion. In essence, the readout node determines the sensor’s readout-bandwidth budget.
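To get a feel for why the readout node matters, a rough data-rate calculation helps. The sketch below uses the pixel counts and frame rates cited above, but the bit depths (and the 30 fps burst figure for the Alpha 1) are illustrative assumptions, not manufacturer-disclosed readout formats.

```python
# Rough readout-bandwidth estimate for a full-resolution sensor scan.
# Pixel counts and frame rates follow the figures cited in the text;
# bit depths are illustrative assumptions, with no protocol overhead.

def raw_data_rate_gbps(megapixels: float, fps: float, bits_per_pixel: int) -> float:
    """Raw pixel data rate in gigabits per second."""
    return megapixels * 1e6 * fps * bits_per_pixel / 1e9

# Sony Alpha 1: ~50.1 Mp; assume a 14-bit readout at a 30 fps burst.
print(f"Alpha 1 (assumed 14-bit, 30 fps):      {raw_data_rate_gbps(50.1, 30, 14):.1f} Gbit/s")

# Samsung 65/14nm stacked CIS: 12 Mp at 120 fps; assume a 10-bit readout.
print(f"Samsung CIS (assumed 10-bit, 120 fps): {raw_data_rate_gbps(12, 120, 10):.1f} Gbit/s")
```

Under these assumptions, tens of gigabits per second have to leave the pixel array every second, which is why the readout layer’s node, not the pixel layer’s, sets the speed budget.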

On the light-capturing layer, “process node” means something completely different. It is not chasing density but photon-capture quality. A newer node is not necessarily better here: newer processes tend to come with higher-impurity silicon and more complex thermal histories, both of which hurt photosensitivity. Thicker gate oxides, cleaner silicon, and lower thermal budgets instead reduce dark current and white-pixel rates, and leave more room for the photodiode to build deeper potential wells for storing electrons. That is why TSMC’s CIS foundry page still lists 65nm and 40nm as its two dedicated CIS platforms today, and Tower Semiconductor still offers 110nm and 65nm. Nodes that look “ancient” by logic-chip standards make more sense on the pixel side. OmniVision’s first-gen BSI used 110nm, OmniBSI-2 moved to 65nm design rules, and Sony’s Alpha 1 uses 90nm for the pixel layer — all in this same band.

What is actually pushing pixel performance forward is not node numbers but a handful of purpose-built structural techniques. BSI lets light enter from the back of the silicon so metal wiring no longer blocks it. Backside DTI cuts crosstalk by about 50% relative to conventional structures at 1.12μm pixels (IEEE paper). Cu-Cu hybrid bonding lets the pixel and logic layers connect directly at fine pitch, replacing the older edge-routed TSV approach. Real progress on the pixel side over the past few years lives in terms like BSI, DTI, microlens stacks, and hybrid bonding — not in node numbers.

Layer 3: Where pixel size and process node stop aligning

With that structure in place, a natural question arises: how small can a pixel get, and what does that have to do with the process node?

Public data shows a few typical pixel pitches: 4.16μm on the Alpha 1, 1.4μm on Samsung’s 65/14nm stacked CIS, and 1.12μm or smaller on modern smartphone main cameras (IEEE paper). These three tiers span nearly a factor of four in pitch (more than an order of magnitude in pixel area), yet the process nodes behind them are quite close: the pixel layers all sit in the 65–110nm band. How small a pixel can go has less to do with “nodes shrinking so pixels shrink with them” and more to do with a handful of structural innovations that push back the physical limits one step at a time.

Pixel size affects three things in practice. First, the total number of photons one pixel can collect, which is the physical ceiling on SNR. At the same exposure, a larger pixel has a higher full-well capacity and a higher dynamic-range ceiling. Second, the ratio of routing and light-blocking area. The smaller the pixel, the larger the fraction of area consumed by metal wiring and readout transistors, and the less is available for the photodiode. BSI moves metal to the back and relieves this somewhat, but does not eliminate it. Third, inter-pixel crosstalk. Smaller pixels suffer more from both optical and electrical crosstalk, which is what DTI exists to fight.
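The first of those three effects is easy to put numbers on. The sketch below compares the three pixel-pitch tiers cited above; the assumption that collected photons scale with pixel area (identical exposure, fill factor, and quantum efficiency) is a simplification for illustration.

```python
import math

# Photon collection scales roughly with pixel area at equal exposure.
# Equal fill factor and quantum efficiency are simplifying assumptions.
pitches_um = {
    "Alpha 1": 4.16,
    "Samsung stacked CIS": 1.4,
    "smartphone main camera": 1.12,
}

reference = pitches_um["smartphone main camera"]
for name, pitch in pitches_um.items():
    area_ratio = (pitch / reference) ** 2
    stops = math.log2(area_ratio)
    print(f"{name:24s} {pitch:.2f} um  ~{area_ratio:4.1f}x photons  ({stops:+.1f} stops)")
```

The roughly 3.8 stops separating a 4.16μm pixel from a 1.12μm one come entirely from geometry; no process node enters the calculation.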

The process node plays a supporting role in all three. A more advanced pixel platform offers narrower transistors and tighter lithographic alignment, so at the same pixel pitch it can free up more photodiode area or support finer DTI. But the node itself cannot magically improve photon capture. The credit for shrinking pixels into the 1.4μm and 1.12μm tiers belongs mostly to BSI, DTI, hybrid bonding, and microlens redesign. Semiconductor Engineering’s review puts this bluntly: CIS scaling is governed by photoelectric physics, and the node number is only one of the tools.

For photographers, a more intuitive corollary follows: within the same generation of stacked technology, the advantage of a larger sensor over a smaller one comes mostly from physics (each pixel catches more photons), not from the process node. The reverse is also true: a smaller sensor, even paired with a more advanced logic layer, still has its low-light ceiling set by the physics of its pixel layer.

Layer 4: Unpacking the “stacked” family of terms

Stacking went from concept to product in 2013, when Sony started mass-producing the first generation of stacked CIS. Over the decade since, stacking has grown into several variants. Placing them on a single axis makes the picture easier to read.

Two-layer stacked: pixel array on top, logic circuitry below, the two wafers joined by Cu-Cu hybrid bonding. This is the structure behind most mid-to-high-end camera sensors today. Sony’s Alpha 1 uses this approach: a 90nm pixel wafer and a 40nm logic wafer connected by Cu-Cu (ISSCC 2021 materials). Relative to the earlier non-stacked structures, the core benefit is direct: the pixel layer no longer has to give up area to the logic circuitry, and the logic layer can independently adopt a more advanced node and run faster.

Three-layer stacked (pixel + DRAM + logic): Sony disclosed this structure at ISSCC and IEDM in 2017 and turned it into the 3-Layer Stacked CIS with DRAM product (Sony official announcement, Semiconductor History Museum of Japan archive, the latter preserving citations to ISSCC 2017 4.6 and IEDM 2017 3.2). The trick is inserting a dedicated DRAM wafer between the pixel and logic layers. The problem being solved is concrete: ultra-high-speed bursts and super-slow-motion video generate huge transient data volumes that the readout channel cannot carry at full speed, so an entire frame is dumped into on-chip DRAM first and then read out more slowly. At a 1/2.3-inch, 20-megapixel spec, this structure delivers super-slow-motion and rolling-shutter suppression. The key insight for three-layer stacked is this: the added layer is a different kind of device (DRAM). Its value comes from functional specialization, and has nothing to do with the node advancement of the other layers.
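A rough sizing exercise shows why an on-chip buffer changes what the readout channel has to carry. All of the numbers below except the 20-megapixel count (bit depth, burst frame rate, sustainable interface rate) are illustrative assumptions, not Sony’s disclosed figures.

```python
# Why a stacked DRAM layer helps: capture a burst at full speed into
# on-chip memory, then drain it off-chip at a sustainable rate.
# Bit depth, burst rate, and interface rate are illustrative assumptions.

megapixels = 20          # ~1/2.3-inch, 20 Mp class (cited above)
bits_per_pixel = 10      # assumed readout depth
burst_fps = 960          # assumed super-slow-motion capture rate
interface_gbps = 5       # assumed sustainable off-chip link rate

frame_bits = megapixels * 1e6 * bits_per_pixel
burst_rate_gbps = frame_bits * burst_fps / 1e9
drain_ms = frame_bits / (interface_gbps * 1e9) * 1e3

print(f"Per-frame data:      {frame_bits / 8 / 1e6:.0f} MB")
print(f"Instantaneous rate:  {burst_rate_gbps:.0f} Gbit/s during the burst")
print(f"Off-chip link:       {interface_gbps} Gbit/s (assumed)")
print(f"Drain time per frame over the link: {drain_ms:.0f} ms")
```

With these assumed numbers, the instantaneous capture rate is far beyond what any off-chip link could sustain; dumping each frame into the DRAM layer and draining it tens of times more slowly is the whole point of the third layer.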

2-Layer Transistor Pixel: Sony disclosed this variant at the end of 2021 (Sony Semiconductor technology catalog). Ordinary stacked structures separate the pixel layer from the logic layer, but inside the pixel layer, the photodiode and pixel transistors still share the same substrate. This new generation splits those two onto separate substrates: photodiodes on one layer, pixel transistors on another, connected by stacking. According to Sony’s European press release, full-well capacity roughly doubles over the conventional structure, widening dynamic range. Pixel transistors in their own layer can be made larger, improving low-light noise. The point of this structure is that the ability to store electrons is no longer limited by the area pixel transistors consume, so pixels at the same pitch can store more electrons. It is another attempt at answering “how do you keep shrinking pixels,” and the answer is still to add a layer, not change a node.
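A quick calculation shows what “roughly doubled full-well capacity” is worth at the highlight end. The read-noise figure and baseline full-well below are assumed placeholders, and holding read noise fixed while doubling full-well is a simplification (Sony also claims noise improvements from the larger pixel transistors).

```python
import math

# Dynamic range as the ratio of full-well capacity to read noise, in stops.
# Read noise and baseline full-well are assumed placeholders held fixed
# so the comparison isolates the effect of doubling the full well.
read_noise_e = 2.0                    # electrons, assumed

def dr_stops(full_well_e: float) -> float:
    return math.log2(full_well_e / read_noise_e)

conventional_fwc = 10_000             # electrons, assumed baseline
two_layer_fwc = 2 * conventional_fwc  # "approximately doubled" per Sony's release

print(f"Conventional pixel:       {dr_stops(conventional_fwc):.1f} stops")
print(f"2-Layer Transistor Pixel: {dr_stops(two_layer_fwc):.1f} stops")
print(f"Gain from doubled FWC:    {math.log2(2):.1f} stop")
```

Under these assumptions, doubling the electron-storage ceiling buys about one stop of highlight headroom, independent of whatever node either layer uses.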

Partial stacked: this term entered public view in 2024 with the Nikon Z6 III. The Z8 / Z9 are fully stacked; the Z6 III is described as “partially stacked.” The industry lacks a unified definition. According to Digital Camera World’s report, Nikon said only that it “enables significantly faster readout than the Z6 II, but not as fast as the fully stacked Z8 / Z9,” and did not disclose the actual chip structure. Two external sources offer indirect inferences: Luminous Landscape gives the Z6 III’s scan speed as about 1/60 s versus 1/250 s for the fully stacked Z8; Photography Life infers from flash-sync speeds that the Z6 III’s whole-frame readout sits around 12.5–16 ms, while the Z6 II is about 50 ms. Combining these clues, the most reasonable inference is that “partial stacked” means placing the high-speed readout circuits only along parts of the pixel array’s edges (say, the top and bottom strips) rather than all the way around. This is an inference, not confirmed by Nikon. One caveat: “partial stacked” may refer to different concrete implementations depending on which manufacturer or publication uses the term, so read spec sheets with caution whenever it appears.
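The readout times quoted above translate directly into how much a moving subject skews under an electronic shutter. In the sketch below, the panning rate, field of view, and output width are assumptions chosen only to make the comparison concrete; the readout times are rounded from the figures cited above.

```python
# Rolling-shutter skew for a horizontal pan, expressed in output pixels.
# Panning rate, field of view, and frame width are illustrative assumptions;
# readout times are rounded from the sources cited in the text.
pan_deg_per_s = 60       # assumed panning speed
h_fov_deg = 40           # assumed horizontal field of view
frame_width_px = 6000    # assumed output width

readout_ms = {
    "Z6 II (non-stacked)":      50,
    "Z6 III (partial stacked)": 15,
    "Z8 (fully stacked)":        4,
}

for body, t_ms in readout_ms.items():
    skew_deg = pan_deg_per_s * t_ms / 1000
    skew_px = skew_deg / h_fov_deg * frame_width_px
    print(f"{body:26s} readout {t_ms:>2} ms -> skew ~{skew_px:4.0f} px")
```

Under these assumed shooting conditions, the gap between roughly 450 and 135 pixels of skew is what “significantly faster than the Z6 II, but not as fast as the Z8 / Z9” looks like in an image.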

Laid out on one axis, the four terms become clear: stacking means splitting a sensor physically into multiple layers, with each layer free to pick its own process and optimize independently. Two-layer stacked separates “light capture” from “readout.” Three-layer stacked adds a dedicated data-buffer layer. 2-Layer Transistor Pixel splits the light-capture layer internally. Partial stacked is an incomplete split, with stacking only in the regions that need the speed boost. More layers does not mean stronger; each form corresponds to a specific product goal and tradeoff.

Layer 4 addendum: Why smartphone main cameras use Quad Bayer and N×N pixel binning

Stacking takes the “vertically split one sensor into multiple layers” route. Smartphone main cameras have, over the past few years, gone down a complementary route: horizontally, clustering 2×2, 3×3, or even 4×4 adjacent pixels into a group that shares a single color filter. This is what Samsung calls Tetracell / Tetrapixel, what Sony calls Quad Bayer Coding, what marketing calls “four-in-one pixels,” and what the academic literature uniformly calls Quad Bayer. It does not replace stacking or 2-Layer Transistor Pixel — the two approaches frequently coexist on the same smartphone sensor.

To see why it exists, return to the tradeoff in Layer 3: large pitch means good low-light and low resolution; small pitch means high resolution and worse low-light. Smartphone sensors are stuck between 1/2 inch and 1 inch. The product requirements want both a 50-megapixel daytime resolution (for digital zoom, text capture, social content) and 1.4μm-equivalent low-light performance. One piece of silicon cannot physically deliver both. So manufacturers chose a compromise: build 50 million 1.0μm physical pixels, but let every 2×2 cluster of adjacent pixels share one color filter. In bright light, output a 50-megapixel image. In low light, merge the charges of four adjacent same-color pixels before readout, producing an effective 12.5-megapixel, 2.0μm-pitch large-pixel sensor.
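The mode switch can be written out in a few lines. The 50-megapixel count and 1.0μm pitch are the representative figures used above, and treating per-output-pixel photon collection as scaling with effective pixel area is a simplification.

```python
import math

# One Quad Bayer sensor, two effective configurations chosen at readout time.
# Pixel count and pitch are the representative figures used in the text;
# photons-per-output-pixel scaling with effective area is a simplification.
physical_mp = 50.0        # physical pixels (millions)
physical_pitch_um = 1.0   # physical pixel pitch
group = 2                 # 2x2 binning group

modes = {
    "bright light, full resolution": (physical_mp, physical_pitch_um),
    "low light, 2x2 binned":         (physical_mp / group**2, physical_pitch_um * group),
}

for mode, (mp, pitch) in modes.items():
    photon_gain = (pitch / physical_pitch_um) ** 2
    print(f"{mode:30s} {mp:5.1f} Mp at {pitch:.1f} um effective pitch, "
          f"~{photon_gain:.0f}x photons per output pixel ({math.log2(photon_gain):+.0f} stops)")
```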

The key word is “before readout,” not “averaging after the fact in software.” A CMOS readout chain carries two kinds of noise with fundamentally different properties: photon shot noise (governed by the statistics of light itself, proportional to the square root of signal) and read noise (contributed once by the ADC and amplifier chain each time they operate, independent of signal level). Software downscaling happens after the ADC — four pixels have each completed their own readout and each carry one share of read noise as digital values, and averaging them afterward can only reduce the read-noise standard deviation by a factor of √4, that is, by 2. Hardware binning happens before the ADC — the four pixels’ charges are physically summed at a floating-diffusion node, and the combined charge goes through only one ADC conversion. Only one share of read noise is produced, not four independent noise contributions. A bit of arithmetic (written out below) shows that, relative to software averaging, hardware binning further reduces read-noise variance by a factor of 4 and standard deviation by a factor of 2 — roughly an additional half stop of dynamic range.
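Written out as a short derivation (a sketch assuming independent read noise of variance σ_r² per readout and a summed charge that does not clip), with Q_i the charge collected by pixel i and both outputs normalized back to the scale of one pixel:

```latex
\[
\begin{aligned}
\text{software average (4 readouts):}\quad
  \operatorname{Var}_{\text{read}}\!\Big(\tfrac{1}{4}\sum_{i=1}^{4}(Q_i + n_i)\Big)
    &= \tfrac{4\,\sigma_r^2}{16} = \tfrac{\sigma_r^2}{4}
    &&\Rightarrow\ \sigma_{\text{read}} = \tfrac{\sigma_r}{2},\\
\text{hardware binning (1 readout):}\quad
  \operatorname{Var}_{\text{read}}\!\Big(\tfrac{1}{4}\Big(\sum_{i=1}^{4} Q_i + n\Big)\Big)
    &= \tfrac{\sigma_r^2}{16}
    &&\Rightarrow\ \sigma_{\text{read}} = \tfrac{\sigma_r}{4}.
\end{aligned}
\]
```

The read-noise contribution is four times smaller in variance and two times smaller in standard deviation for hardware binning, which is the arithmetic behind the “about half a stop” figure quoted below.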

This half-stop benefit only appears in low light. When there is plenty of light, the signal is much larger than the read noise and shot noise dominates; software downscaling and hardware binning become essentially equivalent. Only in the low-light regime, where the signal is small enough that read noise is no longer negligible, does hardware merging produce a real gain. So what Quad Bayer breaks through is not shot noise — that is a physical limit set by photon statistics and no one can move it — but the read noise of the electronics chain itself, by using the engineering trick of “making four pixels share the cost of one ADC conversion.”
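A small Monte Carlo sketch makes both claims visible: the read-noise gain of charge-domain binning over software averaging in low light, and the near-equivalence of the two once shot noise dominates. The photon counts and the read-noise value are assumed parameters chosen only to separate the two regimes.

```python
import numpy as np

rng = np.random.default_rng(0)
read_noise_e = 3.0          # electrons RMS per readout, assumed
n_trials = 200_000          # Monte Carlo samples per condition

def software_average(mean_photons: float) -> np.ndarray:
    """Four pixels each read out (read noise added 4x), then averaged digitally."""
    signal = rng.poisson(mean_photons, (n_trials, 4)).astype(float)
    reads = signal + rng.normal(0, read_noise_e, (n_trials, 4))
    return reads.mean(axis=1)

def hardware_binning(mean_photons: float) -> np.ndarray:
    """Four pixels' charge summed before the ADC, one readout (read noise added once)."""
    summed = rng.poisson(mean_photons, (n_trials, 4)).astype(float).sum(axis=1)
    return (summed + rng.normal(0, read_noise_e, n_trials)) / 4

for label, photons in [("low light", 5), ("bright light", 5000)]:
    sw, hw = software_average(photons), hardware_binning(photons)
    snr_sw, snr_hw = sw.mean() / sw.std(), hw.mean() / hw.std()
    print(f"{label:12s} ({photons} e-/pixel): SNR software avg {snr_sw:7.2f}, "
          f"hardware binning {snr_hw:7.2f}, gain {20 * np.log10(snr_hw / snr_sw):+.1f} dB")
```

With these assumed numbers, the gap comes out near 3 dB (about half a stop) at 5 electrons per pixel and essentially vanishes at 5000, which is exactly the regime dependence described above.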

This is confirmed in independent sources. A darktable forum discussion on Quad Bayer demosaic support (discuss.pixls.us) notes:

This analog binning provides better SNR and about half a stop of additional dynamic range when small sensors struggle in lower light conditions.

The “additional” here refers to the extra gain relative to software averaging, not relative to single-pixel readout.

Academia treats this motivation as the starting point. A 2023 paper on Quad Bayer joint remosaicing and denoising (arXiv:2303.13571) opens with:

Pixel binning based Quad sensors have emerged as a promising solution to overcome the hardware limitations of compact cameras in low-light imaging. However, binning results in lower spatial resolution and non-Bayer CFA artifacts.

The second half of that sentence matters just as much. Quad Bayer is not a free lunch, and its costs concentrate in two places. First, color resolution drops. Four pixels sharing one color filter only contribute one piece of color information, so the native color resolution is effectively only 1/4 of the nominal pixel count. Outputting a 50-megapixel daytime image requires the ISP to remosaic the four-in-one layout back into a standard Bayer arrangement — a lossy guessing operation that tends to produce false colors and artifacts on high-frequency detail and diagonal edges. Academic work on joint demosaicing and denoising for Quad Bayer keeps producing new papers (AAAI 2024 DRNet is one example), evidence that the problem is not fully solved. Sony built the remosaic algorithm directly into the sensor chip (Sony Semiconductor Quad Bayer Coding page), essentially an engineering admission that “software remosaic alone isn’t stable enough.”
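To see why remosaicing involves guessing, compare the two color-filter layouts directly. The sketch below only prints a 4×4 tile of each pattern; it is a structural illustration, not any manufacturer’s actual remosaic algorithm.

```python
# Standard Bayer: the color filter alternates at every pixel,
# so color is sampled at the full pixel rate.
bayer = ["R G R G",
         "G B G B",
         "R G R G",
         "G B G B"]

# Quad Bayer: each filter covers a 2x2 group of physical pixels,
# so color is sampled at half the rate in each direction.
quad = ["R R G G",
        "R R G G",
        "G G B B",
        "G G B B"]

for name, cfa in [("Bayer", bayer), ("Quad Bayer", quad)]:
    print(name)
    print("\n".join(cfa))
    print()
```

Remosaicing to full resolution has to synthesize the per-pixel color alternation of the Bayer pattern from the coarser quad layout, which is where the false colors and artifacts on high-frequency detail come from.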

Second, the merged result is still inferior to a native large pixel. Combining four 1.0μm physical pixels into one 2.0μm-equivalent pixel does approach native 2.0μm low-light SNR, but several dimensions fall short: DTI and metal wiring consume a larger fraction of the total area across four small pixels than in a single large pixel, so the effective photon-capturing area is smaller; the post-binning full-well capacity is the sum of four small pixels, but each small pixel’s potential-well depth is unchanged, so under strong light each small pixel saturates first and the overall dynamic range does not reach that of a native 2.0μm pixel.

So Quad Bayer can be understood this way: it does not challenge the physical limits of the light-capturing layer — each 1.0μm pixel still suffers the same diffraction, crosstalk, and routing losses. What it does is accept that small pixels simply cannot match large-pixel low-light performance, and then add a hardware-level switch that lets the same silicon choose between “read as small pixels” or “read as merged large pixels” at runtime, removing the constraint that a sensor must correspond to exactly one pitch.

This explains why smartphone main cameras lean on it so heavily while camera sensors barely use it. Smartphones have tight area budgets and must use one camera to cover both daytime resolution and nighttime low-light needs; Quad Bayer provides the best tradeoff at the cost of some color resolution loss. Camera sensors are much larger (the Alpha 1’s 4.16μm pitch is already native large-pixel territory), so Quad Bayer’s marginal benefit is low, its color-resolution cost is not worth paying, and cameras retain the traditional Bayer layout. The pixel-side paths of the two product categories diverge from here: smartphones go toward “extremely small pixels + N×N binning + complex ISP algorithms,” while cameras go toward “large pixels + stacking + 2-Layer Transistor Pixel + fast readout.”

Placed back into the article’s layered coordinate system: stacking splits the sensor vertically into multiple layers, each using its most appropriate process; Quad Bayer groups adjacent pixels horizontally into switchable clusters, allowing the same silicon to present two different effective pitches at runtime. Both routes attack the same fundamental constraint — that no single number can serve all use cases — but along different axes. High-end smartphone sensors (Sony IMX989, Samsung HP3) typically use both.

Layer 5: What to look for on a spec sheet, and what can mislead you

Back to the starting point. When photographers next encounter marketing phrases like “this sensor is 28nm / 14nm” or “this is stacked / partial stacked,” this checklist helps sort them out.

Which layer is a node number describing? Without qualification, it usually refers to the logic layer (readout circuitry). The pixel layer’s node is typically “older” because it is chasing optical quality and low dark current, not density. Within a single product generation, the logic-layer node alone carries limited information: it tells you about readout-bandwidth potential and power, but nothing about pixel-layer image quality.

Is the sensor “stacked”? This is today’s dividing line for whether a mid-to-high-end camera can deliver fast electronic shutters, high-frame-rate video, and low rolling shutter. Whether you can rely on a full electronic shutter in daily use is the clearest product-level gap between stacked and non-stacked sensors. But stacked does not automatically mean better pixel image quality. What two-layer stacking gives the pixel layer is area and node freedom; what the pixel layer actually does with that freedom depends on the specific implementation.

What is the extra layer solving? When you see three-layer stacked or 2-Layer Transistor Pixel, don’t just count layers: ask what the extra layer does. Is it a data buffer (DRAM), or is it an internal rearrangement of the pixel (moving transistors to their own layer)? The former sets the bandwidth ceiling for high-speed scenarios; the latter shapes full-well capacity and dynamic range.

“Partial stacked” is a term to handle with care. It has no industry-wide definition, and different manufacturers use it differently. When you see it, published numbers (readout time, flash-sync speed, scan speed) are more informative than the word itself. Canon’s Europe explainer page is a typical example: it will confirm that a sensor uses a stacked or BSI structure, but usually does not publish the specific node or the specific number of stacked layers. Descriptive words deserve scrutiny: note which parts are public fact and which are left blank by the manufacturer.

One last point: a sensor’s ceiling on image quality and speed is determined systemically — by the physics of the pixel layer, the architecture of the readout layer, the bandwidth channel between them, and the fit between all three and the product positioning. 28nm, 14nm, 90nm are each one choice for one layer inside this system. Once this relationship is clear, the spec sheet stops being a string of numbers anyone can inflate, and turns into a layered diagram you can read on your own.


A note on what is confirmed, inferred, and undisclosed

Confirmed facts in this article include: Sony Alpha 1 using a 90nm pixel layer + 40nm logic layer, Cu-Cu bonding, 4.16μm pitch, 50.1Mp (ISSCC 2021 paper 7.6 presentation PDF); Samsung’s 65/14nm stacked CIS showing a 29% power reduction over 65/28nm, 1.4μm pitch, 2PD, 120fps (IEEE ISCAS 2020 abstract); OmniVision OmniBSI-2 publicly adopting 65nm design rules (OmniVision official technology page); OmniVision’s early BSI using 110nm CMOS (IISW 2009 paper); TSMC’s and Tower’s publicly offered CIS foundry nodes (TSMC / Tower); the roughly 2× full-well increase of Sony’s 2-Layer Transistor Pixel (Sony Europe press release, technology page); Sony’s 2017 3-Layer Stacked CIS with DRAM and its original-paper citations (Semiconductor History Museum of Japan archive, Sony official announcement); backside DTI suppressing crosstalk at 1.12μm by about 50% (R Discovery paper abstract).

Inferred parts include: partial stacked corresponding to “high-speed readout circuitry placed only along parts of the pixel array’s edges,” based on the gap between the Z6 III’s ~12.5–16ms readout time and the Z8 / Z9’s ~4ms readout tier, citing Photography Life and Luminous Landscape.

Undisclosed by the industry: the Z6 III’s specific pixel- and logic-layer nodes and precise stacking geometry; the process nodes of most Canon sensors; and the node combinations of most full-frame camera sensors other than the Alpha 1 — these manufacturers generally confirm the architectural direction (stacked / BSI) without publishing nodes.