The reception for the RTX 4080 was confusing for consumers at announcement. Even other tech junkies asked me, “Hey, are you reviewing the good 4080 or the bad one?”, which speaks volumes about the way consumers perceived the announcement of this card. Truthfully? None of that matters. In the end, perception won’t give you more frames or make your game any smoother – performance is the important thing, and that’s what we are here to measure.
Likely due to a combination of things including chip shortages and market conditions as they compete against the launch of cards from rivals Intel and AMD, NVIDIA flipped the script on their usual launch approach. Rather than launching with a base array of cards to give consumers multiple options for each price point, NVIDIA launched their flagship product by itself. They claimed massive uplifts in performance – double, triple, and in some cases four times the performance of the last generation. Normal uplifts are in the 30% range, so it was hard to believe. Turns out, when paired with their new DLSS 3 technology, you can achieve incredible framerates. This newest technology could generate additional frames, delivering 4K/120 in nearly every game we tested. Rasterized framerates also improved dramatically thanks to the generational improvements throughout the entire card. At 3 slots, it’s big, it drinks power like water, but there’s no arguing that it’s a wonderfully powerful card. A month later, with the “bad 4080” renamed properly to 4070 Ti, we have our hands on the “good” 4080. Joking aside, it’s time to check out the NVIDIA GeForce RTX 4080.
If you were imagining you’d be buying a 4080 because it’ll be smaller, I’ve got news for you – the RTX 4080 is precisely the same size as the RTX 4090. Like the 4090, the 4080 will take up three slots, and it’ll crowd that next PCIe slot a bit. You’ll also want to be mindful of airflow as crowding your new GPU is probably not a wise move, though inevitably somebody will figure out how to pair this with a mini-ITX just because they can.
There are a number of important bits of tech that have gotten a massive upgrade with the 4000 series of cards, so let’s go through a few of them.
Back in 2007, NVIDIA introduced a parallel computing platform called CUDA, or Compute Unified Device Architecture, and the parallel processing units on its GPUs became known as CUDA cores. In the simplest of terms, these CUDA cores can process and stream graphics. The data is pushed through SMs, or Streaming Multiprocessors, which feed the CUDA cores in parallel through the cache memory. As the CUDA cores receive this mountain of data, they divide the instructions up amongst themselves and process them individually. The RTX 2080 Ti, as powerful as it was, had 4,352 cores. The RTX 3080 Ti ups the ante with a whopping 10,240 CUDA cores, just 256 shy of the behemoth 3090. The NVIDIA GeForce RTX 4090 ships with 16,384 cores, and the RTX 4080 has 9,728 of them, but that doesn’t tell the whole story. We’ll circle back to this.
So…who is Ada Lovelace, and why do I keep hearing her name in reference to this card? Well, you may or may not know this already, but NVIDIA uses code names for their processors, naming them after famous scientists. Kepler, Turing, Tesla, Fermi, and Maxwell are just a few of them, with Ada Lovelace being the most current. These folks have delivered some of the biggest leaps in technology mankind has ever known, and NVIDIA recognizes their contributions. It’s a cool nod, and if that sends you down a scientific rabbit hole, then mission accomplished.
The 4000 series of cards brings with it the next generation of tech for GPUs. We’ll get to Tensor and RT cores, but at its simplest, this next gen core is able to deliver increased performance in three main areas – streaming multiprocessing, AI performance, and ray tracing. In fact, NVIDIA is stating that it can deliver double the performance across all three. I salivate at these kinds of claims, as they are very much things we can test and quantify.
What is a Tensor Core?
Here’s another example of “but why do I need it?” within GPU architecture – the Tensor Core. This technology from NVIDIA had seen wider use in high performance supercomputing and data centers before finally arriving on consumer-focused cards with the 20X0 series. Now, with the RTX 40-series, we have the fourth generation of these processors. For frame of reference, the 2080 Ti had 544 second-gen Tensor cores, the 3080 Ti provided 320 third-gen cores, and the 3090 shipped with 328. The new 4090 ships with 512 Tensor Cores, and the RTX 4080 has 304, but once again, this doesn’t tell you the whole story, and once again, I’m asking you to put a pin in that thought. So what do they do?
Put simply, Tensor cores are your deep learning / AI / Neural Net processors, and they are the power behind technologies like DLSS. The RTX 4090 brings with it DLSS 3, a generational leap over what we saw with the 3000 series cards, and is largely responsible for the massive framerate uplift claims that NVIDIA has made. We’ll be testing that to see how much of the improvements are a result of DLSS and how much is the raw power of the new hardware. This is important, as not every game supports DLSS, but that may be changing.
One of the things DLSS 1 and 2 suffered from was adoption. Studios would have to go out of their way to train the neural network to import images and make decisions on what to do with the next frame. The results were fantastic, with 2.0 bringing cleaner images that could actually be better than the original source image. Still, adoption at the game level would be needed. Some companies really embraced it, and we got beautiful visuals from games like Metro: Exodus, Shadow of the Tomb Raider, Control, Deathloop, Ghostwire: Tokyo, Dying Light 2: Stay Human, Far Cry 6, and Cyberpunk 2077. Say what you will about the last game – visuals weren’t the problem. Still, without engine-level adoption, the growth would be slow. With DLSS 3, that’s precisely what they did.
DLSS 3 is a completely exclusive feature of the 4000 series cards. Prior generations of cards will undoubtedly fall back to DLSS 2.0 as the advanced cores (namely the 4th-Gen Tensor Cores and the new Optical Flow Accelerator) that are contained exclusively on 4000-series cards are needed for this fresh installment in DLSS. While that may be a bummer to hear, there is light at the end of the tunnel – DLSS 3 is now supported at the engine level by both Epic’s Unreal Engine 4 and 5, as well as Unity, covering the vast majority of all games being released into the indefinite future. I don’t know what additional work has to be done by developers, but having it available at a deeper level should grease the skids. Here’s a quick list of what games support DLSS 3 already:
To date there are nearly 300 games that utilize DLSS 2.0, and 39 games already supporting DLSS 3. More importantly, the technology is natively supported by the Frostbite Engine (the Madden series, Battlefield series, Need for Speed Unbound, the Dead Space remake, etc.), Unity (Fall Guys, Among Us, Cuphead, Genshin Impact, Pokemon Go, Ori and the Will of the Wisps, Beat Saber), and Unreal Engine 4 and 5 (the next Witcher game, Warhammer 40,000: Darktide, Hogwarts Legacy, Loopmancer, Atomic Heart, and hundreds of other unannounced games). With native engine support, it’s very likely we’ll see a drastic increase in the number of titles that support the technology going forward.
If you are unfamiliar with DLSS, it stands for Deep Learning Super Sampling, and it’s accomplished by what the name breakdown suggests. AI-driven deep learning computers will take a frame from a game, analyze it, and supersample (that is to say, raise the resolution) while sharpening and accelerating it. DLSS 1.0 and 2.0 relied on a technique where a frame is analyzed and the next frame is then projected, and the whole process continues to leap back and forth like this the entire time you are playing. DLSS 3 no longer needs these frames, instead using the new Optical Multi Frame Generation for the creation of entirely new frames. This means it is no longer just adding more pixels, but instead reconstructing portions of the scene to do it faster and cleaner.
A peek under the hood of DLSS 3 shows that the AI behind the technology is actually reconstructing ¾ of the first frame, the entirety of the second frame, and then ¾ of the third, and the entirety of the fourth, and so on. Using these four frames, alongside data from the optical flow field from the Optical Flow Accelerator, allows DLSS 3 to predict the next frame based on where any objects in the scene are, as well as where they are going. This approach generates 7/8ths of a scene using only 1/8th of the pixels, and the predictive algorithm does so in a way that is almost undetectable to the human eye. Lights, shadows, particle effects, reflections, light bounce – all of it is calculated this way, resulting in the staggering improvements I’ll be showing you in the benchmarking section of this review.
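If you want to see where that 7/8ths figure comes from, here is a quick back-of-the-napkin sketch. It assumes super resolution renders one quarter of the pixels in every other frame and that frame generation produces the in-between frames outright. The function name and the two-frame cycle are my own illustration, not NVIDIA’s implementation.

```python
from fractions import Fraction

def rendered_fraction(upscale_rendered=Fraction(1, 4), frames_per_cycle=2):
    """Fraction of displayed pixels that are conventionally rendered.

    Assumption: each cycle of `frames_per_cycle` displayed frames contains
    one rendered-then-upscaled frame; the remaining frames are generated.
    """
    rendered_pixels = upscale_rendered * 1        # one upscaled frame per cycle
    total_pixels = Fraction(frames_per_cycle)     # every displayed frame counts
    return rendered_pixels / total_pixels

rendered = rendered_fraction()
reconstructed = 1 - rendered
print(rendered, reconstructed)  # 1/8 7/8
```

One quarter of one frame, spread across two displayed frames, is one eighth of the pixels on screen, which leaves the other seven eighths to the AI.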
It’s not all sunshine and rainbows with DLSS, though some of this is certainly early-adopter woes, at least according to a Eurogamer report on the subject. They observed some odd artifacting and ghosting in some scenes. I can’t dispute their findings, but it’s not something I’ve observed on the 4090 or 4080 in my own tests. Whether that’s a matter of game patches, updated drivers, or blind luck, I’ve not had those experiences. Your mileage may vary, but you’ll probably be too busy enjoying a stupidly high framerate and resolution to notice. If it gives you that much heartburn, however, we’ll be showing the rasterized non-DLSS framerate in our benchmarks as well. Moving on.
The boost clock is hardly a new concept, going all the way back to the GeForce GTX 600 series, but it’s a very necessary part of wringing every frame out of your card. Essentially, the base clock is the “stock” running speed of your card that you can expect at any given time no matter the circumstances. The boost clock, on the other hand, allows the speed to be adjusted dynamically by the GPU, pushing beyond this base clock if additional power is available for use. The RTX 3080 Ti had a boost clock of 1.66GHz, with a handful of 3rd party cards sporting overclocked speeds in the 1.8GHz range. The RTX 4090 ships with a boost clock of 2.52GHz, and the 4080 is right behind it at 2.51GHz. Though you won’t need to overclock, enthusiasts have already shown a great deal of headroom baked into the 4090, and I have no doubt that the 4080 will have similar room to run. That said, now’s a good time to talk about power.
The RTX 4090 is a hungry piece of tech, and it draws 100 more watts than the RTX 3090. The 4090 has a TDP of 450W, and NVIDIA recommends an 850W power supply. The RTX 4080 is more modest, with a TDP of 320W and a recommended 750W PSU.
If you are using a PCIe Gen5 power supply (a month after the 4090 launch there are still very few of these available, and most are from companies I’ve never heard of), you’ll have a dedicated cable from your PSU, meaning you will only be using a single power lead. Fret not if you have a Gen 4 PSU, however, as you can simply use the included adapter. It’s a four-tailed whip: you can connect three 6+2 PCIe cables if you just want to power the card, but if you connect a fourth, you’ll provide the sensing and control needed for overclocking. Provided the same headroom as its predecessor exists, there should be room for overclocking, but that’s beyond the scope of this review. Undoubtedly there will be numerous people out there who feel the need to push this behemoth beyond its natural limits. My suggestion to you, dear reader, is that you check out the benchmarks in this review first. I think you’ll find that overclocking isn’t something you’ll need for a very, very long time.
Since the launch of the 4090, a great many videos have been released about the resilience of the included adapter cable. Some tech experts have bent this thing to the nth degree, causing it to malfunction. I call that level of testing pointless, as you’d reach the same end state if you abused a PCIe Gen4 PSU cable as well. Similarly, much has been said about the limitations of the cable after manufacturers pointed out that it is rated to be unplugged and plugged back in about 30 times. For press folks like myself, that’s not very many – I guarantee I’ve unplugged my 4090 more than that in the last 30 days as I test various M.2 drives, video cards, and other devices. You as a consumer will likely do this once – when you install the card. Even so, with all of my abuse I’ve got pristine pins on my adapter. It’s also worth noting that the exact same plug/unplug recommendations were made for the previous generation, and nobody cried then. I guess what I’m saying, now for the second time, is “moving on”.
The RTX 3080, 3080 Ti, 3090, and 3090 Ti all used GDDR6X, which provides the most memory bandwidth available thanks to its faster signaling and a wide memory bus. The GeForce RTX 3090 Ti sported 24GB of memory on a 384-bit wide memory bus. That moves more data per clock than the traditional GDDR6 on a 256-bit bus that you’d find in a 3070. The RTX 4090 uses that same bus width and the same 24GB of GDDR6X. The RTX 4080 has 16GB of GDDR6X on a narrower 256-bit bus, with a per-pin data rate of 22.4Gbps that is slightly faster than the 21Gbps of its bigger brother. As we move into the other cards in the 4000 series, it’s very likely we’ll see the bus narrow further and the overall pool of memory shrink, but these two cards are the top tier.
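To make the bus width talk a little more concrete, peak memory bandwidth is simply the bus width in bytes multiplied by the per-pin data rate. Here is a minimal sketch. The per-pin rates and the 256-bit bus used below are assumptions pulled from public spec sheets rather than anything measured for this review, and the function name is just for illustration.

```python
def memory_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s: bus width in bytes times per-pin rate in Gbps."""
    return bus_width_bits / 8 * data_rate_gbps

# Assumed figures: 384-bit bus at 21 Gbps per pin, 256-bit bus at 22.4 Gbps.
wide_bus = memory_bandwidth_gb_s(384, 21.0)
narrow_bus = memory_bandwidth_gb_s(256, 22.4)

print(wide_bus)  # 1008.0
print(f"narrower bus delivers {narrow_bus / wide_bus:.0%} of the wider one")
```

The takeaway is that bus width matters at least as much as memory speed: a faster per-pin rate only partly makes up for a narrower bus.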
Shader Execution Reordering:
One of the new bits of tech that is exclusive to the 4000 series is Shader Execution Reordering. If you are running a 4000 series card, you’ll be able to process shaders more effectively as they can be re-ordered and sequenced. Until now, shader workloads (these calculate light and shadow values, as well as color gradations) have been processed in the order received, meaning a lot of dissimilar tasks end up being handled side by side. It works, but it’s hardly efficient. With Shader Execution Reordering, these can be re-organized into a sequence that groups them with other similar workloads. This has a net effect of up to 25% improvement in framerates and up to 3X improvement in ray tracing operations – something you’ll see in our benchmarks later on in this review.
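A toy model helps show why grouping similar work pays off. The sketch below is not how the hardware or driver actually does it. It assumes threads run in lockstep groups of 32 (a warp), and that a warp needs one pass for every distinct shader among its threads. The shader IDs, hit count, and function name are all made up for illustration.

```python
import random

WARP_SIZE = 32  # threads that execute in lockstep on NVIDIA GPUs

def shader_passes(shader_ids, warp_size=WARP_SIZE):
    """Total passes needed: one per distinct shader inside each warp."""
    return sum(
        len(set(shader_ids[i:i + warp_size]))
        for i in range(0, len(shader_ids), warp_size)
    )

random.seed(0)
# Hypothetical workload: 1024 ray hits, each needing one of 8 shaders.
hits = [random.randrange(8) for _ in range(1024)]

arrival_order = shader_passes(hits)       # processed as received
reordered = shader_passes(sorted(hits))   # similar shaders grouped together

print(f"passes in arrival order: {arrival_order}")
print(f"passes after reordering: {reordered}")
```

In arrival order nearly every warp ends up juggling most of the eight shaders, while the reordered list keeps almost every warp on a single shader. The real feature is far more sophisticated, but the principle is the same.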
What is an RT Core?
Arguably one of the most misunderstood aspects of the RTX series is the RT core. This core is a dedicated pipeline to the streaming multiprocessor (SM) where light rays and triangle intersections are calculated. Put simply, it’s the math unit that makes realtime lighting work and look its best. Multiple SMs and RT cores work together to interleave the instructions, processing them concurrently, allowing the processing of a multitude of light sources that intersect with the objects in the environment in multiple ways, all at the same time. In practical terms, it means a team of graphical artists and designers don’t have to “hand-place” lighting and shadows, and then adjust the scene based on light intersection and intensity. With RTX, they can simply place the light source in the scene and let the card do the work. I’m oversimplifying it, for sure, but that’s the general idea.
The RT core is the engine behind your realtime lighting processing. When you hear about “shader processing power”, this is what they are talking about. Again, and for comparison, the RTX 3080 Ti was capable of 34.10 TFLOPS of light and shadow processing, with the 3090 putting in 35.58. The RTX 4090? 82.58 TFLOPS across 128 cores.
If you were looking for the biggest differentiator between the 4090 and the 4080, here it is. The RTX 4080 has 76 RT cores, delivering 48.74 TFLOPS of light and shadow processing power. Where the 4090 more than doubled the 3080 Ti’s figure, the 4080 lands at a little over 1.4 times it. Given the price difference between the 4090 and 4080 is 25%, it’s not too out of alignment with the MSRP.
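For anyone who wants to check the math, here is a quick sketch that puts the quoted TFLOPS figures side by side. The numbers are the ones cited in this review; the dictionary and function names are just for illustration.

```python
# TFLOPS figures as quoted in this review.
tflops = {
    "RTX 3080 Ti": 34.10,
    "RTX 3090": 35.58,
    "RTX 4080": 48.74,
    "RTX 4090": 82.58,
}

def ratio(card, baseline):
    """How many times the baseline's throughput a card delivers."""
    return tflops[card] / tflops[baseline]

for card in ("RTX 4080", "RTX 4090"):
    print(f"{card}: {ratio(card, 'RTX 3080 Ti'):.2f}x the RTX 3080 Ti")

print(f"RTX 4080 vs RTX 4090: {ratio('RTX 4080', 'RTX 4090'):.0%}")
```

Run against the 3080 Ti, the 4080 comes out at roughly 1.43x and the 4090 at roughly 2.42x, which puts the 4080 at about 59% of the 4090 on this particular metric.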
The Turing architecture cards (the 20X0 series) were the first implementations of this dedicated RT core technology. The 2080 Ti had 68 RT cores, delivering 26.9 Teraflops of throughput, whereas the RTX 3080 Ti pushes this to 80 RT cores – just shy of the 82 cores on the RTX 3090. The RTX 4090 features 128 RT Cores, and the 4080 has 76, but like the Tensor and CUDA cores, these are next-generation cores. So let’s talk about what all of these new cores combined are delivering.
We are going to take a slightly different approach to our benchmarks this time around, and when you look at this first graph, you’ll understand why. This card is so intensely powerful that measuring it at anything less than 4K is simply a waste of time. It also extends the scale of the graph so ridiculously that there’s no point in comparing it with anything other than the highest end of the previous generation. This first set of games is a blend of DLSS 2.0 and purely rasterized titles. Below are the 4090 averages for reference:
Games like Wolfenstein, Rise of the Tomb Raider, and Metro: Exodus were some of the earliest adopters of Deep Learning Super Sampling, and we see the longer maturity window’s effect represented here. Since the scale is so mangled, let’s just point at the big spires before we get to a more readable graph.
I wanted to introduce a new class of modern titles that would push this new generation of cards as hard as possible, while blending both DLSS 3 and native rasterized 4K performance. When a title supports DLSS 3 it’s easy to see, as the scale of the graph goes through the roof. To that end, we are bringing Microsoft Flight Simulator, Cyberpunk 2077, A Plague Tale: Requiem, F1 22, and the Unity demo of Enemies, as well as the Lyra Unreal Engine 5 demo project. We’ve also added Loopmancer and F.I.S.T.: Forged In Shadow Torch to the mix this time around, looping back to the 4090 for reference.
These were recorded with all settings at maximum, all RTX options enabled, Reflex Boost turned on, and at 4K resolution. Once again I found myself flabbergasted at the results. I ran these multiple times, and confirmed them with NVIDIA, as I simply could not believe the increases. Let’s put them head-to-head with DLSS 3 enabled:
Joking aside, we pulled in the RTX 3080 and 3080 Ti to showcase our previous benchmarking suite against the 4090 and 4080. These two flagship cards from the previous generation can at least run in the same race, even if they stand absolutely no chance whatsoever. These are the average numbers, but we’ll dig into the question of how much of an effect DLSS 3 has in a moment.
Now that you can see a bit more of the detail, one thing becomes insanely clear. If you were to combine the power of the RTX 3080 and the power of the RTX 3080 Ti, you’d still be a little short of what the RTX 4090 delivers. The 4080 is not far off that mark, delivering very comparable results, and without DLSS 3 support – just native 4K rendering. Once again, staggering.
If you bring up YouTube and look up DLSS, you’ll find entirely too many channels asking “Is DLSS worth it?” with red arrows and surprised or scrunched faces. Well, just take a second and look at this graph again. Why in the wide world of sports would you NOT enable DLSS 3? You could say you are leaving frames on the table, but honestly it’s worse than that — if you don’t enable DLSS 3, you are leaving literally hundreds of frames behind at zero additional cost. Just turn it on already!
I looked down at the 4080 in my case and realized the fans weren’t spinning. I guess the new heat pipe passthrough systems are working well as it had cycled down completely. When it does fire up for a game, I still hear the clicking of my mechanical storage drive and case fans over the sound of the 4080. I pulled out my audio meter, as some of you were curious where it sits, and both cards are sitting nicely at 41dB at load. For reference, that’s most commonly associated with “quiet library sounds”, according to IAC Acoustics. My case is roughly 2 feet away from where I sit, so I should be hearing this card spin up, but even with the case side off, I don’t.
Since this card is inevitably going to be compared against the 4090, price is a factor. We see a 10-15% hit to performance from the 4090 to the 4080 in exchange for a 25% reduction in price (though some of that hit is apparently attributable to the garbage code in Windows 11, and we’ll be re-running these on Windows 10 in the near future). The 4090 shipped for $1599, and the 4080 will hit shelves at $1199. This puts it in the ballpark of AMD’s Radeon RX 7900 XTX at an MSRP of $999. I don’t have any AMD cards to test against, but in AMD’s own slide deck they are suggesting 72fps for Cyberpunk, though they aren’t specifying the settings. On the 4080 I had an FPS average of 101 with DLSS, and 70 without, at max settings with RT turned on. It seems like these two will be going head-to-head.
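If you like to see the value proposition as numbers, here is a small sketch using the MSRPs and the 10-15% performance gap quoted above. The function name and the “frames per dollar” framing are my own shorthand, not an industry metric.

```python
def relative_value(price_a, price_b, perf_b_vs_a):
    """Return (price cut of B versus A, B's performance per dollar relative to A)."""
    price_ratio = price_b / price_a
    price_cut = 1 - price_ratio
    value = perf_b_vs_a / price_ratio
    return price_cut, value

# 4090 at $1599 versus 4080 at $1199, with a 10% and a 15% performance hit.
for hit in (0.10, 0.15):
    cut, value = relative_value(1599, 1199, 1 - hit)
    print(f"{hit:.0%} slower: {cut:.1%} cheaper, {value:.2f}x frames per dollar")
```

Depending on which end of that 10-15% range a given game lands on, the 4080 works out to somewhere around 1.13 to 1.20 times the frames per dollar of the 4090.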
It’s still worth mentioning that these prices are driven by realities in the market, not as much by greed as people think. Chip manufacturer TSMC has increased pricing across the board, and since they provide chips to AMD, Intel, NVIDIA, and more, it’s very likely that what we are seeing with this price is the tip of the proverbial iceberg. I don’t want to make value judgments on your behalf, but I will at least acknowledge that this is a premium price, even if significantly cheaper than a 4090. What I hope is at least abundantly clear is just how much that price will get you.
I mentioned that I love the boundaries between generations, and as we reach the end of this review, I’ll tell you more about why. Yes, there’s lots of oohs and aahs for new and shiny gear, but what these moments represent is the start of the improvements. When we benchmarked the 4090, DLSS 3 was brand new. In the month since, we’ve seen several major updates to the technology, and a solid jump in the number of games supporting the new tech as the 4080 launches. DLSS 2.0 continues to grow. When we eventually get to the lower end cards, DLSS will be the difference between running them at high framerates, or even at all, so that adoption is far more important than just picking a GPU manufacturer to support.
We have a bit of history to look at to predict how this card will age. Using the RTX 3080 Ti’s scores at launch and again today we see scores that improved between 30 and 40%. At the start of the 4000 series we have the next generation of hardware, but also a groundswell of support for AI-driven improvements.
GeForce RTX 4080
The RTX 4080 delivers absolutely staggering performance, whether you are playing a DLSS 3 game or just rendering at native 4K. The RTX 4090 is obviously the flagship for NVIDIA, but the RTX 4080 manages to deliver nearly as much performance for $400 less.