Introduction to Electronics Cooling – ATS Webinar

Welcome, everybody. It’s a pleasure to be in front of you again. We’re going to be talking about the specifics of electronics cooling, or an introduction to electronics cooling, and I’d like to start
with what the concept is, just to get you familiar with it. Heat is a big menace for us as we come across a variety of applications,
and it has to be managed from a thermal standpoint. Whether it’s your
iPhone or Samsung phone, whatever phone you’re using, or some sophisticated
electronics like a multi-channel computer center that does a lot of
sophisticated computing, it all requires thermal management. The reality is,
when we look at the applications, heat kills, irrespective of the market sector.
Now you can see examples: a windmill catching on fire, where it’s the inverter
that has caught fire as a result of overheating; a PC
that is on fire; a laptop; even a drone in flight that catches fire. We’ve
seen examples of laptops, computers, hard disks, even the electrical boxes
that we deal with catching on fire, because the excess heat that’s created is not being
removed effectively. Fire is obviously a threat, and it also
shuts down the function that we’re looking for. So if not managed, heat is
a fire threat and a showstopper for us, and we have to do very diligent work
to make sure that it’s managed, and managed effectively. So when you look at electronics, where is the source of heat? Where is it coming from,
and what do we have to do to mitigate it? It starts with
a chip. Irrespective of the component you’re using, whether it’s an IGBT or power device or a CPU or GPU, there is a chip
in there that is generating the functionality we’re looking for, and
that chip is nothing but a very, very small, micro-sized PCB. Very flat,
very small profile, and it’s very difficult
to manage from time to time, depending upon the heat distribution on the chip
itself, because we cannot make an assumption that the chip has
uniform power. As I mentioned, it’s very much like a PCB: just as a PCB has a non-uniform
power distribution and hotspots, the chip does the same thing, and this is the area
where the problem starts. This chip is packaged into one of a variety of packages
available in the market, there are something like 50 different packages available, and
this creates a functional part that eventually goes onto a board. As
a result of passing current into the device, and because these are materials that are not
perfect and have current leakage, we generate heat. The
heat, as you can see, is dispersed quite a bit, so not only is the chip itself
affected, but the devices around it are also affected as a result of the
conduction and convection that we have. So we have a huge constraint when it comes to
electronics packaging and being able to remove the heat. As thermal or
mechanical engineers we are the last link in the food chain, and it is
very important for us to be creative and come up with unique
solutions, because oftentimes we don’t get involved at the onset of the design.
We are given the final product that we have to cool, with very limited ability
to move the heat around once the circuit is put together. So you can see,
from the chip all the way to the package and the eventual system, that’s where
the menace is, and that’s where our focus has to be when we go through electronics
cooling. So what is the packaging hierarchy that takes place?
Sometimes people call it the C4. You start at the chip, where the chip is
placed onto a chip carrier, and then a component is produced as a result
of it. Again, with the variety of components in the electronics market, we see
many, many examples of it. Typically, in most applications, this component that carries
the chip is placed onto a PCB, that’s a printed circuit board. The
green material that’s in the background is the printed wiring
board, and when we put the components on it, it is called the PCB. Then this
single PCB could be an engine by itself, or oftentimes it’s put together into a
telecom chassis or rack like the one you see, which then creates a particular functionality.
What we see here is a system that provides video streaming for the datacom market. Oftentimes these racks and chassis are put together
into a system that creates a bigger picture as far as the deliverable is
concerned. Imagine you want to feed a city or a block or whatever with a variety
of calculations or delivery of data. We can see a lot of
examples in the data centers or central offices of devices like this, and even
more recently in biological computing, which requires a lot of
these boards to be put together in order to provide the function that’s required.
So you can see the hierarchy of the packaging, going from the
package all the way up to the system, and this is where it goes into the customer
premises, where the eventual system resides. I’m bringing this up because
I want you to bear it in mind when we talk about the system, so that when we
come to thermal analysis we see what that hierarchy is and so forth. So the packaging
hierarchy is in this direction, and we’re going to see how the thermal analysis goes.
Now it is important to notice how the heat is generated and what happens to
this heat, where it goes. We talked about the component going
onto a board, into a rack, and then eventually into a cabinet. The
heat is here, on that chip that you saw a couple of slides ago. That’s
where a lot of heat is generated. Whatever the functionality is
that pushes the bits into the network and so
forth, the wasted heat is conducted through the package and
eventually conducted and radiated off everywhere. It’s a highly
three-dimensional heat transfer, and we have to
again pay attention to the importance of it, as you will see in a slide or
two. So this component goes onto a PCB, this PCB goes into a bunch of racks and
eventually into a system, as you can see here, and this heat is now coupled
together. Now we have airflow or liquid cooling through this; which one
your cooling application uses doesn’t make any difference. The advantage of liquid
cooling is that you pick the heat up and dump it elsewhere, but now you have to deal with that
dump elsewhere. Air cooling tends to be more local, and there are a variety of
designs to take this heat away from the rack or bay into the
given environment. Eventually it goes into a system, and these cabinets are
deployed, but again there’s a high degree of coupling that takes place
within the system. If I’m generating heat in this location, then depending upon
the path of airflow and the design I’ve done for my system, I could
be heating devices that are located here. Take the heat that’s been absorbed, for
instance, by the chassis: if it’s an outside cabinet, or it is residing right
next to another cabinet, the radiation coupling or convection
coupling that takes place between the two could potentially cause a problem.
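The airflow coupling described above can be put into rough numbers with a steady-state energy balance on the air stream: air leaving a heated region is warmer by dT = Q / (rho * V * cp), and that warmer air is the approach air for whatever sits downstream. The following is a minimal sketch, not something shown in the webinar. The air properties, the 200 W load, and the 30 CFM flow rate are illustrative assumptions, and the function names are hypothetical.

```python
# Sketch: bulk air temperature rise across a heat source (steady-state energy balance).
# Air properties are assumed values near room temperature at sea level.
AIR_DENSITY = 1.16              # kg/m^3 (assumed)
AIR_SPECIFIC_HEAT = 1007.0      # J/(kg*K) (assumed)
CFM_TO_M3_PER_S = 0.000471947   # one cubic foot per minute in m^3/s


def air_temperature_rise(power_w: float, flow_m3_per_s: float) -> float:
    """Return the bulk air temperature rise in kelvin: dT = Q / (rho * V * cp)."""
    if flow_m3_per_s <= 0:
        raise ValueError("volumetric flow must be positive")
    mass_flow = AIR_DENSITY * flow_m3_per_s  # kg/s
    return power_w / (mass_flow * AIR_SPECIFIC_HEAT)


def downstream_inlet_temperature(ambient_c: float,
                                 upstream_power_w: float,
                                 flow_m3_per_s: float) -> float:
    """Air temperature seen by a device sitting downstream of a heat source."""
    return ambient_c + air_temperature_rise(upstream_power_w, flow_m3_per_s)


if __name__ == "__main__":
    flow = 30 * CFM_TO_M3_PER_S  # hypothetical 30 CFM through one card slot
    inlet = downstream_inlet_temperature(55.0, 200.0, flow)
    print(f"Downstream approach air: {inlet:.1f} C")
```

This treats the air as fully mixed, which the flow visualizations later in the talk show is rarely true, so it is only a first estimate of how much hotter the downstream approach air can be.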
So understanding the heat, its sources, and the level of coupling that
takes place is very important. Another very important point to note: a lot of us
as engineers have a habit and tendency of looking for look-alike
solutions in our daily engineering tasks. We say, okay,
so-and-so solved this problem in such a fashion, so if he was dissipating 45
watts on a tiny little board in natural convection, I can do the same thing. That’s not
the case. The reality is that the designs are as different as the designers
who made them. The materials used in electronics packaging are so
different, and very much manufacturer dependent. The fundamental materials may
be the same, but when they are put into a composition, especially when dealing with a
lot of polymers, the material properties are unique to the manufacturing
process taking place. As a result, the data
is not transportable, meaning that if I was able to cool a small, tiny little
board dissipating 45 watts in natural convection, that doesn’t mean you can do it. However,
the procedure or process that I took to solve the problem
is very much transportable. If I understand the solution path, not the
actual number or actual data, I can take that
solution path and apply it to my problem, and that makes my life a little bit
easier. I can learn from my colleagues who solved this problem before
I did. But my word of caution to you is: don’t assume that because
some article or webpage or whatever you’ve seen gives an answer
to a problem, that answer is a universal solution. It’s very, very difficult in a problem
that’s as multifaceted as this to come up with a silver bullet. You see a lot of
companies out there, and a lot of products and solutions, and
they think they solve the world’s problems, but in reality each is specific to a particular class of problems. So a word of caution, as we have a
nasty habit as engineers of looking for the quickest solution to get the
answer, because we’re under pressure to deliver a lot. But understand this, and
understand the solution procedure, and then the data becomes clear. So
what do we mean by thermal management? Why do we need it? What’s the
big deal? There are two issues that we really have to worry about: one is
functional integrity, the other is operational reliability. We know that
semiconductor device performance may drop with an increase in operating
temperature; hence, at a low enough temperature we get satisfactory
function. What does that mean? That’s a long sentence. A simple version is: if I can put,
you know, 10 bits into the pipeline, then if I cool it down to a level where
operation is not disadvantaged I can maybe put in a hundred. So I can gain
efficiency in functional integrity without creating any kind of error,
and as the temperature goes up I can create bit errors. As a simple example,
let’s say I take an x-ray picture of your chest, but the
doctor most likely is not sitting at the hospital; he’s either at home or in a
different state, especially with today’s electronics and so forth. If I have an electronics problem, I can put impressions, I can put dots, on
your x-ray. This is a manifestation of the electronic
problem, with temperature as one of the causal agents, and so the
interpretation of the doctor is different.
Or you transfer money from one bank account to another. You want
to send a thousand, and the electronics send 10,000 of your money to the other
account. These are the kind of functional errors that we can see, the so-called
bit errors, that could cause a loss of functionality. We’ve seen hiccups in our
communication, for instance; we get choppiness and so forth. Not all of that is
from thermal, but thermal could be one of the causal agents for this
process. The other one is operational reliability.
You saw the pictures that I showed you of the wire bonds and
so forth that we have on the chips. Think of this as being like a circuit
breaker or a fuse: if the temperature gets to a certain level, the thermal stresses
are so high that they cause mechanical failure. There’s going to be a
disruption in the line that is carrying the signal, and this disruption could be
at the micro level, so your chip is broken and doesn’t work anymore, or it could
potentially shut off some of the circuit flows and not
deliver the function that we require. As a result, you’ve had a hiccup in
your system or a catastrophic failure. So in order to prevent these two problems,
there is a very simple rule that we have to follow. If you don’t take anything
else with you from this presentation, take this one item:
thermal management, from my perspective (others may disagree with me),
has as its sole focus the junction temperature of the device. For most silicon this
number has evolved to be about 125 degrees C. For any ambient,
it has to be less than or equal to 125. When we go
through the design, because of the complexity that I mentioned with the
multiple material properties, the packaging issues, manufacturing processes,
etc., we typically leave about a 10% margin on this. We don’t call the
design complete unless we have a 10 or 15 degree difference to 125. So if my
junction temperature calculation for the worst possible operating ambient is 110 or
112, I cautiously call that design complete and go forward. So my sole
purpose in thermal management, the focal point, is junction temperature. When
you look at the literature, when you look at the manufacturers’ data, etc.,
they make reference to, say, this device being an 85-degree device, and that,
from my humble perspective, is very, very erroneous
data to communicate to the field. In the laboratory, in a
controlled situation, I can create an 85 degree approach (ambient) temperature of the fluid, whether air or
liquid, at that device. In the real world, when I’m dealing with this kind of
complexity, it is very, very difficult, and you’re going to see
examples of why I’ve mentioned this, to accurately
measure what the approach temperature is and whether 85 degrees makes
sense or not. But junction temperature is a very referenceable temperature.
From the engineering standpoint, it is the hottest point that
I measure on this device. From more of a device designer’s standpoint, it is
these tiny little wires, the hottest part on these metallic inter
layers. This is what creates the junction that we have to manage and
reduce. In some devices, depending upon the packaging and so forth and the power
density, we could have devices that could be in excess
of a couple of thousand degrees C, very hot, and they could melt very quickly,
because I have a very small area with a lot of current going through it, so it heats
up just by the nature of the physics.
and discussion with junction temperature being my
criterion for successful thermal design I think our mission has been accomplished in
this webinar. So the goals of thermal management are very very clear is the TJ
which is the junction temperature in order to prevent catastrophic failure it
has to be less than fail when the T spec is 125, these
are all I’ve written absolute temperature here but this is in
reference to the worst possible operating ambient. So if I’m working for
instance with the telecom datacom application it’s about 55 degrees C, if
I’m dealing with consumer electronics typically 30-35, if I’m dealing with data
centers and so forth it used to be 25 degrees now it’s in the neighborhood of 35
because they can really cool it effectively, etc. Another very important
parameter is the desired operating temperature when we go through the
design we want to be approximately at the mean time between failure again this
is for Delta if you imagine your head is at Delta T at absolute temperatures
because the reference is always to the worst possible ambient
it should be about 90 percent time up the T spec or around 115 degrees C for
most silicon if you put the power supply some power supplies you can push it up to
185, if you go to some power devices they’re upwards of 200 depending upon the device.
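The go/no-go rule described above (junction temperature at the worst-case ambient, kept under the device limit with roughly a 10% margin) is simple enough to write down. A minimal sketch follows. The 125 degrees C limit and the 10% margin come from the talk; the function names and the choice to apply the margin as a fraction of the limit are assumptions for illustration.

```python
# Sketch of the go/no-go rule from the talk: the junction temperature computed
# for the worst-case ambient must sit below the device limit with a design margin.
# Function names and the way the margin is applied are illustrative assumptions.

def design_target_c(t_spec_c: float = 125.0, margin_fraction: float = 0.10) -> float:
    """Highest acceptable junction temperature after applying the margin."""
    if not 0.0 <= margin_fraction < 1.0:
        raise ValueError("margin_fraction must be in [0, 1)")
    return t_spec_c * (1.0 - margin_fraction)


def is_design_complete(tj_worst_case_c: float,
                       t_spec_c: float = 125.0,
                       margin_fraction: float = 0.10) -> bool:
    """True when the computed worst-case junction temperature leaves the margin."""
    return tj_worst_case_c <= design_target_c(t_spec_c, margin_fraction)


if __name__ == "__main__":
    for tj in (110.0, 112.0, 120.0):
        verdict = "go" if is_design_complete(tj) else "no-go"
        print(f"Tj = {tj:.0f} C -> {verdict} (target {design_target_c():.1f} C)")
```

With the defaults, the target works out to 112.5 degrees C, which lines up with the 110 to 112 figures quoted earlier. Devices with a higher limit, such as the power supplies and power devices just mentioned, would pass a different `t_spec_c`.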
And then you also want to do performance optimization, meaning that at any
time and any location, the delta T from junction to ambient is less
than or equal to the delta T for the mean time between failures. This gives us the
optimization, because whether we like it or not,
the temperature distribution, as well as the air velocity distribution, in electronics
equipment is very temporal and very spatial.
It depends upon where you are. Even in the card-to-card space in front of a
component, if you traverse your probe from close to the edge of the bottom
board all the way to the top, you see a significant temperature gradient, to the
tune of maybe 15 to 20 degrees. The notion of a uniform slug flow and
uniform temperature is a fallacy; we don’t see it. It’s good for a pipe flow, it’s
good for a duct flow, but not when we have these 3D protrusions coming into the
channel and causing the flow to steer and do all kinds of funky things,
as you’re going to see. So what does thermal management entail? The objective, as I
mentioned, is to maintain the junction temperature below a certain limit for a
given class of devices at the worst possible ambient. So, the hierarchy of
modeling. Remember I told you to remember that system and the way the packaging
was done: we went from the component all the way to the system. Here you want to go
from the environment where the system resides. This defines the
boundary conditions that we have to deal with, like the ambient temperature.
Imagine, for instance, you have developed a security camera that goes into the
southern states of the country, and it’s going to be in a blazing sun in
July or August with a high degree of humidity. That’s going to impact heat
transfer, so that’s the environment, the boundary condition
that you have to set. Then you go to your cabinet, where the electronics are housed,
and you recalculate for, say, the sun loading in the example that I gave.
If I have a cabinet that’s sitting in the sun, then even with all the treatments that I can
do with respect to reflective paint and shielding and stuff like this, it’s
very, very difficult (never impossible, but difficult and costly)
to minimize radiation absorption. As a sort of simplistic rule of thumb, if
I’m dissipating 100 watts in my box, typically I expect to get
80 to 100 watts of sun loading in my system as a result. So all of a sudden
the boundary condition is different, and I have to be able to account
for that. Then I have to go to the board, where the components are housed,
then the component itself and its packaging, and then work my way down to the chip. This is where the junction resides.
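The sun-loading rule of thumb above changes the heat load the cabinet has to reject. A minimal sketch of that bookkeeping follows. The 0.8 to 1.0 multiplier comes from the talk’s figure of 80 to 100 W of sun loading for a box dissipating 100 W; the function name and the idea of scaling the rule to other power levels are assumptions made purely for illustration.

```python
# Sketch: total heat load on an outdoor cabinet using the talk's rule of thumb
# (about 80 to 100 W of sun loading for a box dissipating 100 W).
# Scaling the rule to other power levels is an assumption for illustration only.

def cabinet_heat_load_w(internal_power_w: float,
                        solar_fraction_low: float = 0.8,
                        solar_fraction_high: float = 1.0) -> tuple[float, float]:
    """Return (low, high) estimates of internal plus solar heat load in watts."""
    if internal_power_w < 0:
        raise ValueError("internal power must be non-negative")
    low = internal_power_w * (1.0 + solar_fraction_low)
    high = internal_power_w * (1.0 + solar_fraction_high)
    return low, high


if __name__ == "__main__":
    low, high = cabinet_heat_load_w(100.0)
    print(f"Design heat load: {low:.0f} to {high:.0f} W instead of 100 W")
```

A real solar load depends on surface area, orientation, and surface finish rather than on internal dissipation, so this only captures the point being made: the boundary condition at the cabinet can nearly double the load.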
So thermal management is just the opposite of packaging: we go from the
outside, where the boundary condition is, ultimately down to the junction temperature.
You never start here, because this is ambiguous. The other thing that we
have to remember is that the reason we are doing this is the level of coupling that takes
place. Remember, in these slides I mentioned several times, please pay
attention to the way these are coupled with each other. That’s why these arrows,
the red arrows, are showing how the system is coupled together.
Just the nature of the packaging causes it to be
like this. So for me to calculate the junction temperature of this device, I
have to start from here, work my way all the way back, and calculate the junction temperature,
because the boundary condition here, whatever is transpiring here,
is going to impact this as a result of the heat transfer in the system. So, the utility of each solution level:
when we go through the thermal modeling or thermal management, the
environment gives us the boundary conditions of pressure and temperature
around the system. The cabinet is the interface to the environment; it’s the
balance of energy. That’s where I want to put my control volume and do the balance
of energy so I can solve the problem. The board gives a boundary condition
of pressure, velocity, and temperature, and again we want to put a
control volume around it in order to be
able to solve the component level. The component is the interface to the board
and the coolant, and again this is a location for an energy balance. Once I put an
energy balance on this, I’m going to get the junction temperature. In the
subsequent webinars that we’re going to have, as we’ve shown in the past, we’re
going to show how this concept of integral modeling works very effectively
for us to develop a governing equation for the junction temperature of the
device that we’re working with. So the hierarchy is from the system, to the cage, to the board, to the component, and eventually to the die for calculation of
the junction temperature, and that’s how these levels are
utilized in order to accurately predict it. Again, accurate
prediction of the junction temperature is critical. In most entities there are a
few mechanical engineers who do thermal work, there
are a few engineers who do reliability analysis, and there’s a whole
bunch of electrical and software engineers. The data that we provide
typically goes up the food chain, meaning we come up with the junction
temperature, we say whether the system is go or no-go,
and then the reliability engineers take that information and calculate the
expected life of that system. In some of the bigger systems that are
expected to be in the market for a long time, seven or eight years, the
reliability calculation of expected life is of paramount importance, because you have
to design a system that lasts for a long time. In the past, when I was at Bell
Labs and we were designing telecom and datacom cabinets, we had to design
for fifteen to thirty years depending on the application, so it was of the highest importance to be able to calculate the junction temperature so
our reliability engineers could calculate the expected life. If
the expected life doesn’t meet the requirement of, say, 7 years, 15 years,
30 years, X years, whatever the number is, the design has to be
reconsidered. So the information that you convey could have long-lasting
consequences as far as product introduction, the cost of the product, etc., so it
cannot be taken lightly. I wish the folks who are managing
electronics companies had a better appreciation of how important
thermal is and how much cost saving there can be if we do this upfront. A lot of
this work can be done upfront when the system is in the conceptual stage, so
we don’t have to go all the way to the system layout and then realize that the
system is not going to function effectively and we have to go and redo
it all over again. So I hope we understand the hierarchy,
why it’s important to go from the environment to the chip, and the role
that every single level plays in order to come up with a junction temperature.
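The top-down order described above, where each level hands its result to the next as a boundary condition, can be illustrated with a simple chain of temperature rises. This is only a sketch of the bookkeeping, not the integral model the talk refers to. The level names follow the talk, but the function name and every temperature rise below are hypothetical placeholders.

```python
# Sketch: walking the modeling hierarchy from the environment down to the die.
# Each level's result becomes the reference temperature for the next level.
# All temperature rises are hypothetical placeholders, not data from the talk.

LEVELS = ["environment", "cabinet", "board", "component", "die"]


def junction_temperature_c(ambient_c: float, rises_k: dict[str, float]) -> float:
    """Accumulate temperature rises level by level, from the outside in."""
    temperature = ambient_c  # boundary condition set by the environment
    for level in LEVELS[1:]:
        # The temperature reached at one level is the starting point for the next.
        temperature += rises_k[level]
    return temperature


if __name__ == "__main__":
    rises = {"cabinet": 10.0, "board": 8.0, "component": 15.0, "die": 20.0}
    tj = junction_temperature_c(55.0, rises)
    print(f"Estimated Tj: {tj:.1f} C")
```

A plain sum like this ignores the coupling between levels that the talk keeps stressing, so it shows only the direction of the calculation, from ambient toward the junction.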
The more accurate I am with my description going down the ladder, the more accurate my
junction temperature calculation will be, and as a consequence the cost of the
product, the cost of the development, and so forth are going to be lower. So let’s just
take a look at every single one of these levels. We start with the component.
We have drawn it, just for the sake of the description, as more of a traditional or
old-fashioned gull-wing type of component. These are the leads, where the
circuit comes in. Typically there is some sort of packaging,
a molded package or whatever, and a chip bonded to a chip carrier. Whether it’s on
the top or the bottom, whether there is a BGA or some other attachment, doesn’t make
any difference: the heat is generated here and it goes every which way. Take this
picture as the heat that you generate here. Aside from whatever is consumed for
the functionality of the device, to pass electrons to the next level, whatever is
leaking out of your circuit goes every which way. Don’t
make an assumption that it’s not going to go out of the top, or that
in a given package nothing is going to happen.
Once that heat is leaving the device by radiation, convection, and conduction
to the board, the board, that is, the printed
wiring board with its different layers of copper, some with vias and so forth,
becomes an important and integral part of your cooling solution, and it has
to be paid very close attention. We saw a picture earlier on where heat from
the device was actually going by conduction into the board. So in my thermal
management I cannot ignore what happens in the board if I want to
provide a successful solution. So, some of the points to note: multiple heat
transfer paths (I should have put heat transfer here), a high level of
interconnection from heat source to sink, and, if I understand this
properly, I can use the board, for instance, as a very effective heat sink. I
incur the cost once, put some layers of copper in there, and reduce the temperature.
In a lot of applications, for instance, 50 to 60 percent of the heat that’s generated
in the device actually gets conducted into the board. It’s highly three-dimensional
with a large level of heat spreading, and this is also very, very
important when you come to the calculation: how well the heat is
spread and how the local components are affected. So we have to pay attention
to that. Then when we come to the board level, it’s an interesting phenomenon.
In most air cooling applications that we come across, and probably
80 percent of the market is air cooled, maybe even higher, there are these
protrusions into the channel. Typically we have one or two or three or X boards
on top of each other, the flow comes from one side to the other, and we
have this Manhattan-like topology that causes huge
havoc for us. We do a lot of water flow simulation to understand what the board
layout does and how the flow is distributed. Some of the videos that
we’ve done, and I’ll show you some pictures, are just breathtaking, because I had
never before seen the flow structures that
exist on a circuit board. So conduction coupling via the printed wiring board is
huge. If I’m pumping a lot of heat into the board,
I could be heating my next-door neighbor, the component next to me. If I’m again dissipating
a lot of heat, then depending upon the air velocity, the surface condition,
and so forth, I could also be radiating a lot of heat to my neighboring component.
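The radiation exchange mentioned above can be estimated with the Stefan-Boltzmann law for a small gray surface facing large surroundings, q = eps * sigma * (Ts^4 - Ta^4) with temperatures in kelvin. A minimal sketch follows. The emissivity, the example temperatures, and the small-surface idealization (no view factor to a specific neighboring component) are assumptions for illustration, and the function names are hypothetical.

```python
# Sketch: net radiative heat flux from a component surface to its surroundings.
# Idealized as a small gray surface in large surroundings; all values are assumed.
STEFAN_BOLTZMANN = 5.670374419e-8  # W/(m^2*K^4)


def to_kelvin(temp_c: float) -> float:
    """Convert degrees Celsius to kelvin."""
    return temp_c + 273.15


def radiative_flux_w_per_m2(surface_c: float,
                            surroundings_c: float,
                            emissivity: float = 0.9) -> float:
    """Net flux q = eps * sigma * (Ts^4 - Ta^4), with temperatures in kelvin."""
    if not 0.0 <= emissivity <= 1.0:
        raise ValueError("emissivity must be between 0 and 1")
    ts = to_kelvin(surface_c)
    ta = to_kelvin(surroundings_c)
    return emissivity * STEFAN_BOLTZMANN * (ts**4 - ta**4)


if __name__ == "__main__":
    for surface in (60.0, 85.0, 110.0):
        q = radiative_flux_w_per_m2(surface, 55.0)
        print(f"Surface {surface:.0f} C, surroundings 55 C: {q:.0f} W/m^2")
```

The flux is zero only when the two temperatures are equal, so any temperature difference between a hot component and its neighbor carries some radiative exchange, large or small.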
Typically, radiation above about two meters per second is a smaller portion, but
in natural convection, radiation heat transfer could account for up to 25 to
30 percent of the overall thermal transport that’s occurring in the device. [Audio Interruption] Convection coupling is huge. Some
of you, for instance, might have seen convection ovens; they say the chicken
cooks much faster. The reason is that you’re getting hot air wrapping around
the chicken or the roast at a higher speed, and as a result the heat
transfer is significantly larger. Well, we have the same roasting effect here. If I
have a very hot device and air is coming through here, this air gets heated,
and this low-power device that I paid no attention to, that had only one or two watts
and was not part of the consideration, is downstream
of this hot device because of the packaging, so I’m actually cooking it. We thought this device was
dissipating two watts, but now it’s effectively dissipating five watts, and either airflow
management is required or we have to come up with a heat sink or some other
cooling solution for it. At the same time, as I mentioned, I cannot ignore
radiation coupling. As mechanical engineers, not that much
emphasis was put on radiation. Everybody thinks that radiation happens
at the sun’s surface temperature. No, radiation happens as long as you have a delta T. If we
have heat transfer, all three modes of heat transfer persist at a given delta T. Now, the magnitude is obviously a function of the conditions that we’re
dealing with. The magnitude may be small, but it’s never zero. So in your layout
and set-up, radiation heat transfer could be very significant, and we have to
account for it. Here are some flow visualization
results that we’ve produced that are just mind-boggling. This is a standard PCB where the flow is coming from top to bottom. This is at 150.75 meters
per second, a digital board, and we are visualizing the flow with two-color ink.
We can see that the flow that’s introduced here is accumulating here,
and the flow that’s introduced here is coming back down into here. Well, guess
what: these devices are coupled via convection. All this air,
which is shown in blue, is accumulating and coming down here, and as it’s moving down
and then circulating slowly here, it is picking up all the heat that’s been
generated by these devices. So this device, number 20, gets cooked as a result
of the convection coupling. Look what’s happening here. This is not an
artifact of the photography: hot fluids from two
different locations are actually merging together and coming down the pike here.
So there is a lot of interaction taking place. Look at the effect of the
flow. If you were asked whether these devices are getting adequate cooling, you would
look at the board and say, oh, there’s nothing wrong with this, there is
a passage here, the flow is going to come in, and device number 18 is going
to get cooled effectively. Well, look what happened as a result of the board layout:
the flow comes in here, breaks to the left, and then goes out
here. It all follows the least-resistance path, and as a result these devices
are not getting any cooling. Number 17 or
number 18 is getting partial cooling, and 13 is not getting the cooling that’s required. Look at these optical modules: the flow is coming in and shooting out, and
as a result the power supply is getting very little to no flow. This is
the one that’s most fascinating. Had I not put the arrow here,
you would have thought the flow is coming from left to right, but in reality
the flow is coming from right to left. This is a component whose height is
twice that of this one, and there are three components in line.
Look what has happened here: this component has created three stagnation
points, and it is creating these horseshoe vortices that go around
the device. The flow that’s accumulated between the two devices is
circulating, coming back out, and going out this way.
So the heat that’s generated here is going back to this device and heating
it, and it’s creating a stagnation point here. When I was talking about,
you know, when they say 85 degree device temperature, well, where am I going
to measure this 85 degrees C? If I get a thermocouple and traverse it from
top to bottom, I see a significant temperature gradient across, and as a
result I have no idea what temperature reference the
device manufacturer designed to. That’s why we focus on junction temperature. When
we have this kind of soup that we’re dealing with, a hodgepodge of
stuff, I have to go to a point of reference that I can measure, that I can
adequately control and hopefully manage, and you can see why we are
talking about the junction temperature.
So you can see a highly coupled system,
and this is the message that we want to have you take away with
you: it is very, very important to pay attention to the coupling that takes place. So let’s look at electronics thermal transport and the
traditional problems that relate to electronics cooling. For
thermal coupling, the primary mode of coupling is convection heat transfer.
We have a system like this, with a critical device that resides
here, and the flow comes in. Remember, the air is very smart, or the fluid,
I should say, is very smart. It’s going to look at it from a mile away,
find out what the least-resistance path is, and go through it.
It’s an incompressible fluid; it’s going to identify the lowest-resistance path and
go through it. So it is upon us as engineers to make sure that
we have a system in which we understand the flow distribution; otherwise we have to
deal with this. Other non-critical components may have an adverse response
as a result of the thermal coupling that takes place. This is the biggest menace.
Part of the reason that we as thermal engineers are in business is
that these problems are so complex to solve. Otherwise, if these were like the
traditional classical problems that we saw in our heat transfer and fluid
mechanics courses, our electronics engineer brothers and sisters would have been able
to solve the problem without any issues. The combination of convection and
conduction thermal coupling may enhance component-level communication via the
backplane or card cage, and we have to be cognizant of that. So what are the steps
for successful design? Hopefully this would make sense to you as you look at
your process in your company. Establish specific thermal requirements.
Sometimes the military does that very effectively, and I know
some of the Japanese companies in the old days used to do this.
They would come back and say, for instance, that the temperature in no place of the system,
any component, any board, anything, can exceed 100 degrees C. So what this does,
as long as it’s reasonable, is give us a point of reference to compare and work
towards, so I no longer have an ambiguous 85 degree component or junction
that I can’t measure. Hopefully I can measure and manage it effectively, but
this gives us a ground zero. Employ a system-level approach and aim for two
solutions, starting with the integral model. What are we talking about? Because
of the complexity that we saw, a lot of us have a tendency, with the advent of the
very, very strong CFD tools that are out there, to rely on them alone. They’ve been verified to the
hilt, but I can never rely upon a single solution, whether it is an
integral model, experimentation, or CFD, to say my solution is complete. I have to
have two independent solutions. What I mean by independent:
let’s say there’s me and Joe. I’ll do the CFD and Joe goes and does the
experiment, or comes up with the integral model and a little calculation. We sit
across the table and I say, Joe, what’s your answer, what’s my answer?
We have to be within 15% of each other. If we’re not, the solution is not closed and
we have to start all over again. One thing we do not want to do: we do not
want to model one thing, put our data into another model, and say, well, we are
within 2%. No, that’s not an independent solution anymore; that’s a highly
dependent solution, because we took the data from one and put it into the other.
The two have to be completely independent. Perform thermal evaluation at all phases of the design
cycle to make sure that the design is meeting the requirement. Integrate different
disciplines, system, electrical, physical and reliability engineers, into your
conversation, because they are all affected by the data that we produce, as
I talked about before. So it is important to go through these four steps
in order to be able to produce and create a very effective design process.
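
The two-independent-solutions rule above can be sketched as a small check. This is only an illustration: the function name, the idea of comparing temperature rises rather than absolute temperatures, and the choice to normalize by the larger of the two values are assumptions, not something the talk specifies. Only the 15% threshold comes from the talk.

```python
def solutions_agree(value_a: float, value_b: float, tolerance: float = 0.15) -> bool:
    """Return True if two independently obtained results agree within `tolerance`.

    The relative difference is measured against the larger magnitude
    (an assumption; the talk only says "within 15% of each other").
    """
    reference = max(abs(value_a), abs(value_b))
    if reference == 0.0:
        return True  # both results are exactly zero
    return abs(value_a - value_b) / reference <= tolerance


# Hypothetical junction-to-approach-air temperature rises, in degrees C.
cfd_rise_c = 42.0          # e.g. from the CFD model
experiment_rise_c = 47.0   # e.g. from Joe's experiment or integral model

if solutions_agree(cfd_rise_c, experiment_rise_c):
    print("The two solutions close; the design can move forward.")
else:
    print("The solution is not closed; start over.")
```

Feeding the output of one model into the other and then comparing them would defeat the purpose of this check, since the two numbers would no longer be independent.
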
So let’s go through a simple exercise. The purpose is to highlight the typical
process of thermal design. The system shown below is to be cooled by
forced convection, meaning we have some fans or whatever inside the system
that’s moving the air. The vendor of the fan shelf has provided a fan curve. We want to
determine whether the system will meet the thermal requirements, meaning
junction temperature, with the following information provided: circuit power
dissipation, geometry, tentative circuit board layout, and the critical
components are identified. So we have our system, we know it’s going to go every
which way and we have to go and approach it and see whether we can solve it. So some of the
questions we want to ask. We know the junction temperature is a function of air velocity
and ambient temperature. This is a must; there’s nothing else. I
can remove the air and put in a liquid instead, but if you’re dealing with liquid
cooling it’s no different: the problem is the same and the equation is the same. So
we want to get a very clear picture and understand the problem before we
jump in there and do a CFD model or integral or go into experimental modeling.
So, system level: the system application site, indoor or outdoor; remember we talked about
the radiation coupling. System design constraints, as simple as that it must be painted
a particular color. For instance, a lot of the stuff goes outside, and because
it goes into people’s neighborhoods, they don’t want to have something
that’s bright yellow or whatever. It has to be a certain color that sort of
blends into the background. Well, every color has a certain solar
absorptivity, and the consistency of the paint that we use is very, very
important, so we have to understand it so that when we begin the design we can assess the
solar load. System environment and adjacent systems: if I’m going into a
data center, I’m going to put a system right next to
this thing that’s going to have 15 kilowatts of power.
With 15 kilowatts next to one kilowatt, I’m going to have coupling that takes place. Air filters and their
characteristics; system vents and openings, their shapes, number, color, etc.
Card-rack level: card-to-card spacing; EMI shielding, very, very important, and what its configuration is; free air passage area; shelf material and its
attachment to the frame. Can I possibly use the shelf or the card cage as a
heat sink and be able to transport the heat away?
Board level: board material. Now we have to understand what level of conductivity we
have. In subsequent webinars we’re going to show you four, sixteen, eighteen layer
boards, etc., what the thermal conductivity is and what an important role
it plays. Possibility for local metallization of the board in order to
be able to use it as a heat sink. Options for component placement and how well you can lay it out. Again, we’re going to show you that with a change in layout I can gain
a significant advantage for cooling with the exact same board and the exact same
components, but I have to get buy-in from the circuit designers,
because once you move the components around, the communication that has to
happen between the components and the distances are of paramount importance, and I
cannot randomly go and change it. Component level: package type. It’s very
important. You know, if I spend an extra five dollars on the package and get a more
conductive package, I don’t have to worry about the heat sink. If I put the
metallization in my board, I may not have to use a heat sink. So all of these things
will have cost and packaging implications. Power
dissipation fluctuations; the specific location of the critical
components; power dissipation of the components adjacent to the critical
component; spacing; thermal data, specifically RJA of the critical
component. I don’t like these resistances, but in a pinch they can potentially be
handy to give me a warm fuzzy feeling as to what I am designing and what I
have to do. Certainly for a junction temperature calculation I advocate
against it. Neighboring board component power: we talked about this, that level
of coupling. By obtaining answers to these questions we should have a good picture
of what the system is and strategize our cooling solution. So the first thing is
to calculate the air velocity. We have the fan curve, we have the system curve, the
point of intersection of the two gives me the air velocity that the system is
going to see as a result, but this is a bulk air velocity not the local velocity
at the component. So these are standard calculations that we do in
order to be able to obtain the system air velocity. Then we do some
component-level calculations. It’s really important to calculate the approach air
temperature. I call it T of A, for T ambient, but this is not the T ambient
here, it’s the T ambient here for this component. So it’s a simple change of
enthalpy across this channel, m-dot Cp delta T, T out minus T in. I can calculate TA
as a function of these two, and now I have a sense of, okay, this is my approach air
temperature. Again, this is like you put a thermocouple here; that’s what this
calculation shows. If I move the thermocouple a little bit here, a little
bit there, the number is going to change, no ifs and buts about it, but I want to
get an average temperature that’s approaching the component, so I can get a
TA. So the quick calculation is: I use the RJA, and based on the number that I have,
is my junction temperature satisfied? RJA is TJ minus T ambient divided by power. If it is not,
it is the more detailed calculations and local numerical modeling that I have to
go through. So the quick answer is RJA: TJ minus T
ambient, and again remember this is the T of A, I’m using them
interchangeably just to convey the same message, divided by power, and the
power that we’re using is the total power that’s dissipated in
the device. That’s another parameter that we have to be very careful with. When I
go through the calculations, and when I used to go through the
calculations, I always use the maximum possible power. If my system passes for the
maximum possible power that the device manufacturer specifies, then I’m safe, and
the conversation or question I have with circuit engineers would be would this
device ever see the maximum power that’s specified in the datasheet? If the answer
is yes in our design then I’m gonna use the maximum power,
if not I’m gonna use the nominal power. This is a very important parameter that
we have to be cognizant of when we go through the calculations. So the more detailed
calculation is analytical modeling. We get the heat transfer
coefficient for the backside of the device, etc. So this is the control volume
approach that I call integral modeling. We put a control volume around
the device or a system and we specify the heat transfer coefficient
or the thermal conductivity. We do an energy balance: Q-dot in plus Q-dot generated is equal to Q-dot out plus Q-dot accumulated. That’s what I do; we
start from that and we work our way down. For a computational model we have
to go through one of these CFD packages. Here we obviously have a
number of CFD packages that we use: we have Six Sigma, FloTHERM, CFdesign,
and also a package that is based on, well, it just skipped my mind right now,
but it’s a very, very accurate high-level calculation. Six Sigma is our primary
tool that we use; it’s a very fast, good tool to use in order to be able to come up with a quick
solution for these. By the way, besides Six Sigma, we
use other tools for a variety of design applications. Part of the reason that we have different
tools is the fact that there are different capabilities in different CFD
tools, and also different customers require different solution tools. And
then once the solution is done we obviously have to go through the
verification, either with experimentation or analytical modeling. So you can see it’s quite
a bit involved when you go through the process.
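
Going back to the first step of the exercise, the bulk air velocity comes from the point where the fan curve meets the system curve. Here is a minimal sketch of that intersection, assuming a tabulated fan curve with linear interpolation between points and a quadratic system curve (pressure drop equal to k times flow squared). The curve points, the system resistance `K_SYSTEM`, and the free flow area are made-up illustrative numbers, not data from the webinar.

```python
from typing import List, Tuple

Curve = List[Tuple[float, float]]  # (volumetric flow [m^3/s], static pressure [Pa])


def fan_pressure(flow: float, curve: Curve) -> float:
    """Linearly interpolate the fan static pressure at a given flow."""
    for (q0, p0), (q1, p1) in zip(curve, curve[1:]):
        if q0 <= flow <= q1:
            return p0 + (p1 - p0) * (flow - q0) / (q1 - q0)
    raise ValueError("flow is outside the tabulated fan curve")


def operating_point(curve: Curve, k_system: float, tol: float = 1e-9) -> Tuple[float, float]:
    """Find where the fan curve meets the system curve dP = k_system * Q**2.

    Bisection on residual(Q) = fan_pressure(Q) - k_system * Q**2, which is
    positive at low flow and negative at high flow for a falling fan curve.
    """
    def residual(q: float) -> float:
        return fan_pressure(q, curve) - k_system * q * q

    low, high = curve[0][0], curve[-1][0]
    if residual(low) <= 0.0 or residual(high) >= 0.0:
        raise ValueError("fan and system curves do not cross in the tabulated range")
    while high - low > tol:
        mid = 0.5 * (low + high)
        if residual(mid) > 0.0:
            low = mid   # fan still delivers more pressure than the system needs
        else:
            high = mid
    flow = 0.5 * (low + high)
    return flow, k_system * flow * flow


# Hypothetical numbers for illustration only.
FAN_CURVE: Curve = [(0.00, 100.0), (0.02, 80.0), (0.04, 40.0), (0.05, 0.0)]
K_SYSTEM = 50_000.0     # Pa / (m^3/s)^2, assumed system resistance
FREE_FLOW_AREA = 0.01   # m^2, assumed free air passage area

flow, pressure = operating_point(FAN_CURVE, K_SYSTEM)
bulk_velocity = flow / FREE_FLOW_AREA
print(f"Q = {flow:.4f} m^3/s, dP = {pressure:.1f} Pa, bulk velocity = {bulk_velocity:.2f} m/s")
```

As the talk stresses, this is a bulk velocity. The local velocity at a given component can be quite different, because the flow takes the path of least resistance.
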
Irrespective of what we did, remember I mentioned that we have to have a margin
criterion that we use for thermal design. I call it the eta factor: delta
TJ divided by delta T spec. Delta TJ is the junction minus approach air; spec is what
the manufacturer says: at this temperature the device goes kaput. And
kaput is not that the device blows up; it starts creating bit errors. A bit error
is a point of failure that we have to minimize and eliminate. So
when this ratio is less than or equal to 90%, the solution is complete. If it’s
not, we have to go back into it. Remember, at this point analysis
is required to calculate the FIT rates. We also have to be very cognizant of the
acoustic noise; acoustic noise is a condition. And also nowadays, or over the past 10
years, the environmental effect as far as the carbon footprint is concerned is
very, very important, and this is from the concept all the way to disposal. So when
the customers are paying attention, especially the big customers, not so much if
you buy an iPhone or Samsung phone or whatever, I’m talking about the people
who are spending millions of dollars to gear up for a data center, or some office building that has to put its own data center in there, they all look at the
full cost of ownership, which is no longer just buying the product and the
service; it is the product all the way down to disposal of it, which is very,
very expensive these days. So you have to consider all of those in
addition to the acoustic noise to make sure that the device is working properly.
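
The first-order chain described above (approach air temperature from an energy balance, a quick junction estimate from RJA, then the eta margin check) can be sketched as follows. This is a rough illustration under stated assumptions: constant air properties, all upstream power going into the air stream, and delta T spec taken as the manufacturer’s junction limit minus the approach air temperature. All numbers are hypothetical, and as the talk notes, RJA is only good for a quick sanity check, not for a final junction temperature.

```python
AIR_DENSITY = 1.16   # kg/m^3, assumed (air at roughly 30 C, sea level)
AIR_CP = 1007.0      # J/(kg K), assumed constant


def approach_air_temperature(t_inlet_c: float, upstream_power_w: float,
                             volumetric_flow_m3s: float) -> float:
    """Energy balance on the channel: Q = m_dot * cp * (T_A - T_inlet)."""
    m_dot = AIR_DENSITY * volumetric_flow_m3s   # mass flow rate, kg/s
    return t_inlet_c + upstream_power_w / (m_dot * AIR_CP)


def junction_temperature(t_approach_c: float, device_power_w: float,
                         r_ja_c_per_w: float) -> float:
    """Quick estimate from R_JA = (T_J - T_A) / P, rearranged for T_J."""
    return t_approach_c + device_power_w * r_ja_c_per_w


def eta_factor(t_junction_c: float, t_junction_spec_c: float,
               t_approach_c: float) -> float:
    """Margin ratio: (T_J - T_A) / (T_J,spec - T_A)."""
    return (t_junction_c - t_approach_c) / (t_junction_spec_c - t_approach_c)


# Hypothetical inputs for illustration only.
T_INLET_C = 35.0          # system inlet air temperature
UPSTREAM_POWER_W = 100.0  # power dissipated upstream of the critical device
FLOW_M3S = 0.01           # air flow through the card channel
DEVICE_POWER_W = 5.0      # maximum device power from the datasheet
R_JA_C_PER_W = 8.0        # junction-to-ambient resistance from the datasheet
T_J_SPEC_C = 105.0        # manufacturer junction temperature limit

t_a = approach_air_temperature(T_INLET_C, UPSTREAM_POWER_W, FLOW_M3S)
t_j = junction_temperature(t_a, DEVICE_POWER_W, R_JA_C_PER_W)
eta = eta_factor(t_j, T_J_SPEC_C, t_a)

print(f"T_A = {t_a:.1f} C, T_J = {t_j:.1f} C, eta = {eta:.2f}")
print("margin criterion met" if eta <= 0.90 else "go back into the design")
```
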
So at your leisure, just do this quickly to see that the m-dot Cp
calculation and the calculation of the junction and so forth can be done very,
very quickly, and you get a sense for it. So, road map to a solution, in the few more minutes that I have. How do we develop a solution for electronics cooling problems?
Obviously the first one is define the problem. You saw the pain that I
went through just to convey a message by putting up these bullets. I’m sure there are more questions we can
add to this. This is, if you will, a cookbook of sorts for you to ask questions:
what is it that we have to ask for, what are the answers, and what do we have
to do. So this also gives us the definition of the problem: what is it
that I’m dealing with, what kind of a problem is it, what are the
steps that I have to go through. And then based on this, develop a solution based
on an integral model. This is the control volume, conservation of energy, that I
mentioned. This gives you a lot of capability to do what-if
scenarios, and again, in subsequent webinars we’re going to show you, as we
have done in the past, how these are done. And we ask a very, very important,
big question: is the problem phenomenologically understood? Why are we asking
this? People have become very CFD happy, computational fluid dynamics. They come across
a problem, they model it in a CAD tool, SolidWorks, Pro/E, whatever you guys are
using, and then they pump it into one of these CFD packages and then push
a button and start doing the simulation. If the problem is not phenomenologically
understood, the CFD tool is not going to be capable of addressing this.
By the way, the name of that tool was DNS, Direct Numerical Simulation.
It just came to me; I had a little bit of a senior moment, as Josh tried to
make me as old as I am. So if the problem is phenomenologically understood we do
computational; if it’s not, we do experimental. But the core of it, the mother,
is integral modeling. So we compare the solutions and make sure that the eta
factor is satisfied. If the answer is within a specific tolerance, the solution
is complete; if not, we have to go back. So by doing your homework up front you don’t
repeat this process all over again. You take it step by step and walk
through it, and hopefully you can go through this,
like your fluid, as straight as possible and then come up with a complete
solution. So it’s a discipline of going through the problem. Whether you like it
or not, this is what we have to do. We come across a lot of clients across the
globe who come to us when they have a problem. The system is designed,
they’re asking us to come up with a cooling solution because
the product is delayed for shipment, and we notice that
they just did CFD or just did measurements, but they did nothing else,
and now the system is failing and they expect us to perform miracles. It
makes our lives significantly more difficult, but it keeps us in business. So
the roadmap criteria to satisfy the solution are as follows: the solution is
developed based on engineering principles, it is trackable, and the developed
solution can be defended with engineering reasoning. You know, when you
go through this, what does this mean? The developed solution can be
defended with engineering reasoning: for instance, as I increase the
velocity, we all know the heat transfer coefficient becomes asymptotic, so after a
certain point I’m going to only get two or three percent changes, maybe one or
two percent. So if I march my solution analytically or computationally to, say, five
meters per second, I’ve got to see that my temperature variations are on the
order of a percent. If I see that the temperature has increased, or is still
decreasing, either I have not reached the asymptotic solution or there is
something wrong with the solution. So there has to be
engineering judgment, based on engineering principles that we know. As a
matter of fact, whether you’re dealing with electronics cooling or rocket
technology, as long as you’re on this planet, the laws of physics apply. I don’t
know what happens when you go to Mars or some other galaxy, but as long as you’re
on planet Earth, the laws of physics and the laws of thermodynamics apply, and we have
to abide by them to ensure that the solution is correct. So, product design cycle and the role of the thermal model: what happens, where do we
play, and what kind of a game do we have when we insert it into the full cycle?
Obviously in most entities there is input from system, electrical, software, and physical
design, meaning mechanical engineers and thermal engineers. We are lucky to be
part of the conversation, but certainly the circuit designers and
the system designers are the ones who do it. They conceptualize the system. At this
juncture we want to do a very quick thermal analysis, I call it the
first-order analysis, to determine whether the system meets the expected thermal constraints. If it does, we go to the electrical prototype. At the second level we do a thermal analysis to verify the design, using computational
or experimental analysis in order to do it, to make sure that the system is
meeting that junction temperature requirement. I don’t have to look for
anything else; the junction temperature requirement is of utmost importance. In some
of the applications where there’s a human interface involved, the
surface temperature also becomes very important. In most applications you can’t
go over sixty degrees C. So your junction has to be maintained at a particular
level so you don’t have any electronic problems, and the surface temperature has to
be below sixty degrees because of the touch factor that you have. So
if it passes you go to the system build. Once you go through the system build,
the last test is a thermal evaluation. This is environmental stress testing
to verify functional performance, and it really has nothing
to do with thermal per se. Some thermal and mechanical engineers get
involved in this final phase of testing because this is a serious system test at
elevated temperatures in order to make sure it’s working. If it passes, we ship
the product. If it fails at any of these junctures you have to go back. Hopefully,
if you’ve done your homework and you get a failure, it is not at the system level
like this; this is cause for firing. If you’re working at a company
and you did the design and the system failed at this level I’d be
hard-pressed to see that you don’t get fired. If you’ve done your homework
properly at these levels and you’ve done the thermal verification,
this should be a piece of cake. Any issue is going to be more
on the software, on the circuit configuration and so forth; it
should not be thermal. So at every level you can see the level of calculation: first-order
calculation, second-order calculation, and system-level testing that is really
more environmental. This really puts the whole picture of the thermal
analysis into our design cycle. So with that, remember that heat kills.
Effective thermal management of electronics is the key to fail-safe
operation, irrespective of the system that you’re dealing with.
Don’t think that because I have an iPhone or Samsung phone or whatever
I don’t have to worry about heat, or that if I have a windmill that generates a lot
of airflow I’m not going to catch on fire. You are going to catch on fire. Look at
that drone. The drone catches on fire; it’s up in the air moving 5-10 km/h at least,
if not more, and still catches on fire despite the fact that it’s got
significant convection cooling taking place, because the packaging
wasn’t done right. Had the packaging been done right this would not happen. Had the heat been taken out effectively to the body of the drone, and had they known
that at a certain air velocity this thing is going to get cooled off over the
pattern, or had they designed the body in a fashion that had adequate
surface for the total heat that’s being generated when it is still, so that even
if you’re at altitude the sun loading doesn’t affect you, this would not happen.
I can go through the same explanation over and over for every
single one of these where you can see the fire taking place. Heat is a
threat. Thermal management is a key to successful design, and whoever tells you
otherwise, don’t believe them. So with that I thank you.
Remember that ATS is here to support and help you. We are delighted that you have
participated and hope you enjoyed the presentation.
