
Different results from AMD and Intel machine

February 17, 2011, 15:20  #1
Different results from AMD and Intel machine
MichaelCFD
New Member
Join Date: Feb 2011
Posts: 6
Hi,

I hope someone can help me out here: I get slightly different numerical results from an AMD machine and an Intel machine using exactly the same code (MPI parallel). For serial runs the difference is much smaller, showing up only in a few single cells around the 8th or 9th decimal digit, but it is still there.

Debugging has not turned up a problem, or at least not the root cause yet. Has anyone here had such an experience, or any ideas? Thanks.

Michael

Last edited by MichaelCFD; February 17, 2011 at 15:37.

February 17, 2011, 15:23  #2
MichaelCFD
New Member
Join Date: Feb 2011
Posts: 6
btw, the code is in C. Thanks.

February 17, 2011, 18:05  #3
Julien de Charentenay
Senior Member
Join Date: Jun 2009
Location: Australia
Posts: 231
Hi Michael,

It is not uncommon to get slightly different results on different architectures or operating systems.

For example, the following two declarations look equivalent, but may be treated slightly differently by the compiler:
double c = 0;
double c = 0.0;
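If it helps, here is a minimal C99 sketch (my own illustration, not taken from your code) that prints a double both as a hex float and as its raw 64-bit pattern, so you can check on each machine whether the two initializations really end up bit-identical:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Print a double as a hex float and as its raw 64-bit pattern. */
static void dump(const char *label, double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);  /* portable way to view the bits */
    printf("%s: %a  0x%016llx\n", label, x, (unsigned long long)bits);
}

int main(void)
{
    double c1 = 0;    /* integer literal converted to double */
    double c2 = 0.0;  /* double literal */
    dump("c1", c1);
    dump("c2", c2);
    return 0;
}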

I also assume that there is no random number generator in the code.

The main question is: do the differences significantly affect the results at convergence?

Regards, Julien
__________________
---
Julien de Charentenay

February 17, 2011, 18:22  #4
Martin Hegedus
Senior Member
Join Date: Feb 2011
Posts: 500
Is the code single precision or double precision?

Are they the same executable or did you recompile the code on each machine?

If you recompiled it, what level of optimization did you use?

What type of solver is it, structured, unstructured, implicit, explicit?

Is the solution steady or unsteady? If steady, did you converge it to machine zero? If unsteady, at what point do you see the difference build up?

You mentioned MPI. Does this mean you are using multiple machines, or just one machine with multiple cores? Also, how is your domain broken up? For example, I would expect an implicit structured chimera solver to converge differently depending on how the overall grid is partitioned and distributed among the various cores.
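To make that last point concrete, here is a small stand-alone C99 sketch (my own illustration, not your solver) showing that summing the same numbers in a different order, which is exactly what happens when a domain is split across cores and the partial sums are combined, does not have to give the same last bits:

#include <stdio.h>

int main(void)
{
    enum { N = 1000000, NPART = 4 };
    static double a[N];
    for (int i = 0; i < N; ++i)
        a[i] = 1.0 / (double)(i + 1);   /* some representative field values */

    /* single sweep, front to back (the "serial" ordering) */
    double s_serial = 0.0;
    for (int i = 0; i < N; ++i)
        s_serial += a[i];

    /* per-partition partial sums combined afterwards,
       mimicking a reduction over NPART ranks */
    double s_split = 0.0;
    for (int p = 0; p < NPART; ++p) {
        double part = 0.0;
        for (int i = p * (N / NPART); i < (p + 1) * (N / NPART); ++i)
            part += a[i];
        s_split += part;
    }

    printf("serial sum: %.17g\n", s_serial);
    printf("split  sum: %.17g\n", s_split);
    printf("difference: %g\n", s_serial - s_split);
    return 0;
}

Neither answer is wrong; they simply round differently, and that rounding noise is then amplified or damped by the solver.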

February 21, 2011, 10:31  #5
Different results using the same executable
MichaelCFD
New Member
Join Date: Feb 2011
Posts: 6
Thanks for all your responses and help.

I use the same executable, compiled on an Intel machine. There is no random number generator. I also tested outputting (printf) the following, as suggested:

double a = 0;
double a = 0.0;

The output is the same from AMD and Intel to many decimal digits.

Convergence is fine on both, and the parallel part is doing exactly the same thing. The problem is that even for serial runs there are digit differences between the two machines.

I wonder if anyone has tested a code like this. Maybe I should grab another code to test...

February 21, 2011, 12:37  #6
Martin Hegedus
Senior Member
Join Date: Feb 2011
Posts: 500
Assuming your code is double precision, both runs are doing exactly the same thing, and the case is steady state:

1) Euler should be run first for comparison, then laminar, then turbulent.
2) The results should be converged to machine zero. In general, the residual should be in the vicinity of 1.0e-15 to 1.0e-16.
3) Assuming you don't have some very fine cells, or cells with poor Jacobians, the differences in the state variables (i.e. rho, rho*v, p, etc.) between the two runs should be less than 1.0e-10, IMO (a simple comparison sketch follows below).
4) Integrated load values, such as lift and drag, can be off by much more, since they depend on integrating pressure differences.
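As a rough way to quantify item 3, something like the following C sketch could be used. It assumes two plain-text dumps of the same field, one value per line; the file names and format here are hypothetical, so adjust them to whatever your code actually writes out:

#include <stdio.h>
#include <math.h>

/* Compare two solution dumps value by value and report the largest
   absolute difference, e.g. ./fieldcmp amd_run.txt intel_run.txt */
int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s file_a file_b\n", argv[0]);
        return 1;
    }
    FILE *fa = fopen(argv[1], "r");
    FILE *fb = fopen(argv[2], "r");
    if (!fa || !fb) {
        perror("fopen");
        return 1;
    }

    double va, vb, maxdiff = 0.0;
    long n = 0;
    while (fscanf(fa, "%lf", &va) == 1 && fscanf(fb, "%lf", &vb) == 1) {
        double d = fabs(va - vb);
        if (d > maxdiff)
            maxdiff = d;
        ++n;
    }

    printf("compared %ld values, max |difference| = %g\n", n, maxdiff);
    fclose(fa);
    fclose(fb);
    return 0;
}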

February 22, 2011, 12:19  #7
I use the same executable for AMD and Intel machines
MichaelCFD
New Member
Join Date: Feb 2011
Posts: 6
Initially the difference is very small, around 1.e-14, but as the iterations go on it becomes significantly larger... Has anyone here tested their own code on different machines? Thanks.

February 22, 2011, 12:36  #8
Martin Hegedus
Senior Member
Join Date: Feb 2011
Posts: 500
Yes I have, and the answer you are looking for depends on the case you are running, the solution methodology, the grid, and what you are comparing. A simple answer does not exist.

You haven't given enough details to help address your question.

February 23, 2011, 21:51  #9
MichaelCFD
New Member
Join Date: Feb 2011
Posts: 6
So did you find any differences in the results from different machines? If so, what was the cause? My code is a typical CFD code: unsteady or steady, finite volume, ... But I do not think those factors should produce such machine-to-machine differences... Thanks.

February 23, 2011, 22:57  #10
Martin Hegedus
Senior Member
Join Date: Feb 2011
Posts: 500
For an unsteady result, once the results diverge, even by an epsilon, they will continue to diverge. So, in that case, a difference of 1e-8 in a field value isn't anything special.

However, my general experience is that field values for steady results converged to machine zero agree to within 1e-12 between AMD and Intel for solutions on high-quality grids. I take notice if the results differ by more than 1e-10; in my experience, when that is the case, there is a significant probability of a bug in the code. However, this is only true if the solution is independent of the number of cores solving the problem. For example, the solution during convergence of a steady problem with an implicit method very probably depends on the number of CPU cores solving it. This difference should diminish as the solution converges to machine zero, assuming that the right-hand side is independent of how the problem is broken up among the various cores.

But it is also important to take into account the nonlinearities of the flow being analyzed. Epsilon changes in a shock or a vortex can cause noticeable differences in other areas of the flow field.
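A toy C illustration of the first point (not your solver, of course): a chaotic nonlinear iteration started from two states that differ by only 1e-14 quickly produces visibly different trajectories.

#include <stdio.h>
#include <math.h>

int main(void)
{
    double x = 0.4;           /* baseline state */
    double y = 0.4 + 1.0e-14; /* the same state perturbed by an epsilon */

    for (int it = 1; it <= 60; ++it) {
        /* logistic map with r = 3.9, a standard chaotic toy problem */
        x = 3.9 * x * (1.0 - x);
        y = 3.9 * y * (1.0 - y);
        if (it % 10 == 0)
            printf("iteration %2d: |x - y| = %g\n", it, fabs(x - y));
    }
    return 0;
}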
