Different results from AMD and Intel machines
Hi,
I hope someone can help me out here: I get slightly different numerical results on an AMD machine and an Intel machine using exactly the same code (MPI parallel). In serial runs the difference is much smaller, showing up only in a few cells at about the 8th or 9th decimal digit, but it is still there. Debugging has not turned up any problem or revealed the root cause yet. Has anyone here had a similar experience, or any ideas? Thanks. Michael
btw, the code is in C. Thanks.

Hi Michael,
It is not uncommon to get results that are not exactly the same on different architectures/OS. For example, the following look identical, but may be treated slightly differently: double c = 0; double c = 0.d0; I also assume that there is no random number generator in the code. The main question is: Do the differences affect the results at convergence significantly? Regards, Julien
Is the code single precision or double precision?
Are they the same executable or did you recompile the code on each machine? If you recompiled it, what level of optimization did you use? What type of solver is it, structured, unstructured, implicit, explicit? Is the solution steady or unsteady? If steady, did you converge it to machine zero? If unsteady, at what point do you see the difference build up? You mentioned MPI. Does this mean you are using multiple machines or just one machine with multiple cores? Also, how is your domain broken up? For example, I would expect an implicit structured chimera solver to converge differently depending on how the overall grid is parsed and passed out among the various cores. 
different using the same executable
Thanks for all your responses and help.
I use the same executable, compiled on an Intel machine. No random numbers. I also tested printing (printf) the following, as suggested: double a=0; double a=0.d0; The output is the same on AMD and Intel to many decimal digits. Convergence is fine on both, and the parallel part does exactly the same thing. The problem is that even in serial runs some digits differ between the two machines. I wonder whether anyone has tested a code like this. Maybe I should grab another code to test...
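When comparing printf output between two machines, a rounded decimal print can hide a difference in the last bit. A minimal sketch of a bit-exact dump follows; the helper names `double_bits` and `dump_double` are made up for illustration and are not from the code discussed here. It prints 17 significant digits (enough to round-trip a double), the C99 hex-float form, and the raw bit pattern.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the raw 64-bit pattern of a double. */
static uint64_t double_bits(double x)
{
    uint64_t u;
    assert(sizeof u == sizeof x);  /* assumes a 64-bit double */
    memcpy(&u, &x, sizeof u);      /* well-defined way to reinterpret the bytes */
    return u;
}

/* Hypothetical helper: print one labelled value in three forms:
   17 significant digits, C99 hex float, and raw bits. */
static void dump_double(FILE *out, const char *label, double x)
{
    fprintf(out, "%-12s %.17g  %a  0x%016" PRIx64 "\n",
            label, x, x, double_bits(x));
}
```

Calling something like `dump_double(stdout, "rho[i]", rho[i])` for a few cells on each machine and diffing the two logs shows the first iteration and cell where the bit patterns part ways (`rho` is a placeholder name here).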
Assuming your code is double precision, both codes are doing exactly the same thing, and the case is steady state:
1) Euler should be run first for comparison, then laminar, then turbulent.
2) The results should be converged to machine zero. In general, the residual should be in the vicinity of 1.0e-15 to 1.0e-16.
3) Assuming you don't have some very fine cells, or cells with poor Jacobians, the differences in state variables (i.e. rho, rhov, p, etc.) between the two codes should be less than 1.0e-10, IMO.
4) Integrated load values, such as lift and drag, can be off by much more since they depend on integrating pressure differences.
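The field comparison in point 3 could be sketched roughly as below. This assumes both runs have written the same state variable into plain arrays of equal length; the function name `max_abs_diff` and the array layout are illustrative assumptions, not part of anyone's solver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest absolute difference between two fields of
   length n. If `where` is not NULL it receives the index of that cell. */
static double max_abs_diff(const double *a, const double *b, size_t n,
                           size_t *where)
{
    assert(a != NULL && b != NULL && n > 0);
    double worst = 0.0;
    size_t idx = 0;
    for (size_t i = 0; i < n; ++i) {
        double d = a[i] - b[i];
        if (d < 0.0)
            d = -d;              /* absolute value without needing libm */
        if (d > worst) {
            worst = d;
            idx = i;
        }
    }
    if (where != NULL)
        *where = idx;
    return worst;
}
```

The returned value can then be checked against a tolerance such as 1.0e-10, and the index tells you which cell to look at first (for example, whether it sits in a very fine or badly shaped cell).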
I use the same executable for AMD and Intel machines
Initially the difference is very small, like 1.e-14, but as the iterations go on, it becomes significantly larger... Has anyone here tested their own code on different machines? Thanks.

Yes I have, and the answer you are looking for depends on the case you are running, the solution methodology, the grid, and what you are comparing. A simple answer does not exist.
You haven't given enough details to help address your question. 
So did you find any difference in results between different machines? If so, what was the cause? My code is a typical CFD code: unsteady or steady, FV, ... But I do not think those factors should cause such a machine difference... Thanks.

For an unsteady result, once the results diverge, even by an epsilon, they will continue to diverge. So, in that case, a difference of 1e-8 in a field value isn't anything special.
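To see how quickly an epsilon can grow in a nonlinear time-stepping loop, here is a toy sketch. It uses the logistic map as a stand-in for an unsteady solver, which is purely an assumption for illustration and has nothing to do with any particular CFD scheme; `step` and `steps_until_apart` are made-up names.

```c
#include <assert.h>

/* Toy stand-in for one time step of a nonlinear solver (logistic map). */
static double step(double x)
{
    return 4.0 * x * (1.0 - x);
}

/* Hypothetical helper: advance two runs that start `eps` apart and return
   the first step at which they differ by more than `tol`, or -1 if that
   does not happen within `max_steps`. */
static int steps_until_apart(double x0, double eps, double tol, int max_steps)
{
    assert(x0 > 0.0 && x0 < 1.0 && max_steps > 0);
    double a = x0;
    double b = x0 + eps;
    for (int n = 1; n <= max_steps; ++n) {
        a = step(a);
        b = step(b);
        double d = a - b;
        if (d < 0.0)
            d = -d;
        if (d > tol)
            return n;
    }
    return -1;
}
```

For instance, `steps_until_apart(0.3, 1e-15, 1e-8, 200)` reports how many steps it takes for a 1e-15 perturbation to exceed 1e-8, while an `eps` of zero never separates and returns -1.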
However, my general experience is that field values for steady results converged to machine zero are within 1e-12 between AMD and Intel for solutions on high quality grids. I take notice if the results differ by more than 1e-10. My experience is that, in general, when this is the case, the probability is significant that there is a bug in the code. However, this is only true if the solution is independent of the number of cores solving the problem. For example, the probability is high that the solution during convergence for a steady problem from an implicit method is dependent on the number of CPU cores solving the problem. This difference should diminish as the solution converges to machine zero, assuming that the right hand side is independent of how the problem is broken up for the various cores. But it is also important to take into account nonlinearities of the flow being analyzed. Epsilon changes in a shock or a vortex can cause noticeable differences in other areas of the flow field.
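One reason a result can depend on how the problem is split across cores is that floating-point addition is not associative, so a sum formed from per-core partial sums can round differently from a single serial sum. A minimal sketch, with no MPI calls at all: contiguous chunks stand in for the ranks, and `chunked_sum` is a made-up name for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sum v[0..n-1] by splitting it into `nchunks`
   contiguous pieces, summing each piece, then adding the partial sums in
   order. nchunks == 1 is the plain serial sum. */
static double chunked_sum(const double *v, size_t n, size_t nchunks)
{
    assert(v != NULL && nchunks > 0 && nchunks <= n);
    double total = 0.0;
    for (size_t c = 0; c < nchunks; ++c) {
        size_t begin = c * n / nchunks;
        size_t end = (c + 1) * n / nchunks;
        double partial = 0.0;        /* what one "rank" would compute */
        for (size_t i = begin; i < end; ++i)
            partial += v[i];
        total += partial;            /* stand-in for the reduction step */
    }
    return total;
}
```

With the values {1e20, 1.0, -1e20, 1.0}, whose exact sum is 2, one chunk gives 1.0 and two chunks give 0.0 in IEEE double arithmetic (assuming the compiler is not allowed to reorder the additions, e.g. no fast-math flags). Real fields are far less extreme, but the same effect is behind last-digit differences between decompositions.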