CFD Online Discussion Forums


tailele April 1, 2020 03:34

Running on 4 node error MPI Errors[320798736]
 
Good morning everyone, I have a question. Until now I have used 3 nodes with 24 cores each (Xeon v3), and everything worked well. Yesterday I bought another node, in this case a dual Xeon (v4). At the moment only one CPU is working because the second has no cooler. As always, I started my system with the new node, but as I wrote in the subject line, I get an MPI error.

The log is this:
SEVERE [star.base.neo.ClientNotifyHandler]: MPI Errors[320798736] : MPI_Waitall: Internal MPI error: Filename too long

MPI Errors[320798736] : MPI_Waitall: Internal MPI error: Filename too long

I have now run some tests.
Nodes 1, 2 and 3 are the old nodes; node 4 is the new one.

Test 1:
Nodes 1, 2, 3, 4 with only one core per node -- test passed
Nodes 1, 2, 3, 4 with 16 cores per node -- test passed
Nodes 1, 2, 4 with 24 cores each on nodes 1 and 2 and 16 cores on node 4 -- test passed
Nodes 1, 2, 3 with 24 cores each and node 4 with 16 cores -- test failed
Nodes 1, 2, 3 with 24 cores each and node 4 with 1 core -- test failed

Does anyone have an idea?

bluebase April 1, 2020 11:49

Is your cluster running a Windows system by chance?

tailele April 1, 2020 12:14

Quote:

Originally Posted by bluebase (Post 763847)
Is your cluster running a Windows system by chance?

Yes, Windows 10....

tailele April 1, 2020 13:03

I changed the MPI from IBM to Intel. The error is still present, but it is no longer the same:

SEVERE [star.base.neo.ClientNotifyHandler]: Connection reset
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:161)
at java.io.BufferedReader.read(BufferedReader.java:182)
at star.base.neo.NeoProperty.readNextChar(NeoProperty.java:804)
at star.base.neo.NeoProperty.input(NeoProperty.java:583)
[catch] at star.base.neo.ClientNotifyHandler.run(ClientNotifyHandler.java:427)
at java.lang.Thread.run(Thread.java:748)
WARNING [org.netbeans.core.multiview.MultiViewTopComponent]: The MultiviewDescription instance class star.coremodule.ui.SimulationMultiviewDesc is not serializable. Cannot persist TopComponent.
WARNING [null]: Last record repeated again.

tailele April 2, 2020 04:59

I have now added the second CPU, so my setup is 4 nodes with two CPUs per node:
Node 1: 2x E5-2658 v3 (24 cores)
Node 2: 2x E5-2658 v3 (24 cores)
Node 3: 2x E5-2658 v3 (24 cores)
Node 4: 2x E5-2683 v4 (32 cores)

Today I started a new session with two tests:

- 1: 4 nodes with 24 cores per node --- passed
- 2: 3 nodes with 24 cores per node and node 4 with 32 cores --- failed

Is it possible that all nodes must use the same number of cores?
But then how was I able, with 3 nodes, to use two nodes with 24 cores and one with 16 cores?
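
To make the pattern in the reported runs easier to see, here is a minimal Python sketch that tabulates the seven runs posted in this thread and checks the "all nodes must use the same core count" idea against them. The names RUNS, is_uniform, summarize and contradictions are made up for illustration. The data is only what was posted above, and the last two runs were made after the MPI library was changed and the second CPU was installed, so the comparison is indicative only.

```python
# Runs reported in this thread: {node number: cores used}, and whether the run passed.
RUNS = [
    ({1: 1, 2: 1, 3: 1, 4: 1}, True),
    ({1: 16, 2: 16, 3: 16, 4: 16}, True),
    ({1: 24, 2: 24, 4: 16}, True),
    ({1: 24, 2: 24, 3: 24, 4: 16}, False),
    ({1: 24, 2: 24, 3: 24, 4: 1}, False),
    ({1: 24, 2: 24, 3: 24, 4: 24}, True),   # second session
    ({1: 24, 2: 24, 3: 24, 4: 32}, False),  # second session
]


def is_uniform(cores):
    """True if every node in the run uses the same number of cores."""
    return len(set(cores.values())) == 1


def summarize(runs):
    """One row per run: nodes used, total process count, uniformity, outcome."""
    return [
        {
            "nodes": sorted(cores),
            "total": sum(cores.values()),
            "uniform": is_uniform(cores),
            "passed": passed,
        }
        for cores, passed in runs
    ]


def contradictions(runs):
    """1-based indices of runs where 'uniform core count' does not match 'passed'."""
    return [
        i
        for i, (cores, passed) in enumerate(runs, start=1)
        if is_uniform(cores) != passed
    ]


if __name__ == "__main__":
    for row in summarize(RUNS):
        print(row)
    print("Runs not explained by the equal-core-count idea:", contradictions(RUNS))
```

On the posted data this flags only run 3 (nodes 1, 2 and 4 with 24/24/16 cores), which passed with unequal core counts, so equal counts alone do not explain the failures.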


All times are GMT -4.