Regarding error While Running simulation using Multicore in HPCE

#1 | October 7, 2023, 07:36 | Bhargav (Varada), Member
Hello REEF3D team,

I am currently using REEF3D-23.03/REEF3D-release_candidate/ and working on a wave-structure interaction problem in a three-dimensional numerical wave tank. On my personal computer, the simulation runs on 12 cores without any issues. However, when I run it on the High-Performance Computing Environment (HPCE), I get an error message, which I've provided below.

I also ran the simulation on HPCE with 8 cores, and it proceeded without any errors. I have attached the relevant files for your review. Could you please examine them and help me identify the problem? I have also tried the latest release candidate (RC), but I still face the same issue when using a higher number of cores.

Thank you for your assistance.

Attached Files
File Type: txt control(1).txt (269 Bytes, 1 views)
File Type: txt ctrl(1).txt (1.5 KB, 1 views)

#2 | October 7, 2023, 11:27 | Bhargav (Varada), Member
Please review the error I mentioned earlier.

[cn249:17924] *** Process received signal ***
[cn249:17924] Signal: Segmentation fault (11)
[cn249:17924] Signal code: Address not mapped (1)
[cn249:17924] Failing at address: 0x357e6
[cn249:17924] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x2b16af07b5d0]
[cn249:17924] [ 1] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x8ec000]
[cn249:17924] [ 2] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x53f81f]
[cn249:17924] [ 3] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x5982c3]
[cn249:17924] [ 4] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x597ef5]
[cn249:17924] [ 5] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x85ce84]
[cn249:17924] [ 6] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x5b3abf]
[cn249:17924] [ 7] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x5b4711]
[cn249:17924] [ 8] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x41c2dd]
[cn249:17924] [ 9] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2b16af2aa3d5]
[cn249:17924] [10] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x41f55f]
[cn249:17924] *** End of error message ***
[cn249:17933] *** Process received signal ***
[cn249:17933] Signal: Segmentation fault (11)
[cn249:17933] Signal code: (128)
[cn249:17933] Failing at address: (nil)
[cn249:17933] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x2ab5b6f2e5d0]
[cn249:17933] [ 1] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x8ec0ec]
[cn249:17933] [ 2] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x53f691]
[cn249:17933] [ 3] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x5982c3]
[cn249:17933] [ 4] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x597ef5]
[cn249:17933] [ 5] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x85ce84]
[cn249:17933] [ 6] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x5b3abf]
[cn249:17933] [ 7] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x5b4711]
[cn249:17933] [ 8] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x41c2dd]
[cn249:17933] [ 9] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2ab5b715d3d5]
[cn249:17933] [10] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x41f55f]
[cn249:17933] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 54 with PID 0 on node cn249 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
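
One thing that stands out in the two backtraces is that the crashing ranks share their outer frames inside the REEF3D binary and differ only in the innermost ones. The sketch below is a minimal, unofficial way to pull those addresses out of a log like the one above. The helper names (`frames_by_process`, `binary_addresses`, `shared_addresses`) are made up for this illustration, and the sample holds only the first four frames of each process, copied from the log. If the binary was built with debug symbols (an assumption, not something the thread confirms), the printed addresses could then be passed to a symbolizer such as `addr2line -e <path to REEF3D binary>`.

```python
import re
from collections import defaultdict

# Matches Open MPI backtrace frames such as
# "[cn249:17924] [ 1] /path/to/REEF3D[0x8ec000]"
FRAME_RE = re.compile(
    r"^\[(?P<host>[^:\]]+):(?P<pid>\d+)\] \[\s*(?P<idx>\d+)\] "
    r"(?P<module>\S+?)(?:\([^)]*\))?\[(?P<addr>0x[0-9a-f]+)\]$"
)

# First four frames of each crashing process, copied from the log above.
SAMPLE_LOG = """
[cn249:17924] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x2b16af07b5d0]
[cn249:17924] [ 1] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x8ec000]
[cn249:17924] [ 2] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x53f81f]
[cn249:17924] [ 3] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x5982c3]
[cn249:17933] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x2ab5b6f2e5d0]
[cn249:17933] [ 1] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x8ec0ec]
[cn249:17933] [ 2] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x53f691]
[cn249:17933] [ 3] /lfs/usrhome/phd/oe19d201/REEF3D-code/REEF3D-master-23.08/bin/REEF3D[0x5982c3]
"""


def frames_by_process(log_text):
    """Group (frame index, module path, address) tuples by process ID."""
    frames = defaultdict(list)
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames[int(match["pid"])].append(
                (int(match["idx"]), match["module"], match["addr"])
            )
    return dict(frames)


def binary_addresses(frames, binary_name):
    """Keep only the addresses of frames whose module path ends in binary_name."""
    return {
        pid: [addr for _, module, addr in entries if module.endswith(binary_name)]
        for pid, entries in frames.items()
    }


def shared_addresses(by_pid):
    """Addresses that appear in every process, in the order of the first one."""
    lists = list(by_pid.values())
    if not lists:
        return []
    common = set(lists[0]).intersection(*lists[1:])
    return [addr for addr in lists[0] if addr in common]


by_pid = binary_addresses(frames_by_process(SAMPLE_LOG), "/bin/REEF3D")
shared = shared_addresses(by_pid)

for pid, addrs in by_pid.items():
    print(pid, " ".join(addrs))
print("shared:", " ".join(shared))
```

Frames that every crashing rank has in common point at the call path leading to the fault, while the differing innermost frames show where each rank actually failed.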

#3 | October 8, 2023, 03:59 | Arun Kamath (kamath), Senior Member, Trondheim, Norway
My guess is that this is a memory allocation problem on the HPC. Check how much total RAM is allocated to your 80 cores on the HPC; the simulation probably needs more than that.
Are the 8 cores and the 80 cores allocated on the same node, or do you get more nodes when you increase the number of cores to 80?
Memory is generally attached to the nodes, and each node typically has maybe 16 or 32 cores, depending on the machine.

But you say you are using the RC, while the path shows master. Are you sure about the version you are using?

Also, you don't need M 20 2 in ctrl.txt.
__________________
Arun
X years with REEF3D

#4 | October 9, 2023, 07:47 | Keshav Pathak (keshav_20), New Member
We faced the same problem with both the REEF3D RC and master versions. The job runs on 2 nodes, each with 40 cores and 192 GB of RAM.
Number of nodes = 2, number of cores = 40 x 2 = 80, total RAM = 192 x 2 = 384 GB
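
To relate these figures to the question about allocated RAM, the arithmetic can be sketched in a few lines of Python. This is only a rough budget. It assumes the 192 GB is per node (as the 384 GB total implies) and that memory is shared evenly among the cores, which the scheduler on a given cluster need not guarantee. The function name `memory_per_core_gb` is made up for this illustration.

```python
def memory_per_core_gb(nodes, cores_per_node, ram_per_node_gb):
    """Average RAM per core, assuming an even split across all cores."""
    total_cores = nodes * cores_per_node
    total_ram_gb = nodes * ram_per_node_gb
    return total_ram_gb / total_cores


# Figures quoted in the post above.
nodes = 2
cores_per_node = 40
ram_per_node_gb = 192

total_cores = nodes * cores_per_node      # 2 x 40
total_ram_gb = nodes * ram_per_node_gb    # 2 x 192
per_core_gb = memory_per_core_gb(nodes, cores_per_node, ram_per_node_gb)

print(f"cores: {total_cores}, total RAM: {total_ram_gb} GB, "
      f"average per core: {per_core_gb:.1f} GB")
```

The per-core average is only an upper bound on what each MPI rank can use; the limit actually enforced by the job scheduler may be lower and is worth checking in the job script or the cluster documentation.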
