CFD Online Discussion Forums


ltl March 8, 2021 06:01

ERROR: Safe Malloc has detected a null pointer
 
Hello everyone:

I'm running into the following error:

"ERROR: Safe Malloc has detected a null pointer, pointer_size is 56327176
Please ensure that the system has enough memory to run this program. "

But the system has enough memory, so what's the problem?

Thanks for reading. I would appreciate any response!

jetcheve March 8, 2021 13:27

Good afternoon.

I would still expect this to be a memory issue. Can you post more from the end of the log file? Have you checked your memory_usage.out file? It may or may not show what's going on with memory around the time of your crash. Can you tell me how much memory you have? Was anything happening around the time of the crash (e.g., embedding or AMR turning on, valves or ports opening, combustion)? Do you have anything else running in the background?
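For a quick look at what a Linux node itself reports, a minimal Python sketch is below. It is not part of CONVERGE. It assumes the node exposes /proc/meminfo and that pointer_size in the error message is a byte count, which the message does not state. The function names are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            # Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
            info[name.strip()] = int(parts[0])
    return info


def bytes_to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / (1024 ** 2)


if __name__ == "__main__":
    # Assumption: pointer_size from the error message is in bytes.
    requested = 56327176
    print(f"failed request: {bytes_to_mib(requested):.1f} MiB")

    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
    except OSError:
        info = {}  # not a Linux node, or /proc is unavailable
    for key in ("MemTotal", "MemAvailable"):
        if key in info:
            print(f"{key}: {info[key] / 1024:.0f} MiB")
```

If the byte assumption holds, the failed request is only about 54 MiB, so a failure at that size would suggest the node (or the process limit) was already nearly exhausted when the allocation was attempted.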

Thanks!

- John

ltl March 9, 2021 01:38

Hello John:

Thanks for your reply.

1. The end of the log file is shown at the bottom of this post.
2. I have checked the mem_usage.out file, but I didn't find any useful information.
3. I don't know the exact amount of memory, but it should be enough to support the subsequent calculations.
4. The embedding mode is PERMANENT; nothing else was occurring.
5. The calculations were run on a large cluster, so many other nodes are computing different tasks.

Message from the end of the log file:

JOB ABORT invoked by rank 0:
ERROR: Safe Malloc has detected a null pointer, pointer_size is 56327176
Please ensure that the system has enough memory
to run this program.
...
...
...(some messages like before)
[cli_3]: [cli_10]: [cli_17]: [cli_23]: [cli_26]: [cli_0]: [cli_2]: [cli_4]: [cli_5]: [cli_6]: [cli_7]: [cli_8]: [cli_9]: [cli_11]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 11
[cli_12]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 12
...
...
...(some messages like before)
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 13
yhrun: error: cn395: tasks 0-6,10-14,17-18,25,27: Exited with exit code 1
yhrun: error: cn395: tasks 8-9,15-16,19-24,26: Exited with exit code 1
yhrun: error: cn395: task 7: Exited with exit code 1

jetcheve March 9, 2021 08:09

Good morning -

Can you tell me more about your simulation? What version of CONVERGE are you using? What are you simulating? Are you using AMR as well as embedding? What is the cell count? When does this error occur: upon startup, near an event, when a file is being written, etc.? In addition, I'm just referring to the nodes you are running on when asking about background processes. If there is something using significant additional memory, that could be a problem.

Also, the last line of the memory usage file should give the total memory in use and the amount used by each processor just before the crash. Can you check for that and see if anything stands out? High total memory or a spike in memory usage on one of the ranks, possibly? How many cores are you using?
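As a rough illustration of what "a spike on one of the ranks" could mean in practice, here is a minimal Python sketch. The column layout of the memory usage file is not shown in this thread, so the sketch simply takes the per-rank values as a plain list. The function name, the factor of 2, and the sample numbers are all hypothetical.

```python
from statistics import median


def find_memory_spikes(per_rank_mb, factor=2.0):
    """Return (rank, value) pairs whose memory exceeds factor * median."""
    if not per_rank_mb:
        return []
    typical = median(per_rank_mb)
    return [(rank, mb) for rank, mb in enumerate(per_rank_mb)
            if mb > factor * typical]


# Hypothetical per-rank values in MB, not taken from the thread.
usage = [810.0, 795.5, 802.3, 2950.0, 799.1]
print("total MB:", sum(usage))
print("spikes:", find_memory_spikes(usage))  # flags rank 3 in this made-up data
```

Comparing against the median rather than the mean keeps a single very large rank from hiding itself by inflating the average.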

Finally, it would be good to know how much memory is available on each node.
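The yhrun lines in the posted log already hint at the rank count on one node. The Python sketch below parses those three lines and counts the distinct task IDs on cn395. The function names are made up, and the node memory figure is a placeholder, since the real value was not given in this thread. The log excerpt is elided in places, so other nodes may also have been in use.

```python
import re


def expand_task_list(spec):
    """Expand a task list such as '0-6,10-14,25' into a sorted list of ints."""
    tasks = []
    for part in spec.split(","):
        lo, sep, hi = part.strip().partition("-")
        tasks.extend(range(int(lo), int(hi if sep else lo) + 1))
    return sorted(tasks)


def tasks_per_node(log_lines):
    """Map node name -> set of task IDs found in 'yhrun: error:' lines."""
    pattern = re.compile(r"yhrun: error: (\S+): tasks? ([\d,\-]+):")
    nodes = {}
    for line in log_lines:
        m = pattern.search(line)
        if m:
            ids = expand_task_list(m.group(2))
            nodes.setdefault(m.group(1), set()).update(ids)
    return nodes


# The three yhrun lines copied from the log posted earlier in the thread.
log = [
    "yhrun: error: cn395: tasks 0-6,10-14,17-18,25,27: Exited with exit code 1",
    "yhrun: error: cn395: tasks 8-9,15-16,19-24,26: Exited with exit code 1",
    "yhrun: error: cn395: task 7: Exited with exit code 1",
]
ranks = tasks_per_node(log)

# Placeholder only: the actual memory per node was never stated in the thread.
node_mem_gib = 64
for node, ids in ranks.items():
    print(f"{node}: {len(ids)} ranks, "
          f"about {node_mem_gib / len(ids):.2f} GiB per rank if the node had {node_mem_gib} GiB")
```

For the posted excerpt this counts 28 ranks on cn395, so whatever memory that node has is shared 28 ways before any other jobs on it are considered.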

Thanks,

- John

ltl April 6, 2021 20:44

Dear John,

Sorry for the late reply.

This problem has been solved. It was mainly due to poor mesh quality caused by a certain part of the cylinder. After that, a "Segmentation fault" error occurred, which was resolved by changing the "Tolerance" to a suitable value.

Thank you for your previous help.

-LTL

jetcheve April 7, 2021 14:00

Thanks for letting me know! Sorry I wasn't of more help on this one.

- John


All times are GMT -4.