CFD Online Discussion Forums

JAsia April 5, 2021 18:30

FSI System Coupling HPC Cluster (SLURM)
 
I have been trying to get an FSI simulation running on our university's HPC cluster. I have the coupling.sci file, the Mechanical and Fluent data, the .py script, and the submission script. I have searched through the ANSYS forums but can't find any solution that works. The ANSYS forums are also down this week.

Here is the error file I get:

error file
Code:

Traceback (most recent call last):
  File "/datapool/usr/local/src/ansys/ansys-20.2/ansys_inc/v202/SystemCoupling/PyLib/main/Controller.py", line 143, in <module>
    _run(sys.argv)
  File "/datapool/usr/local/src/ansys/ansys-20.2/ansys_inc/v202/SystemCoupling/PyLib/main/Controller.py", line 139, in _run
    _executeScript(options)
  File "/datapool/usr/local/src/ansys/ansys-20.2/ansys_inc/v202/SystemCoupling/PyLib/main/Controller.py", line 88, in _executeScript
    kernel.commands.readScriptFile(scriptFile)
  File "PyLib/kernel/commands/__init__.py", line 31, in readScriptFile
  File "PyLib/kernel/commands/CommandManager.py", line 168, in readScriptFile
  File "run.in", line 19, in <module>
    ('PARTICIPANT-2', 1.0)])
  File "PyLib/kernel/commands/CommandDefinition.py", line 74, in func
  File "PyLib/kernel/commands/__init__.py", line 28, in executeCommand
  File "PyLib/kernel/commands/CommandManager.py", line 121, in executeCommand
  File "PyLib/cosimulation/externalinterface/core/partitioning.py", line 77, in execute
  File "PyLib/kernel/framework/ApplicationContext.py", line 30, in __getattr__
  File "PyLib/kernel/framework/ApplicationContext.py", line 50, in __madeComponent
  File "PyLib/kernel/framework/ApplicationContext.py", line 59, in <lambda>
  File "PyLib/kernel/framework/Annotate.py", line 50, in injected
  File "PyLib/cosimulation/partitioning/__init__.py", line 33, in __init__
  File "PyLib/cosimulation/partitioning/machinelist.py", line 66, in loadMachines
  File "PyLib/cosimulation/partitioning/machinelist.py", line 252, in _constructMachineListSLURM
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'



Here is the submission script:

batch script
Code:

#SBATCH -J "FSI"                              # Job name
#SBATCH -n 16                                      # Number of cores to use
#SBATCH -N 1                                        # Number of compute nodes to use
#SBATCH -e "%J.err"                                # SLURM error file name
#SBATCH -o "%J.out"                                # SLURM output file name
###SBATCH -t 0                                      # Time limit (0 indicates no time limit)
#SBATCH -p talon-part1                              # Talon partition to use for this job
#SBATCH --mem=120GB

RUN_FILE="run.in"                      # Ansys definition file

module load mech/ansys-20.2test
module load system/openmpi-4.0.4
module load system/mpich-3.4a2
###module load system/perl-5.22.1

#START_METHOD="IBM MPI Distributed Parallel"
START_METHOD="IBM MPI Local Parallel"

#function join_by { local IFS="$1"; shift; echo "$*"; }
function join_by { local d=$1; shift; echo -n "$1"; shift; printf "%s" "${@/#/$d}"; }

HOSTS=`scontrol show hostname $SLURM_JOB_NODELIST`
HOSTLIST="$(join_by "*$SLURM_CPUS_ON_NODE," $HOSTS)*$SLURM_CPUS_ON_NODE"

systemcoupling -R "$RUN_FILE" > FSI.out
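
For readers unfamiliar with the join_by idiom, here is a minimal Python sketch of the string that the HOSTLIST line above appears to build for a non-empty host list: each host name followed by `*` and a core count, joined by commas. The function name `build_hostlist` and the sample node names are hypothetical. Note that, as posted, the script computes HOSTLIST and START_METHOD but never passes either to systemcoupling.

```python
def build_hostlist(hosts, cpus_per_node):
    """Join host names as 'host*N,host*N', like the shell join_by call."""
    return ",".join(f"{host}*{cpus_per_node}" for host in hosts)


if __name__ == "__main__":
    # Hypothetical node names; in the job they would come from
    # `scontrol show hostname $SLURM_JOB_NODELIST`.
    print(build_hostlist(["node01", "node02"], 16))
```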



and here is the .py script:

py script
Code:

ImportSystemCouplingInputFile(FilePath = 'coupling.sci')

execCon = DatamodelRoot().CouplingParticipant

execCon['Solution'].ExecutionControl.InitialInput = 'OscPlate.cas'

execCon['Solution'].ExecutionControl.WorkingDirectory = 'Fluent'

execCon['Solution 1'].ExecutionControl.InitialInput = 'mechanical.dat'

execCon['Solution 1'].ExecutionControl.WorkingDirectory = 'Mechanical'

execCon['Solution'].ExecutionControl.PrintState()

execCon['Solution 1'].ExecutionControl.PrintState()

PartitionParticipants(AlgorithmName = "SharedAllocateMachines",
        NamesAndFractions = [('PARTICIPANT-1', 1.0),
                            ('PARTICIPANT-2', 1.0)])

PrintSetup()

Solve()



I have tried to follow the tutorials and the user manual for System Coupling, but I am at a loss as to what else to try.

AtoHM April 6, 2021 01:33

No experience with that kind of simulation on a cluster, but I would go the obvious way and see what that error message is about.
It's clearly a Python error: int() expects a string or a number to convert to an integer and gets None (an object of type 'NoneType') instead. There are many ways to end up with None, so the message alone gives no obvious hint. Look at what happens in the lines before line 252 in _constructMachineListSLURM; there is probably a failed lookup in there. Does it fail to get the machine ID? Something like that?

But I assume it is not your job to fix queue scripts. Get help from your system administrators.
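
The failure mode described above can be reproduced in a few lines. This is a minimal sketch, not the System Coupling code: it assumes the value being converted comes from a SLURM environment variable that is unset in the job, which the traceback does not confirm. The variable names listed are standard SLURM ones, but which of them _constructMachineListSLURM actually reads is unknown. The helper names `read_int` and `report` are hypothetical.

```python
import os

# Standard SLURM job variables. Which one System Coupling reads is an assumption.
CANDIDATES = [
    "SLURM_JOB_NODELIST",
    "SLURM_NTASKS",
    "SLURM_NTASKS_PER_NODE",
    "SLURM_TASKS_PER_NODE",
    "SLURM_CPUS_ON_NODE",
    "SLURM_JOB_CPUS_PER_NODE",
]


def read_int(name, env=None):
    """Convert an environment variable to int, naming the variable if it is missing."""
    env = os.environ if env is None else env
    value = env.get(name)  # returns None when the variable is unset
    if value is None:
        # A bare int(value) here would raise the TypeError seen in the error file.
        raise KeyError(f"{name} is not set in this job environment")
    return int(value)


def report(env=None):
    """Map each candidate variable to its value, or None if it is unset."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in CANDIDATES}


if __name__ == "__main__":
    for name, value in report().items():
        print(f"{name} = {value!r}")
```

Running this inside the batch job shows which variables are missing. One candidate worth checking: the SLURM documentation describes SLURM_NTASKS_PER_NODE as set only when --ntasks-per-node is requested, and the posted script uses only -n and -N.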

JAsia April 6, 2021 02:54

Quote:

Originally Posted by AtoHM (Post 800758)
No experience with that kind of simulation on a cluster, but I would go the obvious way and see what that error message is about.
It's clearly a Python error: int() expects a string or a number to convert to an integer and gets None (an object of type 'NoneType') instead. There are many ways to end up with None, so the message alone gives no obvious hint. Look at what happens in the lines before line 252 in _constructMachineListSLURM; there is probably a failed lookup in there. Does it fail to get the machine ID? Something like that?

But I assume it is not your job to fix queue scripts. Get help from your system administrators.

Would this be in a file called _constructMachineListSLURM or in the machinelist.py file? I have actually looked in machinelist.py and it only goes up to about 180 lines.

AtoHM April 6, 2021 04:07

Then I guess _constructMachineListSLURM is a function that is imported from somewhere else. Import statements are usually placed at the top of a file, so maybe you can find where it comes from there.
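
One way to check where a function really lives, sketched below, is to ask the function object itself rather than searching files by hand. Every Python-level function carries a code object that records the file it was compiled from and its first line number, and this works even when only bytecode is shipped. The helper name `locate` is hypothetical, and how to import the System Coupling module from its own interpreter is not shown because that depends on the installation.

```python
import json


def locate(func):
    """Return (file name, first line number) recorded in a function's code object."""
    code = func.__code__
    return code.co_filename, code.co_firstlineno


if __name__ == "__main__":
    # json.dumps stands in for the function under investigation.
    path, line = locate(json.dumps)
    print(f"{path}:{line}")
```

If the reported path differs from the machinelist.py that was opened, the 180-line file is probably not the one being executed. The traceback hints at this: the Controller.py frames show absolute paths with source lines, while the PyLib frames show relative paths with none.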

bluebase April 28, 2021 14:31

Naive question:


How intelligent is the CLI parser? Could it be that the commands in run.in have to be one-liners?


It's weird that this line is printed alone in the error log:
('PARTICIPANT-2', 1.0)])
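
Multi-line commands should not be the problem if run.in is executed as ordinary Python, which the `File "run.in", line 19, in <module>` frame suggests. A quick check, sketched below under that assumption, is to parse the posted call with the standard `ast` module: the three lines form a single call expression. The traceback also continues into partitioning.py after that frame, so the call was parsed and started executing. Older CPython versions (before 3.8, as far as I know) attribute a multi-line call to its last line in tracebacks, which would explain the lone `('PARTICIPANT-2', 1.0)])`. The helper name `describe_call` is hypothetical.

```python
import ast

# The call exactly as posted in the .py script.
SNIPPET = """PartitionParticipants(AlgorithmName = "SharedAllocateMachines",
        NamesAndFractions = [('PARTICIPANT-1', 1.0),
                            ('PARTICIPANT-2', 1.0)])
"""


def describe_call(source):
    """Parse source; return (function name, keyword names) of its only statement."""
    tree = ast.parse(source)
    if len(tree.body) != 1:
        raise ValueError("expected exactly one statement")
    stmt = tree.body[0]
    if not isinstance(stmt, ast.Expr) or not isinstance(stmt.value, ast.Call):
        raise ValueError("expected a call expression")
    call = stmt.value
    return call.func.id, [kw.arg for kw in call.keywords]


if __name__ == "__main__":
    print(describe_call(SNIPPET))
```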


All times are GMT -4.