mpt_connect: error on linux amd64
We installed Fluent 6.2.16 on six computers (Athlon X2 4400 dual core, running Kubuntu Linux with a custom 2.6.14.3 kernel) and it runs fine on a single PC. Today we tried to run a parallel process on two nodes, but it never started. All hosts have password-less ssh access to the other hosts, /usr/bin/rsh is a symlink to /usr/bin/ssh, the working directory is an NFS export mounted on all nodes, all nodes have the same username (fluent :)) and there are no LAN problems.
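The setup claims above (name resolution, password-less ssh) can be double-checked with a small script. This is only a minimal sketch, not something shipped with Fluent: the function names are made up for illustration, the host names are supplied by the caller, and it assumes Python 3 and an `ssh` client on each node. It flags host names that resolve to a loopback address, since an address like 127.0.0.1 cannot be reached from another machine.

```python
import ipaddress
import socket
import subprocess


def is_loopback(address: str) -> bool:
    """Return True if the IP address is a loopback address such as 127.0.0.1."""
    return ipaddress.ip_address(address).is_loopback


def resolve(host: str) -> str:
    """Resolve a host name to an IPv4 address with the local resolver (/etc/hosts, DNS)."""
    return socket.gethostbyname(host)


def passwordless_ssh_ok(host: str, timeout: int = 10) -> bool:
    """Return True if 'ssh host true' succeeds without asking for a password."""
    # BatchMode=yes makes ssh fail instead of prompting interactively.
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, "true"],
        capture_output=True,
        timeout=timeout,
    )
    return result.returncode == 0


def check_hosts(hosts):
    """Print one status line per host name (hypothetical helper for illustration)."""
    for host in hosts:
        try:
            address = resolve(host)
        except OSError as exc:
            print(f"{host}: cannot resolve ({exc})")
            continue
        note = "loopback, unreachable from other nodes" if is_loopback(address) else "ok"
        try:
            ssh = "ok" if passwordless_ssh_ok(host) else "FAILED"
        except (subprocess.TimeoutExpired, FileNotFoundError):
            ssh = "FAILED"
        print(f"{host}: {address} ({note}), ssh: {ssh}")
```

Running something like `check_hosts(["fl01", "fl02"])` on every node, not just the one where Fluent is started, would show whether each machine sees the others under a routable address.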
When I type fluent 3d -pnet it starts just fine. Then I click on Parallel > Network > Configure..., kill the running node (localhost), select two hosts, enter a number (2, for example) in "Spawn Count", and click "Spawn". Here is the output:

Kill script file is /home/fluent/kill-fluent-fl01-10798
Host spawning Node 0 on machine "fl02" (unix).
Starting exec /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16 net node -mport 127.0.0.1:192.168.1.202:32791:0
0: mpt_connect: error: connect failed: Connection refused
[bt] Execution path:
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(Process_Stackframe+0x17) [0xb37207]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(mpt_error+0x109) [0xaff2a9]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(mpt_connect+0x71) [0xb031b1]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(mpt_establish_connection+0x9c) [0xb046dc]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16 [0xb08654]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(MPT_Add_Me+0x10c) [0xb09f1c]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(main+0x8f) [0xa208df]
[bt] /lib/libc.so.6(__libc_start_main+0xdb) [0x2aaaaae6547b]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(system+0x52) [0x455caa]
0: mpt_establish_connection: error: unable to connect: Illegal seek
[bt] Execution path:
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(Process_Stackframe+0x17) [0xb37207]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(mpt_error+0x109) [0xaff2a9]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(mpt_establish_connection+0xb4) [0xb046f4]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16 [0xb08654]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(MPT_Add_Me+0x10c) [0xb09f1c]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(main+0x8f) [0xa208df]
[bt] /lib/libc.so.6(__libc_start_main+0xdb) [0x2aaaaae6547b]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(system+0x52) [0x455caa]
0: mpt_connect: error: connect failed: Connection refused
[bt] Execution path:
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(Process_Stackframe+0x17) [0xb37207]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(mpt_error+0x109) [0xaff2a9]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(mpt_connect+0x71) [0xb031b1]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(mpt_establish_connection+0x9c) [0xb046dc]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16 [0xb08654]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(MPT_Add_Me+0x10c) [0xb09f1c]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(main+0x8f) [0xa208df]
[bt] /lib/libc.so.6(__libc_start_main+0xdb) [0x2aaaaae6547b]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(system+0x52) [0x455caa]
0: mpt_establish_connection: error: unable to connect: Connection refused
[bt] Execution path:
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(Process_Stackframe+0x17) [0xb37207]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(mpt_error+0x109) [0xaff2a9]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(mpt_establish_connection+0xb4) [0xb046f4]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16 [0xb08654]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(MPT_Add_Me+0x10c) [0xb09f1c]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(main+0x8f) [0xa208df]
[bt] /lib/libc.so.6(__libc_start_main+0xdb) [0x2aaaaae6547b]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(system+0x52) [0x455caa]
0: mpt_connect: error: connect failed: Connection refused
[bt] Execution path:
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(Process_Stackframe+0x17) [0xb37207]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(mpt_error+0x109) [0xaff2a9]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(mpt_connect+0x71) [0xb031b1]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(mpt_establish_connection+0x9c) [0xb046dc]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16 [0xb08654]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(MPT_Add_Me+0x10c) [0xb09f1c]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(main+0x8f) [0xa208df]
[bt] /lib/libc.so.6(__libc_start_main+0xdb) [0x2aaaaae6547b]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(system+0x52) [0x455caa]
0: mpt_establish_connection: error: unable to connect: Connection refused
[bt] Execution path:
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(Process_Stackframe+0x17) [0xb37207]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(mpt_error+0x109) [0xaff2a9]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(mpt_establish_connection+0xb4) [0xb046f4]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16 [0xb08654]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(MPT_Add_Me+0x10c) [0xb09f1c]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(main+0x8f) [0xa208df]
[bt] /lib/libc.so.6(__libc_start_main+0xdb) [0x2aaaaae6547b]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(system+0x52) [0x455caa]
0: mpt_connect_to_server: error: cannot establish connection; bye.: Connection refused
[bt] Execution path:
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(Process_Stackframe+0x17) [0xb37207]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(mpt_error+0x109) [0xaff2a9]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16 [0xb08731]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(MPT_Add_Me+0x10c) [0xb09f1c]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(main+0x8f) [0xa208df]
[bt] /lib/libc.so.6(__libc_start_main+0xdb) [0x2aaaaae6547b]
[bt] /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_node/fluent_net.6.2.16(system+0x52) [0x455caa]

Then it hangs waiting for a kill.

When I start Fluent by typing fluent 3d -t2 -cnf=.fluent.hosts instead, it starts running but never starts processing. Here is the output:

Loading "/home/fluent/Fluent.Inc/fluent6.2.16/lib/fluent.dmp.114-64"
Done.
Starting /home/fluent/Fluent.Inc/fluent6.2.16/lnamd64/3d_host/fluent.6.2.16 host -cx localhost.localdomain:32797:32798 "(list (rpsetvar (QUOTE parallel/function) "fluent 3d -pdefault -node -alnamd64 -t2 -cnf=.fluent.hosts ") (rpsetvar (QUOTE parallel/rhost) "") (rpsetvar (QUOTE parallel/ruser) "") (rpsetvar (QUOTE parallel/nprocs_string) "2") (rpsetvar (QUOTE parallel/auto-spawn?) #t) (rpsetvar (QUOTE parallel/trace-level) 0) (rpsetvar (QUOTE parallel/remote-shell) 0) (rpsetvar (QUOTE parallel/path) "/home/fluent/Fluent.Inc") (rpsetvar (QUOTE parallel/hostsfile) ".fluent.hosts"))"
Welcome to Fluent 6.2.16
Copyright 2005 Fluent Inc.
All Rights Reserved
Loading "/home/fluent/Fluent.Inc/fluent6.2.16/lib/flprim.dmp.1119-64"
Done.
Host spawning Node 0 on machine "fl01" (unix).
Starting /home/fluent/Fluent.Inc/fluent6.2.16/multiport/lnamd64/nmpi/bin/mpirun

Then nothing happens (forever).