
Test Failure on MacOS -- SPARSE_HSS_mpi_21 #123

Open

TobiasDuswald opened this issue Sep 18, 2024 · 2 comments

@TobiasDuswald

Dear Developers,

I installed STRUMPACK on macOS 14.5 (Xcode 15.4, Apple M3 Pro, arm64). On my system, test 119, SPARSE_HSS_mpi_21, fails while all other tests pass. I have tried releases v7.1.0, v7.1.4, and the latest master; the failure occurs with all of them.

I installed the software and ran the tests with:

mkdir build && mkdir install
cd build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../install -Dmetis_PREFIX=../../metis-5.1.0 ..
make -j && make install
make examples -j
ctest --output-on-failure
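
For reference, the single failing test can also be re-run in isolation from the build directory using ctest's standard regular-expression filter:

# re-run only the failing test, printing its full output
ctest --output-on-failure -R SPARSE_HSS_mpi_21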

Below is the full output of the failing test. If you need more information, please let me know.

119/133 Test #119: SPARSE_HSS_mpi_21 ................***Failed    0.85 sec
# Running with:
# OMP_NUM_THREADS=1 mpirun -n 2 /Users/tobiasduswald/Software/STRUMPACK/build/test/test_sparse_mpi utm300/utm300.mtx --sp_compression HSS --hss_leaf_size 4 --hss_rel_tol 1e-1 --hss_abs_tol 1e-10 --hss_d0 16 --hss_dd 8 --sp_reordering_method metis --sp_compression_min_sep_size 25 
# opening file 'utm300/utm300.mtx'
# %%MatrixMarket matrix coordinate real general
# reading 300 by 300 matrix with 3,155 nnz's from utm300/utm300.mtx
# Initializing STRUMPACK
# using 1 OpenMP thread(s)
# using 2 MPI processes
# matching job: maximum matching with row and column scaling
# matrix equilibration, r_cond = 1 , c_cond = 1 , type = N
# Matrix padded with zeros to get symmetric pattern.
# Number of nonzeros increased from 3,155 to 4,754.
# initial matrix:
#   - number of unknowns = 300
#   - number of nonzeros = 4,754
# nested dissection reordering:
#   - Metis reordering
#      - used METIS_NodeND (iso METIS_NodeNDP)
#      - supernodal tree was built from etree
#   - strategy parameter = 8
#   - number of separators = 153
#   - number of levels = 14
#   - nd time = 0.00114608
#   - matching time = 0.000522137
#   - symmetrization time = 8.4877e-05
# symbolic factorization:
#   - nr of dense Frontal matrices = 152
#   - nr of HSS Frontal matrices = 1
#   - symb-factor time = 0.000423908
#   - sep-reorder time = 0.000329971
# multifrontal factorization:
#   - estimated memory usage (exact solver) = 0.095376 MB
#   - minimum pivot, sqrt(eps)*|A|_1 = 1.05367e-08
#   - replacing of small pivots is not enabled
#   - factor time = 0.000829935
#   - factor nonzeros = 21,142
#   - factor memory = 0.169136 MB
#   - compression = hss
#   - factor memory/nonzeros = 177.336 % of multifrontal
#   - maximum HSS rank = 5
#   - HSS relative compression tolerance = 0.1
#   - HSS absolute compression tolerance = 1e-10
#   - normal(0,1) distribution with minstd_rand engine
GMRES it. 0	res =      95.4366	rel.res =            1	 restart!
GMRES it. 1	res =      3.85148	rel.res =    0.0403564
GMRES it. 2	res =      1.54211	rel.res =    0.0161585
GMRES it. 3	res =     0.060384	rel.res =  0.000632713
GMRES it. 4	res =    0.0215773	rel.res =   0.00022609
GMRES it. 5	res =  0.000985069	rel.res =  1.03217e-05
GMRES it. 6	res =  3.38379e-05	rel.res =  3.54559e-07
# DIRECT/GMRES solve:
#   - abs_tol = 1e-10, rel_tol = 1e-06, restart = 30, maxit = 5000
#   - number of Krylov iterations = 6
#   - solve time = 0.00135398
# COMPONENTWISE SCALED RESIDUAL = 0.00201743
# RELATIVE ERROR = 0.000996645
residual too large
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
  Proc: [[8847,1],0]
  Errorcode: 1

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
prterun has exited due to process rank 0 with PID 0 on node w203-1b-v4 calling
"abort". This may have caused other processes in the application to be
terminated by signals sent by prterun (as reported here).
--------------------------------------------------------------------------
@pghysels (Owner)
Thank you for reporting.
This does not look like an actual bug, but rather a problem with the tolerances for this specific test problem.
You might also get different results from run to run, because the METIS ordering can change between runs.
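
If you want to experiment with the tolerances, the test can be launched by hand with the same flags shown in the log above. As a sketch (run from the build directory; the utm300 matrix path must match your setup, and the 1e-2 value below is purely illustrative, not a recommended setting), tightening --hss_rel_tol makes the HSS compression more accurate, which should shrink the residual:

# same invocation as in the test log, but with a tighter HSS relative
# compression tolerance (1e-2 instead of the test's 1e-1); all other
# flags are unchanged from the log
OMP_NUM_THREADS=1 mpirun -n 2 ./test/test_sparse_mpi utm300/utm300.mtx \
    --sp_compression HSS --hss_leaf_size 4 \
    --hss_rel_tol 1e-2 --hss_abs_tol 1e-10 \
    --hss_d0 16 --hss_dd 8 \
    --sp_reordering_method metis --sp_compression_min_sep_size 25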

@TobiasDuswald (Author)

Alright, thanks for the info!
