
std::length_error when total nonzeros is higher than the maximum of integer4, Fortran use of STRUMPACK_MPIdist #94

Open
GuoqiMa opened this issue May 28, 2023 · 23 comments



GuoqiMa commented May 28, 2023

Hi, sorry to bother you again.

I would like to ask about the integer type. I didn't see any integer kind specified in SRC/fortran, so the default integer type is integer(4), right?

Now I need to use long integers, namely integer(8), together with float complex. I changed the integer type in my code, but I get a segmentation fault.

Do I have to both compile and link with the -i8 flag?

Also, could you please explain the difference between float complex and float complex_64?


GuoqiMa commented May 29, 2023

I found that integer(8) is not necessary, so I changed back and use float complex even though my model is very large. I split the matrix across different ranks so that integer(4) is fine locally, but the total number of nonzeros is above the integer(4) limit.

Now it seems I get stuck at this step:

Initializing STRUMPACK
using 2 OpenMP thread(s)
using 20 MPI processes
matching job: maximum matching with row and column scaling
terminate called after throwing an instance of 'std::length_error'
what(): vector::_M_default_append

For a small model there is no vector::_M_default_append error. I looked for some information online; it seems to be related to C++.
I wonder if you know what is happening?

GuoqiMa changed the title from "long integer type in Fortran use of STRUMPACK_MPIdist" to "error: 'vector::_M_default_append' when running large model in Fortran use of STRUMPACK_MPIdist" on May 29, 2023

GuoqiMa commented May 29, 2023

This seems impossible to overcome, because STRUMPACK uses 32-bit indexing for BLAS, LAPACK and ScaLAPACK. However, my total number of nonzeros is over 3.5 billion, and a 32-bit integer can hold a maximum of about 2 billion.

pghysels (Owner) commented

I think the code is running out of memory in the column permutation phase. This uses the MC64 code, which is sequential, so the code needs to gather the whole input matrix on the root MPI process, call the MC64 code there, and then broadcast the result. The column permutation is done to maximize the diagonal entries, but if your problem is already diagonally dominant (or has nonzero diagonal entries), this step might not be necessary. You can try to disable it with

STRUMPACK_set_matching(S, STRUMPACK_MATCHING_NONE)
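
For context, a minimal Fortran sketch (following the calls that appear later in this thread; the solver handle S and the surrounding use statements are assumed to be set up as in fexample.f90) of where this call fits, right after initialization and before the matrix is passed in:

call STRUMPACK_init(S, MPI_COMM_WORLD, STRUMPACK_FLOATCOMPLEX, STRUMPACK_MPI_DIST, 0, c_null_ptr, 1)
call STRUMPACK_set_matching(S, STRUMPACK_MATCHING_NONE)   ! skip the sequential MC64 column permutation
! ... then STRUMPACK_set_distributed_csr_matrix and STRUMPACK_solve as usual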

pghysels (Owner) commented

If you want to use the 64bit interface, you need to specify:

STRUMPACK_FLOATCOMPLEX_64

or equivalent for the different floating point precisions.

There is a way to use 64-bit integers for the BLAS/LAPACK routines, but that should not be necessary.
The integer arguments to the BLAS/LAPACK routines only refer to rows or columns (not the total number of elements), and those should be below the 32-bit maximum. In case you really need it, the CMake option is STRUMPACK_USE_BLAS64. Then you need to link with a 64-bit BLAS/LAPACK, which you can specify through CMake via -DBLA_SIZEOF_INTEGER=8 or -DBLA_VENDOR=Intel10_64ilp (see https://cmake.org/cmake/help/latest/module/FindBLAS.html).
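
As a hedged illustration (the init call mirrors the one shown later in this thread; only the flag differs): the _64 suffix selects 64-bit indexing rather than a higher floating-point precision, so the index arrays handed to STRUMPACK must then be 8-byte integers while the values stay single-precision complex.

! 32-bit indexing (default): local_rows, row_ptr, col_ind, dist are integer(4)
call STRUMPACK_init(S, MPI_COMM_WORLD, STRUMPACK_FLOATCOMPLEX,    STRUMPACK_MPI_DIST, 0, c_null_ptr, 1)
! 64-bit indexing: the same arrays must be integer(8)
call STRUMPACK_init(S, MPI_COMM_WORLD, STRUMPACK_FLOATCOMPLEX_64, STRUMPACK_MPI_DIST, 0, c_null_ptr, 1)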


GuoqiMa commented May 30, 2023

Thanks for your reply.
I did try matching none, but it seems a similar problem still exists:

Initializing STRUMPACK
using 2 OpenMP thread(s)
using 20 MPI processes
matching job: none
matrix equilibration, r_cond = 1 , c_cond = 1 , type = N
terminate called after throwing an instance of 'std::length_error'
what(): cannot create std::vector larger than max_size()

pghysels (Owner) commented

Could this just be running out of memory?
It might help to run with fewer MPI ranks and fewer OpenMP threads. Then you leave some cores idle, and it will take longer, but it will take less memory.

Do you have an estimate for the required memory usage? Perhaps from some smaller runs you can extrapolate the memory usage to get an estimate.


GuoqiMa commented May 30, 2023

Yes, I have estimated the memory; usually the actual memory usage is more than 1.6 times the estimated value. Now I am testing it with enough memory.

In fact, in my previous tests I deliberately ran it with insufficient memory, but it stopped at later steps, not right after matrix equilibration. Anyhow, I will test this. Thanks very much.

GuoqiMa changed the title from "error: 'vector::_M_default_append' when running large model in Fortran use of STRUMPACK_MPIdist" to "std::length_error when total nonzeros is higher than the maximum of integer4, Fortran use of STRUMPACK_MPIdist" on Jun 3, 2023

GuoqiMa commented Jun 3, 2023

Hi, Pieter.

I have tried creating a banded matrix myself, and it seems it is not a problem of insufficient memory. I found that when the total number of nonzeros exceeds the maximum of integer(4) (2,147,483,647), the error below shows up immediately, before the 'initial matrix' step, while below the integer(4) maximum the code runs successfully. So, does this mean I have to use 64-bit STRUMPACK?

Initializing STRUMPACK
using 2 OpenMP thread(s)
using 20 MPI processes
matching job: none
matrix equilibration, r_cond = 1 , c_cond = 1 , type = N
terminate called after throwing an instance of 'std::length_error'
what(): cannot create std::vector larger than max_size()
forrtl: error (76): Abort trap signal


GuoqiMa commented Jun 3, 2023

Or should METIS be 64-bit? I use METIS for the reordering.


pghysels commented Jun 3, 2023

You can try with 64bit METIS. STRUMPACK can use either 32 or 64 bit METIS.


GuoqiMa commented Jun 3, 2023

For the 'initializing matrix' step, does it reach the matrix reordering? So now I should reconfigure with 64-bit METIS as well, besides what you mentioned above about STRUMPACK_FLOATCOMPLEX_64 and STRUMPACK_USE_BLAS64, right?



pghysels commented Jun 4, 2023

Yes, you can try METIS with 64 bit integers. STRUMPACK can use METIS with either 32 or 64 bit.


GuoqiMa commented Jun 11, 2023

Hi, Pieter. I find that each integer input of STRUMPACK_set_distributed_csr_matrix(S, c_loc(locN), c_loc(IA3), c_loc(JA3), c_loc(A2), c_loc(RowS), 1) must be of integer(4) type, otherwise a segmentation fault happens. So I think it is not a problem of BLAS/LAPACK as you said, just a problem of the data types in the STRUMPACK interface. In STRUMPACK there is a step that computes the total number of nonzeros, and this must be integer(8) in my case. So, I wonder if there is any solution for my problem?

So, I think you would have to provide a 64-bit integer STRUMPACK interface for calls like STRUMPACK_set_distributed_csr_matrix, so that it can call BLAS/LAPACK or METIS and still work.


GuoqiMa commented Jun 11, 2023

Hi, Pieter.

As in my test above: when the total number of nonzeros exceeds the integer(4) maximum, the std::length_error shows up immediately, before the 'initial matrix' step, while below that limit the code succeeds.

So, as I tested, the problem is in the initialization, where a 64-bit integer should be used. I guess the initialization does not need BLAS/LAPACK or METIS.

pghysels (Owner) commented

For

STRUMPACK_set_distributed_csr_matrix

the arguments local_rows, row_ptr, col_ind, dist should be 32 or 64 bit signed integer pointers (depending on whether you use STRUMPACK_FLOATCOMPLEX or STRUMPACK_FLOATCOMPLEX_64).
values is a pointer to complex float values; the last argument, symmetric_pattern, is a 32-bit integer.

It could be that the total number of nonzeros is larger than the 32-bit limit, while locally on each rank the number of nonzeros fits in 32 bits. In that case I think it will print a negative number for the total number of nonzeros, but it might still run correctly. I can try to rewrite that part of the code so that it uses 32-bit integers everywhere except for printing the total nnz, but I need some time to do that carefully. But you can run with 64-bit integers everywhere; I don't see why that wouldn't work.
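
To make the expected kinds concrete, here is a hedged Fortran fragment (declarations and the call only; the variable names are illustrative and the surrounding solver setup is assumed to follow fexample.f90) for the 64-bit interface:

use, intrinsic :: iso_c_binding
integer(c_int64_t), target              :: local_rows                      ! rows owned by this rank
integer(c_int64_t), allocatable, target :: row_ptr(:), col_ind(:), dist(:) ! 0-based CSR indices, 64-bit
complex(c_float_complex), allocatable, target :: values(:)                 ! single-precision complex values
! ... fill local_rows, row_ptr, col_ind, dist, values ...
! the last argument (symmetric_pattern) stays a plain 32-bit integer
call STRUMPACK_set_distributed_csr_matrix(S, c_loc(local_rows), c_loc(row_ptr), &
     c_loc(col_ind), c_loc(values), c_loc(dist), 1)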


GuoqiMa commented Jun 12, 2023

Thank you very much. Oh, I thought floatcomplex_64 meant a higher (double complex) precision. Now I know how to run it; I should use floatcomplex_64.


GuoqiMa commented Jun 12, 2023

Using floatcomplex_64, the error changed; I think the new error is just something else.

When under the 32-bit limit, it seems fine:

Initializing STRUMPACK
using 2 OpenMP thread(s)
using 10 MPI processes
matching job: none
matrix equilibration, r_cond = 5.00001e-08 , c_cond = 0.0192308 , type = B
initial matrix:

  • number of unknowns = 20,000,000
  • number of nonzeros = 2,059,997,348
    nested dissection reordering:
  • Natural reordering
  • strategy parameter = 8
  • number of separators = 1
  • number of levels = 1
  • nd time = 35.7113
  • symmetrization time = 9.53674e-07
    symbolic factorization:
  • nr of dense Frontal matrices = 1
  • symb-factor time = 48.2309
    multifrontal factorization:
  • estimated memory usage (exact solver) = 3.2e+09 MB
  • minimum pivot, sqrt(eps)*|A|_1 = 0.0251465
  • replacing of small pivots is not enabled
    STRUMPACK: out of memory!

However, above the 32-bit integer limit, the error is different from before:

Initializing STRUMPACK
using 2 OpenMP thread(s)
using 10 MPI processes
matching job: none
matrix equilibration, r_cond = 1 , c_cond = 1 , type = N

BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
RANK 0 PID 753361 RUNNING AT cn-08-21
KILLED BY SIGNAL: 9 (Killed)

The difference is in the matrix equilibration, from type B to type N. Do you have any idea why this difference happens?

pghysels (Owner) commented

That should be the same.
So it looks like the matrix isn't passed correctly.
Can you share some code I can look at?


GuoqiMa commented Jun 13, 2023

Actually, I didn't change anything but the number of nonzeros. Below the 32-bit limit it works; above the limit it does not. I do see some 'bus error' messages on some ranks.

call STRUMPACK_init(S, MPI_COMM_WORLD, STRUMPACK_FLOATCOMPLEX_64, STRUMPACK_MPI_DIST, 0, c_null_ptr, 1)
call STRUMPACK_set_matching(S, STRUMPACK_MATCHING_NONE)
call STRUMPACK_set_reordering_method(S, STRUMPACK_METIS)
locN = Iend - Istart                    ! number of rows owned by this rank
allocate(x(1:Nrows))
x(1:Nrows) = (0.d0, 0.d0)
IA2(1:Nrows) = IA2(1:Nrows) - IA2(1)    ! shift row pointers to 0-based indexing
JA2(1:Nzs) = JA2(1:Nzs) - 1             ! shift column indices to 0-based indexing
call STRUMPACK_set_distributed_csr_matrix(S, c_loc(locN), c_loc(IA2), c_loc(JA2), c_loc(A2), c_loc(RowS), 1)
ierr = STRUMPACK_solve(S, c_loc(B2), c_loc(x), 0)
B2(1:Nrows) = x(1:Nrows)

pghysels (Owner) commented

The equilibration is based on the xgeequ routine from LAPACK:
https://netlib.org/lapack/explore-html/dd/d9a/group__double_g_ecomputational_ga56565ae06016954202aee24cdfc38257.html
The equilibration type (R=row, C=column, or B=both) is decided based on the matrix values (see r_cond and c_cond).
I would think that if row equilibration is required for the smaller problem, it should also be required for the larger one.
But for your larger problem r_cond = 1 and c_cond = 1 seems suspicious. It makes me believe the matrix is not passed correctly, but I can't find anything wrong in the code.
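
For reference, a small self-contained sketch (plain LAPACK, not STRUMPACK's internal code) of what zgeequ reports; the r_cond and c_cond printed by STRUMPACK play the role of ROWCND and COLCND here, and values close to 1 roughly mean the rows and columns already have comparable magnitudes, so no scaling (type N) is chosen:

program equilibration_demo
  implicit none
  integer, parameter :: n = 2
  complex(8) :: a(n,n)
  real(8)    :: r(n), c(n), rowcnd, colcnd, amax
  integer    :: info
  ! Badly scaled rows: the second row is about 1e8 times larger than the first.
  a(1,1) = (1.d0, 0.d0);  a(1,2) = (2.d0, 0.d0)
  a(2,1) = (1.d8, 0.d0);  a(2,2) = (3.d8, 0.d0)
  call zgeequ(n, n, a, n, r, c, rowcnd, colcnd, amax, info)
  ! Here rowcnd comes out tiny (around 1e-8), so row scaling (type R or B) would be applied;
  ! rowcnd and colcnd both close to 1 would mean the matrix needs no scaling (type N).
  print *, 'rowcnd =', rowcnd, ', colcnd =', colcnd, ', info =', info
end program equilibration_demo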


GuoqiMa commented Jun 14, 2023

Thank you. May I ask another question?

Recently, my HPC manager reconfigured STRUMPACK. However, my old code no longer gets correct solutions, even though the log looks correct. Before, I had checked that the solutions were correct; with this installation the solutions are all 0 every time. I have no idea what happened; it is very weird. I asked my manager to reinstall it, and I also wanted to ask if you have a clue.

Initializing STRUMPACK
using 2 OpenMP thread(s)
using 10 MPI processes
matching job: maximum matching with row and column scaling
matrix equilibration, r_cond = 1 , c_cond = 1 , type = N
Matrix padded with zeros to get symmetric pattern.
Number of nonzeros increased from 18,741,393 to 31,796,591.
initial matrix:

  • number of unknowns = 176,505
  • number of nonzeros = 31,796,591
    nested dissection reordering:
  • Metis reordering
    • used METIS_NodeND (iso METIS_NodeNDP)
    • supernodal tree was built from etree
  • strategy parameter = 8
  • number of separators = 6,525
  • number of levels = 14
  • nd time = 9.78745
  • matching time = 14.2286
  • symmetrization time = 0.269281
    symbolic factorization:
  • nr of dense Frontal matrices = 6,525
  • symb-factor time = 0.190588
    multifrontal factorization:
  • estimated memory usage (exact solver) = 15869.5 MB
  • minimum pivot, sqrt(eps)*|A|_1 = 1.56934e-07
  • replacing of small pivots is not enabled
    DenseMPI factorization complete, no GPU support, P=2, T=2: 3.46566 seconds, 760.986 GFLOPS, 219.579 GFLOP/s, ds=1867, du=6183
    DenseMPI factorization complete, no GPU support, P=2, T=2: 3.5328 seconds, 862.828 GFLOPS, 244.233 GFLOP/s, ds=1986, du=6353
    DenseMPI factorization complete, no GPU support, P=2, T=2: 3.26074 seconds, 955.25 GFLOPS, 292.955 GFLOP/s, ds=1840, du=7117
    DenseMPI factorization complete, no GPU support, P=3, T=2: 5.00299 seconds, 1195.37 GFLOPS, 238.93 GFLOP/s, ds=1953, du=7751
    DenseMPI factorization complete, no GPU support, P=3, T=2: 4.90208 seconds, 1304.77 GFLOPS, 266.167 GFLOP/s, ds=1992, du=8033
    DenseMPI factorization complete, no GPU support, P=2, T=2: 4.97933 seconds, 1155.33 GFLOPS, 232.026 GFLOP/s, ds=1967, du=7565
    DenseMPI factorization complete, no GPU support, P=5, T=2: 5.37258 seconds, 2644.16 GFLOPS, 492.157 GFLOP/s, ds=3495, du=7924
    DenseMPI factorization complete, no GPU support, P=5, T=2: 5.65052 seconds, 2733.49 GFLOPS, 483.76 GFLOP/s, ds=3580, du=7924
    DenseMPI factorization complete, no GPU support, P=10, T=2: 1.77755 seconds, 1326.33 GFLOPS, 746.157 GFLOP/s, ds=7924, du=0
  • factor time = 22.4018
  • factor nonzeros = 991,844,543
  • factor memory = 15869.5 MB
    REFINEMENT it. 0 res = 2.27357e-11 rel.res = 1 bw.error = 1
    DIRECT/GMRES solve:
  • abs_tol = 1e-10, rel_tol = 1e-06, restart = 30, maxit = 5000
  • number of Krylov iterations = 0
  • solve time = 0.00257802


GuoqiMa commented Jun 14, 2023

And this is the configure script from my manager:

!#STRUMPACK 7.1.2
!#Introduction
Compiled on cn-09-32 by Sergio Martinez
Requested by Guoqi Ma
https://portal.nersc.gov/project/sparse/strumpack/v7.1.0/installation.html

!###Prepare modules
module purge
module load mkl/2021.3
module load intel/2021.3-gcc-9.3
module load impi/2021.3
module load scotch/6.1.1
module load metis/5.1.0
module load parmetis/4.0.3

!### Prepare directories
ROOT_DIR=/apps/ku
COMPILER_DEP=intel-2021_3-gcc-9_3
MPI_DEP=impi-2021_3
APP_NAME=strumpack
APP_VERSION=7.1.2
APPS=${ROOT_DIR}/${COMPILER_DEP}/${MPI_DEP}/${APP_NAME}/${APP_VERSION}
BUILD=${ROOT_DIR}/build/${APP_NAME}/${APP_VERSION}

mkdir -p ${BUILD}
mkdir -p ${APPS}

!### Download and extract
cd $BUILD
wget https://github.com/pghysels/STRUMPACK/archive/refs/tags/v7.1.2.tar.gz
tar -xvf v7.1.2.tar.gz
cd STRUMPACK-7.1.2

!## Configure, Build and Install
mkdir -p build
cd build
cmake ../ -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_INSTALL_PREFIX=$APPS \
  -DCMAKE_CXX_COMPILER=mpiicpc \
  -DCMAKE_C_COMPILER=mpiicc \
  -DCMAKE_Fortran_COMPILER=mpiifort \
  -DTPL_SCALAPACK_LIBRARIES="${MKLROOT}/lib/intel64/libmkl_scalapack_lp64.a;-Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_cdft_core.a ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group;-liomp5;-lpthread;-lm;-ldl" \
  -DMETIS_INCLUDE_DIR=/apps/ku/intel-2021_3-gcc-9_3/metis/5.1.0/include \
  -DMETIS_LIBRARIES=/apps/ku/intel-2021_3-gcc-9_3/metis/5.1.0/lib/libmetis.a \
  -DSTRUMPACK_USE_PARMETIS=ON \
  -DTPL_PARMETIS_INCLUDE_DIRS=/apps/ku/intel-2021_3-gcc-9_3/impi-2021_3/parmetis/4.0.3/include \
  -DTPL_PARMETIS_LIBRARIES=/apps/ku/intel-2021_3-gcc-9_3/impi-2021_3/parmetis/4.0.3/lib/libparmetis.a \
  -DTPL_ENABLE_SCOTCH=ON \
  -DTPL_SCOTCH_INCLUDE_DIRS=/apps/ku/intel-2021_3-gcc-9_3/impi-2021_3/scotch/6.1.1/include \
  -DTPL_SCOTCH_LIBRARIES="/apps/ku/intel-2021_3-gcc-9_3/impi-2021_3/scotch/6.1.1/lib/libscotch.a;/apps/ku/intel-2021_3-gcc-9_3/impi-2021_3/scotch/6.1.1/lib/libscotcherr.a" \
  -DTPL_ENABLE_PTSCOTCH=ON \
  -DTPL_PTSCOTCH_INCLUDE_DIRS=/apps/ku/intel-2021_3-gcc-9_3/impi-2021_3/scotch/6.1.1/include \
  -DTPL_PTSCOTCH_LIBRARIES="/apps/ku/intel-2021_3-gcc-9_3/impi-2021_3/scotch/6.1.1/lib/libptscotch.a;/apps/ku/intel-2021_3-gcc-9_3/impi-2021_3/scotch/6.1.1/lib/libptscotcherr.a"

make
!#make test
!#https://portal.nersc.gov/project/sparse/strumpack/master/FAQ.html
!#All MPI tests will fail because Intel MPI doesn't recognize --oversubscribe option
make install

!## Modulefile
!### Location
/apps/ku/modulefiles/MPI/intel/2021.3-gcc-9.3/impi/2021.3/strumpack/7.1.2.lua
!### Content
local pkgName = myModuleName()
local pkgVersion = myModuleVersion()
local pkgNameVer = myModuleFullName()
local hierA = hierarchyA(pkgNameVer,2)
local mpiD = hierA[1]:gsub("/","-"):gsub("%.","")
local compilerD = hierA[2]:gsub("/","-"):gsub("%.","")
local base = pathJoin("/apps/ku", compilerD, mpiD, pkgNameVer)
whatis("Name: " ..pkgName)
whatis("Version: " .. pkgVersion)
whatis("Description: STRUMPACK is a software library providing linear algebra routines and linear system solvers for sparse and for dense rank-structured linear systems.")
whatis("URL: https://portal.nersc.gov/project/sparse/strumpack/index.html")

depends_on("mkl/2021.3")
depends_on("scotch/6.1.1")
depends_on("metis/5.1.0")
depends_on("parmetis/4.0.3")

prepend_path("CPATH",           pathJoin(base,"include"))
prepend_path("LD_LIBRARY_PATH", pathJoin(base,"lib"))
prepend_path("LIBRARY_PATH",    pathJoin(base,"lib"))

setenv("STRUMPACK_DIR",  base)

pghysels (Owner) commented

I checked fexample.f90, and that gives the correct results (see f5be648).
Do you have an MPI example, perhaps a modification of fexample.f90, that I can try?
Sorry, I could write this example myself, but I'm not very familiar with Fortran.
