AOCL 2.2 changes - Majorly include LAPACK 3.9.0 support #36
Open: rsanagap wants to merge 1,017 commits into flame:master from amd:master
Total memory requirement for large matrices exceeds the range of a 32-bit integer. Changed the type of the argument to the local memory allocation routine to size_t to accommodate large sizes. Added stdc++ flag to LDFLAGS for including the C++ based aocl-utils library in the build. Signed off by Vasanthakumar R <[email protected]> Change-Id: I1e521c373acc89cc6a32e17f55ae1f30707ba196
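To illustrate why a 32-bit size argument is too small, here is a minimal sketch. The helper names `matrix_bytes` and `fits_in_int32` are hypothetical and are not part of libFLAME; the sketch only shows the byte count being computed in `size_t` with an overflow guard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a libFLAME function): byte count for an
 * m x n matrix of elem_size-byte elements, computed in size_t.
 * Returns 0 if a dimension is zero or the product would overflow size_t. */
static size_t matrix_bytes(size_t m, size_t n, size_t elem_size)
{
    assert(elem_size > 0);
    if (m == 0 || n == 0)
        return 0;
    if (m > SIZE_MAX / n)
        return 0;                       /* m * n would overflow */
    if (m * n > SIZE_MAX / elem_size)
        return 0;                       /* m * n * elem_size would overflow */
    return m * n * elem_size;
}

/* Returns 1 if the byte count can be represented in a signed 32-bit int. */
static int fits_in_int32(size_t bytes)
{
    return bytes <= (size_t)INT32_MAX;
}
```

For example, a 20000 x 20000 matrix of 8-byte doubles needs 3,200,000,000 bytes, which is above the signed 32-bit maximum of 2,147,483,647.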
Updated README to use name AOCL-LAPACK for AMD optimized version of libflame Change-Id: I7d4b88a18dafc60dfb39e0354bf06bbc0cdd9082
Change-Id: Ia90f01c4f53e88ff94c107717a4d4127e76a29a1
When running on Linux, FLA_Apply_pivots_unb_external.c was using an implicitly defined function for memory allocation. Correct this code to use malloc. AMD-Internal: [CPUPL-3412] Change-Id: I8d17aaad691df243633566f361f16f27389d1ed8
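As a sketch of the kind of fix described, the allocation call below is preceded by the header that declares it. Without `<stdlib.h>`, older compilers assume an implicitly declared `malloc` returning `int`, which can truncate pointers on 64-bit targets, and newer compilers reject the call. The helper `copy_pivots` is hypothetical and is not taken from FLA_Apply_pivots_unb_external.c.

```c
#include <assert.h>
#include <stdlib.h>   /* declares malloc and free */
#include <string.h>   /* declares memcpy */

/* Hypothetical helper: return a heap copy of a pivot vector with n entries.
 * Returns NULL when n is 0 or when the allocation fails. */
static int *copy_pivots(const int *ipiv, size_t n)
{
    assert(ipiv != NULL || n == 0);
    if (n == 0)
        return NULL;
    int *copy = malloc(n * sizeof *copy);
    if (copy != NULL)
        memcpy(copy, ipiv, n * sizeof *copy);
    return copy;   /* caller releases with free() */
}
```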
Fixes compiler warnings generated due to recent check-ins Signed-off-by: Jintu Das <[email protected]> Change-Id: Iad565a6816e5a5b5bdf94b81944cffb70563283c
Incorrect macro used; replaced AOCL_DTL_TRACE. Replaced AOCL_DTL_TRACE_LOG_EXIT with AOCL_DTL_TRACE_EXIT_INDENT. Signed-off-by: bsinghpa <[email protected]> AMD-Internal: CPUPL-3439 Change-Id: Ibc121d5a741ff2ef606220047417b032203819ae
Change-Id: I4e13db9123208c9eb7a2fe533ee2e5c9421fd2f8
AMD-Internal: CPUPL-3441 Change-Id: I08d9faa1f1768e73cee3e820b2438742dee25a90
In the configure script, add clang compiler flags for debugging, profiling and specifying the C language standard. Other changes: 1. Update C language standard from C99 to C11 for gcc, clang and icc compilers. Other compilers are not relevant, so leave as is. 2. Moved all warning flags to warning flags section. Note: warnings are enabled by default. AMD-Internal: [CPUPL-3408] Change-Id: Iec6f9c40a6c7e0e8091484e3f9ee041feb62f631
Change-Id: Ic261233289c701dce800f2fc384253847e805c28
AMD-Internal: CPUPL-3470 Change-Id: I016df622e3ae2be349683a1b5b60d6afe8a747ad
Enabled ability to pass libaoclutils repository path and tag as a parameter to build tools of configure and CMake Change-Id: I2b0309249d68943893be0f0444d6c85187567f04
Changed the default netlib-test version from LAPACK 3.10.0 to LAPACK 3.11 Signed-off-by: Jintu Das <[email protected]> Change-Id: I8e7d1f0cd8721fe424e12d5a5476bafd13f5f4fc
- Pass --imatrix parameter with values 'I' for infinity and 'N' for NaN in command line execution AMD-Internal: CPUPL-3170 Change-Id: I6db4d87023fa790d59065ed3a3228cd496135fbc
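A minimal sketch of how such a flag could be mapped to special values follows. It assumes the flag selects the value used to fill a test input buffer; the function `fill_special` is hypothetical and is not the test suite's actual code.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: fill an n-element buffer with the special value
 * selected by an --imatrix style flag: 'I' for infinity, 'N' for NaN.
 * Returns 0 on success and -1 for an unrecognized flag (buffer untouched). */
static int fill_special(double *a, size_t n, char imatrix)
{
    double value;
    assert(a != NULL || n == 0);
    if (imatrix == 'I')
        value = INFINITY;
    else if (imatrix == 'N')
        value = NAN;
    else
        return -1;
    for (size_t i = 0; i < n; ++i)
        a[i] = value;
    return 0;
}
```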
Added stdc++ dependency to libflame shared library. This is to reflect stdc++ as one of the dependencies to use libFLAME library. Signed-off-by: Vasanthakumar R <[email protected]> AMD-Internal: CPUPL-2129 Change-Id: I1742876fcedb7e13729e5ce6abe73c8125d6d8fb
To match netlib LAPACK behaviour, ensure diagonal is real, even when exiting with error for matrix being non-positive-definite or diagonal being NaN. AMD-Internal: [CPUPL-3530] Change-Id: Ic82452517e13c1222ab1aacd1dc76d9c2a26d6d2
To match netlib LAPACK {c,z}lauu2.f behaviour, ensure diagonal is real for elements 1..N-1 in libFLAME equivalent function FLA_Ttmm. AMD-Internal: [CPUPL-3531] Change-Id: I3472db9951d38e0b0adfa805ee514b396c19ae5b
Adding Log and trace with timing to double complex precision APIs (files starting with z). Minor bug fix patch to remove ipiv, jpiv and jpvt. Functions: zcgesv Signed-off-by: bsinghpa <[email protected]> AMD-Internal: CPUPL-2422 Change-Id: If3c864efa644a6390470fc948cb149e7ca3cc166
Adding Log and trace with timing to double complex precision APIs (files starting with z). Functions: zhetri_3x, zhetri_rook, zhetrs, zhetrs2, zhetrs_3, zhetrs_aa, zhetrs_aa_2stage, zhetrs_rook, zhfrk, zhpcon, zhpev, zhpevd, zhpevx, zhpgst, zhpgv, zhpgvd, zhpgvx, zhprfs, zhpsv, zhpsvx.c Signed-off-by: bsinghpa <[email protected]> AMD-Internal: CPUPL-2422 Change-Id: I32cef9ba1884656a6be8372f53b8f886a2d5ab5e
1. Added BLAS-free implementation of zgetrf for smaller matrix sizes (sizes between 1 and 22). 2. Changed the path to the LAPACK implementation of zgetrf for matrix sizes between 22 and 50. Signed-off-by: Sridhar Govindaswamy <[email protected]> Change-Id: Icdc07e0905e3bb31a194467aefea9b4b78cc1f27
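The size-based path selection described above can be sketched as follows. The enum and function names are hypothetical and are not libFLAME identifiers. Two details are assumptions because the commit message does not state them: the shared boundary value 22 is given to the small path, and the dispatch uses the larger of the two dimensions.

```c
#include <assert.h>

/* Hypothetical path identifiers (not libFLAME names). */
typedef enum {
    ZGETRF_PATH_SMALL_BLAS_FREE,
    ZGETRF_PATH_LAPACK,
    ZGETRF_PATH_DEFAULT
} zgetrf_path;

/* Thresholds from the commit message. */
enum { SMALL_MAX = 22, LAPACK_MAX = 50 };

static zgetrf_path select_zgetrf_path(int m, int n)
{
    assert(m >= 1 && n >= 1);
    int size = m > n ? m : n;   /* assumption: dispatch on the larger dimension */
    if (size <= SMALL_MAX)
        return ZGETRF_PATH_SMALL_BLAS_FREE;   /* assumption: 22 goes here */
    if (size <= LAPACK_MAX)
        return ZGETRF_PATH_LAPACK;
    return ZGETRF_PATH_DEFAULT;
}
```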
Modified DLARF, DORG2R and DORGQR API for smaller size inputs. Signed-off-by: Jintu Das <[email protected]> AMD-Internal: CPUPL-2601 Change-Id: I468ca2785a1bf94c0eea1a0045119be6f4aeee89
Modified the non-zero verification of the pivot element and changed the logic for division by the pivot element to fix the netlib failures observed with the zgetrf small path implementation. Signed-off-by: Vasanthakumar R <[email protected]> Change-Id: Ieb3150ad381453159e65b544a8c386479df2d8e6
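The commit does not show the new logic, so the following is only a sketch of one common pattern for this step in an unblocked LU factorization: test the pivot for exact zero, scale by the reciprocal when the pivot is safely invertible, and divide element by element otherwise. The function name, the use of `DBL_MIN` as the safe-minimum threshold, and the |re| + |im| magnitude estimate are all assumptions of this sketch.

```c
#include <assert.h>
#include <complex.h>
#include <float.h>
#include <math.h>

/* Sketch: scale the sub-column x[0..len-1] below a pivot.
 * Returns 1 if the pivot is exactly zero (x is left untouched), else 0. */
static int scale_by_pivot(double complex pivot, double complex *x, int len)
{
    assert(len >= 0);
    if (creal(pivot) == 0.0 && cimag(pivot) == 0.0)
        return 1;   /* singular pivot: the caller would record it in info */

    /* Cheap magnitude estimate; assumption: DBL_MIN stands in for sfmin. */
    double mag = fabs(creal(pivot)) + fabs(cimag(pivot));
    if (mag >= DBL_MIN) {
        double complex r = 1.0 / pivot;     /* one division, then multiplies */
        for (int i = 0; i < len; ++i)
            x[i] *= r;
    } else {
        for (int i = 0; i < len; ++i)       /* tiny pivot: divide directly */
            x[i] /= pivot;
    }
    return 0;
}
```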
Enabling the original small path optimized code only for Windows build. The previous version with the fix for Linux increased the total errors in Windows. Fix for compilation error without AMD optimizations included. TODO: Analyze and fix the difference in Netlib test results between Windows and Linux. Signed-off-by: Vasanthakumar R <[email protected]> Change-Id: I6ac0c420f98879d7c4a0f647f6fcb4837968fbe2
Adding Log and trace with timing to double complex precision APIs (files starting with z). Functions: ztpqrt2, ztprfb, ztprfs, ztptri, ztptrs, ztpttf, ztpttr, ztrcon, ztrevc, ztrevc3, ztrexc, ztrrfs, ztrsen, ztrsna, ztrsyl, ztrtrs, ztrttf, ztrttp, ztzrqf, ztzrzf, zunbdb Signed-off-by: bsinghpa <[email protected]> AMD-Internal: CPUPL-2422 Change-Id: Ieffdff2b8733c5033266b1d21d100fa5f7896c87
Adding Log and trace with timing to double complex precision APIs (files starting with z). Functions: ztbtrs, ztfsm, ztftri, ztfttp, ztfttr, ztgevc, ztgex2, ztgexc, ztgsen, ztgsja, ztgsna, ztgsy2, ztgsyl, ztpcon, ztplqt, ztplqt2, ztpmlqt, ztpmqrt, ztpqrt Signed-off-by: bsinghpa <[email protected]> AMD-Internal: CPUPL-2422 Change-Id: I16c5078425c32af907d85dcba1a57b002e31dfc6
Implicit function declarations giving rise to errors in AOCC4.2 Build 140. Updated test-suite to remove those errors. Signed-off-by: Vasanthakumar R <[email protected]> AMD-Internal: CPUPL-3969 Change-Id: I1b5c2e0f59f8f5cc4927b22c7c4ac74502cdb794
1. Added AVX-512 implementation of ZGETRF. 2. Updated AVX2 implementation of ZGETRF to improve performance. 3. Additional changes done to remove declaration warnings. Jira ticket: CPUPL-3611 Signed-off-by: Sridhar Govindaswamy <[email protected]> Change-Id: I0e604d961ca18804766ebf655daf2c6dfa616d8b
Optimization of DGESVD path 10 and 1t for small sizes. Reused optimized bidiag code from path 6T optimization. Signed-off-by: Vasanthakumar R <[email protected]> AMD-Internal: CPUPL-3251 Change-Id: Ib70822a5f89871a6d749abc4b1039d6e44d45f8f
HSEQR API was failing in the main test suite for cases where not all the eigenvalues have converged. When such a condition arises, the code should not check for negative test case failures. Added a condition to skip the negative test check in such cases. Signed-off-by: Vasanthakumar R <[email protected]> AMD-Internal: CPUPL-4032 Change-Id: Iad948a44c0a6fdbaab00d0d1f08a8777f07fcc55
When AOCL-BLAS linking is selected during build, the BLAS API declarations have to be taken from blis.h header of AOCL-BLAS library instead of AOCL-LAPACK's internal header file. Made updates to several source and header files to handle the same. AMD-Internal: CPUPL-3826 Change-Id: I1f08d4d6ee6423f52cf33111d204fa673f9eb361
Linking error was observed with libFLAME library built without "--enable-amd-opt" option. Checks to exclude compilation of few optimized API functions were missing. Have made the necessary changes to solve the issue. CPUPL-4119 Signed-off-by: Sridhar Govindaswamy <[email protected]> Change-Id: Ifbabc9919dc7887592fe91d738c89738e2174ccf
Optimization for Path10 in DGESVD led to errors in netlib tests. Modified the logic to fix the issues. Added a separate optimized code path for the 1T path, which was sharing the same code as path 10. Signed-off-by: Vasanthakumar R <[email protected]> AMD-Internal: CPUPL-3251 Change-Id: I3f5415bfe7d5e5dc4cc3b026560332738cc6ce19
Fixed the memory leak issue in main testsuite. Signed-off-by: Parag <[email protected]> AMD-Internal: CPUPL-4040 Change-Id: Ia6f06bb5f3252090ab523a4365c2aca69bbb5ef0
Added AVX2 implementation of DGETRS for NOTRANS. Signed-off-by: Jintu Das <[email protected]> AMD-Internal: CPUPL-3660 Change-Id: I5ff38241c99d139e0f834f52fce2b55004ac4470
Usage of FLA_AMD_OPT removed and replaced with FLA_ENABLE_AMD_OPT with '#if' convention instead of '#ifdef'. Signed-off-by: Vasanthakumar R <[email protected]> Change-Id: Idbf4b5f385ed2167198c4a2b47340c1ab2a8778b
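The difference between the two conventions shows up when the macro is defined with the value 0. In the sketch below, defining FLA_ENABLE_AMD_OPT to 0 is an assumption made for illustration, and the two function names are hypothetical: `#ifdef` only tests whether the macro is defined, while `#if` tests its value.

```c
/* Assumption for this sketch: the option is defined, but disabled. */
#define FLA_ENABLE_AMD_OPT 0

/* #ifdef only asks "is the macro defined?", so this returns 1. */
static int amd_opt_seen_by_ifdef(void)
{
#ifdef FLA_ENABLE_AMD_OPT
    return 1;
#else
    return 0;
#endif
}

/* #if evaluates the macro's value, so this returns 0. */
static int amd_opt_seen_by_if(void)
{
#if FLA_ENABLE_AMD_OPT
    return 1;
#else
    return 0;
#endif
}
```

With the `#if` convention, a build system can always define the macro and switch the feature purely through its value.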
Legacy testsuite CMake build fails when linked to ST BLIS and ST libFLAME. The causes of the failures are: - A few OpenMP routines are referenced in ST libflame as part of the FLA_PROGRESS feature. - Inclusion of the pthread library is missing in the legacy testsuite CMake file. The BLIS library has a dependency on this library. Made changes in the FLA_PROGRESS header file and the legacy testsuite CMake file to handle the same. CPUPL-3899 Signed-off-by: Sridhar Govindaswamy <[email protected]> Change-Id: I66d25d509e693bfbdb48157034edfc22d2ab1eb1
Added the missing statement in validation code and test code. Signed-off-by: Parag <[email protected]> AMD-Internal: CPUPL-3759 Change-Id: I8bd5179d7c6f1a4e0d32faad5d2ff614df68eb4f
- removed redundant code related to PIC flags and GCOV flag - Updated BUILD.md to generate libflame with aocc compiler linked with libiomp5.so - std=c11 has some issues generating the main test suite. Signed-off-by: bsinghpa<[email protected]> AMD-Internal: CPUPL-3878 Change-Id: I80d31057644bfcb7737fb47145dc27389442ccd2
Memory corruption in dgesvdx test happens in dtsebz API due to mapping of N SVD problem to 2*N Eigen problem. Handled it through additional buffer for Eigen problem in DBDSVDX and copying only required singular values to SVD array at the end. Signed-off-by: Vasanthakumar R <[email protected]> AMD-Internal: CPUPL-4077 Change-Id: Ied04d74cbfb655ca486857754d3e0bfda2112993
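The buffering idea can be sketched as follows. The eigen-solver here is a stand-in written for this note (it simply emits the +sigma and -sigma pairs of a 2N problem) and does not reproduce the LAPACK routine. The function names are hypothetical. What the sketch shows is that the 2N results land in a scratch buffer and only the requested values are copied to the N-sized output.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the 2N eigen-solver: writes 2*n values into w, the
 * singular values followed by their mirrored negatives. Illustrative only. */
static void mock_eigen_2n(int n, const double *sigma, double *w)
{
    for (int i = 0; i < n; ++i) {
        w[i] = sigma[i];
        w[2 * n - 1 - i] = -sigma[i];
    }
}

/* Copies the first ns values into s, whose capacity is only n.
 * Returns the number written, or -1 if the scratch allocation fails. */
static int singular_values_via_2n(int n, int ns, const double *sigma, double *s)
{
    assert(n >= 0 && ns >= 0 && ns <= n);
    if (n == 0 || ns == 0)
        return 0;
    double *w = malloc(2 * (size_t)n * sizeof *w);   /* 2N scratch buffer */
    if (w == NULL)
        return -1;
    mock_eigen_2n(n, sigma, w);                      /* writes 2N entries */
    memcpy(s, w, (size_t)ns * sizeof *s);            /* copy only what is needed */
    free(w);
    return ns;
}
```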
1. Set AVX2 as default compilation flag. This gives better performance for functions using high level language code. 2. Fixed CMAKE_INSTALL_PREFIX path in Windows to customize the path AMD Internal : [CPUPL-3395] Change-Id: I78bb2a77f951f5f40088dd15f739fbc32e081e6c
At present, the AOCL-Utils library is merged using its object files into the AOCL-LAPACK library by default. This behaviour is now changed to merge only when explicitly set in the build option "ENABLE_EMBED_AOCLUTILS" for both CMake and autoconfigure tools build modes. Otherwise, by default, the AOCL-Utils library is not merged into AOCL-LAPACK and applications have to link with AOCL-Utils explicitly. With this change, the following is required in the default build: - For CMake based build, ensure the header file path of AOCL-Utils is set using the LIBAOCLUTILS_INCLUDE_PATH option. - For autoconfigure makefile build, ensure the header file path of AOCL-Utils is set in CFLAGS before running the make command. - Applications using AOCL-LAPACK have to link with AOCL-Utils explicitly. AMD-Internal: CPUPL-4020 Change-Id: I668814b12baacd8d793ef7de3145b6277d7297f0
Optimization of DGESVD path 6 using previously optimized modules and specific optimizations. Signed-off-by: Vasanthakumar R <[email protected]> AMD-Internal: CPUPL-3251 Change-Id: I187d26876553ed75e5a46be712b0d06cfb61f05a
Added new test API to verify LAPACK SYEVX API functionality AMD-Internal: CPUPL-3991 Signed-off-by: dnikku <[email protected]> Change-Id: Id01bb7084eb172a4cb40058c0d0a6b41095cd049
Exclude function that gets thread count from initialization function, aocl_fla_init. This will avoid overhead for single thread mode and also in cases where multi-threading is not used for specific size ranges. Also included a fix for ctest to set path of aocl-utils for netlibtests AMD Internal : CPUPL-4009 Change-Id: I947298525c9771fb0f51ab8df3ccb55b58ea1e16
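One way to picture this change is a lazily cached thread count: initialization does not query it, and small problem sizes never trigger the query. Everything in the sketch is hypothetical, including the function names, the size limit of 64 and the returned count of 4; it is not the aocl_fla_init code, and it is not thread-safe as written.

```c
#include <assert.h>

static int query_count = 0;      /* counts how often the costly query runs */
static int cached_threads = 0;   /* 0 means "not queried yet" */

/* Stand-in for a costly runtime query; the value 4 is arbitrary. */
static int query_thread_count(void)
{
    ++query_count;
    return 4;
}

/* Initialization no longer touches the thread count. */
static void lib_init(void)
{
}

/* Query on first use only, then reuse the cached value. */
static int get_thread_count(void)
{
    if (cached_threads == 0)
        cached_threads = query_thread_count();
    assert(cached_threads > 0);
    return cached_threads;
}

/* Small sizes run single-threaded and skip the query entirely. */
static int threads_for_size(int n)
{
    const int small_limit = 64;   /* hypothetical threshold */
    if (n <= small_limit)
        return 1;
    return get_thread_count();
}
```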
Added a new variant of multi-threaded ZGETRF. This variant makes use of AOCL-BLAS multi-thread framework. This path will be taken only when FLA_ENABLE_AOCL_BLAS is enabled as there is a dependency on AOCL_BLAS framework APIs. Signed-off-by: Sridhar Govindaswamy <[email protected]> Change-Id: Ifdd6cedc3f6892167991802a0ac3ba5ef6eaf54e
Code changes to fix abrupt execution halt due to syevx test on windows AMD-Internal: CPUPL-3991 Signed-off-by: dnikku <[email protected]> Change-Id: I64a5210f01b167f544e848cc0b8a24a44207416f
CMake of sample code updated to take aoclutils library through option AOCLUTILS_LIBRARY_PATH AMD Internal : CPUPL-3826 Change-Id: I28b5e337b477087c5d2c56dfc780063ea3495829
Fixed CTEST commands for syevx and zgetrf Fixed cmake build command typo in test/example/ReadMe.txt AMD-Internal: CPUPL-4181 Signed-off-by: dnikku <[email protected]> Change-Id: I6c4fadcbe4d4c9ebe86cfaa2aa1e0a616ad62271
Passing an incorrect value to the LF_ISA_CONFIG cmake flag was causing a wrong value to be set for LF_ISA_CONFIG. Updated cmake to throw an error instead. AMD Internal : [CPUPL-4207] Change-Id: Id1d27951e796f0600d4f5442ef168a10629d232c
AOCL-LAPACK library depends on AOCL-Utils library. Updated the library and test suite build documentation with information on additional build flags to set AOCL-Utils library path. AMD Internal : CPUPL-3826 Change-Id: If96a433ab676fcdc1967db9540245c08d755db40
Updated CMake build configuration to use C11 standard instead of gnu99 similar to autotools configure/make build method. With this change, the failing Netlib tests are working fine. In addition, included D_GNU_SOURCE flag as that was needed by some of the POSIX standard pthread functions used by blis.h referenced in main test suite. AMD Internal : [CPUPL-4214] Change-Id: I7f6190ab053994ca8576d7d269fe7d3fdf20a3e0
Few more flags updated in CMake when user sets external openmp library path. These new flags ensure only needed openmp library is loaded and picks library only from user specified path. Minor Readme edits. AMD Internal : CPUPL-4220 Change-Id: I5ba43e304c0c241d3542b68bfa754952ae9810ab
Version of AOCL-LAPACK upgraded to 4.2.0 from 4.1.0 Signed-off-by: Vasanthakumar R <[email protected]> Change-Id: I91bf53fb2f1ff943072e154ed39dc225698e6de3
Link aocl-utils library as needed for shared library build for both Linux and Windows Change-Id: If52c00410d131236fac8821f9a25168771f366b4
Change-Id: Ica56467b4e995f0be131c0d247801feabbdb637f
Field,
Please review and merge them.
Thanks