NCCL-2.10.3-GCCcore-11.2.0-CUDA-11.4.1.eb fails to build #20709

Open
VictorGoitea opened this issue Jun 4, 2024 · 0 comments

I ran the following command:

eb --installpath=/work/easybuild_u1_gpu --cuda-compute-capabilities=7.0 --accept-eula-for=CUDA -r --debug NCCL-2.10.3-GCCcore-11.2.0-CUDA-11.4.1.eb

It fails with the following message:

== Temporary log file in case of crash /tmp/eb-joriv2uy/easybuild-dswdiie5.log
== resolving dependencies ...
== processing EasyBuild easyconfig /opt/easybuild/easyconfigs/n/NCCL/NCCL-2.10.3-GCCcore-11.2.0-CUDA-11.4.1.eb
== building and installing NCCL/2.10.3-GCCcore-11.2.0-CUDA-11.4.1...
== fetching files...
== ... (took < 1 sec)
== creating build dir, resetting environment...
== ... (took < 1 sec)
== unpacking...
== ... (took < 1 sec)
== patching...
== ... (took < 1 sec)
== preparing...
== ... (took < 1 sec)
== configuring...
== ... (took < 1 sec)
== building...
== ... (took 17 secs)
== FAILED: Installation ended unsuccessfully (build directory: /home/ucloud/.local/easybuild/build/NCCL/2.10.3/GCCcore-11.2.0-CUDA-11.4.1): build failed (first 300 chars): cmd " make  -j 76  NVCC_GENCODE="-gencode=arch=compute_70,code=sm_70"  PREFIX=/work/easybuild_u1_gpu/software/NCCL/2.10.3-GCCcore-11.2.0-CUDA-11.4.1 " exited with exit code 2 and output:
make -C src build BUILDDIR=/home/ucloud/.local/easybuild/build/NCCL/2.10.3/GCCcore-11.2.0-CUDA-11.4.1/nccl-2.10.3 (took 18 secs)
== Results of the build can be found in the log file(s) /tmp/eb-joriv2uy/easybuild-NCCL-2.10.3-20240604.093342.lnSEU.log
ERROR: Build of /opt/easybuild/easyconfigs/n/NCCL/NCCL-2.10.3-GCCcore-11.2.0-CUDA-11.4.1.eb failed (err: 'build failed (first 300 chars): cmd " make  -j 76  NVCC_GENCODE="-gencode=arch=compute_70,code=sm_70"  PREFIX=/work/easybuild_u1_gpu/software/NCCL/2.10.3-GCCcore-11.2.0-CUDA-11.4.1 " exited with exit code 2 and output:\nmake -C src build BUILDDIR=/home/ucloud/.local/easybuild/build/NCCL/2.10.3/GCCcore-11.2.0-CUDA-11.4.1/nccl-2.10.3')

Error log file:

easybuild-NCCL-2.10.3-20240604.093342.lnSEU.log
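
The message above is truncated to its first 300 characters, so the actual compiler or make error is only visible in the log file. Below is a minimal sketch for pulling likely error lines out of such a log. It assumes the log is plain text and that failures show up with typical gcc/nvcc/make markers (`error:`, `fatal error`, `*** [`). The function name `find_error_lines` and the patterns are illustrative assumptions, not part of EasyBuild.

```python
import re

# Assumed markers for compiler/make failures; adjust to what the log actually contains.
ERROR_PATTERN = re.compile(r"error:|fatal error|\*\*\* \[", re.IGNORECASE)


def find_error_lines(log_text, context=2, limit=20):
    """Return (line_number, snippet) tuples for lines that look like build errors.

    line_number is 1-based; snippet includes `context` lines before and after the hit.
    At most `limit` hits are returned.
    """
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            hits.append((i + 1, "\n".join(lines[start:end])))
            if len(hits) >= limit:
                break
    return hits


# Usage (path taken from the output above):
# with open("/tmp/eb-joriv2uy/easybuild-NCCL-2.10.3-20240604.093342.lnSEU.log", errors="replace") as f:
#     for number, snippet in find_error_lines(f.read()):
#         print(f"--- line {number} ---\n{snippet}")
```

With `make -j 76` the output of parallel jobs is interleaved, so the first hit is usually more informative than the last one.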

Additional information:

OS: Ubuntu 22.04 LTS x86_64
Host: PowerEdge C4140
Kernel: 5.4.256.el8
Uptime: 49 days, 23 hours
Packages: 773 (dpkg)
Shell: bash 5.1.16
Terminal: tmux
CPU: Intel Xeon Gold 6230 (80) @ 800MHz
GPU: NVIDIA Tesla V100 SXM2 32GB
GPU: NVIDIA Tesla V100 SXM2 32GB
GPU: NVIDIA Tesla V100 SXM2 32GB
GPU: NVIDIA Tesla V100 SXM2 32GB
Memory: 18733MiB / 192030MiB

eb --show-system-info

System information (j-5056263-job-0):

* OS:
  -> name: Ubuntu
  -> type: Linux
  -> version: 22.04
  -> platform name: x86_64-unknown-linux

* CPU:
  -> vendor: Intel
  -> architecture: x86_64
  -> family: Intel
  -> arch name: UNKNOWN (archspec is not installed?)
  -> model: Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
  -> speed: 929.304
  -> cores: 76
  -> features: 3dnowprefetch,abm,acpi,adx,aes,aperfmperf,apic,arat,arch_capabilities,arch_perfmon,art,avx,avx2,avx512_vnni,avx512bw,avx512cd,avx512dq,avx512f,avx512vl,bmi1,bmi2,bts,cat_l3,cdp_l3,clflush,clflushopt,clwb,cmov,constant_tsc,cpuid,cpuid_fault,cqm,cqm_llc,cqm_mbm_local,cqm_mbm_total,cqm_occup_llc,cx16,cx8,dca,de,ds_cpl,dtes64,dtherm,dts,epb,ept,ept_ad,erms,est,f16c,flexpriority,flush_l1d,fma,fpu,fsgsbase,fxsr,ht,ibpb,ibrs,ibrs_enhanced,ida,intel_ppin,intel_pt,invpcid,invpcid_single,lahf_lm,lm,mba,mca,mce,md_clear,mmx,monitor,movbe,mpx,msr,mtrr,nonstop_tsc,nopl,nx,ospke,pae,pat,pbe,pcid,pclmulqdq,pdcm,pdpe1gb,pebs,pge,pku,pln,pni,popcnt,pse,pse36,pts,rdrand,rdseed,rdt_a,rdtscp,rep_good,sdbg,sep,smap,smep,smx,ss,ssbd,sse,sse2,sse4_1,sse4_2,ssse3,stibp,syscall,tm,tm2,tpr_shadow,tsc,tsc_adjust,tsc_deadline_timer,vme,vmx,vnmi,vpid,x2apic,xgetbv1,xsave,xsavec,xsaveopt,xsaves,xtopology,xtpr

* GPU:
  -> NVIDIA
    -> 1x Tesla V100-SXM2-32GB, 550.54.15

* software:
  -> glibc version: 2.35
  -> Python binary: /usr/bin/python3
  -> Python version: 3.10.4

eb --show-config

# Current EasyBuild configuration
# (C: command line argument, D: default value, E: environment variable, F: configuration file)
#
buildpath      (D) = /home/ucloud/.local/easybuild/build
containerpath  (D) = /home/ucloud/.local/easybuild/containers
installpath    (E) = /opt/easybuild
repositorypath (D) = /home/ucloud/.local/easybuild/ebfiles_repo
robot-paths    (E) = /opt/easybuild/easyconfigs
sourcepath     (D) = /home/ucloud/.local/easybuild/sources