Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve SSE4.1/AES-NI support #644

Merged
merged 1 commit into from
Sep 2, 2024
Merged

Improve SSE4.1/AES-NI support #644

merged 1 commit into from
Sep 2, 2024

Conversation

tbvdm
Copy link

@tbvdm tbvdm commented Mar 31, 2024

The CryptX Perl module contains a vendored copy of libtomcrypt. It uses -msse4.1 -maes to enable AES-NI support. The problem is that these flags are used for all source files. This results in SIGILL crashes on CPUs without SSE4.1.

I think the solution is for CryptX to use -msse4.1 -maes only for aesni.c. But this is not possible without several changes to libtomcrypt. This pull request is a proposal for those changes.

The primary change is in how SSE4.1 is detected. Currently libtomcrypt checks __SSE4_1__. But this requires that -msse4.1 is used for multiple source files. Moreover, it's a compile-time check, not a run-time one. I think it would be better to detect SSE4.1 with cpuid.

Then there is the build procedure. I came up with the following approach. By default, AES-NI support is disabled. To enable it, add -DLTC_AES_NI to CFLAGS and set the required compiler flags in CFLAGS_AES_NI (typically -maes -msse4.1). For example:

make -f makefile.unix CFLAGS=-DLTC_AES_NI CFLAGS_AES_NI="-maes -msse4.1"

Checklist

  • documentation is added or updated
  • tests are added or updated

Closes #641

@tbvdm
Copy link
Author

tbvdm commented Mar 31, 2024

Maybe I should cc @karel-m

@sjaeckel sjaeckel self-requested a review April 2, 2024 22:34
@sjaeckel sjaeckel self-assigned this Apr 2, 2024
@sjaeckel
Copy link
Member

sjaeckel commented Apr 2, 2024

Did you try the proposed change of #641 and whether this maybe fixes it already?

@tbvdm
Copy link
Author

tbvdm commented Apr 3, 2024

#641 doesn't help, unfortunately. Here is a crash from the fix-perl-cryptx-99 branch on a machine without SSE4.1 (running OpenBSD):

$ git clone -b fix-perl-cryptx-99 https://github.com/libtom/libtomcrypt.git
$ cd libtomcrypt
$ make -f makefile.unix CFLAGS="-maes -msse4.1"
$ cd ..
$ cat test.c
#include <tomcrypt.h>

int main(void)
{
	register_cipher(&aes_desc);
	gcm_test();
}
$ cc -I libtomcrypt/src/headers libtomcrypt/libtomcrypt.a test.c
$ ./a.out
Illegal instruction (core dumped)
$ egdb -q a.out a.out.core
Reading symbols from a.out...
(No debugging symbols found in a.out)
[New process 186382]
Core was generated by `a.out'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x00000a534708a0f0 in gcm_test ()
(gdb) x/i $pc
=> 0xa534708a0f0 <gcm_test+1696>:	ptest  %xmm0,%xmm0

@HaikalAzaim

This comment was marked as spam.

@sjaeckel
Copy link
Member

sjaeckel commented Jun 12, 2024

Sorry for the long delay.

#641 doesn't help, unfortunately. Here is a crash from the fix-perl-cryptx-99 branch on a machine without SSE4.1 (running OpenBSD):

$ git clone -b fix-perl-cryptx-99 https://github.com/libtom/libtomcrypt.git
$ cd libtomcrypt
$ make -f makefile.unix CFLAGS="-maes -msse4.1"
$ cd ..
$ cat test.c
#include <tomcrypt.h>

int main(void)
{
	register_cipher(&aes_desc);
	gcm_test();
}
$ cc -I libtomcrypt/src/headers libtomcrypt/libtomcrypt.a test.c
$ ./a.out
Illegal instruction (core dumped)
$ egdb -q a.out a.out.core
Reading symbols from a.out...
(No debugging symbols found in a.out)
[New process 186382]
Core was generated by `a.out'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x00000a534708a0f0 in gcm_test ()
(gdb) x/i $pc
=> 0xa534708a0f0 <gcm_test+1696>:	ptest  %xmm0,%xmm0

FMU that's a different problem. You can also see that it fails in gcm_test() and not in AES related code.

A valid test-case for AES-NI would be to first configure&build the library as you did and then execute this:

#include <tomcrypt.h>

int main(void)
{
	aes_test();
}

Regarding what you're seeing: You're explicitly enabling SSE4.1, which allows the compiler to generate SSE-specific instructions.

WARNING: I checked the disassembly in the past and was too lazy to re-check now, so I hope the following statement is still correct :) Please correct me if I'm wrong!

The following code comes from an era when compilers were not yet that smart:

#ifdef LTC_GCM_TABLES
int x;
#ifdef LTC_GCM_TABLES_SSE2
asm("movdqa (%0),%%xmm0"::"r"(&gcm->PC[0][I[0]][0]));
for (x = 1; x < 16; x++) {
asm("pxor (%0),%%xmm0"::"r"(&gcm->PC[x][I[x]][0]));
}
asm("movdqa %%xmm0,(%0)"::"r"(&T));
#else
int y;
XMEMCPY(T, &gcm->PC[0][I[0]][0], 16);
for (x = 1; x < 16; x++) {
#ifdef LTC_FAST
for (y = 0; y < 16; y += sizeof(LTC_FAST_TYPE)) {
*(LTC_FAST_TYPE_PTR_CAST(T + y)) ^= *(LTC_FAST_TYPE_PTR_CAST(&gcm->PC[x][I[x]][y]));
}
#else
for (y = 0; y < 16; y++) {
T[y] ^= gcm->PC[x][I[x]][y];
}
#endif /* LTC_FAST */
}
#endif /* LTC_GCM_TABLES_SSE2 */

By enabling -msse4.1 at compile time you did not enable that "assembly optimized path" in the source code (via the define etc.), but the compilers got so smart that they generate exactly those assembly instructions from the "default code path" (even when LTC_GCM_TABLES_SSE2 is not defined).

You should also be able to reproduce this failure with the latest release (master branch) that has no AES-NI support yet.

Maybe also try to add debug information when building the library to see where exactly it breaks? make -f makefile.unix CFLAGS="-maes -msse4.1 -g3"

Yes, these details are only slightly related to the error you see, but should show you the problem.

@sjaeckel
Copy link
Member

sjaeckel commented Jun 12, 2024

And also you're right with this PR... now I understand the purpose :)

With this we have now a "proper" runtime check for AES-NI. It allows to build the rest of the library for all amd64 (also non SSE) and only requires the AES-NI related parts to be built with SSE4.1 support.

Copy link
Member

@sjaeckel sjaeckel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

README.md Show resolved Hide resolved
@tbvdm
Copy link
Author

tbvdm commented Jun 24, 2024

Sorry for the long delay.

No worries. Thanks for looking at this.

FMU that's a different problem. You can also see that it fails in gcm_test() and not in AES related code.

[...]

Regarding what you're seeing: You're explicitly enabling SSE4.1, which allows the compiler to generate SSE-specific instructions.

That's precisely the point: currently, if you want to enable AES-NI support, you need to enable SSE4.1 unconditionally for the whole source tree.

This is especially relevant when building binary packages. The same binary package should (ideally) be able to run different CPUs with different capabilities.

And also you're right with this PR... now I understand the purpose :)

With this we have now a "proper" runtime check for AES-NI. It allows to build the rest of the library for all amd64 (also non SSE) and only requires the AES-NI related parts to be built with SSE4.1 support.

Yes, exactly. :)

@tbvdm
Copy link
Author

tbvdm commented Jun 24, 2024

I've force-pushed a new commit that includes the change you requested.

Copy link
Member

@sjaeckel sjaeckel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm ... I've just had a third look at this and wouldn't it even make sense to compile aesni.c always with those flags? ... but then we'd need some kind of configure step which determines the target architecture ... What do you think?

Also I've only now realized that there's no CMake support for this feature, that's why I haven't merged it yet.

@tbvdm
Copy link
Author

tbvdm commented Jun 29, 2024

Why don't we use target attributes? See the new commit. This avoids the need for flags altogether. You only have to make sure that LTC_AES_NI is defined. For example:

make -f makefile.unix CFLAGS=-DLTC_AES_NI

I defined __attribute__ away for MSVC. Also, MSVC does not need flags or attributes to enable SSE4.1 and AES intrinsics.

(However, I don't think MSVC will work anyway, because apparently it does not support inline assembly on x64. So the cpuid call will fail. I think the solution is to call cpuid through an intrinsic.)

Copy link
Collaborator

@levitte levitte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ltc's source checker in helper.pl doesn't like it when you redefine compiler specific symbols. It may be better to make an LTC macro for what you want, like this.

src/ciphers/aes/aesni.c Outdated Show resolved Hide resolved
src/ciphers/aes/aesni.c Outdated Show resolved Hide resolved
src/ciphers/aes/aesni.c Outdated Show resolved Hide resolved
src/ciphers/aes/aesni.c Outdated Show resolved Hide resolved
@tbvdm
Copy link
Author

tbvdm commented Aug 31, 2024

I've pushed a new commit that is rebased on current develop, squashes the two commits in this PR, and includes @levitte's suggestions.

Comment on lines 14 to 18
#if defined(__GNUC__) || defined(__clang__)
#define LTC_ATTRIBUTE(x) __attribute__(x)
#else
#define LTC_ATTRIBUTE(x)
#endif
Copy link
Member

@sjaeckel sjaeckel Aug 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please directly put this in tomcrypt_cfg.h?

Copy link
Member

@sjaeckel sjaeckel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the improvements on this and thanks for the PR. Looks very good now.

(and please rebase on top of current develop :) )

@tbvdm
Copy link
Author

tbvdm commented Sep 1, 2024

Done!

@sjaeckel
Copy link
Member

sjaeckel commented Sep 1, 2024

This also supersedes #641 right?

@tbvdm
Copy link
Author

tbvdm commented Sep 1, 2024

Yes, I believe it does.

@sjaeckel
Copy link
Member

sjaeckel commented Sep 2, 2024

Before merging this I have one more question:

Wouldn't it now make sense to enable this by default on x86/amd64?
If yes, we should change the compile-time-switch to disable if not desired.

Btw. thanks (and sorry) @tbvdm for doing this lengthy exercise with us ;)

@karel-m
Copy link
Member

karel-m commented Sep 2, 2024

@tbvdm could you please rebase this PR's branch on top of current develop? I will try to make a dev release of CryptX perl bindings with this patch to check what will happen on cpantesters & co.

@sjaeckel
Copy link
Member

sjaeckel commented Sep 2, 2024

Wouldn't it now make sense to enable this by default on x86/amd64?

@karel-m @levitte I'd also like your opinion on this

@karel-m
Copy link
Member

karel-m commented Sep 2, 2024

Wouldn't it now make sense to enable this by default on x86/amd64?

I do not have a strong opinion on this but I would rather keep it off by default,

According to my experiance you basically need:

  • to found out (or guess) whether your compiler suupports AESNI (in my CryptX I turn it on only for gcc/clang/llvm v5+)
  • than the compiler needs a special options e.g. -msse4.1 -maes for gcc/clang/llvm (OK, perhaps using target attributes might help here, but I'm not sure how it will work on older or less common compilers)
  • on top of that you will simply pass also -DLTC_AES_NI

@levitte
Copy link
Collaborator

levitte commented Sep 2, 2024

I think I'm leaning on the cautious side re the default. @karel-m said it a lot better

@sjaeckel
Copy link
Member

sjaeckel commented Sep 2, 2024

Alright, then we'll stay away from the danger zone and leave it off by default ;)

@tbvdm I took the liberty to rebase and force-push your branch, I hope that's OK

@sjaeckel sjaeckel merged commit ce904c8 into libtom:develop Sep 2, 2024
75 checks passed
@sjaeckel
Copy link
Member

sjaeckel commented Sep 2, 2024

I guess we forgot to enable this in the CI build ... ;)

- { BUILDNAME: 'STOCK+AESNI', BUILDOPTIONS: '-msse4.1 -maes', BUILDSCRIPT: '.ci/run.sh' }

@tbvdm
Copy link
Author

tbvdm commented Sep 2, 2024

Thanks for merging this. :)

Yes, it seems I missed main.yml when I grepped the tree for relevant files.

@sjaeckel sjaeckel mentioned this pull request Sep 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants