Native ARM Mac GPU usage (metal performance shaders) #501

Open
tylerjereddy opened this issue May 16, 2023 · 7 comments
Labels
enhancement New feature or request not possible atm Not possible at the moment

Comments

@tylerjereddy

Since Cirrus CI offers some native ARM Mac (M-series chip) services, I was wondering whether there is any documentation, examples, or options for using the GPU component (i.e., the Metal Performance Shaders) when testing with, e.g., torch, which has an mps backend: https://pytorch.org/docs/stable/notes/mps.html

I did a little experiment here: tylerjereddy/scipy#71

And found that there may be some restrictions that prevent practical usage in the open source tier:
RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 1.70 GB). Tried to allocate 0 bytes on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

Do you have any experience/guidance here? Is this expected? Is this disabled and you don't want us trying it? It would be very cool to be able to flush through GPUs in CI like that!
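For anyone who wants to reproduce this in their own CI job, here is a minimal sketch of a device probe. It assumes PyTorch 1.12 or newer (where `torch.backends.mps` exists); the helper name `pick_device` is hypothetical and not part of torch. It only reports "mps" if a tiny tensor operation actually succeeds there, so a failure like the out-of-memory error above falls back to "cpu" instead of aborting the test run.

```python
import importlib.util


def pick_device() -> str:
    """Return "mps" only if a tiny tensor op succeeds on it, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # torch is not installed in this environment
    import torch

    mps = getattr(torch.backends, "mps", None)
    if mps is None or not (mps.is_built() and mps.is_available()):
        return "cpu"
    try:
        # Assumption: the availability check may pass while allocation still
        # fails (as with the RuntimeError above), so probe with a real op.
        (torch.ones(8, device="mps") * 2).sum().item()
    except RuntimeError:
        return "cpu"
    return "mps"


if __name__ == "__main__":
    print(pick_device())
```

A test suite could call this once and skip its GPU-only tests when the result is "cpu".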

@fkorotkov fkorotkov transferred this issue from cirruslabs/cirrus-ci-docs May 16, 2023
@fkorotkov fkorotkov added the enhancement New feature or request label May 16, 2023
@fkorotkov
Contributor

I ran a PyTorch example inside a Tart VM, and indeed MPS does not seem to be supported by the underlying Virtualization.framework yet. Hopefully there will be some news at WWDC in two weeks. 🤞

@fkorotkov fkorotkov added the not possible atm Not possible at the moment label May 16, 2023
@tylerjereddy
Author

Thanks, this would be pretty cool!

@fkorotkov
Contributor

After a little more investigation, it seems that Virtualization.framework should support Metal. It's mentioned in last year's WWDC video at 10:53. There is even ParavirtualizedGraphics.framework, which predates Virtualization.framework and which allegedly should use it.

But in my testing I don't see any graphics devices inside the VM:

Screenshot 2023-05-16 at 4 14 33 PM

Compare that to what I see on an M1 Mac Mini:

Screenshot 2023-05-16 at 4 12 36 PM

@edigaryev I know you dug into the private APIs of Virtualization.framework. Have you seen any mentions of Metal?

@edigaryev
Collaborator

@fkorotkov the paravirtualization actually seems to be used:

Screenshot 2023-05-17 at 12 56 33

You can also check this by running ioreg inside of a VM:

% ioreg -n AppleParavirtGPU -r
+-o AppleParavirtGPU  <class AppleParavirtGPU, id 0x100000191, registered, matched, active, busy 0 (1 ms), retain 13>
  | {
  |   "IOClass" = "AppleParavirtGPU"
  |   "KDebugVersion" = 4294967296
  |   "IOPersonalityPublisher" = "com.apple.driver.AppleParavirtGPUIOGPUFamily"
  |   "IOMatchedAtBoot" = Yes
  |   "IOReportLegendPublic" = Yes
  |   "AGCInfo" = {"fLastSubmissionPID"=134,"fSubmissionsSinceLastCheck"=0,"fBusyCount"=0}
  |   "IOProviderClass" = "AppleARMIODevice"
  |   "MetalPluginName" = "AppleParavirtGPUMetalIOGPUFamily"
  |   "IOProbeScore" = 0
  |   "SurfaceList" = ()
  |   "IONameMatch" = "paravirtualizedgraphics,gpu"
  |   "MetalPluginClassName" = "AppleParavirtDevice"
  |   "SchedulerState" = {"Stamps"=(),"BusyWorkQueues"=()}
  |   "CFBundleIdentifierKernel" = "com.apple.driver.AppleParavirtGPUIOGPUFamily"
  |   "IOMatchCategory" = "IOAcceleratorES"
  |   "CFBundleIdentifier" = "com.apple.driver.AppleParavirtGPUIOGPUFamily"
  |   "IONameMatched" = "paravirtualizedgraphics,gpu"
  |   "PerformanceStatistics" = {"recoveryCount"=0,"In use system memory"=108962304,"Alloc system memory"=52527104}
  |   "IOGeneralInterest" = "IOCommand is not serializable"
  |   "IOReportLegend" = ({"IOReportChannels"=((1,6442450945,"Alloc system memory"),(2,6442450945,"In use system memory"),(3,6442450945,"GPU Restart Count")),"IOReportGroupName"="Internal Statistics","IOReportChan$
  |   "DisplayPortCount" = 1
  | }
  | 
  +-o AppleParavirtDisplay  <class AppleParavirtDisplay, id 0x1000001df, registered, matched, active, busy 0 (0 ms), retain 9>
  | +-o IOMobileFramebufferUserClient  <class IOMobileFramebufferUserClient, id 0x100000285, !registered, !matched, active, busy 0, retain 5>
  | +-o IOMobileFramebufferUserClient  <class IOMobileFramebufferUserClient, id 0x100000286, !registered, !matched, active, busy 0, retain 5>
  +-o AppleParavirtDeviceUserClient  <class AppleParavirtDeviceUserClient, id 0x100000294, !registered, !matched, active, busy 0, retain 5>
  +-o AppleParavirtDeviceUserClient  <class AppleParavirtDeviceUserClient, id 0x100000353, !registered, !matched, active, busy 0, retain 5>
  +-o AppleParavirtDeviceUserClient  <class AppleParavirtDeviceUserClient, id 0x10000035a, !registered, !matched, active, busy 0, retain 5>
  +-o AppleParavirtDeviceUserClient  <class AppleParavirtDeviceUserClient, id 0x10000035d, !registered, !matched, active, busy 0, retain 5>
  +-o AppleParavirtDeviceUserClient  <class AppleParavirtDeviceUserClient, id 0x10000036a, !registered, !matched, active, busy 0, retain 5>
  +-o AppleParavirtDeviceUserClient  <class AppleParavirtDeviceUserClient, id 0x1000003fa, !registered, !matched, active, busy 0, retain 5>

I'm not sure as to why Apple’s Metal Performance Shaders don't work, though.
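If it helps to automate the check above, here is a rough sketch that runs the same `ioreg -n AppleParavirtGPU -r` command and pulls out the `"key" = value` properties. The function names and the regular expression are my own and only fit output shaped like the dump above. Note that finding `MetalPluginName` only shows that the paravirtualized device is present, not that MPS workloads will run.

```python
import re
import shutil
import subprocess

# Matches property lines such as:   |   "IOClass" = "AppleParavirtGPU"
_PROP = re.compile(r'^\s*\|?\s*"([^"]+)"\s*=\s*(.+?)\s*$')


def parse_ioreg_properties(text: str) -> dict:
    """Collect "key" = value pairs from ioreg output; the first occurrence wins."""
    props = {}
    for line in text.splitlines():
        match = _PROP.match(line)
        if match:
            props.setdefault(match.group(1), match.group(2).strip('"'))
    return props


def paravirt_gpu_info() -> dict:
    """Run ioreg (if present) and return the AppleParavirtGPU properties."""
    if shutil.which("ioreg") is None:
        return {}  # not on macOS, or ioreg is not on PATH
    result = subprocess.run(
        ["ioreg", "-n", "AppleParavirtGPU", "-r"],
        capture_output=True, text=True, check=False,
    )
    return parse_ioreg_properties(result.stdout)


if __name__ == "__main__":
    info = paravirt_gpu_info()
    print(info.get("MetalPluginName", "no AppleParavirtGPU entry found"))
```

On a bare-metal Mac or a non-macOS machine this prints the fallback message, since there is no AppleParavirtGPU entry to match.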

@tylerjereddy
Author

Perhaps @Developer-Ecosystem-Engineering might be able to (informally) point us in the right direction? I know they've been quite helpful with NumPy low-level development on M-series chips.

@gluefox

gluefox commented May 8, 2024

I am running into the same issue as well.

@Developer-Ecosystem-Engineering

It's currently not supported to run these types of workloads under Virtualization.framework.

We understand the request!
