Native ARM Mac GPU usage (metal performance shaders) #501

tylerjereddy · 2023-05-16T17:44:19Z

Since Cirrus CI offers some native ARM Mac (Apple M-series) services, I was wondering whether there might be documentation, examples, or options for using the GPU component (i.e., Metal Performance Shaders) when testing with, e.g., PyTorch, which has an MPS backend: https://pytorch.org/docs/stable/notes/mps.html

I did a little experiment here: tylerjereddy/scipy#71

It seems there may be some restrictions that prevent practical use in the open-source tier:
RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 1.70 GB). Tried to allocate 0 bytes on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

Do you have any experience or guidance here? Is this expected? Is it disabled on purpose, and you'd rather we not try it? It would be very cool to be able to exercise GPU code paths in CI like that!
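For CI jobs that want to use the MPS backend opportunistically, a minimal sketch (assuming PyTorch 1.12 or newer, where `torch.backends.mps` exists) could probe the backend with a tiny allocation and fall back to the CPU when that raises, as with the out-of-memory error above. `pick_torch_device` is a hypothetical helper name, not part of PyTorch.

```python
import importlib.util


def pick_torch_device() -> str:
    """Return "mps" if a tiny MPS computation succeeds, otherwise "cpu".

    Hypothetical helper. Assumption: is_available() alone may not be enough
    inside a VM, so we also run a small computation and fall back to the CPU
    if PyTorch raises a RuntimeError (e.g. "MPS backend out of memory").
    """
    # PyTorch may not be installed at all on some CI runners.
    if importlib.util.find_spec("torch") is None:
        return "cpu"

    import torch

    # Older PyTorch builds have no torch.backends.mps module; newer ones report
    # False when built without MPS or when no MPS-capable device is visible.
    mps = getattr(torch.backends, "mps", None)
    if mps is None or not mps.is_available():
        return "cpu"

    try:
        # Allocate and compute on a tiny tensor to actually exercise the backend.
        x = torch.ones(4, device="mps")
        (x * 2).sum().item()
    except RuntimeError:
        return "cpu"
    return "mps"


if __name__ == "__main__":
    print(f"Running tests on: {pick_torch_device()}")
```

Probing with a real computation rather than only `is_available()` means a runner where the backend is visible but unusable still ends up on the CPU path instead of failing the whole job.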


fkorotkov · 2023-05-16T19:03:25Z

I ran a PyTorch MPS example inside a Tart VM and, indeed, it seems to be unsupported by the underlying Virtualization.framework. It looks like it's not supported yet, but hopefully there will be some news at WWDC in two weeks. 🤞

tylerjereddy · 2023-05-16T19:13:08Z

Thanks, this would be pretty cool!

fkorotkov · 2023-05-16T20:18:25Z

After a bit more investigation, it seems Virtualization.framework should support Metal: it's mentioned in last year's WWDC video at 10:53. There is even ParavirtualizedGraphics.framework, which predates Virtualization.framework and which the latter allegedly uses.

But in my testing I don't see any graphics devices inside the VM:

Compare that to what I see on an M1 Mac mini:

@edigaryev I know you dug into the private APIs of Virtualization.framework. Have you maybe seen any mentions of Metal?

edigaryev · 2023-05-17T09:00:31Z

@fkorotkov the paravirtualization actually seems to be used:

You can also check this by running ioreg inside of a VM:

% ioreg -n AppleParavirtGPU -r
+-o AppleParavirtGPU  <class AppleParavirtGPU, id 0x100000191, registered, matched, active, busy 0 (1 ms), retain 13>
  | {
  |   "IOClass" = "AppleParavirtGPU"
  |   "KDebugVersion" = 4294967296
  |   "IOPersonalityPublisher" = "com.apple.driver.AppleParavirtGPUIOGPUFamily"
  |   "IOMatchedAtBoot" = Yes
  |   "IOReportLegendPublic" = Yes
  |   "AGCInfo" = {"fLastSubmissionPID"=134,"fSubmissionsSinceLastCheck"=0,"fBusyCount"=0}
  |   "IOProviderClass" = "AppleARMIODevice"
  |   "MetalPluginName" = "AppleParavirtGPUMetalIOGPUFamily"
  |   "IOProbeScore" = 0
  |   "SurfaceList" = ()
  |   "IONameMatch" = "paravirtualizedgraphics,gpu"
  |   "MetalPluginClassName" = "AppleParavirtDevice"
  |   "SchedulerState" = {"Stamps"=(),"BusyWorkQueues"=()}
  |   "CFBundleIdentifierKernel" = "com.apple.driver.AppleParavirtGPUIOGPUFamily"
  |   "IOMatchCategory" = "IOAcceleratorES"
  |   "CFBundleIdentifier" = "com.apple.driver.AppleParavirtGPUIOGPUFamily"
  |   "IONameMatched" = "paravirtualizedgraphics,gpu"
  |   "PerformanceStatistics" = {"recoveryCount"=0,"In use system memory"=108962304,"Alloc system memory"=52527104}
  |   "IOGeneralInterest" = "IOCommand is not serializable"
  |   "IOReportLegend" = ({"IOReportChannels"=((1,6442450945,"Alloc system memory"),(2,6442450945,"In use system memory"),(3,6442450945,"GPU Restart Count")),"IOReportGroupName"="Internal Statistics","IOReportChan$
  |   "DisplayPortCount" = 1
  | }
  | 
  +-o AppleParavirtDisplay  <class AppleParavirtDisplay, id 0x1000001df, registered, matched, active, busy 0 (0 ms), retain 9>
  | +-o IOMobileFramebufferUserClient  <class IOMobileFramebufferUserClient, id 0x100000285, !registered, !matched, active, busy 0, retain 5>
  | +-o IOMobileFramebufferUserClient  <class IOMobileFramebufferUserClient, id 0x100000286, !registered, !matched, active, busy 0, retain 5>
  +-o AppleParavirtDeviceUserClient  <class AppleParavirtDeviceUserClient, id 0x100000294, !registered, !matched, active, busy 0, retain 5>
  +-o AppleParavirtDeviceUserClient  <class AppleParavirtDeviceUserClient, id 0x100000353, !registered, !matched, active, busy 0, retain 5>
  +-o AppleParavirtDeviceUserClient  <class AppleParavirtDeviceUserClient, id 0x10000035a, !registered, !matched, active, busy 0, retain 5>
  +-o AppleParavirtDeviceUserClient  <class AppleParavirtDeviceUserClient, id 0x10000035d, !registered, !matched, active, busy 0, retain 5>
  +-o AppleParavirtDeviceUserClient  <class AppleParavirtDeviceUserClient, id 0x10000036a, !registered, !matched, active, busy 0, retain 5>
  +-o AppleParavirtDeviceUserClient  <class AppleParavirtDeviceUserClient, id 0x1000003fa, !registered, !matched, active, busy 0, retain 5>

I'm not sure why Apple’s Metal Performance Shaders don't work, though.
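A CI job could reuse the same `ioreg -n AppleParavirtGPU -r` query to detect that it is running on Apple's paravirtualized GPU and, for example, skip MPS-dependent tests. The sketch below is a hypothetical helper (`has_paravirtualized_gpu` is not from any library); it only checks whether the `AppleParavirtGPU` node name appears in `ioreg`'s output and treats any platform without `ioreg` as "not detected".

```python
import subprocess


def has_paravirtualized_gpu() -> bool:
    """Return True if ioreg lists an AppleParavirtGPU node.

    Hypothetical helper: a match suggests a macOS VM using Apple's
    paravirtualized graphics, as in the ioreg output above.
    """
    try:
        result = subprocess.run(
            ["ioreg", "-n", "AppleParavirtGPU", "-r"],
            capture_output=True,
            text=True,
            check=False,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ioreg only exists on macOS; anything else counts as "not detected".
        return False
    return "AppleParavirtGPU" in result.stdout


if __name__ == "__main__":
    print("Paravirtualized GPU detected:", has_paravirtualized_gpu())
```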

tylerjereddy · 2023-05-31T13:48:05Z

Perhaps @Developer-Ecosystem-Engineering might be able to (informally) point us in the right direction? I know they've been quite helpful with NumPy low-level development on M-series chips.

gluefox · 2024-05-08T10:56:43Z

I am running into the same issue as well.

Developer-Ecosystem-Engineering · 2024-05-09T20:25:58Z

It's currently not supported to run these types of workloads under Virtualization.framework.

We understand the request!

fkorotkov transferred this issue from cirruslabs/cirrus-ci-docs May 16, 2023

fkorotkov added the enhancement New feature or request label May 16, 2023

fkorotkov added the not possible atm Not possible at the moment label May 16, 2023

tylerjereddy mentioned this issue May 25, 2024

ENH: array types, signal: delegate to CuPy and JAX for correlations and convolutions scipy/scipy#20772

Open
