Awesome comment! This made me realize I made a substantial oversight in the article. A few things. First, I ran this with PyTorch, which is not yet optimized for M1. As a result, these results come with a HUGE caveat: the experiment does not take advantage of everything the M1 has to offer (in other words, I think it uses just the Neural Engine and not the GPU). Second, I ran the original tests with Python 3.8 instead of Python 3.9, the latter of which has been optimized for M1. I am currently re-running the test with Python 3.9, and epoch #1 took about 50 minutes, which means the M1 run time should be about the same as the MBP and maybe even slightly better (update to be posted). Third, with regards to native vs. Rosetta: since I re-ran the test with Python 3.9 from Terminal, I assume it ran natively on the M1.
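For anyone who wants to confirm rather than assume: a quick way to check whether your Python interpreter is running natively on Apple Silicon or under Rosetta 2 translation is to ask the `platform` module for the machine architecture. This is just a minimal sketch, not something from the original article:

```python
import platform

# On Apple Silicon Macs, platform.machine() reports "arm64" when Python
# runs natively and "x86_64" when it runs under Rosetta 2 translation.
arch = platform.machine()
print(f"Interpreter architecture: {arch}")
if arch == "arm64":
    print("Running natively on Apple Silicon")
else:
    print("Not native arm64 (Rosetta 2, or a non-Apple-Silicon machine)")
```

Running this from the same Terminal session you use to launch the benchmark removes the guesswork about which mode the test actually ran in.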
The updated bottom line, I think, is two-part. First, if you prefer working with PyTorch, the M1 is not *yet* the right machine for you. The forums suggest we should see something from PyTorch on that front, and I am looking forward to it. Second, for a big ML task, a purpose-built GPU rig is still probably going to provide the best absolute results. For that, I am just cloning my projects to a Colab notebook and letting things run there.