Introduction:
As the popularity of machine learning and artificial intelligence continues to grow, many individuals and companies are turning to powerful tools like TensorFlow to build and train their models. Recently, Apple unveiled its M1 Pro and M1 Max MacBook Pro models, which boast impressive performance gains over previous generations. However, users have reported unexpected results when running TensorFlow tests on these new machines. In this tutorial, we will explore some common issues and solutions for unexpected results when using TensorFlow on M1 Pro/Max MacBooks.
Common Issues:
1. Compatibility Issues: One of the most common problems when running TensorFlow tests on M1 Pro/Max MacBooks is incompatibility with the new architecture. Because the M1 Pro/Max chips use Apple's own ARM-based design rather than a traditional x86 processor, TensorFlow builds and native extensions compiled for x86 can fail or behave unexpectedly (see the diagnostic sketch after this list).
2. Performance Issues: Another common complaint is performance degradation. Although these machines are designed to be powerful and efficient, some users report that TensorFlow tests run slower than on comparable Intel-based machines, which is frustrating when your machine learning projects depend on them.
3. Compiler Optimization Issues: Some users have reported compiler-related problems when running TensorFlow tests on M1 Pro/Max MacBooks. Apple ships its own build of Clang, whose default optimization settings can differ from upstream Clang or GCC, so natively compiled code and extensions may behave differently than on other platforms. This can lead to unexpected results and errors when running TensorFlow tests.
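Before trying fixes, it helps to confirm what you are actually running. Here is a minimal diagnostic sketch; note that the exact keys returned by the build-info call vary by TensorFlow release:

    import platform
    import tensorflow as tf

    # 'arm64' means a native Apple-silicon interpreter; 'x86_64' means
    # an Intel build, or an ARM Mac translating through Rosetta 2.
    print("Interpreter architecture:", platform.machine())

    # How this TensorFlow wheel was built (contents vary by release).
    print("TensorFlow version:", tf.__version__)
    print("Build info:", dict(tf.sysconfig.get_build_info()))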
Solutions:
1. Update TensorFlow: The first step when you see unexpected results on an M1 Pro/Max MacBook is to make sure you are running the latest version of TensorFlow. The TensorFlow team regularly improves compatibility and performance across architectures, including ARM-based chips, and on macOS the Apple-maintained tensorflow-macos and tensorflow-metal packages provide the native build and GPU support. Updating may resolve the issue outright (an install-and-verify sketch follows this list).
2. Use Rosetta 2: If you are still experiencing compatibility issues, you can try running your TensorFlow tests through Rosetta 2, Apple's translation layer that lets x86-64 software run on ARM-based Macs. This is not ideal, since translated code runs slower, but it can keep you working until native ARM support covers your use case (a quick way to check whether you are running under Rosetta appears after this list).
3. Optimize Code for ARM: Another remedy is to optimize your code for the ARM architecture, restructuring it to play to the M1 Pro/Max chip's strengths. For example, keep small operations on the CPU while reserving the GPU for large, parallel work (see the device-placement sketch after this list). Optimizing for ARM in this way can improve performance and reduce unexpected results.
4. Report Issues: If you are still seeing unexpected results, report them to the TensorFlow team. Detailed information about the errors you encounter, along with your exact environment, helps the team track down compatibility and performance problems on ARM; the version-and-device printout in the first sketch below is a good starting point for a bug report. Reporting issues improves the overall TensorFlow experience for everyone on M1 Pro/Max MacBooks.
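For steps 1 and 4, here is a minimal sketch that upgrades to the Apple-maintained packages and prints the environment details worth pasting into a bug report (the package names are Apple's official ones at the time of writing):

    # Upgrade in a shell first:
    #   pip install --upgrade tensorflow-macos tensorflow-metal
    import platform
    import tensorflow as tf

    # Environment details worth including in a bug report.
    print("TensorFlow:", tf.__version__)
    print("Python:", platform.python_version())
    print("macOS:", platform.mac_ver()[0])
    print("Machine:", platform.machine())

    # With the Metal plugin installed, the M1 GPU should appear here.
    print("GPUs:", tf.config.list_physical_devices("GPU"))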
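For step 2, a quick check of whether the current process is actually running under Rosetta 2 (this assumes you have an x86-64 Python build installed to launch):

    # Install Rosetta 2 once:        softwareupdate --install-rosetta
    # Launch an x86-64 Python with:  arch -x86_64 /path/to/x86_python script.py
    import platform

    # 'x86_64' means Rosetta 2 is translating this process;
    # 'arm64' means it is running natively on the M1.
    print("Running as:", platform.machine())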
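For step 3, one hedged illustration of playing to the chip's strengths: explicit device placement, which keeps overhead-dominated small operations on the CPU and sends large, parallel work to the Metal GPU (assumes the tensorflow-metal plugin is installed):

    import tensorflow as tf

    small = tf.random.uniform((32, 32))
    large = tf.random.uniform((4096, 4096))

    # Tiny ops are dominated by scheduling overhead; keep them on the CPU.
    with tf.device("/CPU:0"):
        small_out = tf.matmul(small, small)

    # Big, parallel math is where the GPU pays off.
    with tf.device("/GPU:0"):
        large_out = tf.matmul(large, large)

    print(small_out.shape, large_out.shape)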
Conclusion:
While running TensorFlow tests on M1 Pro/Max MacBooks may lead to unexpected results and errors, there are solutions available to help you address these issues. By updating TensorFlow, using Rosetta 2, optimizing your code for ARM architecture, and reporting issues to the TensorFlow team, you can improve compatibility and performance when using TensorFlow on M1 Pro/Max MacBooks. As Apple continues to innovate with its M1 Pro/Max chips, it is likely that support for ARM architecture will improve in the future, making TensorFlow even more powerful on these machines.
The MNIST dataset is not a good way to show hardware acceleration with a GPU. The matrix sizes are small, and shuttling data between the CPU and GPU eats up time. It might surprise some, but you could end up with better results if you handed this to the CPU alone. I would recommend using a model that handles larger chunks of data, such as the CIFAR dataset.
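A rough sketch of the effect this commenter describes, timing a small and a large matrix multiply on each device (assumes a Metal-enabled TensorFlow install with a visible GPU):

    import time
    import tensorflow as tf

    def time_matmul(device, n, iters=50):
        """Time `iters` n-by-n matrix multiplies on the given device."""
        with tf.device(device):
            a = tf.random.uniform((n, n))
            b = tf.random.uniform((n, n))
            tf.matmul(a, b).numpy()  # warm-up, so setup costs aren't timed
            start = time.perf_counter()
            for _ in range(iters):
                c = tf.matmul(a, b)
            c.numpy()  # block until the device has finished
            return time.perf_counter() - start

    # Small matrices often favor the CPU; large ones favor the GPU.
    for n in (64, 4096):
        print(f"n={n}: CPU {time_matmul('/CPU:0', n):.3f}s, "
              f"GPU {time_matmul('/GPU:0', n):.3f}s")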
You can force the random seed to be the same on all machines, usually by picking a fixed number in the code; that way you always get the same results. This is used to develop solid code optimizations, and it is why you see different visuals from the same dataset: random number generation. Very surprised that the M1 Max is slower than the M1 Pro; I expected it to be 20% faster instead. Did you do any analysis as to why it is slower? Great video!
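In TensorFlow, pinning the seeds looks roughly like this (note that GPU kernels can still introduce some nondeterminism from run to run):

    import random
    import numpy as np
    import tensorflow as tf

    SEED = 42  # arbitrary; any fixed value works
    random.seed(SEED)         # Python's built-in RNG
    np.random.seed(SEED)      # NumPy, used for shuffling/init in many scripts
    tf.random.set_seed(SEED)  # TensorFlow's global seed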
I think the MNIST model is relatively small, with not that many parameters, so the Pro may be more than enough, with less overhead. If you train a larger model, the Max should be much faster.
What were the specs of the different machines?
Very interested to see which libraries and datasets will become the standard for benchmarking machine learning performance on CPUs and GPUs.
Needs a comparison vs. RTX 3060 and RTX 3070 Ti laptops.
Result on a 2019 MacBook Pro 16" (Intel i9 2.3 GHz + AMD Radeon Pro 5500M 4 GB):
LOAD_TRAIN: 4790 ms
CREATE_TRAIN: 9535 ms
LOAD_TEST: 787 ms
CREATE_TEST: 823 ms
TRAIN: 257528 ms
Thanks for the review, interesting results. What can you say about generating images using GAN + CLIP? Will the MacBook Pro handle this, or is it better to consider purchasing a desktop PC with an NVIDIA 3090 card?
I am quite new to this field and would like to try my hand at deep learning; I would like to hear some advice from you.
Thank you for these videos. I am studying ML & NLP and ordered an M1 Max 64GB over two months ago, and I just received it. Lately some people have posted their reviews of the M1 Max for ML, and I've watched them. They show some of the potential of the M1 Max GPU for ML, but at this point the results are mixed. I am wondering whether I should keep the Max or downgrade to an M1 Pro 32GB.
You must consistently be dumbing down the Max somehow… don't you think?
I repeated the test on a Max with 32GB RAM. The first time, the TRAIN result was 140000 ms; the other parts were identical. But I repeated the test, and the results were:
LOAD_TRAIN: 3272 ms
CREATE_TRAIN: 6436 ms
LOAD_TEST: 516 ms
CREATE_TEST: 533 ms
TRAIN: 38158 ms
I am not sure why; perhaps the test picks random numbers.
Is there a software update?
Or is there some learning process on the Mac? The next times, the results were always lower than 40000 ms, and I repeated it several times.
Pretty interesting to see the M1 Pro outperforming the M1 Max. How much RAM does your M1 Pro have?
Cheers Alex – with the benchmarks, can you show powermetrics output so we can see what hardware is in play and how hard it's working?
I really feel something must be off in the default settings.
A batch of 20 may be way too small for the Max, and the bottleneck becomes loading data during the training steps. Are there differences in the SSDs between these two machines? Or maybe feeding the wider memory bandwidth actually has more overhead? I work with an NVIDIA A6000 and usually need to use very big batch sizes, right up to the point where TensorFlow gives out-of-memory errors, to fully utilize the GPU. Maybe you can try increasing the batch size right until the demo code no longer works. It's like moving house with a sedan or a truck: if you only load one box at a time, the sedan will drive faster than the truck, but if you fully load them, the truck definitely wins.
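A sketch of that probing approach on the same MNIST data, stepping the batch size up until the GPU runs out of memory (the sizes here are arbitrary starting points):

    import tensorflow as tf

    # Same MNIST data as the benchmark in the video.
    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    x_train = x_train[..., None].astype("float32") / 255.0

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu",
                               input_shape=(28, 28, 1)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

    # Step the batch size up; stop at the first out-of-memory error.
    for batch_size in (20, 128, 512, 2048):
        try:
            model.fit(x_train, y_train, epochs=1,
                      batch_size=batch_size, verbose=0)
            print(f"batch_size={batch_size}: OK")
        except tf.errors.ResourceExhaustedError:
            print(f"batch_size={batch_size}: out of GPU memory")
            break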
Hey man, please help me out; nobody has filmed a decent comparison of the M1 Pro/Max vs., say, the 11900H or 11700H for Java. I'm interested in the whole routine: IDE speed (preferably IntelliJ IDEA), project compile times, and starting a Java-based Docker container and/or Kubernetes pods. If you could film it, I would probably be the most thankful guy out here, so please please please.
Thanks for testing this use case!
I suspect some thermal throttling is going on in the CPU or GPU. With the AMD 12-core vs. 16-core, the 16-core is faster overall; however, the 12-core has faster single-core speeds due to thermals, so unless you are using all the cores, the 12-core will often be faster. This could be the same issue here. You closed everything down and ran the tests; I bet if you went the other way and loaded up both systems, the Max would have started to become faster relative to the Pro.
It's time to test Deep Fritz 18 in Parallels.
This is the video I've been waiting a long time for.