Elena Canorea
Communications Lead
This article introduces the Intel Arc A770 GPU as a competitive option for intensive AI tasks, especially for those working within the Windows ecosystem. Traditionally, NVIDIA GPUs and CUDA have dominated this space, but Intel’s latest offering provides a robust alternative. This article adds new information to help users work more easily with the Arc A770 GPU natively on Windows, bypassing the need for the Windows Subsystem for Linux (WSL).
Through practical steps and detailed insights, we explore how to set up and optimize the Arc A770 GPU for various AI models, including Llama2, Llama3, and Phi3. The article also includes performance metrics and memory usage statistics, providing a comprehensive overview of the GPU’s capabilities. Whether you are a developer or researcher, this post will equip you with the knowledge to leverage Intel’s GPU for your AI projects efficiently and effectively.
Intel recently provided me with the opportunity to test their Arc A770 GPU for AI tasks. While detailed specifications can be found here, one feature immediately stood out: 16GB of VRAM. This is 4GB more than its natural competitor, the NVIDIA RTX 3060, making it a compelling option for AI computations at a similar price point.
Intel Arc A770 GPU used for tests
At Plain Concepts, where we predominantly work with Microsoft technologies, I decided to explore the GPU’s capabilities on a Windows platform. Given my usual work with PyTorch, I began by utilizing the Intel Extension for PyTorch to see if it could run models like Llama2, Llama3, and Phi3, and to evaluate its performance.
Initially, I considered using the Windows Subsystem for Linux (WSL) based on suggestions from various blog posts and videos that indicated native Windows support might not be fully ready. However, I chose to first experiment with a native Windows setup, and after a few tweaks and adjustments, I was pleased to discover that everything worked seamlessly!
In this article, I will share my experiences and the steps I took to run Llama2, Llama3, and Phi3 models on the Intel Arc A770 GPU natively in Windows. I will also present performance metrics, including execution time and memory usage for each model. The goal is to provide a comprehensive overview of how the Intel Arc A770 GPU can be effectively used for intensive AI tasks on Windows.
Intel provides a comprehensive guide for installing the Python extension for the Arc GPU.
Intel extension for Pytorch install guide
However, setting up the Arc A770 GPU on Windows required some initial adjustments and troubleshooting. Here’s a brief summary of those adjustments. For detailed instructions, refer to the samples repository.
As stated in its GitHub repository, “Intel® Extension for PyTorch extends PyTorch with up-to-date features optimizations for an extra performance boost on Intel hardware”. Specifically, it “provides easy GPU acceleration for Intel discrete GPUs through the PyTorch xpu device”. This means that, by using this extension, you can leverage the Intel Arc A770 GPU for AI tasks without relying on CUDA/NVIDIA, and that you can get an even greater performance boost when using one of the optimized models.
Luckily, the extension follows the same API as PyTorch, so in general there are just a few changes to make in the code to get it running on the Intel GPU. Here is a brief summary of the changes needed:
Import the Intel Extension for PyTorch and check that the GPU is correctly detected.
The check is not strictly needed, but it is good practice to confirm that the GPU is detected before running the model.
Once the model is loaded, move it to the GPU.
Finally, when using the model, ensure the input data is also on the GPU.
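The three changes above can be sketched roughly as follows. This is a minimal illustration, not the code from the samples repository: `select_device` and `move_to_device` are hypothetical helper names, and the fallback to the CPU when the extension or GPU is missing is an assumption added so the snippet runs anywhere.

```python
def select_device() -> str:
    """Return "xpu" when an Intel GPU is usable, otherwise fall back to "cpu"."""
    try:
        import torch
        # Importing the extension is what registers the "xpu" device with PyTorch.
        import intel_extension_for_pytorch  # noqa: F401
    except (ImportError, OSError):
        # Extension (or PyTorch itself) not installed or not loadable.
        return "cpu"
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    return "cpu"


def move_to_device(model, inputs, device):
    """Move a model and a dict of input tensors to the same device."""
    model = model.to(device)
    inputs = {name: tensor.to(device) for name, tensor in inputs.items()}
    return model, inputs


device = select_device()
print(f"Running on: {device}")
```

With a Hugging Face model and tokenizer, the call would look something like `model, inputs = move_to_device(model, tokenizer(prompt, return_tensors="pt"), device)`, where `model`, `tokenizer`, and `prompt` are placeholders for your own objects.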
To measure performance accurately, I also added some extra code to retrieve the total inference time and the maximum memory allocation. It mainly consists of a warm-up of each model before doing the actual inference, plus some code to wait for the model to finish running and print the results in a human-readable way. Check the samples repository for more information and to replicate the results on your own machine.
Llama2 is the second iteration of Meta's popular open-source Llama LLM. After preparing the environment and making the changes described in the previous section to the official Llama2 samples, I was able to run the Llama2 model on the Intel Arc A770 GPU, both for plain inference and for instruction tasks.
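The warm-up-then-measure pattern described here can be sketched in a few lines. This is a simplified stand-in for the code in the samples repository: `timed_run` is a hypothetical helper, and `sync` is an optional callable that blocks until the device has finished its queued work. On the Arc GPU that would presumably be `torch.xpu.synchronize`, with peak memory read from `torch.xpu.max_memory_allocated()`; both are assumed here to mirror their `torch.cuda` counterparts.

```python
import time


def timed_run(fn, warmup=1, sync=None):
    """Run fn() `warmup` times untimed, then once timed.

    Returns (result_of_timed_call, elapsed_seconds).
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # make sure warm-up work has finished before starting the clock
    start = time.perf_counter()
    result = fn()
    if sync is not None:
        sync()  # wait for the device to finish before stopping the clock
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload; in practice fn would wrap the model's generate call.
result, seconds = timed_run(lambda: sum(range(100_000)), warmup=2)
print(f"result={result}, time={seconds:.4f}s")
```

Without the synchronization step, the timer may stop while the GPU is still working, because device calls are typically queued asynchronously.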
The Llama2 7B model takes approximately 14GB of memory using float16 precision. As the GPU has 16GB available, we can run it without any issues. Below you can see the results of the inference sample, using a maximum of 128 tokens in the output.
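The 14GB figure follows from simple arithmetic: roughly 7 billion parameters at 2 bytes each in float16. A minimal sketch, where `weight_memory_gb` is a hypothetical helper that counts the weights only (no activations or KV cache) and uses the nominal parameter count rather than the exact one:

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Nominal 7B parameters stored as float16 (16 bits each).
print(weight_memory_gb(7e9, 16))  # 14.0
```

Measured peak usage will differ somewhat, since the exact parameter count is not a round number and tools may report GiB rather than GB.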
Similarly, the Llama2 7B chat results were impressive, with the model generating human-like responses in a conversational tone. The chat sample ran smoothly on the Intel Arc A770 GPU, showcasing its capabilities for chat applications. In this case, the sample runs with 512 tokens in the output to further stress the hardware.
Llama3 is the latest iteration of the Llama LLM by Meta, released a couple of months ago. Luckily, the Intel team hurried to include the model optimization in the extension, so it was possible to leverage the full power of the Intel Arc A770 GPU. The process was quite similar to the one used for Llama2, using the same environment and official samples.
The Llama3 8B model takes a little more than 15GB of memory using float16 precision. As the GPU has 16GB available, we can run it without any issues. Below you can see the results of the inference sample, using a maximum of 64 tokens in the output.
Following the Llama2 samples, I also tested the chat capabilities of the Llama3 8B model, increasing the output tokens to 256.
Phi3 is the latest model from Microsoft, released on the 24th of April. It is a smaller model than Llama2 and Llama3 (3.8B parameters in its smallest version), but it is still quite powerful. It is trained for instruction tasks, providing detailed and informative responses.
While Phi3 optimizations for Intel hardware are not yet included in the Intel Extension for PyTorch, we can use a third-party library, ipex-llm, to optimize the model. In this case, as Phi3 is quite new, I had to install the prerelease version, which implements the optimizations for all kernel operations of Phi3. Note that ipex-llm is not a formal Intel library, but a community-driven one, so it is not officially supported by Intel.
Once the model is optimized, the rest of the code modifications are the same as for Llama2 and Llama3, so I was able to run the Phi3 model on the Intel Arc A770 GPU without any issues.
The 4K model takes around 2.5GB of memory using 4-bit precision. As it has far fewer parameters than the Llama models, it is much faster to run. Below you can see the results of the inference sample, using a maximum of 512 tokens in the output.
To offer a thorough evaluation of the Intel Arc A770 GPU’s performance, I conducted a comparative analysis of execution time and memory usage for each model on both the Intel Arc A770 GPU and the NVIDIA RTX 3080 Ti.
The metrics were obtained using identical code samples and environment settings for both GPUs, ensuring a fair and accurate comparison. To better understand the results, it is important to note that I didn’t use quantization in the Llama models (dtype float16). As they take more than 12GB of memory, which is all the dedicated memory the RTX 3080 Ti has, the system had to use around 2-3GB of shared memory to compensate when running on the NVIDIA GPU. On the other hand, the Phi3 test uses 4-bit quantization on both the NVIDIA and Intel tests.
Intel Arc A770
Model | Output Tokens | Execution Time | Max Memory Used |
---|---|---|---|
meta-llama/Llama-2-7b-hf | 128 | ~7.7s | ~12.8GB |
meta-llama/Llama-2-7b-chat-hf | 512 | ~22.1s | ~13.3GB |
meta-llama/Meta-Llama-3-8B | 64 | ~11.5s | ~15.1GB |
meta-llama/Meta-Llama-3-8B-Instruct | 256 | ~30.7s | ~15.2GB |
microsoft/Phi-3-mini-4k-instruct | 512 | ~5.9s | ~2.6GB |
NVIDIA RTX 3080 Ti
Model | Output Tokens | Execution Time | Max Memory Used |
---|---|---|---|
meta-llama/Llama-2-7b-hf | 128 | ~15.5s | ~12.8GB |
meta-llama/Llama-2-7b-chat-hf | 512 | ~51.5s | ~13.3GB |
meta-llama/Meta-Llama-3-8B | 64 | ~16.9s | ~15.1GB |
meta-llama/Meta-Llama-3-8B-Instruct | 256 | ~66.7s | ~15.2GB |
microsoft/Phi-3-mini-4k-instruct | 512 | ~16.7s | ~2.6GB |
The graph below illustrates the normalized execution time per token for each model on both the Intel Arc A770 and NVIDIA RTX 3080 Ti GPUs.
*MARGIN OF ERROR: LESS THAN 0.1 SECONDS
As illustrated, the Intel Arc A770 GPU performed exceptionally well across all models, demonstrating competitive execution times. Notably, the Intel Arc A770 GPU outperformed the NVIDIA RTX 3080 Ti by a factor of two or more in most cases.
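The per-token normalization behind this comparison is simple arithmetic on the two tables above. The sketch below assumes the first table holds the Intel Arc A770 timings and the second the NVIDIA RTX 3080 Ti timings, which is how the surrounding text reads. It also treats the "Output Tokens" column as the number of tokens actually generated (it is given as a maximum), so the figures are approximate.

```python
# (model, output tokens, Arc A770 seconds, RTX 3080 Ti seconds), copied from the tables
results = [
    ("Llama-2-7b-hf", 128, 7.7, 15.5),
    ("Llama-2-7b-chat-hf", 512, 22.1, 51.5),
    ("Meta-Llama-3-8B", 64, 11.5, 16.9),
    ("Meta-Llama-3-8B-Instruct", 256, 30.7, 66.7),
    ("Phi-3-mini-4k-instruct", 512, 5.9, 16.7),
]


def per_token_ms(seconds, tokens):
    """Execution time normalized to milliseconds per output token."""
    return seconds / tokens * 1000


# Ratio of RTX time to Arc time: values above 1 mean the Arc run was faster.
speedups = [rtx_s / arc_s for _, _, arc_s, rtx_s in results]

print(f"{'model':<26} {'Arc ms/tok':>10} {'RTX ms/tok':>10} {'ratio':>6}")
for (name, tokens, arc_s, rtx_s), ratio in zip(results, speedups):
    print(
        f"{name:<26} {per_token_ms(arc_s, tokens):>10.1f} "
        f"{per_token_ms(rtx_s, tokens):>10.1f} {ratio:>5.2f}x"
    )
```

The last column makes it easy to see in which runs the gap reaches a factor of two.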
The Intel Arc A770 GPU has proven to be a remarkable option for AI computation on a local Windows machine, offering an alternative to the CUDA/NVIDIA ecosystem. The GPU’s ability to efficiently run models like Llama2, Llama3, and Phi3 demonstrates its potential and robust performance capabilities. Despite initial setup challenges, the process was relatively straightforward, and the results were impressive.
In essence, the Intel Arc A770 GPU is a powerful tool for AI applications on Windows. With some initial setup and code adjustments, it handled inference, chat, and training tasks efficiently. This opens up new opportunities for developers and researchers who prefer or need to work within the Windows environment without relying on NVIDIA GPUs and CUDA. As Intel continues to enhance its GPU offerings and software support, the Arc A770 and future models are poised to become significant players in the AI community.
The code samples used in this article can be found in the IntelArcA770 GitHub repository.
Below are also some resources that I find essential for diving deeper into the Intel hardware and libraries ecosystem for AI tasks.