hardware content

This commit is contained in:
Its-Alamin-H 2023-10-01 18:58:54 +06:00
parent 9acf1b2b43
commit e63840ac2c
7 changed files with 454 additions and 23 deletions

View File

@ -1,6 +1,60 @@
---
sidebar_position: 1
title: Introduction
---
## How to choose LLMs for your work
Choosing the right Large Language Model (LLM) doesn't have to be complicated. It's all about finding one that works well for your needs. Here's a simple guide to help you pick the perfect LLM:
1. **Set Up the Basics**: First, get everything ready on your computer. Make sure you have the right software and tools to run these models. Then, give them a try on your system.
2. **Watch Your Memory**: Pay attention to how much memory these models are using. Some are bigger than others, and you need to make sure your computer can handle them.
3. **Find Compatible Models**: Look for models that are known to work well with the tools and hardware you're using, and keep them on your shortlist.
4. **Test Them Out**: Take the models on your shortlist and give them a try with your specific task. This is like comparing different cars by taking them for a test drive. It helps you see which one works best for what you need.
5. **Pick the Best Fit**: After testing, you'll have a better idea of which model is the winner for your project. Consider things like how well it performs, how fast it is, if it works with your computer, and the software you're using.
6. **Stay Updated**: Remember that this field is always changing and improving. Keep an eye out for updates and new models that might be even better for your needs.
And the good news is, finding the right LLM is easier now. We've got a handy tool called Extractum LLM Explorer that you can use online. It helps you discover, compare, and rank lots of different LLMs. Check it out at **[Extractum](http://llm.extractum.io/)**, and it'll make your selection process a breeze!
You can also use the [Model Memory Calculator](https://huggingface.co/spaces/hf-accelerate/model-memory-usage), a tool designed to assist in determining the vRAM required for training and inference with large models hosted on the Hugging Face Hub. The tool identifies the minimum recommended vRAM based on the size of the 'largest layer' within the model. Additionally, it's worth noting that model training typically requires approximately four times the size of the model, especially when using the Adam optimizer. Keep in mind that when performing inference you should expect to add up to an additional 20% on top of the model size, as found by [EleutherAI](https://blog.eleuther.ai/transformer-math/). More tests will be performed in the future to get a more accurate benchmark for each model.
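As a quick sanity check, these rules of thumb are easy to turn into numbers. Below is a minimal sketch in plain Python (the function name is illustrative, and the 26 GB input assumes a 13B model stored in fp16):

```python
def training_and_inference_gb(model_size_gb: float) -> tuple[float, float]:
    """Rules of thumb from above: training with Adam needs roughly 4x the
    model size; inference needs the model plus up to ~20% extra."""
    return model_size_gb * 4, model_size_gb * 1.2

# Example: a 13B-parameter model stored in fp16 is about 26 GB.
train_gb, infer_gb = training_and_inference_gb(26.0)
print(f"training: ~{train_gb:.0f} GB, inference: ~{infer_gb:.0f} GB")
```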
## How to Calculate How Much VRAM Is Required for My Selected LLM
**For example:** Estimating the VRAM required to run a 13-billion-parameter Large Language Model (LLM) involves considering the model size, the precision the weights are stored in, the batch size and sequence length you plan to use, and any additional overhead. Here's how you can estimate the VRAM required for a 13B LLM:
1. **Model Size**: Find out the size of the 13B LLM in terms of the number of parameters. This information is typically provided in the model's documentation. A 13-billion-parameter model has 13,000,000,000 parameters.
2. **Weight Precision (Bytes per Parameter)**: Determine how many bytes are used to store each parameter. Full precision (fp32) uses 4 bytes, half precision (fp16) uses 2 bytes, and 4-bit quantized models use roughly 0.5 bytes per parameter.
3. **Batch Size**: Decide on the batch size you want to use during inference. The batch size represents how many input samples you process simultaneously. Larger batches increase the memory needed for activations and the KV cache.
4. **Sequence Length**: Determine the average length of the input text sequences you'll be working with. Longer sequences enlarge the KV cache and therefore need more memory.
5. **Overhead**: Consider any additional memory for the KV cache, intermediate computations, and framework requirements. Overhead can vary but should be estimated based on your specific setup.
Use the following formula to estimate the VRAM required:
**VRAM Required** = `Model Parameters x Bytes per Parameter + Overhead`
- **Model Parameters**: 13,000,000,000 parameters for a 13B LLM.
- **Bytes per Parameter**: 4 bytes (fp32), 2 bytes (fp16), or about 0.5 bytes (4-bit quantization).
- **Overhead**: Any additional VRAM required for the KV cache (which grows with batch size and sequence length) and your framework.
Here's an example:
Suppose you want to run a 13B LLM with the following parameters:
- **Bytes per Parameter**: 0.5 (4-bit quantization)
- **Batch Size**: 4
- **Sequence Length**: 512 tokens
- **Estimated Overhead**: 2 GB
Weights = `13,000,000,000 x 0.5 = 6,500,000,000 bytes`
To convert this to gigabytes, divide by `1,073,741,824 (1 GB)`:
Weights ≈ `6,500,000,000 / 1,073,741,824 ≈ 6.1 GB`
VRAM Required ≈ `6.1 GB + 2 GB ≈ 8.1 GB`
So, to run a 13-billion-parameter LLM quantized to 4 bits with the specified batch size, sequence length, and overhead, you would need roughly 8 GB of VRAM on your GPU. Make sure to leave some additional VRAM headroom for stable operation and consider testing the setup in practice to monitor VRAM usage accurately.
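The arithmetic above is easy to script so you can plug in different model sizes and precisions. Below is a minimal sketch in plain Python (the function name and default overhead are illustrative assumptions, not a standard API):

```python
def estimate_vram_gb(num_params: float, bytes_per_param: float,
                     overhead_gb: float = 2.0) -> float:
    """Estimate the VRAM needed to hold the model weights plus overhead."""
    weights_gb = num_params * bytes_per_param / 1024**3  # bytes -> GiB
    return weights_gb + overhead_gb

# 13B model, 4-bit quantization (~0.5 bytes per parameter), 2 GB overhead.
print(f"4-bit: {estimate_vram_gb(13e9, 0.5):.1f} GB")   # ~8.1 GB
# The same model in fp16 (2 bytes per parameter) is far larger.
print(f"fp16:  {estimate_vram_gb(13e9, 2.0):.1f} GB")   # ~26.2 GB
```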

View File

@ -1,3 +1,54 @@
---
title: Cloud vs. In-House Servers
---
## How to Decide for Your Business
In recent months, one of the most critical infrastructure decisions for organizations has been whether to opt for cloud-based servers or in-house servers to run LLMs. Finding the right balance for your needs is essential.
As open-source Large Language Models like LLaMA 2 and Falcon gain prominence across various industries, business leaders are grappling with significant infrastructure decisions. The fundamental question arises: should your company host LLMs in the cloud or invest in in-house servers? The cloud offers ease of deployment, flexible scalability, and alleviates maintenance burdens, making it an attractive choice for many mainstream business applications. However, this convenience often comes at the cost of reliance on internet connectivity and relinquishing some control over your data. In contrast, in-house servers necessitate a substantial upfront investment but provide complete autonomy, predictable costs, and the ability to customize your infrastructure.
## In-House Servers
"Great power comes with great responsibility." Running your own in-house server for large language models (LLMs) involves setting up and maintaining your hardware and software infrastructure to run and host LLMs. This can be complex and expensive due to the initial equipment investment.
### Pros of Running LLMs Locally
- **Full Control:** In-house servers provide complete control over hardware, software, and configurations. This level of customization is invaluable for businesses with unique or specialized requirements.
- **Data Privacy:** For organizations handling highly sensitive data, in-house servers provide greater control over data privacy and security.
- **Low Latency:** In-house servers can provide low-latency access to local users, ensuring optimal performance for critical applications.
- **Predictable Costs:** Ongoing maintenance costs can be more predictable, and hardware upgrades can be planned according to the organization's budget and timeline.
### Cons of Running LLMs Locally
- **High Initial Costs:** Building and maintaining an in-house server involves significant capital expenditures. This can be a barrier for small businesses or startups with limited budgets.
- **Disaster Recovery:** In-house servers may not offer the same level of redundancy and disaster recovery capabilities as major cloud providers. Ensuring business continuity in the face of hardware failures or disasters becomes the organization's responsibility.
- **Maintenance Burden:** In-house server management necessitates a dedicated IT team for maintenance, updates, security, and backups, diverting resources from research and development.
- **Limited Scalability:** Scaling up in-house servers can be complex and costly. Additional hardware acquisitions and installations can be time-consuming.
## Cloud Servers
Running LLMs in the cloud means using cloud computing resources to train, deploy, and manage large language models (LLMs). Cloud computing allows access to powerful computational resources on demand, without the need to maintain expensive hardware or infrastructure. This can make it easier and more cost-effective to develop, test, and use advanced AI models like LLMs.
### Pros of Using LLMs in the Cloud
- **Scalability:** One of the foremost advantages of cloud servers is their scalability. Cloud providers offer the ability to scale resources up or down on-demand. This means that businesses can efficiently accommodate fluctuating workloads without the need for significant upfront investments in hardware.
- **Lower Initial Costs:** You don't have to invest in on-site hardware or incur capital expenses. This is particularly suitable for smaller companies that might quickly outgrow their own storage and compute capacity.
- **Ease of Use:** The cloud platform provides a variety of APIs, tools, and language frameworks that make it significantly easier to create, train, and deploy machine learning models.
- **Accessibility:** Cloud servers are accessible from anywhere with an internet connection. This enables remote work and collaboration across locations. In-house servers require employees to be on-premises.
- **Managed Services:** Cloud providers offer a plethora of managed services, such as automated backups, security solutions, and database management. This offloads many administrative tasks, allowing businesses to focus on their core objectives.
- **Built-in AI Accelerators:** Cloud providers offer hardware accelerators like Nvidia GPUs and Google TPUs that are optimized for AI workloads and challenging for on-prem environments to match.
### Cons of Using LLMs in the Cloud
- **Limited Control:** Cloud users have limited control over the underlying infrastructure. This may not be suitable for businesses with specific hardware or software requirements that cannot be met within the cloud provider's ecosystem.
- **Data Security Concerns:** Entrusting sensitive data to a third-party cloud provider can raise security concerns. While major cloud providers employ robust security measures and comply with industry standards, businesses must still take responsibility for securing their data within the cloud environment.
- **Internet Dependency:** Cloud servers rely on internet connectivity. Any disruptions in internet service can impact access to critical applications and data. Businesses should have contingency plans for such scenarios.
- **Cost Unpredictability:** While cloud bills typically start small, costs for GPUs, data storage, and bandwidth can grow rapidly as workloads scale, making long-term TCO difficult to predict.
- **Lack of Customization:** In-house servers allow full hardware and software customization to meet specific needs. Cloud environments offer less customization and control.
## Conclusion
The decision to run LLMs in the cloud or on in-house servers is not one-size-fits-all. It depends on your business's specific needs, budget, and security considerations. Cloud-based LLMs offer scalability and cost-efficiency but come with potential security concerns, while in-house servers provide greater control, customization, and cost predictability.
In some situations, using a mix of cloud and in-house resources can be the best way to go. Businesses need to assess their needs and assets carefully to pick the right method for using LLMs in the ever-changing world of AI technology.

View File

@ -1,3 +1,78 @@
---
title: "GPU vs. CPU: What's the Difference?"
---
## Introduction
In the realm of machine learning, the choice of hardware can be the difference between slow, inefficient training and lightning-fast model convergence. Central Processing Units (CPUs) and Graphics Processing Units (GPUs) are the two primary players in this computational showdown. In this article, we'll delve deep into the architecture, pros, and cons of CPUs and GPUs for machine learning tasks, with a focus on their application in Large Language Models (LLMs).
## Difference between CPU and GPU
### CPU Basics:
Central Processing Units, or CPUs, are the workhorses of traditional computing. They consist of various components, including the Arithmetic Logic Unit (ALU), registers, and cache. CPUs are renowned for their precision in executing instructions and their versatility across a wide range of computing tasks.
### Pros of CPU for Machine Learning:
**Precision:** CPUs excel in precise numerical calculations, making them ideal for tasks that require high accuracy.
**Versatility:** They can handle a wide variety of tasks, from web browsing to database management.
### Cons of CPU for Machine Learning:
**Limited Parallelism:** CPUs are inherently sequential processors, which makes them less efficient for parallelizable machine learning tasks.
**Slower for Complex ML Tasks:** Deep learning and other complex machine learning algorithms can be slow on CPUs due to their sequential nature.
### GPU Basics:
Graphics Processing Units, or GPUs, were originally designed for rendering graphics, but their architecture has proven to be a game-changer for machine learning. GPUs consist of numerous cores and feature a highly efficient memory hierarchy. Their parallel processing capabilities set them apart.
### Pros of GPU for Machine Learning:
**Massive Parallelism:** GPUs can process thousands of parallel threads simultaneously, making them exceptionally well-suited for machine learning tasks that involve matrix operations.
**Speed:** Deep learning algorithms benefit greatly from GPU acceleration, resulting in significantly faster training times.
### Cons of GPU for Machine Learning:
**Higher Power Consumption:** GPUs can be power-hungry, which might impact operational costs.
**Limited Flexibility:** They are optimized for parallelism and may not be as versatile as CPUs for non-parallel tasks.
![CPU VS GPU](https://media.discordapp.net/attachments/964896173401976932/1157998193741660222/CPU-vs-GPU-rendering.png?ex=651aa55b&is=651953db&hm=a22c80ed108a0d25106a20aa25236f7d0fa74167a50788194470f57ce7f4a6ca&=&width=807&height=426)
## Similarities Between CPUs and GPUs
CPUs (Central Processing Units) and GPUs (Graphics Processing Units) are both integral hardware components that power computers, functioning as the "brains" of these devices. Despite their distinct purposes, they share several key internal components that contribute to their functionality:
**Cores:** Both CPUs and GPUs have cores that do the thinking and calculations. CPUs have fewer but powerful cores, while GPUs have many cores for multitasking.
**Memory:** They use memory to work faster. Think of it like their short-term memory. CPUs and GPUs have different levels of memory, but it helps them process things quickly.
**Control Unit:** This unit orchestrates how instructions are fetched and executed, keeping everything running smoothly. Higher clock frequencies make both CPUs and GPUs faster, but each applies that speed to different kinds of tasks.
In short, CPUs and GPUs share core elements that help them process information quickly, even though they have different roles in a computer.
## Summary of differences: CPU vs. GPU
| | CPU | GPU |
| ------------------- | ------------------------------------------------------------------------ | ------------------------------------------------------- |
| **Function** | Generalized component that handles main processing functions of a server | Specialized component that excels at parallel computing |
| **Processing** | Designed for serial instruction processing | Designed for parallel instruction processing |
| **Design** | Fewer, more powerful cores | More cores than CPUs, but less powerful than CPU cores |
| **Best suited for** | General-purpose computing applications | High-performance computing applications |
## CPU vs. GPU in Machine Learning
When choosing between CPUs and GPUs for machine learning, the decision often boils down to the specific task at hand. For tasks that rely on precision and versatility, CPUs may be preferred. In contrast, for deep learning and highly parallelizable tasks, GPUs shine.
For example, training a Large Language Model (LLM) like LLaMA 2 on a CPU would be painfully slow and inefficient. On the other hand, a GPU can significantly speed up the training process, making it the preferred choice for LLMs.
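To see the gap for yourself, a quick benchmark of one large matrix multiplication on the CPU versus the GPU is usually enough. Here is a minimal sketch, assuming PyTorch is installed and a CUDA-capable GPU is present (the matrix size is an arbitrary choice):

```python
import time
import torch

def time_matmul(device: str, n: int = 4096) -> float:
    """Time a single n x n matrix multiplication on the given device."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    if device == "cuda":
        torch.cuda.synchronize()  # make sure setup work has finished
    start = time.time()
    _ = a @ b
    if device == "cuda":
        torch.cuda.synchronize()  # wait for the GPU kernel to complete
    return time.time() - start

print(f"CPU: {time_matmul('cpu'):.3f} s")
if torch.cuda.is_available():
    print(f"GPU: {time_matmul('cuda'):.3f} s")
```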
## Future Trends and Developments
The world of hardware for machine learning is constantly evolving. CPUs and GPUs are becoming more powerful and energy-efficient. Additionally, specialized AI hardware, such as Neural Processing Units (NPUs), is emerging. These developments are poised to further revolutionize machine learning, making it essential for professionals in the field to stay updated with the latest trends.
## Conclusion
In the battle of brains for machine learning, the choice between CPUs and GPUs depends on the specific requirements of your tasks. CPUs offer precision and versatility, while GPUs excel in parallel processing and speed. Understanding the nuances of these architectures is crucial for optimizing machine learning workflows, especially when dealing with Large Language Models (LLMs). As technology advances, the lines between CPU and GPU capabilities may blur, but for now, choosing the right hardware can be the key to unlocking the full potential of your machine learning endeavors. Stay tuned for the ever-evolving landscape of AI hardware, and choose wisely to power your AI-driven future.

View File

@ -2,12 +2,61 @@
title: Recommended AI Hardware by Budget
---
> :warning: **Warning:** Hardware suggestions on this site are only recommendations. Do your own research before any purchase. I'm not liable for compatibility, performance or other issues. Products can become outdated quickly. Use judgement and check return policies. Treat as informative guide only.
## Entry-level PC Build at $1,000
| Type | Item | Price |
| :------------------- | :--------------------------------------------------------- | :------- |
| **CPU** | [Intel Core i5 12400 2.5GHz 6-Core Processor](#) | $170.99 |
| **CPU Cooler** | [Intel Boxed Cooler (Included with CPU)](#) | Included |
| **Motherboard** | [ASUS Prime B660-PLUS DDR4 ATX LGA1700](#) | $169.95 |
| **GPU** | [Nvidia RTX 3050 8GB - ZOTAC Gaming Twin Edge](#) | $250 |
| **Memory** | [16GB (2 x 8GB) G.Skill Ripjaws V DDR4-3200 C16](#) | $49.99 |
| **Storage PCIe-SSD** | [ADATA XPG SX8200 Pro 512GB NVMe M.2 Solid State Drive](#) | $46.50 |
| **Power Supply** | [Corsair CX-M Series CX450M 450W ATX 2.4 Power Supply](#) | $89.99 |
| **Case** | [be quiet! Pure Base 600 Black ATX Mid Tower Case](#) | $97.00 |
| **Total** | | $870 |
## Entry-level PC Build at $1,500
| Type | Item | Price |
| :------------------- | :------------------------------------------------------- | :------ |
| **CPU** | [Intel Core i5 12600K 3.7GHz 6-Core Processor](#) | $269.99 |
| **CPU Cooler** | [be quiet! Dark Rock Pro 4](#) | $99.99 |
| **Motherboard** | [ASUS ProArt B660-Creator DDR4 ATX LGA1700](#) | $229.99 |
| **GPU** | [Nvidia RTX 3050 8GB - ZOTAC Gaming Twin Edge](#) | $349.99 |
| **Memory** | [32GB (2 x 16GB) G.Skill Ripjaws V DDR4-3200 C16](#) | $129.99 |
| **Storage PCIe-SSD** | [ADATA XPG SX8200 Pro 1TB NVMe M.2 Solid State Drive](#) | $109.99 |
| **Power Supply** | [Corsair RMx Series RM650x 650W ATX 2.4 Power Supply](#) | $119.99 |
| **Case** | [Corsair Carbide Series 200R ATX Mid Tower Case](#) | $59.99 |
| **Total** | | $1371 |
## Mid-range PC Build at $3,000
| Type | Item | Price |
| :--------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-------- |
| **CPU** | [AMD Ryzen 9 7950X 4.5 GHz 16-Core Processor](https://de.pcpartpicker.com/product/22XJ7P/amd-ryzen-9-7950x-45-ghz-16-core-processor-100-100000514wof) | $556 |
| **CPU Cooler** | [Thermalright Peerless Assassin 120 White 66.17 CFM CPU Cooler](https://de.pcpartpicker.com/product/476p99/thermalright-peerless-assassin-120-white-6617-cfm-cpu-cooler-pa120-white) | $59.99 |
| **Motherboard** | [Gigabyte B650 GAMING X AX ATX AM5 Motherboard](https://de.pcpartpicker.com/product/YZgFf7/gigabyte-b650-gaming-x-ax-atx-am5-motherboard-b650-gaming-x-ax) | $199.99 |
| **Memory** | [G.Skill Ripjaws S5 64 GB (2 x 32 GB) DDR5-6000 CL32 Memory](https://de.pcpartpicker.com/product/BJcG3C/gskill-ripjaws-s5-64-gb-2-x-32-gb-ddr5-6000-cl32-memory-f5-6000j3238g32gx2-rs5k) | $194 |
| **Storage**      | [Crucial P5 Plus 2 TB M.2-2280 PCIe 4.0 X4 NVME Solid State Drive](https://de.pcpartpicker.com/product/VZWzK8/crucial-p5-plus-2-tb-m2-2280-pcie-40-x4-nvme-solid-state-drive-ct2000p5pssd8) | $165.99 |
| **GPU** | [PNY XLR8 Gaming VERTO EPIC-X RGB OC GeForce RTX 4090 24 GB](https://de.pcpartpicker.com/product/TvpzK8/pny-xlr8-gaming-verto-epic-x-rgb-oc-geforce-rtx-4090-24-gb-video-card-vcg409024tfxxpb1-o) | $1,599.99 |
| **Case** | [Fractal Design Pop Air ATX Mid Tower Case](https://de.pcpartpicker.com/product/QnD7YJ/fractal-design-pop-air-atx-mid-tower-case-fd-c-poa1a-02) | $89.99 |
| **Power Supply** | [Thermaltake Toughpower GF A3 - TT Premium Edition 1050 W 80+ Gold](https://de.pcpartpicker.com/product/4v3NnQ/thermaltake-toughpower-gf-a3-1050-w-80-gold-certified-fully-modular-atx-power-supply-ps-tpd-1050fnfagu-l) | $139.99 |
| **Total**        | | **$3,000** |
## High-End PC Build at $6,000
| Type | Item | Price |
| :--------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------- |
| **CPU** | [AMD Ryzen 9 3900X 3.8 GHz 12-Core Processor](https://pcpartpicker.com/product/tLCD4D/amd-ryzen-9-3900x-36-ghz-12-core-processor-100-100000023box) | $365.00 |
| **CPU Cooler** | [Noctua NH-U12S chromax.black 55 CFM CPU Cooler](https://pcpartpicker.com/product/dMVG3C/noctua-nh-u12s-chromaxblack-55-cfm-cpu-cooler-nh-u12s-chromaxblack) | $89.95 |
| **Motherboard** | [Asus ProArt X570-CREATOR WIFI ATX AM4 Motherboard](https://pcpartpicker.com/product/8y8bt6/asus-proart-x570-creator-wifi-atx-am4-motherboard-proart-x570-creator-wifi) | $599.99 |
| **Memory** | [Corsair Vengeance LPX 128 GB (4 x 32 GB) DDR4-3200 CL16 Memory](https://pcpartpicker.com/product/tRH8TW/corsair-vengeance-lpx-128-gb-4-x-32-gb-ddr4-3200-memory-cmk128gx4m4e3200c16) | $249.99 |
| **Storage** | [Sabrent Rocket 4 Plus 2 TB M.2-2280 PCIe 4.0 X4 NVME Solid State Drive](https://pcpartpicker.com/product/PMBhP6/sabrent-rocket-4-plus-2-tb-m2-2280-nvme-solid-state-drive-sb-rkt4p-2tb) | $129.99 |
| **GPU** | [PNY RTX A-Series RTX A6000 48 GB Video Card](https://pcpartpicker.com/product/HWt9TW/pny-rtx-a-series-rtx-a6000-48-gb-video-card-vcnrtxa6000-pb) | $4269.00 |
| **Power Supply** | [EVGA SuperNOVA 850 G2 850 W 80+ Gold ](https://pcpartpicker.com/product/LCfp99/evga-supernova-850-g2-850-w-80-gold-certified-fully-modular-atx-power-supply-220-g2-0850-xr) | $322.42 |
| **Total**        | | **$6,026.34** |

View File

@ -2,6 +2,146 @@
title: Recommended AI Models by Hardware
---
## Overview
Large language models (LLMs) have changed how computers handle human-like text and language tasks. However, using them through APIs or cloud services can be costly and limiting. What if you could use these powerful models directly on your personal computer, without the extra expense? Running them on your own machine provides flexibility, control, and cost savings. In this guide, I'll show you how to do just that, unlocking their potential without relying on expensive APIs or cloud services.
To run LLMs on a home machine, you will need a computer with a GPU that can handle the large amount of data and computation required for inference.
The GPU stands as the pivotal component for running LLMs, bearing the primary responsibility for processing tasks associated with model execution. The performance of the GPU directly dictates the speed of inference.
While certain model variations and implementations may demand less potent hardware, the GPU retains its central role as the cornerstone of the system. The advantage of the GPU is that it can significantly improve performance compared to the CPU.
## GPU Selection
Selecting the optimal GPU for running Large Language Models (LLMs) on your home PC is a decision influenced by your budget and the specific LLMs you intend to work with. Your choice should strike a balance between performance, efficiency, and cost-effectiveness.
In general, the following GPU features are important for running LLMs:
- **High VRAM:** LLMs are typically very large and complex models, so they require a GPU with a high amount of VRAM. This will allow the model to be loaded into memory and processed efficiently.
- **CUDA Compatibility:** When running LLMs on a GPU, CUDA compatibility is paramount. CUDA is NVIDIA's parallel computing platform, and it plays a vital role in accelerating deep learning tasks. LLMs, with their extensive matrix calculations, heavily rely on parallel processing. Ensuring your GPU supports CUDA is like having the right tool for the job. It allows the LLM to leverage the GPU's parallel processing capabilities, significantly speeding up model training and inference.
- **Number of CUDA and Tensor Cores:** High-performance NVIDIA GPUs have both CUDA and Tensor cores. These cores are responsible for executing the neural network computations that underpin LLMs' language understanding and generation. The more CUDA cores your GPU has, the better equipped it is to handle the massive computational load that LLMs impose. Tensor cores further enhance LLM performance by accelerating the critical matrix operations integral to language modeling tasks.
- **Generation (Series)**: When selecting a GPU for LLMs, consider its generation or series (e.g., RTX 30 series). Newer GPU generations often come with improved architectures and features. For LLM tasks, opting for the latest generation can mean better performance, energy efficiency, and support for emerging AI technologies. Avoid purchasing RTX 2000-series GPUs, which are largely outdated nowadays.
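Before shortlisting a model, it helps to confirm what your own machine offers. The sketch below checks CUDA availability, VRAM, and compute capability; it assumes PyTorch is installed and is only one of several ways to query this information (`nvidia-smi` works too):

```python
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        vram_gb = props.total_memory / 1024**3
        print(f"GPU {i}: {props.name}, {vram_gb:.1f} GB VRAM, "
              f"compute capability {props.major}.{props.minor}")
else:
    print("No CUDA-capable GPU detected; inference will fall back to the CPU.")
```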
### Here are some of the best GPU options for this purpose:
1. **NVIDIA RTX 3090**: The NVIDIA RTX 3090 is a high-end GPU with a substantial 24GB of VRAM. This copious VRAM capacity makes it exceptionally well-suited for handling large LLMs. Moreover, it's known for its relative efficiency, meaning it won't overheat or strain your home PC's cooling system excessively. The RTX 3090's robust capabilities are a boon for those who need to work with hefty language models.
2. **NVIDIA RTX 4090**: If you're looking for peak performance and can afford the investment, the NVIDIA RTX 4090 represents the pinnacle of GPU power. Boasting 24GB of VRAM and featuring a cutting-edge Tensor Core architecture tailored for AI workloads, it outshines the RTX 3090 in terms of sheer capability. However, it's important to note that the RTX 4090 is also pricier and more power-hungry than its predecessor, the RTX 3090.
3. **AMD Radeon RX 6900 XT**: On the AMD side, the Radeon RX 6900 XT stands out as a high-end GPU with 16GB of VRAM. While it may not quite match the raw power of the RTX 3090 or RTX 4090, it strikes a balance between performance and affordability. Additionally, it tends to be more power-efficient, which could translate to a more sustainable and quieter setup in your home PC.
If budget constraints are a consideration, there are more cost-effective GPU options available:
- **NVIDIA RTX 3070**: The RTX 3070 is a solid mid-range GPU that can handle LLMs effectively. While it may not excel with the most massive or complex language models, it's a reliable choice for users looking for a balance between price and performance.
- **AMD Radeon RX 6800 XT**: Similarly, the RX 6800 XT from AMD offers commendable performance without breaking the bank. It's well-suited for running mid-sized LLMs and provides a competitive option in terms of both power and cost.
When selecting a GPU for LLMs, remember that it's not just about the GPU itself. Consider the synergy with other components in your PC:
- **CPU**: To ensure efficient processing, pair your GPU with a powerful CPU. LLMs benefit from fast processors, so having a capable CPU is essential.
- **RAM**: Sufficient RAM is crucial for LLMs. They can be memory-intensive, and having enough RAM ensures smooth operation.
- **Cooling System**: LLMs can push your PC's hardware to the limit. A robust cooling system helps maintain optimal temperatures, preventing overheating and performance throttling.
By taking all of these factors into account, you can build a home PC setup that's well-equipped to handle the demands of running LLMs effectively and efficiently.
## CPU Selection
Selecting the right CPU for running Large Language Models (LLMs) on your home PC is contingent on your budget and the specific LLMs you intend to work with. It's a decision that warrants careful consideration, as the CPU plays a pivotal role in determining the overall performance of your system.
In general, the following CPU features are important for running LLMs:
- **Number of Cores and Threads:** the number of CPU cores and threads influences parallel processing. More cores and threads help handle the complex computations involved in language models. For tasks like training and inference, a higher core/thread count can significantly improve processing speed and efficiency, enabling quicker results.
- **High Clock Speed:** The base clock speed, or base frequency, represents the CPU's default operating speed. A CPU with a high clock speed can process instructions more quickly, which can further improve performance.
- **Base Power (TDP):** LLMs often involve long training sessions and demanding computations. Therefore, a lower Thermal Design Power (TDP) is desirable. A CPU with a lower TDP consumes less power and generates less heat during prolonged LLM operations. This not only contributes to energy efficiency but also helps maintain stable temperatures in your system, preventing overheating and potential performance throttling.
- **Generation (Series):** Consider its generation or series (e.g., 9th Gen, 11th Gen Intel Core). Newer CPU generations often come with architectural improvements that enhance performance and efficiency. For LLM tasks, opting for a more recent generation can lead to faster and more efficient language model training and inference.
- **Support for AVX512:** AVX512 is a set of vector instruction extensions that can be used to accelerate machine learning workloads. Many LLMs are optimized to take advantage of AVX512, so it is important to make sure that your CPU supports this instruction set.
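On Linux you can quickly check whether a CPU advertises AVX-512 before committing to it. Here is a minimal sketch that reads `/proc/cpuinfo` (so it assumes a Linux system; on other platforms use a tool such as CPU-Z or the vendor's spec sheet):

```python
import os

def cpu_flags() -> set:
    """Collect the CPU feature flags reported by /proc/cpuinfo (Linux only)."""
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
print("Logical cores:", os.cpu_count())
print("AVX2 support:", "avx2" in flags)
print("AVX-512 support:", any(flag.startswith("avx512") for flag in flags))
```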
### Here are some CPU options for running LLMs:
1. **Intel Core i7-12700K**: Slightly less potent than the Core i9-12900K, the Intel Core i7-12700K is still a powerful CPU. With 12 cores and 20 threads, it strikes a balance between performance and cost-effectiveness. This CPU is well-suited for running mid-sized and large LLMs, making it a compelling option.
2. **Intel Core i9-12900K**: Positioned as a high-end CPU, the Intel Core i9-12900K packs a formidable punch with its 16 cores and 24 threads. It's one of the fastest CPUs available, making it an excellent choice for handling large and intricate LLMs. The abundance of cores and threads translates to exceptional parallel processing capabilities, which is crucial for tasks involving massive language models.
3. **AMD Ryzen 9 5950X**: Representing AMD's high-end CPU offering, the Ryzen 9 5950X boasts 16 cores and 32 threads. While it may not quite match the speed of the Core i9-12900K, it remains a robust and cost-effective choice. Its multicore prowess enables smooth handling of LLM workloads, and its affordability makes it an attractive alternative.
4. **AMD Ryzen 7 5800X**: Slightly less potent than the Ryzen 9 5950X, the Ryzen 7 5800X is still a formidable CPU with 8 cores and 16 threads. It's well-suited for running mid-sized and smaller LLMs, providing a compelling blend of performance and value.
For those operating within budget constraints, there are more budget-friendly CPU options:
- **Intel Core i5-12600K**: The Core i5-12600K is a capable mid-range CPU that can still handle LLMs effectively, though it may not be optimized for the largest or most complex models.
- **AMD Ryzen 5 5600X**: The Ryzen 5 5600X offers a balance of performance and affordability. It's suitable for running smaller to mid-sized LLMs without breaking the bank.
**When selecting a CPU for LLMs, consider the synergy with other components in your PC:**
- **GPU**: Pair your CPU with a powerful GPU to ensure smooth processing of LLMs. Some language models, particularly those used for AI, rely on GPU acceleration for optimal performance.
- **RAM**: Adequate RAM is essential for LLMs, as these models can be memory-intensive. Having enough RAM ensures that your CPU can operate efficiently without bottlenecks.
- **Cooling System**: Given the resource-intensive nature of LLMs, a robust cooling system is crucial to maintain optimal temperatures and prevent performance throttling.
By carefully weighing your budget and performance requirements and considering the interplay of components in your PC, you can assemble a well-rounded system that's up to the task of running LLMs efficiently.
> :memo: **Note:** It is important to note that these are just general recommendations. The specific CPU requirements for your LLM will vary depending on the specific model you are using and the tasks that you want to perform with it. If you are unsure what CPU to get, it is best to consult with an expert.
## RAM Selection
The amount of RAM you need to run an LLM depends on the size and complexity of the model, as well as the tasks you want to perform with it. For example, if you are simply running inference on a pre-trained LLM, you may be able to get away with using a relatively modest amount of RAM. However, if you are training a new LLM from scratch, or if you are running complex tasks like fine-tuning or code generation, you will need more RAM.
### Here is a general guide to RAM selection for running LLMs:
- **Capacity:** The amount of RAM you need will depend on the size and complexity of the LLM model you want to run. For inference, you will need at least 16GB of RAM, but 32GB or more is ideal for larger models and more complex tasks. For training, you will need at least 64GB of RAM, but 128GB or more is ideal for larger models and more complex tasks.
- **Speed:** LLMs can benefit from having fast RAM, so it is recommended to use DDR4 or DDR5 RAM with a speed of at least 3200MHz.
- **Latency:** RAM latency is the amount of time it takes for the CPU to access data in memory. Lower latency is better for performance, so it is recommended to look for RAM with a low latency rating.
- **Timing:** RAM timing is a set of parameters that control how the RAM operates. It is important to make sure that the RAM timing is compatible with your motherboard and CPU.
**Recommended RAM options for running LLMs:**
- **Inference:** For inference on pre-trained LLMs, you will need at least 16GB of RAM. However, 32GB or more is ideal for larger models and more complex tasks.
- **Training:** For training LLMs from scratch, you will need at least 64GB of RAM. However, 128GB or more is ideal for larger models and more complex tasks.
In addition to the amount of RAM, it is also important to consider the speed of the RAM. LLMs can benefit from having fast RAM, so it is recommended to use DDR4 or DDR5 RAM with a speed of at least 3200MHz.
## Motherboard Selection
When picking a motherboard to run advanced language models, you need to think about a few things. First, consider the specific language model you want to use, the type of CPU and GPU in your computer, and your budget. Here are some suggestions:
1. **ASUS ROG Maximus Z790 Hero:** This is a top-notch motherboard with lots of great features. It works well with Intel's latest CPUs, fast DDR5 memory, and PCIe 5.0 devices. It's also good at keeping things cool, which is important for running demanding language models.
2. **MSI MEG Z790 Ace:** Similar to the ASUS ROG Maximus, this motherboard is high-end and has similar features. It's good for running language models too.
3. **Gigabyte Z790 Aorus Master:** This one is more budget-friendly but still works great with Intel's latest CPUs, DDR5 memory, and fast PCIe 5.0 devices. It's got a strong power system, which helps with running language models.
If you're on a tighter budget, you might want to check out mid-range options like the **ASUS TUF Gaming Z790-Plus WiFi** or the **MSI MPG Z790 Edge WiFi DDR5**. They offer good performance without breaking the bank.
No matter which motherboard you pick, make sure it works with your CPU and GPU. Also, check that it has the features you need, like enough slots for your GPU and storage drives.
Other things to think about when choosing a motherboard for language models:
- **Cooling:** Language models can make your CPU work hard, so a motherboard with good cooling is a must. This keeps your CPU from getting too hot.
- **Memory:** Language models need lots of memory, so make sure your motherboard supports a good amount of it. Check if it works with the type of memory you want to use, like DDR5 or DDR4.
- **Storage:** Language models can create and store a ton of data. So, look for a motherboard with enough slots for your storage drives.
- **BIOS:** The BIOS controls your motherboard. Make sure it's up-to-date and has the latest features, especially if you plan to overclock or undervolt your system.
## Cooling System Selection
Modern computers have two critical components, the CPU and GPU, which can heat up during high-performance tasks. To prevent overheating, they come with built-in temperature controls that automatically reduce performance when temperatures rise. To keep them cool and maintain optimal performance, you need a reliable cooling system.
For laptops, the only choice is a fan-based cooling system. Laptops have built-in fans and copper pipes to dissipate heat. Many gaming laptops even have two separate fans: one for the CPU and another for the GPU.
For desktop computers, you have the option to install more efficient water-cooling systems. These are highly effective but can be expensive. Alternatively, you can install more cooling fans to keep your components cool.
Keep in mind that dust can accumulate in fan-based cooling systems, leading to malfunctions. So periodically clean the dust to keep your cooling system running smoothly.
## Use MacBook to run LLMs
An Apple MacBook equipped with either the M1 or the newer M2 Pro/Max processor is another strong option for running LLMs. These cutting-edge chips leverage Apple's innovative Unified Memory Architecture (UMA), which revolutionizes the way the CPU and GPU interact with memory resources. This advancement plays a pivotal role in enhancing the performance and capabilities of LLMs.
Unified Memory Architecture, as implemented in Apple's M1 and M2 series processors, facilitates seamless and efficient data access for both the CPU and GPU. Unlike traditional systems where data needs to be shuttled between various memory pools, UMA offers a unified and expansive memory pool that can be accessed by both processing units without unnecessary data transfers. This transformative approach significantly minimizes latency while concurrently boosting data access bandwidth, resulting in substantial improvements in both the speed and quality of outputs.
![UMA](https://media.discordapp.net/attachments/1148534242104574012/1156600109967089714/IMG_3722.webp?ex=6516380a&is=6514e68a&hm=ebe3b6ecb1edb44cde58bd8d3fdd46cef66b60aa41ea6c03b51325fa65f8517e&=&width=807&height=426)
The M1 and M2 Pro/Max chips offer varying levels of unified memory bandwidth, further underscoring their prowess in handling data-intensive tasks like AI processing. The M1/M2 Pro chip boasts an impressive capacity of up to 200 GB/s of unified memory bandwidth, while the M1/M2 Max takes it a step further, supporting up to a staggering 400 GB/s of unified memory bandwidth. This means that regardless of the complexity and demands of the AI tasks at hand, these Apple laptops armed with M1 or M2 processors are well-equipped to handle them with unparalleled efficiency and speed.
## Optimizing Memory Speed for AI Models
When it comes to utilizing LLMs effectively, you must delve into the intricacies of memory speed, as it plays a critical role in determining inference speed. The weights of these large language models must be read in full from RAM or VRAM each time they generate a new token, which is essentially a piece of text. For instance, a 4-bit 13-billion-parameter CodeLlama model consumes roughly 7.5GB of RAM.
To understand the impact of memory bandwidth, let's consider an example. If your system has a RAM bandwidth of 50 GB/s (achievable with components like dual-channel DDR4-3200 in tandem with a Ryzen 5 5600X), you can generate approximately 6 tokens per second. However, if you aspire to attain faster speeds, such as 11 tokens per second, you'll need higher memory bandwidth, like DDR5-5600 with around 90 GB/s.
For a broader context, consider top-tier GPUs like the Nvidia RTX 3090, which offers an impressive 930 GB/s of VRAM bandwidth. In contrast, the latest DDR5 RAM can provide up to around 100 GB/s of memory bandwidth. Recognizing and optimizing for memory bandwidth is paramount to efficiently running models like CodeLlama, as it directly influences the speed at which you can generate text tokens during inference.
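A rough way to reason about this: generation speed is bounded by memory bandwidth divided by the number of bytes that must be read per token (roughly the size of the model weights). The sketch below applies that upper-bound estimate to the bandwidth figures mentioned above; real-world speeds will be lower:

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper-bound estimate: each generated token reads all weights once."""
    return bandwidth_gb_s / model_size_gb

model_gb = 7.5  # 4-bit 13B CodeLlama, per the example above
for name, bandwidth in [("DDR4-3200 (dual channel), ~50 GB/s", 50),
                        ("DDR5-5600 (dual channel), ~90 GB/s", 90),
                        ("RTX 3090 VRAM, ~930 GB/s", 930)]:
    print(f"{name}: up to ~{max_tokens_per_second(bandwidth, model_gb):.1f} tokens/s")
```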
<!--
## Macbook 8GB RAM
## Macbook 16GB RAM
## Macbook 16GB RAM -->

View File

@ -4,5 +4,65 @@ title: Recommended AI Hardware by Model
## Codellama 34b
## Falcon 180b
### System Requirements:
**For example**: If you want to use [Codellama 7B](https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GPTQ/tree/main) models on your own computer, you can take advantage of your GPU and run this with GPTQ file models.
GPTQ is a format that compresses the model parameters to 4-bit, which reduces the VRAM requirements significantly. You can use the [oobabooga webui](https://github.com/oobabooga/text-generation-webui) or [JanAI](https://jan.ai/), which are simple interfaces that let you interact with different LLMs in your browser. They are pretty easy to set up and run, and you can install them on Windows or Linux (see our installation page for step-by-step instructions).
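Once you have downloaded a GPTQ build of the model, loading it from Python follows the standard Hugging Face pattern. This is a minimal sketch, assuming recent versions of `transformers`, `accelerate`, `optimum`, and `auto-gptq` are installed; the prompt is just an example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/CodeLlama-7B-Instruct-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" places the 4-bit weights on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```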
**For 7B Parameter Models (4-bit Quantization)**
| Format | RAM Requirements | VRAM Requirements | Minimum recommended GPU |
| ------------------------------------------------ | -------------------- | ----------------- | ----------------------------------------- |
| GPTQ (GPU inference) | 6GB (Swap to Load\*) | 6GB | GTX 1660, RTX 2060, RTX 3050, RTX 3060, AMD RX 5700 XT |
| GGML / GGUF (CPU inference) | 4GB | 300MB | |
| Combination of GPTQ and GGML / GGUF (offloading) | 2GB | 2GB | |
**For 13B Parameter Models (4-bit Quantization)**
| Format | RAM Requirements | VRAM Requirements | Minimum recommended GPU |
| ------------------------------------------------ | --------------------- | ----------------- | -------------------------------------------------- |
| GPTQ (GPU inference) | 12GB (Swap to Load\*) | 10GB | |
| GGML / GGUF (CPU inference) | 8GB | 500MB | AMD 6900 XT, RTX 2060 12GB, 3060 12GB, 3080, A2000 |
| Combination of GPTQ and GGML / GGUF (offloading) | 10GB | 10GB | |
**For 34B Parameter Models (4-bit Quantization)**
| Format | RAM Requirements | VRAM Requirements | Minimum recommended GPU |
| ------------------------------------------------ | --------------------- | ----------------- | -------------------------------------------------------------------- |
| GPTQ (GPU inference) | 32GB (Swap to Load\*) | 20GB | |
| GGML / GGUF (CPU inference) | 20GB | 500MB | RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100, Tesla P40 |
| Combination of GPTQ and GGML / GGUF (offloading) | 10GB | 4GB | |
**For 7B Parameter Models (8-bit Quantization)**
| Format | RAM Requirements | VRAM Requirements | Minimum recommended GPU |
| ------------------------------------------------ | --------------------- | ----------------- | -------------------------------------- |
| GPTQ (GPU inference) | 24GB (Swap to Load\*) | 12GB | RTX 3080, RTX 3080 Ti, RTX 3090, A5000 |
| GGML / GGUF (CPU inference) | 16GB | 1GB | RTX 3060 12GB, RTX 3070, A2000 |
| Combination of GPTQ and GGML / GGUF (offloading) | 12GB | 4GB | RTX 3060, RTX 3060 Ti, A2000 |
**For 13B Parameter Models (8-bit Quantization)**
| Format | RAM Requirements | VRAM Requirements | Minimum recommended GPU |
| ------------------------------------------------ | --------------------- | ----------------- | --------------------------------- |
| GPTQ (GPU inference) | 36GB (Swap to Load\*) | 20GB | RTX 4090, A6000, A6000 Ti, A8000 |
| GGML / GGUF (CPU inference) | 24GB | 2GB | RTX 3080 20GB, RTX 3080 Ti, A5000 |
| Combination of GPTQ and GGML / GGUF (offloading) | 20GB | 8GB | RTX 3080, RTX 3080 Ti, A5000 |
**For 34B Parameter Models (8-bit Quantization)**
| Format | RAM Requirements | VRAM Requirements | Minimum recommended GPU |
| ------------------------------------------------ | --------------------- | ----------------- | -------------------------------- |
| GPTQ (GPU inference) | 64GB (Swap to Load\*) | 40GB | A8000, A8000 Ti, A9000 |
| GGML / GGUF (CPU inference) | 40GB | 2GB | RTX 4090, A6000, A6000 Ti, A8000 |
| Combination of GPTQ and GGML / GGUF (offloading) | 48GB | 20GB | RTX 4090, A6000, A6000 Ti, A8000 |
> :memo: **Note**: The system RAM (not VRAM) listed above is required to load the model initially, in addition to having enough VRAM; it is not required to run the model once it is loaded. You can use swap space if you do not have enough RAM.
### Performance Recommendations:
1. **Optimal Performance**: To achieve the best performance when working with CodeLlama models, consider investing in a high-end GPU such as the NVIDIA RTX 3090 or RTX 4090. For the largest models, such as 65B and 70B parameter variants, a dual-GPU setup is recommended. Additionally, ensure your system has sufficient RAM, with a minimum of 16 GB, although 64 GB is ideal for seamless operation.
2. **Budget-Friendly Approach**: If budget constraints are a concern, focus on utilizing CodeLlama GGML/GGUF models that can comfortably fit within your system's available RAM. Keep in mind that while you can allocate some model weights to the system RAM to save GPU memory, this may result in a performance trade-off.
> :memo: **Note**: It's essential to note that these recommendations are guidelines, and the actual performance you experience will be influenced by various factors. These factors include the specific task you're performing, the implementation of the model, and the concurrent system processes. To optimize your setup, consider these recommendations as a starting point and adapt them to your unique requirements and constraints.

View File

@ -2,19 +2,21 @@
title: Recommended AI Hardware by Use Case
---
## Which AI Hardware to Choose Based on Your Use Case
Artificial intelligence (AI) is rapidly changing the world, and AI hardware is becoming increasingly important for businesses and individuals alike. Choosing the right hardware for your AI needs is crucial to get the best performance and results. Here are some tips for selecting AI hardware based on your specific use case and requirements.
### Personal Use
When venturing into the world of AI as an individual, your choice of hardware can significantly impact your experience. Here's a more detailed breakdown:
1. **Entry-level Experimentation:**
- **Macbook (16GB):** A Macbook equipped with 16GB of RAM and either the M1 or the newer M2 Pro/Max processor is an excellent starting point for AI enthusiasts. These cutting-edge chips leverage Apple's innovative Unified Memory Architecture (UMA), which revolutionizes the way the CPU and GPU interact with memory resources. This advancement plays a pivotal role in enhancing the performance and capabilities of LLMs.
- **Nvidia GeForce RTX 3090:** This powerful graphics card is a solid alternative for AI beginners, offering exceptional performance for basic experiments.
2. **Serious AI Work:**
- **2 x RTX 3090 (48GB VRAM):** For those committed to more advanced AI projects, this configuration provides the necessary muscle. Its dual Nvidia GeForce RTX 3090 GPUs offer a combined 48GB of VRAM, making it suitable for complex AI tasks and model training.
## Business Use
@ -25,7 +27,7 @@ Run a LLM trained on enterprise data (i.e. RAG)
- Mac Studio M2 Ultra with 192GB unified memory
- Cannot train
- RTX 6000
- Should we recommend 2 x 4090 instead?
### For a 50-person Law Firm
@ -59,4 +61,4 @@ Run Codellama with RAG on existing codebase
### For a 10,000-person Enterprise
- 8 x H100s
- NVAIE with vGPUs