A GPU-based correlator for MeerKAT Extension
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
A high-performance inference system for large language models, designed for production environments.
✨ Zero-code distributed tracing and profiling, observability via eBPF 🚀
Open deep learning compiler stack for CPU, GPU, and specialized accelerators
AMD ROCm Performance Primitives (RPP): a comprehensive, high-performance computer vision library for AMD processors with HIP, OpenCL, and CPU back-ends.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
☁️ Cloud VRAM for SDXL, AnimateDiff, and upscalers — run your workflows on the cloud from your local ComfyUI
Monte Carlo eXtreme for OpenCL (MCXCL)
A Pythonic framework to simplify AI service building
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
CUDA C++ Core Libraries