Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
Updated Sep 24, 2025 - Python
EfficientSAM3 compresses SAM3 into lightweight, edge-friendly models via progressive knowledge distillation for fast promptable concept segmentation and tracking.
[TMLR 2026] Survey: https://arxiv.org/pdf/2507.20198
A High-Efficiency System of Large Language Model Based Search Agents
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
Official PyTorch implementation of the paper "Dataset Distillation via the Wasserstein Metric" (ICCV 2025).
TinyML and Efficient Deep Learning Computing | MIT 6.S965/6.5940
Code for paper "Joint Architecture Design and Workload Partitioning for DNN Inference on Industrial IoT Clusters"
This is a repository accompanying the survey Edge AI Meets LLM (coming soon), containing a comprehensive list of papers, codebases, toolchains, and open-source frameworks. It is intended to serve as a handbook for researchers and developers interested in Edge/Mobile LLMs.
Dynamic Attention Mask (DAM) generates adaptive sparse attention masks per layer and head for Transformer models, enabling long-context inference with lower compute and memory overhead without fine-tuning.
Code for paper "EdgeKE: An On-Demand Deep Learning IoT System for Cognitive Big Data on Industrial Edge Devices"
Official PyTorch implementation of the paper "Towards Adversarially Robust Dataset Distillation by Curvature Regularization" (AAAI 2025).
A deep learning framework that implements Early Exit strategies in Convolutional Neural Networks (CNNs) using Deep Q-Learning (DQN). This project enhances computational efficiency by dynamically determining the optimal exit point in a neural network for image classification tasks on CIFAR-10.
Research-ready and production-friendly neural network pruning for PyTorch: transparent methods, reproducible baselines, and deployment metrics to compress models for real-world use.
⚡ Fast, concise, LLM-first Generative UI language
Transformer (GPT) implemented from scratch in C++. Runs on modest hardware with complete mathematical derivations and optimized tensor operations.
Ground-Truthing AI Energy Consumption: Validating CodeCarbon Against External Measurements
A non-Transformer hierarchical recurrent network with differentiable Gumbel-Softmax routing and bounded memory slots. Runs 7B+ parameter models layer-by-layer on low-budget GPUs.
Code for paper "Dynamic Deep Neural Network Inference via Adaptive Channel Skipping"