From GPT to Llama: Deploy Any AI Model On-Premises with AIBOX


The AIBOX series combines high performance, low power consumption, and strong environmental adaptability, with computing power ranging from 6 to 157 TOPS. Compact in size and paired with a diverse set of deep learning algorithms, it supports the private deployment of mainstream large models, bringing digitalization to multiple smart industries.

To date, Firefly has launched nine AIBOX products, each tailored to different industry scenarios through varying computing power, energy efficiency, and form factor.

Firefly x NVIDIA

Equipped with NVIDIA's original Jetson Orin series core modules, this line offers accelerated computing at a range of performance and price points, with computing power of up to 157 TOPS, and supports the NVIDIA software ecosystem. Its powerful computing performance, excellent energy efficiency, and straightforward development experience make it suitable for a wide variety of standalone applications.


| | AIBOX-OrinNX | AIBOX-OrinNano |
| --- | --- | --- |
| SoC | NVIDIA Jetson Orin NX (16GB) | NVIDIA Jetson Orin Nano (8GB) |
| CPU | 8-core 64-bit processor, up to 2.0 GHz | 6-core 64-bit processor, up to 1.7 GHz |
| NPU | 157 TOPS | 67 TOPS |
| Video Encoding | 1×4K@60fps, 3×4K@30fps, 6×1080p@60fps, 12×1080p@30fps | 1080p@30fps |
| Video Decoding | 1×8K@30fps, 2×4K@60fps, 4×4K@30fps, 9×1080p@60fps, 18×1080p@30fps | 1×4K@60fps, 2×4K@30fps, 5×1080p@60fps, 11×1080p@30fps |
| Memory | 16GB LPDDR5 | 8GB LPDDR5 |
| Power Consumption | Typical: 7.2W (12V/600mA); Maximum: 33.6W (12V/2800mA) | Typical: 7.2W (12V/600mA); Maximum: 18W (12V/1500mA) |

Large Model Support

  • Robot Model: supports ROS-based robot models.
  • Language Model: supports private deployment of Transformer-architecture large language models such as Llama2, ChatGLM, and Qwen.
  • Visual Model: supports private deployment of large vision models such as ViT, Grounding DINO, and SAM.
  • AI Painting: supports private deployment of the Stable Diffusion v1.5 image generation model for AIGC.
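To give a feel for what private, on-device deployment looks like in practice, the sketch below builds a chat request for a model served locally on the box. The endpoint URL and model name are assumptions for illustration (any OpenAI-compatible inference server would fit this shape); they are not part of the AIBOX software stack:

```python
import json
import urllib.request

# Hypothetical local endpoint -- a privately deployed model listens on the
# device itself, so prompts and responses never leave the local network.
LOCAL_ENDPOINT = "http://127.0.0.1:8000/v1/chat/completions"


def build_chat_request(prompt: str, model: str = "llama-2-7b-chat") -> dict:
    """Build an OpenAI-style chat payload for a locally hosted model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }


def query_local_model(prompt: str) -> str:
    """POST the request to the on-device server and return the reply text."""
    data = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        LOCAL_ENDPOINT,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the server runs on the AIBOX itself, the same client code works whether the box hosts Llama2, ChatGLM, or Qwen — only the `model` field changes.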

Firefly x Rockchip

Equipped with Rockchip's flagship AIoT chips, this series adopts a big.LITTLE core architecture with main frequencies of up to 2.4 GHz, providing solid hardware support for high-performance computing and multitasking. The series also offers industrial-grade features such as low power consumption and long-term stable operation, suiting it to industrial application scenarios.


| | AIBOX-3576 | AIBOX-3588 | AIBOX-3588S |
| --- | --- | --- | --- |
| SoC | Rockchip RK3576 | Rockchip RK3588 | Rockchip RK3588S |
| CPU | 8-core 64-bit processor, up to 2.2 GHz | 8-core 64-bit processor, up to 2.4 GHz | 8-core 64-bit processor, up to 2.4 GHz |
| NPU | 6 TOPS, mixed INT4/INT8/INT16/FP16/BF16/TF32 operations | 6 TOPS, mixed INT4/INT8/INT16 operations | 6 TOPS, mixed INT4/INT8/INT16 operations |
| Video Encoding | 4K@60fps: H.264/AVC | 8K@30fps: H.264 | 8K@30fps: H.264 |
| Video Decoding | 8K@30fps; 4K@120fps: VP9/AVS2/AV1; 4K@60fps: H.264/AVC | 8K@60fps; 4K@120fps: VP9/AVS2; 8K@30fps: H.264 AVC/MVC; 4K@60fps: AV1; 1080p@60fps: MPEG-2/-1/VC-1/VP8 | 8K@60fps: VP9/AVS2; 8K@30fps: H.264 AVC/MVC; 4K@60fps: AV1; 1080p@60fps: MPEG-2/-1/VC-1/VP8 |
| Memory | LPDDR4 (4/8/16GB optional) | LPDDR4 (4/8/16/32GB optional) | LPDDR5 (4/8/16/32GB optional) |
| Power Consumption | Typical: 1.2W (12V/100mA); Maximum: 7.2W (12V/600mA); Sleep: 0.072W (12V/6mA) | Typical: 2.64W (12V/220mA); Maximum: 14.4W (12V/1200mA); Sleep: 0.18W (12V/15mA) | Typical: 1.26W (12V/105mA); Maximum: 13.2W (12V/1100mA); Sleep: 0.18W (12V/15mA) |

Large Language Model

  • Supports private deployment of Transformer-architecture large language models such as Gemma, Llama2, ChatGLM, Qwen, and Phi.

Firefly x SOPHON

This series is equipped with SOPHON AI processors and is highly cost-effective. The AIBOX-1684X delivers up to 32 TOPS of computing power, supports mainstream programming frameworks and video encoding/decoding, and can be applied to AI inference in cloud and edge computing.


| | AIBOX-1684X | AIBOX-1684 | AIBOX-1688 | AIBOX-186 |
| --- | --- | --- | --- | --- |
| SoC | SOPHON BM1684X | SOPHON BM1684 | SOPHON BM1688 | SOPHON CV186AH |
| CPU | 8-core ARM Cortex-A53, up to 2.3 GHz | 8-core ARM Cortex-A53, up to 2.3 GHz | 8-core ARM Cortex-A53, up to 1.6 GHz | 6-core ARM Cortex-A53, up to 1.6 GHz |
| NPU | 32 TOPS | 17.6 TOPS | 16 TOPS | 7.2 TOPS |
| Video Encoding | 32-channel 1080p@25fps; 12-channel 1080p@25fps (H.264) | 2-channel 1080p@25fps (H.264) | Maximum performance: 1920×1080@300fps or 3840×2160@75fps | Maximum performance: 1920×1080@300fps or 3840×2160@75fps |
| Video Decoding | 32-channel 1080p@25fps; 1-channel 8K@25fps | 32-channel 1080p@30fps (H.264) | Maximum performance: 1920×1080@480fps or 3840×2160@120fps | Maximum performance: 1920×1080@480fps or 3840×2160@120fps |
| Memory | LPDDR4/LPDDR4X (8/12/16GB optional) | LPDDR4/LPDDR4X (8/12/16GB optional) | LPDDR4, 8GB (4/8/16GB optional) | LPDDR4, 16GB (4/8/16GB optional) |
| Power Consumption | Typical: 20.4W (12V/1700mA); Maximum: 43.2W (12V/3600mA) | Typical: 9.6W (12V/800mA); Maximum: 26.4W (12V/2200mA) | Typical: 7.2W (12V/600mA); Maximum: 14.4W (12V/1200mA) | Typical: 6W (12V/500mA); Maximum: 10.8W (12V/900mA) |

Large Model Support

  • Supports private deployment of Transformer-architecture large language models such as Llama2, ChatGLM, and Qwen.
  • Supports private deployment of large vision models such as ViT, Grounding DINO, and SAM.
  • Supports private deployment of the Stable Diffusion v1.5 image generation model for AIGC.

Comprehensive AI Privatization Deployment

Most AIBOX models support private deployment of mainstream large models, including the Gemma, Llama, ChatGLM, and Qwen series of large language models.

  • Supports traditional network architectures such as CNN, RNN, and LSTM.
  • Supports multiple deep learning frameworks, including Caffe, TensorFlow, PyTorch, and MXNet, as well as custom operator development.
  • Supports Docker container management for easy image-based deployment.
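As an illustration of the Docker-based workflow, an on-premises model service could be described with a compose file like the sketch below. The image name, port, and volume paths are placeholders for whatever model-serving image you use, not official Firefly artifacts:

```yaml
# Hypothetical docker-compose.yml for an on-device model server (sketch).
services:
  llm-server:
    image: example/llm-server:latest   # placeholder: your model-serving image
    restart: unless-stopped            # come back up after reboots
    ports:
      - "8000:8000"                    # expose the inference API on the LAN
    volumes:
      - ./models:/models               # model weights stay on the device
```

Brought up with `docker compose up -d`, this keeps the entire model lifecycle (weights, runtime, API) on the box itself.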

Support Video Codec

Most AIBOX models support video encoding and decoding — up to 8K@60fps decoding and 8K@30fps encoding — along with simultaneous encode/decode and high-resolution, multi-channel decoding. This lets large models quickly extract information from video, providing richer data for model training and inference, improving visual-analysis accuracy, and accelerating algorithm training and optimization.


