From GPT to Llama: Deploy Any AI Model On-Premises with AIBOX


The AIBOX series combines high performance, low power consumption, and strong environmental adaptability, with computing power ranging from 6 to 157 TOPS. Compact in size and paired with a diverse set of deep learning algorithms, it supports the private deployment of mainstream large models, bringing digitalization to multiple smart industries.

To date, Firefly has launched nine AIBOX products, each tailored to different industry scenarios through varying computing power, energy efficiency, and form factor.

Firefly x NVIDIA

Equipped with NVIDIA's original Jetson Orin series core modules, this line offers accelerated computing at a range of performance and price points, with computing power of up to 157 TOPS, and supports the NVIDIA software ecosystem. Its powerful computing performance, excellent energy efficiency, and straightforward development experience make it suitable for a wide variety of standalone applications.


| | AIBOX-OrinNX | AIBOX-OrinNano |
| --- | --- | --- |
| SoC | NVIDIA Jetson Orin NX (16GB) | NVIDIA Jetson Orin Nano (8GB) |
| CPU | 8-core 64-bit processor, up to 2.0 GHz | 6-core 64-bit processor, up to 1.7 GHz |
| NPU | 157 TOPS | 67 TOPS |
| Video Encoding | 1×4K@60fps, 3×4K@30fps, 6×1080p@60fps, 12×1080p@30fps | 1080p@30fps |
| Video Decoding | 1×8K@30fps, 2×4K@60fps, 4×4K@30fps, 9×1080p@60fps, 18×1080p@30fps | 1×4K@60fps, 2×4K@30fps, 5×1080p@60fps, 11×1080p@30fps |
| Memory | 16GB LPDDR5 | 8GB LPDDR5 |
| Power Consumption | Typical: 7.2W (12V/600mA); Maximum: 33.6W (12V/2800mA) | Typical: 7.2W (12V/600mA); Maximum: 18W (12V/1500mA) |

Large Model Support

  • Robot Model: supports ROS-based robot models.
  • Language Model: supports private deployment of Transformer-architecture large language models such as Llama2, ChatGLM, and Qwen.
  • Visual Model: supports private deployment of large vision models such as ViT, Grounding DINO, and SAM.
  • AI Painting: supports private deployment of the Stable Diffusion v1.5 image generation model for AIGC.
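To give a feel for what private, on-device deployment looks like in practice, the sketch below builds a chat request for a model served locally on the box. The endpoint URL and model name are assumptions for illustration (any OpenAI-compatible inference server would fit this shape); they are not part of the AIBOX software stack:

```python
import json
import urllib.request

# Hypothetical local endpoint -- a privately deployed model listens on the
# device itself, so prompts and responses never leave the local network.
LOCAL_ENDPOINT = "http://127.0.0.1:8000/v1/chat/completions"


def build_chat_request(prompt: str, model: str = "llama-2-7b-chat") -> dict:
    """Build an OpenAI-style chat payload for a locally hosted model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }


def query_local_model(prompt: str) -> str:
    """POST the request to the on-device server and return the reply text."""
    data = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        LOCAL_ENDPOINT,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the server runs on the AIBOX itself, the same client code works whether the box hosts Llama2, ChatGLM, or Qwen — only the `model` field changes.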

Firefly x Rockchip

Equipped with Rockchip's flagship AIoT chips, this series adopts a big.LITTLE core architecture with main frequencies of up to 2.4 GHz, providing solid hardware support for high-performance computing and multitasking. The series also offers industrial-grade features such as low power consumption and long-term stable operation, suiting it to industrial application scenarios.


| | AIBOX-3576 | AIBOX-3588 | AIBOX-3588S |
| --- | --- | --- | --- |
| SoC | Rockchip RK3576 | Rockchip RK3588 | Rockchip RK3588S |
| CPU | 8-core 64-bit processor, up to 2.2 GHz | 8-core 64-bit processor, up to 2.4 GHz | 8-core 64-bit processor, up to 2.4 GHz |
| NPU | 6 TOPS, mixed INT4/INT8/INT16/FP16/BF16/TF32 operations | 6 TOPS, mixed INT4/INT8/INT16 operations | 6 TOPS, mixed INT4/INT8/INT16 operations |
| Video Encoding | 4K@60fps: H.264/AVC | 8K@30fps: H.264 | 8K@30fps: H.264 |
| Video Decoding | 8K@30fps; 4K@120fps: VP9/AVS2/AV1; 4K@60fps: H.264/AVC | 8K@60fps; 4K@120fps: VP9/AVS2; 8K@30fps: H.264 AVC/MVC; 4K@60fps: AV1; 1080p@60fps: MPEG-2/-1/VC-1/VP8 | 8K@60fps: VP9/AVS2; 8K@30fps: H.264 AVC/MVC; 4K@60fps: AV1; 1080p@60fps: MPEG-2/-1/VC-1/VP8 |
| Memory | LPDDR4 (4/8/16GB optional) | LPDDR4 (4/8/16/32GB optional) | LPDDR5 (4/8/16/32GB optional) |
| Power Consumption | Typical: 1.2W (12V/100mA); Maximum: 7.2W (12V/600mA); Sleep: 0.072W (12V/6mA) | Typical: 2.64W (12V/220mA); Maximum: 14.4W (12V/1200mA); Sleep: 0.18W (12V/15mA) | Typical: 1.26W (12V/105mA); Maximum: 13.2W (12V/1100mA); Sleep: 0.18W (12V/15mA) |

Large Language Model

  • Supports private deployment of Transformer-architecture large language models such as Gemma, Llama2, ChatGLM, Qwen, and Phi.

Firefly x SOPHON

This series is equipped with SOPHON AI processors and is highly cost-effective. The AIBOX-1684X delivers up to 32 TOPS of computing power, supports mainstream programming frameworks and video encoding/decoding, and can be applied to AI inference in cloud and edge computing.


| | AIBOX-1684X | AIBOX-1684 | AIBOX-1688 | AIBOX-186 |
| --- | --- | --- | --- | --- |
| SoC | SOPHON BM1684X | SOPHON BM1684 | SOPHON BM1688 | SOPHON CV186AH |
| CPU | 8-core ARM Cortex-A53, up to 2.3 GHz | 8-core ARM Cortex-A53, up to 2.3 GHz | 8-core ARM Cortex-A53, up to 1.6 GHz | 6-core ARM Cortex-A53, up to 1.6 GHz |
| NPU | 32 TOPS | 17.6 TOPS | 16 TOPS | 7.2 TOPS |
| Video Encoding | 32-channel 1080p@25fps; 12-channel 1080p@25fps (H.264) | 2-channel 1080p@25fps (H.264) | Maximum performance: 1920×1080@300fps or 3840×2160@75fps | Maximum performance: 1920×1080@300fps or 3840×2160@75fps |
| Video Decoding | 32-channel 1080p@25fps; 1-channel 8K@25fps | 32-channel 1080p@30fps (H.264) | Maximum performance: 1920×1080@480fps or 3840×2160@120fps | Maximum performance: 1920×1080@480fps or 3840×2160@120fps |
| Memory | LPDDR4/LPDDR4X (8/12/16GB optional) | LPDDR4/LPDDR4X (8/12/16GB optional) | LPDDR4, 8GB (4/8/16GB optional) | LPDDR4, 16GB (4/8/16GB optional) |
| Power Consumption | Typical: 20.4W (12V/1700mA); Maximum: 43.2W (12V/3600mA) | Typical: 9.6W (12V/800mA); Maximum: 26.4W (12V/2200mA) | Typical: 7.2W (12V/600mA); Maximum: 14.4W (12V/1200mA) | Typical: 6W (12V/500mA); Maximum: 10.8W (12V/900mA) |

Large Model Support

  • Supports private deployment of Transformer-architecture large language models such as Llama2, ChatGLM, and Qwen.
  • Supports private deployment of large vision models such as ViT, Grounding DINO, and SAM.
  • Supports private deployment of the Stable Diffusion v1.5 image generation model for AIGC.

Comprehensive AI Privatization Deployment

Most AIBOX models support private deployment of mainstream large models, including the Gemma, Llama, ChatGLM, and Qwen series of large language models.

  • Supports traditional network architectures such as CNN, RNN, and LSTM.
  • Supports multiple deep learning frameworks, including Caffe, TensorFlow, PyTorch, and MXNet, as well as custom operator development.
  • Supports Docker container management for easy image-based deployment.
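As an illustration of the Docker-based workflow, an on-premises model service could be described with a compose file like the sketch below. The image name, port, and volume paths are placeholders for whatever model-serving image you use, not official Firefly artifacts:

```yaml
# Hypothetical docker-compose.yml for an on-device model server (sketch).
services:
  llm-server:
    image: example/llm-server:latest   # placeholder: your model-serving image
    restart: unless-stopped            # come back up after reboots
    ports:
      - "8000:8000"                    # expose the inference API on the LAN
    volumes:
      - ./models:/models               # model weights stay on the device
```

Brought up with `docker compose up -d`, this keeps the entire model lifecycle (weights, runtime, API) on the box itself.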

Support Video Codec

Most AIBOX models support video encoding and decoding — up to 8K@60fps decoding and 8K@30fps encoding — along with simultaneous encode/decode and high-resolution, multi-channel decoding. This lets large models quickly extract information from video, providing richer data for model training and inference, improving visual-analysis accuracy, and accelerating algorithm training and optimization.


