Documentation

Everything you need to know about using Bert CLI: commands, models, tips, and best practices.

📚 Contents

🚀 Getting Started

About Bert

The idea of Bert was conceived in 2022, when AI assistants were just rising to what they are today. Bert began with the idea of an AI model that is for you and only you, one that can be whatever you want it to be. When the idea first came up, Amphydia didn't exist yet, and I, Matias Nisperuza, was younger and had no way to build Bert. But times change, for the better.

Bert Identity

We frame Bert as a service fundamentally oriented around human satisfaction, not raw capability benchmarks or abstract performance metrics.
At its core, Bert is built to feel like a reliable presence: a system that users can return to consistently, trust implicitly, and engage with naturally. In practical terms, this means prioritizing clarity over cleverness, helpfulness over verbosity, and dependability over spectacle.
Bert is not positioned as an omniscient authority or a cold computational engine. Instead, it operates as a dependable companion: a system that understands context, respects user intent, and responds in a way that feels grounded, supportive, and predictable.

Bert Palette

Bert's main colors:

  • Dark Sage (#598556)
  • Light Beige (#F5F5DC)
  • Sage (#9C9C9C)
  • Olive (#808000)

Installation

# Strongly recommended: install from PyPI
pip install bert-cli

# Run Bert after install
bert

# Alternative: clone the GitHub repo (not recommended)
git clone https://github.com/mnisperuza/bert-cli.git
cd bert-cli

# Install dependencies
pip install -e .

# Run Bert
bert

By installing Bert, you agree to the terms of use and privacy policy.

First Run

When you first run Bert, you'll see an animated banner followed by a quantization picker. Choose based on your GPU (see the Quantization Guide below).

💡 Pro Tip

After the banner, Bert automatically loads the Nano model. You don't need to do anything; just start chatting!

⌨️ Commands Reference

Model Commands

Command          Description
bert nano        Switch to Bert Nano (fastest, LiquidAI/LFM2-700M)
bert mini        Switch to Bert Mini (balanced, LiquidAI/LFM2-1.2B)
bert main        Switch to Bert Main (thinking mode, Qwen/Qwen3-1.7B)
bert max         Switch to Bert Max (reasoning, LiquidAI/LFM2-2.6B)
bert coder       Switch to Bert Coder (code, Qwen/Qwen2.5-Coder-1.5B-Instruct)
bert maxcoder    Switch to Bert Max-Coder (heavy code, Qwen/Qwen2.5-Coder-3B-Instruct)

Quantization Commands

Command      Description
bert int4    Switch to INT4 quantization (4GB VRAM)
bert int8    Switch to INT8 quantization (6GB VRAM)
bert fp16    Switch to FP16 (8GB+ VRAM)
bert fp32    Switch to FP32 (CPU mode)
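Model and quantization switches combine naturally. For example, to work on code with a GPU that has about 4GB of VRAM:

# Switch to the code-specialized model
bert coder

# Use INT4 quantization to fit in ~4GB of VRAM
bert int4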

System Commands

Command               Description
/*token XXXX          Set your weekly token
/*tokens              Show token status and remaining count
/*think - question    Enable thinking mode for this query (Bert Main only, beta)
/*help                Show help and all commands
/*status              Show current model, quant, and device info
/*memory              Clear conversation memory
/*clear               Clear screen and show banner
/*exit                Exit Bert CLI

During Generation

Key       Action
ESC       Stop generation immediately
Ctrl+C    Stop generation (interrupt)

🤖 Model Guide

Bert comes with 6 specialized models, each optimized for different tasks.

Bert Nano (fastest)

LiquidAI/LFM2-700M

Ultra-fast responses for quick questions, brainstorming, and casual chat. Perfect for low-end GPUs.

~2GB VRAM · 32K context

Bert Mini

LiquidAI/LFM2-1.2B

Balanced performance for everyday tasks. Good quality with reasonable speed.

~4GB VRAM · 32K context

Bert Main (🧠 thinking)

Qwen/Qwen3-1.7B

The flagship model with thinking capabilities. Shows its reasoning process when using /*think.

~5GB VRAM · 128K context

Bert Max (powerful)

LiquidAI/LFM2-2.6B

Advanced reasoning and complex analysis. Best for nuanced discussions and detailed explanations.

~8GB VRAM · 16K context

Bert Coder (code)

Qwen/Qwen2.5-Coder-1.5B-Instruct

Specialized for programming. Write, debug, and explain code across multiple languages.

~4GB VRAM · 32K context

Bert Max-Coder (heavy code)

Qwen/Qwen2.5-Coder-3B-Instruct

For complex, multi-file projects and production-quality code. Best for professional development.

~8GB VRAM · 32K context

Which Model Should I Use?

  • Quick questions and casual chat: Bert Nano
  • Everyday tasks with balanced quality and speed: Bert Mini
  • Step-by-step reasoning and thinking mode: Bert Main
  • Nuanced discussions and detailed analysis: Bert Max
  • Everyday programming tasks: Bert Coder
  • Complex, multi-file, production-quality code: Bert Max-Coder

🧠 Thinking Mode (Beta)

Bert Main (Qwen/Qwen3-1.7B) supports a special thinking mode that shows the model's reasoning process after the response is generated. We are improving this feature, so expect a major update in the coming months.

How to Use

/*think - What is the derivative of x³ + 2x²?

The response will show:

  1. The model's answer (streamed normally)
  2. A thinking box showing the reasoning process
  3. Token count (only counts the response, not thinking)

⚠️ Important

Thinking mode only works with Bert Main. Other models will show a warning if you try to use /*think.

When to Use Thinking

  • Math problems and derivations (like the example above)
  • Multi-step logic, planning, or analysis questions
  • Any answer where you want to inspect the model's reasoning

πŸ“ File References-BETA

You can reference files directly in your queries using the @ symbol. Bert can find, check, and read files in the current directory, although features like reviewing files or using relative paths are still in early development. If you encounter an issue, don't hesitate to email us at mnisperuza1102@gmail.com.

Usage

# Reference a file
Check @main.py for bugs

# Multiple files
Compare @old_version.py and @new_version.py

# Relative paths
Review the code in @src/utils/helpers.js

Supported File Types

Category    Extensions
Code        .py, .js, .ts, .java, .c, .cpp, .go, .rs, .rb, .php
Web         .html, .css, .jsx, .tsx, .vue, .svelte
Data        .json, .yaml, .yml, .xml, .csv, .toml
Docs        .md, .txt, .rst, .log

💡 Pro Tip

File paths are typically resolved relative to your current directory. Bert shows "📂 Found: filename" when it successfully reads a file.

🎟️ Token System

Bert uses a weekly token system to manage usage. Every week, you get 20,000 free tokens.

Getting a Token

  1. Visit the Bert CLI homepage
  2. Enter your email
  3. Receive your token (format: BERT-XXXX-XXXX-XXXX-XXXX)
  4. In Bert, type: /*token YOUR-TOKEN-HERE
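Since the token format is predictable, you can sanity-check a token before setting it. The snippet below is a hypothetical sketch for illustration only, not Bert's actual validation code; in particular, the exact character set of each group is our assumption.

import re

# Hypothetical sketch: check the documented BERT-XXXX-XXXX-XXXX-XXXX shape.
# Assumption: each group is four uppercase letters or digits.
TOKEN_PATTERN = re.compile(r"^BERT(-[A-Z0-9]{4}){4}$")

def looks_like_bert_token(token: str) -> bool:
    """Return True if the string matches the documented token shape."""
    return TOKEN_PATTERN.match(token) is not None

print(looks_like_bert_token("BERT-A1B2-C3D4-E5F6-0123"))  # True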

Token Commands

# Set your token
/*token BERT-A1B2-C3D4-E5F6-0123

# Check remaining tokens
/*tokens

How Tokens Are Counted
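Only the tokens in Bert's responses count against your weekly quota; in thinking mode, the reasoning tokens are not counted (see Thinking Mode above). As a rough illustration, and assuming Bert counts tokens with the underlying model's tokenizer (an assumption about the internals, not a documented detail), counting a response looks like this:

from transformers import AutoTokenizer

# Illustrative sketch only; assumes the underlying Hugging Face tokenizer
# is what Bert uses for accounting, which is our assumption.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")
response = "The derivative of x**3 + 2*x**2 is 3*x**2 + 4*x."
print(f"This response would consume {len(tokenizer.encode(response))} tokens.")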

βš™οΈ Quantization Guide

Quantization reduces model size and memory usage, allowing larger models to run on smaller GPUs.

Level      VRAM    Quality      Speed
INT4 ⭐    ~4GB    Good         Fast
INT8       ~6GB    Very Good    Medium
FP16       ~8GB    Excellent    Medium
FP32       CPU     Best         Slow

💡 Recommendation

Start with INT4. It offers the best balance of quality, speed, and memory usage for most users.
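Bert applies quantization for you via the bert int4/int8/fp16/fp32 commands, but for the curious, here is a rough sketch of what INT4 loading typically looks like. It assumes a Hugging Face transformers + bitsandbytes stack, which is our assumption about Bert's internals, not a documented detail.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch under the assumption that Bert uses transformers + bitsandbytes.
quant_config = BitsAndBytesConfig(load_in_4bit=True)  # INT4: ~4GB VRAM
model = AutoModelForCausalLM.from_pretrained(
    "LiquidAI/LFM2-700M",  # Bert Nano's base model
    quantization_config=quant_config,
)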

✨ Tips & Best Practices

Getting Better Responses

  • Pick the model that matches the task (see the Model Guide above).
  • Use /*think with Bert Main when you want step-by-step reasoning.
  • Reference files with @ so Bert can read the actual code.
  • Use /*memory to clear context before switching topics.

Keyboard Shortcuts

Press ESC to stop generation immediately, or Ctrl+C to interrupt it (see During Generation above).

Troubleshooting

Model won't load?

Try a smaller model or lower quantization. If you're out of VRAM, use bert fp32 for CPU mode.

Slow responses?

Switch to Bert Nano for faster responses, or use INT4 quantization.

Token expired?

Tokens are valid for one week. Get a new one at the homepage.

Model returns strange responses?

Let us know which model by emailing us at mnisperuza1102@gmail.com, and we will review the issue!