Documentation

Everything you need to know about using Bert CLI: commands, models, tips, and best practices.

📚 Contents

🚀 Getting Started

About Bert

The idea of Bert was conceived in 2022, when AI assistants were just rising to what they are today. Bert began with the idea of an AI model that is for you and only you, one that can be whatever you want it to be. When the idea first came up, Amphydia didn't exist yet, and I, Matias Nisperuza, was younger and had no way to build Bert. But times change, for the better.

Bert Identity

We frame Bert as a service fundamentally oriented around human satisfaction, not raw capability benchmarks or abstract performance metrics.
At its core, Bert is built to feel like a reliable presence: a system that users can return to consistently, trust implicitly, and engage with naturally. In practical terms, this means prioritizing clarity over cleverness, helpfulness over verbosity, and dependability over spectacle.
Bert is not positioned as an omniscient authority or a cold computational engine. Instead, it operates as a dependable companion: a system that understands context, respects user intent, and responds in a way that feels grounded, supportive, and predictable.

Bert Palette

Bert's main colors:

  • Dark Sage (#598556)
  • Light Beige (#F5F5DC)
  • Sage (#9C9C9C)
  • Olive (#808000)

Installation

# Strongly recommended: install from PyPI
pip install bert-cli

# Run Bert after install
bert

# Alternative: clone the GitHub repo (not recommended)
git clone https://github.com/mnisperuza/bert-cli.git
cd bert-cli

# Install dependencies
pip install -e .

# Run Bert
bert

By installing Bert, you agree to the terms of use and privacy policy.

First Run

When you first run Bert, you'll see an animated banner followed by a quantization picker. Choose based on your GPU (see the Quantization Guide below).

💡 Pro Tip

After the banner, Bert automatically loads the Nano model. You don't need to do anything; just start chatting!

⌨️ Commands Reference

Model Commands

Command          Description
bert nano        Switch to Bert Nano (fastest, LiquidAI/LFM2-700M)
bert mini        Switch to Bert Mini (balanced, LiquidAI/LFM2-1.2B)
bert main        Switch to Bert Main (thinking mode, Qwen/Qwen3-1.7B)
bert max         Switch to Bert Max (reasoning, LiquidAI/LFM2-2.6B)
bert coder       Switch to Bert Coder (code, Qwen/Qwen2.5-Coder-1.5B-Instruct)
bert maxcoder    Switch to Bert Max-Coder (heavy code, Qwen/Qwen2.5-Coder-3B-Instruct)

Quantization Commands

Command      Description
bert int4    Switch to INT4 quantization (4GB VRAM)
bert int8    Switch to INT8 quantization (6GB VRAM)
bert fp16    Switch to FP16 (8GB+ VRAM)
bert fp32    Switch to FP32 (CPU mode)
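Model and quantization switches combine naturally. For example, to work on code with a GPU that has about 4GB of VRAM:

# Switch to the code-specialized model
bert coder

# Use INT4 quantization to fit in ~4GB of VRAM
bert int4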

System Commands

Command               Description
/*token XXXX          Set your weekly token
/*tokens              Show token status and remaining count
/*think - question    Enable thinking mode for this query (Bert Main only, beta)
/*help                Show help and all commands
/*status              Show current model, quant, and device info
/*memory              Clear conversation memory
/*clear               Clear screen and show banner
/*exit                Exit Bert CLI

During Generation

Key       Action
ESC       Stop generation immediately
Ctrl+C    Stop generation (interrupt)

🤖 Model Guide

Bert comes with 6 specialized models, each optimized for different tasks.

Bert Nano (fastest)

LiquidAI/LFM2-700M

Ultra-fast responses for quick questions, brainstorming, and casual chat. Perfect for low-end GPUs.

~2GB VRAM · 32K context

Bert Mini

LiquidAI/LFM2-1.2B

Balanced performance for everyday tasks. Good quality with reasonable speed.

~4GB VRAM · 32K context

Bert Main (🧠 thinking)

Qwen/Qwen3-1.7B

The flagship model with thinking capabilities. Shows its reasoning process when using /*think.

~5GB VRAM · 128K context

Bert Max (powerful)

LiquidAI/LFM2-2.6B

Advanced reasoning and complex analysis. Best for nuanced discussions and detailed explanations.

~8GB VRAM · 16K context

Bert Coder (code)

Qwen/Qwen2.5-Coder-1.5B-Instruct

Specialized for programming. Write, debug, and explain code across multiple languages.

~4GB VRAM · 32K context

Bert Max-Coder (heavy code)

Qwen/Qwen2.5-Coder-3B-Instruct

For complex, multi-file projects and production-quality code. Best for professional development.

~8GB VRAM · 32K context

Which Model Should I Use?

  • Quick questions and casual chat: Bert Nano
  • Everyday tasks with balanced quality and speed: Bert Mini
  • Step-by-step reasoning and thinking mode: Bert Main
  • Nuanced discussions and detailed analysis: Bert Max
  • Everyday programming tasks: Bert Coder
  • Complex, multi-file, production-quality code: Bert Max-Coder

🧠 Thinking Mode (Beta)

Bert Main (Qwen/Qwen3-1.7B) supports a special thinking mode that shows the model's reasoning process after the response is generated. We are improving this feature, so expect a major update in the coming months.

How to Use

/*think - What is the derivative of x³ + 2x²?

The response will show:

  1. The model's answer (streamed normally)
  2. A thinking box showing the reasoning process
  3. Token count (only counts the response, not thinking)

⚠️ Important

Thinking mode only works with Bert Main. Other models will show a warning if you try to use /*think.

When to Use Thinking

  • Math problems and derivations (like the example above)
  • Multi-step logic, planning, or analysis questions
  • Any answer where you want to inspect the model's reasoning

πŸ“ File References-BETA

You can reference files directly in your queries using the @ symbol. Bert can find, check, and read files in the current directory, although features like reviewing files or using relative paths are still in early development. If you encounter an issue, don't hesitate to email us at mnisperuza1102@gmail.com.

Usage

# Reference a file
Check @main.py for bugs

# Multiple files
Compare @old_version.py and @new_version.py

# Relative paths
Review the code in @src/utils/helpers.js

Supported File Types

Category    Extensions
Code        .py, .js, .ts, .java, .c, .cpp, .go, .rs, .rb, .php
Web         .html, .css, .jsx, .tsx, .vue, .svelte
Data        .json, .yaml, .yml, .xml, .csv, .toml
Docs        .md, .txt, .rst, .log

💡 Pro Tip

File paths are typically resolved relative to your current directory. Bert shows "📂 Found: filename" when it successfully reads a file.

🎟️ Token System

Bert uses a weekly token system to manage usage. Every week, you get 20,000 free tokens.

Getting a Token

  1. Visit the Bert CLI homepage
  2. Enter your email
  3. Receive your token (format: BERT-XXXX-XXXX-XXXX-XXXX)
  4. In Bert, type: /*token YOUR-TOKEN-HERE
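Since the token format is predictable, you can sanity-check a token before setting it. The snippet below is a hypothetical sketch for illustration only, not Bert's actual validation code; in particular, the exact character set of each group is our assumption.

import re

# Hypothetical sketch: check the documented BERT-XXXX-XXXX-XXXX-XXXX shape.
# Assumption: each group is four uppercase letters or digits.
TOKEN_PATTERN = re.compile(r"^BERT(-[A-Z0-9]{4}){4}$")

def looks_like_bert_token(token: str) -> bool:
    """Return True if the string matches the documented token shape."""
    return TOKEN_PATTERN.match(token) is not None

print(looks_like_bert_token("BERT-A1B2-C3D4-E5F6-0123"))  # True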

Token Commands

# Set your token
/*token BERT-A1B2-C3D4-E5F6-0123

# Check remaining tokens
/*tokens

How Tokens Are Counted
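Only the tokens in Bert's responses count against your weekly quota; in thinking mode, the reasoning tokens are not counted (see Thinking Mode above). As a rough illustration, and assuming Bert counts tokens with the underlying model's tokenizer (an assumption about the internals, not a documented detail), counting a response looks like this:

from transformers import AutoTokenizer

# Illustrative sketch only; assumes the underlying Hugging Face tokenizer
# is what Bert uses for accounting, which is our assumption.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")
response = "The derivative of x**3 + 2*x**2 is 3*x**2 + 4*x."
print(f"This response would consume {len(tokenizer.encode(response))} tokens.")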

βš™οΈ Quantization Guide

Quantization reduces model size and memory usage, allowing larger models to run on smaller GPUs.

Level      VRAM    Quality      Speed
INT4 ⭐    ~4GB    Good         Fast
INT8       ~6GB    Very Good    Medium
FP16       ~8GB    Excellent    Medium
FP32       CPU     Best         Slow

💡 Recommendation

Start with INT4. It offers the best balance of quality, speed, and memory usage for most users.
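Bert applies quantization for you via the bert int4/int8/fp16/fp32 commands, but for the curious, here is a rough sketch of what INT4 loading typically looks like. It assumes a Hugging Face transformers + bitsandbytes stack, which is our assumption about Bert's internals, not a documented detail.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch under the assumption that Bert uses transformers + bitsandbytes.
quant_config = BitsAndBytesConfig(load_in_4bit=True)  # INT4: ~4GB VRAM
model = AutoModelForCausalLM.from_pretrained(
    "LiquidAI/LFM2-700M",  # Bert Nano's base model
    quantization_config=quant_config,
)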

✨ Tips & Best Practices

Getting Better Responses

  • Pick the model that matches the task (see the Model Guide above).
  • Use /*think with Bert Main when you want step-by-step reasoning.
  • Reference files with @ so Bert can read the actual code.
  • Use /*memory to clear context before switching topics.

Keyboard Shortcuts

Press ESC to stop generation immediately, or Ctrl+C to interrupt it (see During Generation above).

Troubleshooting

Model won't load?

Try a smaller model or lower quantization. If you're out of VRAM, use bert fp32 for CPU mode.

Slow responses?

Switch to Bert Nano for faster responses, or use INT4 quantization.

Token expired?

Tokens are valid for one week. Get a new one at the homepage.

Model returns strange responses?

Let us know which model by emailing us at mnisperuza1102@gmail.com, and we will review the issue!