Token Usage

Intro

As of v0.6.0, LibreChat accurately tracks token usage for the OpenAI/Plugins endpoints. All token transactions are stored in the “Transactions” collection in your database. In future releases, you’ll be able to toggle viewing how much a conversation costs.

Currently, you can limit user token usage by enabling user balances. Instead of configuring token credit limits via environment variables, you now set these options in your librechat.yaml file under the balance section.

Balance Configuration

The balance system in LibreChat allows administrators to configure how token credit balances are managed for users. All balance settings are now managed in your YAML configuration under the balance object.

Note: This replaces the previous environment variables (CHECK_BALANCE and START_BALANCE) and provides a more structured way to manage user balances.

Complete Balance Settings

librechat.yaml

version: 1.2.8
 
# Balance settings
balance:
  enabled: true                # Enable token credit balances for users
  startBalance: 20000          # Initial tokens credited upon registration
  autoRefillEnabled: false     # Enable automatic token refills
  refillIntervalValue: 30      # Numerical value for refill interval
  refillIntervalUnit: "days"   # Time unit for refill interval (days, hours, etc.)
  refillAmount: 10000          # Tokens added during each refill

Balance Settings Explained

enabled: Activates token credit tracking and balance management for users. When set to true, the system will track token usage and enforce balance limits.
startBalance: Specifies the initial number of tokens credited to a user upon registration. This is the starting balance for all new users.
autoRefillEnabled: Determines whether automatic refilling of token credits is enabled. When set to true, the system will automatically add credits to user balances based on the refill interval.
refillIntervalValue: Specifies the numerical value for the interval at which token credits are automatically refilled. Works in conjunction with refillIntervalUnit.
refillIntervalUnit: Specifies the time unit for the refill interval. Supported values include “seconds”, “minutes”, “hours”, “days”, “weeks”, and “months”.
refillAmount: Specifies the number of tokens to be added to the user’s balance during each automatic refill.

Check out the Balance Configuration page for more details.

How Auto-Refill Works

When balances are enabled and autoRefillEnabled is set to true, the system adds refillAmount credits to a user’s balance only once the configured interval has passed since the last refill. It determines this by comparing the current date with the lastRefill date plus the configured interval.

Auto-Refill Process

1. When a user attempts to spend tokens, the system checks whether the current balance is sufficient.
2. If the balance would drop to zero or below after the transaction, the system checks whether auto-refill is enabled.
3. If auto-refill is enabled, the system checks whether the refill interval has elapsed since the last refill:
   - The system compares the current date with lastRefill + refillInterval
   - If the interval has passed, tokens are added to the user’s balance
   - The lastRefill date is updated to the current date
4. The transaction proceeds if the balance is sufficient (either originally or after the refill).
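
The process above can be sketched as a single function. This is a minimal illustration under stated assumptions, not LibreChat's actual implementation: the `Balance` dataclass, its field names, and the `spend` function are hypothetical, and the refill interval is passed in as a ready-made `timedelta`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Balance:
    # Hypothetical record; field names are illustrative only.
    credits: float
    last_refill: datetime
    auto_refill_enabled: bool
    refill_amount: float
    refill_interval: timedelta


def spend(balance: Balance, cost: float, now: datetime) -> bool:
    """Try to spend `cost` credits; return True if the transaction proceeds."""
    # Refill only when the spend would exhaust the balance, auto-refill is on,
    # and the interval has elapsed since the last refill.
    if balance.credits - cost <= 0 and balance.auto_refill_enabled:
        if now >= balance.last_refill + balance.refill_interval:
            balance.credits += balance.refill_amount
            balance.last_refill = now
    # Proceed only if the (possibly refilled) balance covers the cost.
    if balance.credits < cost:
        return False
    balance.credits -= cost
    return True
```

Note that in this sketch a refill is never triggered on a schedule; it only happens lazily, at the moment a user tries to spend more than they have.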

Supported Time Units

The refillIntervalUnit can be set to any of the following values:

"seconds"
"minutes"
"hours"
"days"
"weeks"
"months"

For example, if refillIntervalValue is set to 30 and refillIntervalUnit is days, the system will add refillAmount tokens to the user’s balance only if 30 days have passed since the last refill.
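
To make the interval arithmetic concrete, here is a rough sketch of how a value/unit pair could be turned into the earliest allowed refill time. The function name `next_refill` is hypothetical, and the handling of "months" (calendar months, with the day clamped to the end of shorter months) is an assumption; the source does not say how LibreChat treats month boundaries.

```python
import calendar
from datetime import datetime, timedelta

# Units that map directly onto fixed-length timedelta arguments.
FIXED_UNITS = {"seconds", "minutes", "hours", "days", "weeks"}


def next_refill(last_refill: datetime, value: int, unit: str) -> datetime:
    """Return the earliest time at which the next refill is allowed."""
    if unit in FIXED_UNITS:
        return last_refill + timedelta(**{unit: value})
    if unit == "months":
        # Assumption: calendar months, clamping the day to the month's end.
        index = last_refill.month - 1 + value
        year = last_refill.year + index // 12
        month = index % 12 + 1
        day = min(last_refill.day, calendar.monthrange(year, month)[1])
        return last_refill.replace(year=year, month=month, day=day)
    raise ValueError(f"unsupported refillIntervalUnit: {unit!r}")
```

With the 30-day example, a user last refilled on 1 January becomes eligible again on 31 January: `next_refill(datetime(2024, 1, 1), 30, "days")`.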

Balance Synchronization

When a user logs in, the system automatically synchronizes their balance settings with the current global balance configuration. This ensures that any changes to the balance configuration are applied to all users.

The synchronization process:

Checks if the user has a balance record
If no record exists, creates one with the current startBalance
Updates the user’s auto-refill settings to match the global configuration
Ensures the user’s refill interval and amount match the global settings
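
The synchronization steps could look roughly like the following. This is only a sketch: `sync_balance` and the `tokenCredits` field name are hypothetical, records are modelled as plain dictionaries, and leaving an existing user's credits untouched is an assumption drawn from the steps above rather than confirmed behavior.

```python
from typing import Optional

# Refill-related keys copied from the global `balance` config.
REFILL_KEYS = (
    "autoRefillEnabled",
    "refillIntervalValue",
    "refillIntervalUnit",
    "refillAmount",
)


def sync_balance(record: Optional[dict], config: dict) -> dict:
    """Return a balance record aligned with the global balance config."""
    if record is None:
        # No record yet: create one seeded with the current startBalance.
        record = {"tokenCredits": config["startBalance"]}
    # Assumption: existing credits are kept; only refill settings are synced.
    synced = dict(record)
    for key in REFILL_KEYS:
        synced[key] = config[key]
    return synced
```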

Managing Token Balances

You can manually add or set user balances. This is especially useful during development or if you plan to build out a full balance-accruing system in the future (for example, via an admin dashboard).

Adding Balances

To manually add balances, run the following command:

# Local Development
npm run add-balance
 
# Docker (default setup)
docker-compose exec api npm run add-balance
 
# Docker (deployment setup)
docker exec -it LibreChat-API /bin/sh -c "cd .. && npm run add-balance"

You can also specify the email and token credit amount to add, e.g.:

# Local Development
npm run add-balance user@example.com 1000
 
# Docker (default setup)
docker-compose exec api npm run add-balance user@example.com 1000
 
# Docker (deployment setup)
docker exec -it LibreChat-API /bin/sh -c "cd .. && npm run add-balance user@example.com 1000"

Setting Balances

Additionally, you can set a balance for a user. An existing balance will be overwritten by the new balance.

To manually set balances, run the following command:

# Local Development
npm run set-balance
 
# Docker (default setup)
docker-compose exec api npm run set-balance
 
# Docker (deployment setup)
docker exec -it LibreChat-API /bin/sh -c "cd .. && npm run set-balance"

You can also specify the email and token credit amount to set, e.g.:

# Local Development
npm run set-balance user@example.com 1000
 
# Docker (default setup)
docker-compose exec api npm run set-balance user@example.com 1000
 
# Docker (deployment setup)
docker exec -it LibreChat-API /bin/sh -c "cd .. && npm run set-balance user@example.com 1000"

Listing Balances

To see the balances of your users, you can run:

# Local Development
npm run list-balances
 
# Docker (default setup)
docker-compose exec api npm run list-balances
 
# Docker (deployment setup)
docker exec -it LibreChat-API /bin/sh -c "cd .. && npm run list-balances"

This works well for tracking your own usage in a personal deployment. For reference, 1000 credits = $0.001 (1 mill USD).

Notes on Token Usage and Balance

With summarization enabled, an API request is blocked if the cost of the content to be summarized plus your messages payload exceeds the current balance.
Prompt token counting is accurate for OpenAI calls but not exact for plugins (due to function calling). It is close and conservative: the count may be higher than actual usage by 2-5 tokens.
The system allows deficits incurred by completion tokens. It only checks that you have enough for the prompt tokens and is lenient with the completion. The graph below details the logic.
That said, plugins are checked at each generation step, since the process involves multiple API calls. Anything the LLM has generated since the initial user prompt is shared with the user in the error message, as seen below.
There is a 150-token buffer for titling, since this is a two-step process that averages around 200 total tokens. If funds are insufficient, titling is cancelled before any spend happens and no error is thrown.
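
The prompt-only check and the resulting deficit can be illustrated with a short sketch. The function names are hypothetical and the logic is a simplification of the notes above, not the project's code; a rate is taken to mean credits charged per token.

```python
def can_start_request(balance: float, prompt_tokens: int, prompt_rate: float) -> bool:
    """Check only the prompt cost; the completion cost is not known up front."""
    return balance >= prompt_tokens * prompt_rate


def settle(
    balance: float,
    prompt_tokens: int,
    completion_tokens: int,
    prompt_rate: float,
    completion_rate: float,
) -> float:
    """Charge both sides after the response; the result may be negative."""
    prompt_cost = prompt_tokens * prompt_rate
    completion_cost = completion_tokens * completion_rate
    return balance - prompt_cost - completion_cost
```

For instance, with 150 credits, a 100-token prompt at rate 1 passes the check, and a 200-token completion at rate 2 then leaves the balance at -350 credits.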

More details

source: LibreChat/discussions/1640

"rawAmount": -000, // what's this?

Raw amount of tokens as counted per the tokenizer algorithm.

"tokenValue": -00000, // what's this?

Token credits value. 1000 credits = $0.001 (1 mill USD)

"rate": 00, // what's this?

The rate at which tokens are charged as credits.

For example, gpt-3.5-turbo-1106 has a rate of 1 for user prompt (input) and 2 for completion (output)

Model	Input	Output
gpt-3.5-turbo-1106	$0.0010 / 1K tokens	$0.0020 / 1K tokens
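
As a rough check, and assuming the rate is simply the per-token price expressed in credits, these rates follow from the listed prices together with 1 credit = $0.000001 (that is, 1000 credits = $0.001):

\[\text{Input rate} = \frac{\$0.0010 / 1000 \text{ tokens}}{\$0.000001 \text{ per credit}} = 1 \text{ credit per token}\]
\[\text{Output rate} = \frac{\$0.0020 / 1000 \text{ tokens}}{\$0.000001 \text{ per credit}} = 2 \text{ credits per token}\]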

Given the provided example:

    "rawAmount": -137
    "tokenValue": -205.5
    "rate": 1.5

\[\text{Token Value} = \text{Raw Amount of Tokens} \times \text{Rate}\]
\[137 \times 1.5 = 205.5\]

To get the actual amount of USD spent from the Token Value:

\[\frac{\text{Token Value}}{1{,}000{,}000} = \frac{\text{Raw Amount of Tokens} \times \text{Rate}}{1{,}000{,}000}\]
\[\frac{205.5}{1{,}000{,}000} = \$0.0002055 \text{ USD}\]
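
The same arithmetic as a small sketch, which can be handy when reading entries from the Transactions collection. The helper names are hypothetical; the only constant comes from the stated conversion of 1000 credits = $0.001.

```python
CREDITS_PER_USD = 1_000_000  # from 1000 credits = $0.001


def token_value(raw_amount: int, rate: float) -> float:
    """Credits charged for a transaction; the sign follows raw_amount."""
    return raw_amount * rate


def credits_to_usd(credits: float) -> float:
    """Convert a credit amount to US dollars."""
    return credits / CREDITS_PER_USD
```

Here `token_value(-137, 1.5)` gives -205.5, matching the example transaction, and passing its absolute value to `credits_to_usd` gives the USD amount spent.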

The relevant file for editing rates is found in api/models/tx.js

There will be more customization for this soon from the librechat.yaml file.


Additional Notes

With summarization enabled, API requests are blocked if the cost of the content plus the messages payload exceeds the current balance.
The system is lenient with completion tokens, focusing primarily on prompt tokens for balance checks.
A buffer is added for titling (approximately 150 tokens) to account for the two-step process.
Token credits translate to monetary value (e.g., 1000 credits = $0.001 USD).

For more details and customizations, please refer to the LibreChat Documentation.
