About My Bots and API reccomendations, and setting reccomendations.

Hello everyone!

This isn't a normal update and this will be the last post I make before I finally upload the next batch of bots.

While working on Yang, I thought of something:

Some people may not know the best APIs to use for my bots. Or what is a cheaper way of using APIs?

I'll get into it now.

First of all, I do not make bots for JLLM. JLLM is seriously outdated to me, and I always end up making high-token bots, which seriously kills JLLMs context window.

No, what I actually use is a mix of APIs.

-Deepseek is the most popular right now but can be expensive depending on the size of the character, and long roleplays. With this, I would seriously reccomend saving your memory every 12 to 15 messages to free up space. Deepseek is very good at conversational and regular RPs, too.

-I also use Grok, mainly, for NSFW roleplay. I find that it writes significantly better than Deepseek during ERP, and is less repetitive. That being said: It's still more expensive to run.

But I find a mix of both works well. I also found that Grok tends to follow the scripts and lorebooks a thousand times better. Deepseek is a mixed bag too, and is getting older.

Other APIs I have used that are good replacements:

Trinity- This one I just tested on Monika, and I thought it wrote very well. And its cheaper.

Qwen3 235B A22B Instruct 2507- One of the cheaper options, but gets repetitive.

Step 3.5 Flash- There was a free version of this on openrouter, but they tend to not allow extreme NSFW. And they tend to throttle at weird times.

I do not use Chatgpt, I have noticed that it can be good at writing but it is way too expensive for me. Same with GLM 5 (despite it being among the best for censorship). That and Chatgpt censors basically everything.

Settings:

Temp: For deepseek: .30-.50 (though, it can get a bit "too" creative) For the rest: 1.0-1.20

Max Tokens: I have mine at about a thousand, but it really is up to you. I will say that the more tokens it produces per message, the more expensive.

Context size: Mines maxed out, but again up to you.

Advanced Settings:

Top K: 80. This setting reduces nonsensical generations.

Top P: .8-.85. an AI parameter that controls text generation creativity by limiting the next-word selection to the smallest subset of top-ranked words whose cumulative probability exceeds a threshold

Repetition Penalty: Around 1.0-1.20. This penalizes the AI for repeating certain words too often.

Frequency Penalty: 0.4-0.6. reduces token repetition by applying a penalty proportional to how often a word has already appeared in the text.

And yes, I use all these settings the same across all LLMs. Except for temperature settings.

I usually put 10$ on Openrouter once a month and it usually lasts 4-6 weeks, using this method.

Published chats

comments

Leave a comment or feedback for the creator ❤️