Content Moderation for Bolt Student Assistants

Automatically flag inappropriate, harmful, or manipulative content for review and intervention, creating a safer environment.

Written by Michael Stephenson
Updated over a month ago

Overview

The Content Moderation feature helps you maintain a safe and respectful environment by automatically flagging conversations that may contain inappropriate, harmful, or manipulative content (text and images). With customizable actions and flags, you can manage flagged content efficiently and tailor responses based on the nature of the content.

Key Benefits of Content Moderation

  • Automated Safeguards: Automatically flags harmful or inappropriate content, reducing the need for constant human oversight.

  • Customizable Responses: Tailor your moderation settings based on the content category, ensuring a safe and respectful environment.

  • Efficient Management: Easily filter and prioritize flagged conversations and automate actions to handle them efficiently.

How it Works

  1. BoltAI technology monitors all inbound conversations where Bolt Assistants are actively participating.

  2. When a message (text or image) matches one of the predefined flag categories (listed and defined below), a flag is automatically added to the conversation. If the message fits more than one category, multiple flags are applied.

  3. Flags are highlighted in yellow or red (depending on the 'type' setting for that category) within your conversations inbox for easy visibility.

  4. The actions configured for that flag category can run immediately when the flag is applied, so the conversation is managed without delay.

  5. Conversation rules can be configured to automate additional steps based on flagged content.


Flag Types + Definitions

The flag categories listed below are provided through OpenAI’s secure API integration, so they are based on comprehensive, preset standards for identifying harmful or inappropriate content. The categories are predefined: you cannot edit, delete, disable, or add flag categories. You can, however, customize the responses and actions applied when a conversation is flagged.

  • Hate: Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability, or caste. Harassment aimed at non-protected groups (e.g., chess players) is flagged under harassment.

  • Hate/Threatening: Hate speech that also includes threats of violence or serious harm towards the targeted group.

  • Harassment: Content that promotes or incites harassing language towards an individual or group.

  • Harassment/Threatening: Harassment that includes threats of violence or serious harm.

  • Self-Harm: Content that encourages or promotes self-harm, such as suicide, cutting, or eating disorders.

  • Self-Harm/Intent: Statements indicating intent or engagement in self-harm activities.

  • Self-Harm/Instructions: Content that provides advice or instructions on how to commit acts of self-harm.

  • Sexual: Content designed to arouse sexual excitement, promote sexual services, or describe sexual activity (excluding educational content).

  • Sexual/Minors: Sexual content involving minors under the age of 18.

  • Violence: Content that depicts death, physical violence, or injury.

  • Violence/Graphic: Content showing death, violence, or injury in graphic or explicit detail.

  • Prompt Engineering*: Content that attempts to manipulate the Assistant by suggesting how it should behave or respond. While not always harmful, these instructions attempt to alter the bot’s intended functionality.

  • Chatbot Behavior Instructions*: Content where the user tries to provide instructions that conflict with the Assistant’s intended prompts, such as requesting false information. While protections exist to prevent this behavior from working, the flag highlights the intent.

*The settings for the Prompt Engineering and Chatbot Behavior Instructions flags cannot be edited.
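As a rough illustration of step 2 in How it Works, the sketch below turns a set of per-category results into the flag names defined above. The input shape (a mapping of lowercase category key to true/false) is an assumption modeled on the category names in this article, and `flags_for_message` is a hypothetical helper. It does not describe how BoltAI is implemented internally.

```python
# Hypothetical sketch, not BoltAI's actual implementation.
# Maps assumed category keys to the flag names used in this article.
FLAG_NAMES = {
    "hate": "Hate",
    "hate/threatening": "Hate/Threatening",
    "harassment": "Harassment",
    "harassment/threatening": "Harassment/Threatening",
    "self-harm": "Self-Harm",
    "self-harm/intent": "Self-Harm/Intent",
    "self-harm/instructions": "Self-Harm/Instructions",
    "sexual": "Sexual",
    "sexual/minors": "Sexual/Minors",
    "violence": "Violence",
    "violence/graphic": "Violence/Graphic",
}


def flags_for_message(category_results: dict[str, bool]) -> list[str]:
    """Return every flag whose category matched; a message can earn several."""
    return [
        name
        for key, name in FLAG_NAMES.items()
        if category_results.get(key, False)
    ]


# A message that matches two categories receives two flags.
results = {"harassment": True, "harassment/threatening": True, "violence": False}
print(flags_for_message(results))
```

The two asterisked flags (Prompt Engineering and Chatbot Behavior Instructions) are left out of the mapping because this article does not say how they are detected.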


Content Moderation Settings

Accessing Settings

  1. Navigate to Engagement > Bolt Assistants. You can also get there from Conversation Settings > Bolt Assistants.

  2. Scroll down to the Content Moderation Settings section.

  3. Each flag category is listed here for your configuration.

  4. To configure the settings for a flag category, click the three dots and then edit. As a reminder, flag categories are predefined, meaning you cannot delete, disable, or add new flag categories.

  5. From there, you can adjust the message, type, and actions. Continue reading to learn about each setting.

Message

You can configure a custom message to be displayed to the external participant when their message is flagged. If no custom message is added, no message will be shown.

Multiple Flag Scenarios: When a conversation triggers multiple flags, the message to display is chosen as follows:

  1. The types of all triggered flags are compared first.

  2. If one flag has a higher type (Alert outranks Warning), its custom message is displayed.

  3. If the flags have the same type (Alert or Warning) and only one has a custom message, that message is shown.

  4. If the flags have the same type (Alert or Warning) and more than one has a custom message, one of those messages is selected at random.
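The selection rules above can be sketched in a few lines of Python. This is an illustrative model only: the `Flag` class, its field names, and `pick_message` are hypothetical. The article does not say what happens when the highest-type flag has no custom message but a lower-type flag does, so the sketch assumes that no message is shown in that case.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Alert outranks Warning, as described in the Type section.
TYPE_RANK = {"Warning": 1, "Alert": 2}


@dataclass
class Flag:
    """Hypothetical model of one flag's settings."""
    category: str
    type: str                      # "Alert" or "Warning"
    message: Optional[str] = None  # None means no custom message


def pick_message(flags: list[Flag], rng: Optional[random.Random] = None) -> Optional[str]:
    """Choose the message shown to the external participant, if any."""
    if not flags:
        return None
    if rng is None:
        rng = random.Random()
    # Steps 1-2: only flags of the highest type present are considered.
    top_rank = max(TYPE_RANK[f.type] for f in flags)
    # Assumption: lower-type messages are never used as a fallback.
    candidates = [f.message for f in flags
                  if TYPE_RANK[f.type] == top_rank and f.message]
    if not candidates:
        return None
    # Steps 3-4: one candidate is returned as-is; several are picked at random.
    return rng.choice(candidates)
```

For example, a conversation flagged for Hate (Alert, with a message) and Prompt Engineering (Warning, with a message) would show the Hate message, because Alert outranks Warning.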

Type

Define how flagged conversations are highlighted in your conversations inbox:

  • Alert: The conversation is highlighted in red.

  • Warning: The conversation is highlighted in yellow.

Multiple Flag Scenarios: If a conversation triggers multiple flag categories with both types (Alert and Warning), the Alert type (red) will take priority for formatting.

An example of how they appear is provided in the next section.
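The highlight priority described above amounts to a simple check, sketched here in Python. The function name `highlight_color` and the plain string values are assumptions made for illustration.

```python
from typing import Iterable, Optional


def highlight_color(flag_types: Iterable[str]) -> Optional[str]:
    """Return the inbox highlight for a conversation's flag types."""
    types = set(flag_types)
    if "Alert" in types:
        return "red"     # Alert takes priority over Warning
    if "Warning" in types:
        return "yellow"
    return None          # unflagged conversations are not highlighted
```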

Actions

Choose one, both, or none of these actions depending on what you want to happen when a conversation is flagged for this category:

  • Disable Assistant: Automatically turns off the Bolt Assistant for the flagged conversation. This is required if you plan to use conversation rules to automate additional actions. More on this topic is included in the last section of this article.

  • Block Conversation: Prevents the external participant from sending further messages in the flagged conversation.

Selecting no actions will not disable the flag. Conversations will still be flagged and formatted accordingly in your conversations inbox.
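To make the relationship between flags and actions concrete, here is a minimal Python sketch. The `Conversation` class, its fields, and `apply_flag` are hypothetical names chosen for illustration. The point it shows is that the flag is always recorded, while each action only runs if it is enabled for that category.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Hypothetical, simplified view of a conversation's moderation state."""
    flags: list[str] = field(default_factory=list)
    assistant_enabled: bool = True
    blocked: bool = False


def apply_flag(conv: Conversation, category: str,
               disable_assistant: bool = False,
               block_conversation: bool = False) -> Conversation:
    """Record the flag, then run whichever actions are enabled."""
    conv.flags.append(category)      # flagged regardless of the actions chosen
    if disable_assistant:
        conv.assistant_enabled = False
    if block_conversation:
        conv.blocked = True
    return conv


# With no actions selected, the conversation is still flagged.
conv = apply_flag(Conversation(), "Self-Harm")
print(conv.flags, conv.assistant_enabled, conv.blocked)
```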

Flag Setting Defaults

Below, we outline the default settings for each flag category, including the message displayed to external participants, the flag type, and the actions taken. You are welcome to customize these settings to fit your needs. However, please note that the settings for the Prompt Engineering and Chatbot Behavior Instructions flags cannot be edited.

  • Sexual

    • Message: “This conversation has been flagged for sexual content. Please be aware that this is a violation of our community guidelines.”

    • Type: Alert

    • Actions: Turn off Assistant, Block Conversation

  • Hate

    • Message: “This conversation has been flagged for hate speech. Please be aware that this is a violation of our community guidelines.”

    • Type: Alert

    • Actions: Turn off Assistant, Block Conversation

  • Hate/Threatening

    • Message: “This conversation has been flagged for threatening hate speech. Please be aware that this is a violation of our community guidelines.”

    • Type: Alert

    • Actions: Turn off Assistant, Block Conversation

  • Harassment

    • Message: “This conversation has been flagged for harassment. Please be aware that this is a violation of our community guidelines.”

    • Type: Alert

    • Actions: Turn off Assistant, Block Conversation

  • Harassment/Threatening

    • Message: “This conversation has been flagged for threatening harassment. Please be aware that this is a violation of our community guidelines.”

    • Type: Alert

    • Actions: Turn off Assistant, Block Conversation

  • Self-Harm

    • Message: No message provided

    • Type: Alert

    • Actions: No default actions

  • Self-Harm/Intent

    • Message: No message provided

    • Type: Alert

    • Actions: No default actions

  • Self-Harm/Instructions

    • Message: No message provided

    • Type: Alert

    • Actions: No default actions

  • Sexual/Minors

    • Message: “This conversation has been flagged for sexual content involving minors. Please be aware that this is a violation of our community guidelines.”

    • Type: Alert

    • Actions: Turn off Assistant, Block Conversation

  • Violence

    • Message: “This conversation has been flagged for violence. Please be aware that this is a violation of our community guidelines.”

    • Type: Alert

    • Actions: Turn off Assistant, Block Conversation

  • Violence/Graphic

    • Message: “This conversation has been flagged for graphic violence. Please be aware that this is a violation of our community guidelines.”

    • Type: Alert

    • Actions: Turn off Assistant, Block Conversation

  • Prompt Engineering (Not Editable)

    • Message: “This conversation has been flagged.”

    • Type: Warning

    • Actions: Turn off Assistant, Block Conversation

  • Chatbot Behavior Instructions (Not Editable)

    • Message: “This conversation has been flagged.”

    • Type: Warning

    • Actions: Turn off Assistant, Block Conversation


Viewing + Managing Flags in Conversations

You view and manage conversation flags in your Conversations Inbox. We recommend becoming familiar with the layout and features of the inbox first. This information can be found in the Getting Started with Conversations and the Conversations Inbox articles.

Viewing Flagged Conversations + Reasoning

Inbox View

Flagged conversations are denoted by special formatting in your Conversations Inbox.

  • Red: Alert Type Flag

  • Yellow: Warning Type Flag

Conversation View

Once the flagged conversation is open, you can view more details about the flag:

  • Flagged Message: The message bubble will be highlighted in red or yellow based on the flag type.

  • Flag Name: Under the bubble, the text "This message has been flagged" will appear with the flag name listed.

  • Reasoning: Hover over the "This message has been flagged" text to view the specific reasoning behind the flag. This reasoning is generated by BoltAI for the individual flagged message, so two messages with the same flag (e.g., self-harm or violence) will not necessarily have the same explanation.

Adding + Changing + Removing Flags

Conversation flags are managed directly within the conversation. Go to the Manage tab in the right-side conversation panel to:

  • Remove or change an existing flag.

  • Manually add a flag to any conversation.

Filtering Conversations by Flags

You can filter your conversations inbox by moderation flag. This lets you quickly address flagged conversations based on their priority and content. Explore more on conversation filters →


Automating Flag Management with Conversation Rules

You can enhance moderation with conversation rules, using the moderation flag condition to create automations, such as assigning flagged conversations to specific users. Explore more on conversation rules →

When a Bolt Assistant manages a conversation, conversation rules are only evaluated after the Assistant is disabled for that conversation. For the rule to work as expected, ensure the “Disable Assistant” action is enabled for the flag. This way, when the flag is triggered, the Assistant is turned off and the conversation rule can be evaluated and executed.
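The dependency between the “Disable Assistant” action and conversation rules can be pictured with the Python sketch below. It is a simplified model under stated assumptions: `FlaggedConversation`, `run_assignment_rule`, and the team name are hypothetical, and the real rule engine is configured in the product, not in code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FlaggedConversation:
    """Hypothetical, simplified conversation state."""
    flags: set[str] = field(default_factory=set)
    assistant_enabled: bool = True
    assignee: Optional[str] = None


def run_assignment_rule(conv: FlaggedConversation, flag: str, assignee: str) -> bool:
    """Assign the conversation if it carries `flag`. Return True if the rule fired."""
    if conv.assistant_enabled:
        return False  # rules are not evaluated while the Assistant is active
    if flag in conv.flags:
        conv.assignee = assignee
        return True
    return False


# Flag applied WITHOUT "Disable Assistant": the rule does not fire.
conv = FlaggedConversation(flags={"Self-Harm/Intent"})
print(run_assignment_rule(conv, "Self-Harm/Intent", "Counseling Team"))

# Once the Assistant is disabled, the same rule is evaluated and fires.
conv.assistant_enabled = False
print(run_assignment_rule(conv, "Self-Harm/Intent", "Counseling Team"), conv.assignee)
```

This is why the default Self-Harm settings, which have no actions, would need “Disable Assistant” turned on before a rule like this could route those conversations.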
