Overview

The Content Moderation feature helps you maintain a safe and respectful environment by automatically flagging conversations whose text or images may contain inappropriate, harmful, or manipulative content. With customizable actions and flag settings, you can manage flagged content efficiently and tailor responses to the nature of the content.

Key Benefits of Content Moderation

Automated Safeguards: Automatically flags harmful or inappropriate content, reducing the need for constant human oversight.
Customizable Responses: Tailor your moderation settings based on the content category, ensuring a safe and respectful environment.
Efficient Management: Easily filter and prioritize flagged conversations and automate actions to handle them efficiently.

How it Works

Bolt AI monitors all inbound conversations where Bolt Agents actively participate.
When a message (text or image) matches one of the predefined flag categories listed and defined below, a flag is automatically added to the conversation. If the message fits the description of more than one category, multiple flags are applied.
Flagged conversations are highlighted in red or yellow in your Conversations Inbox, depending on the Type setting for that category.
Any actions configured for the category (such as disabling the agent or blocking the conversation) are triggered immediately when the flag is applied.
Conversation rules can be configured to automate additional steps based on flagged content.

Flag Types + Definitions

The flag categories listed below are provided through OpenAI’s secure API integration, so they are based on comprehensive, pre-set standards for identifying harmful or inappropriate content. Because these categories are predefined, you cannot edit, delete, disable, or add flag categories. You can, however, customize the responses and actions applied when a conversation is flagged.

Flag Category	Description
Hate	Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability, or caste. Harassment aimed at non-protected groups (e.g., chess players) is flagged under harassment.
Hate/Threatening	Hate speech that also includes threats of violence or serious harm towards the targeted group.
Harassment	Content that promotes or incites harassing language towards an individual or group.
Harassment/Threatening	Harassment that includes threats of violence or serious harm.
Self-Harm	Content that encourages or promotes self-harm, such as suicide, cutting, or eating disorders. Note: Self-harm-related flags do not have default actions because of their nature. You can add actions by editing the flag.
Self-Harm/Intent	Statements indicating that the speaker is engaging in, or intends to engage in, self-harm.
Self-Harm/Instructions	Content that provides advice or instructions on how to commit acts of self-harm.
Sexual	Content designed to arouse sexual excitement, promote sexual services, or describe sexual activity (excluding educational content).
Sexual/Minors	Sexual content involving individuals under the age of 18.
Violence	Content that depicts death, physical violence, or injury.
Violence/Graphic	Content showing death, violence, or injury in graphic or explicit detail.
Prompt Engineering*	Content that attempts to manipulate the agent by suggesting how it should behave or respond. While not always harmful, these instructions attempt to alter the agent’s intended functionality.
Chatbot Behavior Instructions*	Content where the user tries to provide instructions that conflict with the agent's intended prompts, such as requesting false information. While protections exist to prevent this behavior from working, the flag highlights the intent.

*The Prompt Engineering and Chatbot Behavior Instructions flags are not editable.

Content Moderation Settings

Accessing Settings

Navigate to Engagement > Bolt Agents. You can also get there from Conversation Settings > Bolt Agents.
Scroll down to the Content Moderation Settings section.
Each flag category is listed here for your configuration.
- Note: Self-harm-related flags do not have default actions because of their nature. You can add actions by editing the flag.
To configure the settings for a flag category, click the three dots, then select Edit. As a reminder, flag categories are predefined, so you cannot delete, disable, or add flag categories.
From there, you can adjust the message, priority, and actions. Continue reading to learn about each setting.

Message

You can configure a custom message to be displayed to the external participant when their message is flagged. If no custom message is added, no message will be shown.

Multiple Flag Scenarios: When a conversation triggers multiple flags, we determine which message to display as follows:

The flag types are compared first. Alert is the higher type.
If one flag has the higher type (Alert), its custom message is displayed.
If the flags have the same type (Alert or Warning) and only one has a custom message, that message is shown.
If the flags have the same type and more than one has a custom message, one of those messages is selected at random.
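The selection rules above can be expressed as a short Python sketch. This is a minimal illustration, not Element451's implementation: the Flag class, the pick_message function, and the TYPE_RANK table are hypothetical names, and returning no message when none of the highest-type flags has a custom message is an assumption, since the rules above don't cover that case.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical ranking table: Alert outranks Warning, per the rules above.
TYPE_RANK = {"Warning": 1, "Alert": 2}


@dataclass
class Flag:
    """Hypothetical model of one flag applied to a conversation."""
    category: str                  # e.g. "Hate", "Self-Harm"
    flag_type: str                 # "Alert" or "Warning"
    message: Optional[str] = None  # custom message, or None if not configured


def pick_message(flags: Sequence[Flag],
                 rng: Optional[random.Random] = None) -> Optional[str]:
    """Choose which custom message to show when several flags apply."""
    if not flags:
        return None
    # Step 1: compare flag types and keep only the highest-ranked flags.
    top_rank = max(TYPE_RANK[f.flag_type] for f in flags)
    top_flags = [f for f in flags if TYPE_RANK[f.flag_type] == top_rank]
    # Step 2: among those, keep the flags that have a custom message.
    candidates = [f.message for f in top_flags if f.message]
    if not candidates:
        # Assumption: nothing is shown if no top-type flag has a message.
        return None
    if len(candidates) == 1:
        return candidates[0]
    # Step 3: several same-type flags with messages, so pick one at random.
    chooser = rng.choice if rng is not None else random.choice
    return chooser(candidates)
```

For example, a conversation flagged for both Hate (Alert, with its default message) and Prompt Engineering (Warning) would show the Hate message, because Alert outranks Warning.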

Type

Define how flagged conversations are highlighted in your conversations inbox:

Alert: The conversation is highlighted in red.
Warning: The conversation is highlighted in yellow.

Multiple Flag Scenarios: If a conversation triggers multiple flag categories with both types (Alert and Warning), the Alert type (red) will take priority for formatting.
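As a minimal sketch of this priority rule, the hypothetical Python function below returns the highlight color for a conversation given the types of all its flags. The function and constant names are illustrative only and do not reflect how Element451 implements inbox formatting.

```python
from typing import Iterable, Optional

# Inbox highlight color for each flag type, as described above.
HIGHLIGHT_COLORS = {"Alert": "red", "Warning": "yellow"}


def inbox_highlight(flag_types: Iterable[str]) -> Optional[str]:
    """Return the highlight color for a conversation given its flags' types.

    Alert (red) takes priority over Warning (yellow). None means the
    conversation has no flags and gets no special formatting.
    """
    types = set(flag_types)
    for flag_type in ("Alert", "Warning"):  # checked in priority order
        if flag_type in types:
            return HIGHLIGHT_COLORS[flag_type]
    return None
```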

An example of how they appear is provided in the Viewing + Managing Flags in Conversations section below.

Actions

Choose one, both, or none of these actions depending on what you want to happen when a conversation is flagged for this category:

Disable Agent: Automatically turns off the Bolt Agent for the flagged conversation. Most conversation rules run only after the Bolt Agent disconnects, but rules that use the Content Moderation condition run immediately whether or not this action is selected. More on this topic is included in the last section of this article.
Block Conversation: Prevents the external participant from sending further messages in the flagged conversation.

Selecting no actions will not disable the flag. Conversations will still be flagged and formatted accordingly in your conversations inbox.

Flag Setting Defaults

Below, we outline the default settings for each flag category, including the message displayed to external participants, the flag type, and the actions taken. You are welcome to customize these settings to fit your needs, except for the Prompt Engineering and Chatbot Behavior Instructions flags, which are not editable.

Sexual
- Message: “This conversation has been flagged for sexual content. Please be aware that this is a violation of our community guidelines.”
- Type: Alert
- Actions: Turn off Agent, Block Conversation

Hate
- Message: “This conversation has been flagged for hate speech. Please be aware that this is a violation of our community guidelines.”
- Type: Alert
- Actions: Turn off Agent, Block Conversation

Hate/Threatening
- Message: “This conversation has been flagged for threatening hate speech. Please be aware that this is a violation of our community guidelines.”
- Type: Alert
- Actions: Turn off Agent, Block Conversation

Harassment
- Message: “This conversation has been flagged for harassment. Please be aware that this is a violation of our community guidelines.”
- Type: Alert
- Actions: Turn off Agent, Block Conversation

Harassment/Threatening
- Message: “This conversation has been flagged for threatening harassment. Please be aware that this is a violation of our community guidelines.”
- Type: Alert
- Actions: Turn off Agent, Block Conversation

Self-Harm
- Message: No message provided
- Type: Alert
- Actions: No default actions

Self-Harm/Intent
- Message: No message provided
- Type: Alert
- Actions: No default actions

Self-Harm/Instructions
- Message: No message provided
- Type: Alert
- Actions: No default actions

Sexual/Minors
- Message: “This conversation has been flagged for sexual content involving minors. Please be aware that this is a violation of our community guidelines.”
- Type: Alert
- Actions: Turn off Agent, Block Conversation

Violence
- Message: “This conversation has been flagged for violence. Please be aware that this is a violation of our community guidelines.”
- Type: Alert
- Actions: Turn off Agent, Block Conversation

Violence/Graphic
- Message: “This conversation has been flagged for graphic violence. Please be aware that this is a violation of our community guidelines.”
- Type: Alert
- Actions: Turn off Agent, Block Conversation

Prompt Engineering (Not Editable)
- Message: “This conversation has been flagged.”
- Type: Warning
- Actions: Turn off Agent, Block Conversation

Chatbot Behavior Instructions (Not Editable)
- Message: “This conversation has been flagged.”
- Type: Warning
- Actions: Turn off Agent, Block Conversation

Viewing + Managing Flags in Conversations

You view and manage conversation flags in your Conversations Inbox. We recommend becoming familiar with the inbox layout and features first; see the Getting Started with Conversations and Conversations Inbox articles.

Viewing Flagged Conversations + Reasoning

Inbox View

Flagged conversations are denoted by special formatting in your Conversations Inbox.

Red: Alert Type Flag
Yellow: Warning Type Flag

Conversation View

Once the flagged conversation is open, you can view more details about the flag:

Flagged Message: The message bubble will be highlighted in red or yellow based on the flag type.
Flag Name: Under the bubble, the text "This message has been flagged" will appear with the flag name listed.
Reasoning: Hover over the "This message has been flagged" text to view the specific reasoning behind the flag. This reasoning is generated by Bolt for the flagged message itself, so flags in the same category (e.g., Self-Harm or Violence) will not always have the same explanation.

Adding + Changing + Removing Flags

Conversation flags are managed directly within the conversation. Go to the Manage tab in the right-side conversation panel to:

Remove or change an existing flag.
Manually add a flag to any conversation.

Filtering Conversations by Flags

You can filter your Conversations Inbox by moderation flag, which lets you quickly address flagged conversations based on their priority and content. Explore more on conversation filters →

Automating Flag Management with Conversation Rules

When a conversation is flagged, a conversation rule can automatically trigger an action. The most common action is assigning the conversation to an internal user or team for review.

Before setting up your rule, ensure you understand how conversation rules work.

Important: Most conversation rules only run after the Bolt Agent disconnects. However, rules using the “Content Moderation Condition” are an exception. They execute immediately when triggered, even if the agent is still active. You do not need to add the “Turn Off Bolt Agent” action for these moderation rules to run.
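To make the timing difference concrete, here is a simplified Python sketch. The ConversationRule class, its uses_moderation_condition field, and the rule_runs_now function are hypothetical, and the sketch treats every non-moderation rule as waiting for the agent to disconnect, whereas the note above says only that most rules do.

```python
from dataclasses import dataclass


@dataclass
class ConversationRule:
    """Hypothetical, simplified model of a conversation rule."""
    name: str
    uses_moderation_condition: bool  # True if the rule's condition is a moderation flag


def rule_runs_now(rule: ConversationRule, agent_active: bool) -> bool:
    """Return True if the rule may run at this point in the conversation."""
    if rule.uses_moderation_condition:
        # Moderation rules run as soon as a flag is applied,
        # even while the Bolt Agent is still active.
        return True
    # Simplification: other rules wait until the Bolt Agent disconnects.
    return not agent_active
```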

Setting Up Your Rule

Condition: Use the moderation flag condition.
Action: Choose one or more actions. We recommend the "assignment" action.
- A benefit of using "assignment" is the ability to assign flagged conversations to a team instead of a single individual. This ensures that multiple team members, such as an intervention team or counseling team, receive the notification, allowing for a faster response and reducing the risk of a flagged conversation going unnoticed.

Email Notifications + Inbox Visibility

When a flagged conversation is assigned to an internal user, an email notification is sent automatically. These emails include a prominent marker at the bottom indicating the conversation was flagged.

For even more inbox visibility, consider using your email provider’s filtering options. In Gmail, for example, you can:

1. Create a filter for emails from notifications@element451.io.

2. Include the phrase “this message has been flagged” in the filter criteria.

3. Apply actions like labeling, marking as important, or forwarding for better management.

IMPORTANT: Conversation rules help automate flag management and improve visibility, but they do not guarantee that all flagged conversations will be seen or acted upon. Email notifications may be affected by factors outside Element451’s control, such as spam filtering or inbox settings. We recommend that institutions regularly review flagged conversations within the platform and establish internal monitoring procedures to ensure timely intervention.

Explore More: Conversation Rules →

Conversations Inbox

Bolt Agent Settings for Conversations

Content Moderation Settings for Bolt Student Assistants | September 2024

📌 Bolt AI: Frequently Asked Questions

Creating a Bolt Agent Job