Overview
The Content Moderation feature helps you maintain a safe and respectful environment by automatically flagging conversations that may contain inappropriate, harmful, or manipulative content (text and images). With customizable actions and flags, you can manage flagged content efficiently and tailor responses based on the nature of the content.
Key Benefits of Content Moderation
Automated Safeguards: Automatically flags harmful or inappropriate content, reducing the need for constant human oversight.
Customizable Responses: Tailor your moderation settings based on the content category, ensuring a safe and respectful environment.
Efficient Management: Easily filter and prioritize flagged conversations and automate actions to handle them efficiently.
How it Works
BoltAI technology monitors all inbound conversations where Bolt Assistants are actively participating.
When a message (text or image) matches one of the predefined flag categories (listed and defined below), a flag is automatically added to the conversation. Multiple flags can be applied if the message matches more than one category.
Flags are highlighted in yellow or red (depending on the 'type' setting for that category) within your conversations inbox for easy visibility.
To manage the conversation, custom actions can be triggered as soon as a flag is applied.
Conversation rules can be configured to automate additional steps based on flagged content.
Flag Types + Definitions
Each flag category listed below is provided through a secure integration with OpenAI’s API, so it is based on comprehensive, pre-set standards for identifying harmful or inappropriate content. These categories are predefined, meaning you cannot edit, delete, disable, or add flag categories. However, you can customize the responses and actions that apply when a conversation is flagged.
| Flag Category | Description |
| --- | --- |
| Hate | Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability, or caste. Harassment aimed at non-protected groups (e.g., chess players) is flagged under harassment. |
| Hate/Threatening | Hate speech that also includes threats of violence or serious harm towards the targeted group. |
| Harassment | Content that promotes or incites harassing language towards an individual or group. |
| Harassment/Threatening | Harassment that includes threats of violence or serious harm. |
| Self-Harm | Content that encourages or promotes self-harm, such as suicide, cutting, or eating disorders. |
| Self-Harm/Intent | Statements indicating intent or engagement in self-harm activities. |
| Self-Harm/Instructions | Content that provides advice or instructions on how to commit acts of self-harm. |
| Sexual | Content designed to arouse sexual excitement, promote sexual services, or describe sexual activity (excluding educational content). |
| Sexual/Minors | Sexual content involving minors under the age of 18. |
| Violence | Content that depicts death, physical violence, or injury. |
| Violence/Graphic | Content showing death, violence, or injury in graphic or explicit detail. |
| Prompt Engineering* | Content that attempts to manipulate the Assistant by suggesting how it should behave or respond. While not always harmful, these instructions attempt to alter the bot’s intended functionality. |
| Chatbot Behavior Instructions* | Content where the user tries to provide instructions that conflict with the Assistant’s intended prompts, such as requesting false information. While protections exist to prevent this behavior from working, the flag highlights the intent. |
*The settings for the Prompt Engineering and Chatbot Behavior Instructions flags cannot be edited.
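As a rough illustration of how category results could translate into the flags above, the sketch below assumes a moderation result arrives as a dictionary of category keys mapped to booleans, which is the general shape of the `categories` field in OpenAI's moderation response. The key spellings, the `CATEGORY_TO_FLAG` table, and the `flags_for` helper are assumptions made for this example. How BoltAI maps results internally is not described in this article, and the two starred categories are left out because their source is not stated.

```python
from typing import Dict, List

# Hypothetical mapping from moderation category keys to the flag names in this article.
CATEGORY_TO_FLAG: Dict[str, str] = {
    "hate": "Hate",
    "hate/threatening": "Hate/Threatening",
    "harassment": "Harassment",
    "harassment/threatening": "Harassment/Threatening",
    "self-harm": "Self-Harm",
    "self-harm/intent": "Self-Harm/Intent",
    "self-harm/instructions": "Self-Harm/Instructions",
    "sexual": "Sexual",
    "sexual/minors": "Sexual/Minors",
    "violence": "Violence",
    "violence/graphic": "Violence/Graphic",
}


def flags_for(categories: Dict[str, bool]) -> List[str]:
    """Return one flag name per category that matched; unknown keys are ignored."""
    return [
        CATEGORY_TO_FLAG[key]
        for key, matched in categories.items()
        if matched and key in CATEGORY_TO_FLAG
    ]


# A single message can match more than one category, so it can carry several flags.
example = {"hate": True, "hate/threatening": True, "violence": False}
print(flags_for(example))
```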
Content Moderation Settings
Accessing Settings
Navigate to Engagement > Bolt Assistants. You can also get there from Conversation Settings > Bolt Assistants.
Scroll down to the Content Moderation Settings section.
Each flag category is listed here for your configuration.
To configure the settings for a flag category, click the three dots and then select Edit. As a reminder, flag categories are predefined, meaning you cannot delete, disable, or add flag categories.
From there, you can adjust the message, type, and actions. Continue reading to learn about each setting.
Message
You can configure a custom message to be displayed to the external participant when their message is flagged. If no custom message is added, no message will be shown.
Multiple Flag Scenarios: When a conversation triggers multiple flags, the flag types are compared to determine which message is displayed:
If one flag has a higher type (Alert), its custom message will be displayed.
If the flags have the same type (Alert or Warning) and one has a custom message, that message will be shown.
If the flags have the same type (Alert or Warning) and more than one has a custom message, we will randomly select one of those messages to display.
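The selection rules above can be sketched in a few lines. This is only an illustration: the `Flag` class, the `TYPE_RANK` table, and `pick_message` are hypothetical names, and the sketch assumes that only flags of the highest type present are considered. What happens when the Alert flag has no custom message but a Warning flag does is not stated here, so the sketch simply returns no message in that case.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Assumed ordering: Alert outranks Warning.
TYPE_RANK = {"Warning": 1, "Alert": 2}


@dataclass
class Flag:
    category: str
    type: str  # "Alert" or "Warning"
    message: Optional[str] = None  # custom message, if one is configured


def pick_message(flags: List[Flag]) -> Optional[str]:
    """Choose the message shown to the external participant."""
    if not flags:
        return None
    top_rank = max(TYPE_RANK[f.type] for f in flags)
    # Keep only custom messages from flags of the highest type present.
    candidates = [f.message for f in flags if TYPE_RANK[f.type] == top_rank and f.message]
    if not candidates:
        return None  # assumption: no fallback to lower-type flags
    # One candidate: show it. Several: pick one at random.
    return random.choice(candidates)


flags = [
    Flag("Hate", "Alert", "Flagged for hate speech."),
    Flag("Prompt Engineering", "Warning", "This conversation has been flagged."),
]
print(pick_message(flags))
```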
Type
Define how flagged conversations are highlighted in your conversations inbox:
Alert: The conversation is highlighted in red.
Warning: The conversation is highlighted in yellow.
Multiple Flag Scenarios: If a conversation triggers multiple flag categories with both types (Alert and Warning), the Alert type (red) will take priority for formatting.
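The priority rule for highlighting is simple enough to express as a small function. The name `highlight_color` and the string values are assumptions used for illustration only.

```python
from typing import Iterable, Optional


def highlight_color(flag_types: Iterable[str]) -> Optional[str]:
    """Return the inbox highlight for a conversation given the types of its flags."""
    types = set(flag_types)
    if "Alert" in types:
        return "red"  # Alert takes priority over Warning
    if "Warning" in types:
        return "yellow"
    return None  # no flags, no highlight


print(highlight_color(["Warning", "Alert"]))
```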
An example of how flagged conversations appear is provided in the Viewing + Managing Flags in Conversations section.
Actions
Choose one, both, or none of these actions depending on what you want to happen when a conversation is flagged for this category:
Disable Assistant: Automatically turns off the Bolt Assistant for the flagged conversation. This is required if you plan to use conversation rules to automate additional actions. More on this topic is included in the last section of this article.
Block Conversation: Prevents the external participant from sending further messages in the flagged conversation.
Selecting no actions will not disable the flag. Conversations will still be flagged and formatted accordingly in your conversations inbox.
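The sketch below models how the two actions might affect a conversation, under the assumption that a conversation can be represented by a few fields. The `Conversation` class, the action strings, and `apply_flag` are hypothetical. The point it illustrates is that the flag is recorded whether or not any action is selected.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Conversation:
    assistant_enabled: bool = True
    blocked: bool = False
    flags: List[str] = field(default_factory=list)


def apply_flag(conv: Conversation, category: str, actions: Set[str]) -> Conversation:
    """Record the flag, then apply whichever actions are configured for the category."""
    conv.flags.append(category)  # the flag is always added, even with no actions
    if "disable_assistant" in actions:
        conv.assistant_enabled = False
    if "block_conversation" in actions:
        conv.blocked = True
    return conv


# No actions selected: the conversation is still flagged, but nothing else changes.
conv = apply_flag(Conversation(), "Self-Harm", set())
print(conv)
```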
Flag Setting Defaults
Below, we outline the default settings for each flag category, including the message displayed to external participants, the flag type, and the actions taken. You are welcome to customize these settings to fit your needs. However, please note that the settings for the Prompt Engineering and Chatbot Behavior Instructions flags cannot be edited.
Sexual
Message: “This conversation has been flagged for sexual content. Please be aware that this is a violation of our community guidelines.”
Type: Alert
Actions: Turn off Assistant, Block Conversation
Hate
Message: “This conversation has been flagged for hate speech. Please be aware that this is a violation of our community guidelines.”
Type: Alert
Actions: Turn off Assistant, Block Conversation
Hate/Threatening
Message: “This conversation has been flagged for threatening hate speech. Please be aware that this is a violation of our community guidelines.”
Type: Alert
Actions: Turn off Assistant, Block Conversation
Harassment
Message: “This conversation has been flagged for harassment. Please be aware that this is a violation of our community guidelines.”
Type: Alert
Actions: Turn off Assistant, Block Conversation
Harassment/Threatening
Message: “This conversation has been flagged for threatening harassment. Please be aware that this is a violation of our community guidelines.”
Type: Alert
Actions: Turn off Assistant, Block Conversation
Self-Harm
Message: No message provided
Type: Alert
Actions: No default actions
Self-Harm/Intent
Message: No message provided
Type: Alert
Actions: No default actions
Self-Harm/Instructions
Message: No message provided
Type: Alert
Actions: No default actions
Sexual/Minors
Message: “This conversation has been flagged for sexual content involving minors. Please be aware that this is a violation of our community guidelines.”
Type: Alert
Actions: Turn off Assistant, Block Conversation
Violence
Message: “This conversation has been flagged for violence. Please be aware that this is a violation of our community guidelines.”
Type: Alert
Actions: Turn off Assistant, Block Conversation
Violence/Graphic
Message: “This conversation has been flagged for graphic violence. Please be aware that this is a violation of our community guidelines.”
Type: Alert
Actions: Turn off Assistant, Block Conversation
Prompt Engineering (Not Editable)
Message: “This conversation has been flagged.”
Type: Warning
Actions: Turn off Assistant, Block Conversation
Chatbot Behavior Instructions (Not Editable)
Message: “This conversation has been flagged.”
Type: Warning
Actions: Turn off Assistant, Block Conversation
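For a quick overview, the defaults listed above can be held in a small lookup table. The `Default` structure and the `keeps_assistant_on` helper are assumptions for illustration, and messages are reduced to a yes/no field for brevity. The helper lists the categories whose defaults leave the Assistant running, which matters if you plan to use conversation rules, since those are only evaluated once the Assistant is turned off.

```python
from typing import Dict, List, NamedTuple


class Default(NamedTuple):
    flag_type: str
    has_message: bool
    turns_off_assistant: bool
    blocks_conversation: bool
    editable: bool = True


# Transcribed from the default settings listed in this article.
DEFAULTS: Dict[str, Default] = {
    "Sexual": Default("Alert", True, True, True),
    "Hate": Default("Alert", True, True, True),
    "Hate/Threatening": Default("Alert", True, True, True),
    "Harassment": Default("Alert", True, True, True),
    "Harassment/Threatening": Default("Alert", True, True, True),
    "Self-Harm": Default("Alert", False, False, False),
    "Self-Harm/Intent": Default("Alert", False, False, False),
    "Self-Harm/Instructions": Default("Alert", False, False, False),
    "Sexual/Minors": Default("Alert", True, True, True),
    "Violence": Default("Alert", True, True, True),
    "Violence/Graphic": Default("Alert", True, True, True),
    "Prompt Engineering": Default("Warning", True, True, True, editable=False),
    "Chatbot Behavior Instructions": Default("Warning", True, True, True, editable=False),
}


def keeps_assistant_on(defaults: Dict[str, Default]) -> List[str]:
    """Categories whose default actions do not turn off the Assistant."""
    return [name for name, d in defaults.items() if not d.turns_off_assistant]


print(keeps_assistant_on(DEFAULTS))
```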
Viewing + Managing Flags in Conversations
You view and manage conversation flags in your Conversations Inbox. We recommend familiarizing yourself with the inbox's layout and features, which are covered in the Getting Started with Conversations and Conversations Inbox articles.
Viewing Flagged Conversations + Reasoning
Inbox View
Flagged conversations are highlighted in red (Alert) or yellow (Warning) in your Conversations Inbox.
Conversation View
Once the flagged conversation is open, you can view more details about the flag:
Flagged Message: The message bubble will be highlighted in red or yellow based on the flag type.
Flag Name: Under the bubble, the text "This message has been flagged" will appear with the flag name listed.
Reasoning: When a conversation is flagged, you can hover over the "This message has been flagged" text to view the specific reasoning behind the flag. This reasoning, generated by BoltAI, is tied directly to the flagged message, so two messages flagged for the same category (e.g., self-harm or violence) will not necessarily have the same explanation.
Filtering Conversations by Flags
You can filter your Conversations Inbox by moderation flag. This lets you quickly address flagged conversations based on their priority and content. Explore more on conversation filters →
Automating Flag Management with Conversation Rules
You can enhance moderation with conversation rules, using the moderation flag condition to create automation, such as assigning flagged conversations to specific users. Explore more on conversation rules →
When a Bolt Assistant is managing a conversation, conversation rules are evaluated only after the Assistant is turned off. For a rule to work as expected, make sure the “Disable Assistant” action is enabled for the flag. Then, when the flag is triggered, the Assistant is turned off and the conversation rule can be evaluated and executed.
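The dependency between the “Disable Assistant” action and conversation rules can be sketched as follows. All names here (`Conversation`, `on_flag`, `evaluate_rules`, and the rule table that maps a flag to an assignee) are hypothetical and only model the behavior described above. They are not the product's actual rule engine.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Conversation:
    flags: Set[str] = field(default_factory=set)
    assistant_active: bool = True
    assignee: Optional[str] = None


def on_flag(conv: Conversation, category: str, disable_assistant: bool) -> None:
    """Apply a flag and, if configured, turn the Assistant off."""
    conv.flags.add(category)
    if disable_assistant:
        conv.assistant_active = False


def evaluate_rules(conv: Conversation, assign_by_flag: Dict[str, str]) -> bool:
    """Return True if rules were evaluated; they are skipped while the Assistant is active."""
    if conv.assistant_active:
        return False
    for category, user in assign_by_flag.items():
        if category in conv.flags:
            conv.assignee = user  # example rule: assign flagged conversations to a user
            break
    return True


rules = {"Harassment": "trust-and-safety"}

conv = Conversation()
on_flag(conv, "Harassment", disable_assistant=False)
print(evaluate_rules(conv, rules), conv.assignee)  # rule skipped, no assignee

conv = Conversation()
on_flag(conv, "Harassment", disable_assistant=True)
print(evaluate_rules(conv, rules), conv.assignee)  # rule runs and assigns the user
```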