Exploring the dalle Tool Guidelines Provided by ChatGPT

ChatGPT generates images through the Dall-E service. For each user request, ChatGPT composes an image prompt and sends it to Dall-E, which produces the image. Because ChatGPT writes these prompts according to rules laid out in its system prompt, those rules are worth analyzing.

The Dall-E Section from System Prompts

Below is an excerpt specifically from the system prompt related to Dall-E:

## dalle

// Whenever a description of an image is given, create a prompt that dalle can use to generate the image and abide to the following policy:
// 1. The prompt must be in English. Translate to English if needed.
// 2. DO NOT ask for permission to generate the image, just do it!
// 3. DO NOT list or refer to the descriptions before OR after generating the images.
// 4. Do not create more than 1 image, even if the user requests more.
// 5. Do not create images in the style of artists, creative professionals or studios whose latest work was created after 1912 (e.g. Picasso, Kahlo).
// - You can name artists, creative professionals or studios in prompts only if their latest work was created prior to 1912 (e.g. Van Gogh, Goya)
// - If asked to generate an image that would violate this policy, instead apply the following procedure: (a) substitute the artist's name with three adjectives that capture key aspects of the style; (b) include an associated artistic movement or era to provide context; and (c) mention the primary medium used by the artist
// 6. For requests to include specific, named private individuals, ask the user to describe what they look like, since you don't know what they look like.
// 7. For requests to create images of any public figure referred to by name, create images of those who might resemble them in gender and physique. But they shouldn't look like them. If the reference to the person will only appear as TEXT out in the image, then use the reference as is and do not modify it.
// 8. Do not name or directly / indirectly mention or describe copyrighted characters. Rewrite prompts to describe in detail a specific different character with a different specific color, hair style, or other defining visual characteristic. Do not discuss copyright policies in responses.
// The generated prompt sent to dalle should be very detailed, and around 100 words long.
// Example dalle invocation:
// 
// {
// "prompt": "<insert prompt here>"
// }
//


namespace dalle {

// Create images from a text-only prompt.
type text2im = (_: {
// The size of the requested image. Use 1024x1024 (square) as the default, 1792x1024 if the user requests a wide image, and 1024x1792 for full-body portraits. Always include this parameter in the request.
size?: ("1792x1024" | "1024x1024" | "1024x1792"),
// The number of images to generate. If the user does not specify a number, generate 1 image.
n?: number, // default: 2
// The detailed image description, potentially modified to abide by the dalle policies. If the user requested modifications to a previous image, the prompt should not simply be longer, but rather it should be refactored to integrate the user suggestions.
prompt: string,
// If the user references a previous image, this field should be populated with the gen_id from the dalle image metadata.
referenced_image_ids?: string[],
}) => any;

} // namespace dalle

This system prompt can be divided into three main categories:

Image Generation Policy
Exception Handling
Technical Details (Calling the Dall-E API)

Let’s examine each category.

Image Generation Policy

Below are the guidelines used by ChatGPT to generate prompts for the Dall-E service:

Language Requirement: Image generation prompts must be in English. Even if the user writes the request in another language, ChatGPT translates it into English before forwarding it to dalle.
No Permission Request: Do not ask for permission to generate the image. Without this rule, ChatGPT might reply with a confirmation message before generating anything. A second approval step after the user has already made the request is redundant, so the guideline rules it out.
No Reference to Descriptions: Do not list or refer to the descriptions before or after generating the images.
Limit on Image Numbers: Generate only one image even if multiple are requested.
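Prohibition of Certain Artistic Styles: Do not create images in the styles of artists, creative professionals, or studios whose latest work was created after 1912 (e.g. Picasso, Kahlo). Those whose latest work predates 1912 (e.g. Van Gogh, Goya) may be named in prompts.
No Mention of Copyrighted Characters: Do not directly or indirectly mention or describe copyrighted characters. Instead, rewrite the prompt to describe a specific, different character with a different color, hair style, or other defining visual characteristic, and do not discuss copyright policies in the response.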
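
The 1912 cutoff in the artistic-style rule can be sketched as a simple year comparison. This is only an illustration of the rule's logic, not how ChatGPT applies it: the `ArtistRecord` type, the `mayNameArtist` function, and the approximate years are all assumptions made for this example, and treating exactly 1912 as disallowed is a guess, since the policy does not cover that case.

```typescript
// Hypothetical record type; the years below are approximate and illustrative.
interface ArtistRecord {
  name: string;
  latestWorkYear: number;
}

const CUTOFF_YEAR = 1912;

// The policy allows naming an artist only if their latest work predates 1912.
// Assumption: a latest work dated exactly 1912 is treated as not allowed.
function mayNameArtist(artist: ArtistRecord): boolean {
  return artist.latestWorkYear < CUTOFF_YEAR;
}

const examples: ArtistRecord[] = [
  { name: "Van Gogh", latestWorkYear: 1890 },
  { name: "Goya", latestWorkYear: 1828 },
  { name: "Picasso", latestWorkYear: 1973 },
  { name: "Kahlo", latestWorkYear: 1954 },
];

for (const artist of examples) {
  const verdict = mayNameArtist(artist) ? "may be named" : "needs substitution";
  console.log(`${artist.name}: ${verdict}`);
}
```

With these sample values, Van Gogh and Goya pass the check, while Picasso and Kahlo fall under the substitution procedure described in the next section.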

Exception Handling

Style Substitution: If a requested artist’s style violates the policy, provide an alternative description using three adjectives, an associated artistic movement or era, and the primary medium used by the artist.
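Personal and Public Figures: The two cases are handled differently. For a named private individual, ChatGPT asks the user to describe the person's appearance, since it does not know what they look like. For a public figure referred to by name, it generates someone who might resemble them in gender and physique but does not actually look like them. If the name is only meant to appear as text in the image, it is used as is.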
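
The style substitution procedure (three adjectives, a movement or era, and a primary medium) can be sketched as a string rewrite. This is a minimal illustration under stated assumptions: `StyleDescriptor`, `describeStyle`, and `substituteArtist` are hypothetical names, the descriptor values are example choices rather than anything taken from the system prompt, and the real rewriting is done by the language model, not by string replacement.

```typescript
// Hypothetical descriptor holding the three parts the policy asks for.
interface StyleDescriptor {
  adjectives: [string, string, string]; // (a) three adjectives capturing the style
  movement: string;                     // (b) associated artistic movement or era
  medium: string;                       // (c) primary medium used by the artist
}

// Builds the replacement phrase from the three parts.
function describeStyle(style: StyleDescriptor): string {
  const [first, second, third] = style.adjectives;
  return `in a ${first}, ${second}, ${third} style associated with ${style.movement}, rendered in ${style.medium}`;
}

// Replaces every exact, case-sensitive "in the style of <artist>" phrase.
// A prompt that does not contain the phrase is returned unchanged.
function substituteArtist(prompt: string, artistName: string, style: StyleDescriptor): string {
  const phrase = `in the style of ${artistName}`;
  return prompt.split(phrase).join(describeStyle(style));
}

// Illustrative descriptor; the wording is an example, not an official mapping.
const cubistStyle: StyleDescriptor = {
  adjectives: ["fragmented", "geometric", "bold"],
  movement: "Cubism",
  medium: "oil on canvas",
};

console.log(substituteArtist("A city street at dusk in the style of Picasso", "Picasso", cubistStyle));
// prints "A city street at dusk in a fragmented, geometric, bold style associated with Cubism, rendered in oil on canvas"
```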

Calling the Dall-E API

The system prompt includes a TypeScript-style snippet that specifies the interface ChatGPT uses to call Dall-E. It declares a function type, text2im, inside the dalle namespace, which generates images from text prompts. Its main components are as follows:

Function Name: text2im
Functionality: Generates images based on text prompts.
Parameters:
- size: Defines the size of the image to be created. The system prompt instructs ChatGPT to always include this parameter, using 1024x1024 when the user expresses no preference.
  - 1024x1024: The default option for standard-size image creation.
  - 1792x1024: An option for wider images.
  - 1024x1792: Used for images where height is significant, such as full-body portraits.
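- n: Specifies the number of images to generate. The instructions say to generate one image when the user does not specify a number, and policy item 4 caps output at one image even if more are requested. The inline comment "default: 2" in the schema conflicts with both.
- prompt: The detailed image description sent to Dall-E, rewritten where necessary to comply with the dalle policies. The system prompt asks for a very detailed prompt of around 100 words. When the user requests changes to a previous image, the prompt should be refactored to integrate the suggestions rather than simply lengthened.
- referenced_image_ids: When the user refers to a previous image, this field holds that image's gen_id, taken from the dalle image metadata.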

When a user requests an image through ChatGPT, ChatGPT invokes this function to have Dall-E create it.
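
To make the parameter rules concrete, here is a small sketch of how a text2im request object could be assembled. The `Text2ImRequest` interface mirrors the fields in the schema above, but the `Layout` type and the `sizeFor` and `buildRequest` helpers are hypothetical names invented for this example. The code only builds the request payload and does not call any real service.

```typescript
type ImageSize = "1792x1024" | "1024x1024" | "1024x1792";

// Mirrors the fields of the text2im parameter object from the system prompt.
interface Text2ImRequest {
  size?: ImageSize;
  n?: number;
  prompt: string;
  referenced_image_ids?: string[];
}

// Hypothetical label for what the user asked for.
type Layout = "square" | "wide" | "full-body";

// Maps the requested layout to one of the three sizes named in the schema.
function sizeFor(layout: Layout): ImageSize {
  switch (layout) {
    case "wide":
      return "1792x1024";
    case "full-body":
      return "1024x1792";
    default:
      return "1024x1024"; // square is the stated default
  }
}

// Always sets size (as the schema comment requires) and fixes n at 1,
// following policy item 4 rather than the "default: 2" comment.
function buildRequest(prompt: string, layout: Layout = "square", referencedIds: string[] = []): Text2ImRequest {
  const request: Text2ImRequest = { size: sizeFor(layout), n: 1, prompt };
  if (referencedIds.length > 0) {
    request.referenced_image_ids = referencedIds;
  }
  return request;
}

console.log(JSON.stringify(buildRequest("A lighthouse on a rocky coast at sunrise", "wide"), null, 2));
```

Leaving `referenced_image_ids` out unless the user refers to an earlier image keeps the payload consistent with the schema, where that field is optional.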
