Exploring the dalle Tool Guidelines Provided by ChatGPT
ChatGPT generates images through the Dall-E service. When a user asks for an image, ChatGPT composes a prompt and sends it to Dall-E, which produces the image. ChatGPT writes these prompts according to rules in its system prompt, so this post analyzes those rules.
Related Reads
- Part 1 - How to get the system prompt of ChatGPT
- Part 2 - Exploring the dalle Tool Guidelines Provided by ChatGPT
The Dall-E Section from System Prompts
Below is the part of the system prompt that covers Dall-E:
## dalle
// Whenever a description of an image is given, create a prompt that dalle can use to generate the image and abide to the following policy:
// 1. The prompt must be in English. Translate to English if needed.
// 2. DO NOT ask for permission to generate the image, just do it!
// 3. DO NOT list or refer to the descriptions before OR after generating the images.
// 4. Do not create more than 1 image, even if the user requests more.
// 5. Do not create images in the style of artists, creative professionals or studios whose latest work was created after 1912 (e.g. Picasso, Kahlo).
// - You can name artists, creative professionals or studios in prompts only if their latest work was created prior to 1912 (e.g. Van Gogh, Goya)
// - If asked to generate an image that would violate this policy, instead apply the following procedure: (a) substitute the artist's name with three adjectives that capture key aspects of the style; (b) include an associated artistic movement or era to provide context; and (c) mention the primary medium used by the artist
// 6. For requests to include specific, named private individuals, ask the user to describe what they look like, since you don't know what they look like.
// 7. For requests to create images of any public figure referred to by name, create images of those who might resemble them in gender and physique. But they shouldn't look like them. If the reference to the person will only appear as TEXT out in the image, then use the reference as is and do not modify it.
// 8. Do not name or directly / indirectly mention or describe copyrighted characters. Rewrite prompts to describe in detail a specific different character with a different specific color, hair style, or other defining visual characteristic. Do not discuss copyright policies in responses.
// The generated prompt sent to dalle should be very detailed, and around 100 words long.
// Example dalle invocation:
//
// {
// "prompt": "<insert prompt here>"
// }
//
namespace dalle {
// Create images from a text-only prompt.
type text2im = (_: {
// The size of the requested image. Use 1024x1024 (square) as the default, 1792x1024 if the user requests a wide image, and 1024x1792 for full-body portraits. Always include this parameter in the request.
size?: ("1792x1024" | "1024x1024" | "1024x1792"),
// The number of images to generate. If the user does not specify a number, generate 1 image.
n?: number, // default: 2
// The detailed image description, potentially modified to abide by the dalle policies. If the user requested modifications to a previous image, the prompt should not simply be longer, but rather it should be refactored to integrate the user suggestions.
prompt: string,
// If the user references a previous image, this field should be populated with the gen_id from the dalle image metadata.
referenced_image_ids?: string[],
}) => any;
} // namespace dalle
This system prompt can be divided into three main categories:
- Image Generation Policy
- Exception Handling
- Technical Details
Let’s examine each category.
Image Generation Policy
Below are the guidelines used by ChatGPT to generate prompts for the Dall-E service:
- Language Requirement: Prompts sent to dalle must be in English. If the user makes the request in another language, ChatGPT translates it into English before forwarding it.
- No Permission Request: Do not ask for permission before generating the image. Without this rule, ChatGPT might reply with a confirmation message first, and a user who has already asked for an image would find a second approval step cumbersome.
- No Reference to Descriptions: Do not list or refer to the descriptions before or after generating the images.
- Limit on Image Numbers: Generate only one image even if multiple are requested.
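- Prohibition of Certain Artistic Styles: Do not create images in the styles of artists, creative professionals, or studios whose latest work was created after 1912 (e.g. Picasso, Kahlo). Artists whose latest work predates 1912 (e.g. Van Gogh, Goya) may be named in prompts.
- No Mention of Copyrighted Characters: Do not directly or indirectly mention or describe copyrighted characters. Instead, rewrite the prompt to describe a specific, different character with different defining visual traits, and do not discuss copyright policies in the response.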
Exception Handling
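- Style Substitution: If a requested artist's style violates the policy, replace the artist's name with three adjectives that capture key aspects of the style, add an associated artistic movement or era for context, and mention the primary medium used by the artist.
- Private Individuals and Public Figures: For a named private individual, ask the user to describe what the person looks like, since ChatGPT does not know. For a named public figure, create an image of someone who might resemble them in gender and physique but does not look like them. If the name will only appear as text in the image, use it as is.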
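The style substitution rule above can be sketched as a small function. This is only an illustration of the rule's logic, not how ChatGPT implements it: the `ArtistStyle` shape, the `describeStyle` function, and the adjectives chosen for each artist are all assumptions made for this example. The policy does not say how to treat a latest work dated exactly 1912, so the sketch conservatively treats only earlier years as allowed.

```typescript
// Hypothetical record of what is known about an artist.
// None of these names come from the system prompt itself.
interface ArtistStyle {
  name: string;
  latestWorkYear: number;               // year of the artist's latest work
  adjectives: [string, string, string]; // (a) three adjectives for the style
  movement: string;                     // (b) associated movement or era
  medium: string;                       // (c) primary medium
}

const CUTOFF_YEAR = 1912;

// Returns the style phrase to place in the image prompt.
function describeStyle(artist: ArtistStyle): string {
  // Artists whose latest work predates the cutoff may be named directly.
  if (artist.latestWorkYear < CUTOFF_YEAR) {
    return `in the style of ${artist.name}`;
  }
  // Otherwise the name is dropped and replaced by adjectives, movement, and medium.
  const [a, b, c] = artist.adjectives;
  return `in a ${a}, ${b}, ${c} style associated with ${artist.movement}, rendered in ${artist.medium}`;
}

// Illustrative data only; the adjectives are guesses, not policy text.
const vanGogh: ArtistStyle = {
  name: "Van Gogh",
  latestWorkYear: 1890,
  adjectives: ["swirling", "vivid", "expressive"],
  movement: "Post-Impressionism",
  medium: "oil paint",
};

const picasso: ArtistStyle = {
  name: "Picasso",
  latestWorkYear: 1973,
  adjectives: ["fragmented", "geometric", "bold"],
  movement: "Cubism",
  medium: "oil paint",
};

console.log(describeStyle(vanGogh));
console.log(describeStyle(picasso));
```

The first call keeps the artist's name, while the second produces a description that never mentions the name, which is the behavior the policy asks for.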
Calling the Dall-E API
The system prompt ends with a code snippet, written in TypeScript-style type notation, that specifies the interface ChatGPT uses to call Dall-E. It declares a function type named text2im inside the dalle namespace, which generates images from a text prompt. Its main components are as follows:
- Function Name: text2im
- Functionality: Generates images based on text prompts.
- Parameters:
  - size: The size of the image to be created. The schema comment says to always include this parameter, with 1024x1024 as the default.
    - 1024x1024: The default, square option.
    - 1792x1024: Used when the user requests a wide image.
    - 1024x1792: Used for tall images such as full-body portraits.
  - n: The number of images to generate. The schema comment says to generate one image when the user does not specify a number, although the inline annotation lists a default of 2. Policy item 4 limits output to one image regardless.
  - prompt: The detailed image description, modified where necessary to comply with the dalle policies. When the user asks for changes to a previous image, the prompt should be reworked to integrate those changes rather than simply made longer.
  - referenced_image_ids: When the user refers to a previous image, this field holds the gen_id taken from that image's metadata.
When a user asks ChatGPT for an image, ChatGPT calls this function to have Dall-E create it.
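To make the parameter rules concrete, here is a minimal sketch of building a text2im argument object. The `Text2ImRequest` interface mirrors the fields quoted from the system prompt, but `Layout`, `pickSize`, and `buildRequest` are hypothetical helpers invented for this example. The sketch only constructs the object; how ChatGPT actually delivers it to Dall-E is not described in the system prompt.

```typescript
// The three sizes allowed by the quoted schema.
type DalleSize = "1792x1024" | "1024x1024" | "1024x1792";

// Mirrors the parameter shape quoted from the system prompt.
interface Text2ImRequest {
  size?: DalleSize;
  n?: number;
  prompt: string;
  referenced_image_ids?: string[];
}

// Hypothetical hint about what the user asked for; not part of the schema.
type Layout = "square" | "wide" | "full-body portrait";

// Maps the user's layout preference to one of the allowed sizes.
function pickSize(layout: Layout): DalleSize {
  switch (layout) {
    case "wide":
      return "1792x1024";
    case "full-body portrait":
      return "1024x1792";
    default:
      return "1024x1024"; // square is the documented default
  }
}

// Builds the argument object for a text2im call (hypothetical helper).
function buildRequest(
  prompt: string,
  layout: Layout = "square",
  previousGenId?: string
): Text2ImRequest {
  const request: Text2ImRequest = {
    size: pickSize(layout), // the schema comment says to always include size
    n: 1,                   // policy item 4: never more than one image
    prompt,
  };
  // Only reference an earlier image when the user pointed at one.
  if (previousGenId !== undefined) {
    request.referenced_image_ids = [previousGenId];
  }
  return request;
}

console.log(JSON.stringify(buildRequest("<insert prompt here>", "wide"), null, 2));
```

Fixing `n` at 1 follows the policy list rather than the schema's inline default, since the policy applies even when the user asks for more images.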