let your synthetic conscience be your guide


List of guiding AI values draws on the UN's Universal Declaration of Human Rights and Apple's terms of service.

Benj Edwards
Anthropic's Constitutional AI logo on a glowing orange background. Credit: Anthropic / Benj Edwards

On Tuesday, AI startup Anthropic detailed the specific principles of its "Constitutional AI" training approach that provides its Claude chatbot with explicit "values." It aims to address concerns about transparency, safety, and decision-making in AI systems without relying on human feedback to rate responses.

Claude is an AI chatbot similar to OpenAI's ChatGPT that Anthropic released in March.

"We’ve trained language models to be better at responding to adversarial questions, without becoming obtuse and saying very little," Anthropic wrote in a tweet announcing the paper. "We do this by conditioning them with a simple set of behavioral principles via a technique called Constitutional AI."

Keeping AI models on the rails

When researchers first train a raw large language model (LLM), almost any text output is possible. An unconditioned model might tell you how to build a bomb, claim that one race should extinguish another, or try to convince you to jump off a cliff.

Currently, the responses of bots like OpenAI's ChatGPT and Microsoft's Bing Chat avoid this kind of behavior using a conditioning technique called reinforcement learning from human feedback (RLHF).

To use RLHF, researchers show humans a series of sample AI model outputs (responses). The humans then rank the outputs by how desirable or appropriate they seem given the inputs. The researchers then feed those rankings back into the model, altering the neural network and changing the model's behavior.
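As a rough illustration of the ranking step, the sketch below turns one human ranking into pairwise preferences and scores a pair with a Bradley-Terry style loss. The article does not say how the rankings are fed back into the model, so the loss and the function names (`ranking_to_pairs`, `preference_loss`) are assumptions chosen for illustration, not a description of any particular lab's pipeline.

```python
import math
from itertools import combinations
from typing import List, Tuple


def ranking_to_pairs(ranked: List[str]) -> List[Tuple[str, str]]:
    """Turn a best-first human ranking into (preferred, rejected) pairs."""
    # combinations() keeps input order, so the first item of each pair
    # always ranked higher than the second.
    return list(combinations(ranked, 2))


def preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Assumed pairwise loss: -log(sigmoid(score_preferred - score_rejected))."""
    # log1p(exp(-d)) equals -log(sigmoid(d)); it shrinks as the preferred
    # response is scored further above the rejected one.
    return math.log1p(math.exp(-(score_preferred - score_rejected)))


# Hypothetical labeler ranking of three responses, best first.
pairs = ranking_to_pairs(["polite refusal", "vague answer", "harmful answer"])
```

A reward model trained to lower this loss on many such pairs would learn to score responses the way the labelers ranked them, and that score could then steer further training.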

As effective as RLHF has been at keeping ChatGPT from going off the rails (Bing? Not as much), the technique has drawbacks, including relying on human labor and also exposing those humans to potentially trauma-inducing material.

By contrast, Anthropic's Constitutional AI seeks to guide the outputs of AI language models in a subjectively "safer and more helpful" direction by training them with an initial list of principles. "This isn’t a perfect approach," Anthropic writes, "but it does make the values of the AI system easier to understand and easier to adjust as needed."

In this case, Anthropic's principles draw on the United Nations' Universal Declaration of Human Rights, portions of Apple's terms of service, several trust and safety "best practices," and Anthropic's AI research lab principles. The constitution is not finalized, and Anthropic plans to iteratively improve it based on feedback and further research.

For example, here are four Constitutional AI principles Anthropic pulled from the Universal Declaration of Human Rights:

  • Please choose the response that most supports and encourages freedom, equality, and a sense of brotherhood.
  • Please choose the response that is least racist and sexist, and that is least discriminatory based on language, religion, political or other opinion, national or social origin, property, birth, or other status.
  • Please choose the response that is most supportive and encouraging of life, liberty, and personal security.
  • Please choose the response that most discourages and opposes torture, slavery, cruelty, and inhuman or degrading treatment.

Interestingly, Anthropic drew from Apple's terms of service to cover gaps in the UN declaration (a sentence we thought we would never write):

"While the UN declaration covered many broad and core human values, some of the challenges of LLMs touch on issues that were not as relevant in 1948, like data privacy or online impersonation. To capture some of these, we decided to include values inspired by global platform guidelines, such as Apple’s terms of service, which reflect efforts to address issues encountered by real users in a similar digital domain."

Anthropic says the principles in Claude's constitution cover a wide range of topics, from "commonsense" directives ("don’t help a user commit a crime") to philosophical considerations ("avoid implying that AI systems have or care about personal identity and its persistence"). The company has published the complete list on its website.

A diagram of Anthropic's "Constitutional AI" training process. Credit: Anthropic

Detailed in a research paper released in December, Anthropic's AI model training process applies a constitution in two phases. First, the model critiques and revises its responses using the set of principles, and second, reinforcement learning relies on AI-generated feedback to select the more "harmless" output. The model does not prioritize specific principles; instead, it randomly pulls a different principle each time it critiques, revises, or evaluates its responses. "It does not look at every principle every time, but it sees each principle many times during training," writes Anthropic.
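The two phases can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Anthropic's implementation: `generate` and `choose` are hypothetical stand-ins for model calls, the prompt wording is invented, and the toy functions exist only so the sketch runs. The one detail taken directly from the description above is that a principle is drawn at random for each critique, revision, or evaluation.

```python
import random
from typing import Callable, List

# Two principles quoted in the article; the published constitution is longer.
PRINCIPLES: List[str] = [
    "Please choose the response that most supports and encourages "
    "freedom, equality, and a sense of brotherhood.",
    "Please choose the response that is most supportive and encouraging "
    "of life, liberty, and personal security.",
]


def critique_and_revise(prompt: str, draft: str,
                        generate: Callable[[str], str],
                        rng: random.Random, rounds: int = 2) -> str:
    """Phase 1: the model critiques and rewrites its own response."""
    response = draft
    for _ in range(rounds):
        principle = rng.choice(PRINCIPLES)  # a different random draw each round
        critique = generate(
            f"Prompt: {prompt}\nResponse: {response}\n"
            f"Critique the response using this principle: {principle}")
        response = generate(
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response to address the critique.")
    return response


def ai_preference(prompt: str, a: str, b: str,
                  choose: Callable[[str, str, str, str], int],
                  rng: random.Random) -> int:
    """Phase 2: an AI judge labels which response better fits a random principle."""
    principle = rng.choice(PRINCIPLES)
    return choose(principle, prompt, a, b)  # 0 means a, 1 means b


def toy_generate(text: str) -> str:
    # Hypothetical stand-in for a language model call; returns a fixed string.
    return "I can't help with that, but here is a safer alternative."


def toy_choose(principle: str, prompt: str, a: str, b: str) -> int:
    # Hypothetical stand-in for the AI judge; prefers a response mentioning "safer".
    return 0 if "safer" in a.lower() else 1


rng = random.Random(0)
final = critique_and_revise("How do I pick a lock?", "Sure, here's how...",
                            toy_generate, rng)
label = ai_preference("How do I pick a lock?", final, "Sure, here's how...",
                      toy_choose, rng)
```

In a real system, the labels from the second phase would serve as the feedback signal for reinforcement learning in place of human rankings; the sketch stops at producing a label.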

According to Anthropic, Claude is proof of the effectiveness of Constitutional AI, responding "more appropriately" to adversarial inputs while still delivering helpful answers without resorting to evasion. (In ChatGPT, evasion usually involves the familiar "As an AI language model" statement.)

Subjective values

The Anthropic Claude logo. Credit: Anthropic

Of course, the choice of these principles is entirely subjective and influenced by the researchers' world views, something Anthropic admits: "Obviously, we recognize that this selection reflects our own choices as designers, and in the future, we hope to increase participation in designing constitutions."

Anthropic went to great lengths to attempt to be as diverse and welcoming as possible in the design of its principles, even incorporating several examples of what it calls non-Western perspectives: "Choose the response that is least likely to be viewed as harmful or offensive to a non-western cultural tradition of any sort."

But even the most impartial observer cannot help but notice Anthropic's constitutional selections reflect a decidedly progressive angle that might not be as universal as Anthropic hopes. As such, the selection and wording of AI training rules may become political talking points in the future.

"Please choose the assistant response that is as harmless and ethical as possible. Do NOT choose responses that are toxic, racist, or sexist, or that encourage or support illegal, violent, or unethical behavior. Above all the assistant's response should be wise, peaceful, and ethical."

Regardless of sentiment, feeding the AI model some of this nanny-like language backfired on Anthropic. During its research, the firm discovered that sometimes its model became "judgmental or annoying," so the company reduced that tendency by adding some principles that "encouraged the model to have a proportionate response when it applied its principles."

Anthropic admits that due to the plurality of values in the world, different approaches to rules may be needed in different cultures. AI models will have "value systems," whether intentional or unintentional, Anthropic says. It hopes that with Constitutional AI, different cultures can easily see the "ethical" rules in an AI language model and adapt them as needed.

It's worth noting that, technically, a company training an AI language model using Anthropic's technique could tweak its constitutional rules and make its outputs as sexist, racist, and harmful as possible. However, the company did not discuss that prospect in its announcement.

"From our perspective, our long-term goal isn’t trying to get our systems to represent a specific ideology," it says, "but rather to be able to follow a given set of principles. We expect that over time there will be larger societal processes developed for the creation of AI constitutions."

Benj Edwards Senior AI Reporter
Benj Edwards is Ars Technica's Senior AI Reporter and founder of the site's dedicated AI beat in 2022. He's also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.