
ChemBench: A Dataset for Evaluating the Chemistry Capabilities of Large Language Models

Published 2024-5-28 10:25

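ChemBench is a chemistry evaluation dataset for language models. The AI for Science team at Shanghai Artificial Intelligence Laboratory built it in-house to provide a comprehensive assessment of what large models can do in chemistry.

The team collected material from public internet resources and used it to design more than 4,100 multiple-choice questions, each with exactly one correct answer. The questions cover nine chemistry tasks: text-based molecule generation, name conversion, property prediction, temperature prediction, molecule description, yield prediction, solvent prediction, retrosynthesis, and product prediction.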

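Introduction to the ChemBench evaluation tasks

Large language models have developed rapidly, and a series of domain-specific (vertical) models has followed, including large models for chemistry. However, evaluating a large model's chemistry capabilities comprehensively remains a difficult problem.

Current evaluations of chemistry LLMs have the following problems: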

1. Many existing chemistry benchmarks evaluate only a single, narrowly defined chemistry task, or were designed for specialized domain models. They are not suitable for testing large language models.

2. Most existing benchmarks for chemistry LLMs use a question-answering format scored with metrics such as BLEU or ROUGE. These metrics depend heavily on a model's output style and are ill-suited to judging scientific correctness. A model whose answer is better phrased but contains factual errors can therefore receive a higher score.

To address these problems, the chemistry LLM team at Shanghai AI Lab proposed ChemBench. It is built from multiple-choice questions and evaluates the chemistry capabilities of large language models.
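The toy comparison below illustrates the scoring problem from point 2. It uses a simple token-overlap F1 as a stand-in for BLEU/ROUGE-style metrics, and it is not an implementation of either metric. The sentences are adapted from ChemBench's Mol2Caption sample. Under overlap scoring, a fluent answer that names the wrong compound class beats a terse correct one. Multiple-choice scoring only rewards the correct letter.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1: a simplified stand-in for BLEU/ROUGE-style metrics."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Count tokens shared by both strings (multiset intersection).
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def choice_score(predicted: str, gold: str) -> float:
    """Multiple-choice scoring: full credit only for the correct letter."""
    return 1.0 if predicted.strip().upper() == gold.strip().upper() else 0.0


reference = "The molecule is a benzofuran derivative."
fluent_but_wrong = "The molecule is a flavonoid derivative."
terse_but_right = "Benzofuran derivative."

print(round(unigram_f1(fluent_but_wrong, reference), 3))  # 0.833
print(round(unigram_f1(terse_but_right, reference), 3))   # 0.5
print(choice_score("D", "B"), choice_score("B", "B"))     # 0.0 1.0
```

The factually wrong sentence shares five of its six tokens with the reference, so it outscores the correct but shorter answer. The letter comparison is indifferent to phrasing.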

The figure below shows the evaluation tasks and the number of questions in each task:

[Figure: ChemBench evaluation tasks and the number of questions per task]



Name Conversion: converting between a molecule's IUPAC chemical name and its SMILES string. This tests whether the model recognizes different representations of the same molecule.

Property Prediction: predicting various useful chemical properties of a molecule.

Mol2Caption: molecule description, i.e. generating diverse textual descriptions of a given molecule.

Caption2mol: text-based molecule generation. Given a user's description of a molecule, the model predicts the corresponding molecular structure.

Product Prediction: predicting the products of a chemical reaction.

Yield Prediction: predicting the yield of a given chemical reaction.

Retrosynthesis: retrosynthetic analysis, i.e. predicting a synthetic route for a target molecule.

Solvent Prediction: predicting the solvent required for a chemical reaction.

Temperature Prediction: predicting the temperature conditions required for a given chemical reaction.

The team also used ChatGPT to build the incorrect options for every question, with a dedicated prompt designed for each task. This makes the distractors plausible enough to be genuinely confusing, so the options remain hard to tell apart.
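The sketch below shows one way the packing step could work after the distractors have been generated. It is written in Python and assumes the distractors already exist. `build_mcq_record` is a hypothetical helper and is not part of ChemBench. The option texts are borrowed from the Mol2Caption sample question. The function shuffles the correct answer in among the distractors and records which letter it lands on. This matches the samples, where the correct answer is not always option A.

```python
import random
import string
from typing import Dict, List


def build_mcq_record(question: str, correct: str, distractors: List[str],
                     rng: random.Random) -> Dict[str, str]:
    """Pack one correct answer and its distractors into a lettered record.

    Hypothetical helper: the distractors are assumed to have been generated
    beforehand (for example by prompting ChatGPT with a task-specific
    template). This function only shuffles the options and records which
    letter ends up holding the correct answer.
    """
    if correct in distractors:
        raise ValueError("a distractor duplicates the correct answer")
    options = [correct, *distractors]
    rng.shuffle(options)  # randomize the position of the correct answer
    letters = string.ascii_uppercase[: len(options)]
    record = {"question": question, "answer": letters[options.index(correct)]}
    record.update(zip(letters, options))  # "A" -> first option, and so on
    return record


rng = random.Random(42)  # fixed seed so the shuffle is reproducible
record = build_mcq_record(
    "Describe this molecule.\nO=C(NCc1ccco1)c1cc2ccccc2o1",
    "The molecule is a benzofuran derivative.",
    [
        "The molecule is a member of steroids.",
        "The molecule is a member of carboxylic acids.",
        "The molecule is a member of flavonoids.",
    ],
    rng,
)
print(record["answer"], "->", record[record["answer"]])
```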

Sample questions

The following are sample questions from several ChemBench tasks.

Name Conversion task sample:

```json
{
  "question": "\nHow would you express this IUPAC name in SMILES format? CC1(C2=C(N=C1C=CC=C3C(C4=C(N3CCCS(=O)(=O)O)N=CC=C4)(C)C)[N+](=CC=C2)CCCCCC(=O)O)C",
  "answer": "D",
  "D": "6-[2-[3-[3,3-dimethyl-1-(3-sulfopropyl)pyrrolo[2,3-b]pyridin-2-ylidene]prop-1-enyl]-3,3-dimethyl-pyrrolo[2,3-b]pyridin-7-ium-7-yl]hexanoic acid",
  "A": "6-[2-[3-[3,3-dimethyl-1-(3-phosphonopropyl)pyrrolo[2,3-c]pyridin-2-ylidene]ethylidene]-3,3-dimethyl-pyrrolo[2,3-a]pyridin-7-ium-7-yl]hexanoic acid",
  "B": "6-[2-[3-[1-(3-carboxypropyl)-3,3-dimethylindolizin-2-ylidene]prop-1-enyl]-3,3-dimethyl-1H-pyrrolo[3,2-b]pyridin-7-yl]hexanoic acid",
  "C": "6-[2-[3-[3,3-dimethyl-1-(3-sulfopropyl)pyridin-2(1H)-one]-prop-1-enyl]-3,3-dimethyl-pyrrolo[2,3-b]pyridin-7-ium-7-yl]hexanoic acid"
}
```

Retrosynthesis task sample:

```json
{
  "question": "Which ingredients are commonly selected for creating Cc1oc(-c2ccccc2)nc1COc1ccc2cc(CC3SC(=O)NC3=O)cnc2c1 ?\n",
  "answer": "A",
  "A": "Chemicals employed in the creation of Cc1oc(-c2ccccc2)nc1COc1ccc2cc(CC3SC(=O)NC3=O)cnc2c1  can be chosen from CCO and Cc1oc(-c2ccccc2)nc1COc1ccc2cc(CC3SC(=N)NC3=O)cnc2c1. There's a chance that reactions will emerge, with Cc1oc(-c2ccccc2)nc1COc1ccc2cc(CC3SC(=N)NC3=O)cnc2c1.CCO>Cl>Cc1oc(-c2ccccc2)nc1COc1ccc2cc(CC3SC(=O)NC3=O)cnc2c1. potentially representing the reaction equations.",
  "C": "The possibility of reactions exists, and CCOC(=O)c1c(C(F)(F)F)cc(-c2ccc(OC(F)(F)F)cc2)nc1CC1CC1.[H].[H][Al+3].[Li+].[H].[H].>>redients are commonly selected for creating Cc1oc(-c2ccccc2)nc1COc1ccc2cc(CC3SC(=O)NC3=O)cnc2c1. could portray the reaction equations. Chemicals used in the formulation of redients are commonly selected for creating Cc1oc(-c2ccccc2)nc1COc1ccc2cc(CC3SC(=O)NC3=O)cnc2c1  can be chosen from CCOC(=O)c1c(C(F)(F)F)cc(-c2ccc(OC(F)(F)F)cc2)nc1CC1CC1.",
  "B": "It's possible for reactions to manifest, with CC(F)(F)c1cc(B2OC(C)(C)C(C)(C)O2)ccc1Cl.Cc1nccn1Cc1cc(Cl)cnn1>>redients are commonly selected for creating Cc1oc(-c2ccccc2)nc1COc1ccc2cc(CC3SC(=O)NC3=O)cnc2c1. potentially representing reaction equations. Materials used in the composition of Cc1nccn1Cc1cc(-c2ccc(Cl)c(C(C)(F)F)c2)cnn1 and Cl  can be selected from CC(F)(F)c1cc(B2OC(C)(C)C(C)(C)O2)ccc1Cl and Cc1nccn1Cc1cc(Cl)cnn1.",
  "D": "Materials used for manufacturing redients are commonly selected for creating Cc1oc(-c2ccccc2)nc1COc1ccc2cc(CC3SC(=O)NC3=O)cnc2c1  can be chosen from COc1cccc2c1ccc1c(C(=O)O)cc3c(c12)OCO3. Reactions could potentially emerge, with COc1cccc2c1ccc1c(C(=O)O)cc3c(c12)OCO3.Cl.c1ccncc1>Cl>redients are commonly selected for creating Cc1oc(-c2ccccc2)nc1COc1ccc2cc(CC3SC(=O)NC3=O)cnc2c1. possibly serving as indicators of reaction equations."
}
```
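Option A, the correct answer, writes its reaction as reaction SMILES. In this notation, `>` separates reactants, agents, and products, and `.` separates individual components. The sketch below is a minimal parser for that convention. `parse_reaction_smiles` is a hypothetical helper, and it does not validate the SMILES strings themselves.

```python
from typing import List, NamedTuple


class Reaction(NamedTuple):
    reactants: List[str]
    agents: List[str]
    products: List[str]


def parse_reaction_smiles(rxn: str) -> Reaction:
    """Split reaction SMILES 'reactants>agents>products' into its three parts.

    Components inside each part are separated by '.', so a salt written as
    '[Li+].[Cl-]' comes back as two separate ions. Empty strings (from an
    empty agents field or a trailing '.') are dropped.
    """
    parts = rxn.strip().split(">")
    if len(parts) != 3:
        raise ValueError(f"expected 2 '>' separators, found {len(parts) - 1}")
    reactants, agents, products = (
        [smiles for smiles in part.split(".") if smiles] for part in parts
    )
    return Reaction(reactants, agents, products)


# Reaction string taken from option A of the Retrosynthesis sample.
rxn = ("Cc1oc(-c2ccccc2)nc1COc1ccc2cc(CC3SC(=N)NC3=O)cnc2c1.CCO"
       ">Cl>"
       "Cc1oc(-c2ccccc2)nc1COc1ccc2cc(CC3SC(=O)NC3=O)cnc2c1")
reaction = parse_reaction_smiles(rxn)
print(reaction.agents)  # ['Cl']
```

In this example the agent is `Cl`, which is HCl in SMILES. The only difference between the first reactant and the product is that `SC(=N)` becomes `SC(=O)` in the thiazolidine ring.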

Mol2Caption task sample:

```json
{
  "question": "Describe this molecule.\nO=C(NCc1ccco1)c1cc2ccccc2o1",
  "answer": "B",
  "B": "The molecule is a benzofuran derivative.",
  "A": "The molecule is a member of steroids.",
  "C": "The molecule is a member of carboxylic acids.",
  "D": "The molecule is a member of flavonoids."
}
```
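To run these questions against a model, each record has to be turned into a prompt, and the model's free-form reply has to be mapped back to a letter. The sketch below assumes records shaped like the samples above. `format_prompt` and `extract_choice` are hypothetical helpers. The regular-expression heuristic is an illustrative choice, not the answer-extraction logic used by ChemBench or OpenCompass. The prompt lists options in A to D order, because the keys in the samples appear in a different order.

```python
import re
from typing import Dict, Optional


def format_prompt(record: Dict[str, str]) -> str:
    """Render a ChemBench-style record as a prompt, listing options A-D in order."""
    lines = [record["question"].strip(), ""]
    lines += [f"{letter}. {record[letter]}" for letter in "ABCD"]
    lines += ["", "Answer with a single letter: A, B, C, or D."]
    return "\n".join(lines)


def extract_choice(reply: str) -> Optional[str]:
    """Heuristically pull the chosen letter out of a free-form model reply."""
    text = reply.strip()
    # Explicit statements such as "Answer: B" or "The answer is (C)".
    match = re.search(r"[Aa]nswer(?: is|:)?\s*\(?([ABCD])\b", text)
    if match:
        return match.group(1)
    # Replies that start with the letter, such as "D" or "B. The molecule ...".
    match = re.match(r"\(?([ABCD])\b", text)
    return match.group(1) if match else None


record = {
    "question": "Describe this molecule.\nO=C(NCc1ccco1)c1cc2ccccc2o1",
    "answer": "B",
    "B": "The molecule is a benzofuran derivative.",
    "A": "The molecule is a member of steroids.",
    "C": "The molecule is a member of carboxylic acids.",
    "D": "The molecule is a member of flavonoids.",
}
print(format_prompt(record))
# A question scores 1 when the extracted letter equals the gold answer.
score = int(extract_choice("The answer is B.") == record["answer"])
```

Each question has exactly one correct letter, so scoring reduces to an equality check. A reply the heuristic cannot parse returns `None` and counts as wrong.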


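Moreover, the chemistry performance measured on ChemBench rises from 7B open-source models to GPT-3.5 to GPT-4. This ordering matches how users experience these models in practice, and it further supports the validity and objectivity of the ChemBench evaluation.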
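Comparisons like this depend on scoring each model's answers task by task. The sketch below is a minimal version that assumes results arrive as hypothetical `(task, predicted_letter, gold_letter)` tuples. The toy data is made up and is not a ChemBench result.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_task(results: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Per-task accuracy plus a micro-averaged overall score.

    Each item is (task, predicted_letter, gold_letter); this input format
    is an assumption made for the sketch.
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for task, predicted, gold in results:
        total[task] += 1
        correct[task] += int(predicted == gold)
    scores = {task: correct[task] / total[task] for task in total}
    n = sum(total.values())
    # Micro average: every question counts equally, so larger tasks weigh more.
    scores["overall"] = sum(correct.values()) / n if n else 0.0
    return scores


# Toy predictions for illustration only; these are not ChemBench results.
toy = [
    ("Retrosynthesis", "A", "A"),
    ("Retrosynthesis", "C", "A"),
    ("Mol2Caption", "B", "B"),
    ("Mol2Caption", "B", "B"),
]
print(accuracy_by_task(toy))
# {'Retrosynthesis': 0.5, 'Mol2Caption': 1.0, 'overall': 0.75}
```

A macro average, the mean of the per-task accuracies, would weigh all nine tasks equally regardless of how many questions each one has. The text here does not say which aggregation ChemBench reports.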


This article is reposted from the 司南 (OpenCompass) evaluation system; author: 司南 OpenCompass.
