你听过这些AI术语并跟着点头,现在我们来搞清楚它们到底是什么。

内容总结:
人工智能术语速查手册:帮你读懂AI世界的“黑话”
人工智能正以前所未有的速度改变世界,同时也催生了一套全新的语言体系来描述自身的发展。当您花五分钟阅读AI相关内容时,很可能会遇到LLM(大语言模型)、RAG(检索增强生成)、RLHF(基于人类反馈的强化学习)等十几个专业术语,即便是科技领域的资深人士有时也会感到困惑。以下术语表旨在为您厘清这些概念,并将随技术演进定期更新,如同一份“活文档”。
AGI(通用人工智能)
一个含义模糊的术语,通常指在多数任务上能力超越普通人类的AI。OpenAI CEO山姆·奥特曼将其描述为“可以雇来当同事的普通人类水平”,而OpenAI章程则定义为“在大多数有经济价值的工作上超越人类的高度自主系统”。谷歌DeepMind的理解略有不同,认为它是“在大多数认知任务上至少与人类能力相当的AI”。专家们对此也众说纷纭,无需过分纠结。
AI Agent(AI智能体)
指能够代表用户自主执行一系列复杂任务的工具,远超基础聊天机器人的能力,例如报销费用、预订机票餐厅、甚至编写和维护代码。这一新兴领域定义尚不统一,基础设施也在建设中,但其核心概念是一个可调用多个AI系统来完成多步骤任务的自主系统。
API端点(API接口)
可以理解为软件后台的“按钮”,其他程序可以“按下”它来调用软件功能。开发者利用这些接口构建集成,例如让一个应用从另一个应用拉取数据,或让AI Agent无需人工手动操作即可直接控制第三方服务。大多数智能家居设备都具备此类隐藏接口。
思维链(Chain of Thought)
类似于人类解复杂数学题时需要“打草稿”分步计算。对大语言模型而言,思维链推理意味着将问题分解为若干中间步骤,从而提升最终答案的质量。虽然耗时更长,但在逻辑推理和代码编写中准确率更高。推理模型正是在传统大语言模型基础上通过强化学习优化而来的。
编码智能体
AI智能体的细分领域,专门用于软件开发。它不仅能建议代码供人审查,还能自主编写、测试和调试代码,处理那些通常占据开发者大量时间的迭代试错工作。它能在整个代码库中发现问题、运行测试并推送修复,如同一位“永不睡觉、永不分心”的实习生——但最终仍需人工把关。
算力(Compute)
指支撑AI模型运行的至关重要的计算能力。这一术语常引申为提供计算能力的硬件,如GPU(图形处理器)、CPU(中央处理器)、TPU(张量处理器)等,它们是现代AI产业的基石。
深度学习(Deep Learning)
机器学习的一个子集,采用多层人工神经网络结构,能比简单线性模型做出更复杂的关联分析。其结构灵感来源于人脑神经元相互连接的路径。深度学习模型能自主识别数据中的重要特征,并通过试错和调整不断提升输出。但需要海量数据(数百万以上),训练周期长,成本较高。
扩散模型(Diffusion)
众多AI艺术、音乐和文本生成模型的核心技术。受物理学启发,扩散系统通过不断添加“噪声”逐渐“破坏”数据结构,然后AI通过学习“逆向扩散”过程,从噪声中恢复出原始数据。
蒸馏(Distillation)
一种“教师-学生”模型的知识萃取技术。开发者向“教师”大模型发送请求并记录输出,然后用这些输出来训练“学生”小模型,使其模仿教师的行为。蒸馏可用于创建更小、更高效的模型,估计OpenAI的GPT-4 Turbo就是通过蒸馏开发的。从竞争对手处蒸馏通常违反API服务条款。
微调(Fine-tuning)
在已有AI模型基础上,通过输入新的、特定领域的数据进行进一步训练,以优化其在特定任务上的表现。许多AI初创公司以此为基础,结合自身领域知识对通用大模型进行微调,以打造商业产品。
GAN(生成对抗网络)
一种机器学习框架,由一对神经网络(生成器与判别器)相互博弈:生成器试图生成逼真数据骗过判别器,判别器则努力识别伪造数据。这种对抗能自动优化AI输出的真实性,常用于深度伪造等应用,但更适用于照片、视频等相对狭窄的领域。
幻觉(Hallucination)
AI行业的术语,指模型凭空捏造信息,产生不正确的输出。这是AI质量的重大隐患,可能导致误导性结果甚至现实风险,例如医疗查询给出有害建议。问题源于训练数据存在空白,因此行业正推动开发更专业的垂直领域AI来降低风险。
推理(Inference)
指运行AI模型的过程,即让模型根据已学数据进行预测或得出结论。没有训练就没有推理。不同硬件运行推理的速度天差地别,大模型在普通笔记本上运行将极其缓慢。
大语言模型(LLM)
驱动ChatGPT、Claude、Gemini等AI助手的核心模型。LLM是由数十亿参数构成的深度神经网络,能从海量书籍、文章和对话记录中学习词语关系,形成语言的多维映射。用户输入提示词后,模型生成最符合语境的回应。
内存缓存(Memory Cache)
一种提高推理效率的优化技术。通过保存特定计算结果供后续查询复用,减少重复计算,从而降低功耗、加快响应速度。常见的有KV缓存,可在Transformer模型中提升效率。
神经网络(Neural Network)
支撑深度学习乃至整个生成式AI热潮的多层算法结构。其设计灵感源于人脑神经元网络,但直到近年GPU的普及才真正释放其潜力,使AI在语音识别、自动驾驶、药物研发等领域取得飞跃。
开源(Open Source)
指软件或AI模型的底层代码公开,供任何人使用、检查和修改。Meta的Llama系列是AI领域的典型代表,与之相对的是闭源(如OpenAI的GPT系列)。开源能加速技术进步和独立安全审计,是AI行业的核心争议之一。
并行化(Parallelization)
同时执行多项任务而非顺序执行。现代GPU专为大规模并行计算设计,是AI产业的硬件基石。随着模型日益庞大,跨芯片、跨机器的并行能力成为决定模型构建速度与成本的关键因素。
内存危机(RAMageddon)
描述AI行业引发的RAM芯片短缺潮。顶级科技公司和AI实验室为数据中心大量采购RAM,导致游戏机、智能手机、企业服务器等行业面临芯片涨价缺货。这种供应瓶颈短期内难以缓解。
强化学习(Reinforcement Learning)
一种通过试错和奖励机制训练AI的方法,类似用零食训练宠物。与基于固定数据集的监督学习不同,强化学习让模型自主探索环境、根据反馈持续调整行为。目前在游戏、机器人控制及提升大模型推理能力方面表现出色。
Token(词元)
人机通信的基本单元。AI通过分词技术将原始文本切分成模型能处理的小段数据(常为词的一部分)。在企业应用中,Token也是计费单位——AI公司按Token数量收费,使用越多费用越高。
Token吞吐量
衡量AI系统单位时间内能处理多少工作量的指标。高Token吞吐量意味着模型能同时服务更多用户且响应更快。AI专家卡帕西形容,看着AI订阅闲置不用的感觉,就像当年读研时昂贵的硬件设备没被充分利用一样焦虑。
训练(Training)
机器学习AI的开发过程,即向模型输入大量数据,使其从模式中学习并生成有效输出。训练成本高昂,因为需要海量数据且呈上升趋势。采用微调等混合方法可在不从头开始的情况下控制成本。
迁移学习(Transfer Learning)
利用已训练好的AI模型作为起点,开发用于不同但相关任务的新模型,实现知识复用。这能节省开发成本,尤其在目标任务数据有限时。但需注意其局限性,模型仍需额外数据才能在特定领域表现良好。
权重(Weights)
AI训练的核心参数,决定训练数据中不同特征的重要程度。模型开始时随机分配权重,训练过程中不断调整,使输出更接近目标。例如房价预测模型中,卧室数量、是否有车库等特征的权重反映了它们对房价的影响程度。
验证损失(Validation Loss)
衡量AI模型学习效果的数值,越低越好。研究人员通过它实时判断何时停止训练、调整参数或排查问题。它主要用于发现“过拟合”——即模型只是死记硬背了训练数据,而非真正理解规律。好比死记硬背去年考题的学生,与真正掌握知识的学生之间的区别。
中文翻译:
人工智能正在改变世界,同时也在发明一套全新的语言来描述它是如何做到的。只需花五分钟阅读关于AI的文章,你就会遇到LLM、RAG、RLHF以及其他十几个术语,这些术语甚至能让科技领域非常聪明的人感到不安。本词汇表正是我们试图解决这一问题的尝试。随着该领域的发展,我们会定期更新,因此可以将其视为一份活的文档,就像它所描述的AI系统一样。
AGI
通用人工智能,或称AGI,是一个模糊的术语。但它通常指的是在多数(甚至大多数)任务上能力超过普通人类的AI。OpenAI首席执行官萨姆·奥尔特曼曾将AGI描述为“相当于你可以雇佣为同事的普通人类”。与此同时,OpenAI的章程将AGI定义为“在大多数具有经济价值的工作上超越人类的高度自主系统”。谷歌DeepMind的理解与这两个定义略有不同;该实验室将AGI视为“在大多数认知任务上至少与人类能力相当的AI”。感到困惑吗?别担心——就连AI研究前沿的专家们也感到困惑。
AI Agent
AI Agent指的是一种利用AI技术代表你执行一系列任务的工具——其能力超越更基础的AI聊天机器人——例如处理费用报销、预订机票或餐厅座位,甚至编写和维护代码。然而,正如我们之前解释过的,这个新兴领域有很多不断变化的组成部分,因此“AI Agent”对不同的人可能意味着不同的事物。支撑其预期能力的基础设施也仍在建设之中。但基本概念意味着一个自主系统,它可能利用多个AI系统来执行多步骤任务。
API端点
你可以将API端点视为软件背面的一些“按钮”,其他程序可以“按下”这些按钮来让该软件执行操作。开发者使用这些接口来构建集成——例如,允许一个应用程序从另一个应用程序拉取数据,或者使AI Agent能够直接控制第三方服务,而无需人工手动操作每个接口。大多数智能家居设备和互联平台都有这些隐藏按钮可用,即使普通用户从未见过或与它们交互过。随着AI Agent的能力越来越强,它们越来越能够自主发现并使用这些端点,从而为自动化开启强大且有时意想不到的可能性。
思维链
面对一个简单问题,人脑可以不假思索地回答——比如“哪种动物更高,长颈鹿还是猫?”但在许多情况下,你需要纸笔才能得出正确答案,因为其中涉及中间步骤。例如,如果一个农民有鸡和牛,它们总共有40个头和120条腿,你可能需要写出一个简单的方程才能得出答案(20只鸡和20头牛)。
在AI语境中,大语言模型的思维链推理意味着将一个问题分解成更小的、中间的步骤,以提高最终结果的质量。获得答案通常需要更长的时间,但答案更可能正确,尤其是在逻辑或编码场景中。推理模型是从传统大语言模型发展而来的,并通过强化学习针对思维链思维进行了优化。
(参见:大语言模型)
编码Agent
这是一个比“AI Agent”更具体的概念,指的是一个能够自主、逐步采取行动以完成目标的程序。编码Agent是应用于软件开发的专门版本。它不仅仅建议代码以供人类审查和粘贴,还可以自主编写、测试和调试代码,处理那些通常会占用开发者一天时间的迭代式试错工作。这些Agent能够跨越整个代码库运行,发现错误、运行测试并在最少的人工监督下推送修复。可以把它想象成雇佣了一个速度极快、从不睡觉且从不分心的实习生——不过,和任何实习生一样,仍然需要人类来审查其工作。
算力
虽然算力是一个多义术语,但它通常指的是使AI模型能够运行的关键计算能力。这种类型的处理过程为AI行业提供动力,使其能够训练和部署其强大的模型。该术语常常是提供计算能力的硬件类型的简称——比如GPU、CPU、TPU以及其他构成现代AI行业基石的基础设施形式。
深度学习
这是一种自我改进的机器学习子集,其中AI算法采用多层人工神经网络结构设计。这使得它们能够比简单的基于机器学习的系统(如线性模型或决策树)建立更复杂的关联。深度学习算法的结构借鉴了人脑中神经元相互连接的路径。
深度学习AI模型能够自行识别数据中的重要特征,而无需人类工程师来定义这些特征。这种结构还支持算法能够从错误中学习,并通过重复和调整的过程来改进自身的输出。然而,深度学习系统需要大量的数据点(数百万或更多)才能产生良好的结果。它们通常也比更简单的机器学习算法需要更长的训练时间——因此开发成本往往更高。
(参见:神经网络)
扩散
扩散是许多艺术、音乐和文本生成AI模型的核心技术。受物理学启发,扩散系统通过添加噪声,直到数据(例如照片、歌曲等)的结构被“破坏”殆尽,直至消失。在物理学中,扩散是自发且不可逆的——扩散到咖啡中的糖无法恢复成方块形状。但AI中的扩散系统旨在学习一种“反向扩散”过程来恢复被破坏的数据,从而获得从噪声中恢复数据的能力。
蒸馏
蒸馏是一种使用“教师-学生”模型从大型AI模型中提取知识的技术。开发人员向教师模型发送请求并记录输出。有时会将答案与数据集进行比较以检验其准确性。然后使用这些输出来训练学生模型,该模型被训练成近似教师模型的行为。
蒸馏可用于基于大型模型创建一个更小、更高效的模型,同时保持最小的蒸馏损失。这很可能就是OpenAI开发GPT-4 Turbo(GPT-4的更快版本)的方式。
虽然所有AI公司都在内部使用蒸馏,但一些AI公司也可能用它来追赶前沿模型。从竞争对手处进行蒸馏通常违反AI API和聊天助手的服务条款。
微调
这指的是对AI模型进行进一步训练,以优化其在某些更具体任务或领域上的表现,而这些任务或领域之前并非其训练的重点——通常是通过输入新的、专门化(即面向特定任务)的数据。
许多AI初创公司以大语言模型为起点来构建商业产品,但它们正通过基于自身领域知识和专长进行微调来补充早期的训练周期,以争夺在目标领域或任务上的实用性提升。
(参见:大语言模型 [LLM])
GAN
GAN,或称生成对抗网络,是一种机器学习框架,是生成式AI在生成逼真数据(包括但不限于深度伪造工具)方面取得一些重要进展的基础。GAN使用一对神经网络,其中一个网络利用其训练数据生成输出,然后传递给另一个模型进行评估。
这两个模型本质上被编程为试图超越对方。生成器试图使其输出通过鉴别器,而鉴别器则致力于识别人工生成的数据。这种结构化竞赛可以优化AI输出,使其更加逼真,而无需额外的人工干预。不过,GAN更适用于较窄的应用场景(例如生成逼真的照片或视频),而非通用型AI。
幻觉
幻觉是AI行业惯用的术语,用来描述AI模型凭空编造信息——即生成不正确的信息。显然,这对AI质量来说是一个巨大的问题。
幻觉会产生可能具有误导性的生成式AI输出,甚至可能导致现实生活中的风险——并带来潜在的危险后果(想象一下,一个健康查询返回了有害的医疗建议)。
AI编造信息的问题被认为源于训练数据的空白。幻觉正推动着向日益专业化和/或垂直化AI模型(即需要更窄领域知识的特定领域AI)的趋势发展,以此作为减少知识空白可能性并降低虚假信息风险的一种方式。
推理
推理是运行AI模型的过程。它是让模型基于之前见过的数据进行预测或得出结论。需要明确的是,没有训练就无法进行推理;模型必须先学习一组数据中的模式,然后才能有效地从这些训练数据中进行推断。
多种硬件都可以执行推理,从智能手机处理器到强大的GPU,再到定制设计的AI加速器。但并非所有硬件都能同样出色地运行模型。非常大的模型在笔记本电脑上做预测可能需要很长时间,而在配备高端AI芯片的云服务器上则快得多。
[参见:训练]
大语言模型(LLM)
大语言模型,或称LLM,是诸如ChatGPT、Claude、谷歌的Gemini、Meta的AI Llama、微软Copilot或Mistral的Le Chat等流行AI助手所使用的AI模型。当你与AI助手聊天时,你实际上是在与大语言模型交互,该模型直接处理你的请求,或借助各种可用工具(如网页浏览或代码解释器)进行处理。
LLM是深度神经网络,由数十亿个数值参数(或称权重,见下文)构成,这些参数学习单词和短语之间的关系,并创建语言的表征,一种多维的单词地图。
这些模型是通过编码它们在数十亿本书籍、文章和转录文本中发现的模式来创建的。当你向LLM发出提示时,模型会生成最符合该提示的模式。
(参见:神经网络)
内存缓存
内存缓存指的是提升推理(即AI运作并生成用户查询响应的过程)效率的一个重要过程。本质上,缓存是一种优化技术,旨在使推理更高效。AI显然是由高强度的数学计算驱动的,每次进行计算都会消耗更多能量。缓存旨在通过为未来的用户查询和操作保存特定的计算结果,来减少模型可能需要运行的计算次数。内存缓存有不同的类型,其中一种比较知名的是KV(键值)缓存。KV缓存适用于基于Transformer的模型,通过减少生成用户问题答案所需的时间(和算法工作量)来提高效率,从而更快地得出结果。
(参见:推理)
神经网络
神经网络指的是支撑深度学习——以及更广泛地说,支撑大语言模型出现后整个生成式AI工具热潮——的多层算法结构。
尽管从人脑密集互连的路径中汲取灵感作为数据处理算法设计结构的想法可以追溯到20世纪40年代,但真正释放这一理论力量的是通过视频游戏行业兴起的图形处理硬件(GPU)。事实证明,这些芯片非常适合训练具有比早期时代更多层数的算法——使得基于神经网络的AI系统能够在许多领域实现远胜于此前的性能,包括语音识别、自主导航和药物发现。
(参见:大语言模型 [LLM])
开源
开源指的是软件(或者,越来越常见的是AI模型)的底层代码公开发布,供任何人使用、检查或修改。在AI领域,Meta的Llama系列模型是一个突出的例子;Linux是操作系统领域中著名的历史类比。开源方法允许世界各地的研究人员、开发者和公司在彼此的工作基础上进行构建,加速进步,并实现封闭系统难以轻易提供的独立安全审计。闭源意味着代码是私有的——你可以使用产品,但看不到其运作方式,例如OpenAI的GPT模型——这一区别已成为AI行业中最具定义性的辩论之一。
并行化
并行化意味着同时做许多事情,而不是一件接一件地做——就像让10名员工同时处理一个项目的不同部分,而不是让一名员工依次完成所有工作。在AI领域,并行化对于训练和推理都至关重要:现代GPU专门设计用于并行执行数千次计算,这是它们成为该行业硬件骨干的重要原因之一。随着AI系统变得越来越复杂,模型变得越来越大,跨多个芯片和多台机器并行化工作的能力已成为决定模型构建和部署速度及成本效益的最重要因素之一。对更好并行化策略的研究现在本身已成为一个研究领域。
内存危机
内存危机是一个有趣的新术语,用来描述席卷科技行业的一个不那么有趣的趋势:随机存取存储器芯片的日益短缺,而正是这些芯片为我们日常生活中使用的几乎所有科技产品提供动力。随着AI行业的蓬勃发展,最大的科技公司和AI实验室——都在争相拥有最强大、最高效的AI——正在购买大量内存来为其数据中心供电,以至于留给其他人的内存所剩无几。而这种供应瓶颈意味着剩下的内存越来越昂贵。
这会影响到包括游戏行业(大型公司不得不提高游戏机价格,因为为其设备寻找内存芯片更加困难)、消费电子行业(内存短缺可能导致智能手机出货量出现十多年来的最大跌幅)以及一般企业计算领域(因为这些公司无法为自己的数据中心获得足够的内存)在内的诸多行业。这种价格飙升预计只有在可怕的短缺结束后才会停止,但不幸的是,目前几乎没有迹象表明这种情况会很快发生。
强化学习
强化学习是一种训练AI的方式,系统通过尝试并因正确答案而获得奖励来学习——就像用零食训练你心爱的宠物一样,只不过这里的“宠物”是神经网络,“零食”是指示成功的数学信号。与监督学习(模型在固定的标记示例数据集上训练)不同,强化学习允许模型探索其环境、采取行动,并根据收到的反馈持续更新其行为。这种方法已被证明对于训练AI玩游戏、控制机器人以及最近增强大语言模型的推理能力特别强大。诸如基于人类反馈的强化学习(RLHF)等技术,现已成为领先AI实验室微调其模型以使其更有用、更准确和更安全的核心方法。
Token
在人机通信方面,存在一些明显的挑战——人们使用人类语言交流,而AI程序通过复杂的、由数据驱动的算法过程来执行任务。Token弥合了这一差距:它们是人与AI通信的基本构建块,代表由LLM处理或生成的离散数据段。它们通过一个称为分词化的过程创建,该过程将原始文本分解成语言模型可以消化的小单元,类似于编译器将人类语言翻译成计算机可以理解的二进制代码。在企业环境中,Token也决定了成本——大多数AI公司按Token数量收取LLM使用费,这意味着企业使用得越多,支付的费用就越高。
Token吞吐量
再次说明,Token是AI语言模型在处理语言之前将其分解成的小文本块——通常是单词的一部分而非整个单词;为了理解AI工作负载,它们大致类似于“单词”。吞吐量指的是在给定时间段内可以处理多少数据,因此Token吞吐量本质上衡量的是系统一次可以处理多少AI工作。高Token吞吐量是AI基础设施团队的一个关键目标,因为它决定了模型可以同时服务多少用户以及每个用户收到响应的速度。AI研究员Andrej Karpathy曾描述过当他的AI订阅闲置时感到焦虑——这反映了他作为研究生时,当昂贵的计算机硬件未被充分利用时的感受——这种情绪揭示了为什么最大化Token吞吐量已成为该领域近乎痴迷的目标。
训练
开发机器学习AI涉及一个称为训练的过程。简单来说,这指的是为了模型能够从模式中学习并生成有用输出而向其输入数据。本质上,这是系统对数据特征做出响应的过程,使其能够根据所需目标调整输出——无论是识别猫的图像还是按需创作俳句。
训练成本可能很高,因为它需要大量的输入,并且所需的量呈上升趋势——这就是为什么混合方法(例如使用针对性数据微调基于规则的AI)有助于在不从头开始的情况下管理成本。
[参见:推理]
迁移学习
这是一种技术,利用之前训练好的AI模型作为起点,为另一项不同但通常相关的任务开发新模型——从而允许重新应用先前训练周期中获得的知识。
迁移学习可以通过缩短模型开发周期来提高效率。当为模型开发目标任务的数据有限时,它也会非常有用。但需要注意的是,这种方法有其局限性。依赖于迁移学习来获得通用能力的模型,可能需要在额外的数据上进行训练,才能在其专注领域内有良好表现。
(参见:微调)
权重
权重是AI训练的核心,因为它们决定了在训练系统的数据中,不同特征(或输入变量)被赋予多大的重要性(或权重)——从而塑造AI模型的输出。
换句话说,权重是定义在给定训练任务中数据集中哪些特征最突出的数值参数。它们通过对输入进行乘法运算来实现其功能。模型训练通常从随机分配的权重开始,但随着过程的展开,权重会随着模型试图产生更接近目标的输出而进行调整。
例如,一个用于预测房价的AI模型,在目标位置的历史房地产数据上进行训练,可能包括对卧室和浴室数量、房产是独立式还是半独立式、是否有停车位、车库等特征的权重。
最终,模型赋予每个输入特征的权重,反映了基于给定数据集,该特征对房产价值的影响程度。
验证损失
验证损失是一个数字,告诉你AI模型在训练期间的学习效果如何——数值越低越好。研究人员将其作为一份实时的“成绩单”密切关注,用它来决定何时停止训练、何时调整超参数,或者是否要调查潜在的问题。它帮助标记出的一个关键问题是过拟合,即模型死记硬背训练数据,而不是真正学习能够泛化到新情况的模式。可以把它想象成一个真正理解材料的学生与一个仅仅背下了去年考试答案的学生之间的区别——验证损失有助于揭示你的模型正变成哪一种。
本文会定期更新新信息。
英文来源:
Artificial intelligence is changing the world, and simultaneously inventing a whole new language to describe how it’s doing it. Spend five minutes reading about AI and you’ll run into LLMs, RAG, RLHF, and a dozen other terms that can make even very smart people in the tech world feel insecure. This glossary is our attempt to fix that. We update it regularly as the field evolves, so consider it a living document, much like the AI systems it describes.
AGI
Artificial general intelligence, or AGI, is a nebulous term. But it generally refers to AI that’s more capable than the average human at many, if not most, tasks. OpenAI CEO Sam Altman once described AGI as the “equivalent of a median human that you could hire as a co-worker.” Meanwhile, OpenAI’s charter defines AGI as “highly autonomous systems that outperform humans at most economically valuable work.” Google DeepMind’s understanding differs slightly from these two definitions; the lab views AGI as “AI that’s at least as capable as humans at most cognitive tasks.” Confused? Not to worry — so are experts at the forefront of AI research.
AI agent
An AI agent refers to a tool that uses AI technologies to perform a series of tasks on your behalf — beyond what a more basic AI chatbot could do — such as filing expenses, booking tickets or a table at a restaurant, or even writing and maintaining code. However, as we’ve explained before, there are lots of moving pieces in this emergent space, so “AI agent” might mean different things to different people. Infrastructure is also still being built out to deliver on its envisaged capabilities. But the basic concept implies an autonomous system that may draw on multiple AI systems to carry out multistep tasks.
API endpoints
Think of API endpoints as “buttons” on the back of a piece of software that other programs can press to make it do things. Developers use these interfaces to build integrations — for example, allowing one application to pull data from another, or enabling an AI agent to control third-party services directly without a human manually operating each interface. Most smart home devices and connected platforms have these hidden buttons available, even if ordinary users never see or interact with them. As AI agents grow more capable, they are increasingly able to find and use these endpoints on their own, opening up powerful — and sometimes unexpected — possibilities for automation.
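As a rough sketch, here is how a program might "press" such a button with Python's requests library; the URL, authentication scheme, and response fields are placeholders invented for illustration, not a real service's API.

```python
import requests

# Hypothetical smart-thermostat API (placeholder URL and response schema).
BASE_URL = "https://api.example-thermostat.com/v1"

def get_temperature(device_id: str, api_key: str) -> float:
    """'Press' the read button exposed by the service: fetch the current reading."""
    resp = requests.get(
        f"{BASE_URL}/devices/{device_id}/temperature",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    resp.raise_for_status()          # surface HTTP errors instead of silently using bad data
    return resp.json()["celsius"]    # assumed field name in the JSON response

def set_temperature(device_id: str, api_key: str, celsius: float) -> None:
    """'Press' the write button: ask the device to change its target temperature."""
    resp = requests.post(
        f"{BASE_URL}/devices/{device_id}/target",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"celsius": celsius},
        timeout=10,
    )
    resp.raise_for_status()

# An AI agent could chain such calls without a human clicking through an app, e.g.:
# if get_temperature(dev, key) > 26.0: set_temperature(dev, key, 24.0)
```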
Chain of thought
Given a simple question, a human brain can answer without even thinking too much about it — things like “which animal is taller, a giraffe or a cat?” But in many cases, you often need a pen and paper to come up with the right answer because there are intermediary steps. For instance, if a farmer has chickens and cows, and together they have 40 heads and 120 legs, you might need to write down a simple equation to come up with the answer (20 chickens and 20 cows).
In an AI context, chain-of-thought reasoning for large language models means breaking down a problem into smaller, intermediate steps to improve the quality of the end result. It usually takes longer to get an answer, but the answer is more likely to be correct, especially in a logic or coding context. Reasoning models are developed from traditional large language models and optimized for chain-of-thought thinking thanks to reinforcement learning.
(See: Large language model)
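The farmer puzzle above can be made concrete. The short script below writes out the intermediate steps explicitly, which is essentially what chain-of-thought prompting asks a model to do before committing to a final answer.

```python
# Worked version of the farmer puzzle: 40 heads, 120 legs in total.
heads, legs = 40, 120

# Step 1: assume every animal is a chicken, which accounts for 2 legs per head.
legs_if_all_chickens = 2 * heads            # 80

# Step 2: every cow swapped in adds 2 extra legs (4 instead of 2).
extra_legs = legs - legs_if_all_chickens    # 40
cows = extra_legs // 2                      # 20
chickens = heads - cows                     # 20

print(f"Step 1: {legs_if_all_chickens} legs if all {heads} animals were chickens")
print(f"Step 2: {extra_legs} extra legs -> {cows} cows and {chickens} chickens")
assert 2 * chickens + 4 * cows == legs      # the intermediate steps check out
```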
Coding agents
This is a more specific concept than an “AI agent,” which means a program that can take actions on its own, step by step, to complete a goal. A coding agent is a specialized version applied to software development. Rather than simply suggesting code for a human to review and paste in, a coding agent can write, test, and debug code autonomously, handling the kind of iterative, trial-and-error work that typically consumes a developer’s day. These agents can operate across entire codebases, spotting bugs, running tests, and pushing fixes with minimal human oversight. Think of it like hiring a very fast intern who never sleeps and never loses focus — though, as with any intern, a human still needs to review the work.
Compute
Although somewhat of a multivalent term, compute generally refers to the vital computational power that allows AI models to operate. This type of processing fuels the AI industry, giving it the ability to train and deploy its powerful models. The term is often a shorthand for the kinds of hardware that provide the computational power — things like GPUs, CPUs, TPUs, and other forms of infrastructure that form the bedrock of the modern AI industry.
Deep learning
A subset of self-improving machine learning in which AI algorithms are designed with a multi-layered, artificial neural network (ANN) structure. This allows them to make more complex correlations compared to simpler machine learning-based systems, such as linear models or decision trees. The structure of deep learning algorithms draws inspiration from the interconnected pathways of neurons in the human brain.
Deep learning AI models are able to identify important characteristics in data themselves, rather than requiring human engineers to define these features. The structure also supports algorithms that can learn from errors and, through a process of repetition and adjustment, improve their own outputs. However, deep learning systems require a lot of data points to yield good results (millions or more). They also typically take longer to train compared to simpler machine learning algorithms — so development costs tend to be higher.
(See: Neural network)
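As a minimal sketch of what "multi-layered" means in practice, here is the forward pass of a tiny three-layer network in NumPy. The weights are randomly initialized stand-ins for values a real system would learn from millions of examples.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# A tiny network: 4 input features -> 8 hidden units -> 8 hidden units -> 1 output.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
W3, b3 = rng.normal(size=(8, 1)), np.zeros(1)

def forward(x):
    h1 = relu(x @ W1 + b1)   # layer 1: linear transform plus nonlinearity
    h2 = relu(h1 @ W2 + b2)  # layer 2: builds more abstract features on top of layer 1
    return h2 @ W3 + b3      # output layer

x = rng.normal(size=(1, 4))  # one example with 4 input features
print(forward(x))
```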
Diffusion
Diffusion is the tech at the heart of many art-, music-, and text-generating AI models. Inspired by physics, diffusion systems slowly “destroy” the structure of data — for example, photos, songs, and so on — by adding noise until there’s nothing left. In physics, diffusion is spontaneous and irreversible — sugar diffused in coffee can’t be restored to cube form. But diffusion systems in AI aim to learn a sort of “reverse diffusion” process to restore the destroyed data, gaining the ability to recover the data from noise.
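A toy illustration of the forward (noising) half of the process in NumPy; the learned "reverse diffusion" model that recovers data from noise is the hard part and is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward diffusion: repeatedly blend a clean signal with Gaussian noise.
# After enough steps the original structure is essentially gone, which is the
# starting point the learned reverse process works back from.
x = np.sin(np.linspace(0, 2 * np.pi, 100))   # stand-in for an image or waveform
beta = 0.05                                   # how much noise each step mixes in

steps = [x]
for t in range(60):
    noise = rng.normal(size=x.shape)
    x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
    steps.append(x)

print("correlation with original after noising:",
      np.corrcoef(steps[0], steps[-1])[0, 1])  # close to 0: structure destroyed
```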
Distillation
Distillation is a technique used to extract knowledge from a large AI model with a ‘teacher-student’ model. Developers send requests to a teacher model and record the outputs. Answers are sometimes compared with a dataset to see how accurate they are. These outputs are then used to train the student model, which is trained to approximate the teacher’s behavior.
Distillation can be used to create a smaller, more efficient model based on a larger model with a minimal distillation loss. This is likely how OpenAI developed GPT-4 Turbo, a faster version of GPT-4.
While all AI companies use distillation internally, it may have also been used by some AI companies to catch up with frontier models. Distillation from a competitor usually violates the terms of service of AI API and chat assistants.
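A minimal sketch of the teacher-student idea, assuming two small PyTorch classifiers: the frozen teacher's output distribution becomes the training target for the smaller student. This is a generic illustration, not a description of how any particular company's model was actually distilled.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A large-ish "teacher" and a much smaller "student" on the same task.
teacher = nn.Sequential(nn.Linear(16, 128), nn.ReLU(), nn.Linear(128, 10))
student = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
teacher.eval()                                   # the teacher's weights stay fixed

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0                                # softens the teacher's distribution

for step in range(200):
    x = torch.randn(64, 16)                      # stand-in for real prompts/inputs
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x) / temperature, dim=-1)
    student_log_probs = F.log_softmax(student(x) / temperature, dim=-1)
    # KL divergence pulls the student's output distribution toward the teacher's.
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```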
Fine-tuning
This refers to the further training of an AI model to optimize performance for a more specific task or area than was previously a focal point of its training — typically by feeding in new, specialized (i.e., task-oriented) data.
Many AI startups are taking large language models as a starting point to build a commercial product but are vying to amp up utility for a target sector or task by supplementing earlier training cycles with fine-tuning based on their own domain-specific knowledge and expertise.
(See: Large language model [LLM])
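A generic sketch of the recipe, assuming a PyTorch model and placeholder domain data: load previously trained weights, then continue training briefly on the specialized dataset, usually with a smaller learning rate than the original training run.

```python
import torch
import torch.nn as nn

# Fine-tuning sketch: start from a model already trained on broad data,
# then keep training briefly on a small, specialized dataset.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
# In practice the starting weights would come from earlier, general training, e.g.:
# model.load_state_dict(torch.load("pretrained_weights.pt"))  # hypothetical checkpoint path

# Placeholder domain-specific data (standing in for, say, labeled legal documents).
domain_x = torch.randn(512, 32)
domain_y = torch.randint(0, 2, (512,))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # smaller LR than initial training
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):                                     # a few passes over the new data
    logits = model(domain_x)
    loss = loss_fn(logits, domain_y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```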
GAN
A GAN, or Generative Adversarial Network, is a type of machine learning framework that underpins some important developments in generative AI when it comes to producing realistic data — including (but not only) deepfake tools. GANs involve the use of a pair of neural networks, one of which draws on its training data to generate an output that is passed to the other model to evaluate.
The two models are essentially programmed to try to outdo each other. The generator is trying to get its output past the discriminator, while the discriminator is working to spot artificially generated data. This structured contest can optimize AI outputs to be more realistic without the need for additional human intervention. That said, GANs work best for narrower applications (such as producing realistic photos or videos) rather than for general-purpose AI.
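A toy version of that contest, assuming tiny PyTorch networks and one-dimensional "data": the discriminator is rewarded for telling real samples from generated ones, and the generator is rewarded for fooling it.

```python
import torch
import torch.nn as nn

# Real samples come from a normal distribution with mean 4; the generator
# learns to produce numbers that look like they came from the same source.
generator = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
discriminator = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()
real_label, fake_label = torch.ones(64, 1), torch.zeros(64, 1)

for step in range(500):
    real = torch.randn(64, 1) * 1.5 + 4.0       # "real" training data
    fake = generator(torch.randn(64, 8))        # generator's current attempt

    # Discriminator update: reward it for labeling real as 1 and fake as 0.
    d_loss = bce(discriminator(real), real_label) + bce(discriminator(fake.detach()), fake_label)
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator update: reward it for making the discriminator say "real".
    g_loss = bce(discriminator(fake), real_label)
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

print("mean of generated samples:", generator(torch.randn(1000, 8)).mean().item())
```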
Hallucination
Hallucination is the AI industry’s preferred term for AI models making stuff up – literally generating information that is incorrect. Obviously, it’s a huge problem for AI quality.
Hallucinations produce GenAI outputs that can be misleading and could even lead to real-life risks — with potentially dangerous consequences (think of a health query that returns harmful medical advice).
The problem of AIs fabricating information is thought to arise as a consequence of gaps in training data. Hallucinations are contributing to a push toward increasingly specialized and/or vertical AI models — i.e. domain-specific AIs that require narrower expertise – as a way to reduce the likelihood of knowledge gaps and shrink disinformation risks.
Inference
Inference is the process of running an AI model. It’s setting a model loose to make predictions or draw conclusions from previously seen data. To be clear, inference can’t happen without training; a model must learn patterns in a set of data before it can effectively extrapolate from this training data.
Many types of hardware can perform inference, ranging from smartphone processors to beefy GPUs to custom-designed AI accelerators. But not all of them can run models equally well. Very large models would take ages to make predictions on, say, a laptop versus a cloud server with high-end AI chips.
[See: Training]
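In code, inference is just a forward pass with gradients switched off. The sketch below uses a placeholder PyTorch model whose weights would, in practice, come from a completed training run.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 3))
# In practice the weights would be loaded from training, e.g.:
# model.load_state_dict(torch.load("trained_model.pt"))  # hypothetical checkpoint path
model.eval()                         # switch off training-only behavior (dropout, etc.)

new_example = torch.randn(1, 10)     # an unseen input
with torch.no_grad():                # no gradients needed: we are predicting, not learning
    logits = model(new_example)
    prediction = logits.argmax(dim=-1)

print("predicted class:", prediction.item())
```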
Large language model (LLM)
Large language models, or LLMs, are the AI models used by popular AI assistants, such as ChatGPT, Claude, Google’s Gemini, Meta’s AI Llama, Microsoft Copilot, or Mistral’s Le Chat. When you chat with an AI assistant, you interact with a large language model that processes your request directly or with the help of different available tools, such as web browsing or code interpreters.
LLMs are deep neural networks made of billions of numerical parameters (or weights, see below) that learn the relationships between words and phrases and create a representation of language, a sort of multidimensional map of words.
These models are created from encoding the patterns they find in billions of books, articles, and transcripts. When you prompt an LLM, the model generates the most likely pattern that fits the prompt.
(See: Neural network)
Memory cache
Memory cache refers to an important process that boosts inference (which is the process by which AI works to generate a response to a user’s query). In essence, caching is an optimization technique, designed to make inference more efficient. AI is obviously driven by high-octane mathematical calculations and every time those calculations are made, they use up more power. Caching is designed to cut down on the number of calculations a model might have to run by saving particular calculations for future user queries and operations. There are different kinds of memory caching, although one of the more well-known is KV (or key value) caching. KV caching works in transformer-based models, and increases efficiency, driving faster results by reducing the amount of time (and algorithmic labor) it takes to generate answers to user questions.
(See: Inference)
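A simplified single-head attention loop in NumPy, showing the idea behind KV caching: keys and values for past tokens are stored once and reused, so each new token only adds its own entry instead of recomputing the whole history. Real transformer implementations are considerably more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                                    # embedding size
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

k_cache, v_cache = [], []                 # the "KV cache": one entry per past token

def attend_next(token_embedding):
    """Process one new token, reusing cached keys/values for all earlier tokens."""
    q = token_embedding @ Wq
    k_cache.append(token_embedding @ Wk)  # only the new token's key/value is computed
    v_cache.append(token_embedding @ Wv)
    K, V = np.stack(k_cache), np.stack(v_cache)
    scores = K @ q / np.sqrt(d)           # attention over all tokens seen so far
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                    # context vector for the new token

for _ in range(5):                        # process five tokens one at a time
    out = attend_next(rng.normal(size=d))
print("cached keys:", len(k_cache))       # grows with sequence length; nothing is recomputed
```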
Neural network
A neural network refers to the multi-layered algorithmic structure that underpins deep learning — and, more broadly, the whole boom in generative AI tools following the emergence of large language models.
Although the idea of taking inspiration from the densely interconnected pathways of the human brain as a design structure for data processing algorithms dates all the way back to the 1940s, it was the much more recent rise of graphical processing hardware (GPUs) — via the video game industry — that really unlocked the power of this theory. These chips proved well suited to training algorithms with many more layers than was possible in earlier epochs — enabling neural network-based AI systems to achieve far better performance across many domains, including voice recognition, autonomous navigation, and drug discovery.
(See: Large language model [LLM])
Open source
Open source refers to software — or, increasingly, AI models — where the underlying code is made publicly available for anyone to use, inspect, or modify. In the AI world, Meta’s Llama family of models is a prominent example; Linux is the famous historical parallel in operating systems. Open source approaches allow researchers, developers, and companies around the world to build on top of one another’s work, accelerating progress and enabling independent safety audits that closed systems cannot easily provide. Closed source means the code is private — you can use the product but not see how it works, as is the case with OpenAI’s GPT models — a distinction that has become one of the defining debates in the AI industry.
Parallelization
Parallelization means doing many things at the same time instead of one after another — like having 10 employees working on different parts of a project at the same time instead of one employee doing everything sequentially. In AI, parallelization is fundamental to both training and inference: modern GPUs are specifically designed to perform thousands of calculations in parallel, which is a big reason why they became the hardware backbone of the industry. As AI systems grow more complex and models grow larger, the ability to parallelize work across many chips and many machines has become one of the most important factors in determining how quickly and cost-effectively models can be built and deployed. Research into better parallelization strategies is now a field of study in its own right.
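A small, non-AI illustration of the same principle using Python's standard library: the identical workload run sequentially and then split across worker processes.

```python
from concurrent.futures import ProcessPoolExecutor
import math

def heavy_task(n: int) -> float:
    """Stand-in for one independent chunk of work (e.g., one batch of data)."""
    return sum(math.sqrt(i) for i in range(n))

inputs = [2_000_000] * 8

if __name__ == "__main__":
    # Sequential: one worker does everything, one item after another.
    sequential = [heavy_task(n) for n in inputs]

    # Parallel: the same work split across 8 worker processes at once.
    with ProcessPoolExecutor(max_workers=8) as pool:
        parallel = list(pool.map(heavy_task, inputs))

    assert sequential == parallel    # same results, just computed concurrently
```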
RAMageddon
RAMageddon is the fun new term for a not-so-fun trend that is sweeping the tech industry: an ever-increasing shortage of random access memory, or RAM chips, which power pretty much all the tech products we use in our daily lives. As the AI industry has blossomed, the biggest tech companies and AI labs — all vying to have the most powerful and efficient AI — are buying so much RAM to power their data centers that there’s not much left for the rest of us. And that supply bottleneck means that what’s left is getting more and more expensive.
That includes industries like gaming (where major companies have had to raise prices on consoles because it’s harder to find memory chips for their devices), consumer electronics (where memory shortage could cause the biggest dip in smartphone shipments in more than a decade), and general enterprise computing (because those companies can’t get enough RAM for their own data centers). The surge in prices is only expected to stop after the dreaded shortage ends but, unfortunately, there’s not really much of a sign that’s going to happen anytime soon.
Reinforcement learning
Reinforcement learning is a way of training AI where a system learns by trying things and receiving rewards for correct answers — like training your beloved pet with treats, except the “pet” in this scenario is a neural network and the “treat” is a mathematical signal indicating success. Unlike supervised learning, where a model is trained on a fixed dataset of labeled examples, reinforcement learning lets a model explore its environment, take actions, and continuously update its behavior based on the feedback it receives. This approach has proven especially powerful for training AI to play games, control robots, and, more recently, sharpen the reasoning ability of large language models. Techniques like reinforcement learning from human feedback, or RLHF, are now central to how leading AI labs fine-tune their models to be more helpful, accurate, and safe.
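A minimal example of learning from rewards (not RLHF itself, just the underlying idea): an epsilon-greedy agent tries three slot machines and updates its value estimates from the "treats" it receives.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three actions with hidden average rewards; the agent does not know these
# and must discover them through trial, error, and reward.
true_means = [0.2, 0.5, 0.8]
estimates = np.zeros(3)      # the agent's learned value for each action
counts = np.zeros(3)

for step in range(2000):
    # Explore occasionally; otherwise exploit the best-looking action so far.
    action = rng.integers(3) if rng.random() < 0.1 else int(np.argmax(estimates))
    reward = rng.normal(true_means[action], 0.1)   # the "treat"
    counts[action] += 1
    # Incremental average: nudge the estimate toward the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

print("learned values:", np.round(estimates, 2))   # should approach [0.2, 0.5, 0.8]
```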
Token
When it comes to human-machine communication, there are some obvious challenges — people communicate using human language, while AI programs execute tasks through complex algorithmic processes informed by data. Tokens bridge that gap: they are the basic building blocks of human-AI communication, representing discrete segments of data that have been processed or produced by an LLM. They are created through a process called tokenization, which breaks down raw text into bite-sized units a language model can digest, similar to how a compiler translates human language into binary code a computer can understand. In enterprise settings, tokens also determine cost — most AI companies charge for LLM usage on a per-token basis, meaning the more a business uses, the more it pays.
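A toy tokenizer and a per-token cost calculation; the splitting rule and the price below are made up for illustration, since real tokenizers use learned subword vocabularies (such as byte-pair encoding) and every vendor sets its own rates.

```python
# Toy tokenization: split text into word pieces of at most 4 characters.
def toy_tokenize(text: str, piece_len: int = 4) -> list[str]:
    tokens = []
    for word in text.lower().split():
        for i in range(0, len(word), piece_len):
            tokens.append(word[i:i + piece_len])
    return tokens

prompt = "Tokenization breaks language into bite-sized units"
tokens = toy_tokenize(prompt)
print(tokens)            # e.g. ['toke', 'niza', 'tion', 'brea', 'ks', ...]

# Per-token billing (the price is a placeholder, not any vendor's real rate).
price_per_1k_tokens = 0.002
print(f"{len(tokens)} tokens -> ${len(tokens) / 1000 * price_per_1k_tokens:.6f}")
```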
Token throughput
So again, tokens are the small chunks of text — often parts of words rather than whole ones — that AI language models break language into before processing it; they are roughly analogous to “words” for the purposes of understanding AI workloads. Throughput refers to how much can be processed in a given period of time, so token throughput is essentially a measure of how much AI work a system can handle at once. High token throughput is a key goal for AI infrastructure teams, since it determines how many users a model can serve simultaneously and how quickly each of them receives a response. AI researcher Andrej Karpathy has described feeling anxious when his AI subscriptions sit idle — echoing the feeling he had as a grad student when expensive computer hardware wasn’t being fully utilized — a sentiment that captures why maximizing token throughput has become something of an obsession in the field.
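The arithmetic behind the metric is simple; the numbers below are illustrative, not benchmarks.

```python
# Throughput = tokens processed divided by elapsed time.
tokens_generated = 1_200_000     # tokens a serving cluster produced...
seconds_elapsed = 60             # ...over one minute

throughput = tokens_generated / seconds_elapsed
print(f"{throughput:,.0f} tokens/second")           # 20,000 tokens/s

# If an average response is ~400 tokens and users expect it within ~8 seconds,
# a rough upper bound on the concurrent users this throughput can serve:
tokens_per_response, target_latency_s = 400, 8
concurrent_users = throughput * target_latency_s / tokens_per_response
print(f"~{concurrent_users:,.0f} concurrent users at that response size")
```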
Training
Developing machine learning AIs involves a process known as training. In simple terms, this refers to data being fed in in order that the model can learn from patterns and generate useful outputs. Essentially, it’s the process of the system responding to characteristics in the data that enables it to adapt outputs towards a sought-for goal — whether that’s identifying images of cats or producing a haiku on demand.
Training can be expensive because it requires lots of inputs, and the volumes required have been trending upwards — which is why hybrid approaches, such as fine-tuning a rules-based AI with targeted data, can help manage costs without starting entirely from scratch.
[See: Inference]
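The smallest possible training loop, in NumPy: feed in examples of a noisy line, repeatedly adjust two parameters toward lower error, and the model "learns" the underlying pattern.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: y is roughly 3*x + 1 plus noise. Training means adjusting the
# model's parameters (w, b) until its outputs match these examples.
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 1.0 + rng.normal(0, 0.1, size=200)

w, b = 0.0, 0.0            # start knowing nothing
lr = 0.1                   # learning rate: how big each adjustment is

for epoch in range(300):
    pred = w * x + b
    error = pred - y
    # Gradient of the mean squared error with respect to w and b.
    w -= lr * 2 * np.mean(error * x)
    b -= lr * 2 * np.mean(error)

print(f"learned w={w:.2f}, b={b:.2f}")   # should land near 3.0 and 1.0
```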
Transfer learning
A technique where a previously trained AI model is used as the starting point for developing a new model for a different but typically related task – allowing knowledge gained in previous training cycles to be reapplied.
Transfer learning can drive efficiency savings by shortcutting model development. It can also be useful when data for the task that the model is being developed for is somewhat limited. But it’s important to note that the approach has limitations. Models that rely on transfer learning to gain generalized capabilities will likely require training on additional data in order to perform well in their domain of focus.
(See: Fine tuning)
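One common transfer-learning recipe, sketched with placeholder PyTorch modules: freeze a previously trained backbone so its knowledge is preserved, and train only a small new head for the related task.

```python
import torch
import torch.nn as nn

# A feature extractor reused from an earlier training run.
backbone = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 128))
# In practice its weights would be loaded from that earlier run, e.g.:
# backbone.load_state_dict(torch.load("pretrained_backbone.pt"))  # hypothetical path

for param in backbone.parameters():
    param.requires_grad = False           # keep the transferred knowledge fixed

new_head = nn.Linear(128, 5)              # 5 classes in the new, related task
optimizer = torch.optim.Adam(new_head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(256, 64)                  # small dataset for the new task
y = torch.randint(0, 5, (256,))

for epoch in range(10):
    with torch.no_grad():
        features = backbone(x)            # frozen features from the old model
    loss = loss_fn(new_head(features), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```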
Weights
Weights are core to AI training, as they determine how much importance (or weight) is given to different features (or input variables) in the data used for training the system — thereby shaping the AI model’s output.
Put another way, weights are numerical parameters that define what’s most salient in a dataset for the given training task. They achieve their function by applying multiplication to inputs. Model training typically begins with weights that are randomly assigned, but as the process unfolds, the weights adjust as the model seeks to arrive at an output that more closely matches the target.
For example, an AI model for predicting housing prices that’s trained on historical real estate data for a target location could include weights for features such as the number of bedrooms and bathrooms, whether a property is detached or semi-detached, whether it has parking, a garage, and so on.
Ultimately, the weights the model attaches to each of these inputs reflect how much they influence the value of a property, based on the given dataset.
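The housing example above as literal code: a weighted sum of input features plus a bias term. The weight values are invented for illustration; in a trained model they would be learned from historical sales data.

```python
import numpy as np

features = {
    "bedrooms": 3,
    "bathrooms": 2,
    "is_detached": 1,      # 1 = yes, 0 = no
    "has_garage": 1,
}
weights = {
    "bedrooms": 25_000,    # each bedroom adds roughly this much to the prediction
    "bathrooms": 15_000,
    "is_detached": 40_000,
    "has_garage": 10_000,
}
bias = 80_000              # base value before any features are considered

x = np.array([features[k] for k in features])
w = np.array([weights[k] for k in features])
predicted_price = float(x @ w + bias)
print(f"predicted price: ${predicted_price:,.0f}")
# During training these weights start out random and are nudged until the
# predictions line up with actual sale prices in the dataset.
```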
Validation loss
Validation loss is a number that tells you how well an AI model is learning during training — and lower is better. Researchers track it closely as a kind of real-time report card, using it to decide when to stop training, when to adjust hyperparameters, or whether to investigate a potential problem. One of the key concerns it helps flag is overfitting, a condition in which a model memorizes its training data rather than truly learning patterns it can generalize to new situations. Think of it as the difference between a student who genuinely understands the material and one who simply memorized last year’s exam — validation loss helps reveal which one your model is becoming.
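A small demonstration of why a held-out validation score matters, using NumPy polynomial fits as stand-in "models": as model capacity grows, training loss keeps falling while validation loss eventually rises, which is the signature of overfitting.

```python
import numpy as np

rng = np.random.default_rng(0)

# 30 noisy points from a smooth curve; half for training, half held out.
x = rng.uniform(-1, 1, 30)
y = np.sin(3 * x) + rng.normal(0, 0.2, 30)
x_train, y_train = x[:15], y[:15]          # data the model learns from
x_val, y_val = x[15:], y[15:]              # held-out data it never sees in training

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

for degree in [1, 3, 10]:                  # increasingly flexible "models"
    coeffs = np.polyfit(x_train, y_train, degree)        # "training"
    train_loss = mse(np.polyval(coeffs, x_train), y_train)
    val_loss = mse(np.polyval(coeffs, x_val), y_val)      # the report card
    print(f"degree {degree:2d}: train loss {train_loss:.4f}, val loss {val_loss:.4f}")
# The degree-10 fit typically drives training loss toward zero while validation
# loss climbs: it has memorized its 15 points rather than learned the pattern.
```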
This article is updated regularly with new information.