意图识别与提示词生成
本文档详细介绍了校园领域搜索问答系统中的意图识别与提示词生成功能模块的设计与实现。该模块旨在通过自然语言处理技术识别用户的查询意图,并基于意图生成相应的提示词,提升系统的响应准确性和用户体验。本文将涵盖功能概述、接口设计、数据库表结构、控制器实现以及服务层实现。
功能概述
意图识别与提示词生成模块主要负责以下功能:
- 意图识别:通过分析用户输入的查询内容,识别其所属的意图类别。
- 提示词生成:根据识别出的意图,生成相应的提示词,指导系统提供更精准的回答。
- 学校名称获取:在意图识别过程中,获取用户所属的学校名称,以便生成针对性的提示词。
接口设计
1. 意图识别与提示词生成接口
请求方法:
POST
接口路径:
/api/v1/chat/intent
请求参数:
参数 类型 描述 是否必填 school_id Long 学校 ID 是 query_text String 用户的查询内容 是 请求示例:
POST /api/v1/chat/intent Content-Type: application/json { "school_id": 1, "query_text": "如何注册课程,以及注册的截止日期和程序是什么?" }
响应格式:
{ "data": { "intent_id": 1, "intent_name": "course_registration_and_enrollment", "prompt": "As the intelligent assistant named Spartan, You are dedicated to helping students at San Jose State University (SJSU) with all their questions. Your role includes addressing inquiries specifically related to the university, such as academic programs, campus events, and administrative procedures. Your questions mainly come from the category course registration and enrollment, which means that You need to provide detailed step-by-step instructions to guide students through course enrollment in the corresponding university. List all available class selection choices and inform students about all possible situations they might encounter during the enrollment process." }, "code": 1, "msg": null, "ok": true }
2. 向量获取接口
请求方法:
GET
接口路径:
/api/v1/chat/vector
请求参数:
参数 类型 描述 是否必填 text String 需要获取向量的文本 是 model String 使用的模型名称 是 请求示例:
GET /api/v1/chat/vector?text=如何注册课程&model=text_embedding_3_large
响应格式:
{ "data": { "embedding": [0.1, 0.2, 0.3, ...], "model": "text_embedding_3_large" }, "code": 1, "msg": null, "ok": true }
3. llm_vector_embedding
(向量嵌入)
用于存储文本的向量表示,支持高效的相似性搜索和意图识别。
CREATE TABLE "llm_vector_embedding" (
"id" BIGINT PRIMARY KEY,
"t" VARCHAR,
"v" VECTOR,
"m" VARCHAR,
"md5" VARCHAR
);
CREATE INDEX "idx_llm_vector_embedding_md5" ON "llm_vector_embedding" USING btree (
"md5" TEXT ASC NULLS LAST
);
CREATE INDEX "idx_llm_vector_embedding_md5_m" ON "llm_vector_embedding" USING btree (
"md5" TEXT ASC NULLS LAST,
"m" TEXT ASC NULLS LAST
);
控制器实现
ApiChatHandler
类负责处理与意图识别和提示词生成相关的 HTTP 请求。以下是该类的实现代码及说明:
package com.litongjava.llm.handler;
import com.jfinal.kit.Kv;
import com.litongjava.db.TableResult;
import com.litongjava.jfinal.aop.Aop;
import com.litongjava.llm.service.SearchPromptService;
import com.litongjava.llm.service.VectorService;
import com.litongjava.llm.dao.SchoolDictDao;
import com.litongjava.model.body.RespBodyVo;
import com.litongjava.model.page.Page;
import com.litongjava.tio.boot.http.TioRequestContext;
import com.litongjava.tio.http.common.HttpRequest;
import com.litongjava.tio.http.common.HttpResponse;
import com.litongjava.tio.http.server.util.CORSUtils;
import com.litongjava.tio.utils.json.FastJson2Utils;
import com.litongjava.tio.utils.hutool.StrUtil;
import lombok.extern.slf4j.Slf4j;
import org.postgresql.util.PGobject;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import com.alibaba.fastjson2.JSONObject;
@Slf4j
public class ApiChatHandler {
/**
* 意图识别与提示词生成
*
* @param request HTTP请求对象
* @return HTTP响应对象
*/
public HttpResponse recognizeIntent(HttpRequest request) {
HttpResponse response = TioRequestContext.getResponse();
CORSUtils.enableCORS(response);
Long schoolId = request.getLong("school_id");
String queryText = request.getParam("query_text");
if (schoolId == null) {
return response.fail(RespBodyVo.fail("school_id cannot be empty"));
}
if (StrUtil.isBlank(queryText)) {
return response.fail(RespBodyVo.fail("query_text cannot be empty"));
}
SearchPromptService searchPromptService = Aop.get(SearchPromptService.class);
String prompt = searchPromptService.index(schoolId, queryText, null);
Map<String, String> data = new HashMap<>();
data.put("prompt", prompt);
return response.setJson(RespBodyVo.ok(data));
}
/**
* 获取文本向量
*
* @param request HTTP请求对象
* @return HTTP响应对象
*/
public HttpResponse getVector(HttpRequest request) {
HttpResponse response = TioRequestContext.getResponse();
CORSUtils.enableCORS(response);
String text = request.getParam("text");
String model = request.getParam("model");
if (StrUtil.isBlank(text)) {
return response.fail(RespBodyVo.fail("text cannot be empty"));
}
if (StrUtil.isBlank(model)) {
return response.fail(RespBodyVo.fail("model cannot be empty"));
}
VectorService vectorService = Aop.get(VectorService.class);
EmbeddingResponseVo embedding = vectorService.getVector(text, model);
Map<String, Object> data = new HashMap<>();
data.put("embedding", embedding.getData().get(0).getEmbedding());
data.put("model", embedding.getModel());
return response.setJson(RespBodyVo.ok(data));
}
}
控制器代码说明
recognizeIntent:
- 处理
/api/v1/chat/intent
的POST
请求。 - 从请求中获取
school_id
和query_text
参数,并进行有效性校验。 - 调用服务层
SearchPromptService.index
方法进行意图识别与提示词生成。 - 将生成的提示词封装在响应中返回。
- 处理
getVector:
- 处理
/api/v1/chat/vector
的GET
请求。 - 从请求中获取
text
和model
参数,并进行有效性校验。 - 调用服务层
VectorService.getVector
方法获取文本的向量表示。 - 将向量数据封装在响应中返回。
- 处理
服务层实现
1. VectorService
(向量服务)
负责与向量嵌入相关的操作,包括获取文本向量和存储向量数据。
package com.litongjava.llm.service;
import java.util.Arrays;
import org.postgresql.util.PGobject;
import com.litongjava.db.activerecord.Db;
import com.litongjava.db.activerecord.Row;
import com.litongjava.db.utils.PgVectorUtils;
import com.litongjava.llm.constants.TableNames;
import com.litongjava.openai.client.OpenAiClient;
import com.litongjava.openai.constants.OpenAiModels;
import com.litongjava.openai.embedding.EmbeddingResponseVo;
import com.litongjava.openai.utils.EmbeddingVectorUtils;
import com.litongjava.tio.utils.crypto.Md5Utils;
import com.litongjava.tio.utils.snowflake.SnowflakeIdUtils;
import com.litongjava.tio.utils.thread.TioThreadUtils;
public class VectorService {
private final Object vectorLock = new Object();
private final Object writeLock = new Object();
/**
* 获取文本向量
*
* @param text 需要获取向量的文本
* @param model 使用的模型名称
* @return 向量表示
*/
public synchronized EmbeddingResponseVo getVector(String text, String model) {
String md5 = Md5Utils.getMD5(text);
String sql = String.format("SELECT v FROM %s WHERE md5=? AND m=?", TableNames.llm_vector_embedding);
PGobject pgObject = Db.queryFirst(sql, md5, model);
if (pgObject != null) {
String value = pgObject.getValue();
float[] floats = EmbeddingVectorUtils.toFloats(value);
return buildEmbeddingResponse(model, floats);
} else {
EmbeddingResponseVo embeddingResponse = fetchAndStoreVector(text, model, md5);
return embeddingResponse;
}
}
/**
* 构建嵌入响应对象
*
* @param model 使用的模型
* @param vectors 向量数据
* @return 嵌入响应对象
*/
private EmbeddingResponseVo buildEmbeddingResponse(String model, float[] vectors) {
EmbeddingResponseVo embeddingResponse = new EmbeddingResponseVo();
embeddingResponse.setModel(model);
EmbeddingData data = new EmbeddingData();
data.setEmbedding(vectors);
embeddingResponse.setData(Arrays.asList(data));
return embeddingResponse;
}
/**
* 获取向量并存储到数据库
*
* @param text 需要获取向量的文本
* @param model 使用的模型名称
* @param md5 文本的MD5值
* @return 嵌入响应对象
*/
private EmbeddingResponseVo fetchAndStoreVector(String text, String model, String md5) {
EmbeddingRequestVo embeddingRequest = new EmbeddingRequestVo(text, model);
EmbeddingResponseVo embeddingResponse;
synchronized (vectorLock) {
embeddingResponse = OpenAiClient.embeddings(embeddingRequest);
}
float[] embeddingArray = embeddingResponse.getData().get(0).getEmbedding();
String vectorString = Arrays.toString(embeddingArray);
TioThreadUtils.submit(() -> {
long id = SnowflakeIdUtils.id();
PGobject pgVector = PgVectorUtils.getPgVector(vectorString);
Row saveRecord = Row.by("id", id)
.set("t", text)
.set("v", pgVector)
.set("md5", md5)
.set("m", model)
.setTableName(TableNames.llm_vector_embedding);
synchronized (writeLock) {
Db.save(saveRecord);
}
});
return embeddingResponse;
}
}
2. SchoolDictDao
(学校字典数据访问对象)
用于获取学校相关信息,支持系统的多学校环境配置。
package com.litongjava.llm.dao;
import com.litongjava.db.activerecord.Db;
import com.litongjava.db.activerecord.Row;
import com.litongjava.llm.vo.SchoolDict;
public class SchoolDictDao {
/**
* 根据ID获取学校字典记录
*
* @param id 学校ID
* @return 学校字典记录
*/
public Row getById(Long id) {
String sql = "SELECT * FROM rumi_school_dict WHERE id=?";
return Db.findFirstByCache("rumi_school_dict", id, 600, sql, id);
}
/**
* 根据ID获取学校名称信息
*
* @param id 学校ID
* @return 学校名称信息对象
*/
public SchoolDict getNameById(Long id) {
String sql = "SELECT id, full_name, abbr_name, bot_name FROM rumi_school_dict WHERE id=?";
Row record = Db.findFirstByCache("rumi_school_dict", id, 600, sql, id);
if (record == null) {
return null;
}
String fullName = record.getStr("full_name");
String abbrName = record.getStr("abbr_name");
String botName = record.getStr("bot_name");
return new SchoolDict(id, fullName, abbrName, botName);
}
}
3. SearchPromptService
(搜索提示词生成服务)
负责意图识别和提示词生成的主要逻辑。
package com.litongjava.llm.service;
import java.util.HashMap;
import java.util.Map;
import com.jfinal.template.Engine;
import com.jfinal.template.Template;
import com.litongjava.constants.ServerConfigKeys;
import com.litongjava.db.activerecord.Db;
import com.litongjava.db.activerecord.Row;
import com.litongjava.jfinal.aop.Aop;
import com.litongjava.llm.constants.AiChatEventName;
import com.litongjava.llm.dao.SchoolDictDao;
import com.litongjava.llm.vo.SchoolDict;
import com.litongjava.openai.constants.OpenAiModels;
import com.litongjava.openai.embedding.EmbeddingResponseVo;
import com.litongjava.openai.utils.EmbeddingVectorUtils;
import com.litongjava.template.SqlTemplates;
import com.litongjava.tio.core.ChannelContext;
import com.litongjava.tio.core.Tio;
import com.litongjava.tio.http.common.sse.SsePacket;
import com.litongjava.tio.utils.environment.EnvUtils;
import com.litongjava.tio.utils.json.JsonUtils;
import lombok.extern.slf4j.Slf4j;
@Slf4j
public class SearchPromptService {
/**
* 进行意图识别并生成提示词
*
* @param schoolId 学校ID
* @param textQuestion 用户查询内容
* @param channelContext 通道上下文(可选,用于实时反馈)
* @return 生成的提示词
*/
public String index(Long schoolId, String textQuestion, ChannelContext channelContext) {
Map<String, String> params = new HashMap<>();
// 获取文本向量
EmbeddingResponseVo vector = Aop.get(VectorService.class).getVector(textQuestion, OpenAiModels.text_embedding_3_large);
String vectorString = EmbeddingVectorUtils.toString(vector.getData().get(0).getEmbedding());
// 查询意图分类
String sql = SqlTemplates.get("llm_intent_classification.intent");
Row record = Db.findFirst(sql, vectorString, EnvUtils.get(ServerConfigKeys.APP_ENV));
if (record != null) {
if (channelContext != null) {
String json = JsonUtils.toJson(record.toMap());
SsePacket packet = new SsePacket(AiChatEventName.progress, "category:" + json);
Tio.bSend(channelContext, packet);
}
String category = record.getStr("name").replace('_', ' ');
String additionalInfo = record.getStr("additional_info");
params.put("category", category);
params.put("additional_info", additionalInfo);
} else {
if (channelContext != null) {
SsePacket packet = new SsePacket(AiChatEventName.progress, "default");
Tio.bSend(channelContext, packet);
}
}
// 获取学校信息
SchoolDict schoolDict = Aop.get(SchoolDictDao.class).getNameById(schoolId);
if (schoolDict == null) {
params.put("botName", "Spartan");
params.put("schoolName", "San Jose State University (SJSU)");
} else {
params.put("botName", schoolDict.getBotName());
params.put("schoolName", schoolDict.getFullName());
}
// 渲染提示词模板
Template template = Engine.use().getTemplate("search_init_prompt_v1.txt");
return template.renderToString(params);
}
}
服务层代码说明
VectorService:
- getVector:获取文本的向量表示。如果向量已存在于数据库中,则直接返回;否则,调用 OpenAI 的 API 生成向量并异步存储到数据库中。
SchoolDictDao:
- getById:根据学校 ID 获取学校字典记录。
- getNameById:根据学校 ID 获取学校名称信息,返回
SchoolDict
对象。
SearchPromptService:
- index:
- 接收学校 ID 和用户查询内容。
- 调用
VectorService.getVector
获取查询文本的向量表示。 - 使用向量进行意图识别,查询最相似的意图分类。
- 如果识别到意图,获取意图的名称和附加信息,并通过模板生成提示词。
- 获取学校信息,确保提示词中包含学校名称和机器人名称。
- 使用模板引擎渲染最终的提示词,返回给调用方。
- index:
模板设计
search_init_prompt_v1.txt
用于生成初始提示词的模板,结合意图分类和学校信息。
As the intelligent assistant named #{botName}, You are dedicated to helping students at #{schoolName} with all their questions. Your role includes addressing inquiries specifically related to the university, such as academic programs, campus events, and administrative procedures.
#if(category && additional_info)
Your questions mainly come from the category #{category}, which means that #{additional_info}
#end
数据示例
意图分类数据
-- 查询意图分类
SELECT id, name, description, action, prompt, additional_info, include_general
FROM llm_intent_classification
ORDER BY id;
id | name | description | action | prompt | additional_info | include_general |
---|---|---|---|---|---|---|
1 | course_registration_and_enrollment | You need to provide detailed step-by-step instructions to guide students through course enrollment in the corresponding university. List all available class selection choices and inform students about all possible situations they might encounter during the enrollment process. | 1 | 1 | ||
2 | financial_aid_and_tuition | you need to provide detailed step-by-step instructions to guide students through applying for financial aid, scholarships, payment plans, or any related funds in the corresponding university. Including links to the documents they need to fill out if possible. | 1 | 1 | ||
3 | campus_services_and_resources | you need to provide detailed responses for students who seek for campus facilities, health services, career services in the corresponding university, or any related resources. Provide a link to related campus website if possible. | 1 | 1 | ||
... | ... | ... | ... | ... | ||
33 | junk_questions | the questions you receive probably contains no useful information. Unless you can understand the content, you can simply answer 'The question does not seem to be valid, please use clearer words so that I can understand' | 1 | 1 |
意图问题数据
-- 查询意图问题
SELECT id, question, category_id
FROM llm_intent_question
ORDER BY id;
id | question | category_id |
---|---|---|
1 | How can I enroll in a class that is full? | 1 |
2 | What is the process for registering for summer classes? | 1 |
3 | Can I drop a course after the deadline? | 1 |
... | ... | ... |
127 | ? | 13 |
SQL 查询模板
intention_recognition.sql
用于意图识别的 SQL 查询,结合向量相似度计算。
--# llm_intent_classification.intent
SELECT
c.id,
c.name,
c.action,
c.additional_info,
c.include_general,
q.question,
(1 - (q.question_vector <=> input.input_vector)) AS similarity
FROM
llm_intent_question q
LEFT JOIN llm_intent_classification c ON q.category_id = c.id,
LATERAL (
VALUES (
?::VECTOR(3072)
)
) AS input(input_vector)
WHERE
c.env = ?
AND (1 - (q.question_vector <=> input.input_vector)) > 0.4
ORDER BY
similarity DESC
LIMIT 1;
总结
本文档详细介绍了校园领域搜索问答系统中意图识别与提示词生成功能模块的设计与实现。通过定义清晰的 API 接口、设计合理的数据库表结构以及实现高效的控制器和服务层代码,该模块能够有效地识别用户的查询意图,并基于意图生成相应的提示词,提升系统的响应准确性和用户体验。未来,可以进一步优化意图识别算法,扩展意图分类的覆盖范围,并结合用户行为数据,实现更加智能和个性化的提示词生成。