Architecture Engineering and LLM Evaluation Notes for My Projects

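I. Architecture Engineering

1. Project environment

To clean up the earlier mix of virtual environments and pip, the project now uses uv's project-based workflow directly.

Usage going forward: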

uv init --package my_project
uv add fastapi uvicorn pytest ruff
uv run uvicorn my_project.main:app --reload
uv sync
uv build
uv publish

uv export --format requirements-txt --no-hashes -o requirements.txt
uv export --format requirements-txt --no-hashes --group dev -o requirements-dev.txt

Much cleaner.

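2. Docker deployment

Added a one-command docker-compose.yml, a Dockerfile for the frontend and one for the backend, and a .env.compose file. Host ports are remapped so they do not collide with services already running locally (useful if you want to run several projects on the same machine).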

services:
  mysql:
    image: mysql:8.0
    container_name: ai-list-mysql
    restart: unless-stopped
    environment:
      TZ: ${TZ:-Asia/Shanghai}
      MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD:-123456}
      MYSQL_DATABASE: ${MYSQL_DATABASE:-ai_list}
    command:
      - --default-authentication-plugin=mysql_native_password
      - --character-set-server=utf8mb4
      - --collation-server=utf8mb4_unicode_ci
    ports:
      - "${MYSQL_PORT:-3307}:3306"
    volumes:
      - mysql_data:/var/lib/mysql
      - ./backend/sql/create_tables.sql:/docker-entrypoint-initdb.d/01-create_tables.sql:ro
      - ./backend/sql/insert_data.sql:/docker-entrypoint-initdb.d/02-insert_data.sql:ro
    healthcheck:
      test: ["CMD-SHELL", "mysqladmin ping -h 127.0.0.1 -uroot -p$${MYSQL_ROOT_PASSWORD} --silent"]
      interval: 10s
      timeout: 5s
      retries: 15

  redis:
    image: redis:7-alpine
    container_name: ai-list-redis
    restart: unless-stopped
    ports:
      - "${REDIS_PORT:-6380}:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 10

  embedding:
    image: ${EMBEDDING_IMAGE:-ghcr.io/huggingface/text-embeddings-inference:cuda-1.8.1}
    container_name: ai-list-embedding
    restart: unless-stopped
    command:
      - --model-id
      - ${EMBEDDING_MODEL_ID:-BAAI/bge-m3}
    ports:
      - "${EMBEDDING_PORT:-8081}:80"
    volumes:
      - ./data/tei:/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [ gpu ]

  backend:
    build:
      context: ./backend
      dockerfile: Dockerfile
    container_name: ai-list-backend
    restart: unless-stopped
    depends_on:
      mysql:
        condition: service_healthy
      redis:
        condition: service_healthy
      embedding:
        condition: service_started
    env_file:
      - ./backend/.env.compose
    environment:
      TZ: ${TZ:-Asia/Shanghai}
    command:
      - sh
      - -c
      - uv run uvicorn app.main:app --host 0.0.0.0 --port 1235
    ports:
      - "${BACKEND_PORT:-1235}:1235"
    volumes:
      - ./backend/logs:/app/logs

  frontend:
    build:
      context: ./frontend
      dockerfile: Dockerfile
      args:
        VITE_API_BASE_URL: http://localhost:${BACKEND_PORT:-1235}
        VITE_ACCESS_KEY: test_key
        VITE_ACCESS_SECRET: test_secret
        VITE_AI_SCENE: default
    container_name: ai-list-frontend
    restart: unless-stopped
    depends_on:
      - backend
    ports:
      - "${FRONTEND_PORT:-5173}:80"

volumes:
  mysql_data:
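
Once the stack is up, a small TCP probe is a quick way to confirm that the remapped host ports are reachable. The sketch below is only an illustration: the helper names `host_port` and `port_open` are hypothetical, and the defaults simply mirror the `${VAR:-default}` fallbacks in the compose file above. It checks that a TCP connection can be opened, not that the service behind it is healthy.

```python
import os
import socket

# Default host ports, mirroring the ${VAR:-default} fallbacks in docker-compose.yml.
DEFAULT_HOST_PORTS = {
    "MYSQL_PORT": 3307,
    "REDIS_PORT": 6380,
    "EMBEDDING_PORT": 8081,
    "BACKEND_PORT": 1235,
    "FRONTEND_PORT": 5173,
}


def host_port(var_name: str, env=None) -> int:
    """Resolve a host port the way ${VAR:-default} does (hypothetical helper)."""
    env = os.environ if env is None else env
    value = env.get(var_name, "")
    # ':-' falls back to the default when the variable is unset OR empty.
    return int(value) if value else DEFAULT_HOST_PORTS[var_name]


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name in DEFAULT_HOST_PORTS:
        port = host_port(name)
        state = "open" if port_open("127.0.0.1", port) else "closed"
        print(f"{name}={port}: {state}")
```

Overriding a variable such as `MYSQL_PORT` in the shell before running the script makes the probe follow the same port that Compose would publish.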

3. Database xx

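During an earlier internship, my department head once gave a talk on coding standards. One project he showed passed everything as parameters: not a single literal text value appeared in the code. Roughly, every prompt used by the LLMs, every secret key, and every other reusable parameter was mapped to a model field and stored in indexed database tables, with no redundant columns. As long as loading is optimized with database and table sharding, that code architecture really holds up. He said at the time that two of his apprentices had already become architects, and I thought he was bragging. Now I can only sigh, with tears in my eyes.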
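
The idea of keeping prompts, keys, and shared parameters in an indexed table instead of in code can be sketched in a few lines. This is not the project from the anecdote: the table name `llm_params`, its columns, and the helpers `init_store`, `set_param`, and `get_param` are all hypothetical, and SQLite stands in for whatever database the original used.

```python
import sqlite3

# Hypothetical schema: one row per (scene, name) pair, so code holds no literals.
SCHEMA = """
CREATE TABLE IF NOT EXISTS llm_params (
    scene TEXT NOT NULL,
    name  TEXT NOT NULL,
    value TEXT NOT NULL,
    PRIMARY KEY (scene, name)
);
"""


def init_store(conn: sqlite3.Connection) -> None:
    """Create the parameter table; the composite primary key is also the lookup index."""
    conn.executescript(SCHEMA)


def set_param(conn: sqlite3.Connection, scene: str, name: str, value: str) -> None:
    """Insert a parameter, replacing any existing value for the same (scene, name)."""
    conn.execute(
        "INSERT OR REPLACE INTO llm_params (scene, name, value) VALUES (?, ?, ?)",
        (scene, name, value),
    )
    conn.commit()


def get_param(conn: sqlite3.Connection, scene: str, name: str, default=None):
    """Look up a parameter by (scene, name); return default when it is missing."""
    row = conn.execute(
        "SELECT value FROM llm_params WHERE scene = ? AND name = ?",
        (scene, name),
    ).fetchone()
    return row[0] if row else default


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_store(conn)
    set_param(conn, "default", "category_prompt", "Classify this product: {title}")
    template = get_param(conn, "default", "category_prompt")
    print(template.format(title="frying pan"))
    conn.close()
```

With this layout, changing a prompt is a row update rather than a code change, which is the property the anecdote describes.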

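II. Project Evaluation Plan

1. Project one

AI_List_Generate

A/B testing

The dataset required is too large, so this is not practical for a personal project right now; I will schedule it when I have time.

LangSmith observability platform

1) Translation model evaluation: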
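
The evaluation script in this section reads a LangSmith dataset by name, so the dataset has to exist first. The sketch below shows one way it might be populated. The field names follow the evaluation script, but the row values, `ROWS`, `build_examples`, and `upload` are hypothetical, and the real dataset may have been created differently (for example through the LangSmith UI).

```python
# Hypothetical labeled rows: field names follow the evaluation script,
# values are placeholders and not real project data.
ROWS = [
    {
        "site": "US",
        "product_title": "Stainless steel frying pan 28cm",
        "spu_image_url": "https://example.com/pan.jpg",
        "sku_image_url_list": [],
        "ref_step1_category_id": "1001",
        "ref_step1_category_path": "Home > Kitchen > Cookware",
    },
]

INPUT_KEYS = ("site", "product_title", "spu_image_url", "sku_image_url_list")
OUTPUT_KEYS = ("ref_step1_category_id", "ref_step1_category_path")


def build_examples(rows):
    """Split each labeled row into the inputs and reference outputs LangSmith stores."""
    inputs = [{k: row.get(k) for k in INPUT_KEYS} for row in rows]
    outputs = [{k: row.get(k) for k in OUTPUT_KEYS} for row in rows]
    return inputs, outputs


def upload(dataset_name, rows):
    """Create the dataset and upload the examples (needs LangSmith credentials)."""
    # Imported lazily so build_examples works without the SDK installed.
    from langsmith import Client

    client = Client()  # credentials are read from the environment
    dataset = client.create_dataset(dataset_name)
    inputs, outputs = build_examples(rows)
    client.create_examples(inputs=inputs, outputs=outputs, dataset_id=dataset.id)
```

Calling `upload("ai_list_generate_shop_step1_eval", ROWS)` once would create the dataset; running it a second time with the same name is expected to fail because the dataset already exists.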

Evaluation code for the first inference round:

from langsmith import evaluate, Client
from backend.app.database import get_db_instance
from backend.app.services.shop import shop_product_category_step1

client = Client()
dataset_name = "ai_list_generate_shop_step1_eval"


# Pass the dataset fields to the first-level category prediction function, which returns 3 values
def target_step1(inputs: dict) -> dict:
    db = next(get_db_instance())
    try:
        candidates, raw_path, usage = shop_product_category_step1(
            site=inputs.get("site", ""),
            spu_image_url=inputs.get("spu_image_url"),
            sku_image_url_list=inputs.get("sku_image_url_list"),
            product_title=inputs.get("product_title", ""),
            category_name="",
            db_instance=db,
            scene=inputs.get("scene", "default"),
        )
        top1 = candidates[0] if candidates else {"category_id": "DEFAULT", "category_path": "General"}
        return {
            "pred_category_id": str(top1.get("category_id", "DEFAULT")),
            "pred_category_path": top1.get("category_path", "General"),
            "candidate_ids": [str(c.get("category_id", "")) for c in candidates[:3]],
            "raw_step1_category_path": raw_path or "",
        }
    finally:
        db.close()


# Whether the predicted path exactly matches the labeled path
def exact_match(inputs, outputs, reference_outputs):
    pred = (outputs.get("pred_category_path") or "").strip().lower()
    ref = (reference_outputs.get("ref_step1_category_path") or "").strip().lower()
    return pred == ref


# Compare level by level and compute the prefix hit rate
def level_score(inputs, outputs, reference_outputs):
    pred = [x.strip().lower() for x in (outputs.get("pred_category_path") or "").split(">") if x.strip()]
    ref = [x.strip().lower() for x in (reference_outputs.get("ref_step1_category_path") or "").split(">") if x.strip()]
    if not ref:
        return 0.0
    hit = 0
    for p, r in zip(pred, ref):
        if p == r:
            hit += 1
        else:
            break
    return hit / len(ref)


# Whether the ground-truth category ID appears among the top-3 candidates
def top3_hit(inputs, outputs, reference_outputs):
    ref_id = str(reference_outputs.get("ref_step1_category_id") or "").strip()
    candidate_ids = [str(x).strip() for x in (outputs.get("candidate_ids") or [])]
    if not ref_id:
        return False
    return ref_id in candidate_ids


# Whether the raw generated path is non-empty
def raw_path_non_empty(inputs, outputs, reference_outputs):
    return bool((outputs.get("raw_step1_category_path") or "").strip())


if __name__ == "__main__":
    results = evaluate(
        target_step1,
        data=dataset_name,
        evaluators=[exact_match, level_score, top3_hit, raw_path_non_empty],
        experiment_prefix="ai_list_generate_shop_step1_eval_experiment",
        max_concurrency=4,
        blocking=True,
    )
    print(results)
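
To make the prefix hit rate concrete, the sketch below restates the `level_score` logic from the script above in standalone form and applies it to made-up category paths. The paths are illustrative only and not project data, and `split_path` is a hypothetical helper. The function here takes the two path strings directly instead of the LangSmith `outputs` and `reference_outputs` dicts.

```python
def split_path(path):
    """Split 'A > B > C' into normalized levels, dropping empty segments."""
    return [part.strip().lower() for part in (path or "").split(">") if part.strip()]


def level_score(pred_path, ref_path):
    """Fraction of reference levels matched by the prediction, counted from the root."""
    pred, ref = split_path(pred_path), split_path(ref_path)
    if not ref:
        return 0.0
    hit = 0
    for p, r in zip(pred, ref):
        if p != r:
            break  # stop at the first mismatch: only the shared prefix counts
        hit += 1
    return hit / len(ref)


if __name__ == "__main__":
    ref = "Home > Kitchen > Cookware"
    # First two of three levels match.
    print(level_score("Home > Kitchen > Mugs", ref))
    # Mismatch at the root, so later agreement does not count.
    print(level_score("Toys > Kitchen > Cookware", ref))
    # Case and spacing differences are normalized away.
    print(level_score("home>kitchen>cookware", ref))
```

Because scoring stops at the first mismatch, a prediction that is wrong at the top level scores 0 even when its deeper levels happen to agree with the reference.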

Results:
