应用软件 | LLaMA-Factory

xwhuan

应用介绍

LLaMA-Factory是零隙智能（SeamLessAI）开源的低代码大模型训练框架，集成了业界最广泛使用的微调方法和优化技术，支持开发者使用私域数据、基于有限算力完成领域大模型的定制开发。它统一了各种高效微调方法，简化了常用的训练方法，包括生成式预训练、监督微调、强化学习和直接偏好优化。

使用指南

软件路径

/opt/app/LLaMA-Factory

环境变量

 /opt/app/anaconda3/envs/LLaMA-Factory

使用示例

申请资源
```
salloc -p <gpu-partition> --gres=gpu:1
```

加载环境

source /opt/app/anaconda3/bin/activate 
conda activate LLaMA-Factory

运行原始推理 test.py

import transformers
import torch

model_id = "/opt/app/LLaMA-Factory/models/Meta-Llama-3-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

prompt = pipeline.tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
)

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])

推理模式demo

CUDA_VISIBLE_DEVICES=0 llamafactory-cli webchat  --model_name_or_path /opt/app/LLaMA-Factory/models/Meta-Llama-3-8B-Instruct  --template llama3