Merge 97cfb65630 into 5ff02b469f

fix:position error when creating segments (#10706 )
chore: Bump Alpine Linux to 3.20 in web dockerfile (#10671 )
2024-11-16 11:42:29 +08:00 · 2024-11-15 08:51:11 +08:00 · 2024-11-14 21:25:15 +08:00 · 2024-11-14 20:57:01 +08:00 · 2024-11-14 20:54:13 +08:00 · 2024-11-14 20:53:35 +08:00
37 changed files with 900 additions and 49 deletions
--- a/.github/pull_request_template.md
+++ b/.github/pull_request_template.md
@ -1,34 +1,32 @@
-# Checklist:
+# Summary
+
+Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
+
+> [!Tip]
+> Close issue syntax: `Fixes #<issue number>` or `Resolves #<issue number>`, see [documentation](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword) for more details.
+
+
+# Screenshots
+
+<table>
+  <tr>
+  <td>Before: </td>
+  <td>After: </td>
+  </tr>
+  <tr>
+  <td>...</td>
+  <td>...</td>
+  </tr>
+</table>
+
+# Checklist

 > [!IMPORTANT]  
 > Please review the checklist below before submitting your pull request.

- [ ] Please open an issue before creating a PR or link to an existing issue
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I ran `dev/reformat`(backend) and `cd web && npx lint-staged`(frontend) to appease the lint gods
-
-# Description
-
-Describe the big picture of your changes here to communicate to the maintainers why we should accept this pull request. If it fixes a bug or resolves a feature request, be sure to link to that issue. Close issue syntax: `Fixes #<issue number>`, see [documentation](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword) for more details.
-
-Fixes 
-
-## Type of Change
-
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
 - [ ] This change requires a documentation update, included: [Dify Document](https://github.com/langgenius/dify-docs)
- [ ] Improvement, including but not limited to code refactoring, performance optimization, and UI/UX improvement
- [ ] Dependency upgrade
-
-# Testing Instructions
-
-Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
-
- [ ] Test A
- [ ] Test B
-
-
+- [x] I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
+- [x] I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
+- [x] I've updated the documentation accordingly.
+- [x] I ran `dev/reformat`(backend) and `cd web && npx lint-staged`(frontend) to appease the lint gods

--- a/api/Dockerfile
+++ b/api/Dockerfile
@ -55,7 +55,7 @@ RUN apt-get update \
    && echo "deb http://deb.debian.org/debian testing main" > /etc/apt/sources.list \
    && apt-get update \
    # For Security
-    && apt-get install -y --no-install-recommends expat=2.6.3-2 libldap-2.5-0=2.5.18+dfsg-3+b1 perl=5.40.0-7 libsqlite3-0=3.46.1-1 zlib1g=1:1.3.dfsg+really1.3.1-1+b1 \
+    && apt-get install -y --no-install-recommends expat=2.6.4-1 libldap-2.5-0=2.5.18+dfsg-3+b1 perl=5.40.0-7 libsqlite3-0=3.46.1-1 zlib1g=1:1.3.dfsg+really1.3.1-1+b1 \
    # install a chinese font to support the use of tools like matplotlib
    && apt-get install -y fonts-noto-cjk \
    && apt-get autoremove -y \
--- a/api/core/model_runtime/model_providers/siliconflow/llm/Internvl2-26b.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/Internvl2-26b.yaml
@ -0,0 +1,84 @@
+model: OpenGVLab/InternVL2-26B
+label:
+  en_US: OpenGVLab/InternVL2-26B
+model_type: llm
+features:
+  - vision
+model_properties:
+  mode: chat
+  context_size: 32768
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+    type: float
+    default: 0.3
+    min: 0.0
+    max: 2.0
+    help:
+      zh_Hans: 用于控制随机性和多样性的程度。具体来说，temperature值控制了生成文本时对每个候选词的概率分布进行平滑的程度。较高的temperature值会降低概率分布的峰值，使得更多的低概率词被选择，生成结果更加多样化；而较低的temperature值则会增强概率分布的峰值，使得高概率词更容易被选择，生成结果更加确定。
+      en_US: Used to control the degree of randomness and diversity. Specifically, the temperature value controls the degree to which the probability distribution of each candidate word is smoothed when generating text. A higher temperature value will reduce the peak value of the probability distribution, allowing more low-probability words to be selected, and the generated results will be more diverse; while a lower temperature value will enhance the peak value of the probability distribution, making it easier for high-probability words to be selected. , the generated results are more certain.
+  - name: max_tokens
+    use_template: max_tokens
+    type: int
+    default: 2000
+    min: 1
+    max: 2000
+    help:
+      zh_Hans: 用于指定模型在生成内容时token的最大数量，它定义了生成的上限，但不保证每次都会生成到这个数量。
+      en_US: It is used to specify the maximum number of tokens when the model generates content. It defines the upper limit of generation, but does not guarantee that this number will be generated every time.
+  - name: top_p
+    use_template: top_p
+    type: float
+    default: 0.8
+    min: 0.1
+    max: 0.9
+    help:
+      zh_Hans: 生成过程中核采样方法概率阈值，例如，取值为0.8时，仅保留概率加起来大于等于0.8的最可能token的最小集合作为候选集。取值范围为（0,1.0)，取值越大，生成的随机性越高；取值越低，生成的确定性越高。
+      en_US: The probability threshold of the kernel sampling method during the generation process. For example, when the value is 0.8, only the smallest set of the most likely tokens with a sum of probabilities greater than or equal to 0.8 is retained as the candidate set. The value range is (0,1.0). The larger the value, the higher the randomness generated; the lower the value, the higher the certainty generated.
+  - name: top_k
+    type: int
+    min: 0
+    max: 99
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    help:
+      zh_Hans: 生成时，采样候选集的大小。例如，取值为50时，仅将单次生成中得分最高的50个token组成随机采样的候选集。取值越大，生成的随机性越高；取值越小，生成的确定性越高。
+      en_US: The size of the sample candidate set when generated. For example, when the value is 50, only the 50 highest-scoring tokens in a single generation form a randomly sampled candidate set. The larger the value, the higher the randomness generated; the smaller the value, the higher the certainty generated.
+  - name: seed
+    required: false
+    type: int
+    default: 1234
+    label:
+      zh_Hans: 随机种子
+      en_US: Random seed
+    help:
+      zh_Hans: 生成时使用的随机数种子，用户控制模型生成内容的随机性。支持无符号64位整数，默认值为 1234。在使用seed时，模型将尽可能生成相同或相似的结果，但目前不保证每次生成的结果完全相同。
+      en_US: The random number seed used when generating, the user controls the randomness of the content generated by the model. Supports unsigned 64-bit integers, default value is 1234. When using seed, the model will try its best to generate the same or similar results, but there is currently no guarantee that the results will be exactly the same every time.
+  - name: repetition_penalty
+    required: false
+    type: float
+    default: 1.1
+    label:
+      zh_Hans: 重复惩罚
+      en_US: Repetition penalty
+    help:
+      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
+      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: response_format
+    label:
+      zh_Hans: 回复格式
+      en_US: Response Format
+    type: string
+    help:
+      zh_Hans: 指定模型必须输出的格式
+      en_US: specifying the format that the model must output
+    required: false
+    options:
+      - text
+      - json_object
+pricing:
+  input: '21'
+  output: '21'
+  unit: '0.000001'
+  currency: RMB
--- a/api/core/model_runtime/model_providers/siliconflow/llm/Internvl2-8b.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/Internvl2-8b.yaml
@ -0,0 +1,84 @@
+model: Pro/OpenGVLab/InternVL2-8B
+label:
+  en_US: Pro/OpenGVLab/InternVL2-8B
+model_type: llm
+features:
+  - vision
+model_properties:
+  mode: chat
+  context_size: 32768
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+    type: float
+    default: 0.3
+    min: 0.0
+    max: 2.0
+    help:
+      zh_Hans: 用于控制随机性和多样性的程度。具体来说，temperature值控制了生成文本时对每个候选词的概率分布进行平滑的程度。较高的temperature值会降低概率分布的峰值，使得更多的低概率词被选择，生成结果更加多样化；而较低的temperature值则会增强概率分布的峰值，使得高概率词更容易被选择，生成结果更加确定。
+      en_US: Used to control the degree of randomness and diversity. Specifically, the temperature value controls the degree to which the probability distribution of each candidate word is smoothed when generating text. A higher temperature value will reduce the peak value of the probability distribution, allowing more low-probability words to be selected, and the generated results will be more diverse; while a lower temperature value will enhance the peak value of the probability distribution, making it easier for high-probability words to be selected. , the generated results are more certain.
+  - name: max_tokens
+    use_template: max_tokens
+    type: int
+    default: 2000
+    min: 1
+    max: 2000
+    help:
+      zh_Hans: 用于指定模型在生成内容时token的最大数量，它定义了生成的上限，但不保证每次都会生成到这个数量。
+      en_US: It is used to specify the maximum number of tokens when the model generates content. It defines the upper limit of generation, but does not guarantee that this number will be generated every time.
+  - name: top_p
+    use_template: top_p
+    type: float
+    default: 0.8
+    min: 0.1
+    max: 0.9
+    help:
+      zh_Hans: 生成过程中核采样方法概率阈值，例如，取值为0.8时，仅保留概率加起来大于等于0.8的最可能token的最小集合作为候选集。取值范围为（0,1.0)，取值越大，生成的随机性越高；取值越低，生成的确定性越高。
+      en_US: The probability threshold of the kernel sampling method during the generation process. For example, when the value is 0.8, only the smallest set of the most likely tokens with a sum of probabilities greater than or equal to 0.8 is retained as the candidate set. The value range is (0,1.0). The larger the value, the higher the randomness generated; the lower the value, the higher the certainty generated.
+  - name: top_k
+    type: int
+    min: 0
+    max: 99
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    help:
+      zh_Hans: 生成时，采样候选集的大小。例如，取值为50时，仅将单次生成中得分最高的50个token组成随机采样的候选集。取值越大，生成的随机性越高；取值越小，生成的确定性越高。
+      en_US: The size of the sample candidate set when generated. For example, when the value is 50, only the 50 highest-scoring tokens in a single generation form a randomly sampled candidate set. The larger the value, the higher the randomness generated; the smaller the value, the higher the certainty generated.
+  - name: seed
+    required: false
+    type: int
+    default: 1234
+    label:
+      zh_Hans: 随机种子
+      en_US: Random seed
+    help:
+      zh_Hans: 生成时使用的随机数种子，用户控制模型生成内容的随机性。支持无符号64位整数，默认值为 1234。在使用seed时，模型将尽可能生成相同或相似的结果，但目前不保证每次生成的结果完全相同。
+      en_US: The random number seed used when generating, the user controls the randomness of the content generated by the model. Supports unsigned 64-bit integers, default value is 1234. When using seed, the model will try its best to generate the same or similar results, but there is currently no guarantee that the results will be exactly the same every time.
+  - name: repetition_penalty
+    required: false
+    type: float
+    default: 1.1
+    label:
+      zh_Hans: 重复惩罚
+      en_US: Repetition penalty
+    help:
+      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
+      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: response_format
+    label:
+      zh_Hans: 回复格式
+      en_US: Response Format
+    type: string
+    help:
+      zh_Hans: 指定模型必须输出的格式
+      en_US: specifying the format that the model must output
+    required: false
+    options:
+      - text
+      - json_object
+pricing:
+  input: '21'
+  output: '21'
+  unit: '0.000001'
+  currency: RMB
--- a/api/core/model_runtime/model_providers/siliconflow/llm/_position.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/_position.yaml
@ -1,16 +1,18 @@
+- Tencent/Hunyuan-A52B-Instruct
 - Qwen/Qwen2.5-72B-Instruct
 - Qwen/Qwen2.5-32B-Instruct
 - Qwen/Qwen2.5-14B-Instruct
 - Qwen/Qwen2.5-7B-Instruct
+- Qwen/Qwen2.5-Coder-32B-Instruct
 - Qwen/Qwen2.5-Coder-7B-Instruct
 - Qwen/Qwen2.5-Math-72B-Instruct
- Qwen/Qwen2-72B-Instruct
- Qwen/Qwen2-57B-A14B-Instruct
- Qwen/Qwen2-7B-Instruct
+- Qwen/Qwen2-VL-72B-Instruct
 - Qwen/Qwen2-1.5B-Instruct
+- Pro/Qwen/Qwen2-VL-7B-Instruct
+- OpenGVLab/InternVL2-Llama3-76B
+- OpenGVLab/InternVL2-26B
+- Pro/OpenGVLab/InternVL2-8B
 - deepseek-ai/DeepSeek-V2.5
- deepseek-ai/DeepSeek-V2-Chat
- deepseek-ai/DeepSeek-Coder-V2-Instruct
 - THUDM/glm-4-9b-chat
 - 01-ai/Yi-1.5-34B-Chat-16K
 - 01-ai/Yi-1.5-9B-Chat-16K
@ -20,9 +22,6 @@
 - meta-llama/Meta-Llama-3.1-405B-Instruct
 - meta-llama/Meta-Llama-3.1-70B-Instruct
 - meta-llama/Meta-Llama-3.1-8B-Instruct
- meta-llama/Meta-Llama-3-70B-Instruct
- meta-llama/Meta-Llama-3-8B-Instruct
 - google/gemma-2-27b-it
 - google/gemma-2-9b-it
- mistralai/Mistral-7B-Instruct-v0.2
- mistralai/Mixtral-8x7B-Instruct-v0.1
+- deepseek-ai/DeepSeek-V2-Chat
--- a/api/core/model_runtime/model_providers/siliconflow/llm/deepdeek-coder-v2-instruct.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/deepdeek-coder-v2-instruct.yaml
@ -37,3 +37,4 @@ pricing:
  output: '1.33'
  unit: '0.000001'
  currency: RMB
+deprecated: true
--- a/api/core/model_runtime/model_providers/siliconflow/llm/deepseek-v2-chat.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/deepseek-v2-chat.yaml
@ -37,3 +37,4 @@ pricing:
  output: '1.33'
  unit: '0.000001'
  currency: RMB
+deprecated: true
--- a/api/core/model_runtime/model_providers/siliconflow/llm/deepseek-v2.5.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/deepseek-v2.5.yaml
@ -4,6 +4,8 @@ label:
 model_type: llm
 features:
  - agent-thought
+  - tool-call
+  - stream-tool-call
 model_properties:
  mode: chat
  context_size: 32768
--- a/api/core/model_runtime/model_providers/siliconflow/llm/hunyuan-a52b-instruct.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/hunyuan-a52b-instruct.yaml
@ -0,0 +1,84 @@
+model: Tencent/Hunyuan-A52B-Instruct
+label:
+  en_US: Tencent/Hunyuan-A52B-Instruct
+model_type: llm
+features:
+  - agent-thought
+model_properties:
+  mode: chat
+  context_size: 32768
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+    type: float
+    default: 0.3
+    min: 0.0
+    max: 2.0
+    help:
+      zh_Hans: 用于控制随机性和多样性的程度。具体来说，temperature值控制了生成文本时对每个候选词的概率分布进行平滑的程度。较高的temperature值会降低概率分布的峰值，使得更多的低概率词被选择，生成结果更加多样化；而较低的temperature值则会增强概率分布的峰值，使得高概率词更容易被选择，生成结果更加确定。
+      en_US: Used to control the degree of randomness and diversity. Specifically, the temperature value controls the degree to which the probability distribution of each candidate word is smoothed when generating text. A higher temperature value will reduce the peak value of the probability distribution, allowing more low-probability words to be selected, and the generated results will be more diverse; while a lower temperature value will enhance the peak value of the probability distribution, making it easier for high-probability words to be selected. , the generated results are more certain.
+  - name: max_tokens
+    use_template: max_tokens
+    type: int
+    default: 2000
+    min: 1
+    max: 2000
+    help:
+      zh_Hans: 用于指定模型在生成内容时token的最大数量，它定义了生成的上限，但不保证每次都会生成到这个数量。
+      en_US: It is used to specify the maximum number of tokens when the model generates content. It defines the upper limit of generation, but does not guarantee that this number will be generated every time.
+  - name: top_p
+    use_template: top_p
+    type: float
+    default: 0.8
+    min: 0.1
+    max: 0.9
+    help:
+      zh_Hans: 生成过程中核采样方法概率阈值，例如，取值为0.8时，仅保留概率加起来大于等于0.8的最可能token的最小集合作为候选集。取值范围为（0,1.0)，取值越大，生成的随机性越高；取值越低，生成的确定性越高。
+      en_US: The probability threshold of the kernel sampling method during the generation process. For example, when the value is 0.8, only the smallest set of the most likely tokens with a sum of probabilities greater than or equal to 0.8 is retained as the candidate set. The value range is (0,1.0). The larger the value, the higher the randomness generated; the lower the value, the higher the certainty generated.
+  - name: top_k
+    type: int
+    min: 0
+    max: 99
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    help:
+      zh_Hans: 生成时，采样候选集的大小。例如，取值为50时，仅将单次生成中得分最高的50个token组成随机采样的候选集。取值越大，生成的随机性越高；取值越小，生成的确定性越高。
+      en_US: The size of the sample candidate set when generated. For example, when the value is 50, only the 50 highest-scoring tokens in a single generation form a randomly sampled candidate set. The larger the value, the higher the randomness generated; the smaller the value, the higher the certainty generated.
+  - name: seed
+    required: false
+    type: int
+    default: 1234
+    label:
+      zh_Hans: 随机种子
+      en_US: Random seed
+    help:
+      zh_Hans: 生成时使用的随机数种子，用户控制模型生成内容的随机性。支持无符号64位整数，默认值为 1234。在使用seed时，模型将尽可能生成相同或相似的结果，但目前不保证每次生成的结果完全相同。
+      en_US: The random number seed used when generating, the user controls the randomness of the content generated by the model. Supports unsigned 64-bit integers, default value is 1234. When using seed, the model will try its best to generate the same or similar results, but there is currently no guarantee that the results will be exactly the same every time.
+  - name: repetition_penalty
+    required: false
+    type: float
+    default: 1.1
+    label:
+      zh_Hans: 重复惩罚
+      en_US: Repetition penalty
+    help:
+      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
+      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: response_format
+    label:
+      zh_Hans: 回复格式
+      en_US: Response Format
+    type: string
+    help:
+      zh_Hans: 指定模型必须输出的格式
+      en_US: specifying the format that the model must output
+    required: false
+    options:
+      - text
+      - json_object
+pricing:
+  input: '21'
+  output: '21'
+  unit: '0.000001'
+  currency: RMB
--- a/api/core/model_runtime/model_providers/siliconflow/llm/internvl2-llama3-76b.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/internvl2-llama3-76b.yaml
@ -0,0 +1,84 @@
+model: OpenGVLab/InternVL2-Llama3-76B
+label:
+  en_US: OpenGVLab/InternVL2-Llama3-76B
+model_type: llm
+features:
+  - vision
+model_properties:
+  mode: chat
+  context_size: 8192
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+    type: float
+    default: 0.3
+    min: 0.0
+    max: 2.0
+    help:
+      zh_Hans: 用于控制随机性和多样性的程度。具体来说，temperature值控制了生成文本时对每个候选词的概率分布进行平滑的程度。较高的temperature值会降低概率分布的峰值，使得更多的低概率词被选择，生成结果更加多样化；而较低的temperature值则会增强概率分布的峰值，使得高概率词更容易被选择，生成结果更加确定。
+      en_US: Used to control the degree of randomness and diversity. Specifically, the temperature value controls the degree to which the probability distribution of each candidate word is smoothed when generating text. A higher temperature value will reduce the peak value of the probability distribution, allowing more low-probability words to be selected, and the generated results will be more diverse; while a lower temperature value will enhance the peak value of the probability distribution, making it easier for high-probability words to be selected. , the generated results are more certain.
+  - name: max_tokens
+    use_template: max_tokens
+    type: int
+    default: 2000
+    min: 1
+    max: 2000
+    help:
+      zh_Hans: 用于指定模型在生成内容时token的最大数量，它定义了生成的上限，但不保证每次都会生成到这个数量。
+      en_US: It is used to specify the maximum number of tokens when the model generates content. It defines the upper limit of generation, but does not guarantee that this number will be generated every time.
+  - name: top_p
+    use_template: top_p
+    type: float
+    default: 0.8
+    min: 0.1
+    max: 0.9
+    help:
+      zh_Hans: 生成过程中核采样方法概率阈值，例如，取值为0.8时，仅保留概率加起来大于等于0.8的最可能token的最小集合作为候选集。取值范围为（0,1.0)，取值越大，生成的随机性越高；取值越低，生成的确定性越高。
+      en_US: The probability threshold of the kernel sampling method during the generation process. For example, when the value is 0.8, only the smallest set of the most likely tokens with a sum of probabilities greater than or equal to 0.8 is retained as the candidate set. The value range is (0,1.0). The larger the value, the higher the randomness generated; the lower the value, the higher the certainty generated.
+  - name: top_k
+    type: int
+    min: 0
+    max: 99
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    help:
+      zh_Hans: 生成时，采样候选集的大小。例如，取值为50时，仅将单次生成中得分最高的50个token组成随机采样的候选集。取值越大，生成的随机性越高；取值越小，生成的确定性越高。
+      en_US: The size of the sample candidate set when generated. For example, when the value is 50, only the 50 highest-scoring tokens in a single generation form a randomly sampled candidate set. The larger the value, the higher the randomness generated; the smaller the value, the higher the certainty generated.
+  - name: seed
+    required: false
+    type: int
+    default: 1234
+    label:
+      zh_Hans: 随机种子
+      en_US: Random seed
+    help:
+      zh_Hans: 生成时使用的随机数种子，用户控制模型生成内容的随机性。支持无符号64位整数，默认值为 1234。在使用seed时，模型将尽可能生成相同或相似的结果，但目前不保证每次生成的结果完全相同。
+      en_US: The random number seed used when generating, the user controls the randomness of the content generated by the model. Supports unsigned 64-bit integers, default value is 1234. When using seed, the model will try its best to generate the same or similar results, but there is currently no guarantee that the results will be exactly the same every time.
+  - name: repetition_penalty
+    required: false
+    type: float
+    default: 1.1
+    label:
+      zh_Hans: 重复惩罚
+      en_US: Repetition penalty
+    help:
+      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
+      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: response_format
+    label:
+      zh_Hans: 回复格式
+      en_US: Response Format
+    type: string
+    help:
+      zh_Hans: 指定模型必须输出的格式
+      en_US: specifying the format that the model must output
+    required: false
+    options:
+      - text
+      - json_object
+pricing:
+  input: '21'
+  output: '21'
+  unit: '0.000001'
+  currency: RMB
--- a/api/core/model_runtime/model_providers/siliconflow/llm/meta-mlama-3-70b-instruct.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/meta-mlama-3-70b-instruct.yaml
@ -37,3 +37,4 @@ pricing:
  output: '4.13'
  unit: '0.000001'
  currency: RMB
+deprecated: true
--- a/api/core/model_runtime/model_providers/siliconflow/llm/meta-mlama-3-8b-instruct.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/meta-mlama-3-8b-instruct.yaml
@ -37,3 +37,4 @@ pricing:
  output: '0'
  unit: '0.000001'
  currency: RMB
+deprecated: true
--- a/api/core/model_runtime/model_providers/siliconflow/llm/meta-mlama-3.1-70b-instruct.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/meta-mlama-3.1-70b-instruct.yaml
@ -6,7 +6,7 @@ features:
  - agent-thought
 model_properties:
  mode: chat
-  context_size: 32768
+  context_size: 8192
 parameter_rules:
  - name: temperature
    use_template: temperature
--- a/api/core/model_runtime/model_providers/siliconflow/llm/qwen2-57b-a14b-instruct.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/qwen2-57b-a14b-instruct.yaml
@ -37,3 +37,4 @@ pricing:
  output: '1.26'
  unit: '0.000001'
  currency: RMB
+deprecated: true
--- a/api/core/model_runtime/model_providers/siliconflow/llm/qwen2-72b-instruct.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/qwen2-72b-instruct.yaml
@ -37,3 +37,4 @@ pricing:
  output: '4.13'
  unit: '0.000001'
  currency: RMB
+deprecated: true
--- a/api/core/model_runtime/model_providers/siliconflow/llm/qwen2-7b-instruct.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/qwen2-7b-instruct.yaml
@ -37,3 +37,4 @@ pricing:
  output: '0'
  unit: '0.000001'
  currency: RMB
+deprecated: true
--- a/api/core/model_runtime/model_providers/siliconflow/llm/qwen2-vl-72b-instruct.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/qwen2-vl-72b-instruct.yaml
@ -0,0 +1,84 @@
+model: Qwen/Qwen2-VL-72B-Instruct
+label:
+  en_US: Qwen/Qwen2-VL-72B-Instruct
+model_type: llm
+features:
+  - vision
+model_properties:
+  mode: chat
+  context_size: 32768
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+    type: float
+    default: 0.3
+    min: 0.0
+    max: 2.0
+    help:
+      zh_Hans: 用于控制随机性和多样性的程度。具体来说，temperature值控制了生成文本时对每个候选词的概率分布进行平滑的程度。较高的temperature值会降低概率分布的峰值，使得更多的低概率词被选择，生成结果更加多样化；而较低的temperature值则会增强概率分布的峰值，使得高概率词更容易被选择，生成结果更加确定。
+      en_US: Used to control the degree of randomness and diversity. Specifically, the temperature value controls the degree to which the probability distribution of each candidate word is smoothed when generating text. A higher temperature value will reduce the peak value of the probability distribution, allowing more low-probability words to be selected, and the generated results will be more diverse; while a lower temperature value will enhance the peak value of the probability distribution, making it easier for high-probability words to be selected. , the generated results are more certain.
+  - name: max_tokens
+    use_template: max_tokens
+    type: int
+    default: 2000
+    min: 1
+    max: 2000
+    help:
+      zh_Hans: 用于指定模型在生成内容时token的最大数量，它定义了生成的上限，但不保证每次都会生成到这个数量。
+      en_US: It is used to specify the maximum number of tokens when the model generates content. It defines the upper limit of generation, but does not guarantee that this number will be generated every time.
+  - name: top_p
+    use_template: top_p
+    type: float
+    default: 0.8
+    min: 0.1
+    max: 0.9
+    help:
+      zh_Hans: 生成过程中核采样方法概率阈值，例如，取值为0.8时，仅保留概率加起来大于等于0.8的最可能token的最小集合作为候选集。取值范围为（0,1.0)，取值越大，生成的随机性越高；取值越低，生成的确定性越高。
+      en_US: The probability threshold of the kernel sampling method during the generation process. For example, when the value is 0.8, only the smallest set of the most likely tokens with a sum of probabilities greater than or equal to 0.8 is retained as the candidate set. The value range is (0,1.0). The larger the value, the higher the randomness generated; the lower the value, the higher the certainty generated.
+  - name: top_k
+    type: int
+    min: 0
+    max: 99
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    help:
+      zh_Hans: 生成时，采样候选集的大小。例如，取值为50时，仅将单次生成中得分最高的50个token组成随机采样的候选集。取值越大，生成的随机性越高；取值越小，生成的确定性越高。
+      en_US: The size of the sample candidate set when generated. For example, when the value is 50, only the 50 highest-scoring tokens in a single generation form a randomly sampled candidate set. The larger the value, the higher the randomness generated; the smaller the value, the higher the certainty generated.
+  - name: seed
+    required: false
+    type: int
+    default: 1234
+    label:
+      zh_Hans: 随机种子
+      en_US: Random seed
+    help:
+      zh_Hans: 生成时使用的随机数种子，用户控制模型生成内容的随机性。支持无符号64位整数，默认值为 1234。在使用seed时，模型将尽可能生成相同或相似的结果，但目前不保证每次生成的结果完全相同。
+      en_US: The random number seed used when generating, the user controls the randomness of the content generated by the model. Supports unsigned 64-bit integers, default value is 1234. When using seed, the model will try its best to generate the same or similar results, but there is currently no guarantee that the results will be exactly the same every time.
+  - name: repetition_penalty
+    required: false
+    type: float
+    default: 1.1
+    label:
+      zh_Hans: 重复惩罚
+      en_US: Repetition penalty
+    help:
+      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
+      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: response_format
+    label:
+      zh_Hans: 回复格式
+      en_US: Response Format
+    type: string
+    help:
+      zh_Hans: 指定模型必须输出的格式
+      en_US: specifying the format that the model must output
+    required: false
+    options:
+      - text
+      - json_object
+pricing:
+  input: '21'
+  output: '21'
+  unit: '0.000001'
+  currency: RMB
--- a/api/core/model_runtime/model_providers/siliconflow/llm/qwen2-vl-7b-Instruct.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/qwen2-vl-7b-Instruct.yaml
@ -0,0 +1,84 @@
+model: Pro/Qwen/Qwen2-VL-7B-Instruct
+label:
+  en_US: Pro/Qwen/Qwen2-VL-7B-Instruct
+model_type: llm
+features:
+  - vision
+model_properties:
+  mode: chat
+  context_size: 32768
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+    type: float
+    default: 0.3
+    min: 0.0
+    max: 2.0
+    help:
+      zh_Hans: 用于控制随机性和多样性的程度。具体来说，temperature值控制了生成文本时对每个候选词的概率分布进行平滑的程度。较高的temperature值会降低概率分布的峰值，使得更多的低概率词被选择，生成结果更加多样化；而较低的temperature值则会增强概率分布的峰值，使得高概率词更容易被选择，生成结果更加确定。
+      en_US: Used to control the degree of randomness and diversity. Specifically, the temperature value controls the degree to which the probability distribution of each candidate word is smoothed when generating text. A higher temperature value will reduce the peak value of the probability distribution, allowing more low-probability words to be selected, and the generated results will be more diverse; while a lower temperature value will enhance the peak value of the probability distribution, making it easier for high-probability words to be selected. , the generated results are more certain.
+  - name: max_tokens
+    use_template: max_tokens
+    type: int
+    default: 2000
+    min: 1
+    max: 2000
+    help:
+      zh_Hans: 用于指定模型在生成内容时token的最大数量，它定义了生成的上限，但不保证每次都会生成到这个数量。
+      en_US: It is used to specify the maximum number of tokens when the model generates content. It defines the upper limit of generation, but does not guarantee that this number will be generated every time.
+  - name: top_p
+    use_template: top_p
+    type: float
+    default: 0.8
+    min: 0.1
+    max: 0.9
+    help:
+      zh_Hans: 生成过程中核采样方法概率阈值，例如，取值为0.8时，仅保留概率加起来大于等于0.8的最可能token的最小集合作为候选集。取值范围为（0,1.0)，取值越大，生成的随机性越高；取值越低，生成的确定性越高。
+      en_US: The probability threshold of the kernel sampling method during the generation process. For example, when the value is 0.8, only the smallest set of the most likely tokens with a sum of probabilities greater than or equal to 0.8 is retained as the candidate set. The value range is (0,1.0). The larger the value, the higher the randomness generated; the lower the value, the higher the certainty generated.
+  - name: top_k
+    type: int
+    min: 0
+    max: 99
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    help:
+      zh_Hans: 生成时，采样候选集的大小。例如，取值为50时，仅将单次生成中得分最高的50个token组成随机采样的候选集。取值越大，生成的随机性越高；取值越小，生成的确定性越高。
+      en_US: The size of the sample candidate set when generated. For example, when the value is 50, only the 50 highest-scoring tokens in a single generation form a randomly sampled candidate set. The larger the value, the higher the randomness generated; the smaller the value, the higher the certainty generated.
+  - name: seed
+    required: false
+    type: int
+    default: 1234
+    label:
+      zh_Hans: 随机种子
+      en_US: Random seed
+    help:
+      zh_Hans: 生成时使用的随机数种子，用户控制模型生成内容的随机性。支持无符号64位整数，默认值为 1234。在使用seed时，模型将尽可能生成相同或相似的结果，但目前不保证每次生成的结果完全相同。
+      en_US: The random number seed used when generating, the user controls the randomness of the content generated by the model. Supports unsigned 64-bit integers, default value is 1234. When using seed, the model will try its best to generate the same or similar results, but there is currently no guarantee that the results will be exactly the same every time.
+  - name: repetition_penalty
+    required: false
+    type: float
+    default: 1.1
+    label:
+      zh_Hans: 重复惩罚
+      en_US: Repetition penalty
+    help:
+      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
+      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: response_format
+    label:
+      zh_Hans: 回复格式
+      en_US: Response Format
+    type: string
+    help:
+      zh_Hans: 指定模型必须输出的格式
+      en_US: specifying the format that the model must output
+    required: false
+    options:
+      - text
+      - json_object
+pricing:
+  input: '21'
+  output: '21'
+  unit: '0.000001'
+  currency: RMB
--- a/api/core/model_runtime/model_providers/siliconflow/llm/qwen2.5-coder-32b-instruct.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/llm/qwen2.5-coder-32b-instruct.yaml
@ -0,0 +1,84 @@
+model: Qwen/Qwen2.5-Coder-32B-Instruct
+label:
+  en_US: Qwen/Qwen2.5-Coder-32B-Instruct
+model_type: llm
+features:
+  - agent-thought
+model_properties:
+  mode: chat
+  context_size: 32768
+parameter_rules:
+  - name: temperature
+    use_template: temperature
+    type: float
+    default: 0.3
+    min: 0.0
+    max: 2.0
+    help:
+      zh_Hans: 用于控制随机性和多样性的程度。具体来说，temperature值控制了生成文本时对每个候选词的概率分布进行平滑的程度。较高的temperature值会降低概率分布的峰值，使得更多的低概率词被选择，生成结果更加多样化；而较低的temperature值则会增强概率分布的峰值，使得高概率词更容易被选择，生成结果更加确定。
+      en_US: Used to control the degree of randomness and diversity. Specifically, the temperature value controls the degree to which the probability distribution of each candidate word is smoothed when generating text. A higher temperature value will reduce the peak value of the probability distribution, allowing more low-probability words to be selected, and the generated results will be more diverse; while a lower temperature value will enhance the peak value of the probability distribution, making it easier for high-probability words to be selected. , the generated results are more certain.
+  - name: max_tokens
+    use_template: max_tokens
+    type: int
+    default: 8192
+    min: 1
+    max: 8192
+    help:
+      zh_Hans: 用于指定模型在生成内容时token的最大数量，它定义了生成的上限，但不保证每次都会生成到这个数量。
+      en_US: It is used to specify the maximum number of tokens when the model generates content. It defines the upper limit of generation, but does not guarantee that this number will be generated every time.
+  - name: top_p
+    use_template: top_p
+    type: float
+    default: 0.8
+    min: 0.1
+    max: 0.9
+    help:
+      zh_Hans: 生成过程中核采样方法概率阈值，例如，取值为0.8时，仅保留概率加起来大于等于0.8的最可能token的最小集合作为候选集。取值范围为（0,1.0)，取值越大，生成的随机性越高；取值越低，生成的确定性越高。
+      en_US: The probability threshold of the kernel sampling method during the generation process. For example, when the value is 0.8, only the smallest set of the most likely tokens with a sum of probabilities greater than or equal to 0.8 is retained as the candidate set. The value range is (0,1.0). The larger the value, the higher the randomness generated; the lower the value, the higher the certainty generated.
+  - name: top_k
+    type: int
+    min: 0
+    max: 99
+    label:
+      zh_Hans: 取样数量
+      en_US: Top k
+    help:
+      zh_Hans: 生成时，采样候选集的大小。例如，取值为50时，仅将单次生成中得分最高的50个token组成随机采样的候选集。取值越大，生成的随机性越高；取值越小，生成的确定性越高。
+      en_US: The size of the sample candidate set when generated. For example, when the value is 50, only the 50 highest-scoring tokens in a single generation form a randomly sampled candidate set. The larger the value, the higher the randomness generated; the smaller the value, the higher the certainty generated.
+  - name: seed
+    required: false
+    type: int
+    default: 1234
+    label:
+      zh_Hans: 随机种子
+      en_US: Random seed
+    help:
+      zh_Hans: 生成时使用的随机数种子，用户控制模型生成内容的随机性。支持无符号64位整数，默认值为 1234。在使用seed时，模型将尽可能生成相同或相似的结果，但目前不保证每次生成的结果完全相同。
+      en_US: The random number seed used when generating, the user controls the randomness of the content generated by the model. Supports unsigned 64-bit integers, default value is 1234. When using seed, the model will try its best to generate the same or similar results, but there is currently no guarantee that the results will be exactly the same every time.
+  - name: repetition_penalty
+    required: false
+    type: float
+    default: 1.1
+    label:
+      zh_Hans: 重复惩罚
+      en_US: Repetition penalty
+    help:
+      zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
+      en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
+  - name: response_format
+    label:
+      zh_Hans: 回复格式
+      en_US: Response Format
+    type: string
+    help:
+      zh_Hans: 指定模型必须输出的格式
+      en_US: specifying the format that the model must output
+    required: false
+    options:
+      - text
+      - json_object
+pricing:
+  input: '1.26'
+  output: '1.26'
+  unit: '0.000001'
+  currency: RMB
--- a/api/core/model_runtime/model_providers/siliconflow/speech2text/funaudio-sense-voice-small.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/speech2text/funaudio-sense-voice-small.yaml
@ -0,0 +1,5 @@
+model: FunAudioLLM/SenseVoiceSmall
+model_type: speech2text
+model_properties:
+  file_upload_limit: 1
+  supported_file_extensions: mp3,wav
--- a/api/core/model_runtime/model_providers/siliconflow/speech2text/sense-voice-small.yaml
+++ b/api/core/model_runtime/model_providers/siliconflow/speech2text/sense-voice-small.yaml
@ -3,3 +3,4 @@ model_type: speech2text
 model_properties:
  file_upload_limit: 1
  supported_file_extensions: mp3,wav
+deprecated: true
--- a/api/core/tools/provider/builtin/audio/_assets/icon.svg
+++ b/api/core/tools/provider/builtin/audio/_assets/icon.svg
@ -0,0 +1,3 @@
+<svg xmlns="http://www.w3.org/2000/svg" width="200" height="200" viewBox="0 0 200 200" fill="none">
+<path d="M167.358 102.395C167.358 117.174 157.246 129.18 144.61 131.027H137.861C125.225 129.18 115.113 117.174 115.113 102.395H100.792C100.792 123.637 115.118 142.106 133.653 145.801V164.276H147.139V145.801C165.674 142.106 180 124.558 180 102.4H167.358V102.395ZM154.717 62.677C154.717 53.4397 147.979 46.9765 140.396 46.9765C138.523 46.9446 136.663 47.3273 134.924 48.1024C133.185 48.8775 131.603 50.0294 130.27 51.4909C128.936 52.9524 127.878 54.6943 127.157 56.6148C126.436 58.5354 126.066 60.5962 126.07 62.677V78.3775H154.717V70.4478V62.677ZM126.07 102.395C126.07 111.632 132.813 118.095 140.396 118.095C142.269 118.127 144.13 117.744 145.868 116.969C147.607 116.194 149.189 115.042 150.523 113.581C151.856 112.119 152.914 110.377 153.635 108.457C154.356 106.536 154.726 104.475 154.722 102.395V86.694H126.07V102.395ZM92.1297 45.8938L70.4796 21.7595L69.4235 20.5865L59.604 20L68.3674 20.5865L67.3113 21.7654L64.1429 25.2961L63.6149 25.8826L64.1429 27.0614L66.2552 29.4133L77.8723 42.3631H54.1099C35.1 43.5361 20.3146 61.1896 20.3146 81.7874V83.5527H28.2354V81.7932C28.2354 65.8992 39.8525 52.3628 54.1099 51.1899H77.8723L66.2552 64.1338L64.671 65.8992L64.1429 67.0722L63.6149 67.6645L64.1429 68.251L68.3674 72.9606L68.8954 73.5471L69.4235 72.9606L74.1759 67.6645L92.1297 47.6591L92.6578 47.0727L92.1297 45.8938ZM20 95.8496V118.213H30.033V107.034H50.099V168.821H40.066V180H70.165V168.821H60.132V107.034H80.198V118.213H90.231V95.8496H20Z" fill="#FF0099"/>
+</svg>
--- a/api/core/tools/provider/builtin/audio/audio.py
+++ b/api/core/tools/provider/builtin/audio/audio.py
@ -0,0 +1,6 @@
+from core.tools.provider.builtin_tool_provider import BuiltinToolProviderController
+
+
+class AudioToolProvider(BuiltinToolProviderController):
+    def _validate_credentials(self, credentials: dict) -> None:
+        pass
--- a/api/core/tools/provider/builtin/audio/audio.yaml
+++ b/api/core/tools/provider/builtin/audio/audio.yaml
@ -0,0 +1,11 @@
+identity:
+  author: hjlarry
+  name: audio
+  label:
+    en_US: Audio
+  description:
+    en_US: A tool for tts and asr.
+    zh_Hans: 一个用于文本转语音和语音转文本的工具。
+  icon: icon.svg
+  tags:
+    - utilities
--- a/api/core/tools/provider/builtin/audio/tools/asr.py
+++ b/api/core/tools/provider/builtin/audio/tools/asr.py
@ -0,0 +1,70 @@
+import io
+from typing import Any
+
+from core.file.enums import FileType
+from core.file.file_manager import download
+from core.model_manager import ModelManager
+from core.model_runtime.entities.model_entities import ModelType
+from core.tools.entities.common_entities import I18nObject
+from core.tools.entities.tool_entities import ToolInvokeMessage, ToolParameter, ToolParameterOption
+from core.tools.tool.builtin_tool import BuiltinTool
+from services.model_provider_service import ModelProviderService
+
+
+class ASRTool(BuiltinTool):
+    def _invoke(self, user_id: str, tool_parameters: dict[str, Any]) -> list[ToolInvokeMessage]:
+        file = tool_parameters.get("audio_file")
+        if file.type != FileType.AUDIO:
+            return [self.create_text_message("not a valid audio file")]
+        audio_binary = io.BytesIO(download(file))
+        audio_binary.name = "temp.mp3"
+        provider, model = tool_parameters.get("model").split("#")
+        model_manager = ModelManager()
+        model_instance = model_manager.get_model_instance(
+            tenant_id=self.runtime.tenant_id,
+            provider=provider,
+            model_type=ModelType.SPEECH2TEXT,
+            model=model,
+        )
+        text = model_instance.invoke_speech2text(
+            file=audio_binary,
+            user=user_id,
+        )
+        return [self.create_text_message(text)]
+
+    def get_available_models(self) -> list[tuple[str, str]]:
+        model_provider_service = ModelProviderService()
+        models = model_provider_service.get_models_by_model_type(
+            tenant_id=self.runtime.tenant_id, model_type="speech2text"
+        )
+        items = []
+        for provider_model in models:
+            provider = provider_model.provider
+            for model in provider_model.models:
+                items.append((provider, model.model))
+        return items
+
+    def get_runtime_parameters(self) -> list[ToolParameter]:
+        parameters = []
+
+        options = []
+        for provider, model in self.get_available_models():
+            option = ToolParameterOption(value=f"{provider}#{model}", label=I18nObject(en_US=f"{model}({provider})"))
+            options.append(option)
+
+        parameters.append(
+            ToolParameter(
+                name="model",
+                label=I18nObject(en_US="Model", zh_Hans="Model"),
+                human_description=I18nObject(
+                    en_US="All available ASR models",
+                    zh_Hans="所有可用的 ASR 模型",
+                ),
+                type=ToolParameter.ToolParameterType.SELECT,
+                form=ToolParameter.ToolParameterForm.FORM,
+                required=True,
+                default=options[0].value,
+                options=options,
+            )
+        )
+        return parameters
--- a/api/core/tools/provider/builtin/audio/tools/asr.yaml
+++ b/api/core/tools/provider/builtin/audio/tools/asr.yaml
@ -0,0 +1,22 @@
+identity:
+  name: asr
+  author: hjlarry
+  label:
+    en_US: Speech To Text
+description:
+  human:
+    en_US: Convert audio file to text.
+    zh_Hans: 将音频文件转换为文本。
+  llm: Convert audio file to text.
+parameters:
+  - name: audio_file
+    type: file
+    required: true
+    label:
+      en_US: Audio File
+      zh_Hans: 音频文件
+    human_description:
+      en_US: The audio file to be converted.
+      zh_Hans: 要转换的音频文件。
+    llm_description: The audio file to be converted.
+    form: llm
--- a/api/core/tools/provider/builtin/audio/tools/tts.py
+++ b/api/core/tools/provider/builtin/audio/tools/tts.py
@ -0,0 +1,90 @@
+import io
+from typing import Any
+
+from core.model_manager import ModelManager
+from core.model_runtime.entities.model_entities import ModelPropertyKey, ModelType
+from core.tools.entities.common_entities import I18nObject
+from core.tools.entities.tool_entities import ToolInvokeMessage, ToolParameter, ToolParameterOption
+from core.tools.tool.builtin_tool import BuiltinTool
+from services.model_provider_service import ModelProviderService
+
+
+class TTSTool(BuiltinTool):
+    def _invoke(self, user_id: str, tool_parameters: dict[str, Any]) -> list[ToolInvokeMessage]:
+        provider, model = tool_parameters.get("model").split("#")
+        voice = tool_parameters.get(f"voice#{provider}#{model}")
+        model_manager = ModelManager()
+        model_instance = model_manager.get_model_instance(
+            tenant_id=self.runtime.tenant_id,
+            provider=provider,
+            model_type=ModelType.TTS,
+            model=model,
+        )
+        tts = model_instance.invoke_tts(
+            content_text=tool_parameters.get("text"),
+            user=user_id,
+            tenant_id=self.runtime.tenant_id,
+            voice=voice,
+        )
+        buffer = io.BytesIO()
+        for chunk in tts:
+            buffer.write(chunk)
+
+        wav_bytes = buffer.getvalue()
+        return [
+            self.create_text_message("Audio generated successfully"),
+            self.create_blob_message(
+                blob=wav_bytes,
+                meta={"mime_type": "audio/x-wav"},
+                save_as=self.VariableKey.AUDIO,
+            ),
+        ]
+
+    def get_available_models(self) -> list[tuple[str, str, list[Any]]]:
+        model_provider_service = ModelProviderService()
+        models = model_provider_service.get_models_by_model_type(tenant_id=self.runtime.tenant_id, model_type="tts")
+        items = []
+        for provider_model in models:
+            provider = provider_model.provider
+            for model in provider_model.models:
+                voices = model.model_properties.get(ModelPropertyKey.VOICES, [])
+                items.append((provider, model.model, voices))
+        return items
+
+    def get_runtime_parameters(self) -> list[ToolParameter]:
+        parameters = []
+
+        options = []
+        for provider, model, voices in self.get_available_models():
+            option = ToolParameterOption(value=f"{provider}#{model}", label=I18nObject(en_US=f"{model}({provider})"))
+            options.append(option)
+            parameters.append(
+                ToolParameter(
+                    name=f"voice#{provider}#{model}",
+                    label=I18nObject(en_US=f"Voice of {model}({provider})"),
+                    type=ToolParameter.ToolParameterType.SELECT,
+                    form=ToolParameter.ToolParameterForm.FORM,
+                    options=[
+                        ToolParameterOption(value=voice.get("mode"), label=I18nObject(en_US=voice.get("name")))
+                        for voice in voices
+                    ],
+                )
+            )
+
+        parameters.insert(
+            0,
+            ToolParameter(
+                name="model",
+                label=I18nObject(en_US="Model", zh_Hans="Model"),
+                human_description=I18nObject(
+                    en_US="All available TTS models",
+                    zh_Hans="所有可用的 TTS 模型",
+                ),
+                type=ToolParameter.ToolParameterType.SELECT,
+                form=ToolParameter.ToolParameterForm.FORM,
+                required=True,
+                default=options[0].value,
+                options=options,
+            ),
+        )
+        return parameters
--- a/api/core/tools/provider/builtin/audio/tools/tts.yaml
+++ b/api/core/tools/provider/builtin/audio/tools/tts.yaml
@ -0,0 +1,22 @@
+identity:
+  name: tts
+  author: hjlarry
+  label:
+    en_US: Text To Speech
+description:
+  human:
+    en_US: Convert text to audio file.
+    zh_Hans: 将文本转换为音频文件。
+  llm: Convert text to audio file.
+parameters:
+  - name: text
+    type: string
+    required: true
+    label:
+      en_US: Text
+      zh_Hans: 文本
+    human_description:
+      en_US: The text to be converted.
+      zh_Hans: 要转换的文本。
+    llm_description: The text to be converted.
+    form: llm
--- a/api/core/tools/provider/builtin/podcast_generator/podcast_generator.py
+++ b/api/core/tools/provider/builtin/podcast_generator/podcast_generator.py
@ -1,6 +1,7 @@
 from typing import Any

 import openai
+from yarl import URL

 from core.tools.errors import ToolProviderCredentialValidationError
 from core.tools.provider.builtin_tool_provider import BuiltinToolProviderController
@ -10,6 +11,7 @@ class PodcastGeneratorProvider(BuiltinToolProviderController):
    def _validate_credentials(self, credentials: dict[str, Any]) -> None:
        tts_service = credentials.get("tts_service")
        api_key = credentials.get("api_key")
+        base_url = credentials.get("openai_base_url")

        if not tts_service:
            raise ToolProviderCredentialValidationError("TTS service is not specified")
@ -17,13 +19,16 @@ class PodcastGeneratorProvider(BuiltinToolProviderController):
        if not api_key:
            raise ToolProviderCredentialValidationError("API key is missing")

+        if base_url:
+            base_url = str(URL(base_url) / "v1")
+
        if tts_service == "openai":
-            self._validate_openai_credentials(api_key)
+            self._validate_openai_credentials(api_key, base_url)
        else:
            raise ToolProviderCredentialValidationError(f"Unsupported TTS service: {tts_service}")

-    def _validate_openai_credentials(self, api_key: str) -> None:
-        client = openai.OpenAI(api_key=api_key)
+    def _validate_openai_credentials(self, api_key: str, base_url: str | None) -> None:
+        client = openai.OpenAI(api_key=api_key, base_url=base_url)
        try:
            # We're using a simple API call to validate the credentials
            client.models.list()
--- a/api/core/workflow/nodes/document_extractor/node.py
+++ b/api/core/workflow/nodes/document_extractor/node.py
@ -143,14 +143,14 @@ def _extract_text_by_file_extension(*, file_content: bytes, file_extension: str)

 def _extract_text_from_plain_text(file_content: bytes) -> str:
    try:
-        return file_content.decode("utf-8")
+        return file_content.decode("utf-8", "ignore")
    except UnicodeDecodeError as e:
        raise TextExtractionError("Failed to decode plain text file") from e


 def _extract_text_from_json(file_content: bytes) -> str:
    try:
-        json_data = json.loads(file_content.decode("utf-8"))
+        json_data = json.loads(file_content.decode("utf-8", "ignore"))
        return json.dumps(json_data, indent=2, ensure_ascii=False)
    except (UnicodeDecodeError, json.JSONDecodeError) as e:
        raise TextExtractionError(f"Failed to decode or parse JSON file: {e}") from e
@ -159,7 +159,7 @@ def _extract_text_from_json(file_content: bytes) -> str:
 def _extract_text_from_yaml(file_content: bytes) -> str:
    """Extract the content from yaml file"""
    try:
-        yaml_data = yaml.safe_load_all(file_content.decode("utf-8"))
+        yaml_data = yaml.safe_load_all(file_content.decode("utf-8", "ignore"))
        return yaml.dump_all(yaml_data, allow_unicode=True, sort_keys=False)
    except (UnicodeDecodeError, yaml.YAMLError) as e:
        raise TextExtractionError(f"Failed to decode or parse YAML file: {e}") from e
@ -217,7 +217,7 @@ def _extract_text_from_file(file: File):

 def _extract_text_from_csv(file_content: bytes) -> str:
    try:
-        csv_file = io.StringIO(file_content.decode("utf-8"))
+        csv_file = io.StringIO(file_content.decode("utf-8", "ignore"))
        csv_reader = csv.reader(csv_file)
        rows = list(csv_reader)

--- a/api/services/dataset_service.py
+++ b/api/services/dataset_service.py
@ -1458,6 +1458,7 @@ class SegmentService:
            pre_segment_data_list = []
            segment_data_list = []
            keywords_list = []
+            position = max_position + 1 if max_position else 1
            for segment_item in segments:
                content = segment_item["content"]
                doc_id = str(uuid.uuid4())
@ -1475,7 +1476,7 @@ class SegmentService:
                    document_id=document.id,
                    index_node_id=doc_id,
                    index_node_hash=segment_hash,
-                    position=max_position + 1 if max_position else 1,
+                    position=position,
                    content=content,
                    word_count=len(content),
                    tokens=tokens,
@ -1490,6 +1491,7 @@ class SegmentService:
                increment_word_count += segment_document.word_count
                db.session.add(segment_document)
                segment_data_list.append(segment_document)
+                position += 1

                pre_segment_data_list.append(segment_document)
                if "keywords" in segment_item:
--- a/api/tests/unit_tests/core/workflow/nodes/test_document_extractor_node.py
+++ b/api/tests/unit_tests/core/workflow/nodes/test_document_extractor_node.py
@ -140,6 +140,17 @@ def test_extract_text_from_plain_text():
    assert text == "Hello, world!"


+def test_extract_text_from_plain_text_non_utf8():
+    import tempfile
+
+    non_utf8_content = b"Hello, world\xa9."  # \xA9 represents © in Latin-1
+    with tempfile.NamedTemporaryFile(delete=True) as temp_file:
+        temp_file.write(non_utf8_content)
+        temp_file.seek(0)
+        text = _extract_text_from_plain_text(temp_file.read())
+    assert text == "Hello, world."
+
+
@patch("pypdfium2.PdfDocument")
 def test_extract_text_from_pdf(mock_pdf_document):
    mock_page = Mock()
--- a/docker/.env.example
+++ b/docker/.env.example
@ -689,6 +689,9 @@ TEMPLATE_TRANSFORM_MAX_LENGTH=80000
 CODE_MAX_STRING_ARRAY_LENGTH=30
 CODE_MAX_OBJECT_ARRAY_LENGTH=30
 CODE_MAX_NUMBER_ARRAY_LENGTH=1000
+CODE_EXECUTION_CONNECT_TIMEOUT=10
+CODE_EXECUTION_READ_TIMEOUT=60
+CODE_EXECUTION_WRITE_TIMEOUT=10

 # Workflow runtime configuration
 WORKFLOW_MAX_EXECUTION_STEPS=500
--- a/docker/docker-compose.yaml
+++ b/docker/docker-compose.yaml
@ -244,6 +244,9 @@ x-shared-env: &shared-api-worker-env
  RESET_PASSWORD_TOKEN_EXPIRY_MINUTES: ${RESET_PASSWORD_TOKEN_EXPIRY_MINUTES:-5}
  CODE_EXECUTION_ENDPOINT: ${CODE_EXECUTION_ENDPOINT:-http://sandbox:8194}
  CODE_EXECUTION_API_KEY: ${SANDBOX_API_KEY:-dify-sandbox}
+  CODE_EXECUTION_CONNECT_TIMEOUT: ${CODE_EXECUTION_CONNECT_TIMEOUT:-10}
+  CODE_EXECUTION_READ_TIMEOUT: ${CODE_EXECUTION_READ_TIMEOUT:-60}
+  CODE_EXECUTION_WRITE_TIMEOUT: ${CODE_EXECUTION_WRITE_TIMEOUT:-10}
  CODE_MAX_NUMBER: ${CODE_MAX_NUMBER:-9223372036854775807}
  CODE_MIN_NUMBER: ${CODE_MIN_NUMBER:--9223372036854775808}
  CODE_MAX_DEPTH: ${CODE_MAX_DEPTH:-5}
--- a/web/Dockerfile
+++ b/web/Dockerfile
@ -1,5 +1,5 @@
 # base image
-FROM node:20.11-alpine3.19 AS base
+FROM node:20-alpine3.20 AS base
 LABEL maintainer="takatost@gmail.com"

 # if you located in China, you can use aliyun mirror to speed up
--- a/web/app/components/workflow/note-node/index.tsx
+++ b/web/app/components/workflow/note-node/index.tsx
@ -81,7 +81,6 @@ const NoteNode = ({
            nodeData={data}
            icon={<Icon />}
            minWidth={240}
-            maxWidth={640}
            minHeight={88}
          />
          <div className='shrink-0 h-2 opacity-50 rounded-t-md' style={{ background: THEME_MAP[theme].title }}></div>
--- a/web/app/components/workflow/style.css
+++ b/web/app/components/workflow/style.css
@ -15,4 +15,8 @@
 #workflow-container .react-flow__selection {
  border: 1px solid #528BFF;
  background: rgba(21, 94, 239, 0.05);
+}
+
+#workflow-container .react-flow__node-custom-note {
+  z-index: -1000 !important;
 }
Author	SHA1	Message	Date
非法操作	4a5fc2bcea	Merge `97cfb65630` into `5ff02b469f`	2024-11-15 08:51:11 +08:00
jarvis2f	5ff02b469f	fix:position error when creating segments (#10706 ) Some checks are pending Build and Push API & Web / build (api, DIFY_API_IMAGE_NAME, linux/amd64, build-api-amd64) (push) Waiting to run Details Build and Push API & Web / build (api, DIFY_API_IMAGE_NAME, linux/arm64, build-api-arm64) (push) Waiting to run Details Build and Push API & Web / build (web, DIFY_WEB_IMAGE_NAME, linux/amd64, build-web-amd64) (push) Waiting to run Details Build and Push API & Web / build (web, DIFY_WEB_IMAGE_NAME, linux/arm64, build-web-arm64) (push) Waiting to run Details Build and Push API & Web / create-manifest (api, DIFY_API_IMAGE_NAME, merge-api-images) (push) Blocked by required conditions Details Build and Push API & Web / create-manifest (web, DIFY_WEB_IMAGE_NAME, merge-web-images) (push) Blocked by required conditions Details	2024-11-14 21:25:15 +08:00
Bowen Liang	44f57ad9a8	chore: Bump Alpine Linux to 3.20 in web dockerfile (#10671 )	2024-11-14 20:57:01 +08:00
yihong	94fd6f6901	fix: typo in test (#10707 ) Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2024-11-14 20:54:13 +08:00
SiliconFlow, Inc	e61242a337	feat: add vlm models from siliconflow (#10704 )	2024-11-14 20:53:35 +08:00
yihong	722964667f	fix: non utf8 code decode close #10691 (#10698 ) Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2024-11-14 17:29:49 +08:00
Xiao Ley	fbb9c1c249	fixed the Base URL usage issue in Podcast Generator tool verification (#10697 )	2024-11-14 17:24:42 +08:00
非法操作	15f341b655	feat: add the audio tool (#10695 )	2024-11-14 16:37:15 +08:00
crazywoola	b358490607	chore: update issue template (#10693 )	2024-11-14 16:12:27 +08:00
crazywoola	f9e4196fd5	Update pull_request_template.md (#10692 )	2024-11-14 15:56:37 +08:00
crazywoola	751525802d	feat: update pr template (#10690 )	2024-11-14 15:52:15 +08:00
lz	2abacd2a2d	export configuration 'CODE_EXECUTION_TIMEOUT' to .env (#10688 ) Co-authored-by: liuzhu <liuzhu@fridaycloud.com.cn>	2024-11-14 15:34:34 +08:00
Nam Vu	a3155e0613	Update expat version (#10686 )	2024-11-14 15:30:55 +08:00
hejl	97cfb65630	enhance the custom note	2024-09-29 15:41:14 +08:00