视频合并

本方案的主要思路是调用 FFmpeg 的开发库，在 C 层实现“流复制（stream copy）”模式下的多个视频和音频文件的合并操作，并通过 JNI 供 Java 层调用。视频和音频合并过程中针对不同输入文件时长不一致的问题，采用了统一全局时间轴（global offset）的策略，确保合并后各个媒体流在播放时不会出现音视频不同步或音频播放速度异常的问题。

一、Java 接口定义

在 Java 层通过 JNI 声明 native 方法 merge，该方法接收多个输入文件路径和输出文件路径，调用底层 C 函数实现视频合并。代码如下：

/**
 * Merges multiple video/audio files into a single output file using stream copy.
 * This method calls a native C function that utilizes the FFmpeg command-line tool.
 * The input files should ideally have compatible stream parameters (codec, resolution, etc.)
 * for stream copy to work reliably and efficiently.
 *
 * @param inputPaths An array of absolute paths to the input media files.
 * @param outputPath The absolute path for the merged output media file.
 * @return true if the merging process initiated by FFmpeg completes successfully (exit code 0), false otherwise.
 * @throws NullPointerException if inputPaths or outputPath is null, or if inputPaths contains null elements.
 * @throws IllegalArgumentException if inputPaths contains fewer than 2 files.
 */
public static native boolean merge(String[] inputPaths, String outputPath);

二、Java 测试代码

下面的测试代码演示了如何读取指定目录下的所有 .mp4 文件，并调用 NativeMedia.merge 方法合并到指定输出文件。测试代码利用了 Java 的文件操作和 JSON 格式输出工具来展示文件集合。

package com.litongjava.linux.service;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.junit.Test;

import com.litongjava.media.NativeMedia;
import com.litongjava.tio.utils.json.JsonUtils;

import lombok.extern.slf4j.Slf4j;

@Slf4j
public class VideoMergeTest {

  @Test
  public void testSession() {

    String folderPath = "C:\\Users\\Administrator\\Downloads";
    File folderFile = new File(folderPath);
    File[] listFiles = folderFile.listFiles();

    // 使用 ArrayList 来存储符合条件的文件路径
    List<String> videoPaths = new ArrayList<>();
    if (listFiles != null) {
      for (File file : listFiles) {
        if (file != null && file.getName().endsWith(".mp4")) {
          videoPaths.add(file.getAbsolutePath());
        }
      }
    }

    // 输出 JSON 格式的结果，这里不会出现 null 元素
    System.out.println(JsonUtils.toJson(videoPaths));

    // 如果 NativeMedia.merge 方法需要数组，可以通过 toArray 方法转换
    NativeMedia.merge(videoPaths.toArray(new String[0]), "main.mp4");
  }
}

说明

输入文件要求
代码中通过遍历指定目录，获取所有扩展名为 .mp4 的文件并存储在 List 中，最终转换为字符串数组传递给 JNI 方法。
异常处理
文档注释中规定了当输入数组为空、含有 null 值或文件数不足时，会抛出相应的异常。因此在实际使用中，请保证输入文件数组不为空且至少包含两个有效文件。

三、C 端 JNI 实现

下面是 JNI 的 C 实现代码，文件名假设为 jni_merge.c。该代码完整实现了如下功能：

支持中文文件名
- 在 Windows 平台下利用 WideCharToMultiByte 进行编码转换。
- 在非 Windows 平台直接调用 GetStringUTFChars。
输出流上下文创建及模板构建
- 以第一个输入文件为模板，将视频和音频流复制到输出上下文，采用 “copy” 模式（直接拷贝数据，不重新编码）。
全局时间轴统一处理
- 对每个输入文件计算音视频的最大时长（转换到统一的 AV_TIME_BASE 单位）。
- 采用全局时间偏移量 global_offset，在对每个包进行时间戳转换时保证时间连续性，解决多个文件间因转场引起的音视频不同步问题。
逐包读取和写入输出文件
- 通过 av_read_frame 逐包读取各输入文件，并根据流类型进行时间戳和包数据调整后写入输出文件。

完整代码如下：

#include "com_litongjava_media_NativeMedia.h"
#include <jni.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
#include <libavutil/timestamp.h>
#include <libavutil/error.h>

#ifdef _WIN32

#include <windows.h>
#include <stringapiset.h>

/**
 * 将 Java 的 jstring 转换为 UTF-8 编码的 C 字符串（支持中文路径）
 */
char *jstringToChar(JNIEnv *env, jstring jStr) {
  const jchar *raw = (*env)->GetStringChars(env, jStr, NULL);
  jsize len = (*env)->GetStringLength(env, jStr);
  int size_needed = WideCharToMultiByte(CP_UTF8, 0, (LPCWCH) raw, len, NULL, 0, NULL, NULL);
  char *strTo = (char *) malloc(size_needed + 1);
  WideCharToMultiByte(CP_UTF8, 0, (LPCWCH) raw, len, strTo, size_needed, NULL, NULL);
  strTo[size_needed] = '\0';
  (*env)->ReleaseStringChars(env, jStr, raw);
  return strTo;
}

#else
/**
 * 非 Windows 平台直接返回 UTF-8 编码的字符串
 */
char* jstringToChar(JNIEnv* env, jstring jStr) {
    const char* chars = (*env)->GetStringUTFChars(env, jStr, NULL);
    char* copy = strdup(chars);
    (*env)->ReleaseStringUTFChars(env, jStr, chars);
    return copy;
}
#endif

/*
 * 说明：
 * 1. 根据输出文件名创建输出格式上下文，并以第一个输入文件为模板建立输出流（仅复制视频和音频流参数）。
 * 2. 为每个输入文件计算其总体时长（按 AV_TIME_BASE 统一计算，取视频和音频最大值），并使用统一的 global_offset 作为所有流包的时间补偿。
 * 3. 每个包在转换时间戳时先使用 av_rescale_q 将其 pts/dts 从输入流的 time_base 转换到输出流 time_base，再加上统一偏移量。
 *
 * 这样处理后，各输入文件无论音视频各自时长是否一致，都按同一全局时间轴排列，解决了音频播放速度快于视频的问题。
 */
JNIEXPORT jboolean JNICALL Java_com_litongjava_media_NativeMedia_merge
  (JNIEnv *env, jclass clazz, jobjectArray jInputPaths, jstring jOutputPath) {
  int ret = 0;
  int nb_inputs = (*env)->GetArrayLength(env, jInputPaths);
  if (nb_inputs < 1) {
    return JNI_FALSE;
  }
  // 输出文件名（支持中文）
  char *output_filename = jstringToChar(env, jOutputPath);

  // 创建输出格式上下文
  AVFormatContext *ofmt_ctx = NULL;
  ret = avformat_alloc_output_context2(&ofmt_ctx, NULL, NULL, output_filename);
  if (ret < 0 || !ofmt_ctx) {
    free(output_filename);
    return JNI_FALSE;
  }
  AVOutputFormat *ofmt = ofmt_ctx->oformat;

  // 采用第一个输入文件作为模板构造输出流
  char *first_input = jstringToChar(env, (*env)->GetObjectArrayElement(env, jInputPaths, 0));
  AVFormatContext *ifmt_ctx1 = NULL;
  if ((ret = avformat_open_input(&ifmt_ctx1, first_input, NULL, NULL)) < 0) {
    free(first_input);
    free(output_filename);
    return JNI_FALSE;
  }
  if ((ret = avformat_find_stream_info(ifmt_ctx1, NULL)) < 0) {
    avformat_close_input(&ifmt_ctx1);
    free(first_input);
    free(output_filename);
    return JNI_FALSE;
  }
  free(first_input);

  // 建立输出流（仅复制视频、音频流）
  int video_out_index = -1, audio_out_index = -1;
  for (unsigned int i = 0; i < ifmt_ctx1->nb_streams; i++) {
    AVStream *in_stream = ifmt_ctx1->streams[i];
    if (in_stream->codecpar->codec_type == AVMEDIA_TYPE_VIDEO ||
        in_stream->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {

      AVStream *out_stream = avformat_new_stream(ofmt_ctx, NULL);
      if (!out_stream) {
        avformat_close_input(&ifmt_ctx1);
        free(output_filename);
        return JNI_FALSE;
      }
      ret = avcodec_parameters_copy(out_stream->codecpar, in_stream->codecpar);
      if (ret < 0) {
        avformat_close_input(&ifmt_ctx1);
        free(output_filename);
        return JNI_FALSE;
      }
      out_stream->codecpar->codec_tag = 0;
      if (in_stream->codecpar->codec_type == AVMEDIA_TYPE_VIDEO)
        video_out_index = out_stream->index;
      else if (in_stream->codecpar->codec_type == AVMEDIA_TYPE_AUDIO)
        audio_out_index = out_stream->index;
    }
  }
  avformat_close_input(&ifmt_ctx1);

  // 打开输出文件
  if (!(ofmt->flags & AVFMT_NOFILE)) {
    if ((ret = avio_open(&ofmt_ctx->pb, output_filename, AVIO_FLAG_WRITE)) < 0) {
      avformat_free_context(ofmt_ctx);
      free(output_filename);
      return JNI_FALSE;
    }
  }

  // 写入输出文件头
  ret = avformat_write_header(ofmt_ctx, NULL);
  if (ret < 0) {
    if (!(ofmt->flags & AVFMT_NOFILE))
      avio_close(ofmt_ctx->pb);
    avformat_free_context(ofmt_ctx);
    free(output_filename);
    return JNI_FALSE;
  }

  // 定义统一全局时间偏移量（单位：AV_TIME_BASE，AV_TIME_BASE_Q= {1,AV_TIME_BASE}）
  int64_t global_offset = 0;

  // 遍历每个输入文件
  for (int i = 0; i < nb_inputs; i++) {
    char *input_filename = jstringToChar(env, (*env)->GetObjectArrayElement(env, jInputPaths, i));
    AVFormatContext *ifmt_ctx = NULL;
    if ((ret = avformat_open_input(&ifmt_ctx, input_filename, NULL, NULL)) < 0) {
      free(input_filename);
      continue;  // 无法打开的文件跳过
    }
    if ((ret = avformat_find_stream_info(ifmt_ctx, NULL)) < 0) {
      avformat_close_input(&ifmt_ctx);
      free(input_filename);
      continue;
    }

    // 计算该输入文件中音视频的最大时长（统一转换到 AV_TIME_BASE 单位）
    int64_t file_duration = 0;
    for (unsigned int j = 0; j < ifmt_ctx->nb_streams; j++) {
      AVStream *in_stream = ifmt_ctx->streams[j];
      if (in_stream->codecpar->codec_type == AVMEDIA_TYPE_VIDEO ||
          in_stream->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
        if (in_stream->duration > 0) {
          int64_t dur = av_rescale_q(in_stream->duration, in_stream->time_base, AV_TIME_BASE_Q);
          if (dur > file_duration)
            file_duration = dur;
        }
      }
    }

    // 逐包读取处理
    AVPacket pkt;
    while (av_read_frame(ifmt_ctx, &pkt) >= 0) {
      AVStream *in_stream = ifmt_ctx->streams[pkt.stream_index];
      int out_index = -1;
      if (in_stream->codecpar->codec_type == AVMEDIA_TYPE_VIDEO && video_out_index >= 0) {
        out_index = video_out_index;
      } else if (in_stream->codecpar->codec_type == AVMEDIA_TYPE_AUDIO && audio_out_index >= 0) {
        out_index = audio_out_index;
      } else {
        av_packet_unref(&pkt);
        continue;
      }

      AVStream *out_stream = ofmt_ctx->streams[out_index];
      // 将统一偏移量转换到输出流的 time_base 单位
      int64_t offset = av_rescale_q(global_offset, AV_TIME_BASE_Q, out_stream->time_base);

      // 将 pts、dts 和 duration 从输入流 time_base 转换到输出流 time_base 后加上偏移量
      pkt.pts = av_rescale_q(pkt.pts, in_stream->time_base, out_stream->time_base) + offset;
      pkt.dts = av_rescale_q(pkt.dts, in_stream->time_base, out_stream->time_base) + offset;
      if (pkt.duration > 0)
        pkt.duration = av_rescale_q(pkt.duration, in_stream->time_base, out_stream->time_base);
      pkt.pos = -1;
      pkt.stream_index = out_index;

      ret = av_interleaved_write_frame(ofmt_ctx, &pkt);
      if (ret < 0) {
        av_packet_unref(&pkt);
        break;
      }
      av_packet_unref(&pkt);
    }

    avformat_close_input(&ifmt_ctx);
    free(input_filename);

    // 当前输入文件处理完后，将全局偏移量更新为之前的 global_offset + 本文件最大时长（确保所有流时间统一）
    global_offset += file_duration;
  }

  // 写入 trailer，并关闭上下文
  av_write_trailer(ofmt_ctx);
  if (!(ofmt->flags & AVFMT_NOFILE))
    avio_close(ofmt_ctx->pb);
  avformat_free_context(ofmt_ctx);
  free(output_filename);
  return JNI_TRUE;
}

关键点解释

中文路径支持
针对不同平台采用不同的方式将 Java 的 jstring 转换成 UTF-8 编码的 C 字符串，以确保中文路径能够正确处理。
- Windows 平台使用 WideCharToMultiByte。
- 非 Windows 平台直接使用 GetStringUTFChars。
输出格式上下文创建与模板构建
- 调用 avformat_alloc_output_context2 创建输出格式上下文。
- 以第一个输入文件为模板调用 avformat_open_input 和 avformat_find_stream_info，获取第一个输入文件的视频和音频流参数。
- 使用 avcodec_parameters_copy 将视频和音频流参数复制到输出流中，使输出文件能够直接采用“流复制”模式。
全局时间偏移量的处理
- 为了解决由于转场及各文件内音视频时长不一致而引起的音视频同步问题，在写入每个包之前均将其时间戳进行转换，并加上全局偏移量。
- 每个输入文件处理完毕后，将该文件最大时长（视频流与音频流中较大者）累加到全局偏移量 global_offset，确保后续包在全局时间轴上连续排列。
逐包读取与写入
- 采用 av_read_frame 循环读取每个输入文件中的所有包，根据包的类型选择对应的输出流。
- 对每个包：
  - 使用 av_rescale_q 将输入的 pts、dts、duration 从输入流的 time_base 转换到输出流的 time_base。
  - 加上转换后的全局时间偏移量，保证包在全局统一时间轴上连续排列。
  - 使用 av_interleaved_write_frame 写入输出文件，确保多路流的交错顺序正确。
错误处理与资源释放
- 各处有错误检查，并在错误出现时及时释放相关上下文和动态内存。
- 最后写入 trailer 后关闭输出文件与释放输出上下文。

四、小结

本文档详细介绍了通过 JNI 调用 FFmpeg 库实现视频和音频合并的完整过程。

Java 层提供简单的接口，负责传递文件路径参数。
C 层则利用 FFmpeg 的 API 逐步建立输出上下文、设置流参数、处理时间戳转换，并通过全局时间偏移量确保音视频同步，最终将所有包写入合并后的文件中。
该方案支持中文文件路径，并保证在转场动画和时长不一致的场景下音视频同步正确。

请根据实际需求进行扩展和完善，例如加入更详细的日志输出、错误码打印及异常处理等。希望这份文档对你的开发有所帮助！