Diven's blog

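Problem description

We built a Google Slides add-on with a feature that lets the user pick a file from Google Drive and insert it into the presentation. If the selected file is too large, the call fails with an "exceeds the maximum file size" error. The root cause is that Google limits the size of files that can be downloaded through DriveApp (see the figure below); details are in the linked reference.

The current code: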

function insertDriveImageById(id) {
  var fileItem = DriveApp.getFileById(id);

  var item = new Object();
  item.bytes = fileItem.getBlob().getBytes();
  item.type = fileItem.getMimeType();
  item.name = fileItem.getName();

  return item;
}

The error raised once the file exceeds the size limit:

Exception: File 1.STEAM活动集锦（2018）.mp4 exceeds the maximum file size.

Solutions

Approach 1

Get the download link of the target file, then download it directly with UrlFetchApp.
Test code:

function insertDriveImageById(id) {
  var fileItem = DriveApp.getFileById(id);
  var accessToken = ScriptApp.getOAuthToken();

  var params = {
    method: "get",
    headers: {Authorization: "Bearer " + accessToken},
  };

  var response = UrlFetchApp.fetch(fileItem.getDownloadUrl(), params);
  var rc = response.getResponseCode();

  if (rc == 200) {
    var fileBlob = response.getBlob()
    console.log(fileBlob.getContentType());
  }
}

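The target download link is https://drive.google.com/uc?id=1sC7_2aLYoN0nSGcBTNT4fQh7uWvmNcUG&export=download, and it downloads fine in Chrome.

However, the ContentType returned to this test code is text/html, which means the file was not actually downloaded. After some analysis, I found two reasons:

1. The request has no access to Drive. Opening the link above in a browser signed in to a different account confirms this guess. Downloading through the link therefore involves an authentication flow; the network trace shows several redirects along the way.

2. UrlFetchApp is also subject to the 50 MB limit on the size of a single URL Fetch response; see Quotas for Google Services for details.

Approach 1 is therefore not feasible.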
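
The text/html symptom is easy to miss because the request can still come back with a 200 status. A minimal sketch of a guard that compares the returned content type with the MIME type Drive reports for the file; `isExpectedContent` is a hypothetical helper name, and the assumption that the two values match for a successful download is mine, not something the quota documentation states:

```javascript
// Hypothetical helper: true when the fetched content type matches the
// expected MIME type, ignoring case and parameters such as "; charset=utf-8".
function isExpectedContent(actualContentType, expectedMimeType) {
  const normalize = (value) =>
    String(value || "").split(";")[0].trim().toLowerCase();
  const expected = normalize(expectedMimeType);
  // An empty expectation can never be confirmed, so treat it as a mismatch.
  return expected !== "" && normalize(actualContentType) === expected;
}
```

In the test code above it could be called as `isExpectedContent(response.getBlob().getContentType(), fileItem.getMimeType())`, treating a `false` result as a failed download even when the response code is 200.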

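Approach 2

~~Split the target file and download it in chunks.~~
Download the file in chunks, then merge the chunks.
Detailed flow

~~Split the large target file into smaller files of a given chunk size and store them in a temporary folder~~
~~Download the small files one by one~~ Use partial download, then merge the chunks into a single blob or byte array

TODO

Get the file size
Partial download
Merge blobs & bytes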
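
The partial download step comes down to computing one HTTP Range header per chunk. Byte ranges are inclusive on both ends, which is an easy place for off-by-one errors, so here is a small sketch of the arithmetic on its own. `planByteRanges` is a hypothetical helper name, not part of Apps Script or the Drive API:

```javascript
// Hypothetical helper: splits the bytes [0, fileSize) into consecutive
// inclusive ranges, each usable as an HTTP Range header value.
function planByteRanges(fileSize, chunkSize) {
  if (!Number.isInteger(fileSize) || fileSize < 0) {
    throw new RangeError("fileSize must be a non-negative integer");
  }
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const ranges = [];
  for (let start = 0; start < fileSize; start += chunkSize) {
    // The end index is inclusive, and the last byte of the file is fileSize - 1.
    const end = Math.min(start + chunkSize, fileSize) - 1;
    ranges.push({ start: start, end: end, header: "bytes=" + start + "-" + end });
  }
  return ranges;
}

// A 25-byte file in 10-byte chunks gives
// "bytes=0-9", "bytes=10-19", "bytes=20-24".
console.log(planByteRanges(25, 10).map((r) => r.header).join(", "));
```

Each `header` value would go into the `Range` request header of one fetch call.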

Code

function concatChunks(chunksArray) {
  if (!chunksArray.length) return null;

  // sum of individual chunk lengths
  let totalLength = chunksArray.reduce((acc, value) => acc + value.length, 0);

  let result = new Uint8Array(totalLength);

  // for each chunk - copy it over result
  // next chunk is copied right after the previous one
  let length = 0;
  for (let chunk of chunksArray) {
    result.set(chunk, length);
    length += chunk.length;
  }

  return result;
}

function partialDownload(params) {
  var accessToken = ScriptApp.getOAuthToken();
  var baseUrl = "https://www.googleapis.com/drive/v3/files/";

  var fileSize = params.fileSize;
  var chunkSize = params.chunkSize;

  // partial download by chunk size (bytes).
  var downloadUrl = baseUrl + params.fileId + "?alt=media";

  // number of chunks, rounded up so the last partial chunk is included
  var chunkNumber = Math.ceil(fileSize / chunkSize);

  console.log("fileSize is " + fileSize + ", chunkSize is " + chunkSize + ", chunkNumber is " + chunkNumber);

  var downloadedChunks = [];

  for (let idx = 0; idx < chunkNumber; idx++) {
    // HTTP byte ranges are inclusive on both ends, so a full chunk ends at
    // start + chunkSize - 1 and the last chunk ends at fileSize - 1.
    var chunkStartIdx = idx * chunkSize;
    var chunkEndIdx = Math.min(chunkStartIdx + chunkSize, fileSize) - 1;

    var partialParams = {
      method: "get",
      headers: {
        Authorization: "Bearer " + accessToken,
        Range: "bytes=" + chunkStartIdx + "-" + chunkEndIdx,
      },
    };

    console.log("start to download chunk " + idx + " , start index is " + chunkStartIdx + " , end index is " + chunkEndIdx);

    var res = UrlFetchApp.fetch(downloadUrl, partialParams).getBlob();
    var chunkBytes = res.getBytes();

    console.log("successfully download chunk " + idx + ", ContentType is " + res.getContentType() + ", chunk size is " + chunkBytes.length);

    downloadedChunks.push(chunkBytes);
  }

  // merge all chunks into a single byte array
  var fullBytes = concatChunks(downloadedChunks);

  console.log("successfully merge all chunks to one, full bytes is " + fullBytes.length);

  return fullBytes;
}

function insertDriveImageById(id) {
  var fileItem = DriveApp.getFileById(id);

  var fileSize = fileItem.getSize()
  var mimeType = fileItem.getMimeType()
  var fileName = fileItem.getName()

  console.log(fileSize, mimeType, fileName);

  var item = new Object();
  item.type = mimeType;
  item.name = fileName;

  if (fileSize < 52528162 /* magic number for the maximum download size of UrlFetchApp */) {
    // get blob directly
    item.bytes = fileItem.getBlob().getBytes();
  } else {
    // partial download
    var params = {
      fileId: id, // File ID of the large file.
      chunkSize: 10485760, // 10MB
      fileSize: fileSize
    };
    item.bytes = partialDownload(params);
  }

  console.log("insertDriveImageById, return bytes " + item.bytes.length)

  return item;
}

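Approach 3

Approach 2 solves the large-file download problem, but in practice downloading a 100 MB file took several minutes!
There are two possible causes:

1. The download rate is throttled. I ruled this one out first: downloading the same file with curl on the command line finishes within a minute.
2. The overhead of memory copies when merging blobs & bytes.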
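
One way to tell these two candidates apart is to time the fetch phase and the merge phase separately. A minimal sketch, assuming a hypothetical helper `timed` (not part of Apps Script) that accumulates wall-clock milliseconds per label:

```javascript
// Hypothetical helper: runs fn(), adds its wall-clock duration in
// milliseconds to timings[label], and returns whatever fn() returned.
function timed(label, fn, timings) {
  const start = Date.now();
  try {
    return fn();
  } finally {
    timings[label] = (timings[label] || 0) + (Date.now() - start);
  }
}

// Example with stand-in work. In the add-on, the callbacks would wrap the
// UrlFetchApp.fetch(...) call and the concatChunks(...) call respectively.
const timings = {};
const chunk = timed("fetch", () => new Array(1000).fill(1), timings);
const merged = timed("merge", () => Uint8Array.from(chunk), timings);
console.log(JSON.stringify(timings), merged.length);
```

Because the durations accumulate per label, wrapping every chunk's fetch under the same label gives a total per phase at the end of the loop.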

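How can the blob merging and memory copies be avoided? UrlFetchApp has a download size limit, which forces chunked downloads, so can we download directly with the JavaScript fetch API instead?
A Slides add-on runs in a sandboxed environment. It supports regular HTML, CSS, and JS, but with some differences and restrictions:

In our experience, business logic usually lives in a separate JS file; in a Slides add-on project this file has the .gs extension.
HTML cannot call functions in a .gs file directly. It has to go through the google.script.run interface, which is the bridge between the HTML and the .gs code. The official documentation explains this interface fairly clearly:

google.script.run is an asynchronous client-side JavaScript API available in HTML-service pages that can call server-side Apps Script functions.

In other words, JS code in a Slides add-on project can live either in a traditional <script> tag or in a separate .gs file, and JS functions in a .gs file must be called through google.script.run. So when should code go in a <script> tag, and when does it need to go into a separate .gs file?

Testing led to this conclusion: Slides-related APIs, such as getting the active page or inserting an element into a given page, have to go through google.script.run, while everything else can live in <script> tags. Based on this analysis, I replaced the Drive download with an implementation based on the fetch API. This sidesteps UrlFetchApp's download size limit and removes the need for the chunking and merging logic.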
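
As an illustration of that bridge, here is a rough sketch of how the client page might obtain an OAuth token from the .gs side before calling fetch. The names `getDriveAccessToken` and `callServer` are hypothetical, and the sketch assumes the script's OAuth token carries a scope that allows reading the Drive file:

```javascript
// Code.gs (server side). Hypothetical function name; ScriptApp.getOAuthToken()
// returns the OAuth 2.0 access token of the effective user.
function getDriveAccessToken() {
  return ScriptApp.getOAuthToken();
}

// Client side, inside a <script> tag of the HTML-service page.
// Hypothetical wrapper that turns the callback-style google.script.run
// bridge into a Promise.
function callServer(functionName, ...args) {
  return new Promise((resolve, reject) => {
    google.script.run
      .withSuccessHandler(resolve)
      .withFailureHandler(reject)
      [functionName](...args);
  });
}
```

On the client, `callServer("getDriveAccessToken")` would then resolve with the token, which can be passed to the download function as `accessToken`.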

In testing, the download time for the same file dropped from about 8 minutes to 30 seconds.

The final code:

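function jsDownloadVideoFromGD(param) {
	// e.g. 'https://www.googleapis.com/drive/v3/files/fileid?alt=media'
	var url = 'https://www.googleapis.com/drive/v3/files/' + param.id + '?alt=media';

	var token = 'Bearer ' + param.accessToken;

	// return the promise so the caller can wait for the result
	return fetch(url, {
		headers: {
			'Authorization': token
		}
	})
		.then(function (response) {
			// fetch only rejects on network errors, so check the HTTP status too
			if (!response.ok) {
				throw new Error('download failed with status ' + response.status);
			}
			return response.blob();
		})
		.then(function (myBlob) {
			console.log('download ' + param.id + ' success, size is ' + myBlob.size);
			return true;
		})
		.catch(function (error) {
			console.log(error.message);
			return false;
		});
}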

How to download large Drive files in a Google Slides add-on
