

Introduction


Let's challenge ourselves and build a server that can upload and download files running into hundreds of GBs.

Coping with limited server resources while handling massive file uploads is a challenge many of us face. In this guide, we take on a project that sounds impossible at first: uploading 100 GB of files to a Document Management System (DMS) using a server with just 512 MB of memory. Join me as we explore the streaming strategies and optimizations that make this possible.

For more context, please visit my previous blogs:

1. SAP Document Management Service Node Js Client

2. Best way to upload & download documents using SAP Document Management Service

Setting the Stage


To upload 100 GB of data, we first need that data on our local system, i.e. a laptop or desktop.

Let's create a file generator that can produce a specific number of files adding up to a specific total size.
import crypto from "crypto";
import fs from "fs";
import * as path from "path";

// total no of files
const totalFiles = 20;
// combined size of all files
const totalSize = 1 * 1024 * 1024 * 1024; // 1 GB

// generate a random string of a specific length
function generateRandomString(length) {
    const characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    let randomString = '';
    // randomizing to eliminate server-side caching
    for (let i = 0; i < length; i++) {
        const randomIndex = crypto.randomInt(characters.length);
        randomString += characters.charAt(randomIndex);
    }
    return randomString;
}

// generate a file with the given name and size, writing in chunks to reduce memory usage
async function writeInChunks(filename, fileSize) {
    const chunkSize = Math.min(8 * 1024 * 1024, fileSize); // write at most 8 MB at a time
    for (let j = 0; j < fileSize; j += chunkSize) {
        // the last chunk may be smaller than chunkSize
        const size = Math.min(chunkSize, fileSize - j);
        fs.appendFileSync(filename, generateRandomString(size));
        console.log(`${j + size} bytes written in ${filename}`);
    }
}

async function generateFiles() {
    fs.mkdirSync("file-content", {recursive: true});
    // calculate each file's size
    const fileSize = Math.ceil(totalSize / totalFiles);
    for (let i = 0; i < totalFiles; i++) {
        const filename = path.join("file-content", `test-file-${i}.txt`);
        // write each file in chunks (appendFileSync is synchronous, so files are generated one after another)
        writeInChunks(filename, fileSize);
    }
}

generateFiles();
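Assuming the script is saved as, say, generate-files.mjs (the imports use ES-module syntax), running node generate-files.mjs creates a file-content directory with 20 files of roughly 51 MB each. The 12 GB and 100 GB datasets used in the results below come from the same generator, with totalSize and totalFiles raised accordingly.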


Client to Upload the Files in Chunks


Let's create a client script that breaks each file into chunks and uses our optimized server to upload them to DMS. If the server is busy, the client retries after a short delay.
const axios = require("axios");
const fs = require("fs");
const {join} = require("path");

const chunkSize = 8 * 1024 * 1024; // 8 MB

// simple delay helper used between retries
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function uploadFileInChunks(file, stream) {
    let i = 0;
    // loop chunk by chunk; only one chunk per file is loaded into memory at a time
    for await (const chunk of stream) {
        console.log('Chunk :', file.name, i, chunk.length);
        // the first chunk creates the file, every following chunk is appended
        const operation = i === 0 ? "create" : "append";
        const config = {
            method: 'post',
            url: 'http://localhost:3000/upload-optimised/',
            headers: {
                'cs-filename': file.name,
                'cs-operation': operation,
                'Content-Type': file.type
            },
            data: chunk
        };

        let status = 0;
        let response;

        // retry with a delay until the server accepts the chunk
        while (status !== 200) {
            try {
                response = await axios.request(config);
                status = response.status;
            } catch (e) {
                status = e.response ? e.response.status : 0;
                // wait for 3 s before trying again
                await sleep(3000);
            }
        }

        console.log(response.data);
        i++;
    }
}

async function uploadFiles() {
    for (const name of fs.readdirSync("file-content")) {
        const filePath = join("file-content", name);
        const file = {name: name, type: "text/plain"};
        // create a read stream whose highWaterMark equals the chunk size
        const stream = fs.createReadStream(filePath, {highWaterMark: chunkSize});
        // upload all files concurrently (no await)
        uploadFileInChunks(file, stream);
    }
}

uploadFiles();
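Because the read stream's highWaterMark equals the chunk size, each for await iteration hands the upload loop at most one 8 MB buffer per file, so even with 20 files uploading concurrently only around 160 MB of file data is in flight on the client at any moment. The retry loop pairs with the server's 429 "try again later" response: a chunk that arrives while the server is at its DMS call limit simply waits 3 seconds and tries again.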

Server (Memory Optimised)


It is a simple Express (Node.js) server that streams the incoming request body directly to DMS, making it extremely memory efficient. It either creates the file or appends to it, depending on the request.

The script also limits the number of API calls made to DMS in parallel.
// configurable parameter: allow only a specific number of parallel API calls to DMS
const MAX_DMS_API_CALLS_ALLOWED = 22;

let docStoreCalls = 0;
app.post('/upload-optimised', async (req, res) => {
    console.log("new request ", docStoreCalls);
    // allow at most 22 parallel DMS upload requests
    if (docStoreCalls < MAX_DMS_API_CALLS_ALLOWED) {
        // increment as an API call is about to be made
        // this is safe because Node.js runs JavaScript on a single thread
        docStoreCalls++;
        try {
            const fileName = req.headers["cs-filename"];
            const opType = req.headers["cs-operation"];
            const mimeType = req.headers["content-type"];

            let session = await sm.getOrCreateConnection(REPOSITORY_ID, "provider");
            let response = {success: "false"};

            if (opType === "create") {
                // stream the request body straight into a new document
                response = await session.createDocumentFromStream("/temp", req, fileName);
            }

            if (opType === "append") {
                // look up the existing document and append the streamed chunk
                const obj = await session.getObjectByPath("/temp/" + fileName);
                const objId = obj.succinctProperties["cmis:objectId"];
                response = await session.appendContentFromStream(objId, req);
            }
            res.json(response);
        } catch (e) {
            console.log(e);
            res.status(500).json({success: "false", error: e.message});
        } finally {
            // decrement once the call finishes (again safe, single-threaded)
            docStoreCalls--;
        }
    } else {
        res.status(429).send("try again later");
    }
});
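For reference, here is a minimal sketch of the surrounding server bootstrap. It assumes sm is the DMS session manager and REPOSITORY_ID the repository configuration from the previous blogs, so the require path and environment variable below are placeholders rather than the exact original setup. The key point is that no body-parsing middleware is registered for this route: req stays a raw readable stream that can be forwarded to DMS without buffering.

const express = require("express");
// assumption: session manager and repository id come from the DMS client
// setup described in the previous blogs; the paths below are placeholders
const sm = require("./session-manager");
const REPOSITORY_ID = process.env.REPOSITORY_ID;

const app = express();
// deliberately no express.json()/body-parser here: the request body must
// remain an unbuffered stream so it can be piped straight to DMS

// ... the /upload-optimised route shown above is registered here ...

app.listen(3000, () => console.log("Upload server listening on port 3000"));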

Results


The file generator, the upload client, and the server are ready. Let's upload some documents.

Let's test with 1 GB of data first.
-> Generate 1 GB across 20 files; individual file size is ~50 MB.
All files are uploaded in parallel, in 8 MB chunks.

Parallel upload size (in flight): 8 MB * 20 = 160 MB uploaded in parallel
Memory usage: 400 MB (Node.js server)

Total time taken: 7 mins (depends on the DMS server)
Upload speed: ~50 MB per file in 7 mins ≈ 120 KB/s per file (≈ 2.4 MB/s aggregate)

Let's go up to 12 GB.
-> Generate 12 GB across 20 files; individual file size is 600 MB.
All files are uploaded in parallel, in 8 MB chunks.

Parallel upload size (in flight): 8 MB * 20 = 160 MB uploaded in parallel
Memory usage: 400 MB (Node.js server)

Total time taken: 44 mins (depends on the DMS server)
Upload speed: 600 MB per file in 44 mins ≈ 230 KB/s per file

We can go up to 100 GB as well, but there is a catch: since the number of parallel API calls is restricted, the server will not allow all 200 files to be uploaded concurrently. Consequently, some files have to wait their turn before being uploaded.
-> Generate 100 GB across 200 files; individual file size is 500 MB.
The client tries to upload all files in parallel, in 8 MB chunks,
but the server only serves 22 upload requests in parallel.

Parallel upload size (in flight): 8 MB * 22 = 176 MB uploaded in parallel
Memory usage: ≈ 400 MB (Node.js server)

Total time taken: 2 hours (depends on the DMS server)

Metrics


After analyzing the results, we can derive the following scaling metrics.

Number of Parallel Uploads:


The number of parallel uploads can be increased via

  • The MAX_DMS_API_CALLS_ALLOWED variable.
    Increasing it raises the number of concurrent calls. However, while each request carries around 8 MB of data, it occupies roughly 14-16 MB on the server, so a 512 MB server can serve at most about 32 requests in parallel.


If we increase the memory, the parallel capacity scales roughly like this (at ~16 MB per in-flight request; see the estimator sketch after this list):

  • 512 MB server - about 32 requests in parallel.

  • 2 GB server - about 128 requests in parallel.

  • 4 GB server - about 256 requests in parallel.
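
As a quick sanity check, below is a small, hypothetical estimator of this ceiling. It assumes, based on the observation above, that the per-request footprint is roughly twice the chunk size (8 MB chunks occupying 14-16 MB); the function name and the 2x factor are illustrative assumptions, not measured constants.

// rough estimate of how many uploads fit in a given memory budget
// assumption: per-request footprint ≈ footprintFactor × chunk size
function maxParallelUploads(serverMemoryMb, chunkSizeMb = 8, footprintFactor = 2) {
    return Math.floor(serverMemoryMb / (chunkSizeMb * footprintFactor));
}

console.log(maxParallelUploads(512));     // ≈ 32  (512 MB server, 8 MB chunks)
console.log(maxParallelUploads(2048));    // ≈ 128 (2 GB server)
console.log(maxParallelUploads(4096));    // ≈ 256 (4 GB server)
console.log(maxParallelUploads(4096, 4)); // ≈ 512 (4 GB server, 4 MB chunks)

The last line is where the chunk-size reduction discussed under Memory Usage below comes from.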


To raise this ceiling further, we can

  • Horizontally scale the instance in BTP to increase the number of parallel uploads, OR

  • Vertically scale the memory.


Memory Usage:


The memory footprint can be optimized by reducing the chunk size:

  • Right now 8 MB chunks are used, supporting about 256 parallel uploads (on a 4 GB server).

  • Reducing the chunk size to 4 MB would give us about 512 parallel uploads.


Upload Time:


Upload time can be reduced by a combination of increasing memory and allowing more concurrent connections.

Conclusion


Throughout our tests, ranging from a 1 GB to an impressive 100 GB upload, the server exhibited consistent efficiency despite its resource limitations. Its ability to manage diverse file sizes and simultaneous uploads while maintaining respectable speed highlights its resilience.

 