Migrating a Long-Running Hangfire Job to Azure Batch for Scalability and Reliability
If you have an existing C# Web API integrated with MS CRM and rely on Hangfire for long-running jobs, running these jobs on Azure Batch can provide scalability, fault tolerance, and cost optimization. This comprehensive guide will walk you through the end-to-end process, from setting up Azure Batch to refactoring your API to leverage Azure Batch for job execution.
Why Azure Batch for Hangfire Jobs?
Long-running jobs (like data-heavy operations in MS CRM) can overwhelm on-premise resources or your application hosting environment. Azure Batch can help:
- Parallel Execution: Break down jobs into multiple tasks and process them simultaneously.
- Autoscaling: Dynamically scale resources based on demand.
- Fault Tolerance: Recover from VM or task failures seamlessly.
- Cost-Effectiveness: Use Spot VMs to reduce costs for non-critical workloads.
Architecture Overview
- C# Web API: Orchestrates the workflow and calls Azure Batch.
- Hangfire: Triggers and monitors the job but delegates the processing to Azure Batch.
- Azure Batch: Executes the long-running tasks in parallel.
- Azure Storage: Stores input, intermediate, and output data.
- MS CRM: The final destination for processed data.
Step 1: Set Up Azure Batch
1.1 Create Azure Batch Account
- Log in to the Azure Portal.
- Search for Batch Accounts and click Create.
- Provide the following details:
- Resource Group: Use an existing one or create a new one.
- Account Name: Provide a unique name.
- Region: Select a region close to your CRM resources.
- Storage Account: Link an Azure Storage account to the Batch account.
1.2 Create a Batch Pool
- Navigate to the Pools section in your Batch account.
- Click Add to create a pool:
- VM Size: Choose based on processing needs, e.g., Standard_D4s_v3.
- Node Count: Start with 2-5 nodes and scale up later.
- Image: Select a suitable image, e.g., Windows Server or Ubuntu.
- Autoscaling: Enable autoscaling to optimize costs.
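The pool can also be created programmatically instead of through the portal. A minimal sketch using the Microsoft.Azure.Batch SDK; the credential placeholders, pool name, image choice, and autoscale formula are illustrative assumptions, not values from this guide:

```csharp
using Microsoft.Azure.Batch;
using Microsoft.Azure.Batch.Auth;

// Placeholder values -- replace with your Batch account details.
var credentials = new BatchSharedKeyCredentials("<batch-url>", "<account-name>", "<key>");
using var batchClient = BatchClient.Open(credentials);

var imageReference = new ImageReference(
    publisher: "MicrosoftWindowsServer",
    offer: "WindowsServer",
    sku: "2022-datacenter",
    version: "latest");

var vmConfiguration = new VirtualMachineConfiguration(
    imageReference,
    nodeAgentSkuId: "batch.node.windows amd64");

// Start small; autoscaling can grow the pool later.
var pool = batchClient.PoolOperations.CreatePool(
    poolId: "MyPool",
    virtualMachineSize: "Standard_D4s_v3",
    virtualMachineConfiguration: vmConfiguration,
    targetDedicatedComputeNodes: 2);
await pool.CommitAsync();

// Illustrative autoscale formula: follow average pending tasks, capped at 10 nodes.
await batchClient.PoolOperations.EnableAutoScaleAsync(
    "MyPool",
    autoscaleFormula: "$TargetDedicatedNodes = min(10, avg($PendingTasks.GetSample(TimeSpan.FromMinutes(5))));",
    autoscaleEvaluationInterval: TimeSpan.FromMinutes(5));
```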
Step 2: Refactor C# Web API
2.1 Current API Overview
Suppose your current API handles Hangfire jobs like this:
```csharp
[HttpPost("process-data")]
public async Task<IActionResult> ProcessDataAsync()
{
    // Fetch data from MS CRM
    var data = await _crmService.GetDataAsync();

    // Perform heavy data processing
    var processedData = _dataProcessor.Process(data);

    // Update data in MS CRM
    await _crmService.UpdateDataAsync(processedData);

    return Ok();
}
```
2.2 Transform the API to Use Azure Batch
Update the API to offload processing to Azure Batch:
- Install the Azure Batch SDK. Add the following NuGet packages:

```
Install-Package Microsoft.Azure.Batch
Install-Package Azure.Storage.Blobs
```
- Upload Input Data to Azure Blob Storage. Add a helper method to upload data to Azure Blob Storage:

```csharp
private async Task<string> UploadToBlobAsync(string containerName, string fileName, string content)
{
    var blobServiceClient = new BlobServiceClient("<connection-string>");
    var blobContainerClient = blobServiceClient.GetBlobContainerClient(containerName);
    await blobContainerClient.CreateIfNotExistsAsync();

    var blobClient = blobContainerClient.GetBlobClient(fileName);
    await blobClient.UploadAsync(new BinaryData(content), overwrite: true);
    return blobClient.Uri.ToString();
}
```
- Submit Tasks to Azure Batch. Refactor the processing logic to submit tasks to Azure Batch:

```csharp
private async Task SubmitBatchJobAsync(string jobId, string inputBlobUrl, string outputContainerName)
{
    var batchClient = BatchClient.Open(
        new BatchSharedKeyCredentials("<batch-url>", "<account-name>", "<key>"));

    // Create the job on the existing pool
    var cloudJob = batchClient.JobOperations.CreateJob(
        jobId, new PoolInformation { PoolId = "MyPool" });
    await cloudJob.CommitAsync();

    // Create a task that runs the processing executable on a compute node
    var task = new CloudTask(
        "ProcessDataTask",
        $"cmd /c MyProcessor.exe --input {inputBlobUrl} --output {outputContainerName}");
    await batchClient.JobOperations.AddTaskAsync(jobId, task);
}
```
- Monitor Job Status. Add logic to check task progress and update Hangfire:

```csharp
private async Task<bool> IsJobCompleteAsync(string jobId)
{
    var tasks = await _batchClient.JobOperations.ListTasks(jobId).ToListAsync();
    return tasks.All(t => t.State == TaskState.Completed);
}
```
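Note that a Completed task is not necessarily a successful one. A hedged variant of the completion check that also surfaces failures via each task's execution information (assuming the same `_batchClient` field as above):

```csharp
private async Task<bool> IsJobCompleteAsync(string jobId)
{
    var tasks = await _batchClient.JobOperations.ListTasks(jobId).ToListAsync();

    // Surface failed tasks instead of silently treating them as done.
    var failed = tasks.Where(t =>
        t.State == TaskState.Completed &&
        t.ExecutionInformation?.Result == TaskExecutionResult.Failure).ToList();
    if (failed.Any())
        throw new InvalidOperationException(
            $"Job {jobId}: {failed.Count} task(s) failed.");

    return tasks.All(t => t.State == TaskState.Completed);
}
```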
Step 3: Integrate with Hangfire
Modify your Hangfire job to call the refactored API:
```csharp
public async Task RunLongJobAsync()
{
    // Prepare data and upload it as the task input
    var data = await _crmService.GetDataAsync();
    var inputBlobUrl = await UploadToBlobAsync(
        "input-container", "input.json", JsonSerializer.Serialize(data));

    // Submit the batch job
    await SubmitBatchJobAsync("MyCRMJob", inputBlobUrl, "output-container");

    // Poll for completion
    while (!await IsJobCompleteAsync("MyCRMJob"))
    {
        await Task.Delay(TimeSpan.FromMinutes(5));
    }

    // Retrieve results and update CRM
    var output = await DownloadFromBlobAsync("output-container", "output.json");
    await _crmService.UpdateDataAsync(output);
}
```
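The job above calls DownloadFromBlobAsync, which is not defined earlier. A minimal sketch mirroring the upload helper; the connection string placeholder is again an assumption:

```csharp
private async Task<string> DownloadFromBlobAsync(string containerName, string fileName)
{
    var blobServiceClient = new BlobServiceClient("<connection-string>");
    var blobContainerClient = blobServiceClient.GetBlobContainerClient(containerName);
    var blobClient = blobContainerClient.GetBlobClient(fileName);

    // Download the blob content and return it as a string.
    var response = await blobClient.DownloadContentAsync();
    return response.Value.Content.ToString();
}
```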
Step 4: Cost Considerations
Azure Batch Costs
- VM type and count determine compute costs. For example, Standard_D4s_v3 at ~$0.20/hour: 5 nodes for 72 hours is 5 x 72 x $0.20 = ~$72.

Storage Costs
- Input/output blobs: ~$0.02/GB/month.

Networking Costs
- Data egress (if applicable): ~$0.087/GB.
Best Practices
- Task Partitioning: Split large data sets into smaller chunks for parallel execution.
- Retry Logic: Configure automatic retries for failed tasks in Azure Batch.
- Spot VMs for Cost Optimization: Use Spot VMs for non-critical workloads, saving up to 90%.
- Security: Use Azure Managed Identities to securely access resources.
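The first two practices can be combined at submission time: create one CloudTask per input chunk and set retry constraints on each. A sketch under the assumption that the chunk blobs have already been uploaded (the `chunkBlobUrls` list and wall-clock limit are hypothetical):

```csharp
// One task per input chunk, each allowed up to 3 automatic retries.
var tasks = new List<CloudTask>();
for (int i = 0; i < chunkBlobUrls.Count; i++)
{
    var task = new CloudTask(
        $"ProcessChunk-{i}",
        $"cmd /c MyProcessor.exe --input {chunkBlobUrls[i]} --output output-container")
    {
        Constraints = new TaskConstraints(
            maxWallClockTime: TimeSpan.FromHours(2),
            retentionTime: null,
            maxTaskRetryCount: 3)
    };
    tasks.Add(task);
}

// AddTaskAsync accepts a collection, so all chunks are submitted in one call.
await batchClient.JobOperations.AddTaskAsync("MyCRMJob", tasks);
```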
Outcome
By migrating your long-running Hangfire job to Azure Batch, you achieve:
- Faster execution through parallelism.
- Fault-tolerant, scalable infrastructure.
- Significant cost savings, especially with autoscaling and Spot VMs.
Start small, experiment with different configurations, and optimize as you scale. This approach modernizes your CRM job pipeline and sets the stage for handling larger, more complex workloads efficiently.