# Wednesday, December 2, 2020

Recently, I have been working with MinIO – a container-based object storage service. I’ve recorded much of what I’ve learned. The links below will get you up to speed if you want to learn about this technology.

Blog Posts

Getting Started with MinIO Server

Creating and using a MinIO Gateway for Azure

Using the MinIO Java SDK

Managing MinIO with the Amazon S3 SDK

Videos

Creating a MinIO Server

Creating a MinIO Gateway for Azure Blob Storage

Using the MinIO Java Client SDK

Accessing MinIO with the AWS S3 SDK

Wednesday, December 2, 2020 9:15:00 AM (GMT Standard Time, UTC+00:00)
# Tuesday, December 1, 2020

Intro

In my last article, I showed how to manage buckets and objects in MinIO using the MinIO Java SDK.

However, MinIO has the advantage that one can also access it using the Amazon S3 Java API. This is helpful if you are migrating from S3 (a comparable object store hosted by Amazon Web Services) to MinIO.

The code below assumes that the following values are declared and initialized appropriately:

private String endPoint;        // The MinIO endpoint (e.g., "http://127.0.0.1:9000")
private String accessKey;       // The MinIO Access Key
private String secretKey;       // The MinIO Secret Key
private String bucketName;      // A MinIO bucket in which to store objects (e.g., "mybucket")
private String localFileFolder; // A local folder on your file system to upload/download files to/from MinIO (e.g., "c:\files\")
  

In order to use the S3 SDK, your app must have a reference to it. In a Maven project, this is done by adding the following to the <dependencies> section of the project's pom.xml:

<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-s3</artifactId>
    <version>1.11.858</version>
</dependency>
  

AmazonS3 Object

In your code, the first thing you will need is an AmazonS3 object, which has methods for managing your MinIO objects.

Here is the code for creating this object.

public static AmazonS3 getAmazonS3Client(String accessKey, String secretKey, String endPoint) {
    ClientConfiguration clientConfig = new ClientConfiguration();
    clientConfig.setProtocol(Protocol.HTTP);
    AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
    AmazonS3 s3client = AmazonS3ClientBuilder
            .standard()
            .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(endPoint, Regions.US_EAST_1.name()))
            .withPathStyleAccessEnabled(true)
            .withClientConfiguration(clientConfig)
            .withCredentials(new AWSStaticCredentialsProvider(credentials))
            .build();

    return s3client;
}
  

Once you have an AmazonS3 object, you can use it to manage MinIO objects.
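
For example, if the target bucket might not yet exist, a short helper like the one below creates it on demand. (This is a hypothetical addition, not part of the demo project; doesBucketExistV2 and createBucket are standard methods on the AmazonS3 client.)

public void createBucketIfMissing() {
    AmazonS3 s3Client = getAmazonS3Client(accessKey, secretKey, endPoint);
    // doesBucketExistV2 returns false (rather than throwing) when the bucket does not exist
    if (!s3Client.doesBucketExistV2(bucketName)) {
        s3Client.createBucket(bucketName);
        System.out.println("Created bucket: " + bucketName);
    }
}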

Uploading a File

For example, here is code to upload a file to a MinIO bucket:

public void UploadWithS3Client(String fileName) throws IOException {
    AmazonS3 s3Client = getAmazonS3Client(accessKey, secretKey, endPoint);
    String fileToUpload = localFileFolder + fileName;
    try {
        File file = new File(fileToUpload);
        PutObjectRequest putObjectRequest = new PutObjectRequest(bucketName, fileName, file);
        s3Client.putObject(putObjectRequest);
    } catch (AmazonServiceException ase) {
        System.out.println("Error Message: " + ase.getMessage());
    } catch (AmazonClientException ace) {
        System.out.println("Error Message: " + ace.getMessage());
    }
}
  

List Objects

And here is code to list all the objects in a bucket:

public List<String> ListS3Objects() {
    List<String> blobList = new ArrayList<String>();
    System.out.format("Objects in S3 bucket %s:\n", bucketName);
    AmazonS3 s3Client = getAmazonS3Client(accessKey, secretKey, endPoint);
    ListObjectsV2Result result = s3Client.listObjectsV2(bucketName);
    List<S3ObjectSummary> blobs = result.getObjectSummaries();
    for (S3ObjectSummary blob : blobs) {
        blobList.add(blob.getKey());
        System.out.println("* " + blob.getKey());
    }
    return blobList;
}
  

Download a File

And here is code to download one of those objects to your local file system:

public void DownloadFromMinIOWithS3Client(String objectName) {
    System.out.format("Downloading %s from S3 bucket %s...\n", objectName, bucketName);
    AmazonS3 s3Client = getAmazonS3Client(accessKey, secretKey, endPoint);
    try {
        S3Object o = s3Client.getObject(bucketName, objectName);
        S3ObjectInputStream s3is = o.getObjectContent();
        String downloadedFile = localFileFolder + "D_" + objectName;
        FileOutputStream fos = new FileOutputStream(new File(downloadedFile));
        byte[] read_buf = new byte[1024];
        int read_len = 0;
        while ((read_len = s3is.read(read_buf)) > 0) {
            fos.write(read_buf, 0, read_len);
        }
        s3is.close();
        fos.close();
    } catch (AmazonServiceException e) {
        System.err.println(e.getErrorMessage());
        System.exit(1);
    } catch (FileNotFoundException e) {
        System.err.println(e.getMessage());
        System.exit(1);
    } catch (IOException e) {
        System.err.println(e.getMessage());
        System.exit(1);
    }
}
  

As you can see, once you have a reference to the object, the rest is just Java IO code.
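
To put the pieces together, here is a hypothetical usage sketch (the file name is arbitrary, exception handling is omitted, and it assumes Spring has injected the S3Service class from the full listing below as s3Service):

s3Service.UploadWithS3Client("report.txt");              // uploads <localFileFolder>/report.txt to the bucket
List<String> objectNames = s3Service.ListS3Objects();    // prints and returns every object key in the bucket
s3Service.DownloadFromMinIOWithS3Client("report.txt");   // saves the object locally as D_report.txt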

Print File Contents

Finally, here is code to print the contents of a text object stored in MinIO. Again, it is simple Java IO once you have a reference to the object.

    public void PrintObjectContents(String objectName) throws IOException {
        AmazonS3 s3Client = getAmazonS3Client(accessKey, secretKey, endPoint);
        GetObjectRequest rangeObjectRequest = new GetObjectRequest(bucketName, objectName);
        S3Object objectPortion = s3Client.getObject(rangeObjectRequest);
        System.out.println("Printing bytes retrieved:");
        displayTextInputStream(objectPortion.getObjectContent());
    }

    private static void displayTextInputStream(InputStream input) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(input));
        while (true) {
            String line = reader.readLine();
            if (line == null)
                break;

            System.out.println("    " + line);
        }
        System.out.println();
    }
  

Conclusion

Here is the full code that you can find at https://github.com/DavidGiard/MinIO_Java_Demo:

package com.gcast.gcastminio.services;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

import com.amazonaws.AmazonClientException;
import com.amazonaws.AmazonServiceException;
import com.amazonaws.ClientConfiguration;
import com.amazonaws.Protocol;
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.ListObjectsV2Result;
import com.amazonaws.services.s3.model.PutObjectRequest;
import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.services.s3.model.S3ObjectInputStream;
import com.amazonaws.services.s3.model.S3ObjectSummary;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

@Service
public class S3Service {

    // The following are set in application.properties
    @Value("${minio.endPoint}")
    private String endPoint;
    @Value("${minio.accessKey}")
    private String accessKey;
    @Value("${minio.secretKey}")
    private String secretKey;
    @Value("${minio.bucketName}")
    private String bucketName;
    @Value("${localFileFolder}")
    private String localFileFolder;

    public void UploadWithS3Client(String fileName) throws IOException {
        AmazonS3 s3Client = getAmazonS3Client(accessKey, secretKey, endPoint);
        String fileToUpload = localFileFolder + fileName;
        try {
            File file = new File(fileToUpload);

            PutObjectRequest putObjectRequest = new PutObjectRequest(bucketName, fileName, file);
            s3Client.putObject(putObjectRequest);
        } catch (AmazonServiceException ase) {
            System.out.println("Error Message:    " + ase.getMessage());

        } catch (AmazonClientException ace) {
            System.out.println("Error Message: " + ace.getMessage());
        }
    }

    public List<String> ListS3Objects() {
        List<String> blobList = new ArrayList<String>();
        System.out.format("Objects in S3 bucket %s:\n", bucketName);
        AmazonS3 s3Client = getAmazonS3Client(accessKey, secretKey, endPoint);
        ListObjectsV2Result result = s3Client.listObjectsV2(bucketName);
        List<S3ObjectSummary> blobs = result.getObjectSummaries();
        for (S3ObjectSummary blob : blobs) {
            blobList.add(blob.getKey());
            System.out.println("* " + blob.getKey());
        }
        return blobList;
    }

    public void PrintObjectContents(String objectName) throws IOException {
        AmazonS3 s3Client = getAmazonS3Client(accessKey, secretKey, endPoint);
        GetObjectRequest rangeObjectRequest = new GetObjectRequest(bucketName, objectName);
        S3Object objectPortion = s3Client.getObject(rangeObjectRequest);
        System.out.println("Printing bytes retrieved:");
        displayTextInputStream(objectPortion.getObjectContent());
    }

    private static void displayTextInputStream(InputStream input) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(input));
        while (true) {
            String line = reader.readLine();
            if (line == null)
                break;

            System.out.println("    " + line);
        }
        System.out.println();
    }

    public void DownloadFromMinIOWithS3Client(String objectName) {
        System.out.format("Downloading %s from S3 bucket %s...\n", objectName, bucketName);
        AmazonS3 s3Client = getAmazonS3Client(accessKey, secretKey, endPoint);
        try {
            S3Object o = s3Client.getObject(bucketName, objectName);
            S3ObjectInputStream s3is = o.getObjectContent();
            String downloadedFile = localFileFolder + "D_" + objectName;
            FileOutputStream fos = new FileOutputStream(new File(downloadedFile));
            byte[] read_buf = new byte[1024];
            int read_len = 0;
            while ((read_len = s3is.read(read_buf)) > 0) {
                fos.write(read_buf, 0, read_len);
            }
            s3is.close();
            fos.close();
        } catch (AmazonServiceException e) {
            System.err.println(e.getErrorMessage());
            System.exit(1);
        } catch (FileNotFoundException e) {
            System.err.println(e.getMessage());
            System.exit(1);
        } catch (IOException e) {
            System.err.println(e.getMessage());
            System.exit(1);
        }
    }


    public static AmazonS3 getAmazonS3Client(String accessKey, String secretKey, String endPoint) {
        ClientConfiguration clientConfig = new ClientConfiguration();
        clientConfig.setProtocol(Protocol.HTTP);
        AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
        AmazonS3 s3client = AmazonS3ClientBuilder
                .standard()
                .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(endPoint, Regions.US_EAST_1.name()))
                .withPathStyleAccessEnabled(true)
                .withClientConfiguration(clientConfig)
                .withCredentials(new AWSStaticCredentialsProvider(credentials))
                .build();

        return s3client;
    }
}
  

In this article, you learned how to use the Amazon S3 Java SDK to manage objects in MinIO.

Tuesday, December 1, 2020 7:23:00 AM (GMT Standard Time, UTC+00:00)
# Tuesday, November 24, 2020

What is MinIO?


MinIO is an object storage system, similar to Amazon S3 or Azure Blob storage. It runs in Docker containers, which makes it easy to deploy and scale.

In a previous article, I showed you how to create and use a MinIO Server.

In this article, I will show how to create and use a MinIO Gateway for Azure Blob Storage.

MinIO Gateway

A MinIO Server stores files and objects itself. By contrast, a MinIO Gateway points to some other storage repository where the files are stored, but it allows you to interact with those files as if they were stored in MinIO.

Prerequisites

Before you begin, you will need to install Docker Desktop, which you can download for either Windows or Mac.

You will also need an Azure Storage Account. This article explains how to create an Azure Storage Account.

Azure Blob Storage

You will need two pieces of information from your Azure Storage Account: the name of the storage account and the access key.

In the Azure Portal (https://portal.azure.com), you can find the storage account name at the top of the Resource page, as shown in Fig. 1.

mga01-StorageAccountName
Fig. 1

You can find the key on the "Access Keys" blade, as shown in Fig. 2.

mga02-StorageAccountKeys
Fig. 2

Note that there are two keys. Either one will work. Click the [Show Keys] button to view the keys and allow copying to your clipboard.

Creating a MinIO Gateway

A MinIO Gateway for Azure is created with the following command:

docker run -p 9000:9000 --name azure-s3 -e "MINIO_ACCESS_KEY=azurestorageaccountname" -e "MINIO_SECRET_KEY=azurestorageaccountkey" minio/minio gateway azure

where

azurestorageaccountname is the name of the Azure storage account and azurestorageaccountkey is an account key from that same storage account.

You can now log into the MinIO Gateway by opening a browser and navigating to http://127.0.0.1:9000/.

When prompted for your login credentials (Fig. 3), enter the storage account name in the "Access key" field and enter the storage account key in the "Secret Key" field.

mga03-Login
Fig. 3

After a successful login, the MinIO Gateway user interface will display, as shown in Fig. 4.

mga04-MinIO

Fig. 4

Note that this looks exactly like the MinIO Server user interface, described in this article.

In fact, you can create buckets and manage files in a MinIO Gateway exactly as you would in a MinIO server. The only difference is that the objects you manipulate are stored in the corresponding Azure Blob storage, rather than in MinIO. Each bucket is mapped to a Blob Storage container and each file is mapped to a blob.
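
Because the gateway speaks the same S3 protocol as a MinIO Server, client code does not need to change. As a rough sketch (assuming the gateway is running locally on port 9000, using the storage account name and key as credentials, and reusing a helper like the getAmazonS3Client method from my article on managing MinIO with the Amazon S3 SDK), you could list the buckets - that is, the Azure Blob Storage containers - like this:

// Bucket is com.amazonaws.services.s3.model.Bucket; each one maps to a container in the storage account
AmazonS3 s3Client = getAmazonS3Client("azurestorageaccountname", "azurestorageaccountkey", "http://127.0.0.1:9000");
for (Bucket bucket : s3Client.listBuckets()) {
    System.out.println(bucket.getName());
}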

Conclusion

In this article, you learned how to create a MinIO Gateway for Azure.

Tuesday, November 24, 2020 9:31:00 AM (GMT Standard Time, UTC+00:00)
# Monday, November 23, 2020

Episode 636

Omkar Naik on Microsoft Cloud for Health Care

Microsoft Cloud Solution Architect Omkar Naik describes what Microsoft is doing for health care solutions with Azure, Dynamics, Office 365, and other tools and services.

Links:
http://aka.ms/smarterhealth
http://aka.ms/microsoftcloudforhealthcare
http://aka.ms/azure

Monday, November 23, 2020 9:15:00 AM (GMT Standard Time, UTC+00:00)
# Thursday, November 19, 2020

What is MinIO?

MinIO is an object storage system, similar to Amazon S3 or Azure Blob storage. It runs in Docker containers, which makes it easy to deploy and scale.

Because MinIO runs in a Docker container, it requires the installation of Docker.

You can install either the Docker engine or Docker Desktop (available for both Windows and Mac) from the Docker website.

Starting a MinIO Server

Once Docker is installed, use the following command to start a MinIO server:

docker run -p 9000:9000 -e "MINIO_ACCESS_KEY=myAccessKey" -e "MINIO_SECRET_KEY=mySecretKey" minio/minio server /data

You can replace myAccessKey and mySecretKey with just about any string you like. These will be used to log into the MinIO server. Write down these values and keep them in a safe place! You will need them in order to access your server.

After you run the above command, you can access the server's UI by opening a web browser and navigating to

http://127.0.0.1:9000/

(NOTE: Of course, you may choose to run your server on a port other than 9000. If so, modify the docker command above.)
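
For example, to expose the server on local port 9090 instead (an arbitrary choice), map that host port to the container's internal port 9000 and browse to http://127.0.0.1:9090/:

docker run -p 9090:9000 -e "MINIO_ACCESS_KEY=myAccessKey" -e "MINIO_SECRET_KEY=mySecretKey" minio/minio server /data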

You will be prompted to log in, as shown in Fig. 1.

mio01Login
Fig. 1

Enter the access key and secret key you selected in the Docker command above.

After successfully logging in, you will see the MinIO user interface, as shown in Fig. 2.

mio02-MinIOServer
Fig. 2

MinIO organizes objects into buckets, which are analogous to folders in a file system or containers in Azure blob storage. To create a new bucket, click the [+] icon (Fig. 3) in the lower right of the screen.

mio03-PlusButton
Fig. 3

A popup menu will display, as shown in Fig. 4.

mio04-Menu
Fig. 4

Click the [Create Bucket] icon (Fig. 5) to display the "New Bucket" dialog, as shown in Fig. 6.

mio05-CreateBucketIcon
Fig. 5

mio06-BucketName
Fig. 6

In the "New Bucket" dialog, enter a name for your bucket, as shown in Fig. 7. This name must be unique within this MinIO server, must be at least 3 characters long and may consist only of numbers, periods, hyphens, and lower-case letters. 

mio07-BucketName
Fig. 7

Press ENTER to create the bucket. The Bucket will now be listed along the left side, as shown in Fig. 8.

mio08-Bucket
Fig. 8

You can add files to this bucket by again clicking the [+] icon in the lower right to display the popup menu shown in Fig. 9.

mio09-Menu
Fig. 9

Click the "Upload File" icon (Fig. 10) to open a File Selection dialog, as shown in Fig. 11.

mio10-UploadFile
Fig. 10

mio11-SelectFile
Fig. 11

Navigate to and select a file on your local drive and click the [Open] button. The file will be listed within the bucket, as shown in Fig. 12.

mio12-ListFiles
Fig. 12

You can click the […] at the right of the file listing row to expand a menu with options to share, preview, download, or delete the file from MinIO, as shown in Fig. 13.

mio13-Menu
Fig. 13

In this article, I introduced MinIO Server and showed the basics of getting started and using this object storage tool.

Thursday, November 19, 2020 9:02:00 AM (GMT Standard Time, UTC+00:00)
# Monday, November 16, 2020

Episode 635

Rik Hepworth on Azure Governance

Many of the issues around cloud computing have nothing to do with writing code. Asking questions early about expected costs, geographic issues, and technologies to choose can save headaches later.

Rik Hepworth describes this governance - the rules by which we operate the cloud - and how we can better prepare to develop for the cloud.

Links:

http://aka.ms/governancedocs
https://github.com/Azure/azure-policy

Monday, November 16, 2020 10:18:00 AM (GMT Standard Time, UTC+00:00)
# Monday, October 26, 2020

Episode 632

Magnus Martensson on the Cloud Adoption Framework

Magnus Martensson describes the Cloud Adoption Framework - a collective set of guidance from Microsoft - and how you can use it to migrate or create applications in the cloud.

https://docs.microsoft.com/en-us/azure/cloud-adoption-framework

Monday, October 26, 2020 8:13:00 AM (GMT Standard Time, UTC+00:00)
# Thursday, September 24, 2020

GCast 95:

Creating a MinIO Gateway for Azure Blob Storage

Learn how to use MinIO to manage blobs in an Azure Storage Account

Thursday, September 24, 2020 12:25:40 PM (GMT Daylight Time, UTC+01:00)
# Thursday, September 17, 2020

GCast 94:

Creating a MinIO Server

Learn how to create a MinIO server, organize it into buckets, and read and write files to the server.

Thursday, September 17, 2020 9:21:00 AM (GMT Daylight Time, UTC+01:00)
# Monday, September 7, 2020

Episode 625

Peter de Tender on Azure Certification

Azure trainer Peter de Tender talks about what it takes to achieve Azure certification.

Links:

https://microsoft.com/learn
https://www.007ffflearning.com
https://twitter.com/pdtit

Monday, September 7, 2020 1:04:09 PM (GMT Daylight Time, UTC+01:00)
# Monday, August 10, 2020

Episode 621

Donovan Brown on App Innovations

App Innovations is a concept in which new and existing applications are designed to take advantage of what the cloud offers. Donovan Brown talks about some of these advantages and decisions around this strategy.

Links:

https://www.donovanbrown.com/

Monday, August 10, 2020 8:04:00 AM (GMT Daylight Time, UTC+01:00)
# Monday, July 27, 2020

Episode 619

Mete Atamel on Serverless Containers

Google Cloud Developer Advocate Mete Atamel describes some of the tools for managing serverless containers in the cloud. He discusses the advantages of Knative, Tekton pipelines, Buildpacks, and Cloud Run.

Links:

https://knative.dev/

https://cloud.google.com/tekton/

https://buildpacks.io/

https://atamel.dev/

https://cloud.google.com/run/

Monday, July 27, 2020 9:03:00 AM (GMT Daylight Time, UTC+01:00)
# Monday, June 1, 2020

Episode 611

Nik Molnar on Visual Studio Codespaces

Visual Studio Codespaces (formerly Visual Studio Online) is a cloud-based development environment that you can connect to from Visual Studio Code, within a browser, and from Visual Studio (in private preview). PM Nik Molnar describes the capabilities and how it works.

Links:

https://online.visualstudio.com/

https://github.com/nikmd23/ballpark-tracker

Monday, June 1, 2020 9:01:00 AM (GMT Daylight Time, UTC+01:00)
# Monday, March 16, 2020

Episode 602

Jaidev Kunjur on Azure Integration Tools

Jaidev Kunjur of Enkay Technology Solutions discusses some of the integration tools available in Microsoft Azure, such as Logic Apps, API Management, Azure Functions, and Event Grid.

He describes the capabilities of these tools and how his company is using them to solve integration problems for their customers.

https://enkaytech.com/
Monday, March 16, 2020 9:44:52 AM (GMT Standard Time, UTC+00:00)
# Thursday, March 12, 2020

GCast 77:

Connecting Azure Synapse to External Data

Azure Data Warehouse has been re-branded as Azure Synapse. Learn how to add data from an external system to an Azure Synapse database.

Thursday, March 12, 2020 10:07:09 AM (GMT Standard Time, UTC+00:00)
# Thursday, February 6, 2020

GCast 72:

Creating an Azure DevOps Build Pipeline

Learn how to automate a build and test process with an Azure DevOps Build pipeline.

Thursday, February 6, 2020 8:52:00 AM (GMT Standard Time, UTC+00:00)
# Monday, January 27, 2020

Episode 595

Tibi Covaci on Migrating to the Cloud

Tibi Covaci discusses strategies and factors companies need to consider when migrating their applications to the cloud.

Monday, January 27, 2020 8:02:00 AM (GMT Standard Time, UTC+00:00)
# Thursday, June 20, 2019

GCast 53:

Creating a Data Warehouse in Azure

Learn how to create a new SQL Server data warehouse in Microsoft Azure.

Thursday, June 20, 2019 9:24:00 AM (GMT Daylight Time, UTC+01:00)
# Monday, May 20, 2019

Episode 564

David Makogon on Streaming Data

David Makogon talks about streaming data and the tools to help you make it happen.

David on Twitter

Monday, May 20, 2019 9:10:00 AM (GMT Daylight Time, UTC+01:00)
# Monday, April 22, 2019

Episode 560

Frank Gill on Azure SQL Database Managed Instances

DBA Frank Gill discusses Azure SQL Database Managed Instances - a cloud-based managed database service. He describes what they are, how they differ from Azure SQL Databases, and when it is appropriate to consider them.

Links:

https://skreebydba.com/
https://twitter.com/skreebydba

Monday, April 22, 2019 9:49:00 AM (GMT Daylight Time, UTC+01:00)
# Monday, April 1, 2019

Episode 557

Brent Stineman on the Evolution of Serverless

Brent Stineman describes Serverless cloud technologies and how they have evolved to make applications more flexible.

Monday, April 1, 2019 9:22:00 AM (GMT Daylight Time, UTC+01:00)
# Wednesday, March 6, 2019

The Internet of Things (or IoT) has revolutionized the way we think of computing.

In the past, computers were self-contained, general purpose machines that could load complex operating systems, run multiple applications, and perform a wide variety of tasks. They could communicate with one another in order to either share data or distribute workloads.

Now, tiny computers can be found in a huge number of devices around one's home or workplace. When these devices are connected to the cloud, they become far more powerful because much of the processing and storage traditionally done on the computer is moved to the massively-scalable cloud.

At home, refrigerators, thermostats, and automobiles contain computers that send and receive information, making them better able to adapt to the world around them.

Businesses attach devices to manufacturing machines, vehicles, and weather detectors to monitor local conditions and productivity. Capturing data from these devices allows them to respond to anomalies that may indicate a need for action. Imagine a sensor on a factory floor that monitors the health of an assembly line and alerts a repair team if the line breaks down - or, better still, if the data indicates a strong probability that it will break down soon. Imagine a shipping company being able to track the exact location and health of every one of its trucks and to re-route them as necessary.

Industries as disparate as transportation, clothing, farming, and healthcare have benefited from the IoT revolution.

Cloud tools, such as Microsoft Azure IoT Hub, allow businesses to capture data from many devices and then store, analyze, and route that data to a particular location or application. As applications become more complex, these cloud tools become both more powerful and simpler to use.

These tools offer things like real-time analytics, message routing, data storage, and automatic scalability.
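
As a very rough sketch of what capturing that data can look like from the device side, the code below sends a single telemetry message to an IoT Hub. It assumes the Azure IoT device SDK for Java (the iot-device-client 1.x package) and a device that has already been registered with a hub; the connection string and payload are placeholders.

import com.microsoft.azure.sdk.iot.device.DeviceClient;
import com.microsoft.azure.sdk.iot.device.IotHubClientProtocol;
import com.microsoft.azure.sdk.iot.device.Message;

public class SendTelemetry {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string copied from the IoT Hub device registry
        String connectionString = "HostName=<your-hub>.azure-devices.net;DeviceId=<your-device>;SharedAccessKey=<your-key>";

        DeviceClient client = new DeviceClient(connectionString, IotHubClientProtocol.MQTT);
        client.open();

        // Send one JSON telemetry message and log the hub's response status
        Message message = new Message("{\"temperature\": 22.5}");
        client.sendEventAsync(message,
                (responseStatus, context) -> System.out.println("IoT Hub responded: " + responseStatus),
                null);

        Thread.sleep(2000); // give the asynchronous send time to complete
        client.closeNow();
    }
}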

This IoT revolution has enabled companies to capture huge amounts of data. Tools like Machine Learning allow these same companies to find patterns in that data to facilitate things like predictive analysis.

The cost of both hardware and cloud services has fallen dramatically, which has accelerated this trend.

The trend shows no signs of slowing and companies continue to think of new ways to connect devices to the cloud and use the data collected.

The next series of articles will explore how to process IoT data using the tools in Microsoft Azure.

Wednesday, March 6, 2019 9:46:00 AM (GMT Standard Time, UTC+00:00)
# Sunday, August 12, 2018

Here is my presentation "How Cloud Computing Empowers a Data Scientist" that I delivered in June at IT Camp in Cluj-Napoca, Romania.

ITCamp 2018 - David Giard - How Cloud Computing Empowers a Data Scientist from ITCamp on Vimeo.

Sunday, August 12, 2018 9:14:00 AM (GMT Daylight Time, UTC+01:00)
# Thursday, July 19, 2018

GCast 7:

Azure SQL Database

Azure MySQL Database

Thursday, July 19, 2018 8:46:00 AM (GMT Daylight Time, UTC+01:00)
# Thursday, July 12, 2018

GCast 6:

Azure SQL Database

How to create an Azure SQL database in the Azure portal.

Thursday, July 12, 2018 9:16:00 AM (GMT Daylight Time, UTC+01:00)
# Saturday, June 23, 2018

On May 19, I delivered a presentation titled "How Cloud Computing Empowers a Data Scientist" at the Chicago AI & Data Science Conference.

I described ways that the cloud has accelerated the fields of data science, machine learning, and artificial intelligence; and I gave examples of Azure tools that facilitate development in these fields.

You can watch the video below or at https://youtu.be/H19IW6nykZo

Saturday, June 23, 2018 8:17:00 AM (GMT Daylight Time, UTC+01:00)
# Thursday, June 7, 2018

GCast 1:

A Lap Around Cloud and Azure

What is cloud computing, why is it important, and how does Microsoft Azure fit in?

Thursday, June 7, 2018 5:22:00 PM (GMT Daylight Time, UTC+01:00)
# Monday, April 16, 2018
Monday, April 16, 2018 11:03:00 AM (GMT Daylight Time, UTC+01:00)
# Monday, June 26, 2017
Monday, June 26, 2017 12:55:57 PM (GMT Daylight Time, UTC+01:00)
# Monday, June 19, 2017
# Monday, April 24, 2017
Monday, April 24, 2017 12:54:00 PM (GMT Daylight Time, UTC+01:00)
# Monday, April 3, 2017
Monday, April 3, 2017 4:51:06 PM (GMT Daylight Time, UTC+01:00)
# Tuesday, February 7, 2017

Managing Big Data takes a lot of processing power. Data often needs to be captured, scrubbed, merged, and queried, and each of these steps can take many hours of compute time. But often they can be performed in parallel - reducing the amount of time required, but increasing the number of computers needed.

You could buy a bunch of computers, cluster them, and process your data on that cluster. But this is expensive, and those computers are likely to sit idle most of the time.

Cloud computing tends to be an ideal solution for most Big Data processing because you can rent the servers you need and pay for them only while they are running.

Microsoft Azure offers a full suite of Big Data tools. These tools are based on the popular Hadoop open source project and are collectively known as "HDInsight".

HBase

HBase is a NoSQL data store that is optimized for big data. Unlike SQL Server and other relational databases, the database does not enforce referential integrity, pre-defined schemas, or auto-generated keys. The developer must code these features into the client application. Because the database doesn't need to worry about these things, inputting data tends to be much faster than in a relational database.

HBase also can be scaled to store petabytes of data.
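
To illustrate, here is a rough sketch of writing one row with the standard HBase Java client. The table name, row key, and column names are made up for this example, and connection settings are expected to come from an hbase-site.xml on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseWriteDemo {
    public static void main(String[] args) throws Exception {
        // Cluster connection details (Zookeeper quorum, etc.) are read from hbase-site.xml
        Configuration config = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("SensorReadings"))) {
            // Row key and column layout are up to the client application - HBase enforces no schema
            Put put = new Put(Bytes.toBytes("device42-2017-02-07T18:00:00"));
            put.addColumn(Bytes.toBytes("data"), Bytes.toBytes("temperature"), Bytes.toBytes("21.5"));
            table.put(put);
        }
    }
}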

Storm

Apache Storm is a framework that allows you to build workflow engines against real-time data. This is ideal for scenarios like collecting IoT data. The Storm topology consists of a Stream, which is a container that holds a Spout and one or more Bolts. A Spout is a component that accepts data into the Stream and hands it off to Bolts. Each Bolt takes in data; performs some discrete actions, such as cleaning up the data or looking up values from IDs; and passes data on to one or more other Bolts. Data is passed as "Tuples", which are sets of name-value pairs formatted as JSON. You can write your code in C#, Java, or Python, and a Visual Studio template helps you create these components.
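
To make the Bolt idea concrete, here is a rough sketch of a simple Java Bolt using the Apache Storm API (the field names and the unit conversion are invented for illustration); it reads values from each incoming Tuple, transforms them, and emits a new Tuple for the next Bolt:

import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// A Bolt that converts a Celsius reading from each incoming Tuple to Fahrenheit
// and emits the device id along with the converted value to the next Bolt.
public class ConvertTemperatureBolt extends BaseBasicBolt {
    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        double celsius = input.getDoubleByField("temperature");
        double fahrenheit = celsius * 9.0 / 5.0 + 32.0;
        collector.emit(new Values(input.getStringByField("deviceId"), fahrenheit));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("deviceId", "temperatureF"));
    }
}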

Hive

Hive is a data warehouse. With it, you can query NoSQL data (such as HBase) and relational data (such as SQL Server). Hive ships with a query language - HiveQL - that is similar to SQL. Where HiveQL falls short, you can even write user-defined functions to perform more complex calculations.

Spark

Spark is a general-purpose engine for processing, analyzing, and visualizing data. In Spark, you can write code in R, Python, or Scala. Jupyter notebooks are a popular interactive tool that allows you to create templates consisting of text and code, so that you can generate real-time reports; they support both Python and Scala. Spark also ships with a number of libraries that make it easier to connect to data, create graphs, and perform a number of other tasks.

Clusters

Each of the services described above supports running in clusters of servers. In a cluster, these servers process data in parallel, greatly reducing the amount of time required. You can easily create a cluster in the Azure portal, or you can script its creation in PowerShell or the Azure CLI.

The ease of creating clusters is a big advantage of running HDInsight over deploying your own Hadoop servers and clustering them yourself. Of course, the other advantage is that you do not have to purchase and maintain servers that are only used occasionally, which can be a big cost savings.

Warning

One word of caution about using these services: you pay for each server in a cluster by the minute, and this can quickly add up. Typically, you don't need to keep a cluster running for very long in order to complete its tasks, so it is a good idea to shut clusters down when their work is finished. Because of this, it is worth scripting the creation and deletion of your cluster to make these tasks easy.

Tuesday, February 7, 2017 6:08:01 PM (GMT Standard Time, UTC+00:00)
# Monday, January 23, 2017
Monday, January 23, 2017 11:43:00 AM (GMT Standard Time, UTC+00:00)
# Monday, October 24, 2016
Monday, October 24, 2016 9:53:00 AM (GMT Daylight Time, UTC+01:00)
# Tuesday, October 11, 2016

Microsoft Cognitive Services provides a number of APIs to take advantage of Machine Learning. One of the simplest APIs to use is Sentiment Analysis.

Sentiment Analysis examines one or more text entries and determines whether each text reflects a positive or negative sentiment. It returns a number between 0 and 1: A higher number indicates a more positive sentiment, while a lower number indicates a more negative sentiment.

To use this service, POST a JSON message to the following URL: https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment

Unlike some Cognitive Services URLs, this one takes no querystring parameters.

In the HTTP header, pass two values: Content-Type and Ocp-Apim-Subscription-Key.

For Content-Type, pass "application/json".

For Ocp-Apim-Subscription-Key, pass your Text Analytics subscription key, in the form:

Ocp-Apim-Subscription-Key:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

where xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx is your key. You can find your key at https://www.projectoxford.ai/Subscription?popup=True

In the body, pass a JSON object that contains an array of documents. Each document contains 3 properties:

language - the Language of the text you want to analyze. Valid values are "English", "Spanish", "French", and "Portuguese".

id - A string that uniquely identifies this document. Used to match the return value to the corresponding text.

text - the text to analyze

Below is a sample JSON body:

{
  "documents": [
    {
      "language": "English",
      "id": "text01",
      "text": "This is a great day."
    }
  ]
}

After you POST this to the URL, you should expect a response that includes JSON. If all goes well, you will receive an HTTP 200 response and the returned JSON will include an array of documents (the same number that you passed in the Request body). Each Response document will contain

id - matching the id of the document in the Request document.

score - A value between 0 and 1. The higher the score, the more positive the sentiment of the text; The lower the score, the more negative the text sentiment.

You may also receive an array of errors. Each error contains the following properties:

id - matching the id of the document in the Request document.

message - a detailed error message.

Below is a sample response JSON body:

{
  "documents": [
    {
      "score": 0.95412,
      "id": "text01"
    }
  ]
}

Here is a bit of code to call this API from JavaScript. I am using jQuery's Ajax method and displaying output in a div, like the following:

<div id="OutputDiv"></div> 

var subscriptionKey = "566375db01ad43dc8f62dcc8dc3e5c1f";
var textToAnalyze = "Life is beautiful";

var webSvcUrl = "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment";

var outputDiv = $("#OutputDiv");
outputDiv.text("Thinking...");

$.ajax({
    type: "POST",
    url: webSvcUrl,
    headers: { "Ocp-Apim-Subscription-Key": subscriptionKey },
    contentType: "application/json",
    data: '{"documents": [ { "language": "en", "id": "text01", "text": "' + textToAnalyze + '" }]}'
}).done(function (data) {
    if (data.errors.length > 0) {
        outputDiv.html("Error: " + data.errors[0]);
    }
    else if (data.documents.length > 0) {
        var score = data.documents[0].score;
        if (score > 0.5) {
            outputText = "That is a Positive thing to say!";
        }
        else {
            outputText = "That is a Negative thing to say!";
        }
        outputDiv.html(outputText);
    }
    else {
        outputDiv.text("No text to analyze.");
    }
}).fail(function (err) {
    $("#OutputDiv").text("ERROR! " + err.responseText);
});

Tuesday, October 11, 2016 6:48:00 AM (GMT Daylight Time, UTC+01:00)
# Saturday, May 21, 2016

Last month, I had the privilege of attending the AWS Summit in Chicago. It was a great experience for me because, although I do a lot of work with cloud computing, I have very little experience with the Amazon Web Services (AWS) platform.

The most interesting session I attended was about a service called "Aurora" (Amazon tends to give all their services catchy names). This is a relational database that looks and acts almost exactly like MySQL but runs much faster. The official product page brags that Aurora is a "MySQL-compatible relational database with 5X performance", however the session I attended claimed that they found cases in which Aurora was 63 times faster than MySQL. The presenters didn't share details of those cases, but even if results are only a fraction of that speed, it's still an impressive performance improvement.

Because Aurora is MySQL-compliant, you should be able to plug it into any application and use it just like MySQL. The SQL syntax is identical and the management tools will be familiar to anyone used to managing MySQL.
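
For example, a plain JDBC connection works against Aurora exactly as it would against MySQL. The sketch below assumes the standard MySQL Connector/J driver is on the classpath; the cluster endpoint, database name, and credentials are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class AuroraDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical Aurora cluster endpoint; the standard MySQL driver and tooling work unchanged
        String url = "jdbc:mysql://mycluster.cluster-abc123.us-east-1.rds.amazonaws.com:3306/mydb";
        try (Connection conn = DriverManager.getConnection(url, "admin", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT NOW()")) {
            while (rs.next()) {
                System.out.println("Server time: " + rs.getString(1));
            }
        }
    }
}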

Of course, the fact that Aurora is hosted on a cloud platform like AWS gives it the advantage of high availability and flexible scaling that cloud computing offers.

Since most of my cloud computing experience is with Microsoft Azure, I tend to use Azure as a reference point for the services I saw at this summit. I was drawn to Aurora in part because I'm not aware of the same offering in Microsoft Azure.

MySQL as a service is available on Azure, but it's offered and supported by ClearDb - a third party.  If you want better performance or scalability on Azure than that offered by ClearDb, you will need to either switch to a different database or create a Virtual Machine and install MySQL on that, in which case you would be using Infrastructure as a Service, instead of Software as a Service.

In many cases, this is a non-issue. If you are building a new application, you have the flexibility to choose your preferred database technology. MySQL and SQL server have very similar languages; and, although I won't get into a debate here as to which is "better", it would be difficult to argue that SQL server is significantly less reliable or enterprise-ready than MySQL.

But there are times when you don't have a choice of database technologies. For example, if you have a large legacy application that you want to migrate to Azure, it may be a daunting task to migrate every stored procedure and SQL statement to use T-SQL.  Or if you are using a framework that is specifically built on top of MySQL, it makes sense to use that database, rather than re-writing the entire data access layer. Luckily, some frameworks have alternative data access layers. For example, Project Nami is a data access layer for WordPress that uses SQL Server as a data store, rather than MySQL.

Although the various cloud computing companies follow one another and are likely to build a service when they see traction on their competitor's platform, I find it interesting to see these gaps in offerings.

Saturday, May 21, 2016 11:28:00 AM (GMT Daylight Time, UTC+01:00)
# Monday, March 28, 2016
Monday, March 28, 2016 10:27:00 AM (GMT Daylight Time, UTC+01:00)
# Sunday, March 13, 2016

Project Oxford is a set of APIs that take advantage of Machine Learning to provide developers with capabilities such as computer vision, speech recognition, and language understanding.

These technologies require Machine Learning, which requires a lot of computing power and a lot of data. Most of us have neither, but Microsoft does and has used it to create the APIs in Project Oxford.

Project Oxford provides APIs to analyze pictures and voice and provide intelligent information about them.

There are three broad categories of services: Vision, Voice, and Language.

The Vision APIs analyze pictures and recognize objects in those pictures. For example, several Vision APIs are capable of recognizing faces in an image. One analyzes each face and deduces that person's emotion; another can compare 2 pictures and decide whether or not 2 photographs are of the same person; a third guesses the age of each person in a photo.

The Speech APIs can convert speech to text or text to speech. They can also recognize the voice of a given speaker (if you want to use that for authentication in your app, for example) and infer the intent of the speaker from his words and tone.

The Language APIs seem more of a grab bag to me. A spell checker is smart enough to recognize common proper names and homonyms.

All these APIs are currently in Preview but I've played with them and they appear very solid. Many of them even provide a confidence factor to let you know how confident you should be in the value returned. For example, 2 faces may represent the same person, but it helps to know how closely they match.

You can use these APIs. To get started, you need a Project Oxford account, but you can get one for free at projectoxford.ai.

Each API offers a free option that restricts the number and/or frequency of calls, but you can break through that boundary for a charge.

You can also find documentation, sample code, and even a place to try out each API live in your browser at projectoxford.ai.

You call each one by passing and receiving JSON to a RESTful web service, but some of them offer an SDK to make it easier to make that call from a .NET application.

You can see a couple of fun applications of Project Oxford at how-old.net (which guesses the ages of people in photographs) and what-dog.net (which identifies the breed of dog in a photo).

Sign up today and start building apps. It’s fun and it’s free!

Sunday, March 13, 2016 3:14:12 AM (GMT Standard Time, UTC+00:00)
# Wednesday, October 14, 2015

I recently spoke with Data Scientist Richard Conway from Elasta Games, who described how his company does analysis of online games.

You can watch and listen to this interview below.

Wednesday, October 14, 2015 12:36:00 PM (GMT Daylight Time, UTC+01:00)
# Monday, February 24, 2014
Monday, February 24, 2014 6:01:00 PM (GMT Standard Time, UTC+00:00)
# Monday, January 28, 2013
Monday, January 28, 2013 3:40:30 PM (GMT Standard Time, UTC+00:00)
# Thursday, August 19, 2010
Thursday, August 19, 2010 1:41:00 AM (GMT Daylight Time, UTC+01:00)
# Thursday, May 27, 2010

Tonight, I attended the Cloud Camp Detroit, which was built primarily around Open Spaces discussion and a panel discussion in front of an audience. The basics of cloud computing, specific cloud implementations and issues such as security were discussed. The "eyes-front" presentations were limited to half a dozen lightning talks.

I had a chance to interact with a lot of people far more experienced than me in this area. Many of them work outside the .Net world, so talking with them helps me see the software industry in a different perspective.

I filled in for a sick friend to deliver a presentation on Windows Azure. Below are the slides from my presentation. Thanks to Abe Pachikara of Microsoft for supplying the slides.

Thursday, May 27, 2010 5:04:45 AM (GMT Daylight Time, UTC+01:00)