Chapter 6. Azure Machine Learning

Machine learning (ML) is likely top of mind these days given the onset of game-changing artificial intelligence, from autonomous driving to protein structure prediction.

In bioinformatics, we can use ML to predict drug sensitivity, classify an organism’s resistance to a drug, or model mutations in a gene.

Any of these use cases often requires a considerable amount of data. Thus, the cloud is a perfect place for ML as we can get access to ample data from our genomics data lake and scale our modeling tasks with cloud compute services.

In this chapter, we’ll speak more broadly about machine learning, focusing on how to use Azure Machine Learning to scalably train and track your modeling tasks. Plus, despite its name, the Azure Machine Learning system is great for non-ML activities as well, including bioinformatics tasks.

How to Scale Machine Learning Tasks

When we start training ML models on our local workstation, we experience slow model training due to resource limits. Also, while performing a larger-scale search for a great predictive model, the process could be executing in an inefficient way.

During the process of performing k-fold cross-validation or tuning an algorithm’s hyperparameters, the code may be executing iterations of the process in serial—that is, one after another, waiting on the previous step to complete before running the next iteration. This is unnecessary as we don’t need the results from a previous step before proceeding to ...

Get Genomics in the Azure Cloud now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.