Chapter 7. Managing Data Within Workflows

Today, a complete body of work is rarely accomplished with a single job or project. Think about a typical CI/CD pipeline. You will usually have a job that does the building, a job for packaging, multiple jobs for testing, and so on. Even though these are individual jobs, they still need to pass data and files between them. For example, the build job produces a module from source code that then needs to be tested and combined with other modules into a deliverable for the customer. Or jobs in a workflow may use outputs from a setup job as inputs or as dependencies for their configuration.

For this transfer of data and content to work, the separate jobs must have access to the intermediate results produced along the way: the various inputs, outputs, and files generated throughout the run of the larger process.

GitHub Actions provides syntax for capturing, sharing, and accessing inputs and outputs between jobs and steps in workflows. Additionally, it provides functionality for managing intermediate files or modules, which it calls artifacts. Actions provides the ability to persist artifacts created during a workflow run. Jobs within the same workflow can then access the artifacts and use them, like the projects in a pipeline.
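As a rough illustration of the mechanisms just described, the following minimal sketch shows a `build` job that publishes a step output, exposes it as a job output, and uploads an artifact. A dependent `test` job then reads that output and downloads the artifact. The job IDs (`build`, `test`), the output name `version`, the artifact name `module`, and the placeholder build command are all hypothetical. `$GITHUB_OUTPUT`, `needs`, `actions/upload-artifact`, and `actions/download-artifact` are standard GitHub Actions features.

```yaml
# Hypothetical workflow: job IDs, output names, and artifact names are illustrative.
name: build-and-test

on: push

jobs:
  build:
    runs-on: ubuntu-latest
    # Promote a step output to a job output so other jobs can read it.
    outputs:
      version: ${{ steps.meta.outputs.version }}
    steps:
      - uses: actions/checkout@v4

      # A step publishes an output by appending name=value to $GITHUB_OUTPUT.
      - id: meta
        run: echo "version=1.0.${{ github.run_number }}" >> "$GITHUB_OUTPUT"

      # Placeholder "build" that just writes a file; replace with a real build.
      - run: mkdir -p dist && echo "module ${{ steps.meta.outputs.version }}" > dist/module.txt

      # Persist the build result as an artifact of this workflow run.
      - uses: actions/upload-artifact@v4
        with:
          name: module
          path: dist/

  test:
    runs-on: ubuntu-latest
    # needs: waits for build to finish and exposes its job outputs.
    needs: build
    steps:
      # Fetch the artifact uploaded by the build job into ./dist.
      - uses: actions/download-artifact@v4
        with:
          name: module
          path: dist

      - run: |
          echo "Testing version ${{ needs.build.outputs.version }}"
          cat dist/module.txt
```

Small values such as version strings travel well as outputs. Files and build products belong in artifacts, because outputs are plain strings.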

Actions also provides the ability to cache collections of content to speed up future runs. This can be done by explicitly calling the cache action or, in many cases, by using a setup action ...
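The fragment below is a minimal sketch of the explicit approach. It assumes a hypothetical Node.js project whose npm download cache lives in `~/.npm` and whose dependencies are pinned in `package-lock.json`. The `path`, `key`, and `restore-keys` inputs belong to `actions/cache`. The key changes whenever the lockfile changes. `restore-keys` allows a partial match against an older cache when no exact match exists.

```yaml
# Hypothetical steps for a Node.js project; the cache path and lockfile are assumptions.
steps:
  - uses: actions/checkout@v4

  # Restore ~/.npm if a cache with a matching key exists; save it after the job otherwise.
  - uses: actions/cache@v4
    with:
      path: ~/.npm
      # Exact key: OS plus a hash of the lockfile, so dependency changes invalidate it.
      key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
      # Fallback prefix used when no exact key match is found.
      restore-keys: |
        ${{ runner.os }}-npm-

  - run: npm ci
```

Some setup actions wrap this pattern. For example, `actions/setup-node` accepts a `cache: npm` input that handles the key and path itself.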
