
Introduction

Organizations adopting containerized application delivery platforms commonly wonder if containerization is right for their legacy applications, asking:

“Can we containerize our legacy applications? What issues should we expect?”

In short…

“Yes, you can containerize your legacy applications and there are probably fewer adjustments required than you think.”

This post will:

  • summarize the most-common organizational and technical concerns in the containerization process
  • describe approaches for addressing concerns with containerized delivery of legacy applications

This post will not provide an exhaustive list of concerns and solutions, but intends to show that the concerns related to containerizing legacy applications are well-known and addressable using features of container platform tooling, process, and training.

Organizational Concerns

When re-deploying a legacy application to a containerized model, recognize perceived and actual risk to:

  • business value currently produced by the application
  • existing development, debugging, and operational processes

“Containerizing a legacy application is a microcosm for defining and allocating responsibilities within a Tech organization.”

 

 

xkcd (1172) ‘Workflow’

Adopting containers for deploying legacy applications will trigger many conversations about how to standardize deployment and operational activities.  Conversations will largely concern how:

  • artifacts are built and promoted
  • applications retrieve runtime configuration, particularly secrets
  • application and operational data moves in and out of containers

When adopting containers, expect and prompt questions about how to deal with application-specific bits of deployment and operational glue.  Prefer generic solutions applicable for the entire container ecosystem over propagating existing application-specific variation into the container.  Generic solutions will be simpler in the long run due to lower complexity, maintenance, and training costs.  Start by capturing the following for each step in the deployment process:

  • the inputs and outputs of the step
  • party responsible for each input and output

Example of an annotated Application Image Build step:

App Image Build Process
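The inputs, outputs, and responsible parties for each step can also be recorded as data so they are easy to review and compare across applications.  The following is a minimal sketch of that idea; the class names, the step, and the team names are all hypothetical and only illustrate the shape of such an inventory.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Artifact:
    """An input or output of a deployment step and the party that owns it."""
    name: str
    owner: str


@dataclass
class DeploymentStep:
    """One step of the deployment process with annotated inputs and outputs."""
    name: str
    inputs: List[Artifact] = field(default_factory=list)
    outputs: List[Artifact] = field(default_factory=list)

    def owners(self) -> Set[str]:
        """Return every party responsible for an input or output of this step."""
        return {artifact.owner for artifact in self.inputs + self.outputs}


# Hypothetical annotation of an application image build step.
image_build = DeploymentStep(
    name="app image build",
    inputs=[
        Artifact("application package", owner="application team"),
        Artifact("base image", owner="platform team"),
    ],
    outputs=[Artifact("application image", owner="application team")],
)
```

Calling image_build.owners() lists the parties who need to be part of the conversation about that step.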

Business Value

Any time an application deployment changes, there is risk that the business value provided by the application will be affected.  Revenue, cost efficiency, or productivity may be lost and so the organization will be understandably nervous.

Careful and robust functional and load testing can build confidence that the containerized application deployment performs as well as or better than the legacy deployment.  Automating these application tests will ensure that confidence in the application’s deployment is maintained over time.

Processes

The organization may not like the status-quo of an application’s deployment, but that doesn’t mean they want to change!

Changing an application’s deployment model is an opportunity to improve organizational as well as operational efficiency, but only if the necessary processes, knowledge, and support for the change are in place.

Ensure there are reasonable, modern implementations of key processes:

  • local dev experience: application engineers should be able to configure, build, and run the containerized application locally; usually simple to accomplish with docker-compose, captain, or shell scripts
  • deployment: application deployments must be automated with an audit trail of what was deployed and when
  • debugging: application and operations engineers need ways to view application logs in real-time and possibly restart instances or attach debuggers
  • operations: a scalable approach to monitoring application processes, not just hosts, must be in place with a facility for alerting on the application’s business and technical KPIs

Training

Ensure the larger organization understands who is responsible for what with the new deployment model, particularly operational responsibilities.  Train engineers in all core technical concepts and processes, ensuring engineers know how to fulfill their responsibilities in the new model.  Create introductory content, ‘Getting Started’ guides, checklists, clone-and-own examples, and group-chat channels to get customers up-to-speed quickly.

Traction & Feedback

Containerizing an application is a great opportunity to improve many processes, but each change should be accompanied by training, evangelism, and eager incorporation of customer feedback to ensure the change proceeds well.  ‘Get Started Using Lean to Create Internal Technical Products’ has some helpful advice for ensuring process and technical changes actually yield improvements.

Technical Concerns

How do I even containerize this?

The process of containerizing a legacy application is very similar to containerizing a greenfield application.  However, the legacy application’s interfaces and integrations may not be as clean as a greenfield application’s, reflecting:

  • differences in architectural style and organizational maturity over time
  • the organic accumulation of features and fixes

The following sections describe potentially challenging technical concerns and recommended approaches for addressing them.

Dependencies

Identifying an application’s runtime dependencies is the first and most important step to containerizing an application.

Examples of application runtime dependencies:

  • software packages: python, Java Development Kit (JDK), Tomcat, OpenSSL
  • configurations: licenses, cryptographic keys and certificate chains, hostnames of service dependencies

Discover these dependencies by consulting:

  • application, release, and operations engineers
  • configuration management automation
  • software configuration files and documentation
  • deployed systems using forensic methods
    • reverse-engineer a build script and package manifest for an existing system with blueprint
    • packages: list packages installed via the system’s package manager using rpm or dpkg-query
    • shared libraries: identify libraries in use by the application using ldd or pmap
    • filesystem: use lsof, or find with its atime and mtime options, to determine which files are being accessed and modified
    • network: periodic netstat -a, tcpdump, active connection analysis at network devices
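As an illustration of the forensic approach, the shared-library check can be scripted so its results are captured alongside the package list.  The sketch below assumes the common Linux ldd output format of "name => path (address)"; the function names are hypothetical, and libraries reported as "not found" are skipped here rather than recorded.

```python
import re
import subprocess
from typing import Dict

# Matches lines such as "libssl.so.1.1 => /usr/lib/libssl.so.1.1 (0x00007f...)".
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")


def parse_ldd_output(text: str) -> Dict[str, str]:
    """Map each resolved shared library name to the path it was loaded from."""
    libraries = {}
    for line in text.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            libraries[match.group(1)] = match.group(2)
    return libraries


def shared_libraries(binary_path: str) -> Dict[str, str]:
    """Run ldd against a binary on the legacy host and parse the result."""
    result = subprocess.run(
        ["ldd", binary_path], capture_output=True, text=True, check=True
    )
    return parse_ldd_output(result.stdout)
```

The resolved paths can then be checked against the package manager's file ownership data to decide which packages the container image needs.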

Address application software dependencies by installing the necessary packages and files in the application’s container image.

Note: Machine image to Docker (application) image converters are not widely used within the community at this time.  This might be surprising since a base image can be bootstrapped by ADD’ing a backup of (a portion of) the source host to an appropriate base image to create an equivalent filesystem.  However, this approach is a bit of a dead end since it is most useful for migrating Pet/work-of-art hosts of unknown history, and the tooling for maintaining containers created from such images is even less developed than it is for hosts.  See Containers are Not VMs (Mike Coleman, Docker Inc) for a deeper explanation of the difference between hosts and (application) containers.  If you would still like to proceed with host-to-container conversion, see:

Filesystem Dependencies

Filesystems often present the biggest dependency challenge.  Look for:

  • application log directories
  • application data such as database files or ‘mailbox’ directories used for input and output of application-level data

The recommended approaches for dealing with application log directories are:

  1. configure the application to log to stdout and stderr instead of a file, and configure Docker Engine to route logs to a platform-managed logging service available on the network
  2. bind-mount a log directory from the host into the container for the application to log into, and deploy a platform-managed agent to ship logs to a platform-managed logging service and prune logs on the host

Applications should not write logs directly to the container filesystem because the logs will be difficult to reach and will eventually fill up the container’s filesystem.
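For an application that happens to be written in Python, the first approach might look like the sketch below.  It uses only the standard logging module; the function name, logger name, and log format are assumptions made for illustration, and other languages and logging frameworks have equivalent console handlers or appenders.

```python
import logging
import sys


def configure_stdout_logging(level: int = logging.INFO) -> logging.Logger:
    """Send all application log records to stdout instead of a log file."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    root = logging.getLogger()
    root.handlers = [handler]  # replace any handlers configured earlier
    root.setLevel(level)
    return root


configure_stdout_logging()
# "legacy_app" is a hypothetical logger name.
logging.getLogger("legacy_app").info("order batch processed")
```

Once the application writes to stdout, the container runtime captures the stream and the platform decides where the logs go.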

Application-level data dependencies will be given expanded treatment in the next section.

Application Data

Applications that read and write application data from the filesystem present a challenge because containers are typically recreated when application updates are deployed, resulting in loss of any data stored inside the old application container.

The solution is to externalize the application data from the container using:

  • a Docker volume
  • cloud-based storage or message queue

Externalizing to Volumes

Externalizing data to a Docker volume is a small change that can be done when containerizing the application without requiring application changes.

Docker volumes can be provided to the application container in multiple ways:

  • bind-mount host directories into the app container
  • create and integrate a named persistent volume, optionally using a driver backed by network storage

Careful consideration must be given to the data volume’s:

  • safety
  • security
  • availability
  • performance

The simplest and recommended way to start running stateful applications in containers is to bind-mount directories from the host into the application container, so the application’s existing operational processes for monitoring and protecting the application’s data can be used.
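A bind mount is declared when the container is started.  The sketch below only builds the docker run argument list and does not execute it; it uses the Docker CLI's --mount flag with type=bind, while the function name, image name, and paths are hypothetical.

```python
from typing import List


def docker_run_with_bind_mount(
    image: str, host_dir: str, container_dir: str, read_only: bool = False
) -> List[str]:
    """Build a `docker run` command that bind-mounts a host directory."""
    mount = f"type=bind,source={host_dir},target={container_dir}"
    if read_only:
        mount += ",readonly"
    return ["docker", "run", "--detach", "--mount", mount, image]


# Hypothetical image and paths: the data stays on the host at
# /srv/legacy-app/data and appears inside the container at /var/lib/legacy-app.
command = docker_run_with_bind_mount(
    "legacy-app:1.0", "/srv/legacy-app/data", "/var/lib/legacy-app"
)
```

Because the data remains in a host directory, existing backup and monitoring jobs that watch that directory keep working unchanged.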

A more-advanced approach is to externalize application data to persistent volumes backed by network storage.  Using network-backed volumes can yield significant scalability benefits because applications are no longer limited by the container host’s storage.  However, network-backed storage volumes require careful engineering and testing to ensure the storage is reliable and performant, especially under peak loads and failure conditions.  Centralizing data to a shared storage system can easily create a single point of failure at the heart of the application platform.

Externalizing to Cloud Storage

When application changes are practical, externalizing application data to cloud-based storage (AWS S3, Azure Storage) or message queue (AWS SQS, Azure Service Bus) systems can yield significant improvements in fault-tolerance, availability, and scalability.

The architectural benefits derive from moving data and integration that previously depended on specific, well-known hosts (Pets) to easily replaceable application instances that depend on cloud-provided services for data persistence (Cattle).

The main drawback to this approach is needing to change the application to operate with data hosted on an external service which has a different set of failure modes and performance characteristics than a local filesystem.
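One way to limit the size of that application change is to put a small interface between the application and its storage, so the filesystem can be swapped for a cloud service later.  The following is a sketch under that assumption, not a prescribed design; BlobStore, DirectoryBlobStore, and archive_report are hypothetical names, and a cloud-backed implementation (not shown) would implement the same two methods using the provider's SDK.

```python
from pathlib import Path
from typing import Protocol


class BlobStore(Protocol):
    """Hypothetical interface the application uses instead of direct file I/O."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class DirectoryBlobStore:
    """Filesystem-backed store, e.g. a bind-mounted directory or Docker volume."""

    def __init__(self, root: Path) -> None:
        self._root = root

    def put(self, key: str, data: bytes) -> None:
        path = self._root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()


def archive_report(store: BlobStore, report_id: str, body: bytes) -> None:
    """Application code depends only on the interface, not on where data lives."""
    store.put(f"reports/{report_id}.txt", body)
```

An interface like this does not remove the different failure modes and latency of a remote service, so timeouts and retries still need to be handled in the cloud-backed implementation.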

Base Image

The hosts of ‘legacy’ applications may run older operating systems, so there is often a question of whether to use a base image similar to the legacy environment or the organization’s more-modern standard base image hierarchy.

Prefer leveraging the organization’s existing base application image hierarchy as it is:

  • less to manage and secure
  • well-understood within the organization
  • probably using a modern base such as CentOS 7 or Ubuntu 16.04 and compatible with modern utilities

If the application is strongly-coupled to an older OS, then you can create a similar Docker base image on e.g. CentOS 6.x, but think about the strategic implications:

  • other ‘legacy’ apps will probably start to use it out of convenience
  • once created, it will live for a very long time

Should App Image Hierarchy Contain a Fork for Legacy Applications?

The effort to migrate an application to an updated base image is frequently less than the effort to create and maintain an older base and a parallel set of platform integration tooling.  If there is concern that the application cannot be moved to an updated base safely, then consider time-capping the migration activity before falling back to an older base.

Addressing Concerns from Incomplete Platforms

There are strategies for dealing with applications that do not run neatly inside a container, either technically or organizationally.

Returning to first principles, container technology provides two main advantages:

  1. organizational efficiency through uniform application packaging and distribution
  2. operational efficiency through isolation, resource limits, and platform-managed ‘ilities’

Key Point: Adoption of a uniform application packaging and distribution mechanism is extremely valuable on its own in most organizations due to the leverage and efficiency provided by consistent and generic continuous integration, delivery, and operational platforms.

An organization can choose to sacrifice operational or platform efficiency by adjusting or disabling container isolation features, with the result being almost identical to a ‘normal’ single app-on-virtual machine experience.  Examples of application platform isolation features that are commonly disabled or re-configured:

  • network namespace: when an application participates in a peer-to-peer network or application-managed service registry, the application container may need to be bound to the host’s IP address to facilitate routing
  • logging integration: the platform team might configure the Docker Engine to maintain logs on the container host and enable app engineers to use docker logs to inspect application logs in the development environment

Caution: Disabling container isolation features should be considered carefully for strategic implications.   Enhancing self-service features of the containerized application platform is often relatively low-effort and avoids risks of Architectural-level Technical Debt.  The expected outcomes of disabling isolation features are:

  • higher-variation on application hosts due to engineers using legacy processes to interact with the application
  • higher support costs for platform team to enable variation across applications and environments
  • restricting the number of application instances that can run on a single container host

Summary

Containerizing legacy applications can present organizational and technical challenges that are addressable with a suite of proven approaches.

Containerization technology is a response to the challenge of deploying and operating disparate ‘legacy’ applications at scale, and it is powerful because of the additional abstractions it provides to applications and infrastructure.  The key to solving application deployment problems is a solid understanding of the fundamentals of the underlying container technology, so that it is applied well when building the organization’s application platform.

Want to dive deeper into the fundamentals of container tech? Check out the expert-led, in-person Fundamentals of Docker for Engineers training course.