Hello everyone. This article is a detailed walkthrough of Docker custom images: how to build your own Docker image, and the Dockerfile instructions used to do it. I hope it helps.
Customizing an image is really customizing the configuration and files that each layer adds. If we could write the commands for each layer's modification, installation, build, and operation steps into a script, and use that script to build and customize the image, then the problems of non-repeatable builds, build transparency, and image size would all be solved. That script is the Dockerfile.
A Dockerfile is a text file containing instructions (Instruction). Each instruction builds one layer, so the content of each instruction describes how that layer should be built.
Taking the nginx image as an example, this time we will customize it using a Dockerfile.
In a blank directory, create a text file and name it Dockerfile:
Its contents are:
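A minimal two-line example that matches the description below (the echoed HTML snippet is illustrative):

```dockerfile
FROM nginx
RUN echo '<h1>Hello, Docker!</h1>' > /usr/share/nginx/html/index.html
```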
This Dockerfile is simple, two lines in total. Two directives are involved, FROM and RUN.
A so-called custom image must be based on some existing image and customized on top of it. Just as we previously ran a container from the nginx image and then modified it, here too a base image must be specified. FROM specifies the base image, so FROM is a required instruction in a Dockerfile, and it must be the first instruction.
There are many high-quality official images on the Docker Store. There are ready-to-use service images such as nginx, redis, mongo, mysql, httpd, php, and tomcat; there are also images that make it easy to develop, build, and run applications in various languages, such as node, openjdk, python, ruby, and golang. We can find the image that best matches our final goal and use it as the base image to customize.
If no suitable service image is found, the official images also include more basic operating-system images such as ubuntu, debian, centos, fedora, and alpine; the software repositories of these operating systems give us much broader room for expansion.
In addition to selecting an existing image as the base image, Docker also has a special image named scratch. This image is a virtual concept that does not actually exist, it represents a blank image.
If you use scratch as a base image, it means that you are not based on any image, and the instructions you write next will begin to exist as the first layer of the image.
It is not uncommon to copy executables directly into an image that has no system base at all, as with swarm and coreos/etcd. For statically compiled Linux programs, no operating system is needed to provide runtime support; everything the executable requires is already inside it, so using FROM scratch makes the image even slimmer. Many applications written in Go build their images this way, which is one reason some people consider Go particularly well suited to container microservice architectures.
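A minimal sketch of this pattern, assuming a statically linked binary named myapp has already been built locally (the name is hypothetical):

```dockerfile
FROM scratch
# the image contains nothing but this one statically linked executable
COPY ./myapp /myapp
CMD ["/myapp"]
```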
The RUN directive is used to execute command-line commands. Due to the power of the command line, the RUN directive is one of the most commonly used instructions when customizing images. There are two formats:
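```dockerfile
# shell format: the command is run through a shell, just as at a prompt
RUN <command>

# exec format: no shell; the executable and arguments form a JSON array
RUN ["executable", "param1", "param2"]
```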
Since RUN can execute commands just like a shell script, can we write one RUN per command, the way we would write a shell script? For example:
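Something like the following, a sketch of the classic Redis-build counter-example from which the layer count below comes (the Debian and Redis versions are illustrative):

```dockerfile
FROM debian:jessie

RUN apt-get update
RUN apt-get install -y gcc libc6-dev make
RUN wget -O redis.tar.gz "http://download.redis.io/releases/redis-3.2.5.tar.gz"
RUN mkdir -p /usr/src/redis
RUN tar -xzf redis.tar.gz -C /usr/src/redis --strip-components=1
RUN make -C /usr/src/redis
RUN make -C /usr/src/redis install
```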
As mentioned earlier, every instruction in Dockerfile builds a layer, and RUN is no exception. The behavior of each RUN is the same as the process of creating the image manually just now: a new layer is created, these commands are executed on it, and after execution, the modifications of this layer are committed to form a new image.
In this way, a 7-layer image is created. This makes no sense at all, and a lot of things that are not needed at runtime are installed in the image, such as the compilation environment, updated packages, and so on. The result is a very bloated, multi-layered image that not only increases the time it takes to build and deploy, but is also prone to errors. This is a common mistake made by many newcomers to Docker.
Union filesystems also impose a limit on the number of layers. AUFS, for example, was once limited to a maximum of 42 layers and today cannot exceed 127.
The correct way to write Dockerfile above should look like this:
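Presumably along these lines, chaining everything into one RUN and cleaning up at the end (same illustrative versions as above):

```dockerfile
FROM debian:jessie

RUN buildDeps='gcc libc6-dev make' \
    && apt-get update \
    && apt-get install -y $buildDeps \
    && wget -O redis.tar.gz "http://download.redis.io/releases/redis-3.2.5.tar.gz" \
    && mkdir -p /usr/src/redis \
    && tar -xzf redis.tar.gz -C /usr/src/redis --strip-components=1 \
    && make -C /usr/src/redis \
    && make -C /usr/src/redis install \
    && rm -rf /var/lib/apt/lists/* \
    && rm redis.tar.gz \
    && rm -r /usr/src/redis \
    && apt-get purge -y --auto-remove $buildDeps
```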
First of all, all the previous commands had a single purpose: compiling and installing the Redis executable. Since that is one logical layer of content, there is no need to build many layers for it. So instead of one RUN per command, we use a single RUN instruction and chain the required commands together with &&, reducing the previous 7 layers to 1. When writing a Dockerfile, keep reminding yourself that you are not writing a shell script, but defining how each layer is to be built.
Line breaks were also added here for readability. Dockerfile supports shell-style line continuation by ending a line with \, and comments in the form of lines beginning with #. Good formatting habits such as line breaks, indentation, and comments make maintenance and troubleshooting easier.
Also note the cleanup commands added at the end of this command group: the software needed only for the compile-time build is removed, all downloaded and extracted files are cleaned up, and the apt cache is cleared as well. This is an important step. As we said before, images are layered storage: things in one layer are not deleted by the next layer; they follow the image around forever. So when building an image, make sure each layer adds only what is really needed, and that anything irrelevant is cleaned up.
One reason many Docker learners end up with bloated images is that they forget to clean up irrelevant files at the end of each layer's build.
Go back to the Dockerfile of the previous customized Nginx image. Now that we understand the contents of this Dockerfile, let’s build this image.
In the directory where the Dockerfile is located, execute:
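Presumably along these lines; the output below matches the container and layer IDs discussed next, though the IDs on your machine will differ:

```bash
$ docker build -t nginx:v3 .
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM nginx
 ---> e43d811ce2f4
Step 2 : RUN echo '<h1>Hello, Docker!</h1>' > /usr/share/nginx/html/index.html
 ---> Running in 9cdc27646c7b
 ---> 44aa4490ce2c
Removing intermediate container 9cdc27646c7b
Successfully built 44aa4490ce2c
```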
From the command's output we can clearly see the build process. In Step 2, as described before, the RUN instruction starts a container (9cdc27646c7b), executes the requested command, commits the layer (44aa4490ce2c), and then deletes the temporary container (9cdc27646c7b).
Here we use the docker build command for image building. Its format is:
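In outline (options elided):

```bash
docker build [OPTIONS] <context path/URL/->
```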
Here we specify the name of the final image -t nginx:v3, and after successful build, we can run the image like we ran nginx:v2 before, and the result will be the same as nginx:v2.
If you paid attention, you will have noticed that the docker build command ends with a ".". "." means the current directory, and the Dockerfile is in the current directory, so many beginners think this path specifies where the Dockerfile is located. That is actually inaccurate: if you match the command against the format above, you will see that it specifies the context path. So what is a context?
First we need to understand how docker build works. At runtime Docker is divided into the Docker engine (the server-side daemon) and the client tools. The Docker engine provides a set of REST APIs called the Docker Remote API, and client tools such as the docker command interact with the engine through these APIs to get things done. So although it looks as if we are performing docker operations locally, in reality everything is done on the server side (in the Docker engine) via remote calls. This client/server design is also what makes it trivial to operate the Docker engine on a remote server.
When building an image, not all customization happens through RUN instructions; we often need to copy local files into the image, for example with the COPY or ADD instructions. But docker build does not build locally: it builds on the server side, in the Docker engine. So in this client/server architecture, how can the server get hold of local files?
This is where the concept of a context comes in. When building, the user specifies the path of the build context; once the docker build command learns this path, it packages everything under that path and uploads it to the Docker engine. When the engine receives this context package and expands it, it has every file needed to build the image.
If you write this in Dockerfile:
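Presumably a line like:

```dockerfile
COPY ./package.json /app/
```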
This does not copy the package.json from the directory where docker build was executed, nor the one next to the Dockerfile; it copies the package.json inside the context directory.
Therefore, source paths in instructions such as COPY are relative paths. This is also why beginners often ask why COPY ../package.json /app or COPY /opt/xxxx /app does not work: those paths are outside the context, so the Docker engine cannot get at the files there. If you really need those files, copy them into the context directory first.
Now you can understand the command docker build -t nginx:v3 . : the "." actually specifies the context directory, and docker build packages that directory's contents and hands them to the Docker engine to support the build.
If we look back at the docker build output, we have in fact already seen this context being sent:
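Its first line is the context upload (the size shown is whatever your context happens to be):

```
Sending build context to Docker daemon 2.048 kB
...
```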
Understanding the build context matters for image building, to avoid mistakes that need not happen. For example, some beginners discover that COPY /opt/xxxx /app does not work, so they simply move the Dockerfile to the root of the disk and build from there, only to find that docker build sends tens of gigabytes of data, runs extremely slowly, and easily fails. That is because this practice asks docker build to package the entire drive, which is clearly a misuse.
In general, the Dockerfile should be placed in an empty directory or in the project root. If required files are not in that directory, make a copy of them there. If the directory contains things you really do not want passed to the Docker engine at build time, you can write a .dockerignore. Its syntax is the same as .gitignore, and it lists the files that should not be passed to the Docker engine as part of the context.
So why would anyone mistakenly think that "." specifies the directory containing the Dockerfile? Because by default, when no Dockerfile is explicitly specified, the file named Dockerfile in the context directory is used as the Dockerfile.
That is only the default behavior. In fact the Dockerfile does not have to be named Dockerfile, nor does it have to sit in the context directory; for example, you can use the -f ../Dockerfile.php parameter to designate some other file as the Dockerfile.
Of course, it is customary to use the default file name Dockerfile and place it in the image build context directory.
docker build also supports building from URLs, such as building directly from a Git repo:
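A sketch (the gitlab-ce-zh repo here is illustrative):

```bash
# builds from the repo's default (master) branch, using the 8.14/ subdirectory as context
docker build https://github.com/twang2218/gitlab-ce-zh.git#:8.14
```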
This command specifies the Git repo to build from, using the default master branch with /8.14/ as the build directory; Docker will clone the project itself, switch to the specified branch, enter the specified directory, and start the build.
If the URL given is not a Git repo but a tar archive, the Docker engine downloads the archive, automatically decompresses it, uses it as the context, and starts the build.
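For example (the server URL here is hypothetical):

```bash
docker build http://server/context.tar.gz
```

A Dockerfile can also be read directly from standard input:

```bash
docker build - < Dockerfile
```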
or
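```bash
cat Dockerfile | docker build -
```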
If what is passed in on standard input is a text file, it is treated as the Dockerfile and the build starts. This form has no context: since the Dockerfile content is read directly from standard input, it cannot COPY local files into the image the way the other methods can.
If standard input is an archive in gzip, bzip2, or xz format, it is treated as a context package: Docker expands it, uses it as the context, and starts the build.
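For example:

```bash
docker build - < context.tar.gz
```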
We have covered FROM and RUN, and mentioned COPY and ADD. In fact the Dockerfile is very powerful; it provides more than a dozen instructions. Let's move on to the others.
Format:
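```dockerfile
COPY <source path>... <destination path>
COPY ["<source path1>",... "<destination path>"]
```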
Like the RUN directive, there are two formats, one similar to the command line and one similar to a function call.
The COPY instruction copies files/directories at <source path> in the build context to the <destination path> location inside the new layer of the image. For example:
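```dockerfile
COPY package.json /usr/src/app/
```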
There may be multiple <source path> entries, and they may even be wildcard patterns; the wildcard rules must satisfy Go's filepath.Match rules, for example:
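```dockerfile
COPY hom* /mydir/
COPY hom?.txt /mydir/
```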
The <destination path> can be an absolute path inside the container, or a path relative to the working directory (which can be changed with the WORKDIR instruction). The destination path does not need to be created beforehand; if the directory does not exist, the missing directories are created before the files are copied.
Also note that with the COPY instruction, all the source files' metadata is preserved: read, write, and execute permissions, file modification times, and so on. This is useful when customizing images, especially when build-related files are managed with Git.
The ADD instruction has essentially the same format and nature as COPY, but it adds some features on top of COPY.
For example, <source path> can be a URL, in which case the Docker engine will try to download the file at that link and put it at <destination path>. The downloaded file's permissions are automatically set to 600; if that is not the desired permission, an extra RUN layer is needed to adjust it. And if the downloaded file is an archive that needs unpacking, yet another RUN layer is needed to extract it. So it is better to simply use the RUN instruction with wget or curl to download, set permissions, extract, and clean up unneeded files in one step. This feature is therefore not very practical and is not recommended.
If <source path> is a tar archive compressed with gzip, bzip2, or xz, the ADD instruction automatically extracts it to <destination path>.
This automatic decompression feature is useful in some cases, such as in the official image ubuntu:
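An excerpt along these lines, as seen in the official ubuntu base-image Dockerfiles of that period (the tarball name varies by release; elision is mine):

```dockerfile
FROM scratch
ADD ubuntu-xenial-core-cloudimg-amd64-root.tar.gz /
...
```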
But in some cases, when we really do want to copy a compressed file in without extracting it, ADD cannot be used.
Docker's official Dockerfile best-practices document recommends using COPY whenever possible, because COPY's semantics are clear: it just copies files, whereas ADD bundles more complex behavior that is not always obvious. The most appropriate use of ADD is the one mentioned: when automatic extraction is needed.
It is also important to note that the ADD directive invalidates the image build cache, which may make image builds slower.
Therefore, when choosing between the COPY and ADD directives, you can follow the principle that all file copies use the COPY directive, and ADD is only used when automatic decompression is required.
The format of the CMD directive is similar to RUN and is also in two formats:
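Namely (plus the parameter-list variant used together with ENTRYPOINT, discussed later):

```dockerfile
# shell format
CMD <command>

# exec format
CMD ["executable", "param1", "param2"]

# parameter-list format: supplies default arguments to ENTRYPOINT
CMD ["param1", "param2"]
```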
When introducing containers earlier, I said that Docker is not a virtual machine: a container is a process. Since it is a process, the program and arguments to run must be specified when the container starts. The CMD instruction specifies the default startup command for the container's main process.
At runtime, you can specify a new command to override this default command in the image settings, for example, the default CMD of ubuntu images is /bin/bash, and if we directly docker run -it ubuntu, we will go directly to bash. We can also specify to run other commands at runtime, such as docker run -it ubuntu cat /etc/os-release. This replaces the default /bin/bash command with the cat /etc/os-release command, which outputs system version information.
As for the instruction format, the exec format is generally recommended. It is parsed as a JSON array, so be sure to use double quotes ("), not single quotes.
If the shell format is used, the actual command is wrapped as an argument to sh -c for execution. For example:
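```dockerfile
CMD echo $HOME
```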
In actual execution, it becomes:
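```dockerfile
CMD [ "sh", "-c", "echo $HOME" ]
```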
That’s why we can use environment variables, because these environment variables are parsed by the shell.
Speaking of CMD, we must mention the issue of applications running in the foreground versus the background inside a container. This is a common point of confusion for beginners.
Docker is not a virtual machine. Applications in a container should run in the foreground, not be started as background services via upstart/systemd the way they are on virtual machines and physical machines; there is no notion of a background service inside a container.
Some beginners write CMD as:
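```dockerfile
CMD service nginx start
```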
They then find that the container exits immediately after starting. Even running the systemctl command inside the container turns out not to work. This comes from not understanding the concepts of foreground and background, not distinguishing containers from virtual machines, and still thinking about containers from a traditional-VM perspective.
For a container, the startup program is the container's application process. The container exists for the sake of its main process; when the main process exits, the container has no reason to exist and exits as well. Other auxiliary processes are simply not its concern.
The service nginx start command tries to get upstart to launch the nginx service as a background daemon. And as just explained, CMD service nginx start is interpreted as CMD [ "sh", "-c", "service nginx start" ], so the main process is actually sh. When the service nginx start command completes, sh is done too; sh exits as the main process, and the container naturally exits with it.
The correct approach is to execute the nginx binary directly and require it to run in the foreground. For example:
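```dockerfile
CMD ["nginx", "-g", "daemon off;"]
```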
The format of ENTRYPOINT is the same as that of the RUN directive, which is divided into exec format and shell format.
The purpose of ENTRYPOINT, like that of CMD, is to specify the container's startup program and its arguments. ENTRYPOINT, too, can be replaced at runtime, though it is slightly more cumbersome than with CMD: it must be set via docker run's --entrypoint parameter.
When an ENTRYPOINT is specified, the meaning of CMD changes: instead of running CMD's content directly, the CMD content is passed to the ENTRYPOINT as arguments. In other words, what actually executes becomes:
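```
<ENTRYPOINT> "<CMD>"
```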
So, since we already have CMD, why do we need ENTRYPOINT?
Suppose we need an image that reports our current public IP address. We can first implement it with CMD:
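A sketch following the classic curl-based example (the base-image version is illustrative; the ip.cn URL is the one discussed below):

```dockerfile
FROM ubuntu:16.04
RUN apt-get update \
    && apt-get install -y curl \
    && rm -rf /var/lib/apt/lists/*
CMD [ "curl", "-s", "http://ip.cn" ]
```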
Suppose we build the image with docker build -t myip . ; then whenever we need to query our current public IP, we only need to execute:
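```bash
docker run myip
```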
It looks as if we can use the image itself as a command, but commands usually take options. What if we want to add one? For example, from the CMD above we can see the actual command is curl, so if we want to display HTTP header information we need to add the -i flag. Can we just pass -i directly to docker run myip?
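Presumably you would see an error along these lines (exact wording varies with the Docker version):

```bash
$ docker run myip -i
docker: Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"exec: \\\"-i\\\": executable file not found in $PATH\"\n".
```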
We can see the error: the executable file is not found. As we said earlier, what follows the image name is the command, which replaces CMD's default value at runtime. So here -i replaces the original CMD rather than being appended after the original curl -s http://ip.cn. And -i is not a command at all, so naturally it cannot be found.
So if we want to include the -i parameter, we have to re-enter the command in its entirety:
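```bash
docker run myip curl -s http://ip.cn -i
```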
This is obviously not a good solution, and using ENTRYPOINT can solve this problem. Now let’s implement this image again with ENTRYPOINT:
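The same sketch as before, with CMD swapped for ENTRYPOINT:

```dockerfile
FROM ubuntu:16.04
RUN apt-get update \
    && apt-get install -y curl \
    && rm -rf /var/lib/apt/lists/*
ENTRYPOINT [ "curl", "-s", "http://ip.cn" ]
```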
This time let’s try to use docker run myip -i directly:
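```bash
$ docker run myip -i
HTTP/1.1 200 OK
...
```

(The HTTP headers are followed by the IP information as before; the remaining output is elided here.)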
As you can see, it works this time. When an ENTRYPOINT exists, the CMD content is passed to the ENTRYPOINT as arguments; here -i is the new CMD, so it is passed to curl as an argument, and we get exactly the effect we wanted.
To start a container is to start the main process, but there are times when some preparation is required before starting the main process.
For example, a database such as MySQL may need some configuration and initialization work that must be finished before the final mysql-server process runs.
In addition, you may want to improve security by not running the service as root. The necessary preparation still has to be done as root before startup, and then the service is finally launched under the service user's identity. Meanwhile, commands other than the service itself can still run as root, which is convenient for debugging and the like.
These preparations are independent of the container's CMD: whatever the CMD is, a pre-processing step is needed first. In that case you can write a script and put it in the ENTRYPOINT to execute; the script will take the arguments it receives (that is, <CMD>) and run them as a command at the end. The official redis image, for example, does exactly this:
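An abridged excerpt in the spirit of the official redis Dockerfile of that period (elisions are mine):

```dockerfile
FROM alpine:3.4
...
RUN addgroup -S redis && adduser -S -G redis redis
...
ENTRYPOINT ["docker-entrypoint.sh"]

EXPOSE 6379
CMD [ "redis-server" ]
```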
You can see that a redis user is created for the Redis service, and at the end the ENTRYPOINT is set to the docker-entrypoint.sh script.
The script decides what to do based on the content of CMD: if it is redis-server, it switches to the redis user to start the server; otherwise it keeps executing as root. For example:
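Roughly as follows (su-exec is the Alpine counterpart of gosu):

```sh
#!/bin/sh
# If the command is redis-server and we are still root,
# fix ownership and re-exec this script as the redis user.
if [ "$1" = 'redis-server' -a "$(id -u)" = '0' ]; then
    chown -R redis .
    exec su-exec redis "$0" "$@"
fi

# Otherwise run whatever command was given, as-is.
exec "$@"
```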
There are two formats:
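```dockerfile
ENV <key> <value>
ENV <key1>=<value1> <key2>=<value2>...
```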
This instruction is simple: it just sets environment variables. Whether it is later instructions such as RUN, or the application at runtime, they can all use the variables defined here directly.
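For example (the values here are illustrative):

```dockerfile
ENV VERSION=1.0 DEBUG=on \
    NAME="Happy Feet"
```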
This example shows how to break lines and how to enclose values containing spaces in double quotes, consistent with shell behavior.
Once defined, an environment variable can be used in subsequent instructions. For example, the official node image's Dockerfile contains code like this:
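An abridged excerpt along these lines, from the node Dockerfile of that era:

```dockerfile
ENV NODE_VERSION 7.2.0

RUN curl -SLO "https://nodejs.org/dist/v$NODE_VERSION/node-v$NODE_VERSION-linux-x64.tar.xz" \
  && curl -SLO "https://nodejs.org/dist/v$NODE_VERSION/SHASUMS256.txt.asc" \
  && gpg --batch --decrypt --output SHASUMS256.txt SHASUMS256.txt.asc \
  && grep " node-v$NODE_VERSION-linux-x64.tar.xz\$" SHASUMS256.txt | sha256sum -c - \
  && tar -xJf "node-v$NODE_VERSION-linux-x64.tar.xz" -C /usr/local --strip-components=1 \
  && rm "node-v$NODE_VERSION-linux-x64.tar.xz" SHASUMS256.txt.asc SHASUMS256.txt \
  && ln -s /usr/local/bin/node /usr/local/bin/nodejs
```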
The environment variable NODE_VERSION is defined here, and the RUN layer that follows uses $NODE_VERSION repeatedly for its customization work. As you can see, when upgrading the image later, only 7.2.0 needs updating, which makes the Dockerfile much easier to build and maintain.
The following directives can support environment variable expansion: ADD, COPY, ENV, EXPOSE, LABEL, USER, WORKDIR, VOLUME, STOPSIGNAL, ONBUILD.
As this list of instructions suggests, environment variables can be used in many places and are quite powerful. With environment variables, a single Dockerfile can produce a family of images simply by varying the variables.
The format is:
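```dockerfile
VOLUME ["<path1>", "<path2>", ...]
VOLUME <path>
```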
As we said before, the container storage layer should be kept free of write operations at runtime; applications such as databases that need to save dynamic data should keep those files in a volume (the concept of Docker volumes is covered further in later chapters). To guard against users forgetting to mount the dynamic-data directories as volumes at runtime, the Dockerfile can declare certain directories as anonymous volumes in advance. That way, even if the user mounts nothing, the application can still run normally without writing large amounts of data into the container storage layer.
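For example:

```dockerfile
VOLUME /data
```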
Here the /data directory is automatically mounted as an anonymous volume at runtime, and anything written to /data does not land in the container storage layer, preserving the storage layer's statelessness. Of course, the runtime can override this mount setting. For example:
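```bash
docker run -d -v mydata:/data xxxx
```

(Here xxxx stands for the image name.)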
In this command, the named volume mydata is mounted at /data, replacing the anonymous-volume mount declared in the Dockerfile.
The format is EXPOSE <port1> [<port2>...].
The EXPOSE instruction declares the port on which the running container will provide its service. This is only a declaration; the application will not open the port at runtime merely because of it. Writing such a declaration in the Dockerfile has two benefits: one is to help image users understand which port the image's service daemon listens on, making port mappings easier to configure; the other is that when random port mapping is used at runtime, that is, docker run -P, the EXPOSEd ports are mapped to random host ports automatically.
In addition, there was a special use in earlier Docker versions. Back then all containers ran on the default bridge network, so every container could reach every other one directly, which was a security concern. Hence the Docker engine parameter --icc=false: when specified, containers could not reach each other by default; only containers linked with the --links parameter could communicate, and only the ports declared via EXPOSE in the image could be accessed. This --icc=false usage has been largely abandoned since the introduction of docker network, which makes inter-container connectivity and isolation easy to achieve through custom networks.
Be careful to distinguish EXPOSE from the runtime option -p <host port>:<container port>. -p maps a host port to a container port; in other words, it exposes the container's service port to the outside world. EXPOSE merely declares which port the container intends to use and does not automatically perform any mapping on the host.
The format is WORKDIR <working directory path>.
Use the WORKDIR instruction to set the working directory (the current directory); from then on, the current directory of each subsequent layer is this directory. If the directory does not exist, WORKDIR creates it for you.
As mentioned earlier, some common mistakes beginners make are to equate Dockerfile with shell scripts, and this misunderstanding can also lead to errors such as the following:
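```dockerfile
RUN cd /app
RUN echo "hello" > world.txt
```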
If you build an image from this Dockerfile, you will find that the /app/world.txt file is not there, or that its content is not hello. The reason is simple: in a shell, two consecutive lines run in the same process execution environment, so memory state modified by the first command directly affects the second; in a Dockerfile, these two RUN commands have completely different execution environments; they are two entirely different containers. This error comes from not understanding Dockerfile's layered-storage build model.
As mentioned earlier, every RUN starts a container, executes its commands, and then commits the storage-layer file changes. The first layer's RUN cd /app merely changes the working directory of the current process, a change in memory, producing no file change at all. When the second layer runs, a brand-new container starts; it has nothing to do with the first layer's container and naturally cannot inherit memory changes from the previous layer's build.
Therefore, if you need to change the working directory for subsequent layers, use the WORKDIR instruction.
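A fixed sketch of the example above:

```dockerfile
WORKDIR /app
RUN echo "hello" > world.txt
```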
Original article: https://micromaple.blog.csdn.net/article/details/125804242 Author: Micromaple
– END –