Chapter 4. Exposition
In Chapter 3 I mainly focused on adding instrumentation to your code. But all the instrumentation in the world isn't much use if the metrics produced don't end up in your monitoring system. The process of making metrics available to Prometheus is known as exposition.
Exposition to Prometheus is done over HTTP. Usually you expose metrics under the /metrics path, and the request is handled for you by a client library. Prometheus uses a human-readable text format, so you also have the option of producing the exposition format by hand. You may choose to do this if there is no suitable library for your language, but it is recommended you use a library as it'll get all the little details like escaping correct.
Exposition is typically done either in your main function or another top-level function and only needs to be configured once per application.
Metrics are usually registered with the default registry when you define them. If one of the libraries you are depending on has Prometheus instrumentation, the metrics will be in the default registry and you will gain the benefit of that additional instrumentation without having to do anything. Some users prefer to explicitly pass a registry all the way down from the main function, but then every library between your application's main function and the Prometheus instrumentation would have to be aware of it. This presumes that every library in the dependency chain cares about instrumentation and agrees on the choice of instrumentation libraries.
This design allows for instrumentation for Prometheus metrics with no exposition at all. In that case, aside from still paying the (tiny) resource cost of instrumentation, there is no impact on your application. If you are the one writing a library, you can add Prometheus instrumentation for your users without requiring extra effort from users who don't monitor. To better support this use case, the instrumentation parts of client libraries try to minimise their dependencies.
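For example, in the Python client a metric is registered with the default registry unless you pass a different registry when defining it. The following is a minimal sketch; the metric names are only for illustration.

from prometheus_client import CollectorRegistry, Counter

# No registry argument, so this is registered with the default registry
# (prometheus_client.REGISTRY) automatically.
EVENTS = Counter('my_events_total', 'Events handled.')

# Alternatively, register with a registry you create and pass around explicitly.
MY_REGISTRY = CollectorRegistry()
OTHER_EVENTS = Counter('my_other_events_total', 'Other events handled.',
        registry=MY_REGISTRY)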
Letâs take a look at exposition in some of the popular client libraries. I am going to presume here that you know how to install the client libraries and any other required dependencies.
Python
You have already seen start_http_server in Chapter 3. It starts up a background thread with an HTTP server that only serves Prometheus metrics.
from prometheus_client import start_http_server

if __name__ == '__main__':
    start_http_server(8000)
    # Your code goes here.
start_http_server is very convenient to get up and running quickly. But it is likely that you already have an HTTP server in your application that you would like your metrics to be served from.
In Python there are various ways this can be done depending on which frameworks you are using.
WSGI
Web Server Gateway Interface (WSGI) is a Python standard for web applications. The Python client provides a WSGI app that you can use with your existing WSGI code. In Example 4-1, my_app delegates to metrics_app when the /metrics path is requested; otherwise, it performs its usual logic. By chaining WSGI applications like this you can add middleware such as authentication, which client libraries do not offer out of the box.
Example 4-1. Exposition using WSGI in Python
from prometheus_client import make_wsgi_app
from wsgiref.simple_server import make_server

metrics_app = make_wsgi_app()

def my_app(environ, start_fn):
    if environ['PATH_INFO'] == '/metrics':
        return metrics_app(environ, start_fn)
    start_fn('200 OK', [])
    return [b'Hello World']

if __name__ == '__main__':
    httpd = make_server('', 8000, my_app)
    httpd.serve_forever()
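As a hypothetical illustration of the chaining point above, you could wrap my_app in another WSGI application that rejects requests lacking a token before any other logic runs. This is only a sketch, not something the client library provides; the X-Token header and token value are made up.

def require_token(app, token):
    # Simple WSGI middleware: reject requests without the expected token.
    def wrapped(environ, start_fn):
        if environ.get('HTTP_X_TOKEN') != token:
            start_fn('401 Unauthorized', [])
            return [b'Denied']
        return app(environ, start_fn)
    return wrapped

# my_app is the application from Example 4-1.
protected_app = require_token(my_app, 'secret')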
Twisted
Twisted is a Python event-driven network engine. It supports WSGI so you can
plug in make_wsgi_app
, as shown in Example 4-2.
Example 4-2. Exposition using Twisted in Python
from prometheus_client import make_wsgi_app
from twisted.web.server import Site
from twisted.web.wsgi import WSGIResource
from twisted.web.resource import Resource
from twisted.internet import reactor

metrics_resource = WSGIResource(
        reactor, reactor.getThreadPool(), make_wsgi_app())

class HelloWorld(Resource):
    isLeaf = False
    def render_GET(self, request):
        return b"Hello World"

root = HelloWorld()
root.putChild(b'metrics', metrics_resource)

reactor.listenTCP(8000, Site(root))
reactor.run()
Multiprocess with Gunicorn
Prometheus assumes that the applications it is monitoring are long-lived and multithreaded. But this can fall apart a little with runtimes such as CPython.1 CPython is effectively limited to one processor core due to the Global Interpreter Lock (GIL). To work around this, some users spread the workload across multiple processes using a tool such as Gunicorn.
If you were to use the Python client library in the usual fashion, each worker would track its own metrics. Each time Prometheus went to scrape the application, it would randomly get the metrics from only one of the workers, which would be only a fraction of the information and would also have issues such as counters appearing to be going backwards. Workers can also be relatively short-lived.
The solution to this problem offered by the Python client is to have each worker track its own metrics. At exposition time all the metrics of all the workers are combined in a way that provides the semantics you would get from a multithreaded application. There are some limitations to this approach: the process_ metrics and custom collectors will not be exposed, and the Pushgateway cannot be used.2
When using Gunicorn, you need to let the client library know when a worker process exits.3 This is done in a config file like the one in Example 4-3.
Example 4-3. Gunicorn config.py to handle worker processes exiting
from prometheus_client import multiprocess

def child_exit(server, worker):
    multiprocess.mark_process_dead(worker.pid)
You will also need an application to serve the metrics. Gunicorn uses WSGI, so you can use
make_wsgi_app
. You must create a custom registry containing only a
MultiProcessCollector
for exposition, so that it does not include both the
multiprocess metrics and metrics from the local default registry (Example 4-4).
Example 4-4. Gunicorn application in app.py
from prometheus_client import multiprocess, make_wsgi_app, CollectorRegistry
from prometheus_client import Counter, Gauge

REQUESTS = Counter("http_requests_total", "HTTP requests")
IN_PROGRESS = Gauge("http_requests_inprogress", "Inprogress HTTP requests",
        multiprocess_mode='livesum')

@IN_PROGRESS.track_inprogress()
def app(environ, start_fn):
    REQUESTS.inc()
    if environ['PATH_INFO'] == '/metrics':
        registry = CollectorRegistry()
        multiprocess.MultiProcessCollector(registry)
        metrics_app = make_wsgi_app(registry)
        return metrics_app(environ, start_fn)
    start_fn('200 OK', [])
    return [b'Hello World']
As you can see in Example 4-4, counters work normally, as do summarys and histograms. For gauges there is additional optional configuration using multiprocess_mode. You can configure the gauge based on how you intend to use it, as follows:
all
- The default; it returns a time series from each process, whether it is alive or dead. This allows you to aggregate the series as you wish in PromQL. They will be distinguished by a pid label.
liveall
- Returns a time series from each alive process.
livesum
- Returns a single time series that is the sum of the value from each alive process. You would use this for things like in-progress requests or resource usage across all processes. A process might have aborted with a nonzero value, so dead processes are excluded.
max
- Returns a single time series that is the maximum of the value from each alive or dead process. This is useful if you want to track the last time something happened such as a request being processed, which could have been in a process that is now dead.
min
- Returns a single time series that is the minimum of the value from each alive or dead process.
There is a small bit of setup before you can run Gunicorn, as shown in Example 4-5. You must set an environment variable called prometheus_multiproc_dir. This points to an empty directory the client library uses for tracking metrics. Before starting the application, you should always wipe this directory to handle any potential changes to your instrumentation.
Example 4-5. Preparing the environment before starting Gunicorn with two workers
hostname $ export prometheus_multiproc_dir=$PWD/multiproc
hostname $ rm -rf $prometheus_multiproc_dir
hostname $ mkdir -p $prometheus_multiproc_dir
hostname $ gunicorn -w 2 -c config.py app:app
[2018-01-07 19:05:30 +0000] [9634] [INFO] Starting gunicorn 19.7.1
[2018-01-07 19:05:30 +0000] [9634] [INFO] Listening at: http://127.0.0.1:8000 (9634)
[2018-01-07 19:05:30 +0000] [9634] [INFO] Using worker: sync
[2018-01-07 19:05:30 +0000] [9639] [INFO] Booting worker with pid: 9639
[2018-01-07 19:05:30 +0000] [9640] [INFO] Booting worker with pid: 9640
When you look at /metrics you will see the two defined metrics, but python_info and the process_ metrics will not be there.
Tip
Each process creates several files that must be read at exposition time in
prometheus_multiproc_dir
. If your workers stop and start a lot, this can make
exposition slow when you have thousands of files.
It is not safe to delete individual files as that could cause counters to incorrectly go backwards, but you can either try to reduce the churn (for example, by increasing or removing a limit on the number of requests workers handle before exiting4), or regularly restart the application and wipe the files.
These steps are for Gunicorn. The same approach also works with other Python multiprocess setups, such as those using the multiprocessing module.
Go
In Go, http.Handler
is the standard interface for providing HTTP handlers, and
promhttp.Handler
provides that interface for the Go client library. You
should place the code in Example 4-6 in a file called example.go.
Example 4-6. A simple Go program demonstrating instrumentation and exposition
package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    requests = promauto.NewCounter(
        prometheus.CounterOpts{
            Name: "hello_worlds_total",
            Help: "Hello Worlds requested.",
        })
)

func handler(w http.ResponseWriter, r *http.Request) {
    requests.Inc()
    w.Write([]byte("Hello World"))
}

func main() {
    http.HandleFunc("/", handler)
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":8000", nil))
}
You can fetch dependencies and run this code in the usual way.
hostname $ go get -d -u github.com/prometheus/client_golang/prometheus
hostname $ go run example.go
This example uses promauto
, which will automatically register your metric
with the default registry. If you do not wish to do so you can use
prometheus.NewCounter
instead and then use MustRegister
in an init
function:
func init() {
    prometheus.MustRegister(requests)
}
This is a bit more fragile, as it is easy for you to create and use the metric
but forget the MustRegister
call.
Java
The Java client library is also known as the simpleclient. It replaced the original client, which was developed before many of the current practices and guidelines around how to write a client library were established. The Java client should be used for any instrumentation for languages running on a Java Virtual Machine (JVM).
HTTPServer
Similar to start_http_server
in Python, the HTTPServer
class in the Java
client gives you an easy way to get up and running (Example 4-7).
Example 4-7. A simple Java program demonstrating instrumentation and exposition
import io.prometheus.client.Counter;
import io.prometheus.client.hotspot.DefaultExports;
import io.prometheus.client.exporter.HTTPServer;

public class Example {
  private static final Counter myCounter = Counter.build()
      .name("my_counter_total")
      .help("An example counter.").register();

  public static void main(String[] args) throws Exception {
    DefaultExports.initialize();
    HTTPServer server = new HTTPServer(8000);
    while (true) {
      myCounter.inc();
      Thread.sleep(1000);
    }
  }
}
You should generally have Java metrics as class static fields, so that they are only registered once.
The call to DefaultExports.initialize
is needed for the various process
and
jvm
metrics to work. You should generally call it once in all of your Java
applications, such as in the main function. However, DefaultExports.initialize
is idempotent and thread-safe, so additional calls are harmless.
In order to run the code in Example 4-7 you will need the
simpleclient dependencies. If you are using Maven,
Example 4-8 is what the dependencies
in your pom.xml
should look like.
Example 4-8. pom.xml dependencies for Example 4-7
<dependencies>
  <dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient</artifactId>
    <version>0.3.0</version>
  </dependency>
  <dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_hotspot</artifactId>
    <version>0.3.0</version>
  </dependency>
  <dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_httpserver</artifactId>
    <version>0.3.0</version>
  </dependency>
</dependencies>
Servlet
Many Java and JVM frameworks support using subclasses of HttpServlet in their
HTTP servers and middleware. Jetty is one such server, and you can see how to
use the Java client's MetricsServlet
in Example 4-9.
Example 4-9. A Java program demonstrating exposition using MetricsServlet and Jetty
import io.prometheus.client.Counter;
import io.prometheus.client.exporter.MetricsServlet;
import io.prometheus.client.hotspot.DefaultExports;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.ServletException;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.servlet.ServletContextHandler;
import org.eclipse.jetty.servlet.ServletHolder;
import java.io.IOException;

public class Example {
  static class ExampleServlet extends HttpServlet {
    private static final Counter requests = Counter.build()
        .name("hello_worlds_total")
        .help("Hello Worlds requested.").register();

    @Override
    protected void doGet(final HttpServletRequest req,
        final HttpServletResponse resp)
        throws ServletException, IOException {
      requests.inc();
      resp.getWriter().println("Hello World");
    }
  }

  public static void main(String[] args) throws Exception {
    DefaultExports.initialize();

    Server server = new Server(8000);
    ServletContextHandler context = new ServletContextHandler();
    context.setContextPath("/");
    server.setHandler(context);
    context.addServlet(new ServletHolder(new ExampleServlet()), "/");
    context.addServlet(new ServletHolder(new MetricsServlet()), "/metrics");
    server.start();
    server.join();
  }
}
You will also need to specify the Java client as a dependency. If you are using Maven, this will look like Example 4-10.
Example 4-10. pom.xml dependencies for Example 4-9
<dependencies>
  <dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient</artifactId>
    <version>0.3.0</version>
  </dependency>
  <dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_hotspot</artifactId>
    <version>0.3.0</version>
  </dependency>
  <dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_servlet</artifactId>
    <version>0.3.0</version>
  </dependency>
  <dependency>
    <groupId>org.eclipse.jetty</groupId>
    <artifactId>jetty-servlet</artifactId>
    <version>8.2.0.v20160908</version>
  </dependency>
</dependencies>
Pushgateway
Batch jobs are typically run on a regular schedule, such as hourly or daily. They start up, do some work, and then exit. As they are not continuously running, Prometheus can't exactly scrape them.5 This is where the Pushgateway comes in.
The Pushgateway6 is a metrics cache for service-level batch jobs. Its architecture is shown in Figure 4-1. It remembers only the last push that you make to it for each batch job. You use it by having your batch jobs push their metrics just before they exit. Prometheus scrapes these metrics from your Pushgateway and you can then alert and graph them. Usually you run a Pushgateway beside a Prometheus.
A service-level batch job is one where there isn't really an instance label to apply to it. That is to say it applies to all of one of your services, rather than being innately tied to one machine or process instance.7 If you don't particularly care where a batch job runs but do care that it happens (even if it happens to currently be set up to run via cron on one machine), it is a service-level batch job. Examples include a per-datacenter batch job to check for bad machines, or one that performs garbage collection across a whole service.
Note
The Pushgateway is not a way to convert Prometheus from pull to push. If, for example, there are several pushes between one Prometheus scrape and the next, the Pushgateway will only return the last push for that batch job. This is discussed further in "Networks and Authentication".
You can download the Pushgateway from the Prometheus download page. It is an exporter that runs by default on port 9091, and Prometheus should be set up to scrape it. However, you should also provide the honor_labels: true setting in the scrape config as shown in Example 4-11. This is because the metrics you push to the Pushgateway should not have an instance label, and you do not want the Pushgateway's own instance target label to end up on the metrics when Prometheus scrapes them.8 honor_labels is discussed in "Label Clashes and honor_labels".
Example 4-11. prometheus.yml scrape config for a local Pushgateway
scrape_configs:
- job_name: pushgateway
  honor_labels: true
  static_configs:
  - targets:
    - localhost:9091
You can use client libraries to push to the Pushgateway. Example 4-12 shows the structure you would use for a Python batch job. A custom registry is created so that only the specific metrics you choose are pushed. The duration of the batch job is always pushed,9 and the time it ended is only pushed if the job is successful.
There are three different ways you can write to the Pushgateway. In Python
these are the push_to_gateway
, pushadd_to_gateway
, and delete_from_gateway
functions.
push
- Any existing metrics for this job are removed and the pushed metrics added. This uses the PUT HTTP method under the covers.
pushadd
- The pushed metrics override existing metrics with the same metric names for this job. Any metrics that previously existed with different metric names remain unchanged. This uses the POST HTTP method under the covers.
delete
- The metrics for this job are removed. This uses the DELETE HTTP method under the covers.
As Example 4-12 is using pushadd_to_gateway
, the value of my_job_duration_seconds
will always get replaced. However,
my_job_last_success_seconds
will only get replaced if there are no
exceptions; it is added to the registry and then pushed.
Example 4-12. Instrumenting a batch job and pushing its metrics to a Pushgateway
from prometheus_client import CollectorRegistry, Gauge, pushadd_to_gateway

registry = CollectorRegistry()

duration = Gauge('my_job_duration_seconds',
        'Duration of my batch job in seconds', registry=registry)

try:
    with duration.time():
        # Your code here.
        pass

    # This only runs if there wasn't an exception.
    g = Gauge('my_job_last_success_seconds',
            'Last time my batch job successfully finished', registry=registry)
    g.set_to_current_time()
finally:
    pushadd_to_gateway('localhost:9091', job='batch', registry=registry)
You can see pushed data on the status page, as Figure 4-2 shows.
An additional metric push_time_seconds
has been added by the Pushgateway because Prometheus will always use the time at which it scrapes as the
timestamp of the Pushgateway metrics. push_time_seconds
gives you a way to
know the actual time the data was last pushed.
You might have noticed that the push is referred to
as a group. You can provide labels in addition to the job
label when
pushing, and all of these labels are known as the grouping key. In Python
this can be provided with the grouping_key
keyword argument. You would use
this if a batch job was sharded or split up somehow. For example, if you have 30
database shards and each had its own batch job, you might distinguish them with
a shard
label.
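As a minimal sketch of that shard case (the shard number and job name here are only for illustration), the grouping key is just an extra dict passed when pushing:

from prometheus_client import CollectorRegistry, pushadd_to_gateway

registry = CollectorRegistry()
# ... define and update the metrics for this shard's run in registry ...
pushadd_to_gateway('localhost:9091', job='batch',
        grouping_key={'shard': '3'}, registry=registry)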
Tip
Once pushed, groups stay forever in the Pushgateway. You should avoid using grouping keys that vary from one batch job run to the next, as this will make the metrics difficult to work with and cause performance issues. When decommissioning a batch job, don't forget to delete its metrics from the Pushgateway.
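For example, when retiring the batch job from Example 4-12, you could remove its group with delete_from_gateway. This is a sketch; use the same job name and any grouping key that were used when pushing.

from prometheus_client import delete_from_gateway

# Removes all metrics previously pushed for this group.
delete_from_gateway('localhost:9091', job='batch')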
Bridges
Prometheus client libraries are not limited to outputting metrics in the Prometheus format. There is a separation of concerns between instrumentation and exposition so that you can process the metrics in any way you like.
For example, the Go, Python, and Java clients each include a Graphite bridge. A bridge takes metrics output from the client library registry and outputs it to something other than Prometheus. So the Graphite bridge will convert the metrics into a form that Graphite can understand10 and write them out to Graphite as shown in Example 4-13.
Example 4-13. Using the Python GraphiteBridge to push to Graphite every 10 seconds
import time
from prometheus_client.bridge.graphite import GraphiteBridge

gb = GraphiteBridge(['graphite.your.org', 2003])
gb.start(10)
while True:
    time.sleep(1)
This works because the registry has a method that allows you to get a snapshot
of all the current metrics. This is CollectorRegistry.collect
in Python,
CollectorRegistry.metricFamilySamples
in Java, and Registry.Gather
in Go.
This is the method that HTTP exposition uses, and you can use it too. For example,
you could use this method to feed data into another non-Prometheus instrumentation
library.11
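For instance, here is a minimal Python sketch that walks the default registry's snapshot and prints each sample; it assumes a recent prometheus_client where samples are named tuples.

from prometheus_client import REGISTRY

for family in REGISTRY.collect():
    for sample in family.samples:
        # Each sample carries the time series name, labels, and value.
        print(sample.name, sample.labels, sample.value)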
Tip
If you ever want to hook into direct instrumentation you should instead use
the metrics output by a registry. Wanting to know every time a counter is
incremented does not make sense in terms of a metrics-based monitoring system.
However, the count of increments is already provided for you by
CollectorRegistry.collect
and works for custom collectors.
Parsers
In addition to a client library's registry allowing you to access metric output, the Go12 and Python clients also feature a parser for the Prometheus exposition format. Example 4-14 only prints the samples, but you could feed Prometheus metrics into other monitoring systems or into your local tooling.
Example 4-14. Parsing the Prometheus text format with the Python client
from prometheus_client.parser import text_string_to_metric_families

for family in text_string_to_metric_families(u"counter_total 1.0\n"):
    for sample in family.samples:
        print("Name: {0} Labels: {1} Value: {2}".format(*sample))
DataDog, InfluxDB, Sensu, and Metricbeat13 are some of the monitoring systems that have components that can parse the text format. Using one of these monitoring systems, you could take advantage of the Prometheus ecosystem without ever running the Prometheus server. I personally believe that this is a good thing, as there is currently a lot of duplication of effort between the various monitoring systems. Each of them has to write similar code to support the myriad of custom metric outputs provided by the most commonly used software. A project called OpenMetrics aims to work from the Prometheus exposition format and standardise it. Developers from various monitoring systems, including myself, are involved with the OpenMetrics effort.
Exposition Format
The Prometheus text exposition format is relatively easy to produce and parse. Although you should almost always rely on a client library to handle it for you, there are cases such as with the Node exporter textfile collector (discussed in "Textfile Collector") where you may have to produce it yourself.
I will be showing you version 0.0.4 of the text format, which has the content type header
Content-Type: text/plain; version=0.0.4; charset=utf-8
In the simplest cases, the text format is just the name of the metric followed
by a 64-bit floating-point number. Each line is terminated with a line-feed
character (\n
).
my_counter_total 14
a_small_gauge 8.3e-96
Tip
In Prometheus 1.0, a protocol buffer format was also supported as it was slightly (2-3%) more efficient. Only a literal handful of exporters ever exposed just the protocol buffer format. The Prometheus 2.0 storage and ingestion performance improvements are tied to the text format, so it is now the only format.
Metric Types
More
complete output would include the HELP
and TYPE
of the metrics as
shown in Example 4-15. HELP
is a description of what the metric is,
and should not generally change from scrape to scrape. TYPE
is one of
counter
, gauge
, summary
, histogram
, or untyped
. untyped
is used when
you do not know the type of the metric, and is the default if no type is
specified. Prometheus currently throws away HELP
and TYPE
, but they will
be made available to tools like Grafana in the future to aid in writing
queries. It is invalid for you to have a duplicate metric, so make sure all the time series that belong to a metric are grouped together.
Example 4-15. Exposition format for a gauge, counter, summary, and histogram
# HELP example_gauge An example gauge
# TYPE example_gauge gauge
example_gauge -0.7
# HELP my_counter_total An example counter
# TYPE my_counter_total counter
my_counter_total 14
# HELP my_summary An example summary
# TYPE my_summary summary
my_summary_sum 0.6
my_summary_count 19
# HELP latency_seconds An example histogram
# TYPE latency_seconds histogram
latency_seconds_bucket{le="0.1"} 7
latency_seconds_bucket{le="0.2"} 18
latency_seconds_bucket{le="0.4"} 24
latency_seconds_bucket{le="0.8"} 28
latency_seconds_bucket{le="+Inf"} 29
latency_seconds_sum 0.6
latency_seconds_count 29
For histograms, the _count
must match the +Inf
bucket, and
the +Inf
bucket must always be present. Buckets should not
change from scrape to scrape, as this will cause problems for PromQLâs
histogram_quantile
function. The le
labels have floating-point values and
must be sorted. You should note how the histogram buckets are cumulative, as
le
stands for less than or equal to.
Labels
The histogram in the preceding example also shows how labels are represented. Multiple labels are separated by commas, and it is okay to have a trailing comma before the closing brace.
The ordering of labels does not matter, but it is a good idea to have the ordering consistent from scrape to scrape. This will make writing your unit tests easier, and consistent ordering ensures the best ingestion performance in Prometheus.
# HELP my_summary An example summary
# TYPE my_summary summary
my_summary_sum{foo="bar",baz="quu"} 1.8
my_summary_count{foo="bar",baz="quu"} 453
my_summary_sum{foo="blaa",baz=""} 0
my_summary_count{foo="blaa",baz=""} 0
It is possible to have a metric with no time series, if no children have been initialised, as discussed in "Child".
# HELP a_counter_total An example counter
# TYPE a_counter_total counter
Escaping
The format is encoded in UTF-8, and full UTF-814 is permitted in both HELP and label values. Thus you need to use backslashes to escape characters that would cause issues. For HELP this is line feeds and backslashes. For label values this is line feeds, backslashes, and double quotes.15 The format ignores extra whitespace.
# HELP escaping A newline \n and backslash \\ escaped
# TYPE escaping gauge
escaping{foo="newline \n backslash \\ double quote \" "} 1
Timestamps
It is possible to specify a timestamp on a time series. It is an integer value in milliseconds since the Unix epoch,16 and it goes after the value. Timestamps in the exposition format should generally be avoided as they are only applicable in certain limited use cases (such as federation) and come with limitations. Timestamps for scrapes are usually applied automatically by Prometheus. It is not defined as to what happens if you specify multiple lines with the same name and labels but different timestamps.
# HELP foo I'm trapped in a client library
# TYPE foo gauge
foo 1 15100992000000
check metrics
Prometheus 2.0 uses a custom parser for efficiency. So just because a /metrics can be scraped doesn't mean that the metrics are compliant with the format.
Promtool is a utility included with Prometheus that among other things can verify that your metric output is valid and perform lint checks.
curl http://localhost:8000/metrics | promtool check metrics
Common mistakes include forgetting the line feed on the last line, using carriage return and line feed rather than just line feed,17 and invalid metric or label names. As a brief reminder, metric and label names cannot contain hyphens, and cannot start with a number.
You now have a working knowledge of the text format. The full specification can be found in the official Prometheus documentation.
I have mentioned labels a few times now. In the following chapter you'll learn what they are in detail.
1 CPython is the official name of the standard Python implementation. Do not confuse it with Cython, which can be used to write C extensions in Python.
2 The Pushgateway is not suitable for this use case, so this is not a problem in practice.
3 child_exit
was added in Gunicorn version 19.7 released in March 2017.
4 Gunicorn's --max-requests
flag is one example of such a limit.
5 Though for batch jobs that take more than a few minutes to run, it may also make sense to scrape them normally over HTTP to help debug performance issues.
6 You may see it referenced as pgw in informal contexts.
7 For batch jobs such as database backups that are tied to a machine's lifecycle, the node exporter textfile collector is a better choice. This is discussed in "Textfile Collector".
8 The Pushgateway explicitly exports empty instance
labels for metrics without an instance
label. Combined with honor_labels: true
, this results in Prometheus not applying an instance
label to these metrics. Usually, empty labels and missing labels are the same thing in Prometheus, but this is the exception.
9 Just like summarys and histograms, gauges have a time function decorator and context manager. It is intended only for use in batch jobs.
10 The labels are flattened into the metric name. Tag (i.e., label) support for Graphite was only recently added in 1.1.0.
11 This works both ways. Other instrumentation libraries with an equivalent feature can have their metrics fed into a Prometheus client library. This is discussed in "Custom Collectors".
12 The Go clientâs parser is the reference implementation.
13 Part of the Elasticsearch stack.
14 The null byte is a valid UTF-8 character.
15 Yes, there are two different sets of escaping rules within the format.
16 Midnight January 1st 1970 UTC.
17 \r\n
is the line ending on Windows, while on Unix, \n
is used. Prometheus has a Unix heritage, so it uses \n
.
Get Prometheus: Up & Running now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.