Latest version and main are impacted and any releases since Jun 27, 2023 (bug introduced in #1296).
Both func constrainLabels and func constrainLabels return the reference to the user's label list or map (when there's no label constraints), instead of creating a new copy (this is the root cause of the potential race conditions):
|
func constrainLabels(desc *Desc, labels Labels) (Labels, func()) { |
|
if len(desc.variableLabels.labelConstraints) == 0 { |
|
// Fast path when there's no constraints |
|
return labels, func() {} |
|
} |
|
func constrainLabelValues(desc *Desc, lvs []string, curry []curriedLabelValue) []string { |
|
if len(desc.variableLabels.labelConstraints) == 0 { |
|
// Fast path when there's no constraints |
|
return lvs |
|
} |
Creating a handful of potential race conditions in functions utilizing these two functions like func (m *MetricVec) GetMetricWith:
|
func (m *MetricVec) GetMetricWith(labels Labels) (Metric, error) { |
|
labels, closer := constrainLabels(m.desc, labels) |
|
defer closer() |
|
|
|
h, err := m.hashLabels(labels) |
|
if err != nil { |
|
return nil, err |
|
} |
|
|
|
return m.getOrCreateMetricWithLabels(h, labels, m.curry), nil |
|
} |
At best, when a race conditions is encountered, metrics will be unintentionally written with incorrect labels. At worst:
- potentially crashing the program if the race condition is hit just right, triggering a
fatal error: concurrent map iteration and map write
- metric labels values could change to a previous written value set, triggering
collected before with the same name and label values errors
For example, this is a reproducible race condition in func (m *MetricVec) GetMetricWith:
- initial call
myCounterVec.With(myLabels) with myLabels := map[string]string{"foo": "bar"} and assuming there no label constraints
1 which will call c, err := v.GetMetricWith(labels) and will make these internal calls
labels, closer := constrainLabels(m.desc, labels)
labels still points to the same map as myLabels due to there being no label constraints being used
h, err := m.hashLabels(labels)
2. h being the hash of the labels values of map[string]string{"foo": "bar"}, i.e.: hash of "bar"
return m.getOrCreateMetricWithLabels(h, labels, m.curry), nil, will make the following calls
- assuming a metric didn't exist for the hash value
hash, then metric, ok := m.getMetricWithHashAndLabels(hash, labels, curry) will results in ok := False
- leading to the creation of a new metric
lvs := extractLabelValues(m.desc, labels, curry)
- since
labels still points to the same map as myLabels, then a value change from "foo": "bar" to "foo": "cake" in myLabels will be reflected in labels
m.metrics[hash] = append(m.metrics[hash], metricWithLabelValues{values: lvs, metric: metric})
- assuming that label value change occurred in
myLabels right before lvs := extractLabelValues(m.desc, labels, curry) was executed, then you'll have a miss match between hash and the new metric metric = m.newMetric(lvs...)
The miss match between hash and the new metric, will either:
- assuming the new metric previously didn't exist
- potentially crash the program due to a
fatal error: concurrent map iteration and map write if the race condition is hit just right
- the new metric will unintentionally be written with the wrong labels
- assuming it did exist
- again, potentially triggering a
fatal error: concurrent map iteration and map write crash
mfs, done, err := reg.Gather() returns a collected before with the same name and label values error due to its checkMetricConsistency call and will either, depending the prom http setting
- PanicOnError - panic due to the error
- HTTPErrorOnError - return 500s with the body as the error when metrics are scraped
- ContinueOnError - ignores the error (besides logging it which is all three cases do) or report the error if no metrics have been gathered
Latest version and
mainare impacted and any releases since Jun 27, 2023 (bug introduced in #1296).Both
func constrainLabelsandfunc constrainLabelsreturn the reference to the user's label list or map (when there's no label constraints), instead of creating a new copy (this is the root cause of the potential race conditions):client_golang/prometheus/vec.go
Lines 665 to 669 in bf37be4
client_golang/prometheus/vec.go
Lines 684 to 688 in bf37be4
Creating a handful of potential race conditions in functions utilizing these two functions like
func (m *MetricVec) GetMetricWith:client_golang/prometheus/vec.go
Lines 238 to 248 in bf37be4
At best, when a race conditions is encountered, metrics will be unintentionally written with incorrect labels. At worst:
fatal error: concurrent map iteration and map writecollected before with the same name and label valueserrorsFor example, this is a reproducible race condition in
func (m *MetricVec) GetMetricWith:myCounterVec.With(myLabels)withmyLabels := map[string]string{"foo": "bar"}and assuming there no label constraints1 which will call
c, err := v.GetMetricWith(labels)and will make these internal callslabels, closer := constrainLabels(m.desc, labels)labelsstill points to the same map asmyLabelsdue to there being no label constraints being usedh, err := m.hashLabels(labels)2.
hbeing the hash of the labels values ofmap[string]string{"foo": "bar"}, i.e.: hash of"bar"return m.getOrCreateMetricWithLabels(h, labels, m.curry), nil, will make the following callshash, thenmetric, ok := m.getMetricWithHashAndLabels(hash, labels, curry)will results inok := Falselvs := extractLabelValues(m.desc, labels, curry)labelsstill points to the same map asmyLabels, then a value change from"foo": "bar"to"foo": "cake"inmyLabelswill be reflected inlabelsm.metrics[hash] = append(m.metrics[hash], metricWithLabelValues{values: lvs, metric: metric})myLabelsright beforelvs := extractLabelValues(m.desc, labels, curry)was executed, then you'll have a miss match betweenhashand the new metricmetric = m.newMetric(lvs...)The miss match between
hashand the new metric, will either:fatal error: concurrent map iteration and map writeif the race condition is hit just rightfatal error: concurrent map iteration and map writecrashmfs, done, err := reg.Gather()returns acollected before with the same name and label valueserror due to itscheckMetricConsistencycall and will either, depending the prom http setting