
Unexpected Trace Data Behavior in Google Cloud Trace with OpenTelemetry

Author: Jay

I have applications running on GKE (Google Kubernetes Engine), and I’ve configured OpenTelemetry to collect telemetry data—including logs, metrics, and traces. In other cloud environments, I used tools like Tempo, Loki, and Thanos to store this telemetry data.

However, Google Cloud offers a fully managed monitoring service that can significantly reduce the operational overhead of running these tools yourself. Thanks to the Google Cloud Exporter, integrating with the OpenTelemetry Collector is straightforward.

Everything worked well and the setup was simple. Later, though, I noticed a few oddities in the trace data. That’s where my story begins.

Unexpected JDBC Format in Node.js Trace Data

While analyzing trace data from a Node.js application, I noticed something odd in the db.connection_string attribute:

jdbc:mysql://127.0.0.1:3306/my_db

Wait, JDBC? That caught me off guard. Why is a Node.js application emitting a JDBC-style connection string?

The trace was generated by the mysql2 instrumentation package. Naturally, I wanted to understand why a JDBC prefix was appearing in trace data from a JavaScript runtime. So I did some digging.

It turns out the MySQL2 instrumentation itself generates this format! Here’s the relevant code from the library:

export function getConnectionAttributes(config: Config): Attributes {
  const { host, port, database, user } = getConfig(config)
  const portNumber = parseInt(port, 10)
  if (!isNaN(portNumber)) {
    return {
      [SEMATTRS_NET_PEER_NAME]: host,
      [SEMATTRS_NET_PEER_PORT]: portNumber,
      [SEMATTRS_DB_CONNECTION_STRING]: getJDBCString(host, port, database),
      [SEMATTRS_DB_NAME]: database,
      [SEMATTRS_DB_USER]: user,
    }
  }
  return {
    [SEMATTRS_NET_PEER_NAME]: host,
    [SEMATTRS_DB_CONNECTION_STRING]: getJDBCString(host, port, database),
    [SEMATTRS_DB_NAME]: database,
    [SEMATTRS_DB_USER]: user,
  }
}

As you can see, the instrumentation calls getJDBCString to generate the connection string, even in a Node.js context.
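To make the format concrete, here is a minimal sketch of how a helper of this kind could assemble such a string. The name buildJdbcStyleString and the localhost fallback are my own assumptions for illustration; this is not copied from the instrumentation source.

```typescript
// Hypothetical helper: shows how a JDBC-style string could be assembled.
// It is an illustration only, not the mysql2 instrumentation's own code.
function buildJdbcStyleString(
  host: string | undefined,
  port: number | undefined,
  database: string | undefined,
): string {
  // Assumption: fall back to localhost when no host is configured.
  let result = `jdbc:mysql://${host ?? "localhost"}`;
  // Append the port only when it is a usable number.
  if (typeof port === "number" && !Number.isNaN(port)) {
    result += `:${port}`;
  }
  // Append the database only when one is configured.
  if (database) {
    result += `/${database}`;
  }
  return result;
}

console.log(buildJdbcStyleString("127.0.0.1", 3306, "my_db"));
// prints "jdbc:mysql://127.0.0.1:3306/my_db"
```

Nothing here depends on Java: the "jdbc:" prefix is just a string literal, which is why it can show up in trace data from any runtime.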

But why did they choose to use a JDBC-style format?

To investigate further, I checked the README of the MySQL2 instrumentation package. It states:

This package uses @opentelemetry/semantic-conventions version 1.22+, which implements Semantic Convention Version 1.7.0

So I followed the trail to the OpenTelemetry Specification for Semantic Conventions v1.7.0, specifically the section related to databases. Here’s the relevant snippet:

- id: connection_string
  tag: connection-level
  type: string
  brief: >
    The connection string used to connect to the database.
    It is recommended to remove embedded credentials.
  examples: 'Server=(localdb)\v11.0;Integrated Security=true;'

Unfortunately, it doesn’t directly mention the JDBC format. Still curious, I browsed through related GitHub issues and eventually found this one, which indirectly suggested that earlier specifications or conventions may have encouraged using JDBC-style strings—even outside of Java environments.

So in the end, it turns out that seeing a JDBC-style connection string in Node.js trace data is normal behavior, at least when using this instrumentation. It’s hardcoded by the library itself.

db.statement Information Was Truncated

I discovered a limitation with the GCP-managed Cloud Trace service: the db.statement attribute was being truncated. This attribute is particularly useful for identifying which query a client is executing for each URL, but in GCP Cloud Trace, the query information was incomplete and cut off.

According to the GCP official documentation, there’s a strict limit:

Maximum size of value for a label or attribute: 256 bytes

This explains why the full SQL query wasn’t visible in the trace.
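If you would rather control what gets cut, one option is to shorten the value yourself before it reaches the exporter. The sketch below is only an illustration: truncateToUtf8Bytes is a hypothetical helper, and it assumes the 256-byte limit is counted on the UTF-8 encoding of the value. How Cloud Trace truncates internally is not something I have verified.

```typescript
// Number of bytes a single code point occupies in UTF-8.
function utf8Size(codePoint: number): number {
  if (codePoint < 0x80) return 1;
  if (codePoint < 0x800) return 2;
  if (codePoint < 0x10000) return 3;
  return 4;
}

// Hypothetical helper: keeps as many whole characters as fit in maxBytes.
// Assumption: the limit applies to the UTF-8 byte length of the value.
function truncateToUtf8Bytes(value: string, maxBytes: number = 256): string {
  let used = 0;
  let result = "";
  // for...of walks code points, so a surrogate pair is never split in half.
  for (const ch of value) {
    const size = utf8Size(ch.codePointAt(0) ?? 0);
    if (used + size > maxBytes) break;
    used += size;
    result += ch;
  }
  return result;
}

const longQuery = "SELECT " + "x".repeat(300);
console.log(truncateToUtf8Bytes(longQuery).length); // prints 256
```

Counting bytes rather than characters matters for queries that contain non-ASCII literals, where a 256-character prefix could still exceed the limit.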

db.name Is Not Following the Spec

Since I couldn’t always determine which database a client was querying based solely on the (truncated) db.statement, I started paying more attention to the db.name attribute.

Even if you configure a single database for the initial connection using the mysql2 library, you can still issue queries to other logical databases. According to the OpenTelemetry semantic conventions (version 1.7.0), the db.name attribute is call-level and should reflect the actual database targeted by a given statement.

From the spec:

- id: name
  tag: call-level
  type: string
  required:
    conditional: >
      Required, if applicable and no more-specific attribute is defined.
  brief: >
    If no [tech-specific attribute](#call-level-attributes-for-specific-technologies)
    is defined, this attribute is used to report the name of the database being accessed.
    For commands that switch the database, this should be set to the target database
    (even if the command fails).
  note: >
    In some SQL databases, the database name to be used is called "schema name".
  examples: [ 'customers', 'main' ]

So what happens when you run a query like this?

SELECT * FROM `other_db`.`some_table`;

According to the spec, the db.name should be other_db. However, in practice, db.name always reflects the database configured during the initial connection.

This behavior can be traced to the implementation of the mysql2 instrumentation package, which sets db.name based on the connection config rather than parsing the query dynamically:

export function getConnectionAttributes(config: Config): Attributes {
  const { host, port, database, user } = getConfig(config)
  const portNumber = parseInt(port, 10)
  if (!isNaN(portNumber)) {
    return {
      [SEMATTRS_NET_PEER_NAME]: host,
      [SEMATTRS_NET_PEER_PORT]: portNumber,
      [SEMATTRS_DB_CONNECTION_STRING]: getJDBCString(host, port, database),
      [SEMATTRS_DB_NAME]: database,
      [SEMATTRS_DB_USER]: user,
    }
  }
  return {
    [SEMATTRS_NET_PEER_NAME]: host,
    [SEMATTRS_DB_CONNECTION_STRING]: getJDBCString(host, port, database),
    [SEMATTRS_DB_NAME]: database,
    [SEMATTRS_DB_USER]: user,
  }
}

This means that even if you query a different logical database, the instrumentation will always report the original connection database as db.name.
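For comparison, here is a rough sketch of what deriving the database from the statement itself could look like. guessTargetDatabase is a hypothetical helper, not part of the instrumentation, and the regular expression is deliberately naive: it only looks for the first qualified `db`.`table` reference after FROM, JOIN, INTO, or UPDATE. A real implementation would need a proper SQL parser and would also have to track USE statements.

```typescript
// Hypothetical, naive helper: not a SQL parser.
// Returns the database from the first qualified table reference,
// or the connection's database when the statement has none.
function guessTargetDatabase(sql: string, connectionDatabase: string): string {
  const qualified =
    /\b(?:from|join|into|update)\s+`?([A-Za-z0-9_$]+)`?\s*\.\s*`?[A-Za-z0-9_$]+`?/i;
  const match = qualified.exec(sql);
  return match?.[1] ?? connectionDatabase;
}

console.log(guessTargetDatabase("SELECT * FROM `other_db`.`some_table`", "my_db"));
// prints "other_db"
console.log(guessTargetDatabase("SELECT * FROM some_table", "my_db"));
// prints "my_db"
```

Even this toy version hints at why the instrumentation takes the simpler route: parsing every statement adds cost and still misses subqueries, multi-database joins, and session-level database switches.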

Attributes Were Not Consistent from the Same Source

I noticed another strange behavior in GCP Cloud Trace: the attribute keys in the trace data were not consistent. Initially, I assumed that this might be because two different sources were sending trace data. However, the real cause turned out to be a strict limitation in GCP Cloud Trace:

Maximum number of labels or attributes per span: 32

Since the spans had more than 32 attributes, Cloud Trace may simply drop the excess ones. Because the order of attributes isn’t guaranteed, it might drop different attributes even for spans from the same source. As a result, the attributes displayed in the UI can vary between otherwise identical spans, giving the impression that the attribute keys are inconsistent.
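One way to make the outcome predictable is to trim attributes yourself, in a fixed order, before export. The sketch below is an assumption-laden illustration: capAttributes and its priority list are hypothetical, and wiring it into an SDK or Collector pipeline is left out. It keeps the keys you care about first, then fills the remaining slots in sorted order so that identical spans always keep the same set.

```typescript
type AttributeValue = string | number | boolean;

// Assumption: 32 is the per-span attribute limit quoted above.
const MAX_ATTRIBUTES_PER_SPAN = 32;

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Hypothetical helper: returns at most `max` attributes, chosen deterministically.
function capAttributes(
  attributes: Record<string, AttributeValue>,
  priorityKeys: string[],
  max: number = MAX_ATTRIBUTES_PER_SPAN,
): Record<string, AttributeValue> {
  const kept: Record<string, AttributeValue> = {};
  let count = 0;
  // Prioritized keys first, then everything else in sorted order.
  const candidates = [...priorityKeys, ...Object.keys(attributes).sort()];
  for (const key of candidates) {
    if (count >= max) break;
    // Skip keys that are absent from the input or were already kept.
    if (!hasOwn(attributes, key) || hasOwn(kept, key)) continue;
    kept[key] = attributes[key];
    count++;
  }
  return kept;
}

const example = capAttributes(
  { "db.statement": "SELECT 1", "db.name": "my_db", "thread.id": 7 },
  ["db.statement"],
  2,
);
console.log(Object.keys(example)); // prints [ 'db.statement', 'db.name' ]
```

With a fixed selection order, two spans that carry the same attributes end up with the same surviving keys, which removes the apparent inconsistency in the UI.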

Conclusion

I noticed some odd behavior in the trace data from GCP Cloud Trace and initially suspected that my system wasn’t storing trace data correctly. In the end, there was no issue with my setup. The JDBC-style value came from the mysql2 instrumentation package, which targets OpenTelemetry Semantic Conventions v1.7.0 and hardcodes a JDBC-style string into the db.connection_string attribute.

Due to GCP Cloud Trace limits—specifically, the maximum size of an attribute value and the maximum number of attributes per span—some attribute values were truncated, and others were dropped entirely. This made the trace data appear inconsistent, even though it came from a single, identical source.

While digging deeper to understand the issue, I also noticed that the db.name attribute is defined at the call level in the OpenTelemetry spec. However, the mysql2 instrumentation doesn’t fully follow this when you’re querying other logical databases from the same application. This could be an interesting problem to explore further—perhaps as a future side project to improve the library.

Another idea that came to mind: it would be useful if users could choose which version of the semantic conventions an instrumentation library follows. At the time of writing, the latest OpenTelemetry specification version is v1.36.0, while the db.connection_string attribute was removed in v1.25.0.

I understand that upgrading to newer spec versions is difficult due to compatibility concerns, especially if your monitoring systems rely on certain attributes. But currently, for example, the mysql2 instrumentation is locked into v1.7.0. What if the library allowed users to select a semantic conventions version, and the instrumentation adapted its behavior accordingly?
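As a thought experiment, such a switch could look something like the sketch below. Everything in it is hypothetical: the SemconvMode type, the connectionAttributes function, and the idea of passing the mode as a parameter are my own illustration, not an existing option of the mysql2 instrumentation. The attribute names in the "stable" branch reflect my understanding of the newer conventions, where db.name became db.namespace and net.peer.* became server.*.

```typescript
// Hypothetical: which generation of attribute names to emit.
type SemconvMode = "legacy" | "stable";

interface ConnectionInfo {
  host: string;
  port?: number;
  database?: string;
}

// Hypothetical helper: maps the same connection info to different attribute names.
function connectionAttributes(
  conn: ConnectionInfo,
  mode: SemconvMode,
): Record<string, string | number> {
  const attrs: Record<string, string | number> = {};
  if (mode === "legacy") {
    // Names used by the older conventions discussed in this post.
    attrs["net.peer.name"] = conn.host;
    if (conn.port !== undefined) attrs["net.peer.port"] = conn.port;
    if (conn.database !== undefined) attrs["db.name"] = conn.database;
  } else {
    // Assumption: names from the newer conventions, as I understand them.
    attrs["server.address"] = conn.host;
    if (conn.port !== undefined) attrs["server.port"] = conn.port;
    if (conn.database !== undefined) attrs["db.namespace"] = conn.database;
  }
  return attrs;
}

console.log(connectionAttributes({ host: "127.0.0.1", port: 3306, database: "my_db" }, "stable"));
```

Keeping the mapping in one place like this would let users migrate dashboards and alerts at their own pace instead of being tied to whatever version the library was written against.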