Splunk Tech Talks
Deep-dives for technical practitioners.

Starting With Observability: OpenTelemetry Best Practices

LesediK
Splunk Employee

Tech Talk

In this Tech Talk, learn what OpenTelemetry is, why it’s the future of Observability, and how to use the Splunk Distribution of the OpenTelemetry Collector, with a demo. We’ll also discuss design considerations for an OpenTelemetry Collector deployment, the components of the OpenTelemetry pipeline, and how they can impact performance.

Read the Tech Talk blog, Observability Unveiled: Navigating OpenTelemetry's Framework and Deployment Options

What's next?

Watch the OpenTelemetry: What’s Next. Logs, Profiles, and More Tech Talk

Here are a few more resources you might find helpful:

Training Courses:


Certification: 
Splunk O11y Cloud Certified Metrics User > Get Started


Documentation: 
Overview of the Splunk Distribution of OpenTelemetry Collector > Learn More

App: https://opentelemetry.io

zpage links

 

CNCF Slack, Splunk PS

Wouldn’t it be great to talk to experts in real-time about your specific problem?

Join the #opentelemetry channel on the CNCF Slack.

Also, if you’re a Splunk customer with a Professional Services (PS) entitlement, use it!

Here are a few top-of-mind questions from the live Tech Talk:

Q. Where is the line between Operations and Maintenance and Observability? And how fine is that line?

A. Generally, I'd say that maintenance and operations are similar in scope: both focus on the infrastructure and on the applications that run on it, whereas Observability shines when you're dealing with customer problems and the customer experience. I don't think there's a single 'correct' answer here.

Q. Is OpenTelemetry like the agents we deploy to collect telemetry data?

A. OpenTelemetry includes a collector that serves a similar function to 'agents' in other systems, but OpenTelemetry is more than just the collector: it also includes a wire protocol (OTLP), SDKs, and conventions for how data should be described and sent. It's a broader ecosystem rather than just 'an agent'.

Q. How about WebLogic?

A. There is a WebLogic exporter (not supported by Splunk) that appears to emit Prometheus-format metrics. These can be ingested through the OTel collector (for example, with a Prometheus receiver) and then sent anywhere. https://github.com/oracle/weblogic-monitoring-exporter

Q. Where can I find ITSI KPIs for K8s?

A. You can deploy Splunk Connect for Kubernetes (SCK) to ingest data into ITSI and then create an SA or other dashboard based on the data from your K8s cluster. For richer support, Splunk Observability Cloud ships with Kubernetes Navigator that provides out-of-the-box data on health and performance of your cluster and pods.

Q. Can native OTel be configured or used to send logs/traces/metrics to Splunk Enterprise as a backend?

A. You can send logs and metrics to Splunk Enterprise using the splunk_hec exporter pointed at an HTTP Event Collector (HEC) endpoint. Traces can't be sent to Splunk Enterprise natively (what would it do with them?).
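
For reference, here is a minimal Python sketch (standard library only) of the kind of request HEC expects: an HTTP POST to the /services/collector/event endpoint with an "Authorization: Splunk <token>" header and a JSON event envelope. The host, token, index, and sourcetype values are placeholders, not real settings; in a collector deployment the splunk_hec exporter builds these requests for you from its configuration.

```python
import json
import urllib.request


def build_hec_request(base_url, token, event, sourcetype=None, index=None):
    """Build (but do not send) a Splunk HEC event request."""
    payload = {"event": event}
    # Optional HEC metadata fields; omitted when not supplied.
    if sourcetype is not None:
        payload["sourcetype"] = sourcetype
    if index is not None:
        payload["index"] = index
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",  # HEC token auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_hec_request(
    "https://splunk.example.com:8088",       # placeholder HEC host and port
    "00000000-0000-0000-0000-000000000000",  # placeholder token
    {"message": "user login", "level": "info"},
    sourcetype="_json",
    index="main",
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send it.
```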

Q. Can you talk about how Splunk Core Enterprise will relate to OpenTelemetry?

A. Having to set up multiple agents can be toilsome, so we definitely have plans to simplify this process and potentially make it possible to only use one agent for both Observability data and Splunk Enterprise use cases. Look for more in this space throughout the next year.

Q. Can application logs be pushed via the OTel agent?

A. Yes. Using the filelog receiver, you can read a file on disk and emit its entries as OpenTelemetry logs. There are also other receivers for systemd journals and other log sources.
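
The filelog receiver is configured in the collector's YAML, typically with parsing operators such as a regex parser. As a rough illustration of what that parsing step does, the Python sketch below turns raw lines into simple log records. The log line format, the LogRecord class, and parse_lines are hypothetical; the log.file.name attribute mirrors the one the filelog receiver adds by default, but this is a model, not the receiver's code.

```python
import re
from dataclasses import dataclass, field

# Hypothetical line format: "<timestamp> <SEVERITY> <message>"
LINE_PATTERN = re.compile(r"^(?P<time>\S+)\s+(?P<severity>[A-Z]+)\s+(?P<msg>.*)$")


@dataclass
class LogRecord:
    body: str
    severity_text: str = ""
    attributes: dict = field(default_factory=dict)


def parse_lines(lines, file_name):
    """Turn raw log lines into LogRecord objects, keeping unparseable lines as-is."""
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip empty lines
        attrs = {"log.file.name": file_name}
        match = LINE_PATTERN.match(line)
        if match is None:
            records.append(LogRecord(body=line, attributes=attrs))
            continue
        # A real pipeline would map this onto the record's timestamp field.
        attrs["time"] = match["time"]
        records.append(
            LogRecord(body=match["msg"], severity_text=match["severity"], attributes=attrs)
        )
    return records


sample = [
    "2024-05-01T12:00:00Z INFO service started\n",
    "2024-05-01T12:00:05Z ERROR payment failed\n",
    "free-form line without a timestamp\n",
]
for record in parse_lines(sample, "app.log"):
    print(record.severity_text or "-", record.body)
```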

Q. Can you speak briefly to how implementing observability with OpenTelemetry can support a security program? 

A. As I mentioned briefly during the Q&A, this data can help enrich threat information, making it easier to determine, for example, whether you're experiencing credential stuffing or an organic increase in users. Additionally, the ability to process data in the OpenTelemetry collector is valuable for security use cases: with a processor, for example, you can prevent PII, PCI, HIPAA, or other sensitive or regulated data from ever reaching the Observability backend.
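
A minimal sketch of the kind of scrubbing such a processor performs, written in plain Python for illustration. In the collector itself you'd express this in YAML with a processor (for example, the attributes processor's delete action or the contrib redaction processor, depending on what your distribution ships). The sensitive key list and the card-number regex here are hypothetical and deliberately simple.

```python
import re

# Hypothetical attribute keys treated as sensitive; these are dropped outright.
SENSITIVE_KEYS = {"user.email", "user.ssn", "http.request.header.authorization"}
# Deliberately simple pattern for 13 to 16 digit card-like numbers (spaces or dashes allowed).
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def redact(attributes):
    """Return a copy of an attribute dict with sensitive keys removed and card numbers masked."""
    cleaned = {}
    for key, value in attributes.items():
        if key in SENSITIVE_KEYS:
            continue
        if isinstance(value, str):
            value = CARD_PATTERN.sub("****", value)
        cleaned[key] = value
    return cleaned


span_attrs = {
    "http.route": "/checkout",
    "user.email": "jane@example.com",
    "payment.note": "card 4111 1111 1111 1111 declined",
}
print(redact(span_attrs))
```

Because the scrubbing happens before export, the masked values never leave the collector.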

Q. I want to send traces from a Python script. I have the Python OTel collector installed on Windows, and I am running: splunk-py-trace python script.py

A. There's a Python guided setup process described in the docs that's easier than running it manually; I'd suggest trying that first: https://docs.splunk.com/observability/en/gdi/get-data-in/application/python/instrumentation/instrume...

Q. We use Splunk Enterprise now. With OpenTelemetry added, what can we skip in our environment?

A. Getting started with OpenTelemetry will help future-proof your installation. If you later decide to adopt Observability tools, you won't have to do as much work to get started. Auto-instrumentation, for example, could make getting trace data as simple as restarting your app.

Q. How is a pipeline created, from implementation to investigation?

A. First, determine which data sources you want to send to your observability system, then select receivers that can receive that type of data. Next, consider whether there is any processing or manipulation you want to do on the data. Finally, select an exporter that your observability tool supports. It's pretty straightforward if you think of the data as flowing through a pipe: something has to take it in, and something has to send it out.
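
To make the receiver, processor, exporter flow concrete, here is a sketch that builds a minimal traces pipeline as a Python dict, mirroring the collector's config layout (receivers, processors, exporters, and service.pipelines), and checks that every component a pipeline references is actually defined. The exporter endpoint is a placeholder, and undefined_components is a hypothetical helper, not a collector feature.

```python
import json

# Minimal traces pipeline: OTLP in, batch, OTLP/HTTP out (endpoint is a placeholder).
config = {
    "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
    "processors": {"batch": {}},
    "exporters": {"otlphttp": {"endpoint": "https://backend.example.com"}},
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["batch"],
                "exporters": ["otlphttp"],
            }
        }
    },
}


def undefined_components(cfg):
    """Return a message for every pipeline reference with no matching definition."""
    problems = []
    for name, pipeline in cfg["service"]["pipelines"].items():
        for kind in ("receivers", "processors", "exporters"):
            for component in pipeline.get(kind, []):
                if component not in cfg.get(kind, {}):
                    problems.append(f"{name}: {kind[:-1]} '{component}' is not defined")
    return problems


print(undefined_components(config))  # prints: []
print(json.dumps(config, indent=2))
```

Since JSON is a subset of YAML 1.2, the printed output should in principle also be readable as YAML, though in practice you'd normally write the YAML by hand.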

Q. Can we send traces from a Python script instead of a service?

A. Yes, you can send traces in OTLP format directly to a backend and skip the collector entirely if you want to. The options are on this page in the OTel docs: https://opentelemetry.io/docs/instrumentation/python/exporters/
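
If you can't install the SDK and exporter packages, it can help to see what actually goes over the wire. The stdlib-only sketch below builds a one-span OTLP/HTTP JSON request (hex-encoded trace and span IDs, an integer enum for the span kind, nanosecond timestamps) aimed at the default OTLP/HTTP port 4318 on localhost. The service, span, and scope names are placeholders; for real scripts, the SDK plus an OTLP exporter from the page linked above is the better route.

```python
import json
import os
import time
import urllib.request


def make_span_payload(service_name, span_name, start_ns, end_ns):
    """Build an OTLP/HTTP JSON body containing a single span (no SDK involved)."""
    span = {
        "traceId": os.urandom(16).hex(),  # 16 random bytes -> 32 hex chars
        "spanId": os.urandom(8).hex(),    # 8 random bytes -> 16 hex chars
        "name": span_name,
        "kind": 1,  # SPAN_KIND_INTERNAL; enums are integers in OTLP JSON
        "startTimeUnixNano": str(start_ns),  # 64-bit ints encoded as strings
        "endTimeUnixNano": str(end_ns),
    }
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{"scope": {"name": "manual-example"}, "spans": [span]}],
        }]
    }


start = time.time_ns()
# ... the work you want to time goes here ...
end = time.time_ns()
payload = make_span_payload("my-script", "nightly-job", start, end)
request = urllib.request.Request(
    "http://localhost:4318/v1/traces",  # assumes a local OTLP/HTTP receiver on the default port
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would send it; left commented so the sketch runs offline.
```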

Q. Can Splunk integrate with VictoriaMetrics?

A. There's no direct integration I know of, but from their webpage it looks like VictoriaMetrics does support OTLP, so the OpenTelemetry collector is still useful (you could use it to send data to both VictoriaMetrics and Splunk, for example).

Q. How should we install the Splunk OTel collector on a machine we don't have backend access to, that is, a machine where we can't access the file system?

A. If you can't access the file system, I'm guessing you're on some sort of PaaS-type platform. In that case, I'd check the OpenTelemetry website for integrations written for that platform and use those instead. You can also emit data directly from your app to a collector running on another machine where you do have access; the collector isn't mandatory on the application machine.
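
OpenTelemetry SDKs generally read the collector's address from environment variables, which is what lets an app send to a collector on another host. The function below is a hypothetical helper that mimics, in simplified form, the OTLP/HTTP resolution rules from the OTel specification: OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is used as-is, while OTEL_EXPORTER_OTLP_ENDPOINT gets /v1/traces appended, defaulting to http://localhost:4318. The collector hostname in the usage line is made up.

```python
import os


def traces_endpoint(environ=None):
    """Resolve the OTLP/HTTP traces URL following the OTel env-var conventions (simplified)."""
    env = os.environ if environ is None else environ
    # A signal-specific endpoint is used exactly as given.
    specific = env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
    if specific:
        return specific
    # The generic endpoint gets the signal path appended; default is a local collector.
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or "http://localhost:4318"
    return base.rstrip("/") + "/v1/traces"


# An app on one host pointing at a collector on another (hypothetical hostname):
print(traces_endpoint({"OTEL_EXPORTER_OTLP_ENDPOINT": "http://collector.internal:4318"}))
```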

Q. What port does OTel use to communicate with Splunk Cloud (directly) or with a gateway?

A. Splunk Cloud HECs receive events on port 443 (HTTPS).

Q. So how do you get to the zpages screen? Point a browser at https://0.0.0.0:xxx?

A. Changing the listen interface to 0.0.0.0 means that you can reach a collector's zpages from any other machine by going to http://[collectorip]:55679/debug/servicez. If you don't change the listen interface, you can only access the zpages from the machine the collector is running on (using http://localhost:55679/debug/servicez).
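
A small Python sketch for checking this from another machine: it builds zpages URLs for a given collector and tests whether one responds. The collector IP is hypothetical, and the page names other than servicez (pipelinez, extensionz, tracez) are taken from the zpages extension as we understand it and may vary by collector version.

```python
import urllib.request

ZPAGES_PORT = 55679  # default zpages port used in the answer above
PAGES = ("servicez", "pipelinez", "extensionz", "tracez")


def zpages_urls(host, port=ZPAGES_PORT):
    """Map each zpage name to its URL on the given collector host."""
    return {page: f"http://{host}:{port}/debug/{page}" for page in PAGES}


def is_reachable(url, timeout=2.0):
    """Return True if the page answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError, HTTPError, and socket timeouts all derive from OSError
        return False


urls = zpages_urls("10.0.0.12")  # hypothetical collector IP
for page, url in urls.items():
    print(page, url)
# Run from another machine: True suggests the listen interface is reachable remotely.
# print(is_reachable(urls["servicez"]))
```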

Q. If you have resources spread across multiple regions in a cloud platform, must one gateway (and multiple host-based collectors) exist for each region?

A. It's not required, but recommended to minimize cross-region traffic (you can configure more aggressive batching on the cross-region gateways, for example).
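
To see why larger batches help across regions, here is a toy Python model of size-or-timeout batching, loosely in the spirit of the batch processor's send_batch_size and timeout settings. The Batcher class and simulate function are hypothetical, and the real processor has more knobs (such as a maximum batch size); the point is only that a larger size and a longer timeout produce fewer, bigger requests.

```python
class Batcher:
    """Toy size-or-timeout batcher (a model, not the collector's implementation)."""

    def __init__(self, send_batch_size, timeout_s):
        self.send_batch_size = send_batch_size
        self.timeout_s = timeout_s
        self.buffer = []
        self.first_item_at = None
        self.sent = []  # batches "exported" so far

    def add(self, item, now):
        if not self.buffer:
            self.first_item_at = now
        self.buffer.append(item)
        if len(self.buffer) >= self.send_batch_size:
            self.flush()  # size threshold reached

    def tick(self, now):
        """Flush a partial batch once its oldest item has waited timeout_s."""
        if self.buffer and now - self.first_item_at >= self.timeout_s:
            self.flush()

    def flush(self):
        self.sent.append(self.buffer)
        self.buffer = []
        self.first_item_at = None


def simulate(send_batch_size, timeout_s, n_items=10):
    """Feed one item per second and return how many requests were sent."""
    batcher = Batcher(send_batch_size, timeout_s)
    for t in range(n_items):
        batcher.add(f"span-{t}", now=t)
        batcher.tick(now=t)
    batcher.tick(now=n_items + timeout_s)  # let any leftover batch time out
    return len(batcher.sent)


print(simulate(send_batch_size=2, timeout_s=1), simulate(send_batch_size=8, timeout_s=5))  # prints: 5 2
```

With one item arriving per second, the small-batch settings send 5 requests across the region boundary, while the more aggressive settings send 2.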

Q. Where can we find the certification program?

A.  https://www.splunk.com/en_us/training/certification-track/splunk-o11y-cloud-certified-metrics-user.h...

Q. A YAML config checker is also very useful; correct syntax counts.

A. This isn't a question, but yes! Use a YAML linter (search for one; there are plenty), and config editing becomes so much easier.

Q. Can you comment on the current status of Splunk OpenTelemetry Operator and whether or not it is also ready for production or still too early to tell? 

A. We rely on (and contribute to) the upstream operator, which many customers use in production. See more in the docs: https://docs.splunk.com/observability/en/gdi/opentelemetry/auto-instrumentation/auto-instrumentation... . If you're referring to SOK (used for deploying Splunk Enterprise agents and infrastructure in a K8s environment), it is Splunk supported and ready for production.

Q. Is it possible to break out the API capabilities of Splunk OpenTelemetry? I'm interested in understanding how it would work with legacy MuleSoft APIs.

A. You can learn more about the instrumentation APIs in the OTel docs at https://opentelemetry.io/docs/instrumentation/

Q. We currently use Splunk Enterprise with forwarders. If we add OpenTelemetry, what can we skip in our environment?

A. Nothing, right now. OpenTelemetry can emit data to Splunk Enterprise, but Splunk doesn't currently support this use. In the future, however, using OpenTelemetry to send data to Splunk Enterprise may become supported.

Q. You mentioned unhashing the zpages in the OTel collector and then rehashing on the host. How is that done? Can you speak a bit more about that? Also, is that possible with the MSI OTel collector setup?

A. You'll need to stop the service, edit the agent_config.yaml file to enable zpages and change the listening endpoint (if you want to), then restart the service. For the MSI version of our distro, you'll find instructions on how to edit the config here: https://github.com/signalfx/splunk-otel-collector/blob/main/docs/getting-started/windows-manual.md

Q. Is Observability included with Splunk Cloud, or is it an additional module that can be purchased? Sorry, I understand the question is a little off topic.

A. Splunk Observability Cloud is a separate product, not included with Splunk Cloud. Learn more at https://splunk.com/o11y . There's a 14-day free trial (or your salesperson can get you a custom one).

And there's more!

Answers to your most pressing OpenTelemetry questions!

Q. What are the differences between the Splunk distro and OSS OTel? What has Splunk added that's not in OSS OTel?

A. You can see more about our distribution at https://github.com/signalfx/splunk-otel-collector . Some of the more notable differences are packaging (e.g. the Windows MSI) and some of the host monitoring features previously provided by the SignalFx Smart Agent.

Q. Is there a way to centrally manage all OpenTelemetry agents?

A. This can be done with something like OpAMP, which is in active development. There's some good info on this in the OTel docs: https://opentelemetry.io/docs/collector/management/ . Of course, if you're using K8s, we provide a Helm chart that can help centralize deployment.

Q. I already have JSON-format data in Splunk Enterprise or Splunk Cloud. Can I use those ingested events to push the data to Observability Cloud using the OTel collector add-on or the Splunk HEC exporter?

A. I don't believe this is possible right now, but I'd recommend you describe your use case a little more to an engineer to see if we have another way to accomplish what you need. https://www.splunk.com/en_us/about-splunk/contact-us.html

Q. From a roadmap perspective, what sort of event message queues are you thinking of?

A. I don't know what's on the Splunk roadmap as far as additional MQ systems, but message queues are something that OpenTelemetry has significant semantic conventions for, so a lot of the tough work has been done. You can check out the OpenTelemetry registry to see if there's any work being done by others for the tech you use: https://opentelemetry.io/ecosystem/registry/?s=MQ&component=&language=

Q. Did I hear correctly that sending OTel telemetry back to Splunk Enterprise is possible but not currently supported?

A. Yes - using the splunk_hec exporter you can send data to Splunk Enterprise, but this is not currently supported by Splunk Support.

Q. With Splunk and AppDynamics both now Cisco products, would you recommend Splunk or AppDynamics for OpenTelemetry, or do they offer similar OTel integrations?

A. As the acquisition of Splunk by Cisco has not yet been fully approved and finalized, I can't go into detail on this. I will say that, in general, I still think OpenTelemetry is the right way to future-proof an Observability practice, no matter which backend you use.

Q. Why do we need Observability when we already have a monitoring solution?

A. Observability is really useful for finding out where things are going wrong, rather than just that "something is wrong". Tracing is basically mandatory in modern, complex, distributed applications. Monitoring tools can't give you enough information to troubleshoot accurately and rapidly.

Q. How flexible is the OTel Collector in routing to multiple Splunk indexes in Splunk Cloud?

A. This is possible (you can send to an arbitrary number of them if you really want to), but using the OpenTelemetry collector to send data to Splunk Cloud is not officially supported by Splunk at this time. You'll want to think about the performance impact on the collector, as well as the data volume you're emitting to Splunk Cloud.
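
A minimal sketch of the routing decision, assuming the splunk_hec exporter reads the target index from the com.splunk.index attribute (its default mapping, as we understand the exporter's documentation; check the otel_attrs_to_hec_metadata setting for your version). The routing rules, index names, and helper functions are hypothetical; inside the collector you'd normally set that attribute with a processor rather than in application code.

```python
# Hypothetical routing rules: (attribute key, expected value, target index); first match wins.
ROUTES = [
    ("service.name", "payments", "payments_idx"),
    ("deployment.environment", "dev", "dev_idx"),
]
DEFAULT_INDEX = "otel_default"  # hypothetical fallback index


def choose_index(resource_attributes):
    """Pick a Splunk index for telemetry based on its resource attributes."""
    for key, expected, index in ROUTES:
        if resource_attributes.get(key) == expected:
            return index
    return DEFAULT_INDEX


def tag_for_hec(resource_attributes):
    """Return a copy with com.splunk.index set, keeping any index already present."""
    tagged = dict(resource_attributes)
    tagged.setdefault("com.splunk.index", choose_index(resource_attributes))
    return tagged


print(tag_for_hec({"service.name": "payments", "deployment.environment": "prod"}))
```

Every extra index is just another rule here, but each rule adds per-record work, which is the performance cost the answer above warns about.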

Q. Can you provide the definition that Splunk is using for "Observability" please?

A. Observability is the ability to measure the internal states of a system by examining its outputs.

Q. I believe Splunk offers support when using OpenTelemetry on Kubernetes. Does it also offer support when using OpenTelemetry instead of the Universal Forwarder?

A. Splunk offers support for the Splunk Distribution of OpenTelemetry Collector.
