Hello, following query is slow and processing a lot of data
environment=tesxt earliest=-0d@d (index=iis_openapi OR index=iis OR index=iis1 ) cs_method=POST | regex cs_uri_stem=(?i)"/account/v1/login/forgot-password" |eval Hour=strftime(_time,"%H")|search Hour>=5 AND Hour<9| bin _time span=60s | stats count as RPM by _time | eval TPS=RPM/60 | stats max(TPS) as MaxTPS
Are there way to optimize this query ?
Made some changes
environment=test earliest=@d+5h latest=@d+9h (index=iis_openapi OR index=iis_securehostopenapi OR index=iis_securepayment) cs_method=POST | regex cs_uri_stem=(?i)"/account/v1/login/forgot-password"| bin _time span=1m | stats count as RPM by _time | eval TPS=RPM/60 | stats max(TPS) as
MaxTPS
it is still taking 56 secs for this query to run , here is the results from inspect job
Duration(sec) Component Invocations Input count Output count
2.45 command.prestats 3,258 15,495 8,954
34.23 command.regex 3,258 32,047,620 15,495
3.73 command.remotetl 3,258 15,495 15,495
2,383.81 command.search 6,516 32,047,620 64,095,240
57.63 command.search.calcfields 9,202 32,196,524 32,196,524
2.74 command.search.expand_search 56 - -
77.43 command.search.fieldalias 9,202 32,196,524 32,196,524
26.30 command.search.filter 9,202 32,196,524 32,047,620
22.42 command.search.index 9,539 - -
312.24 command.search.kv 9,202 - -
246.08 command.search.lookups 9,202 32,196,524 32,196,524
155.64 command.search.rawdata 9,202 - -
1,358.04 command.search.typer 9,202 32,047,620 32,047,620
61.37 command.search.tags 9,202 32,047,620 32,047,620
7.24 command.search.track_sourcetypes 3,258 - -
2,424.81 dispatch.stream.remote 3,257 - 69,230,943
This is phase0 search query from inspect job
litsearch (cs_method=POST (__f!=v OR environment=prod) (index=iis_openapi OR index=iis_securehostopenapi OR index=iis_securepayment) (index!=dp_sec_log OR index!=log-vdi-prod OR index!=idm-win-other OR index!=www_app OR index!=www_app) _time>=1680523200.000 _time<1680537600.000) | litsearch (cs_method=POST environment=prod (index=iis_openapi OR index=iis_securehostopenapi OR index=iis_securepayment) (index!=dp_sec_log OR index!=log-vdi-prod OR index!=idm-win-other OR index!=www_app OR index!=www_app) _time>=1680523200.000 _time<1680537600.000) | regex cs_uri_stem=(?i)"/account/v1/login/forgot-password" | bin _time span=1m | addinfo type=count label=prereport_events track_fieldmeta_events=true | fields keepcolorder=t "*" "_bkt" "_cd" "_si" "_time" "host" "index" "linecount" "prestats_reserved_*" "psrsvd_*" "source" "sourcetype" "splunk_server" | remotetl nb=300 et=1680523200.000000 lt=1680537600.000000 max_count=1000 max_prefetch=100 | prestats count by _time
Also, are you using datamodels and if so, do you have the web datamodel in use and is it accelerated. If so you can use tstats, but if not, you can't
Have you tried putting the cs_uri_stem search criteria into the search statement rather than in the regex?
Also, can you show an example of what the _raw data looks like for one of those events - to see if you can make use of TERM() statements.
You can see that this
1,358.04 command.search.typer 9,202 32,047,620 32,047,620
appears to be taking a significant part of that time and there are 32 million events going into it. That is the time spent creating event types. If you have broad sharing of many eventtypes I suspect that may negatively affect performance.
Do you have 32 million forgot password requests? I suspect not, so you need to see how you can reduce the data coming off disk - TERM and the cs_uri_stem in the search may help.
Since you are only looking for events between 5am and 9am, try using
earliest=@d+5h latest=@d+9h
If your ingestion is auto extracting date_hour and other date_* fields than you can put the hour filter in the initial search part.
Is your cs_uri_stem search looking for that anywhere in the uri or an exact match - just wondering if that can be part of the search too.
Also, I forget it IIS logs have those fields as quoted or if they are in the logs as unquoted. If they are unquoted, then you may be able to use TERM, i.e.
TERM(environment=tesxt) TERM(cs_method=POST)
which would probably reduce the data pulled from disk.
Have you looked at the job inspector to see where the time is being spent and what the phase0 search is converted to?
@bowesmana @ITWhisperer @inventsekar This is where it it taking more time from inspect job
Duration (seconds) Component Invocations Input count Output count
2,133.38 command.search 6,598 32,047,620 64,095,240
52.30 command.search.calcfields 9,307 32,196,524 32,196,524
0.00 command.search.evalfilter 9,307 32,196,524 32,196,524
69.98 command.search.fieldalias 9,307 32,196,524 32,196,524
23.20 command.search.filter 9,307 32,196,524 32,047,620
213.23 command.search.lookups 9,307 32,196,524 32,196,524
1,219.75 command.search.typer 9,307 32,047,620 32,047,620
56.40 command.search.tags 9,307 32,047,620 32,047,620
6.95 command.search.track_sourcetypes 3,299 - -
0.01 dispatch.preview.write_results_to_disk 30 - -
7.64 dispatch.process_remote_timeline 1,860 19,383,546 -
3.95 dispatch.remote_timeline_fullevents 1,034 11,207,755 5,400
0.00 dispatch.stream.local 1 - -
2,171.22 dispatch.stream.remote 3,298 - 70,330,684
31.46 command.regex 3,299 32,047,620 15495
Following is phase0 search value from job inspector
litsearch (cs_method=POST (__f!=v OR environment=prod) (index=iis_openapi OR index=iis_securehostopenapi OR index=iis_securepayment) (index!=dp_sec_log OR index!=log-vdi-prod OR index!=idm-win-other OR index!=www_app OR index!=www_app) _time>=1680523200.000 _time<1680537600.000) | litsearch (cs_method=POST environment=prod (index=iis_openapi OR index=iis_securehostopenapi OR index=iis_securepayment) (index!=dp_sec_log OR index!=log-vdi-prod OR index!=idm-win-other OR index!=www_app OR index!=www_app) _time>=1680523200.000 _time<1680537600.000) | regex cs_uri_stem=(?i)"/account/v1/login/forgot-password" | eval Hour=strftime('_time',"%H") | bin _time span=1m | addinfo type=count label=prereport_events track_fieldmeta_events=true | fields keepcolorder=t "*" "_bkt" "_cd" "_si" "_time" "host" "index" "linecount" "prestats_reserved_*" "psrsvd_*" "source" "sourcetype" "splunk_server" | remotetl nb=300 et=1680523200.000000 lt=1680537600.000000 max_count=1000 max_prefetch=100 | prestats count by _time
this looks like a straight forward search query.. not much fine-tuning can be done.
earliest=-0d@d ---- you meant to run this search query from midnight today to the current time, right (for example.. if you run this query at 7am meaning.. u r searching for 0am to 7am...is that right?!?!)
as per my knowledge, the only fine tuning ...ie, the search optimization that can be done for this query is .. the summary indexing.
https://docs.splunk.com/Documentation/Splunk/9.0.4/Knowledge/Usesummaryindexing