(armada) Chart Time Metrics
Change-Id: I121d8fcf050a83cbcf01a14c1543d11a0b04ea2a
parent 987eacad79
commit 2aafaa8048
@ -0,0 +1,156 @@
..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License.

  http://creativecommons.org/licenses/by/3.0/legalcode

=======================================
Time Performance Metrics for Each Chart
=======================================

Add time performance metrics for charts, including deployment time, upgrade
time, wait time, test time, and the time consumed for docs or resources, if
applicable.

Problem description
===================

Armada currently records no time metrics for chart deployments, upgrades,
tests, or other actions. As a result, the time needed to deploy an environment
is unknown, which can constrain the deployment or upgrade periods available
for charts. Logging time metrics that can be scraped by CRD and Prometheus
will make deployments and upgrades more predictable and will show when charts
are not behaving as intended.

Use Cases
---------

Knowing how long a chart takes to deploy or upgrade can streamline these
processes in future deployments or upgrades. It makes chart deployment and
upgrade times predictable and exposes inconsistencies within those
deployments and upgrades, helping to pinpoint which chart or charts are
causing errors.

Proposed change
===============

Add time metrics to the `ChartBuilder`, `ChartDeploy`, and `ChartDelete`
classes. The timer will use Python's built-in `time` library, and the
measured times will be written to the logs for use or analysis.

These metrics include the full deployment, upgrade, wait, install, and delete
time for charts through Armada. Each entry will be logged with a date and
timestamp, the chart name, and the action performed, such as the following::

    Ingress DEPLOYMENT start: 2019-06-25 12:34:56 UTC
    ...
    Ingress DEPLOYMENT complete: 2019-06-25 13:57:09 UTC
    Ingress DEPLOYMENT duration: 01:22:13

As shown, the logs will include the chart name, the action being performed,
the status of the action, and the datetime of the stage, followed by the
duration at the end. In case of an error, `complete` will be replaced with
`error`.

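The start, complete or error, and duration lines described above could be
emitted from a single place. The following is only a minimal sketch: the
context manager name `timed_action` is hypothetical and not part of Armada,
and the standard `logging` module stands in for Armada's `LOG` object.

```python
import contextlib
import logging
import time

LOG = logging.getLogger(__name__)

# Timestamp layout taken from the sample log lines in this spec.
TIME_FORMAT = '%Y-%m-%d %H:%M:%S UTC'


@contextlib.contextmanager
def timed_action(chart_name, action):
    """Hypothetical helper: log start, complete/error, and duration."""
    start = time.time()
    LOG.info('%s %s start: %s', chart_name, action,
             time.strftime(TIME_FORMAT, time.gmtime(start)))
    status = 'complete'
    try:
        yield
    except Exception:
        # The status word becomes `error`; the exception still propagates.
        status = 'error'
        raise
    finally:
        end = time.time()
        LOG.info('%s %s %s: %s', chart_name, action, status,
                 time.strftime(TIME_FORMAT, time.gmtime(end)))
        LOG.info('%s %s duration: %s', chart_name, action,
                 time.strftime('%H:%M:%S', time.gmtime(end - start)))
```

A caller would wrap the action, for example
`with timed_action('Ingress', 'DEPLOYMENT'): ...`, so that the three log lines
are written even when the wrapped code raises.
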
To log these metrics, the deployment files will need to be changed: lines will
be added to create the required timestamps and then log the start, completion
or error, and duration times for the chart's action.

Example:

chart_deploy.py::

    def execute(self, chart, cg_test_all_charts, prefix, known_releases):
        namespace = chart.get('namespace')
        release = chart.get('release')
        release_name = r.release_prefixer(prefix, release)
        LOG.info('Processing Chart, release=%s', release_name)
        # Start the timer for the deployment/update stage.
        start_time = time.time()
        ...
        LOG.info('Chart deployment/update completed in %s',
                 time.strftime('%H:%M:%S',
                               time.gmtime(time.time() - start_time)))

        # Restart the timer for the wait stage.
        start_time = time.time()
        # Wait
        timer = int(round(deadline - time.time()))
        chart_wait.wait(timer)

        LOG.info('Chart wait completed in %s',
                 time.strftime('%H:%M:%S',
                               time.gmtime(time.time() - start_time)))

        # Restart the timer for the test stage.
        start_time = time.time()
        # Test
        just_deployed = ('install' in result) or ('upgrade' in result)
        ...
        if run_test:
            self._test_chart(release_name, test_handler)

        LOG.info('Chart test completed in %s',
                 time.strftime('%H:%M:%S',
                               time.gmtime(time.time() - start_time)))
        ...

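One caveat with the `time.strftime('%H:%M:%S', time.gmtime(elapsed))` idiom
used above is that it wraps around after 24 hours, and `time.time()` can jump
if the system clock is adjusted during a long action. A hedged sketch of one
way around both, assuming a hypothetical helper named `format_elapsed` (not
part of Armada) and `time.monotonic()` for measuring the interval:

```python
import time


def format_elapsed(seconds):
    """Render elapsed seconds as HH:MM:SS without wrapping at 24 hours."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return '%02d:%02d:%02d' % (hours, minutes, secs)


# time.monotonic() is unaffected by system clock adjustments, so it suits
# measuring durations; time.time() is still needed for wall-clock timestamps.
start = time.monotonic()
# ... the chart action being timed would run here ...
elapsed = time.monotonic() - start
print(format_elapsed(elapsed))

# A 25 hour duration shows the difference between the two approaches.
print(format_elapsed(90000))                          # 25:00:00
print(time.strftime('%H:%M:%S', time.gmtime(90000)))  # 01:00:00
```

Whether durations over 24 hours matter in practice depends on the chart
timeouts in use; the sketch only illustrates the behaviour.
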
The logs will then contain the time metrics as follows::

    2019-07-01 00:00:00.000 0 INFO armada.handlers.chart_deploy [-] [chart=chart-name] Beginning chart deployment
    ...
    2019-07-01 00:00:00.000 0 INFO armada.handlers.chart_deploy [-] [chart=chart-name] SUCCESS chart deployment complete in 00:00:00.000

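Any tool that consumes these lines has to turn the trailing duration back into
a number. The following is a minimal sketch that assumes the exact layout of
the sample line above; the regular expression and the `parse_duration` name
are illustrative only and not part of Armada.

```python
import re

# Assumed pattern, derived only from the sample log line in this spec.
DURATION_RE = re.compile(
    r'\[chart=(?P<chart>[^\]]+)\] (?P<status>\w+) chart (?P<action>\w+) '
    r'complete in (?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2}(?:\.\d+)?)$')


def parse_duration(line):
    """Return (chart, action, status, seconds), or None if no match."""
    match = DURATION_RE.search(line)
    if match is None:
        return None
    seconds = (int(match.group('h')) * 3600 + int(match.group('m')) * 60 +
               float(match.group('s')))
    return (match.group('chart'), match.group('action'),
            match.group('status'), seconds)


line = ('2019-07-01 00:00:00.000 0 INFO armada.handlers.chart_deploy [-] '
        '[chart=ingress] SUCCESS chart deployment complete in 01:22:13.500')
print(parse_duration(line))  # ('ingress', 'deployment', 'SUCCESS', 4933.5)
```

Lines that do not end with a duration, such as the `Beginning chart
deployment` entry, simply yield `None`.
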
Prometheus can then scrape the metrics from the logs, provided the chart
enables scraping in the Prometheus section of its values.yaml::

    monitoring:
      prometheus:
        enabled: false
        node_exporter:
          scrape: true

Alternatives
------------

1. A simpler alternative is to log only timestamps for each action that
   occurs on a chart. While similar to the proposed change, it would not
   show an elapsed time, only the start and end points.

2. Another alternative is to use the `datetime` library instead of the `time`
   library. This provides very similar functionality for getting the elapsed
   time for chart deployment, update, wait, test, etc. It takes slightly more
   effort to convert the `timedelta` object, produced by subtracting two
   `datetime` objects, into a string format for the log.

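The extra conversion step mentioned in alternative 2 can be sketched as
follows. This is an illustration only: `format_timedelta` is a hypothetical
helper name, and the two timestamps are the ones from the sample log lines in
this spec.

```python
from datetime import datetime, timezone


def format_timedelta(delta):
    """Render a timedelta as HH:MM:SS, matching the proposed log format."""
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return '%02d:%02d:%02d' % (hours, minutes, seconds)


start = datetime(2019, 6, 25, 12, 34, 56, tzinfo=timezone.utc)
end = datetime(2019, 6, 25, 13, 57, 9, tzinfo=timezone.utc)
elapsed = end - start  # subtracting two datetimes yields a timedelta

print(str(elapsed))               # 1:22:13
print(format_timedelta(elapsed))  # 01:22:13
```

The default `str()` of a `timedelta` drops the leading zero on the hours, so
an explicit formatter is needed to match the zero-padded format shown earlier.
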
Security Impact
---------------
None

Notifications Impact
--------------------

An extra notification displaying the deployment or upgrade time.

Other End User Impact
---------------------
None

Performance Impact
------------------
None

Other Deployer Impact
---------------------
None

Implementation
==============

Assignee(s)
-----------

Work Items
----------

Dependencies
============
None

Documentation Impact
====================
None