(armada) Chart Time Metrics

Change-Id: I121d8fcf050a83cbcf01a14c1543d11a0b04ea2a
Samuel Pilla 2019-06-24 08:47:33 -05:00
parent 987eacad79
commit 2aafaa8048
1 changed files with 156 additions and 0 deletions

@ -0,0 +1,156 @@
..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License.
  http://creativecommons.org/licenses/by/3.0/legalcode
=======================================
Time Performance Metrics for Each Chart
=======================================
Record time performance metrics for charts, including deployment time, upgrade
time, wait time, test time, and, if applicable, the time consumed for docs or
resources.
Problem description
===================
There are currently no time metrics within Armada for chart deployments,
upgrades, tests, or other actions. As a result, the time needed to deploy an
environment is unknown, which can restrict deployment or upgrade periods for
charts. Logging time metrics that can be scraped by CRD and Prometheus will
make deployments and upgrades more predictable and will show when charts are
not behaving as intended.
Use Cases
---------
Knowing how long a chart takes to deploy or upgrade can streamline these
processes in future deployments or upgrades. It makes chart deployment and
upgrade times predictable and helps find inconsistencies within those
deployments and upgrades, likely pinpointing which chart or charts are causing
errors.
Proposed change
===============
Add time metrics to the `ChartBuilder`, `ChartDeploy`, and `ChartDelete`
classes. Timing will use Python's built-in `time` library, and the measured
times will then be written to the logs for use or analysis.
These metrics include the full deployment, upgrade, wait, install, and delete
time for charts through Armada. Each entry will be logged with the chart name,
the action performed, and a date and timestamp, such as the following::
    Ingress DEPLOYMENT start: 2019-06-25 12:34:56 UTC
    ...
    Ingress DEPLOYMENT complete: 2019-06-25 13:57:09 UTC
    Ingress DEPLOYMENT duration: 01:22:13
As shown, each log entry contains the chart name, the action being performed,
the status of the action, and the date and time of that stage, with the
duration logged at the end. In case of an error, `complete` will be replaced
with `error`.
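The start, complete-or-error, and duration lines described above could be produced by a small context manager. The following is a minimal sketch rather than Armada's implementation: `timed_action` and `format_duration` are hypothetical names, and the message layout simply mirrors the sample output.

```python
import logging
import time
from contextlib import contextmanager

LOG = logging.getLogger(__name__)

# Timestamp layout copied from the sample output; times are rendered in UTC.
TIME_FORMAT = '%Y-%m-%d %H:%M:%S UTC'


def format_duration(seconds):
    """Render elapsed seconds as HH:MM:SS (hours may exceed 23)."""
    hours, remainder = divmod(int(round(seconds)), 3600)
    minutes, secs = divmod(remainder, 60)
    return '%02d:%02d:%02d' % (hours, minutes, secs)


@contextmanager
def timed_action(chart_name, action):
    """Log start, complete-or-error, and duration for one chart action."""
    start = time.time()
    LOG.info('%s %s start: %s', chart_name, action,
             time.strftime(TIME_FORMAT, time.gmtime(start)))
    status = 'complete'
    try:
        yield
    except Exception:
        # The action failed: report 'error' instead of 'complete', then re-raise.
        status = 'error'
        raise
    finally:
        end = time.time()
        LOG.info('%s %s %s: %s', chart_name, action, status,
                 time.strftime(TIME_FORMAT, time.gmtime(end)))
        LOG.info('%s %s duration: %s', chart_name, action,
                 format_duration(end - start))
```

A caller would wrap the action, for example `with timed_action('Ingress', 'DEPLOYMENT'): ...`. Computing the duration with `divmod` avoids a pitfall of `time.strftime('%H:%M:%S', time.gmtime(elapsed))`, whose hour field wraps back to `00` once the elapsed time reaches 24 hours.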
Logging these metrics requires changes to the deployment files: lines must be
added to create the needed timestamps and then log the start, completion or
error, and duration times for the chart's action.
Example:
chart_deploy.py::

    def execute(self, chart, cg_test_all_charts, prefix, known_releases):
        namespace = chart.get('namespace')
        release = chart.get('release')
        release_name = r.release_prefixer(prefix, release)
        LOG.info('Processing Chart, release=%s', release_name)
        # Start timing the deployment/update stage (requires ``import time``).
        start_time = time.time()
        ...
        LOG.info('Chart deployment/update completed in %s',
                 time.strftime('%H:%M:%S',
                               time.gmtime(time.time() - start_time)))
        # Restart the timer for the wait stage.
        start_time = time.time()
        # Wait
        timer = int(round(deadline - time.time()))
        chart_wait.wait(timer)
        LOG.info('Chart wait completed in %s',
                 time.strftime('%H:%M:%S',
                               time.gmtime(time.time() - start_time)))
        # Restart the timer for the test stage.
        start_time = time.time()
        # Test
        just_deployed = ('install' in result) or ('upgrade' in result)
        ...
        if run_test:
            self._test_chart(release_name, test_handler)
            LOG.info('Chart test completed in %s',
                     time.strftime('%H:%M:%S',
                                   time.gmtime(time.time() - start_time)))
        ...
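The repeated `start_time = time.time()` resets in the example can be folded into a small helper. The sketch below is one possible shape, not part of Armada: `StageTimer` is a hypothetical name, and it assumes Python 3.3 or later for `time.monotonic`, which, unlike `time.time`, is not affected by system clock adjustments.

```python
import time


class StageTimer(object):
    """Measure consecutive stages with a monotonic clock (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the helper can be tested deterministically.
        self._clock = clock
        self._last = clock()

    def lap(self):
        """Return seconds since the previous lap (or creation) and restart."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        return elapsed
```

With this helper, `execute` would create one `StageTimer()` before the deployment step and call `lap()` after the deploy, wait, and test stages to obtain each stage's elapsed seconds for logging.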
The logs will then have the time metrics as follows::
    2019-07-01 00:00:00.000 0 INFO armada.handlers.chart_deploy [-] [chart=chart-name] Beginning chart deployment
    ...
    2019-07-01 00:00:00.000 0 INFO armada.handlers.chart_deploy [-] [chart=chart-name] SUCCESS chart deployment complete in 00:00:00.000
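Any consumer of these logs has to extract the chart name and duration from each completion line. As an illustration only, the regular expression below parses the sample format shown above; `parse_completion_line` is a hypothetical helper, and the pattern assumes the exact layout of the sample line.

```python
import re

# Pattern assumed from the sample completion line; adjust if the format differs.
COMPLETION_RE = re.compile(
    r'\[chart=(?P<chart>[^\]]+)\] '
    r'(?P<status>[A-Z]+) chart (?P<action>\w+) complete in '
    r'(?P<hours>\d+):(?P<minutes>\d{2}):(?P<seconds>\d{2}(?:\.\d+)?)')


def parse_completion_line(line):
    """Return chart, status, action, and duration in seconds, or None."""
    match = COMPLETION_RE.search(line)
    if match is None:
        return None
    seconds = (int(match.group('hours')) * 3600 +
               int(match.group('minutes')) * 60 +
               float(match.group('seconds')))
    return {
        'chart': match.group('chart'),
        'status': match.group('status'),
        'action': match.group('action'),
        'seconds': seconds,
    }
```

Lines that do not match, such as the "Beginning chart deployment" entry, yield `None` and can be skipped.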
Prometheus can then scrape the metrics from the logs as long as the chart has
enabled it in the Prometheus section of the chart's values.yaml::
    monitoring:
      prometheus:
        enabled: false
        node_exporter:
          scrape: true
Alternatives
------------
1. A simplistic alternative is to merely log timestamps for each action that
   occurs on a chart. While similar to the proposed change, it would not show
   an elapsed time, only start and end points.
2. Another alternative is to use the `datetime` library instead of the `time`
   library. This allows for very similar functionality in getting the elapsed
   time for chart deployment, update, wait, test, etc. It takes slightly more
   effort to convert the `timedelta` object, produced by subtracting two
   `datetime` objects, into a string format for the log.
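To make the second alternative concrete, the sketch below shows the extra formatting step that `datetime` requires. It is illustrative only: `format_timedelta` is a hypothetical helper, and the timestamps are taken from the sample log output earlier in this document.

```python
from datetime import datetime, timezone


def format_timedelta(delta):
    """Convert a timedelta to a zero-padded HH:MM:SS string."""
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return '%02d:%02d:%02d' % (hours, minutes, seconds)


start = datetime(2019, 6, 25, 12, 34, 56, tzinfo=timezone.utc)
end = datetime(2019, 6, 25, 13, 57, 9, tzinfo=timezone.utc)

# Subtracting two datetime objects yields a timedelta.
elapsed = end - start
print(format_timedelta(elapsed))  # 01:22:13
```

Calling `str()` on the `timedelta` directly gives `1:22:13`, without the zero-padded hour, which is why a small formatting helper is needed to match the proposed log format.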
Security Impact
---------------
None
Notifications Impact
--------------------
Extra notification displaying deployment or upgrade time
Other End User Impact
---------------------
None
Performance Impact
------------------
None
Other Deployer Impact
---------------------
None
Implementation
==============
Assignee(s)
-----------
Work Items
----------
Dependencies
============
None
Documentation Impact
====================
None