(armada) Chart Time Metrics

Change-Id: I121d8fcf050a83cbcf01a14c1543d11a0b04ea2a
Samuel Pilla 2019-06-24 08:47:33 -05:00

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=======================================
Time Performance Metrics for Each Chart
=======================================
Allow time performance metrics for charts, including deployment time, upgrade
time, wait time, test time, and, where applicable, the time consumed by
documents or resources.

Problem description
===================
Armada currently records no time metrics for chart deployments, upgrades,
tests, or other actions. Without them there is no known baseline for how long
an environment takes to deploy, which makes it difficult to plan deployment or
upgrade windows for charts. Adding time metrics for the charts allows for
better predictability of deployments and upgrades and makes it easier to spot
when charts are not behaving as intended.

Use Cases
---------
Knowing how long a chart takes to deploy or upgrade helps streamline future
deployments and upgrades. It makes chart deployment and upgrade times
predictable and exposes inconsistencies between runs, often pinpointing which
chart(s) are causing errors.

Proposed change
===============
Add time metrics to the `ChartBuilder`, `ChartDeploy`, and `ChartDelete`
classes. Timing will use the built-in Python `time` library, and the results
will be written to the logs for later use or analysis.

These metrics include the full deployment, upgrade, wait, install, and delete
times for charts managed through Armada. They will be logged with a date and
timestamp, together with the chart name and the action performed, such as the
following::

    Ingress DEPLOYMENT start: 2019-06-25 12:34:56 UTC
    ...
    Ingress DEPLOYMENT complete: 2019-06-25 13:57:09 UTC
    Ingress DEPLOYMENT duration: 01:22:13

As shown, each log line records the chart, the action being performed, the
stage of that action, and the datetime of that stage, with the total duration
reported at the end. In case of an error, `complete` is replaced with `error`.

In order to log these metrics, changes to the deployment files are needed to
create the required timestamps and then log the start, completion (or error),
and duration times for the chart's action.

Example:

chart_deploy.py::

    def execute(self, chart, cg_test_all_charts, prefix, known_releases):
        namespace = chart.get('namespace')
        release = chart.get('release')
        release_name = r.release_prefixer(prefix, release)
        LOG.info('Processing Chart, release=%s', release_name)

        start_time = time.time()
        ...
        LOG.info('Chart deployment/update completed in %s',
                 time.strftime('%H:%M:%S',
                               time.gmtime(time.time() - start_time)))

        start_time = time.time()
        # Wait
        timer = int(round(deadline - time.time()))
        chart_wait.wait(timer)
        LOG.info('Chart wait completed in %s',
                 time.strftime('%H:%M:%S',
                               time.gmtime(time.time() - start_time)))

        start_time = time.time()
        # Test
        just_deployed = ('install' in result) or ('upgrade' in result)
        ...
        if run_test:
            self._test_chart(release_name, test_handler)
        LOG.info('Chart test completed in %s',
                 time.strftime('%H:%M:%S',
                               time.gmtime(time.time() - start_time)))
        ...
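
Since a failed action should be logged with an `error` stage instead of
`complete`, the same method would also need to record a timestamp when an
exception occurs. A minimal sketch of that error handling, reusing the `LOG`
and `start_time` variables above (the exception handling shown here is
illustrative only, not existing Armada code)::

    try:
        ...
    except Exception:
        LOG.info('Chart deployment/update error after %s',
                 time.strftime('%H:%M:%S',
                               time.gmtime(time.time() - start_time)))
        raise
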
Alternatives
------------
1. A simplistic alternative is to merely log timestamps for each action that
   occurs on a chart. While almost the same as the proposed change, it does
   not report an elapsed time, only start and end points.
2. Another alternative is to use the `datetime` library instead of the `time`
   library. It provides very similar functionality for measuring the elapsed
   time of chart deployment, update, wait, test, and so on, but it takes
   slightly more effort to convert the `timedelta` produced by subtracting two
   `datetime` objects into a string suitable for the log (see the first sketch
   after this list).
3. A third alternative is to use the Prometheus metrics available through
   OpenStack-Helm. The Prometheus configuration currently scrapes the cAdvisor
   endpoint to retrieve metrics. These metrics could be used to show the start
   time of chart deployments based on their containers: the
   `container_start_time_seconds` metric reports the epoch timestamp for the
   container a chart is running in, which can be converted to a normal
   timestamp. The scraped metrics can be retrieved with an HTTP request such
   as::

       curl http://127.0.0.1:9090/metrics

   Unfortunately, these metrics do not include anything that would easily show
   when a chart has finished. One possibility would be to take the next
   chart's `container_start_time_seconds` timestamp and compare it with the
   previous chart's, giving a rough estimate of a chart's deployment time.
   However, for upgrade, wait, and test times it may prove too complex to
   derive accurate data from the scraped Prometheus metrics, since they only
   report when containers started (see the second sketch after this list).
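
As a rough sketch of the `datetime` alternative above, the elapsed time can be
captured with two `datetime` objects and the resulting `timedelta` formatted
by hand for the log (the snippet is illustrative only, not existing Armada
code)::

    import logging
    from datetime import datetime, timezone

    LOG = logging.getLogger(__name__)

    start = datetime.now(timezone.utc)
    # ... chart deployment / upgrade / wait / test happens here ...
    elapsed = datetime.now(timezone.utc) - start  # a timedelta object

    # timedelta has no strftime(), so it is reduced to HH:MM:SS manually.
    hours, remainder = divmod(int(elapsed.total_seconds()), 3600)
    minutes, seconds = divmod(remainder, 60)
    LOG.info('Chart deployment completed in %02d:%02d:%02d',
             hours, minutes, seconds)

And a rough sketch of the Prometheus alternative: extracting
`container_start_time_seconds` samples from the text exposition format
returned by the endpoint in the `curl` example above. The parsing helper and
its name are assumptions for illustration only::

    import re
    import urllib.request
    from datetime import datetime, timezone

    METRICS_URL = 'http://127.0.0.1:9090/metrics'
    PATTERN = re.compile(
        r'^container_start_time_seconds\{(?P<labels>.*)\}\s+'
        r'(?P<value>[\d.eE+-]+)$')

    def container_start_times(url=METRICS_URL):
        """Return (labels, start_datetime) pairs for each container sample."""
        body = urllib.request.urlopen(url).read().decode()
        samples = []
        for line in body.splitlines():
            match = PATTERN.match(line)
            if match:
                epoch = float(match.group('value'))
                samples.append((match.group('labels'),
                                datetime.fromtimestamp(epoch,
                                                       tz=timezone.utc)))
        return samples

    # A rough per-chart deployment estimate: compare one container's start
    # time with the start time of the next chart's container.
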
Security Impact
---------------
None
Notifications Impact
--------------------
Extra notifications displaying deployment and upgrade times
Other End User Impact
---------------------
None
Performance Impact
------------------
None
Other Deployer Impact
---------------------
None
Implementation
==============
Assignee(s)
-----------
Work Items
----------
Dependencies
============
None
Documentation Impact
====================
None
References
==========
TODO