Phabricator Link | Wiki Link | Status | Priority | Author | Assignee | Projects | Subtasks | Parent Tasks |
T102575 | T102575: document graphite failover/backfill procedures | resolved | Medium (orange) | | | | | |
T106351 | T106351: RESTBase dashboard annotations for deployments (and more) | open | Medium (orange) | | | | | |
T110240 | T110240: [Discussion] Consider validating JSON schemas when running x-ample tests? | open | Medium (orange) | | | | | |
T123854 | T123854: Set up action API latency / error rate metrics & alerts | resolved | High (red) | | | | | |
T131253 | T131253: Report ok / broken metrics from service_checker | resolved | Medium (orange) | | | | | |
T141324 | T141324: Look into shoving gerrit logs into logstash | resolved | Medium (orange) | | | | | |
T141895 | T141895: Get phabricator error logs into logstash | resolved | Medium (orange) | | | | | |
T143733 | T143733: Send Jenkins daemon logs to logstash | open | Medium (orange) | | | | | |
T152427 | T152427: Create a check/calendar alert for MariaDB TLS certs | open | Medium (orange) | | | | | |
T152595 | T152595: Implement TLS expiration/validation checking for MariaDB certificates | duplicate | Needs Triage (violet) | | | | | |
T152782 | T152782: Kibana functionality missing after upgrade: histograms | invalid | Medium (orange) | | | | | |
T157702 | T157702: Followup for TLS MariaDB server roll-out | open | Medium (orange) | | | | | |
T172479 | T172479: Collect error logs from jobchron/jobrunner services in Logstash | declined | Low (yellow) | | | | | |
T176335 | T176335: logs sent to logstash are lost when the elasticsearch cirrus cluster is unavailable | open | Medium (orange) | | | | | |
T176430 | T176430: api feature logs should be sent to both eqiad and codfw clusters | resolved | High (red) | | | | | |
T177778 | T177778: Improve database application performance monitoring visibility | resolved | Low (yellow) | | | | | |
T178445 | T178445: flapping monitoring for recommendation_api on scb | resolved | Medium (orange) | | | | | |
T180051 | T180051: Reduce the number of fields declared in elasticsearch by logstash | open | Low (yellow) | | | | | |
T187147 | T187147: Port mediawiki/php/wmerrors to PHP7 and deploy | resolved | High (red) | | | | | |
T190455 | T190455: Logstash no longer captures DB queries in debug mode | resolved | Medium (orange) | | | | | |
T192948 | T192948: Upgrade prometheus-jmx-exporter on all services using it | open | Medium (orange) | | | | | |
T193766 | T193766: Ship host syslogs to ELK | open | Medium (orange) | | | | | |
T197173 | T197173: Ship MX logs to ELK | open | Medium (orange) | | | | | |
T199479 | T199479: Add alerts for Logstash rates in production | resolved | Medium (orange) | | | | | |
T203169 | T203169: Logstash hardware expansion | resolved | Medium (orange) | | | | | |
T205849 | T205849: Begin the implementation of Q1's Logging Infrastructure design (2018-19 Q2 Goal) | resolved | Medium (orange) | | | | | |
T205850 | T205850: Procure and provision Logging pipeline hardware in multiple datacenters | resolved | Medium (orange) | | | | | |
T205851 | T205851: Migrate >=90% of existing Logstash traffic to the logging pipeline | resolved | Medium (orange) | | | | | |
T205852 | T205852: Onboard at least 10 new non-sensitive log producers to the logging pipeline | resolved | Medium (orange) | | | | | |
T205855 | T205855: Investigate approaches to ingest sensitive log producers | resolved | Medium (orange) | | | | | |
T205873 | T205873: Investigate Kafka main cluster usage for logging pipeline | resolved | High (red) | | | | | |
T206454 | T206454: Setup Kafka cluster, producers and consumers for logging pipeline | resolved | Medium (orange) | | | | | |
T206633 | T206633: Setup rsyslog to be able to produce logs to Kafka | resolved | Medium (orange) | | | | | |
T209300 | T209300: Review and make librdkafka-0.11.6 installable from stretch-wikimedia | resolved | Medium (orange) | | | | | |
T209860 | T209860: Ship peopleweb apache2 error logs to ELK | resolved | Medium (orange) | | | | | |
T210455 | T210455: Ship prometheus logs to ELK | resolved | Medium (orange) | | | | | |
T210458 | T210458: Ship PuppetDB logs to ELK | resolved | Medium (orange) | | | | | |
T210846 | T210846: Ship Grafana server logs to ELK | resolved | Medium (orange) | | | | | |
T211124 | T211124: Move mediawiki to new logging infrastructure | resolved | Medium (orange) | | | | | |
T211125 | T211125: Move service-runner to new logging infrastructure | resolved | Medium (orange) | | | | | |
T211217 | T211217: Spin up 3 logstash/kibana frontend VMs in codfw | resolved | Medium (orange) | | | | | |
T211859 | T211859: cronspam from elasticsearch-curator on stretch | resolved | Medium (orange) | | | | | |
T213506 | T213506: Grafana alerting broken after upgrade to 5.0.0 | resolved | High (red) | | | | | |
T215611 | T215611: MediaWiki errors overloading logstash | resolved | High (red) | | | | | |
T217162 | T217162: The api-feature-usage log channel should use log context instead of parsing a string | resolved | Needs Triage (violet) | | | | | |
T219919 | T219919: Move citoid logging to new logging pipeline | resolved | High (red) | | | | | |
T219921 | T219921: Move cxserver logging to new logging pipeline | resolved | Medium (orange) | | | | | |
T219922 | T219922: Move eventstreams logging to new logging pipeline | declined | Medium (orange) | | | | | |
T219923 | T219923: Move graphoid logging to new logging pipeline | declined | Medium (orange) | | | | | |
T219924 | T219924: Move mobileapps logging to new logging pipeline | resolved | Medium (orange) | | | | | |
T219925 | T219925: Move proton logging to new logging pipeline | resolved | Medium (orange) | | | | | |
T219926 | T219926: Move recommendation-api logging to new logging pipeline | resolved | Needs Triage (violet) | | | | | |
T219927 | T219927: Move parsoid logging to new logging pipeline | resolved | Medium (orange) | | | | | |
T219928 | T219928: Move AQS logging to new logging pipeline | resolved | Medium (orange) | | | | | |
T220709 | T220709: Upgrade statsd_exporter to 0.9 | resolved | Medium (orange) | | | | | |
T222377 | T222377: Move kartotherian/tilerator logging to new logging pipeline | resolved | Medium (orange) | | | | | |
T223336 | T223336: [Regression] fatal-errors.php action=segfault results in a 503 error under php7-fpm. | declined | High (red) | | | | | |
T226986 | T226986: Client side error logging production launch | resolved | Needs Triage (violet) | | | | | |
T228838 | T228838: Consider enabling all MW log channels by default for WMF | open | Needs Triage (violet) | | | | | |
T233047 | T233047: Apache mod_status aggregator | open | Medium (orange) | | | | | |
T234283 | T234283: Messages in Logstash from php-fatal-error.php are missing from type:mediawiki/channel:fatal | resolved | High (red) | | | | | |
T235244 | T235244: Ensure operational visibility in ChronologyProtector | open | Needs Triage (violet) | | | | | |
T235490 | T235490: logstash_checker should be able to check for error for any php version | open | Needs Triage (violet) | | | | | |
T236832 | T236832: /etc/php/php7-fatal-error.php uses unsafe ob_start | resolved | Needs Triage (violet) | | | | | |
T238296 | T238296: job queue insert rate metrics gone from Grafana | resolved | Needs Triage (violet) | | | | | |
T239090 | T239090: Restbase logging indexing conflict on 'res' and 'body' logging fields | resolved | Medium (orange) | | | | | |
T239713 | T239713: Citoid is logging all request / response headers as separate fields | open | Medium (orange) | | | | | |
T240685 | T240685: MediaWiki Prometheus support | open | High (red) | | | | | |
T242751 | T242751: Update monolog/monolog to 2.1.1 or later | resolved | Medium (orange) | | | | | |
T246030 | T246030: Enable client side error logging in prod for small wiki | resolved | High (red) | | | | | |
T247675 | T247675: Stop overriding LogstashFormatter format | open | Low (yellow) | | | | | |
T247786 | T247786: MWExceptionHandler reqId sometimes differs from php-wmerrors reqId | resolved | Medium (orange) | | | | | |
T247820 | T247820: Decide on `service-runner` aggregated prometheus metrics and use of `service` label | resolved | Medium (orange) | | | | | |
T248884 | T248884: Documentation of client side error logging capabilities on mediawiki | open | Low (yellow) | | | | | |
T255585 | T255585: [EPIC] Extend client-side error logging coverage to include English Wikipedia | resolved | Medium (orange) | | | | | |
T258414 | T258414: Cassandra Grafana dashboards seem to disagree with actual utilization | resolved | Medium (orange) | | | | | |
T263728 | T263728: mtail 3.0.0-rc35 doesn't support the histogram type in -oneshot mode. | resolved | High (red) | | | | | |
T266515 | T266515: Set ENV SERVERGROUP for jobrunner MW web requests | resolved | Medium (orange) | | | | | |
T269676 | T269676: Mediawiki logging indexing conflict on 'status' for 'authevents' | resolved | Medium (orange) | | | | | |
T269680 | T269680: MediaWiki logging indexing conflict on 'session' for 'session-ip' channel | resolved | Medium (orange) | | | | | |
T270517 | T270517: Investigate opcache hit rate on Buster appserver | resolved | Medium (orange) | | | | | |
T271822 | T271822: Add support for scraping php applications to the kubernetes prometheus scraper | open | Medium (orange) | | | | | |
T278906 | T278906: Change cpjobqueue "processing" time metrics from pre-aggregated quantile to native Prometheus histogram bucket | open | Medium (orange) | | | | | |
T280805 | T280805: Error in apifeatureusage curator "forcemerge" step | resolved | Needs Triage (violet) | | | | | |
T281048 | T281048: mwlog1001 is running out of free space on /srv/mw-log | resolved | Medium (orange) | | | | | |
T293531 | T293531: Monitor/dashboard number of queries killed by the automatic query killer | duplicate | Medium (orange) | | | | | |
T63780 | T63780: Add swift logs to logstash | open | Medium (orange) | | | | | |
T63788 | T63788: Add kafka log files to logstash | open | Medium (orange) | | | | | |
T63789 | T63789: Add Zookeeper log files to logstash | open | Medium (orange) | | | | | |
T78514 | T78514: Detailed cassandra monitoring: metrics and dashboards done, need to set up alerts | resolved | Medium (orange) | | | | | |
T7 | T7: Get icinga alerts into logstash | resolved | Medium (orange) | | | | | |
T83729 | T83729: Fix monitoring of poolcounter service | open | Medium (orange) | | | | | |
T97024 | T97024: some cassandra metrics sent with invalid values | resolved | Medium (orange) | | | | | |