AzureWatch's email provider (Amazon SES) has suspended our account without warning. As a result, AzureWatch is currently not sending any emails. Portal, monitoring, auto-scaling, and other automation services continue to run.
CloudMonix email services were also impacted but have since been restored, as less complexity was involved.
UPDATE (2015/09/20): Email services were restored approximately 24 hours after the initial outage.
Bugs Fixed:
- Moderate: Diagnostics extensions for Cloud Services deployed with SDK v2.5+ and with a period (".") in their Role names are now properly enabled for diagnostics
- Moderate: New static IP implemented for monitoring services. Learn more here.
Outages:
- Major: AzureWatch, alongside Azure, suffered a major outage lasting approximately 1.5 hours (from 6:48pm to 8:23pm Central time, 11/18/2014). The outage was caused by general network timeouts within Azure data centers and prevented AzureWatch from performing monitoring properly.
Issues Resolved:
- Moderate: Monitoring of certain customer accounts was occasionally delayed due to various stability issues.
- Moderate: Performance counters are no longer queried for Virtual Machines that are not enabled for monitoring.
Bugs Fixed:
- Moderate: When monitoring storage accounts across multiple subscriptions, AzureWatch would intermittently fail to monitor storage accounts from all subscriptions.
- Moderate: AzureWatch now waits no more than 10 seconds for virtual machines to respond to PowerShell commands that retrieve data.
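A timeout of this kind can be sketched as a simple wrapper around an external command. This is a minimal illustration, not AzureWatch's actual implementation; the command passed in stands in for the remote PowerShell data query.

```python
import subprocess

def run_with_timeout(cmd, timeout_sec=10):
    """Run an external command (e.g. a remote PowerShell data query) but
    give up after timeout_sec seconds, so one slow VM cannot stall the
    whole monitoring cycle."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout_sec)
        return result.stdout
    except subprocess.TimeoutExpired:
        return None  # treat the machine as unresponsive and move on
```

A caller would treat a `None` result as "no data this cycle" rather than blocking the remaining machines in the batch.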
Bugs Fixed:
- Moderate: AzureWatch will now send Outage Resolved (UP) alerts again.
New Features:
- No customer impact expected
- Major: Upgraded all internal frameworks and improved caching
- Minor: Enabled support for Service Bus Subscription monitoring
Bugs resolved:
- Major: All customers received incomplete daily charts. The issue has been resolved and corrected emails have been sent out.
Bugs resolved:
- Minor: A hotfix has been deployed that minimizes the monitoring outage timeouts experienced by certain customers
New Features:
- Moderate: Data storage has been restructured in support of the newly redesigned, soon-to-be-released dashboard
A planned upgrade to the AzureWatch main website, Management Portal, and Monitoring Service was rolled back after 10 minutes in production due to a discovered issue. After the upgrade, a number of customers began receiving false alerts as AzureWatch attempted to scale deployments that no longer existed but were still configured for scaling. Prior to the upgrade, AzureWatch simply ignored deployments that were no longer present, even if it was configured to monitor and scale them.
While the discovered issue had no bearing on production monitoring, the volume of false alerts sent to customer mailboxes every minute was deemed unacceptable, and the upgrade has been rolled back until a fix can be applied.
We apologize for the inconvenience.
We are seeing worldwide connectivity errors when connecting to Azure Storage. The problems appear to be caused by an expired Azure SSL certificate.
This Azure-related outage is impacting AzureWatch's ability to monitor customers' subscriptions.
All Azure-related services have been restored approximately 10 hours after the Azure Storage-related outage started. AzureWatch was sending alerts to customers who monitor their Storage Accounts and who have enabled the "Alert on Failure" option. During the outage, AzureWatch Management Portal was unavailable.
All AzureWatch related services are now running normally
New Features:
- Moderate: AzureWatch now analyzes the Active Message Count within Service Bus queues instead of the total message count
- Moderate: AzureWatch no longer imports all possible metrics for monitored endpoints, only those that are being aggregated. Metrics without associated aggregations can no longer be viewed in the Historical Reports section
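The active-count change matters because a queue's raw total also includes messages that are not deliverable right now, such as dead-lettered or scheduled ones. A small sketch of the distinction, with illustrative field names (not the actual Service Bus API):

```python
def active_message_count(counts):
    """Messages actually ready for delivery: exclude dead-lettered and
    scheduled messages from the raw total. Field names are illustrative."""
    return counts["total"] - counts["dead_lettered"] - counts["scheduled"]

# A queue with 120 messages total, of which only 100 form the real backlog
# an autoscaler or alert rule should react to.
backlog = active_message_count(
    {"total": 120, "dead_lettered": 15, "scheduled": 5})
```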
New Features:
- Moderate: We have changed the way we monitor Azure Storage accounts. Previously, during every monitoring cycle we executed a number of actions against Azure Storage accounts that were not very useful to monitor but added a degree of unpredictability to the monitoring results, causing us to send alerts to customers about conditions that rarely impacted their production environments. In particular, during every monitoring cycle we measured the time it took to count containers, queues, and tables, as well as to execute the "CreateIfNotExists" command against table, queue, and blob storage. These commands typically do not execute within predictable time periods, do not fall within the Azure Storage SLAs that deal with transaction counts per second, and are not generally useful to monitor. We have therefore removed these commands from the time measurements and now only monitor the time it takes to add and remove rows from table storage, add and remove messages from a queue, and add and delete a file from blob storage.
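The revised approach can be sketched as timing only cheap, predictable round-trip operations, rather than enumeration or "CreateIfNotExists" calls. This is an illustration under stated assumptions: the probe names and the in-memory operations below are stand-ins for real table, queue, and blob storage calls.

```python
import time

def timed(op):
    """Return how many seconds op() takes to complete."""
    start = time.perf_counter()
    op()
    return time.perf_counter() - start

def probe_storage(ops):
    """Time only the supplied round-trip operations, keyed by probe name.
    Slow or unpredictable operations are simply never included in ops."""
    return {name: timed(op) for name, op in ops.items()}

# Stand-in operations; in production each lambda would hit a real
# table/queue/blob endpoint (add a row, enqueue a message, upload a file,
# then undo it).
store = []
timings = probe_storage({
    "table_add_remove": lambda: (store.append("row"), store.pop()),
    "queue_add_remove": lambda: (store.append("msg"), store.pop()),
    "blob_add_delete":  lambda: (store.append("blob"), store.pop()),
})
```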
New Features:
- Major: Azure Compute Services (Worker and Web Roles) can now be auto-scaled or alerted based on the size of Service Bus Queues
- Moderate: SQL Azure and Federations monitoring now supports capture of a new measurement: Active Query Count. This allows users to monitor how many queries are actively executing against their SQL Azure databases.
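A queue-length scaling rule of the kind the first feature enables might look like the following sketch. The thresholds, step size, and instance bounds are purely illustrative; AzureWatch's actual rule engine is configurable and not shown here.

```python
def desired_instances(queue_length, current,
                      scale_up_at=500, scale_down_at=50,
                      min_instances=2, max_instances=10):
    """Step the instance count of a Worker/Web Role up or down based on
    the Service Bus queue backlog, clamped to [min, max]."""
    if queue_length > scale_up_at:
        return min(current + 1, max_instances)   # backlog growing: add one
    if queue_length < scale_down_at:
        return max(current - 1, min_instances)   # backlog drained: remove one
    return current                               # within band: no change
```

Stepping by one instance per evaluation cycle, rather than jumping straight to a computed target, is a common way to avoid oscillation while metrics settle.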
Bug Fixes:
- Moderate: Fixed an occasional failure when adding multiple new Rules
- Minor: Setup Wizard now accepts URL names on the first attempt
- Minor: Enhanced validation of Rules to prevent invalid formulas and overly long names and descriptions
Bug Fixes:
- Moderate: Corrected an error where certain customers would receive false outage notification emails
Bug Fixes:
- Moderate: Performance improvement: AzureWatch no longer polls the wad-control-container blob storage to detect changes to monitored deployments in order to force capture of the correct metrics; instead, it forces capture of all needed metrics on a schedule
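The shape of this change can be sketched as follows: instead of a change-detection step against a control blob, every needed metric is simply requested each cycle. The function and its parameters are hypothetical, not AzureWatch's real scheduler.

```python
import time

def capture_on_schedule(metrics, capture, interval_sec=60, cycles=1):
    """Force-capture every metric on each cycle, unconditionally, rather
    than first polling a control blob to detect what changed."""
    captured = []
    for cycle in range(cycles):
        captured.extend(capture(m) for m in metrics)
        if cycle < cycles - 1:
            time.sleep(interval_sec)  # wait out the rest of the cycle
    return captured
```

Dropping the detection step trades a little redundant work for one fewer storage round-trip per deployment per cycle and removes a source of stale-change bugs.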
Bug Fixes:
- Moderate: Some larger customers (those monitoring 50+ servers) occasionally experienced monitoring cycles longer than one minute. Performance improvements have been applied to the monitoring logic.