AzureWatch alerts and other email communication are down (RESOLVED)

20. September 2015 00:04 by author Igor Papirov

AzureWatch's email provider (Amazon SES) has without warning suspended our account.  At this time, AzureWatch is not sending any emails out.  Portal, monitoring, auto-scaling, and other automation services continue to run.

CloudMonix email services were also impacted but have already been restored, because their setup is less complex.

 

UPDATE: On 2015/09/20, email services were restored approximately 24 hours after the initial outage.



AzureWatch Monitoring - Static IP Update and Diagnostics Extension bug fixed

6. June 2015 00:00 by author Igor Papirov

Bugs Fixed

Moderate - Diagnostics Extensions for Cloud Services deployed with SDK v2.5+ and with a period "." in their Role names should now properly be enabled for Diagnostics

Moderate - New Static IP implemented for monitoring services.  Learn more here.



Azure and AzureWatch Outage

19. November 2014 15:18 by author Igor Papirov

Outages:

  • Major: AzureWatch, alongside Azure, suffered a major outage that lasted approximately 1.5 hours (from 6:48pm to 8:23pm Central time, 11/18/2014). The outage was related to general network timeouts within Azure data centers and prevented AzureWatch from properly performing monitoring.


AzureWatch Processing - Hotfix for Service Dashboard alerts deployed

20. June 2014 04:12 by author Igor Papirov

Hot Fixes:

  • Major: Due to RSS changes within Windows Azure Service Dashboard, AzureWatch monitoring functionality of the Azure Service Dashboard has been disabled until further notice


AzureWatch Monitoring: Email flooding with Service Dashboard Notifications

20. June 2014 03:43 by author AzureWatch Administration

Hot Fixes:

  • Major: Due to RSS changes within Windows Azure Service Dashboard, AzureWatch was flooding customers' mailboxes with service alerts regarding the dashboard. This issue is currently being hot-fixed


AzureWatch Monitoring: bugs fixed with certain instance-based rules

24. April 2014 00:00 by author AzureWatch Administration

Issues Resolved:

  • Minor: non-instance-specific metrics (such as queue counts, instance counts, etc.) can now be used in instance-specific Rules/alerts
  • Minor: InstanceName variable is properly working in instance-based rules
  • Minor: Rule's comments are now showing up in alert text


AzureWatch Management Portal: Minor Setup Wizard improvements

20. April 2014 01:00 by author AzureWatch Administration

New Features:

  • Minor: Setup Wizard now allows checking all items in its grids ON or OFF at once


Cross-reference metrics in rules from across different resources

7. April 2014 02:47 by author AzureWatch Administration

New Features:

  • Moderate: Rules engine now contains a set of functions that allows metric evaluation from different resources


Minor bug fixes with OFF alerts

3. April 2014 13:39 by author AzureWatch Administration

Issues Resolved:

  • Minor: OFF alerts were sometimes being sent out without corresponding ON alerts
  • Minor: Alerts now show all dates properly in the user's local time


AzureWatch Monitoring: Alerts on Event Logs for Cloud Services are now working again

24. March 2014 09:48 by author AzureWatch Administration

Issues Resolved:

  • Moderate: an issue that prevented alerts based on Event Logs from Cloud Services from being sent has been resolved


AzureWatch Monitoring: Reboot actions are now working again

22. March 2014 14:26 by author AzureWatch Administration

Issues Resolved:

  • Moderate: rule-based management actions that are not scaling commands (such as reboots, shutdowns/start-ups, etc.) are now working again
  • Minor: CountOf_ prefix now works for aggregate metrics with 0 captured data points


AzureWatch Monitoring and Portal: Major enhancements to Alerts and Rules Engine

16. March 2014 12:00 by author AzureWatch Administration

New Features:

  • Major: Rules that trigger alerts are no longer mixed together with rules that execute management actions
  • Major: Ability to execute rules after a sustained period of time
  • Moderate: Ability to send one alert when a rule evaluates to TRUE and only one subsequent alert when it evaluates to FALSE
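
The sustained-period and ON/OFF alert behaviour listed above can be illustrated with a small state machine. This is only a sketch under assumptions: the class name `RuleAlertState`, the `sustain_seconds` parameter, and the "ON"/"OFF" return values are hypothetical and are not taken from AzureWatch itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleAlertState:
    """Tracks one rule so it alerts once on TRUE and once on the later FALSE.

    All names here are illustrative assumptions, not AzureWatch's own API.
    """
    sustain_seconds: float = 0.0        # how long the rule must stay TRUE before alerting
    true_since: Optional[float] = None  # time the rule first evaluated to TRUE
    alert_on: bool = False              # True once an ON alert has been sent

    def update(self, rule_is_true: bool, now: float) -> Optional[str]:
        """Return "ON", "OFF", or None for one evaluation cycle."""
        if rule_is_true:
            if self.true_since is None:
                self.true_since = now
            sustained = now - self.true_since >= self.sustain_seconds
            if sustained and not self.alert_on:
                self.alert_on = True
                return "ON"
            return None
        # Rule is FALSE: reset the sustain timer; send OFF only if ON was sent.
        self.true_since = None
        if self.alert_on:
            self.alert_on = False
            return "OFF"
        return None


# One evaluation per minute; the rule must hold for 120 seconds before alerting.
state = RuleAlertState(sustain_seconds=120)
cycles = [(0, True), (60, True), (120, True), (180, True), (240, False), (300, False)]
results = [state.update(value, t) for t, value in cycles]
```

Because an OFF alert is only produced after an ON alert was sent, a rule that flips back to FALSE before the sustain period elapses stays silent.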


AzureWatch Monitoring: Stability issues resolved

20. February 2014 05:31 by author AzureWatch Administration

Issues Resolved:

  • Moderate: monitoring of certain customer accounts was occasionally delayed due to various stability issues.
  • Moderate: querying for performance counters of Virtual Machines not enabled for monitoring should no longer be happening


AzureWatch Monitoring: Intermittent errors when monitoring Storage Accounts & other

10. February 2014 15:58 by author AzureWatch Administration

Bugs Fixed:

  • Moderate: when monitoring storage accounts from multiple subscriptions, AzureWatch would intermittently not be able to monitor storage accounts from all subscriptions properly
  • Moderate: AzureWatch will now wait at most 10 seconds for virtual machines to respond to the PowerShell commands that retrieve data


AzureWatch: Receive notifications for official Windows Azure Service Dashboard

22. January 2014 23:25 by author AzureWatch Administration

New Features:



AzureWatch Monitoring: Outage Resolution (UP) alerts are not being sent

11. January 2014 01:39 by author Igor Papirov

Bugs Fixed:

  • Moderate: AzureWatch will now send Outage Resolved (UP) alerts again.


AzureWatch: Active monitoring of Windows Event Logs for Virtual Machines

9. January 2014 11:18 by author Igor Papirov

New Features:

  • Moderate: Ability to monitor Application, System and Security Windows Event logs (functionality in beta) for Virtual Machines

Bugs Fixed:

  • Minor: AzureStore subscriptions can now be deleted again


AzureWatch: Event Log Monitoring for Cloud Services

4. January 2014 02:41 by author Igor Papirov

New Features:

  • Moderate: Ability to monitor Application and System Windows Event logs for Cloud Services (functionality in beta)
  • Moderate: Ability to call certain string-based functions inside Rules, e.g. Contains, StartsWith, EndsWith
  • Moderate: Ability to interrogate Instance Name and Windows Event Log properties inside a Rule

Bugs Fixed:

  • Moderate: Users can now save User Profile screen again


AzureWatch Monitoring: maintenance release ahead of Service Bus Subscription monitoring

25. December 2013 04:02 by author Igor Papirov

New Features:

  • No customer impact expected
  • Major: Upgrades across all internal frameworks, improved caching
  • Minor: Enabling support for Service Bus Subscription monitoring


AzureWatch Daily Emails - Hotfix

13. November 2013 00:13 by author Igor Papirov

Bugs resolved:

  • Major: All customers received incomplete daily charts. The issue has been resolved and updated emails sent out.  


AzureWatch Monitoring: Hotfix to address intermittent gaps in monitoring

6. November 2013 15:09 by author Igor Papirov

Bugs resolved:

  • Minor: A hotfix has been deployed that minimizes monitoring outage timeouts as experienced by certain customers


AzureWatch Monitoring: Back end services restructured

28. October 2013 09:44 by author Igor Papirov

New Features:

  • Moderate: Data storage has been restructured in support of the newly redesigned, soon-to-be-released dashboard


AzureWatch: Resolving extraneous Scaling Failure alerts

1. October 2013 01:32 by author Igor Papirov

Bug Fixes:

  • Minor: During intermittent timeouts of Azure Service Management API, AzureWatch should no longer attempt to issue scaling events and send out failed "Instance Count Range Check" alerts


AzureWatch: Basic Support for Windows Azure China

27. August 2013 13:41 by author Igor Papirov

New Features:

  • Moderate: Initial support for Windows Azure China introduced.  Functionality is strictly alpha.  Known issues exist in retrieving queue-based metric data.


AzureWatch: Queue metrics have been restored

27. August 2013 09:59 by author Igor Papirov

Bug Fixes:

  • Moderate: AzureWatch was unable to pull in queue counts for a number of customers after a recent upgrade. This has been fixed


AzureWatch Monitoring: Count of {0} instances in Service Configuration does not match actual instance count deployed.

18. August 2013 15:18 by author Igor Papirov

Bug Fixes:

  • Moderate: Efforts have been made to stop or minimize the number of scaling error alerts sent from AzureWatch.


Throttling of outage alerts

16. August 2013 13:08 by author Igor Papirov

New Features:

  • Minor: Resource outage alerts will only be sent out on the 2nd consecutive failure.  Resource outage alerts will now display the duration of the outage.
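
A minimal sketch of this kind of throttling follows. It assumes a per-resource tracker that is fed one health-check result per monitoring cycle; `OutageTracker`, `failures_before_alert`, and the message wording are hypothetical names chosen for illustration, and the recovery message is an added assumption rather than part of this release note.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutageTracker:
    """Suppresses the outage alert until N consecutive failed checks (illustrative only)."""
    failures_before_alert: int = 2
    consecutive_failures: int = 0
    outage_started: Optional[float] = None
    alerted: bool = False

    def record(self, check_ok: bool, now: float) -> Optional[str]:
        """Feed one check result; return an alert message or None."""
        if not check_ok:
            self.consecutive_failures += 1
            if self.outage_started is None:
                self.outage_started = now
            if self.consecutive_failures >= self.failures_before_alert and not self.alerted:
                self.alerted = True
                return f"DOWN for {now - self.outage_started:.0f}s"
            return None
        # Check succeeded: report recovery only if an outage alert was sent.
        message = None
        if self.alerted:
            message = f"UP after {now - self.outage_started:.0f}s"
        self.consecutive_failures = 0
        self.outage_started = None
        self.alerted = False
        return message
```

With one check per minute, a single failed check followed by a success produces no messages at all, which is the point of waiting for the second consecutive failure.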


Virtual Machines performance counters bug fixed

16. August 2013 02:52 by author Igor Papirov

Bug Fixes:

  • Moderate: Retrieval of performance counters from Virtual Machines was not working in certain cases. This issue has been corrected.


AzureWatch Alerts: Smarter Outage Alerts

14. August 2013 13:08 by author Igor Papirov

New Features:

  • Moderate: Resource outage alerts no longer spam every minute while the resource is down.  Now, the alerts go out when the outage is detected and when it is resolved.  The text of the alerts has changed.

Bug Fixes:

  • Moderate: Username/Password now appear on Virtual Machine and Availability Set configuration screens


AzureWatch Management Portal: New Scheduling Features

12. August 2013 14:00 by author Igor Papirov

New Features:

  • Major: Ability to set different upper and lower instance limits by time of day
  • Moderate: Ability to import all configuration settings (rules, metrics, etc.) from one cloud Role into another
  • Moderate: Customers who have multiple AzureWatch accounts can now easily switch between them from the Management Portal
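
To illustrate the time-of-day instance limits mentioned above, here is a sketch that clamps a requested instance count to whichever window covers the current time. The `LimitWindow` type, the `clamp_instance_count` function, and the default limits are assumptions made for this example and do not reflect how AzureWatch stores its schedules.

```python
from dataclasses import dataclass
from datetime import time
from typing import Sequence


@dataclass(frozen=True)
class LimitWindow:
    """Hypothetical schedule entry: instance limits that apply from start up to end."""
    start: time
    end: time
    min_instances: int
    max_instances: int

    def covers(self, t: time) -> bool:
        if self.start <= self.end:
            return self.start <= t < self.end
        # Window wraps past midnight, e.g. 18:00 to 08:00.
        return t >= self.start or t < self.end


def clamp_instance_count(requested: int, now: time, windows: Sequence[LimitWindow],
                         default_min: int = 1, default_max: int = 10) -> int:
    """Clamp a requested instance count to the limits active at `now`."""
    for window in windows:
        if window.covers(now):
            return max(window.min_instances, min(requested, window.max_instances))
    return max(default_min, min(requested, default_max))


# Example schedule: higher limits during business hours, lower limits overnight.
schedule = [
    LimitWindow(time(8), time(18), min_instances=4, max_instances=20),
    LimitWindow(time(18), time(8), min_instances=1, max_instances=5),
]
```

With this schedule, a scale-down request to 2 instances at 09:30 is raised to the daytime minimum of 4, and a scale-up request to 12 instances at 23:00 is capped at the overnight maximum of 5.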

Bug Fixes:

  • Minor: "Show Available Variables" button was not working for non-configured units


Support for auto-scaling & monitoring of Azure VM's (IaaS)

27. July 2013 13:53 by author Igor Papirov

New Features:

  • Major: It is now possible to monitor, heal, manage and auto-scale Azure VM's and Availability Sets.  This functionality is soft-launched and is considered to be beta
  • Minor: UI has new icons
  • Minor: Setup wizard has new rules for memory monitoring and rebooting based on low-ram conditions
  • Minor: Historical reports no longer randomly error with an expired session state exception


AzureWatch: Support for Auto-Reboot/Reimage of instances and additional alert options

17. May 2013 12:44 by author Igor Papirov

New Features:

  • Major: In addition to scaling up or down, AzureWatch can now reboot or reimage instances via its advanced aggregation and rule-evaluation engine.  This is useful for resource-leaking servers.
  • Major: AzureWatch can now send alerts based on rules that execute on a single-server level (instead of aggregating all the data for a Role).
  • Moderate: AzureWatch now supports sending alerts to multiple mailboxes.  Furthermore, users can customize each rule and forward alerts from a particular rule to an additional set of mailboxes.

 



AzureWatch - Support for monitoring of *multiple* Azure subscriptions under one account!

12. May 2013 14:13 by author Igor Papirov

New Features:

  • Major: AzureWatch now supports monitoring multiple Azure subscriptions under one AzureWatch account.  Entering Subscription ID's into AzureWatch can be done either through Setup Wizard or Azure-Specific Settings screen.
  • Major: AzureWatch no longer requires that every monitored instance sends its diagnostic data to a single storage account.  Users can now use any number of storage accounts to host diagnostic data.  AzureWatch will automatically sense which storage accounts hold diagnostic data.

Important Notes:

  • Please make sure that AzureWatch Management Certificate (public key in .cer file) is uploaded to every monitored Subscription under Windows Azure Portal's Management Certificates section.
  • Unfortunately, we do not have an automated way to merge existing AzureWatch accounts together.  Customers who have multiple AzureWatch accounts and who wish to combine those accounts together will have to do so manually.


AzureWatch Monitoring: Production issue, False Azure Storage Outage alerts

11. May 2013 11:18 by author Igor Papirov

Production Issue:

  • Moderate: A production-level issue has caused one of our servers to begin sending false alerts regarding outages in Azure Storage.  The error has been corrected.  Duration of the issue: 33 minutes.  Start: 05/10/2013 8:39pm CST, End: 05/10/2013 9:11pm CST.  We apologize for the inconvenience.


AzureWatch - minor improvements and bug fixes

5. May 2013 05:33 by author Igor Papirov

New Features:

  • Minor: When monitoring resources such as SQL Azure, Storage, and URLs, AzureWatch will trigger intermittent timeout alerts much less frequently due to improved retry logic
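
The release note does not describe the retry logic itself. As a general illustration of the idea, the sketch below retries a health check on timeout with exponential backoff, so that only a timeout on every attempt surfaces as an error. The function name `check_with_retries` and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def check_with_retries(check: Callable[[], T], attempts: int = 3,
                       base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `check`, retrying on TimeoutError with exponential backoff.

    Generic sketch, not AzureWatch's implementation: the last TimeoutError
    is re-raised only after every attempt has timed out.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return check()
        except TimeoutError:
            if attempt == attempts - 1:
                raise
            # Wait 0.5s, 1s, 2s, ... between attempts (with the default base_delay).
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the backoff can be exercised without real waiting.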

Bug Fixes:

  • Minor: AzureWatch Management Portal will no longer prompt to save changes when no changes need to be saved


AzureWatch Monitoring - Scalability improvements

23. April 2013 06:40 by author Igor Papirov

New Features:

  • Moderate: In anticipation of a major influx of new users due to the Global Windows Azure Bootcamp promotion, caching and locking improvements have been made to the monitoring infrastructure

Bug Fixes:

  • N/A


AzureWatch Monitoring: Bug fixes

4. April 2013 21:39 by author Igor Papirov

New Features:

  • N/A

Bug Fixes:

  • Moderate: Approximate metrics per second for Azure Websites are now being properly calculated and extrapolated
  • Minor: Scaling alerts show current and requested instance counts again


AzureWatch - Soft launch of Azure Websites support

2. April 2013 22:36 by author Igor Papirov

New Features:

  • Major: Support for monitoring and auto-scaling of Azure Websites
  • Major: Refactoring of scaling and rule-evaluation engines in order to support Azure Websites
  • Major: Paraleap's main site (www.paraleap.com) redesigned
  • Minor: Rule notification emails contain more information about the rule that was triggered

Bug Fixes:

  • N/A

Known Bugs & Issues:

  • Moderate: Setup wizard does not correctly configure Azure Websites yet
  • Moderate: Estimates (Approximate metrics) for changes in performance data for Azure Websites are not yet accurately calculated
  • Minor: Scaling alerts no longer show instance counts that were executed
  • Minor: Clicking on an Azure Website tile on AzureWatch dashboard does not work

 



AzureWatch Monitoring - Planned upgrade rolled back

1. April 2013 18:23 by author Igor Papirov

A planned upgrade to AzureWatch Main website, Management Portal and Monitoring Service has been rolled back after 10 minutes of running in production due to a discovered issue.  After the upgrade, a number of customers began receiving false alerts as AzureWatch attempted to scale their no-longer existing (but configured for scaling) deployments.  Prior to the upgrade, AzureWatch used to simply ignore deployments that were no longer present, even if it was configured to monitor and scale them.

While the discovered issue had no bearing on production use and monitoring, the number of false alerts sent to customers' mailboxes every minute was deemed inappropriate, and the upgrade has been rolled back until a fix can be applied.

We apologize for the inconvenience.



AzureWatch Management Portal - Support for monitoring of SSL certificates

2. March 2013 13:22 by author Igor Papirov

New Features:

  • Moderate: AzureWatch can now track a new metric when monitoring SSL-based URL's: the number of days until the SSL certificate expires.  Once the metric is captured, it can be used in Rules to generate a notification alert
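
For readers curious how such a metric can be derived, here is a sketch using only the Python standard library. It is not AzureWatch's implementation; the function names `fetch_not_after` and `days_until_expiry` and the 30-day threshold in the usage note are assumptions for illustration.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` (timezone-aware) and a certificate's notAfter string."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the notAfter field of the certificate presented by host:port.

    Uses default certificate verification, so an already expired or otherwise
    invalid certificate raises ssl.SSLError instead of returning a value.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A rule built on this metric could then be as simple as `days_until_expiry(fetch_not_after("www.example.com"), datetime.now(timezone.utc)) < 30`, where the host name and threshold are placeholders.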

Bug Fixes:

  • N/A

 



AzureWatch Monitoring - Azure-related Outage

23. February 2013 06:09 by author Igor Papirov

We are seeing world-wide connectivity errors connecting to Azure Storage.  Problems appear to be caused by an expired Azure SSL certificate

This Azure-related outage is impacting AzureWatch's ability to monitor customers' subscriptions

 



AzureWatch Monitoring - Services restored

22. February 2013 23:58 by author Igor Papirov

All Azure-related services have been restored approximately 10 hours after the Azure Storage-related outage started.  AzureWatch was sending alerts to customers who monitor their Storage Accounts and who have enabled the "Alert on Failure" option.  During the outage, AzureWatch Management Portal was unavailable.

All AzureWatch related services are now running normally



AzureWatch Monitoring - Service Bus Queue counts & data import optimization

6. February 2013 14:13 by author Igor Papirov

New Features:

  • Moderate: AzureWatch now analyzes Active Message Count within the Service Bus queues instead of the total message count
  • Moderate: AzureWatch no longer imports all the possible metrics for monitored endpoints, but only those that are being aggregated.  Any metrics that do not have associated aggregations can no longer be viewed from the Historical Reports section

Bug Fixes:

  • None


AzureWatch Monitoring - Change to the way Azure Storage is monitored

24. December 2012 14:03 by author Igor Papirov

New Features:

  • Moderate: We have altered the way we monitor Azure Storage accounts.  Previously, during every monitoring cycle we executed a number of actions against Azure Storage accounts that were not very useful to monitor but added a degree of unpredictability to monitoring results.  This caused us to send out a number of alerts to our customers for conditions that rarely impacted their overall production environments.  In particular, during every monitoring cycle, we measured the time it took to count containers, queues and tables, as well as to execute the "CreateIfNotExists" command against table, queue and blob storage.  These commands typically do not execute within predictable time periods, do not fall within the Azure Storage SLAs that deal with transaction counts per second, and are not, in general, useful to monitor.  We have thus removed these commands from the time measurements and now only monitor the time it takes to add and remove rows from table storage, add and remove messages from the queue, and add and delete a file from blob storage.

Bug Fixes:

  • None


AzureWatch Management Portal - Support for Service Bus Queues

24. December 2012 13:53 by author Igor Papirov

New Features:

  • Major: Azure Compute Services (Worker and Web Roles) can now be auto-scaled or alerted based on the size of Service Bus Queues
  • Moderate: SQL Azure and Federations monitoring now supports capture of a new measurement: Active Query Count.  This allows users to monitor how many actively executing queries are running against their SQL Azure databases.

Bug Fixes:

  • Moderate: Fixed occasional failing of adding multiple new Rules
  • Minor: Setup Wizard now accepts the names of URL's on the first attempt
  • Minor: Enhanced validation of Rules to prevent invalid formulas, long names and descriptions


AzureWatch Monitoring - False outage alerts corrected

19. December 2012 22:49 by author Igor Papirov

Bug Fixes:

  • Moderate: Corrected an error where certain customers would receive false outage notification emails

 



AzureWatch Monitoring - Performance Improvements

20. November 2012 08:15 by author Igor Papirov

New Features:

  • None 

Bug Fixes:

  • Moderate: Performance improvement: no longer using wad-control-container blob storage to detect changes to monitored deployments in order to force capture of correct metrics - but instead forcing capture of all needed metrics on a schedule


AzureWatch Monitoring - Performance Improvements

15. November 2012 07:58 by author Igor Papirov

New Features:

  • None 

Bug Fixes:

  • Moderate: Some of the larger customers (those who are monitoring 50+ servers) were occasionally experiencing longer than 1-minute monitoring cycles.  Performance improvements have been applied to the monitoring logic