For real-time updates on all services, please see our status page or follow us on Twitter

(The most recent incidents are at the top)

Incident | Who Affected | Description | Application | Status
134 Core network failure | All off-campus users

On Saturday 16 May 2020 at 0730 hours a failure was detected on our core layer 3 network management, which resulted in disruption to services/systems for off-campus users

UPDATE 16 May at 1045 hours a temporary solution was implemented to stabilise off-campus access

UPDATE 16 May at 2200 hours the temporary solution failed and all off-campus access to services/systems was disrupted/lost

UPDATE 17 May at 0645 hours the failed management card was replaced and services and systems returned to normal. We will continue to monitor for the next 24 hours

email, authentication, WVD and O365 platforms | at-risk
133 Windows Virtual Desktop - Connection errors | remote VDI users

On Tues 17 March 2020 at 2000 hours our monitoring recorded failures in connecting to the VDI session hosts

UPDATE 18 March 2020 at 0130 hours Microsoft confirmed incident 9VQ0-D88, which is currently being mitigated

Preliminary Root Cause: Engineers determined that an unexpected increase in demand for the service caused the latency. Mitigation: Engineers took the impacted back end service out of rotation allowing a backlog of queries to be processed.

We are continuing to monitor

132 Disruption to all services | All users

On Fri 6 March 2020 at 0630 hours our monitoring recorded a cascade failure of all services/systems due to hardware failures

UPDATE 6 March 2020 at 0730 all off-campus services resumed

UPDATE 6 March 2020 at 0800 all on-campus services restored

All services | resolved
131 Wireless problems | All BYOD users

On Thurs 27 Feb 2020 at 0800 hours all campus BYOD wireless services went off-line due to a failure with the secure authentication process

UPDATE 27 Feb 2020 at 1050 hours all wireless services returned to normal service

BYOD wireless | resolved
130 O365 auth failure | All students

On Sat 8 Feb 2020 at 1310 hours monitoring indicated that the authentication service (ADFS) that handles authentication for various college resources had stopped processing valid login requests

Further investigations identified an on-going issue with O365 that was being mitigated by Microsoft

UPDATE 8 Feb 2020 at 1420 hours the system returned to normal operation after recovery of an O365 component

129 Loss of email service | All staff

On Sat 11 Jan 2020 at 0910 hours all staff email services became unavailable both on and off campus including both internal and external mail flow queues

This was caused by a catastrophic failure of hardware during a maintenance window, which resulted in all the mail servers going off-line

UPDATE 11 Jan 2020 at 1015 hours we recovered the failed hardware and restored all mail services including mail flow queues. During the outage external email was held at external mail flow queues and delivered to staff mailboxes once services came back online - though with a one hour delay

Staff email | resolved
128 Outlook client problems | Staff on campus

On Thurs 19 Dec 2019 at 1000 hours monitoring identified a problem in stability with Outlook clients on campus

This was traced to a cascade failure within our email services that was causing the instability only with on-campus Outlook clients - it does not affect any OWA services

UPDATE 27 Dec 2019 at 1400 hours further mitigation applied as the problem continues to affect random users

UPDATE 19 Feb 2020 at 0900 hours a temporary fix has been applied to fully mitigate this issue

Staff email connectivity | resolved
127 DNS resolver problems | All on campus users

On Thurs 12 Dec 2019 at 1120 hours our monitoring reported issues accessing external internet resources

This was traced to external issues with the JaNET DNS resolver service we use (TT: 20191212-182659)

UPDATE 12 Dec 2019 at 1200 hours we applied mitigation to our DNS services whilst the issue is investigated by JaNET

UPDATE 12 Dec 2019 at 1240 hours 'JANET TT: 20191212-182659 [Open] Issues affecting DNS resolver service' ticket opened

UPDATE 12 Dec 2019 at 1330 hours further issues observed after reports that issue is resolved with JaNET DNS, re-applied mitigation

UPDATE 12 Dec 2019 at 1700 hours no further outages observed so mitigation removed and we will continue to monitor

External websites | resolved
126 Wi-Fi login problems | BYOD users

On Mon 11 Nov 2019 at 0800 hours the college's BYOD wireless services (RBC-INTERNET and RBC-PUBLIC) became inaccessible due to authentication portal failures

This was traced to the web service certificate chain file becoming corrupted during a routine reboot

UPDATE 11 Nov 2019 at 0930 hours the corrupted file was not fixable locally so a support call was logged with the supplier

UPDATE 11 Nov 2019 at 1300 hours both BYOD services returned to normal operation

BYOD wireless SSIDs | resolved
125 OAUTH failure | wireless users

On Sat 28 Sept 2019 at 0850 hours our monitoring reported issues with processing OAuth requests on the wireless SSID RBC-PUBLIC. These requests were failing and preventing users from logging into this wireless network

UPDATE 30 Sept 2019 at 0930 hours : we have escalated the issue with the various OAuth providers

UPDATE 3 Oct 2019 at 0615 hours : service returned to normal

wireless RBC-PUBLIC | resolved
124 Online store inaccessible | all users on online store

On Fri 27 Sept 2019 at 0447 hours our monitoring systems reported multiple issues with accessing the WPM online store

We are continuing to monitor and investigate the cause

WPM online store | resolved
123 ShareFile service down | all users of this service

On Tues 25 Sept 2019 at 0815 hours our monitoring indicated problems accessing the ShareFile service. This was confirmed when users started to report problems accessing links

Citrix confirmed a global issue

UPDATE 25 Sept 2019 at 1015 hours : service returned to normal

ShareFile external sharing | resolved
122 Cloudflare issues | Various college cloud systems

On Tues 2 July 2019 at 1400 hours monitoring indicated that various college cloud systems started to report 502 bad gateway errors

This is connected to a global issue with Cloudflare

UPDATE 2 July 2019 at 1515 hours : update on mitigation by Cloudflare, we will continue to monitor as systems stabilise 

cloud services | resolved
121 O365 shared areas | All users of O365 shared areas

On Monday 3 June 2019 at 1300 hours monitoring identified problems accessing documents located in the college's Microsoft O365 SharePoint shared areas

This was further identified as a problem specifically affecting our tenancy when using any Microsoft application to open/save documents. Using the online version to access documents or direct downloading is not affected

We are currently investigating with Microsoft

UPDATE 4 June 2019 at 0830 hours : monitoring indicates that the O365 shared areas problem has been resolved, we will continue to monitor

O365 SharePoint | resolved
120 Loss of radio channels | Estates and IT

On Thursday 14 March 2019 at 0750 hours a radio repeater failed on Lamorbey campus, which took down the ability for any college radios to communicate using channels 3 and 4. Alternative procedures are in place using other channels whilst we await a replacement unit

UPDATE 15 March at 1530 hours : the repeater is back online and service to channels 3 and 4 is resumed

Radio communication | resolved
119 VLE errors | Users in USA

On Wed 13 March 2019 at 1630 hours our external monitoring started to report problems accessing the college's VLE website, specifically from within the USA. This was tracked down to a problem with Facebook/Flickr at this time, as there is external content from these services embedded in the college's VLE

We confirmed that whilst some errors were being displayed it only related to this missing content and did not affect the VLE functionality

We will continue to monitor

UPDATE 13 March at 1930 hours : the issue has been resolved

118 SAN controller failure | All users

On Tues 22 Jan 2019 at 1130 hours one of the virtual server cluster hosts went off-line. This was caused by a controller card failure in the attached storage. All services remained online but there is a loss of all resilience, so all services should be considered at risk

UPDATE 23 Jan at 0830 hours : the failed virtual server was brought back online and rejoined the cluster

virtual server host | resolved
117 Water problem in server room | All users

On Sat 19 Jan 2019 at 1030 hours during routine maintenance a water leak was discovered in server room L115. This was traced to a leaking AC unit

UPDATE 21 Jan at 0800 hours : Estates dried out floor and provided temporary containment

UPDATE 22 Jan at 1030 hours : engineer identified leak cause in AC drain pipe and implemented a permanent fix

server room L115 | resolved
116 Wireless controller failure | On-campus users of Wi-Fi

On Sat 12 Jan 2019 at 0830 hours during routine maintenance checks one of the resilient wireless controllers failed after a reboot. This caused a failure of the high availability nature of the service and so is to be considered at risk

UPDATE Mon 14 Jan at 0900 hours - efforts to recover the controller have failed so a support call has been logged to get it replaced

UPDATE 24 Jan at 0800 hours : Fortinet engineers are investigating the cause and have collected logs for further analysis

UPDATE 28 Jan at 0800 hours : issue identified as bug 0532038, still awaiting fix

UPDATE 6 Feb at 0800 hours : patch applied and HA cluster rebuilt and tested, normal service resumed

115 Email delivery problem | Users on campus

On Thurs 20 Dec 2018 at 0920 hours monitoring identified problems with users sending emails to external email addresses - they were being returned undeliverable

After investigation it was discovered that an on campus email server had corrupted the TLS certificates used to secure the email flow off campus to Microsoft

UPDATE Thurs 20 Dec 2018 at 1000 hours - the problematic email server was removed from the service cluster and normal email flow resumed

outgoing email | resolved
114 Google public DNS | off campus users

On Tues 18 Dec 2018 at 0710 hours our monitoring identified issues with DNS resolution for our domain with Google's two public DNS servers

This was confirmed as only affecting Google's public DNS servers and

We advise users to switch to another DNS server, such as OpenDNS, whilst we resolve the issue with Google

UPDATE Tues 18 Dec 2018 at 0830 hours - mitigating problem with Google but expect further disruption throughout the day if using Google's public DNS servers

UPDATE Tues 18 Dec 2018 at 1030 hours - issue resolved and no further disruption anticipated

public DNS for domain services | resolved
113 O365 SharePoint Online | All users

On Tuesday 6 November 2018 at 1100 hours our monitoring indicated problems with trying to access college resources in SharePoint sites both on and off campus

This was related to a Microsoft incident SP152986

UPDATE Tues 6 Nov 2018 at 1330 hours - mitigation being applied by MS

UPDATE Tues 6 Nov 2018 at 1430 hours - ongoing mitigation being applied by MS

UPDATE Tues 6 Nov 2018 at 1620 hours - issue reported as remediated by MS, we will continue to monitor

SharePoint Online sites | resolved
112 O365 SharePoint and OneDrive | All users

On Monday 22 October 2018 at 2010 hours our monitoring indicated problems with slow response when trying to access college resources in SharePoint sites and OneDrive both on and off campus

This was related to a Microsoft incident SP151830

UPDATE Tues 23 Oct 2018 at 0915 hours - The problem continues and now includes timeouts when trying to open and save documents within these areas

UPDATE Tues 23 Oct 2018 at 1315 hours - Ticket raised with MS #11844977 requesting further support

UPDATE Tues 23 Oct 2018 at 1515 hours - performance issues accessing sites returning to normal, we will continue to monitor

UPDATE Tues 23 Oct 2018 at 1730 hours - service returned to normal

SharePoint and OneDrive sites | resolved
111 Athens authentication | Users using Athens gateway

On Mon 1 Oct 2018 at 1450 hours our monitoring indicated problems with the Athens authentication gateway that protects online resources. This was confirmed as a major outage

UPDATE Mon 1 Oct 2018 at 1715 hours - Eduserv have applied a fix and systems are recovering, we will continue to monitor

UPDATE Tues 2 Oct 2018 at 0715 hours - service now stable

Athens authentication | resolved
110 Power failure | Users in Rose Theatre and Cafe

On Fri 7 Sept 2018 at 0917 hours monitoring reported loss of connectivity to all voice and data devices located within the Rose Theatre building. Further investigation identified a power failure to the data switch connecting to other buildings on campus

UPDATE Fri 7 Sept 2018 at 0945 hours - the power was restored to the data cabinet but further problems occurred when trying to power on the data switch - a replacement power supply was requested from support

UPDATE Fri 7 Sept 2018 at 1150 hours - the power supply was replaced and the switch powered on and repatched. All systems returned to normal operation

Voice, Wi-Fi, computers and CC terminals | resolved
109 O365 shared areas and email | users based in USA

On Tues 4 Sept 2018 at 1115 hours monitoring reported that access to O365 shared areas and email was failing for users in the USA

UPDATE 4 Sept 2018 at 1400 hours : Microsoft advised known incident MO147606 is the cause

UPDATE 4 Sept 2018 at 1530 hours : Microsoft advised further details of symptoms


UPDATE 4 Sept 2018 at 1630 hours : Monitoring reporting services returning to normal

O365 SharePoint sites and email | resolved
108 O365 shared areas | all users of shared areas

On Thurs 30 Aug 2018 at 0810 hours users started to report issues accessing O365 SharePoint sites (shared areas); sometimes they got in but mostly access just hung after authenticating. Confirmed not an authentication issue as no other services affected

UPDATE 30 Aug 2018 at 1000 hours : Microsoft advised known issue SP147225 is the cause

UPDATE 30 Aug 2018 at 1300 hours : Escalated with Microsoft #1276633

UPDATE 30 Aug 2018 at 1345 hours : Intermittent access restored but currently running very slow 

UPDATE 30 Aug 2018 at 1435 hours : MS confirm our tenancy is affected by service incident SP147225 with no ETA to fix yet

UPDATE 30 Aug 2018 at 1730 hours : Mitigation by MS being applied and stability and response times have improved, still awaiting further confirmation of status

UPDATE 31 Aug 2018 at 0630 hours : Ongoing recovery of service by MS, still at risk but monitoring indicates stable and responsive access over the past 12 hours - more details Known issues

UPDATE 31 Aug 2018 at 1630 hours : MS confirm service restored and issue closed - more details Known issues

O365 SharePoint sites | resolved
107 Server failure | all users

On Sat 18 Aug 2018 at 1140 hours one of the hypervisor nodes failed due to corruption during a routine update window. This resulted in all hypervisor hosts being served from the remaining hypervisor node

The impact is the loss of resilience and load balancing across multiple systems and services, which may result in certain services being slower to respond, and all systems are now considered 'at risk'

UPDATE Mon 20 Aug 2018 at 0630 hours: recovery of the failed hypervisor node OS was completed and fully tested so starting the rebuild of the cluster drives

UPDATE Mon 20 Aug 2018 at 1940 hours: rebuild of cluster drives complete and sync across nodes active. Normal HA cluster operations resumed

various systems | resolved
106 VLE down | all users accessing VLE

On Tues 7 August 2018 at 1320 hours monitoring showed loss of access from multiple locations. Cause currently being investigated

Update 7 Aug 2018 at 1340 hours VLE monitoring reporting back online

Update 8 Aug 2018 at 0700 no further issues logged 

105 Primary server room failure | all users

On Sun 27 May 2018 at 1730 hours the two AC units in the primary server room failed, resulting in an uncontrolled increase in room temperature. This reached critical at 1750 hours when a number of systems and services located in the server room failed

Update 27 May at 1815 hours: all remaining services and systems were shut down, the AC units were power cycled and the server room vented

Update 27 May 1900 hours: a temporary AC unit was installed to allow some systems to be restarted

Update 27 May 1935 hours: identified failed hardware and started backup restore of systems

Update 27 May 2005 hours: key authentication, email and DNS systems back online

Update 27 May 2110 hours: restore of primary key systems complete and services back online but with no redundancy

Update 28 May 0830 hours: confirmed temporary AC in server room holding and no further failures

Update 28 May 1600 hours: confirmed temporary AC in server room holding and no further failures

Update 29 May 0830 hours: key secondary systems brought back online as temp AC still holding

Update 30 May 0700 hours: still awaiting fix/replacement of broken AC unit, as a result all third level systems remain off-line which includes DA, wireless, SQL replication, DAG, Hyper-V replication, GFI, WUS, WDS, RDP and all resilient systems. No current ETA to fix

Update 31 May 0700 hours: awaiting installation of temporary hire AC units later today

Update 31 May 1130 hours: BYOD wireless service restored 

Update 31 May 1730 hours: hired AC unit installed, all third level systems/services now back online, risk level changed from critical/red to warning/yellow

Update 20 June 1130 hours: failed AC unit replaced and server room returned to normal operations

all systems | resolved
104 Global transit links across JaNET | all users

On Tues 8 May 2018 at 0818 hours the global transit providers out of the JaNET network went off line

This resulted in loss of internet access to certain parts of the world and also affected external users trying to access services on campus

UPDATE 8 May at 0935 hours; services returned to normal 

103 MyAthens login error | Users trying to access MyAthens

On 20 March 2018 at 0345 hours our monitoring reported problems accessing the myAthens home page after logging in to OpenAthens - it returns an HTTP 500 server error

This error was logged with Eduserv as it seems to only affect accessing the myAthens site, not authentication or resources

UPDATE 20 March at 1100 hours - site back online

myAthens site | resolved
102 Problems accessing college resources off campus | Users based in USA and Australia

On 20 March 2018 at 0738 hours our external monitoring reported problems with users accessing college web resources from locations in the USA and Australia

This has been logged with JaNET 

UPDATE : 20 March at 0900 hours - JaNET confirm routing problems #TT180695

UPDATE : 20 March at 0930 hours - routing issue resolved and services are returning to normal 

off campus resources | resolved
101 JaNET link issues | redundant systems

On 6 March 2018 at 0700 hours our external monitoring reported intermittent problems with our resilient JaNET link via PR

A ticket was raised with JaNET Operations #180634 

UPDATE : 6 March at 0900 hours - advised routing/resolver issue in core network which is spreading to other network services

UPDATE : 6 March at 0930 hours - affecting external peering on core network which is affecting on campus services

UPDATE : 6 March at 1100 hours - JaNET advised that the issue experienced this morning was due to a corrupted forwarding table in a router within Telehouse North.  All systems returned to normal

resilient link via PR and other services | resolved
100 Power loss | none

On 3 March 2018 at 0054 hours the campus UPS devices switched to battery operation until 0104 hours

No disruption to live services was recorded during this period, and further information as to the cause has been requested from Estates

power | resolved
99 VLE down | all users of the VLE

On 1 March 2018 at 1117 hours our monitoring reported that the VLE was not responding and further investigation confirmed the outage from multiple locations. Ticket raised with CoSector Digital (ULCC)

UPDATE : 1 March at 1150 hours - ULCC confirm power outage at data centre, no time to fix given

UPDATE : 1 March at 1210 hours - website now responsive to requests but returning an error


UPDATE : 1 March at 1230 hours - CoSector Digital (ULCC) confirmed power restored but now in recovery mode for the next few hours until normal services return 

UPDATE : 1 March at 1645 hours - confirmed back online 



98 AC failure | on campus services

On 22 Feb 2018 at 0040 hours the AC unit in server room two failed which has resulted in the shut down of systems running in this room - primary affected users have been notified and all campus services systems should be considered 'at-risk' until further notice

UPDATE : 22 Feb at 0700 hours - Estates dept. notified of failure

UPDATE : 27 Feb at 0700 hours - No change still awaiting fix for failures

UPDATE : 5 March at 0700 hours - No change, still awaiting fix for failure

UPDATE : 12 March at 0800 hours - engineer onsite investigating the failure

UPDATE : 19 March at 0700 hours - No change, still awaiting fix for failure

UPDATE : 26 March at 0700 hours - No change, still awaiting fix for failure

UPDATE : 2 April at 0700 hours - No change, still awaiting fix for failure

UPDATE : 9 April at 0700 hours - No change, still awaiting fix for failure

UPDATE : 16 April at 0700 hours - No change, still awaiting fix for failure

UPDATE : 23 April at 0700 hours - No change, still awaiting fix for failure

UPDATE : 30 April at 0700 hours - No change, still awaiting fix for failure

UPDATE : 3 May at 1130 hours - AC unit fixed in C120, waiting for temperature to stabilise over the next 24 hours 

UPDATE : 8 May at 0700 hours - AC unit fixed in C120, all services/systems back online

97 Compromised website | visitors of VLE

On 11 Jan 2018 at 1030 hours CSIRT notified us of a possible website compromise

[JANET_CSIRT #1624173] Possible webserver compromise

> Google detected 4 suspicious URLs (space inserted to prevent
> accidental clicking in case your email client auto-links URLs):
> http://vle.bruford (
> http://vle.bruford (
> https://vle.bruford (
> https://vle.bruford (

Update : 11 Jan at 1230 hours - confirmed scan status also at WebsecurityGuard

UPDATE : 11 Jan at 1530 hours - ULCC confirmed location of offending link and removed it

Waiting for Google to rescan the site to confirm status cleared 

UPDATE : 12 Jan at 0800 hours - Google safe search still not cleared 

UPDATE : 13 Jan at 0900 hours - Google safe search flag reset to safe