seflow s.n.c. - Power failure on Irideos Datacenter – Incident details

Resolved
Partial outage
Started over 3 years ago
Lasted 8 days

Affected

SeFlow Cloud

Partial outage from 3:09 AM to 9:00 AM

Updates
  • Resolved

    All components have been restored.

  • Monitoring

    All network equipment has been repaired. We are now re-integrating the third AntiDDoS cluster into the network and will then close the incident.

  • Update

    We have completed the setup of Minap IX and are working to complete the MIX link.

    After that, we will reactivate full AntiDDoS functionality (currently operating at 70%).

    Network restoration status: 80% done

  • Identified

    We have reactivated all PNI connections and the Cogent uplink, and have partially restored the MIX link.

    We expect to complete the configuration changes by Monday evening.

  • Monitoring

    We have started routing traffic through the new router. Latency should decrease and speed should increase for most locations.

    NTP service has been restored using the time.inrim.it server.

    IPv6 connectivity has been restored

  • Identified

    We have fixed all server and switch issues and are now working on the network side.

    The broken border router has been replaced; we are now cabling and reconfiguring the new one.

  • Update

    New Progress:

    • Cloud Infrastructure: 100% Restored

    • Tor Switch: 100% done

    • Border Router: 0% done - eta 3 days

  • Monitoring

    New Progress:

    • Cloud Infrastructure: 100% Restored

    • Tor Switch: 70% done - new eta: 3 hours

    • Border Router: 0% done - eta 3 days

  • Update

    New Progress:

    • Cloud Infrastructure: 100% Restored - Customers can now boot their VMs

    • Tor Switch: 4% done - eta: 12 Hours

    • Border Router: 0% done - eta 3 days

  • Identified

    Dear Customers, at around 21:40 Irideos suffered a failure of its electrical system that caused the sudden shutdown of the entire datacenter 2 (Milan DC2) at Caldera Campus. This caused outages in our network, and most of our services were impacted. Electricians restored power at about 22:30, and SeFlow staff were allowed to enter our rooms about one hour later.

    What did the inspection find?

    • Both Cloud core switches for the distributed storage showed errors at boot, which prevented VMs from starting
    • One of the Huawei NE8000 border routers is in an alarm state and no longer boots
    • 4 ToR switches either no longer boot or no longer power on

    What are the next steps? Our staff is working to restore service as quickly as possible. The expected time frame is:

    • Cloud Infrastructure: 75% Restored - eta: 8 hours

    • Tor Switch: 4% done - eta: 12 Hours

    • Border Router: 0% done - eta 3 days

    We have decided to give top priority to restoring the cloud and the ToR switches in order to bring all customers back online.