
Systems Performance Engineer

Wikimedia Foundation, Inc.

Remote | Posted: 4 years ago

This job is expired and may no longer be accepting applications.

Summary


The Wikimedia Foundation is looking for a Systems Performance Engineer to join its Performance Team. We are a globally distributed and diverse team of engineers, motivated to explore new ways to improve and monitor the performance and availability of Wikipedia and its sister projects.


We continuously measure performance on a fully Free and Open Source software stack, monitoring synthetic measurements (WebPageTest, WebPageReplay, Browsertime) and Real User Monitoring (direct collection, stored in Prometheus/Graphite). We also monitor the performance of our backend services (PHP, MariaDB, Varnish) and use an ELK stack for logging. This wealth of performance data is made available to the public through Grafana dashboards and open datasets. We are looking to complement the team’s broad expertise with a person who has in-depth knowledge of system-level performance (Linux kernel, containers).


We strive to be the performance standard bearer in the Foundation and the Wikimedia community. We aim to be visible in the performance community, to influence others, and to bring what we learn back to the team.


Wikipedia and its sister projects are themselves powered by Free and Open Source software with MediaWiki at their core, surrounded by an ecosystem of services in PHP, Node.js, and Python. The web traffic is served from geographically distributed caching clusters powered by Varnish and Apache Traffic Server.


If you find what we do interesting, and you are excited by improving the reliability and delivery of one of the Internet’s top 10 websites, you might be just the person we need. Come as you are!


You are responsible for:



  • Reviewing the architectural design of new services that need to operate at scale

  • Monitoring services in production, and finding opportunities for optimizing their performance and resource utilization

  • Investigating, diagnosing, and following up on incidents or outages in Wikimedia’s infrastructure

  • Troubleshooting and following up on emerging issues in our application stack

  • Interfacing between the Performance Team and the Site Reliability Engineering team (SRE)

  • Utilizing configuration management and deployment tooling (Puppet, Kubernetes)


Skills and Experience:



  • 2+ years of experience in a System Performance, SRE or DevOps position or equivalent

  • Experience in supporting complex web applications running on Linux

  • Experience working with Python, Go or PHP applications

  • B.S. or M.S. in Computer Science or equivalent related work experience

  • Comfortable with configuration management and orchestration tools (such as Puppet, Ansible, or Chef), and modern observability infrastructure (such as Prometheus or Logstash)

  • Comfortable with shell and scripting languages used in an SRE or DevOps context (such as Python, Bash, or Go)

  • Good understanding of Linux/Unix fundamentals and sysadmin debugging


Qualities that are important to us:



  • Creativity to improve our infrastructure

  • Ability to work as an effective part of a globally distributed team

  • Aptitude for automation and streamlining of recurring tasks

  • Sharing our Values and working in accordance with them


Additionally, we’d love it if you have:



  • A track record of open source contributions

  • Experience with low-level systems troubleshooting (CPU/memory profiling, C/C++ experience, in-depth Linux knowledge)

  • Familiarity with modern distributed container management systems (Kubernetes, Docker Swarm, Mesos, …)

  • Experience with advanced distributed storage and database systems (Swift, Ceph, Cassandra, etc.)

  • Remote work experience with a highly distributed team



U.S. Benefits & Perks*



  • Fully paid medical, dental and vision coverage for employees and their eligible families (yes, fully paid premiums!)

  • The Wellness Program provides reimbursement for mind, body and soul activities such as fitness memberships, babysitting, continuing education and much more

  • The 401(k) retirement plan offers matched contributions at 4% of annual salary

  • Flexible and generous time off - vacation, sick and volunteer days, plus 19 paid holidays - including the last week of the year.

  • Family friendly! 100% paid new parent leave for seven weeks plus an additional five weeks for pregnancy, flexible options to phase back in after leave, fully equipped lactation room.

  • For those emergency moments - long and short term disability, life insurance (2x salary) and an employee assistance program

  • Pre-tax savings plans for health care, child care, elder care, public transportation and parking expenses

  • Telecommuting and flexible work schedules available

  • Appropriate fuel for thinking and coding (aka, a pantry full of treats) and monthly massages to help staff relax

  • Great colleagues - diverse staff and contractors speaking dozens of languages from around the world, fantastic intellectual discourse, mission-driven and intensely passionate people


*Eligible international workers' benefits are specific to their location and dependent on their employer of record


More information


WMF
Blog
Wikimedia 2030
Wikimedia Medium Term Plan
Diversity and inclusion information for Wikimedia workers, by the numbers
Wikimania 2019
Annual Report - 2017 

This is Wikimedia Foundation 
Facts Matter
Our Projects
Fundraising Report

This job was sourced from StackOverflow Jobs.