With the unfortunate demise of Bloom.fm, a product I designed and developed as CTO for 2.5 years at a company I co-founded 5.5 years ago, I think it’s a good time to write a series on my learnings, decisions and experiences.
My development team (excluding visual designers) ranged from 6 to 13 developers over the years. With a team this size we wrote a music streaming service backend, a music content ingestion system, 3 native mobile apps, a web app and an internal admin console, and we designed and built in-house infrastructure to host everything. We did not use cloud hosting because we needed to store over 700TB of content securely, without the massive ongoing costs that cloud storage would have incurred. Here’s a quick list of the tech we used at Bloom…
I spared no expense on our development tools. Each developer had a fast workstation with dual 27″ monitors. We had about 18 development servers, which included simulated production environments (dev & test) and 9 Windows and OS X build servers for continuous integration and testing.
- TeamCity for continuous integration
  - All builds and tests were run automatically and logged
  - Custom deployment tasks and services were written to allow one-push deployment to dev, test and production
- Confluence for documentation
- Mercurial for source control
  - Branching was used extensively
  - Commented-out code was not allowed to be checked into the repository
- Visual Studio 2012 for backend and web development
- Xcode for iOS development
- IntelliJ IDEA for Android development
- YouTrack for bug and feature tracking, project management
Backend web services
Every element of the backend was implemented with careful attention to architecture and performance. For example, our caching strategy was to cache structured data (rather than blindly caching output), which allowed us to combine cached static data (e.g. search results) with dynamic data (e.g. a user’s purchase state). We used Redis pub/sub to support distributed cache invalidation, which let us cache shared data locally in app server memory instead of always fetching it from Redis.
Because we started long before tech like SignalR was released, back in 2008 we developed our own scalable real-time notification service based on http.sys and Memcached (later Redis). This notification service allowed us to deliver hundreds of thousands of notifications without scalability issues: users could edit a playlist on one device and see it change immediately on their other devices.
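A minimal sketch of the caching idea, assuming a plain dict in place of Redis and a hypothetical in-process `InMemoryBus` in place of Redis pub/sub (our backend was C#; Python is used here only for brevity). Each app server keeps a local copy of shared data, and every write broadcasts the changed key so all servers evict their stale copies. `album_view` shows why caching structured data matters: shared album data is merged with per-user purchase state at request time.

```python
from typing import Callable, Dict, List, Optional, Set


class InMemoryBus:
    """Hypothetical stand-in for Redis pub/sub: every handler subscribed to
    a channel is called synchronously with each published message."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str], None]]] = {}

    def subscribe(self, channel: str, handler: Callable[[str], None]) -> None:
        self._subscribers.setdefault(channel, []).append(handler)

    def publish(self, channel: str, message: str) -> None:
        for handler in self._subscribers.get(channel, []):
            handler(message)


class TwoTierCache:
    """Local per-server cache in front of a shared store.

    `shared` plays the role of Redis; writes broadcast the changed key on
    the bus so every app server drops its stale local copy."""

    CHANNEL = "cache-invalidation"  # hypothetical channel name

    def __init__(self, shared: Dict[str, dict], bus: InMemoryBus) -> None:
        self._local: Dict[str, dict] = {}
        self._shared = shared
        self._bus = bus
        bus.subscribe(self.CHANNEL, self._evict)

    def get(self, key: str) -> Optional[dict]:
        if key in self._local:  # fast path: this server's memory
            return self._local[key]
        value = self._shared.get(key)  # slow path: the shared store
        if value is not None:
            self._local[key] = value
        return value

    def set(self, key: str, value: dict) -> None:
        self._shared[key] = value
        self._bus.publish(self.CHANNEL, key)  # invalidate on all servers

    def _evict(self, key: str) -> None:
        self._local.pop(key, None)


def album_view(cache: TwoTierCache, album_id: str, purchased: Set[str]) -> dict:
    """Merge cached shared data (album and tracks) with per-user state at
    request time, which a cached rendered response could not do."""
    album = cache.get("album:" + album_id) or {}
    tracks = [{**t, "purchased": t["id"] in purchased} for t in album.get("tracks", [])]
    return {**album, "tracks": tracks}
```

Two `TwoTierCache` instances sharing one dict and one bus behave like two app servers: after one calls `set`, the other's next `get` misses locally and picks up the new value. The sketch ignores the race between a write and the arrival of its invalidation message, which a real deployment has to tolerate.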
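The core of the playlist-sync behaviour is fan-out to a user's other devices. Here is a minimal single-process sketch of that idea; `NotificationHub` and its methods are hypothetical, and it deliberately does not model the http.sys long-polling or the Memcached/Redis-backed cross-server delivery the real service was built on.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class NotificationHub:
    """Hypothetical in-memory sketch: one pending-event queue per
    (user, device)."""

    def __init__(self) -> None:
        self._queues: Dict[str, Dict[str, Deque[dict]]] = defaultdict(dict)

    def connect(self, user: str, device: str) -> None:
        self._queues[user].setdefault(device, deque())

    def publish(self, user: str, origin_device: str, event: dict) -> int:
        """Queue `event` for every device of `user` except the one that
        made the change; returns how many devices were notified."""
        delivered = 0
        for device, pending in self._queues[user].items():
            if device != origin_device:
                pending.append(event)
                delivered += 1
        return delivered

    def poll(self, user: str, device: str) -> list:
        """Drain pending events, as a long-poll request would on wake-up."""
        pending = self._queues[user].get(device, deque())
        events = list(pending)
        pending.clear()
        return events
```

Skipping the originating device avoids echoing a change back to the client that made it; every other device picks the event up on its next poll.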
(team size: 7 people including myself)
- C# on .NET 4.5, WCF and IIS 7.5 for web services
- Postgres 9.4
- SQLite for fast unit tests against a simulated database
- Redis for structured caching (search results, user, artist, album data, personalised radios etc) and distributed state (session state, streaming tokens, etc)
- Solr for search (we originally used NLucene)
- RabbitMQ for queueing and request offloading
- NUnit for unit testing
- BookSleeve Redis client
- Shaolinq LINQ provider for Postgres and SQLite
- Platform.VirtualFileSystem for file system access
- StructureMap for dependency injection
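The SQLite-for-tests approach can be sketched as follows, assuming a hypothetical `PlaylistRepository` written in plain SQL (in our stack Shaolinq generated the SQL for both Postgres and SQLite; note that a Postgres driver such as psycopg2 uses `%s` placeholders rather than `?`). Each test gets a fresh in-memory database, so there is no server to run and nothing to clean up.

```python
import sqlite3
import unittest


class PlaylistRepository:
    """Hypothetical data-access class; production would point it at Postgres."""

    def __init__(self, conn) -> None:
        self.conn = conn

    def create_schema(self) -> None:
        self.conn.execute(
            "CREATE TABLE playlist_track ("
            " playlist_id INTEGER NOT NULL,"
            " position INTEGER NOT NULL,"
            " track_id INTEGER NOT NULL,"
            " PRIMARY KEY (playlist_id, position))"
        )

    def add_track(self, playlist_id: int, track_id: int) -> None:
        # Append at the next free position (0 for an empty playlist).
        row = self.conn.execute(
            "SELECT COALESCE(MAX(position), -1) + 1 FROM playlist_track"
            " WHERE playlist_id = ?", (playlist_id,)
        ).fetchone()
        self.conn.execute(
            "INSERT INTO playlist_track VALUES (?, ?, ?)",
            (playlist_id, row[0], track_id),
        )

    def tracks(self, playlist_id: int) -> list:
        rows = self.conn.execute(
            "SELECT track_id FROM playlist_track WHERE playlist_id = ?"
            " ORDER BY position", (playlist_id,)
        )
        return [r[0] for r in rows]


class PlaylistRepositoryTest(unittest.TestCase):
    def setUp(self) -> None:
        # A fresh in-memory database per test: fast and fully isolated.
        self.repo = PlaylistRepository(sqlite3.connect(":memory:"))
        self.repo.create_schema()

    def test_tracks_keep_insertion_order(self) -> None:
        for track in (30, 10, 20):
            self.repo.add_track(1, track)
        self.assertEqual(self.repo.tracks(1), [30, 10, 20])
```

The test class runs under the standard `unittest` runner. The trade-off is that SQLite only simulates Postgres, so behaviour that depends on Postgres-specific features still needs a test run against the real database.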
There were 3 VLANs on the network: Management, Internet and Backend. The backend network ran entirely on 10Gb Ethernet. All of Bloom’s infrastructure was managed in-house at co-location facilities in central London and the Docklands. Bloom ran on 60 servers (approximately 40 physical and 20 virtual). Apart from storage, all physical servers were Dell PowerEdge servers (R620s, R720s). Storage consisted of 14 clustered 4U SuperMicro servers running our own CentOS and Gluster setup.
The total cost of infrastructure was a little over £200K (capex). I worked my ass off to build the most efficient custom solution, and it was awesomely fast. This one-off cost was less than the ongoing monthly spend our CEO allocated to banner ads, at an insane cost of over £200 per user acquired. Cloud hosting is great for many startup business models where costs scale directly with user count (like Dropbox), but if we had hosted Bloom.fm on Amazon it would have cost well over £200K every month (not just a one-off capex!).
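The comparison above is simple break-even arithmetic: divide the one-off outlay by the monthly cloud bill. The sketch below uses the figures quoted in this post (rounding the "little over £200K" infrastructure capex to £200K) and deliberately ignores colocation, power, bandwidth, staff time and hardware refresh, which a full comparison would add on the self-hosted side.

```python
# Figures quoted in this post, in GBP: (one-off capex, cloud cost per month).
# The cloud figures are lower bounds ("well over", "over"), so the real
# break-even comes sooner still.
SCENARIOS = {
    "all infrastructure": (200_000, 200_000),
    "storage only": (140_000, 70_000),
}


def break_even_months(capex: float, cloud_monthly: float) -> float:
    """Months of cloud spend that add up to the one-off capital outlay."""
    return capex / cloud_monthly


for name, (capex, monthly) in SCENARIOS.items():
    print(f"{name}: {break_even_months(capex, monthly):.1f} months")
```

This prints 1.0 months for the whole infrastructure and 2.0 months for storage alone, after which every further month of cloud hosting would have been pure extra cost.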
(team size: 2 people including myself)
- Linux LVS for load balancing
- Zabbix for monitoring
- Puppet for automation and server management
  - EVERYTHING was scripted. New servers were set up automatically in just a few steps
- Graphite for API logging and monitoring
- NGINX for serving images and media content
- PHP & PHP-FPM to manage streaming authentication via Redis (time-limited tokens etc.)
  - Validated requests were redirected back to NGINX using X-Accel-Redirect
- GlusterFS for storage (two 800TB volumes storing over 400 million files each)
  - Each brick was used as an independent server and made failsafe using LVS, allowing mounts to keep working right up until the failure of the last brick in the cluster
  - Each brick was formatted with XFS using 512-byte inodes
  - Insane performance, which increased every time new storage was added
  - One-off cost was £140K. It would have cost over £70K a month to store it in the cloud (not even including data transfer costs!)
- XenServer as the hypervisor for our VMs
- FusionIO PCI flash storage for primary databases (excellent value)
- Force10 10Gb switches (excellent performance and value)
- Cisco ASR 1000 routers
- Pingdom for SMS monitoring
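The streaming authentication flow (time-limited token check, then hand the file back to NGINX) can be sketched as below. Assumptions, all hypothetical: `TokenStore` stands in for Redis keys set with an expiry (for example via `SETEX`), each token is bound to exactly one media path, and `/protected/` is an NGINX location marked `internal`. The original ran as PHP under PHP-FPM; Python is used here only for brevity.

```python
import secrets
from typing import Dict, Optional, Tuple

INTERNAL_PREFIX = "/protected/"  # hypothetical NGINX `internal` location


class TokenStore:
    """In-memory stand-in for Redis keys with a TTL."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}

    def issue(self, media_path: str, ttl_seconds: float, now: float) -> str:
        token = secrets.token_urlsafe(16)
        self._data[token] = (media_path, now + ttl_seconds)
        return token

    def lookup(self, token: str, now: float) -> Optional[str]:
        entry = self._data.get(token)
        if entry is None:
            return None
        media_path, expires_at = entry
        if now >= expires_at:  # expired: act as if Redis had dropped the key
            del self._data[token]
            return None
        return media_path


def authorize(store: TokenStore, token: str, requested: str,
              now: float) -> Tuple[int, Dict[str, str]]:
    """Validate a stream request; on success let NGINX serve the file."""
    media_path = store.lookup(token, now)
    if media_path is None or media_path != requested:
        return 403, {}
    # NGINX sees this header and serves the file from its internal
    # location, so the application never streams the bytes itself.
    return 200, {"X-Accel-Redirect": INTERNAL_PREFIX + media_path}
```

In a real handler `now` would come from `time.time()`; it is a parameter here so expiry is easy to exercise. Requiring an exact match between the requested path and the token's stored path also keeps a valid token from being reused for other files.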
All mobile apps were written natively for performance and size.
(team size: 4 people including myself)
- Objective-C for iOS
- Java for Android
- The UI used OpenGL for animation performance (rare for any app outside of games)
- C# for Windows Phone
We had two web apps: the Bloom.fm music player web app, and the internal admin system for managing and monitoring the service and its users.
(team size: 2 people)
- C# on .NET 4.5
Third Party Integration
Bloom integrated with many third party services including:
- Facebook (registration and sharing)
- Google+ (registration and sharing)
- Twitter (sharing)
- Discogs (metadata)
- Last.fm (metadata, scrobbling)
- Adswizz (audio advertising)
- MusicBrainz (metadata)
- Apple (payments)
- Google (payments)
- WorldPay & HSBC (payments)
We built a music streaming service with a comprehensive feature set that easily handled 100,000 concurrent users with almost no load and could have handled ten times that without any additional servers or optimisations. This was all done (including computers, servers and salaries) on a technical and graphic design budget of £3M over 2.5 years. Many companies easily spend upwards of £1M on storage alone (don’t use an off-the-shelf SAN for a music streaming service!). The key was to keep all design and development in-house (contractors are OK as long as they work in-house), to hire only the best of the best, and to be confident and capable enough to develop custom solutions when necessary. A tech startup creates tech, right?