BT Tradespace team blog

Rants, musings, news and updates from the BT Tradespace team.


Don’t make a Pollocks of your data!

Posted by Alex Loveless

1 Mar 2009 in V2

I love data. I spend lots of time looking at it. I spend lots of time telling people that I’ve been looking at it, and what interesting things I’ve found. People tend to look at me sympathetically at that point and get back on with doing real work.

Something I’ve not spent much time doing of late is looking at the actual data. What I usually look at is prettified abstractions, aggregations and representations of the ‘data’ served out through random number generators like Omniture.

In the process of rebuilding that hallowed monolith to SME culture that is BT Tradespace, I’ve been having to think about real data again, or more specifically, how it’s structured in databases.

To most people, ‘data’ is the amorphous soup of information that underpins the interweb and barely manifests itself other than as the latest Stephen Fry tweet.

But the data out back comes in various forms and flavours, and how you store and access it is vitally important when constructing a web application.

I was intrigued when I read this article, both by its sensational headline claim and by its straightforward explanation of the emerging new world of online data. So much so that I sent the link to our Chief Technical Architect, Richard.

His response caused a small stumble to my generally rampaging ego. What the article suggests is that, for the sake of ease and extensibility, you could forsake the relational database altogether and store the data however it damn well suits you. BT Tradespace (both V1 and the nascent V2) use Endeca at their core which, as Richard irritatingly pointed out, forsakes the relational database for the sake of ease and extensibility! Now, Richard is not at all smarmy, but for the sake of dramatic effect you should imagine that he is.

The subsequent conversation down the pub developed this theme. Richard suggested that, for your website, you could store your data however you wish, across 20 databases if you felt so inclined. After all, our service-orientated application is quasi-anarchic in structure, so having a highly structured database supporting it seems a little strange anyway, right? This elicited a response from me along the lines of “get your dirty mangling hands off my data!” What actually happened is that I launched into a lengthy diatribe about future-proofing your data. Here is a shortened version:

You see, we do store our data across, like, 20 databases on Tradespace V1, and it causes us a right royal collective headache. When we built V1, no-one really thought that much about what we’d need to do with the data further down the line. So they just threw the data architecture together as suited the application developers (even if in many cases this defied common sense). The data is all over the place, and most of the logic for accessing it lives in the dark recesses of the application itself, so querying said data is extremely difficult and costly.

So when Richard suggested that we could (theoretically) do something similar, my data sensibilities went into spasm. I need to be able to use our data – not just to relay blog entries to the web page, but to understand all the wonderful things that happen on our website and work out how to do stuff better. If you just spread it around all unevenly like butter on a Tesco sandwich, making use of it may not just be hard, it may be impossible.

The point is this: if for the sake of convenience, performance, extensibility etc. you want to go all Jackson Pollock on your data, then fine, just make sure you’re also keeping it somewhere else for other, more structured uses.

This is, of course, all moot, and I was preaching to the converted with Richard. We’d already figured this one out for V2. We just ferry the data off to the data warehouse as it emerges, leaving the application/operational database to do as it will. For that database we need to think about scale and performance, flexibility and extensibility. If that means relinquishing structure, then so be it. I’ve got all my data in a very structured form elsewhere.
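To make the split concrete, here is a minimal Python sketch of writing each fact to both places. Everything in it is hypothetical: the table names, the `save_listing` helper and the listing fields are invented for illustration, and both "databases" live in one in-memory SQLite connection purely so the example runs on its own. In a real set-up the warehouse would be a separate system, usually fed asynchronously (via a queue or change capture) rather than inside the same transaction.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical example: one in-memory SQLite database stands in for both the
# operational store and the data warehouse so the sketch is self-contained.
conn = sqlite3.connect(":memory:")

# Operational store: schema-light, one JSON document per key, shaped however
# the application likes.
conn.execute("CREATE TABLE operational (key TEXT PRIMARY KEY, doc TEXT NOT NULL)")

# Warehouse table: fixed, structured columns that reporting queries can rely on.
conn.execute(
    "CREATE TABLE listing_events ("
    " event_time TEXT NOT NULL,"
    " seller_id TEXT NOT NULL,"
    " listing_id TEXT NOT NULL,"
    " price_pence INTEGER NOT NULL)"
)


def save_listing(seller_id: str, listing_id: str, price_pence: int, extras: dict) -> None:
    """Write the listing to the operational store and ferry a structured copy
    of the same fact to the warehouse table."""
    doc = {"seller": seller_id, "price": price_pence, **extras}
    with conn:  # commits both inserts together, or rolls both back on error
        conn.execute(
            "INSERT OR REPLACE INTO operational (key, doc) VALUES (?, ?)",
            (f"listing:{listing_id}", json.dumps(doc)),
        )
        conn.execute(
            "INSERT INTO listing_events VALUES (?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), seller_id, listing_id, price_pence),
        )


# The application is free to store differently shaped documents...
save_listing("seller-1", "listing-1", 1299, {"colour": "red"})
save_listing("seller-1", "listing-2", 500, {"tags": ["vintage", "oak"]})

# ...while the warehouse still answers structured questions directly.
totals = conn.execute(
    "SELECT seller_id, SUM(price_pence) FROM listing_events GROUP BY seller_id"
).fetchall()
print(totals)  # [('seller-1', 1799)]
```

The operational documents can change shape whenever the application fancies (note the differing `extras`), while the warehouse query keeps working because its columns never move.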

All this seems so obvious, yet we broke this rule on V1, and I’ve seen it broken (usually by overly creative application developers) in countless other places. This is horses for courses. It’s unlikely that you have only one use for your data, and although storing it in several places seems wasteful, it’s generally essential for maintaining its integrity across those multiple uses. Again, blindingly obvious, but here’s the rub: doing this retroactively is often spectacularly difficult and expensive.

Respect your data from day one, as I can guarantee you’ll need it later.


A Quantum of Deja-vu

Posted by Alex Loveless

11 Feb 2009 in V2

There are theories. Many theories. Some say we’re swimming through an infinitely large pepper pot of quantum indices. Perception of time is a function of our ability to collapse quantum probability waves, thus carving out a fleeting but tangible present. Others say that each quantum state is acted out, forming an endlessly expanding melee of alternate realities; that our consciousness exists from one minuscule moment to the next upon a node of a huge quantum net.
So you might say that if everything happened, then nothing also happened. It’s impossible to know by which quantum path you arrived at the present, as memory is a function of time and thus a slave to quantum uncertainty.
So with a toss of the quantum coin I will render all pasts uncertain and thus irrelevant. Either that or I’ll use quasi-scientific, poorly researched rambling to confuse you into forgetting that the past happened – that we’ve been here before…
Either way what’s done (or probably done) is done. I banish thee, ghost of V2s past and summon the spirit of V2s yet to come.
Some probability waves have already been collapsed:

  • V2 will be built using Java (J2EE)
  • V2 will have a public API
  • V2 will be virtual
  • V2 will be agile (yes, in the true sense of the term this time!)
  • V2 will be mystical and magical
  • V2 will succeed where V1 failed
  • V2 will rock

We have already designed the user experience; we just needed to identify a platform gnarly enough to contain the beast. None exists, so we’re building one from scratch.
When will the beast be unleashed? Simple answer: an incalculable number of quantum iterations away. For the present, this will have to satisfy you.
V2 is dead, long live V2!


Surviving the death by a thousand cuts

Posted by Alex Loveless

25 Sep 2008 in Infrastructure

Come friendly bombs and fall on Slough!
It isn’t fit for humans now,
There isn’t grass to graze a cow.
Swarm over, Death!

(Excerpt from Slough by John Betjeman)

I’m hoping that no one intends to take Mr. Betjeman’s words too seriously, as a small corner of Slough is now a little brighter thanks to the presence of our servers. Keen observers will have noticed that our site was down for 6 hours last weekend. We unplugged our servers and took them to a shiny new data centre in Slough where they will live from now on. Thankfully for them they’ll never have to brave the supersized Tesco or Lloyds Bar on the High Street. They’ll also never bump into David Brent.

That’s not the only time that the site has been down, or at least unresponsive, recently. As noted in my post on the main BT Tradespace blog some time ago, we’ve been experiencing some performance issues since our site got noticed by a few extra people. Anyone following the Twitter saga will know where this would lead if left untended.
I’d like to talk a little about what we’re doing about this and the journey we took to get where we are today. Beware folks, here there be dragons!

When we first started tackling this problem we figured out pretty quickly that simply throwing more hardware at the issue wasn’t working. The problem had to be buried much deeper in the (overly) complex BT Tradespace application.

Before we go any further it’s worth mentioning this: BT Tradespace, in its current incarnation, was built as a prototype. It was never designed to cope with the amount of traffic that it’s currently bearing. That’s not to say that we can’t or shouldn’t do anything about this; it’s merely to provide some context.

We embarked on an explorative mission to work out what was wrong with the poor, sick pony. First of all we ran a load test in an effort to really flush out the bottlenecks. The problem was, the bottlenecks were conspicuous by their absence. We completely failed to recreate anything resembling real conditions. This led us to the tentative conclusion that many of the problems (particularly the performance spikes) were triggered by a specific set of events. We were now in needle and haystack territory.

We then set about analysing the voluminous output from the load test to see if we could find the smoking gun. What we found was less a smoking gun and more a thousand little bloodied daggers.
The performance issues had arisen out of a raft of problems, from memory leaks, to suboptimal database queries, to sloppy algorithmic development. Each of these would only manifest intermittently, or would take time to build up. The net effect is the erratic behaviour we see on the site.

We also discovered that our application, and SharePoint (WSS) in particular, is very disk intensive, causing jam-ups in the system when the database was being thrashed.

The effect has been dubbed ‘death by a thousand cuts’. Not much fun, but surely better than being forced to listen to the full Cliff Richard back catalogue or having to watch every EastEnders episode back to back.

The path forward was immediately much clearer. The first thing to do was tackle the disk issue (and turn off the Cliff Richard CD). To do this we’ve implemented a SAN (Storage Area Network – basically super-fast, shared disk), which also conveniently addressed some data-level resilience issues we had.

Next we need to start fixing the code. We’ve embarked on a series of best-practice initiatives and code reviews that will deliver in a big-bang release with a step-change user experience update that will really propel BT Tradespace into the Social Commerce stratosphere.

We’ve still got a lot of work to do. These initiatives will hopefully get us to where we should have been already; the next steps are to further improve the application and infrastructure so that V1 will really fly. We also need to tidy up our processes so that we don’t land ourselves back in the same position in the future.

So it looks like we’ll survive the ‘death …’ with only a few cuts and grazes. The important thing now is to make sure we don’t get caught in the same trap again.


Scaling Without Scoobys Part 2: The Motion of the Ocean

Posted by Alex Loveless

20 Jul 2008 in Infrastructure

In my last post I raised my concerns about predicting load to enable scaling. On my odyssey to discover a foolproof way of predicting traffic growth I learned something: knowing exactly how much traffic your site can expect is largely useless. This is for the following reasons:

1) You can’t. It’s just not possible to know.

2) It’s not the size of the boat, it’s the motion of the ocean

What do I mean by this? Well, scaling is not really about the number of users that will be using your site at some point in the future; it’s about being able to respond to significant changes in traffic (and indeed user behaviour) in a timely fashion. Traffic doesn’t usually increase linearly. In fact, for a website growing its user base, it grows in waves, each larger in intensity than the last. You can take a punt at predicting these waves’ size and frequency (by understanding your business targets, marketing plan, seasonality etc.), but it’s more art than science (and when I refer to art I mean Jackson Pollock, not Da Vinci).

So how do you account for this erratic and unpredictable behaviour? There are 2 key factors in your understanding of how to scale:

1) Your current capacity

2) Your platform’s ability to scale (both hardware and software)

Understanding your current capacity is key to knowing when to scale, and it gives you the baseline for how much to scale by. If you have 4 servers that currently run at 50% capacity, then you can take double the traffic. If you wish to be able to cope with spikes in that traffic (big waves) you probably want to keep that 50% overhead, particularly if your traffic profile is prone to occasional (or frequent) big waves. If you expect to double your baseline traffic rate over the next 6 months, then you need to start growing that capacity (get a bigger boat) to nearer 8 servers over that period. It’s as simple as that (well, not always, but for the sake of argument). There are various ways of understanding your capacity; in most cases it’s not as easy as just looking at your server usage profiles. The most obvious is to run a load test (chuck increasing amounts of simulated traffic at the site until it falls over, at which point you know your capacity). Unfortunately, that’s the easy bit.
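The arithmetic in that example can be written down as a small Python sketch. It assumes, as the paragraph does for the sake of argument, that load spreads evenly across servers and grows linearly with traffic; the function names and the 50% target utilisation default are illustrative choices, not a prescribed method.

```python
import math


def headroom_multiple(utilisation: float) -> float:
    """How many times current traffic the servers could absorb before
    reaching 100% utilisation (assumes load scales linearly)."""
    return 1.0 / utilisation


def servers_needed(current_servers: int, utilisation: float,
                   traffic_growth: float, target_utilisation: float = 0.5) -> int:
    """Servers required so that, after traffic grows by `traffic_growth` times,
    each server still runs at or below `target_utilisation`.

    Total load is measured in 'whole servers' worth of work."""
    total_load = current_servers * utilisation * traffic_growth
    # round() guards against float noise such as 8.000000000001 bumping the ceiling
    return math.ceil(round(total_load / target_utilisation, 9))


# The post's example: 4 servers at 50% utilisation.
print(headroom_multiple(0.5))       # 2.0 -> can take double the traffic today
print(servers_needed(4, 0.5, 2.0))  # 8   -> baseline doubles, keep 50% headroom
```

Keeping `target_utilisation` at 0.5 is what preserves room for the big waves; raising it buys fewer servers at the cost of less spike tolerance. The load test is what tells you where 100% actually sits, so that the utilisation figure fed in here means something.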

All this is irrelevant if your platform won’t scale, or perhaps more commonly, won’t scale quickly. So if you’re out there in the choppy ocean with a little boat, you’re doomed as there’s no way you’ll get back to shore to get a bigger one, so that you can get back out there fishing (or being a pirate or whatever the hell you’re doing out there). Also, what if it’s the wrong kind of boat – what if you need a whaler rather than a trawler? Ideally, you start off with the appropriately sized boat of the correct ilk that’s suitable for the prevailing conditions; but before you can afford that boat you need to catch some more fish. So you need a boat that can get you to the nearest boatyard quickly for an upgrade each time the storm worsens.

I realise that this analogy sucks, but it’s the only way I could fit mild sexual innuendo into an article about technical platform scaling. So what point am I trying to make?

Your ability to cope with the motion of the ocean correlates directly to your ability to scale. If you can’t change, adapt and scale quickly then you’re doomed – dashed against the rocks. So it’s not the size of the boat but its agility that really counts. That way, when you hit stormy waters, you just chuck a bunch more hardware at the problem, PDQ. You obviously need an infrastructure that’s able to cope with the prevailing conditions, but this way you don’t have to worry too much about freak storms or rapidly building swell. Returning to the original problem, predicting traffic numbers becomes much less important (although it’s still vital to have a rough idea), as you just scale as conditions dictate.

So perhaps the phrase should read: it’s not the size of your boat, but its inherent agility and ability to cope with the motion of the ocean. Somehow it doesn’t have the same ring to it…

The BT Tradespace team currently live on a small boat constructed of driftwood and coconut husks. We’re in the middle of a big, moderately choppy ocean. She’s a bit of a hotchpotch, as we didn’t really know what type of boat we needed to build when we started. We don’t know how much the old girl can take, or how choppy the ocean will get. What we do know is that she’s neither very agile nor scalable. What we’re endeavouring to find out is how much more she can take (by running a series of load tests) and how we can fortify her against any potential storms (by reviewing the design and architecture to see where there’s opportunity to strengthen and improve). Although there’s life in the old girl yet, she’ll not last forever. What we’re also working on is her successor, which won’t be a boat at all, but a Star Wars-style battle cruiser that will fry that puny ocean with its laser cannons.

So batten down the hatches folks, it’s going to be a choppy ride, but by Jove we’re ready for it. Yar!


Scaling without scoobys

Posted by Alex Loveless

10 Jun 2008 in Infrastructure

Never let it be said that we don’t like a challenge at BT Tradespace. If we’re going to make our hallowed target of seller sites by the end of next March then we’re going to hit some pretty big technical challenges.

Not least of these is the task of planning the infrastructure to deal with this. The problem is that, although the target is perfectly adequate from a business/marketing perspective, it doesn’t tell us a lot about what to expect with regard to traffic volumes – the basic currency of web performance logistics. Understanding this is essential for working out how we need to scale our infrastructure.

The problem is that seller sites, once they’ve been set up, remain pretty inert until someone either makes some changes to them (which for many will be infrequent at best) or views them and potentially buys something. In the meantime, visitors are doing a bunch of other stuff – searching, browsing, socially commercing. This is a little hard to estimate for. What are the targets for these?

If your target is 1 million searches per day, then it’s a pretty easy job to plan for (if 1 server currently handles 100,000 daily searches, it’s a reasonable assumption that you’ll need 10 to cope with the million.) Now I’m quite aware that, in reality, it’s a little more complex than this, but just getting to this starting point would be good.
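That starting point is a one-line calculation. Here is a minimal Python sketch using the paragraph’s own figures; the function name is hypothetical, and it deliberately ignores the complications the paragraph acknowledges (peak hours versus daily averages, uneven load, headroom for spikes).

```python
def servers_for_target(target_daily: int, per_server_daily: int) -> int:
    """Naive linear estimate: target volume divided by what one server
    handles per day, rounded up to a whole server."""
    if per_server_daily <= 0:
        raise ValueError("per_server_daily must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-target_daily // per_server_daily)


print(servers_for_target(1_000_000, 100_000))  # 10
print(servers_for_target(1_050_000, 100_000))  # 11 (a partial server still needs a whole box)
```

Sizing against a daily total like this tends to understate what is needed, since servers have to cope with the busiest hour rather than the average one.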
We can perhaps take our ambient traffic now as a ratio of seller sites, then extrapolate accordingly. I tried this, and by way of an explanation of my assumptions when I distributed the numbers, I wrote:

The assumption is that the overall usage will grow roughly at the same rate as the seller site numbers. This is broadly true of visits but not page views to date. However, generally traffic is growing markedly faster than seller sites. My feeling is that this will change over time as they ramp up marketing activity to sign-up seller sites. I do not believe that this directly translates into traffic although it will have a positive effect. I think with increased seller site activity, particularly if they’re agent sign-ups will cause seller site growth to exceed traffic growth, thus the net effect over time can be assumed to be roughly even. More simply put, traffic and seller site numbers should still grow at roughly the same pace.

Make any sense to you? Me either. It has an air of an Apprentice victim trying to explain to Sir Alan why they couldn’t sell an egg sucking machine to his own Grandmother – confused and slightly nervous.

So if it sounds like I’m struggling to get my head around this, it’s because I am. Not a scooby. This seems like it should be obvious, yet I feel like I’m missing something.

Answer on a postcard please folks.

