Archive for the ‘technology’ Category

The Importance of Virtualization

May 28

I’m proud to share another in a series of guest posts written by Lijit employees. This week we present an installment from Mike, who seemed overly excited about writing and sharing this post.

Hi, I’m Mike Merideth, the Director of IT here at Lijit, and I’m going to talk a little bit about the nuts and bolts of how we do what we do. Over the past year I’ve had the opportunity to design and implement the production network and server infrastructure on which Lijit runs. It’s been a great year of challenges and breakthroughs, but if one architectural concept has gotten Lijit to where it is today, it is virtualization. We use Xen for our virtualization technology, which has the advantage of being free software (free both as in “free beer” and as in “free speech”). CentOS 5.1, a Linux distribution rebuilt from the market leader, Red Hat Enterprise Linux, includes Xen out of the box and has performed very well for us.

So why does Lijit use virtualization? There are a number of good reasons:

Flexibility: When you’re launching a new web product, it can be hard to predict which pieces of the application will need more resources than you originally gave them, and which will need less. We’re able to change the amount of memory, the number of CPUs and the amount of disk space a virtual server has quickly, easily and remotely.

Availability: Because we use an iSCSI SAN for almost all of our storage, we can move virtual servers between physical machines. So if we lose one of our physical servers, we can quickly bring up the virtual servers it hosted on another machine.

Resource utilization: CPUs today are incredibly fast and powerful; far more so than most applications need. Similarly, RAM has become cheap enough that a server with 16 or even 32 gigabytes of RAM is not particularly unusual, or particularly expensive. Running a simple web server on such a system would be a waste of CPU and memory, and therefore a waste of electricity. If you can run several virtual servers on such a system, however, you can get the maximum return on your investment by making sure you’re fully utilizing all of the CPUs and all of the RAM. Which is all tied to…

Cost savings: Colocation is expensive, and electricity certainly isn’t getting any cheaper. Using virtualization means we can get the absolute greatest value out of the rack space and electricity we’re paying for.
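The post doesn’t describe how we decide which physical server hosts which virtual server, but the utilization and availability arguments above can be sketched as a placement problem. The Python below is a minimal, hypothetical sketch: it packs guests onto hosts by RAM alone using the first-fit-decreasing heuristic, then re-places a failed host’s guests on the survivors, assuming (as with our iSCSI SAN) that guest disks live on shared storage any host can reach. Host names, sizes and the memory-only model are illustrative assumptions; a real scheduler would also weigh CPU, I/O and spare capacity.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    mem_gb: int                                  # RAM available for guests
    guests: dict = field(default_factory=dict)   # guest name -> RAM in GB

    def free_gb(self) -> int:
        return self.mem_gb - sum(self.guests.values())


def place(vms: dict, hosts: list) -> dict:
    """First-fit decreasing: largest guests first, each onto the first host
    with enough free RAM. Returns guest name -> host name."""
    placement = {}
    for name, need in sorted(vms.items(), key=lambda kv: kv[1], reverse=True):
        for host in hosts:
            if host.free_gb() >= need:
                host.guests[name] = need
                placement[name] = host.name
                break
        else:
            raise RuntimeError(f"no host has {need} GB free for {name}")
    return placement


def fail_over(failed: Host, survivors: list) -> dict:
    """Re-place a dead host's guests on the surviving hosts. Assumes guest
    disks are on shared storage, so any survivor can boot them."""
    orphans = dict(failed.guests)
    failed.guests.clear()
    return place(orphans, survivors)


# Hypothetical example: three hosts, six guests.
hosts = [Host("xen1", 32), Host("xen2", 32), Host("xen3", 16)]
vms = {"web1": 4, "web2": 4, "db1": 12, "search1": 8, "cache1": 6, "mail1": 2}
placement = place(vms, hosts)
moved = fail_over(hosts[0], hosts[1:])
print(placement)
print(moved)
```

First-fit decreasing is a simple heuristic rather than an optimal packer; it is used here only because it is short and easy to follow.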

As of right now, we’re running about 200 virtual servers on about 25 physical servers. Just a few years ago we would have needed scores of physical servers consuming thousands and thousands of watts of power to do the work we’re able to do in this relatively modest environment. For a startup, that would mean a higher burn rate with a shorter runway, and greater stock dilution for the founding stakeholders because of the amount of capital needed to get the work done. If you’re trying to get a tech startup off the ground, you owe it to yourself to see whether you can leverage virtualization in your IT architecture. You’d really be crazy not to.

If you managed to read this post without your eyes glazing over, you may be interested in my new Linux infrastructure blog at http://linfrastructure.blogspot.com. I’m keeping notes on my experiences there, in the hopes that what I’ve learned over the past year can benefit others who find themselves in the same boat.

Photo credit: Leonard John Matthews

Third Party Cookies, Evil or Tasty?

May 1

Recently, one of our publishers reached out to us to get our take on third-party cookies. They were considering removing our widget because we set a cookie when a browser views it. This raises the broader debate over the privacy and security of third-party cookies, and of cookies in general. We responded to the post in the comments, but I wanted to elaborate on that a bit more.

Why Lijit uses cookies:

We use cookies not only to track whether someone is a Lijit user (allowing them to log in, etc.), but also to match a blog/widget visitor to any searches they perform through our widget. This helps us provide valuable metrics to our publishers in the form of stats, which in turn helps publishers give their readers better content.
Generally, cookies allow us to gather better data about our users. Because of the ever-rising rate of “cookie blocking”, whether by browsers, firewalls, security software, or users themselves, we have had to find other ways to keep gathering statistics. This means we have to use traffic pattern matching and log analysis to get all of the data we need.
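As a rough illustration of the two approaches described above, the Python sketch below assigns an anonymous visitor-ID cookie when none is present, and attributes each widget search to the most recent earlier widget view: by matching visitor IDs when both events carry one, or, when cookies were blocked, by falling back to the same IP address and User-Agent within a short window. The cookie name, event fields and 30-minute window are hypothetical assumptions, not a description of Lijit’s actual system.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"        # hypothetical cookie name
FALLBACK_WINDOW = 30 * 60         # seconds; assumed window for the no-cookie fallback


def visitor_id(cookie_header: str):
    """Return (visitor id, Set-Cookie value or None) for an incoming Cookie header."""
    jar = SimpleCookie(cookie_header)
    if COOKIE_NAME in jar:
        return jar[COOKIE_NAME].value, None
    new_id = uuid.uuid4().hex
    out = SimpleCookie()
    out[COOKIE_NAME] = new_id
    out[COOKIE_NAME]["path"] = "/"
    out[COOKIE_NAME]["max-age"] = 365 * 24 * 3600   # one year
    return new_id, out[COOKIE_NAME].OutputString()


def same_visitor(view: dict, search: dict) -> bool:
    """Match by cookie when both events have one; otherwise fall back to IP + User-Agent."""
    if view.get("visitor_id") and search.get("visitor_id"):
        return view["visitor_id"] == search["visitor_id"]
    return ((view["ip"], view["ua"]) == (search["ip"], search["ua"])
            and search["ts"] - view["ts"] <= FALLBACK_WINDOW)


def attribute_searches(views: list, searches: list) -> list:
    """Pair each search with the latest earlier view from the same visitor (or None)."""
    pairs = []
    for s in searches:
        earlier = [v for v in views if v["ts"] <= s["ts"] and same_visitor(v, s)]
        pairs.append((s, max(earlier, key=lambda v: v["ts"]) if earlier else None))
    return pairs


views = [
    {"ts": 100, "ip": "10.0.0.1", "ua": "Firefox", "visitor_id": "abc"},
    {"ts": 200, "ip": "10.0.0.2", "ua": "IE7"},            # cookie was blocked
]
searches = [
    {"ts": 150, "ip": "10.0.0.9", "ua": "Firefox", "visitor_id": "abc"},
    {"ts": 300, "ip": "10.0.0.2", "ua": "IE7"},
    {"ts": 5000, "ip": "10.0.0.2", "ua": "IE7"},          # outside the fallback window
]
pairs = attribute_searches(views, searches)
```

The fallback is deliberately conservative: several people behind one proxy share an IP and browser string, which is exactly why a cookie gives better data when it is allowed.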

Historically, advertisers were the primary “pushers” of third-party cookies, which let them track your viewing behavior across any property where their ads appeared. Some people disliked this because there was no real value to the web user, while the advertisers got free data. This was perceived as a security and privacy issue, and it pushed third-party cookies into a grey area. These conditions, along with the rise of spyware and malware, pushed OS and browser vendors to institute tighter controls.

In the current world of social media, distributed web services, and widespread widget adoption, the value to the user has changed. There are many services, Lijit included, that offer value to users during their browsing session rather than just “tracking” them. The key is that web users stay informed about which sites they visit and what kind of content they allow in their browser. It is also important for companies to disclose how they use the information they collect, and Lijit does this in our privacy policy.

Overall, the message should be about awareness and consumer education. Enabling third-party cookies can actually add value for the consumer rather than simply being a security or privacy concern. Modern browsers let you whitelist services you trust, and there are many services on the web (such as Lijit) that deserve that trust.

Lijit Dev Talk 101

Apr 17

Derek Greentree, one of our Senior Software Engineers, has agreed to share his thoughts on what he does here at Lijit. You can read the first part of this series here.

When you first start optimizing a website, there are many questions to answer. What are you optimizing for? Raw speed? Maximum concurrency? How will you find the bottlenecks in your way? How will you test your optimizations to be sure they were real and not placebos? How will your optimizations scale? And finally, how much time are you willing to put in, and for what return? A famous quote from Donald Knuth states that “premature optimization is the root of all evil,” and web applications are no exception.

First, you’ll need an environment where you can test your changes. You should have a test environment anyway, but since this one will exist mostly for benchmarking, you’ll be putting heavy load on it, so you may want a separate environment just for load testing. Since you’re eventually going to deploy your optimizations to your production servers, this environment needs to mimic your production architecture as closely as possible. Don’t load test on a machine that serves its content from a network share when your production servers serve that content from local disk.

When you have an environment ready, your next step is to benchmark what you currently have. Many tools exist for this, but at Lijit we use Siege for website benchmarking. After a Siege run, you’ll have data such as how many requests completed, the distribution of HTTP status codes, and the average number of transactions per second. Be statistically smart about this: run several tests and average them together to get a baseline reading. Try to max out your environment; knowing where your headroom runs out in your current architecture is powerful knowledge. And if you can max out your environment during the test, you can be sure the limits you’re hitting are limits of the thing being tested, not of the load testers. Try to hit your site in a browser during the load test so you can see what the user experience will be like in a high-load situation. Anticipate problems before they occur.
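A minimal Python sketch of the “run several tests and average them” step. It assumes each run’s end-of-test summary from Siege (for example from a benchmark-mode run such as `siege -b -c 50 -t60S <url>`) was saved as text and contains a line of the form `Transaction rate: 200.00 trans/sec`; the sample strings below are made up for illustration. Reporting the standard deviation alongside the mean shows whether the runs were consistent enough to trust as a baseline.

```python
import re
import statistics

# Matches Siege's summary line, e.g. "Transaction rate:   200.00 trans/sec".
RATE_RE = re.compile(r"Transaction rate:\s+([\d.]+)\s+trans/sec")


def transaction_rate(summary: str) -> float:
    """Extract transactions/second from one saved Siege summary."""
    match = RATE_RE.search(summary)
    if match is None:
        raise ValueError("no 'Transaction rate' line found")
    return float(match.group(1))


def baseline(summaries: list) -> tuple:
    """Mean and sample standard deviation of the transaction rate across runs."""
    rates = [transaction_rate(s) for s in summaries]
    spread = statistics.stdev(rates) if len(rates) > 1 else 0.0
    return statistics.mean(rates), spread


# Made-up summaries from three identical runs.
runs = [
    "Transactions:\t\t 12000 hits\nTransaction rate:\t 200.00 trans/sec",
    "Transactions:\t\t 12600 hits\nTransaction rate:\t 210.00 trans/sec",
    "Transactions:\t\t 11400 hits\nTransaction rate:\t 190.00 trans/sec",
]
mean, spread = baseline(runs)
print(f"baseline: {mean:.1f} trans/sec (stdev {spread:.1f})")   # baseline: 200.0 trans/sec (stdev 10.0)
```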

Once you have these baseline numbers, you can test optimizations by running the same sequence of load tests and comparing the results. You can also see whether the mix of status codes coming back from the test has changed. For example, if you deploy an optimization and start getting a lot of HTTP 500 responses instead of HTTP 200, you broke something.
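Comparing status codes between runs is easy to automate. The sketch below assumes you have collected the HTTP status code of every response in a baseline run and in a run against the optimized build (for example from the web server’s access log); the function names and the one-percentage-point threshold are illustrative choices, not part of Siege.

```python
from collections import Counter


def status_shift(baseline_codes, candidate_codes) -> dict:
    """Change in each status code's share of responses, in percentage points."""
    base, cand = Counter(baseline_codes), Counter(candidate_codes)
    n_base, n_cand = sum(base.values()), sum(cand.values())
    return {
        code: round(100 * cand[code] / n_cand - 100 * base[code] / n_base, 2)
        for code in sorted(set(base) | set(cand))
    }


def looks_broken(baseline_codes, candidate_codes, threshold=1.0) -> bool:
    """True if the share of 5xx responses rose by more than `threshold` points."""
    shift = status_shift(baseline_codes, candidate_codes)
    return sum(v for code, v in shift.items() if 500 <= code < 600) > threshold


baseline_run = [200] * 98 + [404] * 2
candidate_run = [200] * 80 + [500] * 20     # hypothetical: the "optimization" broke something
print(status_shift(baseline_run, candidate_run))   # {200: -18.0, 404: -2.0, 500: 20.0}
print(looks_broken(baseline_run, candidate_run))   # True
```

Working in shares rather than raw counts keeps the comparison fair when a faster build completes more requests in the same test window.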

Getting a good benchmarking environment set up is the essential first step in optimizing your site. In the next article, we’ll talk about some basic things to look at when approaching optimization, including whether you need to do it at all, and some specific examples of problems and solutions we’ve encountered at Lijit.

Adding a Lijit user to your browser search engine

Feb 26

Modern browsers have added the ability to keep a list of frequently used search engines available for quick access at the top of the browser window. If you’re using Firefox or Internet Explorer 7, you can add any Lijit search engine to that list. Simply go to a blog that has a Lijit Search Wijit on it, to that user’s profile page, or to their search results page. You’ll notice an extra item in the list of search engines that lets you add it to your list. Now, any time you want to do a search on that person, you can simply select their name and run the search. Below are screenshots for both IE7 and Firefox:

Internet Explorer 7:
Add a search engine in IE7

Firefox:
Add a search engine in Firefox
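Firefox and Internet Explorer 7 discover these extra search engines through OpenSearch autodiscovery: the page carries a `<link rel="search" type="application/opensearchdescription+xml">` element that points to a small XML description of the engine. The Python sketch below generates both pieces; the post doesn’t show Lijit’s actual files, so the engine name, description and URLs here are hypothetical placeholders. In the URL template, `{searchTerms}` is the OpenSearch parameter the browser replaces with the user’s query.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"


def description_document(short_name: str, description: str, template: str) -> str:
    """Build a minimal OpenSearch 1.1 description document."""
    root = ET.Element("OpenSearchDescription", {"xmlns": OPENSEARCH_NS})
    ET.SubElement(root, "ShortName").text = short_name      # spec limit: 16 characters
    ET.SubElement(root, "Description").text = description
    ET.SubElement(root, "Url", {"type": "text/html", "template": template})
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(root, encoding="unicode")


def autodiscovery_link(title: str, href: str) -> str:
    """The <link> tag a page includes so browsers offer to add the engine."""
    return ('<link rel="search" type="application/opensearchdescription+xml" '
            f"title={quoteattr(title)} href={quoteattr(href)}>")


# Hypothetical engine for one blogger's search.
xml_doc = description_document(
    "Example Blog", "Search Example Blog and the blogs it trusts",
    "https://search.example.com/search?q={searchTerms}")
link = autodiscovery_link("Example Blog", "https://search.example.com/opensearch.xml")
print(xml_doc)
print(link)
```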

Blogroll crawling

Feb 26

Last week we released a new set of enhancements to Lijit, primarily focused on the signup process.

But there’s one feature I’m particularly excited about that isn’t immediately obvious: automatic blogroll crawling. What’s that, you ask?

If you have a blog, it probably has a section where you list other noteworthy blogs: blogs you read and find worth paying attention to. Lijit can now automatically find this part of your blog and add all of those blogs to your Lijit search. The Lijit server will also check your blog every day or so to see whether you’ve made any changes, so there’s no need to reconfigure anything here each time you update your blogroll.
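The post doesn’t explain how Lijit locates a blogroll on the page. One simple heuristic, sketched below in Python with the standard-library `html.parser`, is to collect every link inside an element whose `id` or `class` mentions “blogroll”, as some blog templates do; the daily re-check then reduces to diffing the new link set against the previous one. Everything here (the marker word, the function names, the sample page) is a hypothetical illustration, and real blogrolls are marked up in many other ways.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "wbr"}


class BlogrollFinder(HTMLParser):
    """Collect hrefs inside any element whose id or class mentions 'blogroll'."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a blogroll container
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        if self.depth:
            self.depth += 1
        elif "blogroll" in f"{attrs.get('id') or ''} {attrs.get('class') or ''}".lower():
            self.depth = 1
        if self.depth and tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_startendtag(self, tag, attrs):
        pass  # XHTML-style <br/> and friends don't affect nesting either

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID:
            self.depth -= 1


def find_blogroll(html: str) -> list:
    finder = BlogrollFinder()
    finder.feed(html)
    finder.close()
    return finder.links


def blogroll_changes(previous: list, current: list) -> tuple:
    """(added, removed) links since the last crawl."""
    return sorted(set(current) - set(previous)), sorted(set(previous) - set(current))


page = """
<div id="sidebar">
  <h2>Blogroll</h2>
  <ul class="xoxo blogroll">
    <li><a href="http://feld.com/">Feld Thoughts</a></li>
    <li><a href="http://boingboing.net/">Boing Boing</a><br></li>
  </ul>
  <a href="http://example.com/ads">Not part of the blogroll</a>
</div>
"""
links = find_blogroll(page)
print(links)                                          # ['http://feld.com/', 'http://boingboing.net/']
print(blogroll_changes(["http://feld.com/"], links))  # (['http://boingboing.net/'], [])
```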

Blogrolls form a huge implicit trust network, and until now no other service has really exploited them. Adding them to your network is just the first step; I’m also working on some blog authority algorithms that rely heavily on blogrolls. I’ll post some preliminary results soon.
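The blog authority algorithms mentioned above aren’t described in the post. As a hedged illustration of the general idea, the Python sketch below runs PageRank-style power iteration over a graph in which each blog “votes” for the blogs in its blogroll. PageRank is a swapped-in, well-known technique here, not Lijit’s algorithm; the damping factor of 0.85 is the conventional choice and the toy graph is made up.

```python
def blogroll_rank(blogrolls: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """PageRank over blogroll links. blogrolls maps blog -> blogs it lists.
    Returns blog -> score; scores sum to 1."""
    graph = {blog: set(listed) - {blog} for blog, listed in blogrolls.items()}
    blogs = set(graph) | {b for listed in graph.values() for b in listed}
    n = len(blogs)
    rank = {b: 1.0 / n for b in blogs}
    for _ in range(iterations):
        # Blogs with no outgoing blogroll links spread their score evenly.
        dangling = sum(rank[b] for b in blogs if not graph.get(b))
        nxt = {b: (1.0 - damping) / n + damping * dangling / n for b in blogs}
        for blog, listed in graph.items():
            for target in listed:
                nxt[target] += damping * rank[blog] / len(listed)
        rank = nxt
    return rank


# Toy graph: two blogs list "c"; "c" lists "a"; nobody lists "b".
scores = blogroll_rank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(scores, key=scores.get, reverse=True))   # ['c', 'a', 'b']
```

A blog listed by many well-listed blogs ends up with a high score, which matches the intuition of blogrolls as an implicit trust network.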

So get those blogs rolling!

Our experience with Google Custom Search

Jan 7

Google Custom Search is cool. It’s a natural step for Google to distribute their search technology (dare I say “longtail-ize” it?) in the same way they distributed their ad technology when they expanded AdWords (on their domain) into AdSense (on anyone’s page). So it was a natural fit for us to use it as the backend for our Lijit Personal Network Search, and we’ve been happy with the initial results. But it’s not perfect.

Ethan Zuckerman wrote about problems with Co-op search back in October, and Google quickly responded with a fix. However, we’re seeing many of Ethan’s problems here at Lijit as well. The problem is that if your desired search results would not normally fall in the top 1000 results of a normal Google search, they don’t get included in your results. For example, Brad Feld has written a ton about Microsoft on his blog at feld.com, as a typical site: search on Google shows. However, when you use a Co-op search that includes feld.com/*, you don’t get any results from that domain. The problem seems to be that feld.com doesn’t make it into the top 1000 results for a normal search for “microsoft”. In a similar vein, if you search me for “sex” you’ll get stuff from BoingBoing (a high-PageRank site) but not my post “Attention is Meme Sex” like you might expect.*
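The suspected behavior can be modeled in a few lines of Python. This is only a toy model of the hypothesis above (that a site-restricted Co-op search only sees the first 1000 results of the ordinary ranking), not Google’s actual implementation, and the ranking below is synthetic.

```python
def restricted_search(ranking: list, sites: list, top_n=None) -> list:
    """URLs from `ranking` (best first) that fall under one of `sites`.
    With top_n set, only the first top_n ordinary results are examined."""
    candidates = ranking if top_n is None else ranking[:top_n]
    return [url for url in candidates if url.startswith(tuple(sites))]


# Synthetic ranking for "microsoft": 1200 higher-ranked pages, then two feld.com posts.
ranking = [f"http://site{i}.example/microsoft" for i in range(1200)]
ranking += ["http://feld.com/archives/microsoft-1.html",
            "http://feld.com/archives/microsoft-2.html"]

print(len(restricted_search(ranking, ["http://feld.com/"])))              # 2: what we expect
print(len(restricted_search(ranking, ["http://feld.com/"], top_n=1000)))  # 0: what we see
```

Under this model, filtering before truncating would surface the feld.com posts; truncating first drops them no matter how relevant they are to the restricted search.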

So it seems that the fixes implemented for Ethan aren’t working across the board. But I am encouraged by Google’s response to Ethan and hope that they will eventually be able to solve our issues.