Why do big companies choose Python?

In this article I look at what might be behind the fact that many big companies (Google, Dropbox and NASA, to mention just a few) use Python as their programming language of choice.

If I had to answer the question in one sentence, I would say: Python is powerful. But let’s look at the reasons.

Python is easy

Python is an easy-to-learn programming language. It has a gentle learning curve, which means you have to invest less time to get things going than with some other programming languages (C/C++ or Java, for example).

That’s because the language was designed to be easy to read. It is widely used by scientists who are not engineers or computer scientists. You might have heard of NumPy and SciPy, two advanced scientific Python libraries. They were designed and developed by scientists who are experts in their domain and who built the tools to get their work done. They could do this because Python is an easy-to-learn language.

Python is not a programming language

Strictly speaking, Python is not a single piece of software; it is a specification of a programming language. This makes different implementations possible, each written in a different programming language.

The most common implementation is written in C and is called CPython. This is the one you can download from python.org. Because it is written in C, it is easy to write wrappers around existing C code and use them in your Python applications.
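
As a rough illustration of calling existing C code from CPython, here is a minimal sketch that uses the standard ctypes module to call strlen from the C standard library. It assumes a Unix-like system (Linux or macOS) where the C library can be loaded this way; the byte string is just an example value.

```python
import ctypes
import ctypes.util

# Locate the C standard library. On some systems find_library returns None;
# on Unix-like systems CDLL(None) then exposes the symbols already loaded
# into the running process, which include the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# Call the C function with a Python bytes object.
length = libc.strlen(b"python")
print(length)
```

For larger C libraries, people often write a dedicated extension module instead; ctypes is simply the shortest way to show the idea.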

However, there are other implementations, such as Jython, which is written in Java and integrates with the Java Virtual Machine, and IronPython, which is written in C# and targets .NET. Related projects include PyObjC, a bridge for working with Objective-C, and last but not least Pyjs, which compiles Python to JavaScript.

As you can see, there are a bunch of implementations, so you can use Python in many different environments.
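
If you are not sure which implementation you are running, the standard library can tell you. A small sketch (the variable names are only for illustration):

```python
import platform
import sys

# Name of the running implementation, e.g. 'CPython', 'PyPy',
# 'Jython' or 'IronPython'.
implementation = platform.python_implementation()
version = platform.python_version()
print("Running {} {}".format(implementation, version))

# Python 3.3+ also exposes a lower-case identifier here.
print(sys.implementation.name)
```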

Python is fast

Some might ask: “How can that be? Interpreted languages are always slow, aren’t they?” Well, they might be, but Python is fast. That’s because a lot of work has gone into improving Python’s performance. For example, if you compare some parallel code on Python 2 and Python 3, you can see that where the execution time in Python 2 grew exponentially, in Python 3 it stayed almost the same.

Some say that CPython is slow when it comes to parallel execution because of the Global Interpreter Lock (GIL). That’s true for CPU-bound tasks, but Google and Dropbox (where you deal with a lot of file I/O) know that I/O-bound operations perform very well even with CPython, because the interpreter releases the lock while a thread waits for I/O.

And because Python has several implementations, not only the most common one (CPython), some of them can be made very fast. For example, the PyPy project puts a lot of effort into speeding up its Python implementation.

Or the Numba project, which makes your existing code faster when you add decorators to your functions.
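
To see why the GIL matters less for I/O-bound work, here is a minimal sketch that simulates blocking I/O with time.sleep (which releases the GIL, as real I/O waits do). The function fake_io and the delay values are made up for this illustration; it is not a benchmark of real file or network access.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(delay):
    """Hypothetical stand-in for a blocking I/O call."""
    time.sleep(delay)  # the GIL is released while the thread sleeps
    return delay

delays = [0.2] * 4

# Run the four "I/O calls" one after another.
start = time.perf_counter()
for d in delays:
    fake_io(d)
sequential = time.perf_counter() - start

# Run the same calls in four threads; the waits overlap.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_io, delays))
threaded = time.perf_counter() - start

print("Sequential: {:.2f} s, threaded: {:.2f} s".format(sequential, threaded))
```

With CPU-bound functions in place of fake_io, the threaded version would not show this kind of gain on CPython.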

Python is efficient

Efficiency is a big goal today. As a big company you work with a lot of information (so-called Big Data), and handling this amount of data has to be efficient. Iterative processing of data requires lists, and when lists grow, their memory consumption grows too…

The same is true for Python; however, Python has generators (generator expressions and generator functions), which produce data lazily. This means values are only produced when they are needed, which can save memory and time.

To demonstrate this: a list comprehension with 100000000 (one hundred million) comparisons takes around 8 seconds on my machine, because the list is built as soon as the comprehension is encountered. Creating a generator expression instead is 100000 times (yes, one hundred thousand times) faster, because nothing is evaluated until the values are requested. And this is just a basic example; the difference can be even bigger in real applications.
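
The memory side of this can be sketched with a small generator function. The function squares and the size n are only examples, and sys.getsizeof reports the size of the container object itself, not of the numbers stored in it, so treat the figures as a rough indication.

```python
import sys

def squares(n):
    """Generator function: yields one square at a time."""
    for i in range(n):
        yield i * i

n = 100000
as_list = [i * i for i in range(n)]  # all values exist in memory at once
as_gen = squares(n)                  # nothing has been computed yet

list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)
print("list: {} bytes, generator: {} bytes".format(list_bytes, gen_bytes))

# The generator still delivers every value when it is consumed.
total = sum(as_gen)
print(total == sum(as_list))
```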

from time import time

# List comprehension: all 10000 * 10000 comparisons run immediately.
start = time()
lc = [x for x in range(10000) for y in range(10000) if x == y]
print('List comprehension took {} seconds'.format(time()-start))

# Generator expression: only the generator object is created here.
start = time()
ge = (x for x in range(10000) for y in range(10000) if x == y)
print('Generator expression took {} seconds'.format(time()-start))

# filter() with a lambda: a list in Python 2, a lazy iterator in Python 3.
start = time()
le = filter(lambda x: x in range(10000), range(10000))
print('Lambda expression took {} seconds'.format(time()-start))

The code above compares three ways of producing the same values: a list comprehension, a generator expression and filter() with a lambda. The results are the following:

Python 2:
List comprehension took 5.69756889343 seconds
Generator expression took 0.0 seconds
Lambda expression took 1.09210991859 seconds

Python 3:
List comprehension took 4.053405284881592 seconds
Generator expression took 0.0 seconds
Lambda expression took 0.0 seconds

Generator expressions are evaluated only when they are used, so it takes “no time” to create the variable holding the expression. “No time” means around 0.0005 seconds in this case, so practically no time at all. In Python 3 the lambda version is improved too: filter() returns a lazy iterator there, so creating it takes about as long as creating the generator expression.

Naturally, this only measures creating the generator or the filter object. When we actually consume them (for example, by calculating their length), the time of the generator expression approaches that of the list comprehension, but the lambda version stays fast. It has less work to do: it performs 10000 membership tests, whereas the comprehensions perform 100000000 comparisons.
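
That a generator expression does no filtering work until it is consumed can be checked directly. In this sketch the helper check and the list seen are made up for the illustration; they simply record every value that gets tested.

```python
seen = []

def check(x):
    """Hypothetical predicate that records each value it is asked about."""
    seen.append(x)
    return x % 2 == 0

# Creating the generator expression does not call check() at all.
gen = (x for x in range(10) if check(x))
calls_after_creation = len(seen)

# Consuming it runs the predicate once per element.
evens = list(gen)
calls_after_use = len(seen)

print(calls_after_creation, calls_after_use, evens)
```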

from time import time

# Same three variants, but each result is now fully consumed via list(),
# so the generator and the filter object actually do their work.
start = time()
lc = [x for x in range(10000) for y in range(10000) if x == y]
size = len(list(lc))
print('List comprehension took {} seconds with size of {}'.format(time()-start, size))

start = time()
ge = (x for x in range(10000) for y in range(10000) if x == y)
size = len(list(ge))
print('Generator expression took {} seconds with size of {}'.format(time()-start, size))

start = time()
le = filter(lambda x: x in range(10000), range(10000))
size = len(list(le))
print('Lambda expression took {} seconds with size of {}'.format(time()-start, size))

Python 2:
List comprehension took 5.38253808022 seconds with size of 10000
Generator expression took 2.6872689724 seconds with size of 10000
Lambda expression took 1.12811279297 seconds with size of 10000

Python 3:
List comprehension took 3.830382823944092 seconds with size of 10000
Generator expression took 3.8543853759765625 seconds with size of 10000
Lambda expression took 0.004000425338745117 seconds with size of 10000
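
One thing to keep in mind when reading these numbers: the variants do not perform the same amount of work. A small sketch with a reduced, hypothetical size n makes the difference countable. The nested comprehension compares every (x, y) pair, while filter() tests each element once, and in Python 3 a membership test on a range of integers is a constant-time calculation.

```python
n = 100  # reduced size so the counts are easy to check

# Nested comprehension: one comparison per (x, y) pair, n * n in total.
nested_comparisons = sum(1 for x in range(n) for y in range(n))
nested_result = [x for x in range(n) for y in range(n) if x == y]

# filter() with a range membership test: one check per element, n in total.
filter_checks = sum(1 for x in range(n))
filter_result = list(filter(lambda x: x in range(n), range(n)))

print(nested_comparisons, filter_checks)
print(nested_result == filter_result)
```

Both approaches produce the same values, so the gap in the timings mostly reflects the amount of work rather than lambdas being inherently faster.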

With Python you can do everything

Python is used broadly among developers: for ETL, gaming, web development, system automation and testing.

Disney uses Python to help power its creative process. Mozilla has released a lot of open source packages built in Python.

This means that if you want to do something with Python, chances are good that someone already has, so you do not have to start from scratch.

The Dropbox example

So is Python really used by big companies? Yes, and not just for small scripts. For example, Dropbox started with Python and stayed with it, even when they realized they were serving 40 million users with their Python codebase. The reason for using Python was that they could write functionality in 100 lines of code which would have required 1000 lines in another language (C or C++).

Conclusion

You may now have an idea of why big companies have chosen Python as their programming language. So if you want to learn it, go ahead, and you will see it is very easy.

If you think you need some support and a good book to get started with Python 3.5, I can recommend this one.
