Array Intersection Bake-off
One of those moments where an interview question turns into a research project, or is it really a bake off? The simple problem is demonstrate an algorithm to intersect two lists of numbers, fundamentally it’s a question about using modern interpreted languages and their associative array bits to make a simple intersection routine. However many languages support many different ways to do things. I’ve put together a test of Python vs. Java vs. Ruby vs. Perl vs. PHP and got a few interesting benchmarks.
Short Version of the results: java wins with python comming in second.
But, things are not so simple for instance there is a simple base approach (python example)
1def isect(a, b) :
2 o = []
3 h = {}
4 for i in a :
5 h[i] = True
6 for i in b :
7 if h.get(i, False) :
8 o.append(i)
9 return o
or a more language unique version
1def isect(a, b) :
2 h = dict(iter([(i, True) for i in a]))
3 return [i for i in b if h.get(i,False)]
or just using features of the language
1def isect(a, b) :
2 return list(set(a) & set(b))</pre>
All of these return the same result, but clearly we can think of this as a progression of an alorigthm. Version 1 focusing on textbook to code, version 2 says there’s some cool language features and version 3 says, dude like don’t you really know what your doing?
The upshot was that I sat down and wrote 12 different versions of this just to see what the language differences were.
Language | Version | Alg. Time | Run Time |
---|---|---|---|
java | 1 | 3.076 | 5.888 |
perl | 1 | 3.622 | 6.691 |
php | 1 | 3.901 | 22.878 |
python | 1 | 1.740 | 4.526 |
ruby | 1 | 3.517 | 9.853 |
java | 2 | 0.817 | 4.362 |
python | 2 | 3.550 | 6.441 |
ruby | 2 | 7.984 | 14.281 |
python | 3 | 1.500 | 4.356 |
ruby | 3 | 3.809 | 10.184 |
c++ | 1 | 0.830 | 1.209 |
python | 4 | 1.040 | |
php | 2 | 2.000 | |
php | 3 | 10.064 | |
java | 3 | 3992.045 |
Of course you’re probably wondering about language versions:
- java 1.6.0
- perl 5.8.8
- php 5.2.9
- python 2.5.4
- ruby 1.8.6
1// Java Version # 1
2 private static int[] isect(int a[], int b[]) {
3 int l[] = new int[a.length];
4 TreeMap h = new TreeMap();
5 int idx = 0;
6
7 for (int i = 0; i < a.length; i++) {
8 h.put(new Integer(a[i]), 1);
9 }
10 for (int i = 0; i < b.length; i++) {
11 if (h.containsKey(new Integer(b[i])))
12 l[idx++] = b[i];
13 }
14
15 int o[] = new int[idx];
16 for (int i = 0; i < idx; i++) {
17 o[i] = l[i];
18 }
19
20 return o;
21 }
1# Perl Version 1
2sub isect {
3 my($a, $b) = @_;
4 my(@o, %h);
5
6 for my $i (@$a) {
7 $h{$i} = 1;
8 }
9 for my $i (@$b) {
10 push(@o, $i) if $h{$i};
11 }
12
13 return @o;
14}
1# PHP Version 1
2function isect($a, $b) {
3 $h = array();
4 $o = array();
5
6 foreach ($a as $i) {
7 $h[$i] = true;
8 }
9 foreach ($b as $i) {
10 if ($h[$i]) {
11 array_push($o, $i);
12 }
13 }
14
15 return $o;
16}
1# Python Version 1
2def isect(a, b) :
3 o = []
4 h = {}
5 for i in a :
6 h[i] = True
7 for i in b :
8 if h.get(i, False) :
9 o.append(i)
10 return o
1# Ruby Version 1
2def isect(a, b)
3 return a & b
4end
1// Java Version 2
2 private static Integer[] isect(Integer a[], Integer b[]) {
3 ArrayList<integer> l = new ArrayList</integer><integer>(a.length);
4 HashSet</integer><integer> h = new HashSet</integer><integer>();
5
6 for (int i = 0; i < a.length; i++) {
7 h.add(a[i]);
8 }
9 for (int i = 0; i < b.length; i++) {
10 if (h.contains(b[i]))
11 l.add(b[i]);
12 }
13
14 return (Integer[])l.toArray(new Integer[0]);
15 }
1# Python Version 2
2def isect(a, b) :
3 h = dict(iter([(i, True) for i in a]))
4 return [i for i in b if h.get(i,False)]
1# Ruby Version 2
2def isect(a, b)
3 # Convert array to Set objects and perform intersection
4 a = a.to_set
5 b = b.to_set
6
7 return a & b
8end
1# Python Version 3
2def isect(a, b) :
3 return list(set(a)& set(b))
1# Ruby Version 3
2def isect(a, b)
3 o = Array.new
4 h = Hash.new
5
6 a.each do |i|
7 h[i] = 1
8 end
9 b.each do |i|
10 if h[i]
11 o.push(i)
12 end
13 end
14
15 return o
16end
1# PHP Version 2
2function isect($a, $b) {
3 $b = array_flip($b);
4 $o = array();
5
6 foreach ($a as $i) {
7 if(isset($b[$i])) {
8 $o[] = $i;
9 }
10 }
11
12 return $o;
13}
1# PHP Version 3
2function isect($a, $b) {
3 return array_intersect($a, $b);
4}
1// Java Version 3
2 private static Integer[] isect(Integer a[], Integer b[]) {
3 Set<integer> l = new HashSet</integer><integer>(Arrays.asList(a));
4 l.retainAll(Arrays.asList(b));
5
6 return (Integer[])l.toArray(new Integer[0]);
7 }
1# Python Version 4
2def isect(a, b) :
3 h = set(a)
4 return [i for i in b if i in h]
1// C++ Version 1
2std::list<int>* isect(int len, int a[], int b[]) {
3 __gnu_cxx::hash_set</int><int> h = __gnu_cxx::hash_set</int><int>();
4
5 for (int i = 0; i < len; i++)
6 h.insert(a[i]);
7
8 std::list< int> *l = new std::list< int>();
9
10 for (int i = 0; i < len; i++)
11 if (h.find(b[i]) != h.end())
12 l->push_back(b[i]);
13
14 return l;
15}
Would enjoy peoples thoughts and feedback, or alternate implementations for these and other languages