Coding contests >> Fast Sorting and Searching (C/C++)
Posted by seunosewa on 16:46:00 03-31-2002
You are given two arrays of 32-bit integers (I assume Linux/Win32/MacOS is your OS) and you are required to find:
(a) the sum of all the elements in the first array that are also in the second array.
(b) the number of elements in the first array that are also in the second array.

The first array has about 2-4 million elements and the second one has about one-quarter to one-sixth the number of elements in the first one.

RULES:
1. You are required to implement the following function in standard C/C++:
void perform (int * a, int * b, int na, int nb, int * sum, int * num)
where:
-a points to the first array of integers.
-b points to the second array of (random) integers.
-na is the length of a.
-nb is the length of b.
-sum points to the location where the sum of integers in the first array also contained in the 2nd one are to be stored.
-num points to the number of integers in the first array also in the second one.
2. The entry with the fastest, correct implementation wins.
3. All contestants will submit the .c or .cpp file containing the implementation and your own test program to me.
4. My testing will take place on an AMD Athlon k7-700 running Linux 2.4 using na=4000000 and nb=750000.
5. In order to encourage original thinking, only the following header files are allowed: stdlib.h, stdio.h, iostream.

Please send your original code to me.

Thanks.

[ This Message was edited by: seunosewa on 2002-04-01 03:50 ]
Posted by KaGez on 03:02:00 04-01-2002
ok, you win, so I don't start at all. Have fun coding
[addsig]
Posted by sacah on 13:30:00 04-02-2002
There are a lot of contests laterly that I cant understand, though I dislike C/C++, but still, aint anyone interested in contests anymore??
Posted by MoX on 15:04:00 04-02-2002
Some contest like this one here really appeal my interest and I would surely be coding on it right now, if I my time wasn't consumed by my own projects...

And to compete with a computer-science student in such a task (I suppose there has been a scientific approach to this topic and therefor the best algo is already known(and a computer-science student knows it too, therefor)) would consume lots of time and not lead to much...Nevertheless I would really like to know the solution of the prob. Perhaps you can send it to me when the contest is over, seunosewa? [addsig]
Posted by seunosewa on 19:56:00 04-02-2002
Well, I'm not a computer science student, though perhaps I should be (I can sometimes be caught reading arcane .ps / .pdf files downloaded from citeseer.nj.nec.com [hint!] ).

Possible approaches:
(1) For each element in a , scan b to find if the element is contained in b
(very time-consuming).

(2) Sort b. For each element in a, use a "binary-search" algorithm to discover if that element is also contained in b.

(3) Sort both a and b and calculate the "intersection" of the two "sorted sets".

(4) There's a third method I'm currently thinking of that will combine the searching and sorting... I'm not sure it will be much faster than method 2/3, but it effectively involves sorting a and b simultaneously.
Posted by KaGez on 03:07:00 04-03-2002
it's the same for me as for moxx... I'd like to join all these contests lately, but I've got tons (i mean it) of other things to do, some are my projects, some are numexp (http://numexp.sf.net) and gimp (http://gimp.org).
Anyways, if there was a contest that can be done in very little time, I'd _love_ to join that one
[addsig]
Posted by seunosewa on 04:52:00 04-03-2002
This contest actually should not take a long time. Its just that I also have another issue on my programmers' mind:
http://www.progsnet.org/modules.php?op=modload&name=Forum&file=viewtopic&topic=18&forum=13
I'll set a time to work on this...
Posted by MoX on 08:08:00 04-03-2002
I would know a fast way if one of the arrays was sorted (smallest number first, greatest number last)... [addsig]
Posted by seunosewa on 15:54:00 04-03-2002
Am I allowed to modify the details of the contest? The program I'm asking people to write will be of limited usefulness - I'm thinking of extending it to include the storage of the discovered numbers into an array rather than just summing / counting them.

This will make the program a bit more useful in real life - especially when the same system is used for text.

Cheers!
Posted by seunosewa on 04:57:00 04-11-2002
CONTEST CLOSED
Reason: No interest shown by anybody
Posted by robost86 on 09:20:00 04-11-2002
This happends to almost all contests, we might think about something to do about it.
Posted by inhahe on 20:23:00 09-20-2003
Quote:
On 2002-04-02 19:56, seunosewa wrote:
Well, I'm not a computer science student, though perhaps I should be (I can sometimes be caught reading arcane .ps / .pdf files downloaded from citeseer.nj.nec.com [hint!] ).

Possible approaches:
(1) For each element in a , scan b to find if the element is contained in b
(very time-consuming).

(2) Sort b. For each element in a, use a "binary-search" algorithm to discover if that element is also contained in b.

(3) Sort both a and b and calculate the "intersection" of the two "sorted sets".

(4) There's a third method I'm currently
thinking of that will combine the searching and sorting... I'm not sure it will be much faster than method 2/3, but it effectively involves sorting a and b simultaneously.



(3) what trick does one use to find the intersection of two sorted sets?

(4) do you mean sort a and b into one list, with each number having an attribute that says what list it's from, and take note when two numbers are compared and found equal to eachother that aren't from the same list?
actually i guess the last part would be better checked in an iteration through the list after the sort. i can't figure out if this way could possibly be faster than sorting the second list and then searching it for each item in the first list.

speaking of searching, it would be very efficient if you approximate where that value should be in the list and start there, and search up or down depending on whether the number there is less than or greater than the number you're searching for (and stopping when it gets greater than/less than).

although he didn't say that the numbers would be evenly dispersed over the whole 0 to 2^32-1 range, so maybe you'd have to build an index of the positions of the first number greater than or equal to, say, every 65,536th number from 0 to 2^32-1 (so start at list2[index[num -->--> 16]])

and/or start at list2[x], then jump to list2[x+(lastx-x)/(list2[lastx]-list2[x]*(num-list2[x])], etc. until you find the match or where the number /would/ be. i guess the first lastx should be index[(num-->-->16)+1] or index[(num-->-->16)-1].

(2) how does a binary search algorithm work?