I'm currently working on my master's thesis, whose topic is "Big Data Technologies for Fraud Detection".
I want to build a system that consists of Hadoop and SAP HANA. Hadoop has to import, parse and persist the log files. The result of this process is one huge log file, which is eventually exported to the SAP HANA appliance.
And here we go.
The format of the log file looks like this: Date, Time, User, IP, Timestamp
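To make the record layout concrete, here is a minimal Python sketch of parsing one comma-separated line into its five fields, roughly the per-line step the Hadoop parse stage would perform. The names LogRecord and parse_line, the sample values, and the assumption that Timestamp is an integer Unix epoch in seconds are all hypothetical; the post does not specify the exact field formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    date: str
    time: str
    user: str
    ip: str
    timestamp: int  # assumption: Unix epoch seconds


def parse_line(line: str) -> LogRecord:
    # Split on commas and trim surrounding whitespace; raises ValueError
    # if the line does not have exactly five fields.
    date, time, user, ip, ts = (field.strip() for field in line.split(","))
    return LogRecord(date, time, user, ip, int(ts))


record = parse_line("2013-05-02, 10:15:01, alice, 10.0.0.1, 1367489701")
print(record.user, record.ip, record.timestamp)
```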
And I want to find out which users changed their IP address within less than 5 seconds. I tried to use a calculation view and build a semi-cross product by joining the log file with itself on the username. But the problem with this join is that it returns every match twice, (A | B) and (B | A), because both orderings satisfy the condition (username = username AND ip != ip).
So how can I get only the distinct pairs of the cross product, i.e. each match once instead of twice?
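The duplication, and the common way around it, can be reproduced in a few lines of Python. This sketch mimics the self-join with nested iteration rather than a HANA calculation view; the Entry type, the function names and the sample rows are hypothetical, and ts is assumed to be epoch seconds. The symmetric predicate ip != ip lets both orientations of a match through, while an asymmetric ordering predicate, here (a.ts, a.ip) < (b.ts, b.ip), keeps exactly one of them because the two IPs always differ.

```python
from itertools import product
from typing import NamedTuple


class Entry(NamedTuple):
    user: str
    ip: str
    ts: int  # assumption: Unix epoch seconds


def naive_pairs(rows, window=5):
    # Symmetric condition: every match appears as (A, B) and as (B, A).
    return [(a, b) for a, b in product(rows, repeat=2)
            if a.user == b.user and a.ip != b.ip and abs(a.ts - b.ts) < window]


def ordered_pairs(rows, window=5):
    # Asymmetric condition: keep only the orientation where A sorts before B
    # by (ts, ip); IP breaks ties when both entries share a timestamp.
    return [(a, b) for a, b in product(rows, repeat=2)
            if a.user == b.user
            and a.ip != b.ip
            and (a.ts, a.ip) < (b.ts, b.ip)
            and b.ts - a.ts < window]


rows = [
    Entry("alice", "10.0.0.1", 100),
    Entry("alice", "10.0.0.2", 103),  # IP change 3 s later: suspicious
    Entry("alice", "10.0.0.3", 200),  # much later: not a match
    Entry("bob", "10.0.0.9", 100),
]
print(len(naive_pairs(rows)))    # 2: the same match in both orientations
print(len(ordered_pairs(rows)))  # 1: each match once
```

In SQL terms this corresponds to replacing the symmetric ip != ip with an ordering comparison (for example on the timestamp, with the IP as a tie-breaker) in the self-join condition; how best to express that in a calculation view is left open here.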