
Twitter data extraction using SAP Data Services

Oct 23, 2016 at 02:09 PM


Hi,

I am planning to extract Twitter data using a User-Defined transform in SAP Data Services, based on the blueprint job. In addition to the fields the blueprint already returns, I need to extract fields such as followers_count, retweet_count, friend_count, etc.

I have modified the Python script to add those fields, as shown below.

# tweet information
for innerIndex in (range(len(search_results))):
    sResult = search_results[innerIndex]

    id = sResult[u'id']
    if str(id) <= self.max_id:
        raise SearchError('Stopping the search for this term due to repeated data being found.')

    if min_id is None:
        min_id = long(id)
    else:
        if min_id > long(id):
            min_id = long(id)

    next_max_id = long(min_id) - 1


    created_at = sResult[u'created_at']
    if created_at is None:
        continue

    created_at = time.strftime('%a, %d %b %Y %H:%M:%S +0000', time.strptime(sResult[u'created_at'], '%a %b %d %H:%M:%S +0000 %Y'))
    user = sResult[u'user']
    metadata = sResult[u'metadata']

    from_user = user[u'name']
    if from_user is None:
        continue

    from_user_id = user[u'id']
    from_user_name = user[u'screen_name']
    #friend_count = user[u'friend_count']
    favourite_count = sResult[u'favourite_count']
    #followers_count = user[u'followers_count']
    retweet_count = sResult[u'retweet_count']

    if sResult[u'geo'] is None:
        coordinates_lat = ''
        coordinates_long = ''
        coordinates_type = ''
    else:
        geo_result = sResult['geo']
        coordinates_lat = geo_result['coordinates'][0]
        coordinates_long = geo_result['coordinates'][1]
        coordinates_type = geo_result[u'type']

    id_str = sResult[u'id_str']
    iso_language_code = metadata[u'iso_language_code']

    if 'place' not in sResult or sResult[u'place'] is None:
        place = None
    else:
        place = sResult['place'][u'full_name']

    text = sResult[u'text']
    text = text.replace('\n', '')

    if len(text) > 300:
        continue

    try:
        DSRecord = DataManager.NewDataRecord(1)
        DSRecord.SetField(u'MAX_ID', unicode(max_id))
        DSRecord.SetField(u'CREATED_AT', unicode(created_at))
        DSRecord.SetField(u'FROM_USER', unicode(from_user_name))
        DSRecord.SetField(u'FROM_USER_ID', unicode(from_user_id))
        DSRecord.SetField(u'FROM_USER_ACCOUNT', unicode(from_user))
        DSRecord.SetField(u'COORDINATES_LAT', unicode(coordinates_lat))
        DSRecord.SetField(u'COORDINATES_LONG', unicode(coordinates_long))
        #DSRecord.SetField(u'FRIEND_COUNT ', unicode(friend_count))
        DSRecord.SetField(u'FAVOURITE_COUNT ', unicode(favourite_count))
        #DSRecord.SetField(u'FOLLOWERS_COUNT ', unicode(followers_count))
        DSRecord.SetField(u'RETWEET_COUNT ', unicode(retweet_count))
        DSRecord.SetField(u'ID_STR', unicode(id_str))
        DSRecord.SetField(u'LANGUAGE', unicode(iso_language_code))
        DSRecord.SetField(u'TEXT', unicode(text) if len(text) <= 2000 else unicode(text[:2000]))
        DSRecord.SetField(u'LOCATION', unicode(place))
        DSRecord.SetField(u'SEARCH_TERM', unicode(self.term))
        DSRecord.SetField(u'CHANNEL', unicode('t'))
        DSRecord.SetField(u'PROXY', unicode(self.proxy))

        Collection.AddRecord(DSRecord)
        del DSRecord

However, when I execute the job, I get the following error:

6844 7828 DQX-058306 10/23/2016 5:45:00 PM |Sub data flow TdpBlueprintEn_Twitter_Search_V1_1_2|Transform Search_Twitter
6844 7828 DQX-058306 10/23/2016 5:45:00 PM Transform <Get_Search_Tasks>: Traceback (most recent call last):
6844 7828 DQX-058306 10/23/2016 5:45:00 PM File "EXPRESSION", line 402, in <module>
6844 7828 DQX-058306 10/23/2016 5:45:00 PM File "EXPRESSION", line 286, in search
6844 7828 DQX-058306 10/23/2016 5:45:00 PM KeyError: u'favourite_count'.
6844 7828 DQX-058306 10/23/2016 5:45:00 PM |Sub data flow TdpBlueprintEn_Twitter_Search_V1_1_2|Transform Search_Twitter
6844 7828 DQX-058306 10/23/2016 5:45:00 PM Transform <Search_Twitter>: <UDT0004>: Error executing the expression.
6844 7828 DQX-058302 10/23/2016 5:45:00 PM |Sub data flow TdpBlueprintEn_Twitter_Search_V1_1_2|Transform Search_Twitter
6844 7828 DQX-058302 10/23/2016 5:45:00 PM Transform <Search_Twitter>: DLL <udt_transformu.dll> runtime function <ProcessCollection> failed with error <3>. More detailed
6844 7828 DQX-058302 10/23/2016 5:45:00 PM information may be obtained from previous errors.
7792 6244 DFC-250038 10/23/2016 5:45:17 PM |Dataflow TdpBlueprintEn_Twitter_Search_V1
7792 6244 DFC-250038 10/23/2016 5:45:17 PM Sub data flow <TdpBlueprintEn_Twitter_Search_V1_1_2> terminated due to error <58302>.
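
The traceback ends in KeyError: u'favourite_count', which means the search result dictionary has no key with exactly that name. One thing that may be worth checking, stated here as an assumption to verify against a real response rather than a confirmed diagnosis: in the Twitter REST API v1.1 the tweet-level field is spelled favorite_count, and the user-level fields are followers_count and friends_count (plural "friends"). The sketch below reads those keys with dict.get so that a missing key yields None instead of stopping the transform. The helper extract_counts and the sample_tweet data are hypothetical and are not part of the blueprint script.

```python
# Hypothetical helper, not part of the SAP blueprint script.
# Key names are assumed from the Twitter REST API v1.1 tweet and user objects.
def extract_counts(tweet):
    """Return engagement counts from a decoded tweet dict; missing keys give None."""
    user = tweet.get('user') or {}
    return {
        'favorite_count': tweet.get('favorite_count'),    # tweet level
        'retweet_count': tweet.get('retweet_count'),      # tweet level
        'followers_count': user.get('followers_count'),   # user level
        'friends_count': user.get('friends_count'),       # user level
    }


# Made-up sample data shaped like one search result.
sample_tweet = {
    'id': 1,
    'retweet_count': 3,
    'favorite_count': 5,
    'user': {'followers_count': 10, 'friends_count': 7},
}

counts = extract_counts(sample_tweet)
print(counts)

# Listing the keys that are actually present helps confirm the spelling.
print(sorted(sample_tweet.keys()))
```

Printing sorted(sResult.keys()) once inside the loop would show which names the API really returns. Separately, the SetField names for the new columns in the script end with a space (for example u'FAVOURITE_COUNT '), which may be worth comparing against the output schema of the transform.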

Please help me resolve this issue.

Thanks & Regards,

Ramana.
