Topic Extraction #12

Open
1 task done
amiryselim opened this issue Mar 24, 2023 · 1 comment

Comments

@amiryselim

Contact Details

amir@tradeblock.us

Language

Python

Category

Data Processing and Enrichment

Description

  • Extracts the top 5 one-word and two-word keywords, along with their counts, from an event's text property
  • The extracted keywords are useful for aggregating and analyzing text-heavy event properties such as product reviews and user feedback

For example, applying this to a real product feedback submission adds this keyword breakdown to your event:

"keywords": {
  "oneWord": {
    "board": 11,
    "folder": 13,
    "manager": 11,
    "permission": 14,
    "permissions": 11
  },
  "twoWords": {
    "board manager": 11,
    "folder permission": 3,
    "manager level": 5,
    "permissions settings": 4,
    "specific folder": 4
  }
}
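
Once events carry a keywords property shaped like the example above, downstream aggregation could look roughly like the following sketch. It assumes a list of already-transformed events; `aggregate_keywords` and the two sample events are hypothetical and not part of the submitted transformation.

```python
from collections import Counter

def aggregate_keywords(events, kind="oneWord"):
    """Sum per-event keyword counts of one kind ("oneWord" or "twoWords") across events."""
    totals = Counter()
    for event in events:
        keywords = event.get("properties", {}).get("keywords", {})
        # Counter.update adds the counts from each per-event mapping.
        totals.update(keywords.get(kind, {}))
    return totals

# Two made-up events, for illustration only.
events = [
    {"properties": {"keywords": {"oneWord": {"folder": 13, "board": 11}, "twoWords": {}}}},
    {"properties": {"keywords": {"oneWord": {"folder": 2, "login": 4}, "twoWords": {}}}},
]
totals = aggregate_keywords(events)
print(totals.most_common(2))
```

Because each event keeps only its own top 5 keywords, totals built this way are approximate: a word that never makes an event's top 5 is not counted at all.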

Code Block

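def transformEvent(event, metadata):
    # Pass events through unchanged when there is no text message to analyze.
    message = (event.get("properties") or {}).get("message")
    if not isinstance(message, str):
        return event

    # Strip common punctuation so that "level," and "level" count as the same word.
    punctuation = ['"', "'", "!", "?", ".", "-", ":", ","]
    for mark in punctuation:
        message = message.replace(mark, "")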

    stop_words = ["i", "me", "my", "myself", "we", "our", "ours", "ourselves", "you", "your", "yours", "yourself", "yourselves", "he", "him", "his", "himself", "she", "her", "hers", "herself", "it", "its", "itself", "they", "them", "their", "theirs", "themselves", "what", "which", "who", "whom", "this", "that", "these", "those", "am", "is", "are", "was", "were", "be", "been", "being", "have", "has", "had", "having", "do", "does", "did", "doing", "a", "an", "the", "and", "but", "if", "or", "because", "as", "until", "while", "of", "at", "by", "for", "with", "about", "against", "between", "into", "through", "during", "before", "after", "above", "below", "to", "from", "up", "down", "in", "out", "on", "off", "over", "under", "again", "further", "then", "once", "here", "there", "when", "where", "why", "how", "all", "any", "both", "each", "few", "more", "most", "other", "some", "such", "no", "nor", "not", "only", "own", "same", "so", "than", "too", "very", "s", "t", "can", "will", "just", "don", "should", "now"]

    lower_message = message.lower()
    words = lower_message.split()

    # Note: str.count matches substrings, so "permission" also counts the occurrences inside "permissions".
    unigram_counts = {word: lower_message.count(word) for word in set(words) if word not in stop_words}
    top_unigrams = dict(sorted(unigram_counts.items(), key=lambda item: item[1], reverse=True)[:5])

    # Candidate bigrams: adjacent word pairs in which neither word is a stop word.
    bigrams = {' '.join(words[x:x+2]) for x in range(len(words) - 1) if all(word not in stop_words for word in words[x:x+2])}
    bigram_counts = {bigram: lower_message.count(bigram) for bigram in bigrams}
    top_bigrams = dict(sorted(bigram_counts.items(), key=lambda item: item[1], reverse=True)[:5])

    event["properties"]["keywords"] = {"oneWord": top_unigrams, "twoWords": top_bigrams}

    return event
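
The transformation above counts with `str.count`, which matches substrings, so a word like "permission" is also counted inside "permissions". If whole-word counts are preferred, one possible variant is sketched below. It is only a sketch under that assumption: `top_keywords` and the shortened `STOP_WORDS` set are hypothetical names, the tokenizer differs from the punctuation stripping above, and the resulting counts will not match the example output earlier in this issue.

```python
import re
from collections import Counter

# Shortened stop-word set for illustration only; the full list above could be used instead.
STOP_WORDS = {"the", "a", "an", "and", "or", "is", "it", "to", "from", "of", "in", "on"}

def top_keywords(text, n=5):
    """Return the n most common unigrams and bigrams, counted as whole tokens."""
    # Lowercase and keep runs of letters and digits; this also drops "(", "/" and similar marks.
    words = re.findall(r"[a-z0-9]+", text.lower())
    unigrams = Counter(w for w in words if w not in STOP_WORDS)
    # Adjacent pairs in which neither word is a stop word.
    bigrams = Counter(
        f"{a} {b}"
        for a, b in zip(words, words[1:])
        if a not in STOP_WORDS and b not in STOP_WORDS
    )
    return {
        "oneWord": dict(unigrams.most_common(n)),
        "twoWords": dict(bigrams.most_common(n)),
    }

result = top_keywords("Folder permission and folder permissions from the Folder level")
print(result)
```

Inside `transformEvent`, the returned dictionary could be assigned to `event["properties"]["keywords"]` in place of the two `dict(sorted(...))` results.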

Input Payload for testing

[
  {
    "anonymousId": "8d872292709c6fbe",
    "channel": "mobile",
    "context": {
      "app": {
        "build": "1",
        "name": "AMTestProject",
        "namespace": "com.rudderstack.android.rudderstack.sampleAndroidApp",
        "version": "1.0"
      },
      "device": {
        "id": "8d872292709c6fbe",
        "manufacturer": "Google",
        "model": "AOSPonIAEmulator",
        "name": "generic_x86_arm",
        "type": "android"
      },
      "library": {
        "name": "com.rudderstack.android.sdk.core",
        "version": "1.0.2"
      },
      "locale": "en-US",
      "network": {
        "carrier": "Android",
        "bluetooth": false,
        "cellular": true,
        "wifi": true
      },
      "os": {
        "name": "Android",
        "version": "9"
      },
      "screen": {
        "density": 420,
        "height": 1794,
        "width": 1080
      },
      "timezone": "Asia/Kolkata",
      "traits": {
        "address": {
          "city": "Kolkata",
          "country": "India",
          "postalcode": "700096",
          "state": "West bengal",
          "street": "Park Street"
        },
        "age": "30",
        "anonymousId": "8d872292709c6fbe",
        "birthday": "2020-05-26",
        "createdat": "18th March 2020",
        "description": "Premium User for 3 years",
        "email": "identify@test.com",
        "firstname": "John",
        "userId": "sample_user_id",
        "lastname": "Sparrow",
        "name": "John Sparrow",
        "id": "sample_user_id",
        "phone": "9112340345",
        "username": "john_sparrow"
      },
      "userAgent": "Dalvik/2.1.0 (Linux; U; Android 9; AOSP on IA Emulator Build/PSR1.180720.117)"
    },
    "event": "Feedback Submitted",
    "integrations": {
      "All": true
    },
    "messageId": "1590431830915-73bed370-5889-436d-9a9e-0c0e0c809d06",
    "properties": {
      "message" : "Once the 'Member Permissions' and/or 'Role Permissions' from the 'Board Manager' level is overrided from a Folder level, there's no way to revert 'Board Manager' to take control over (override) the Folder level modified permissions. Now, it is extremely important to be able to override permissions from the general 'Board Manager' level based on specific Folder permissions settings, but it is also extremely important to be able to override the specific Folder permission settings again by the 'Board Manager' permissions settings. This makes maintainance more difficult since it would require to adjust permissions Folder by Folder just a for quick update on those before returning them back as it was before, instead of simply do it from the 'Board Manager' level do all the required updates in all of the Folders and revert the permissions back as it was before from one single place (Board Manager). This could maybe be a button placed somewhere in the specific Folder settings with an initial disabled state when it is not overriding the Board Manager permissions settings, and automatically enable the state of the button once this Folder overrides the Board Manager settings, so this button text could say something like 'Clear All Specific Folder Permissions', causing that the Board Manager permissions settings take control over that Folder again. Additional to that, please add an option to override this in a more silent/transparent way, just by being able to simply turn off a permission from the Folder or Board Manager level, and once the permission is turned on again, the last one (whether is from the Folder or Board Manager level) would be the one that overrides the other."
    },
    "originalTimestamp": "2020-05-25T18:37:10.917Z",
    "type": "track",
    "userId": "sample_user_id"
  }
]

License

  • I understand that my code will be licensed under the MIT license (a copy of the license is available in this repo)
@gitcommitshow
Collaborator

gitcommitshow commented Mar 25, 2023

Interesting use case. I can imagine it being useful for customer support teams as well.
If performance were not an issue, this could be further enhanced by creating embeddings and using them to categorise the feedback/question, eventually helping to route the feedback to the right team in real time.
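
A rough illustration of that routing idea follows. It does not use real embeddings: plain word-count vectors and cosine similarity stand in for an embedding model, and `TEAM_PROFILES`, `vectorize`, `cosine`, and `route` are all hypothetical names invented for this sketch.

```python
import math
from collections import Counter

# Hypothetical team descriptions; a real setup would define its own.
TEAM_PROFILES = {
    "permissions": "folder board manager permission permissions override settings role member",
    "billing": "invoice payment refund subscription charge plan pricing",
}

def vectorize(text):
    """Stand-in for an embedding model: a sparse word-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def route(message):
    """Return the team whose profile is most similar to the message."""
    vec = vectorize(message)
    return max(TEAM_PROFILES, key=lambda team: cosine(vec, vectorize(TEAM_PROFILES[team])))

print(route("please let the board manager override folder permissions"))
```

Swapping `vectorize` for a call to an actual embedding model would keep the rest of the structure the same, at the cost of the latency mentioned above.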
