-
Notifications
You must be signed in to change notification settings - Fork 739
Improvement/stricter serp api enrichment filter #2670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe pull request introduces several new asynchronous functions and modifications related to credit management for member enrichment sources. Key changes include the addition of functions to check and manage credit availability using a Redis cache. The Changes
Possibly related PRs
Suggested labels
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (10)
services/apps/premium/members_enrichment_worker/src/sources/serp/types.ts (2)
10-26
: Mark sensitive fields with a comment.The interface structure is well-defined and comprehensive. However, consider marking sensitive fields to ensure proper handling throughout the codebase.
export interface ISerpApiAccountUsageData { account_id: string + /** @sensitive */ api_key: string + /** @sensitive */ account_email: string account_status: string plan_id: string
10-26
: Add JSDoc comments to document the interface purpose and key fields.Consider adding documentation to improve code maintainability and help other developers understand the purpose of this interface and its critical fields.
+/** + * Represents SERP API account usage data and limits. + * Used for managing API credits and rate limiting. + */ export interface ISerpApiAccountUsageData { account_id: string api_key: string account_email: string account_status: string plan_id: string plan_name: string plan_monthly_price: number + /** Maximum number of searches allowed per month under current plan */ searches_per_month: number + /** Remaining searches available in the current plan */ plan_searches_left: number + /** Additional credits beyond the plan limit */ extra_credits: number + /** Total remaining searches (plan + extra credits) */ total_searches_left: number + /** Number of searches used in the current month */ this_month_usage: number + /** Number of searches performed in the current hour */ this_hour_searches: number + /** Number of searches performed in the previous hour */ last_hour_searches: number + /** Maximum number of searches allowed per hour */ account_rate_limit_per_hour: number }services/apps/premium/members_enrichment_worker/src/types.ts (2)
40-42
: LGTM! Consider enhancing the documentation.The method signature and purpose are well-defined. However, consider adding documentation about:
- Error handling behavior
- Whether the implementation should retry on failure
- What happens when the credit check service is unavailable
41-41
: Consider extracting the cache duration to a constant.The 60-second cache duration is hardcoded in the comment. Consider extracting this value to a named constant to make it configurable and maintain it in a single place.
+// Duration in seconds for caching credit check results +export const CREDIT_CHECK_CACHE_DURATION = 60;services/apps/premium/members_enrichment_worker/src/sources/serp/service.ts (1)
Line range hint
21-64
: Consider implementing a more flexible enrichment strategy.The current changes make the enrichment more restrictive through two mechanisms:
- A high fixed activity threshold (500)
- Credit availability checks
Consider these architectural improvements:
- Make the activity threshold configurable through environment variables
- Implement a priority queue for enrichment requests based on:
- Member activity level
- Last enrichment attempt
- Available credits
- Add metrics/monitoring for:
- Credit usage patterns
- Enrichment success rates
- API response times
This would provide more flexibility and visibility into the enrichment process.
services/libs/redis/src/cache.ts (1)
46-50
: LGTM! The implementation is correct and consistent.The new
exists
method follows the established patterns in the class:
- Uses the
prefixer
for key consistency- Correctly interprets Redis EXISTS command response (1 = exists, 0 = doesn't exist)
- Maintains the async/await pattern used throughout the class
Consider adding error handling consistent with other methods:
async exists(key: string): Promise<boolean> { + try { const actualKey = this.prefixer(key) const value = await this.client.exists(actualKey) return value === 1 + } catch (error) { + this.log.error('Failed to check key existence', { error, key }) + throw error + } }Consider adding JSDoc documentation for better API visibility:
+/** + * Checks if a key exists in the cache + * @param key - The key to check + * @returns Promise resolving to true if the key exists, false otherwise + */ async exists(key: string): Promise<boolean> {services/apps/premium/members_enrichment_worker/src/sources/clearbit/service.ts (1)
30-31
: Consider making the activity threshold configurable.The hardcoded activity threshold of 10 should be moved to a configuration file or environment variable for better maintainability and flexibility across different environments.
- public enrichMembersWithActivityMoreThan = 10 + public enrichMembersWithActivityMoreThan = Number(process.env.CROWD_ENRICHMENT_MIN_ACTIVITY_THRESHOLD ?? 10)services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts (3)
56-66
: Simplify thesetHasRemainingCredits
FunctionThe current implementation uses a conditional statement to set the cache value to
'true'
or'false'
. This can be simplified by directly converting the booleanhasCredits
to a string.Apply this refactor to streamline the function:
export async function setHasRemainingCredits( source: MemberEnrichmentSource, hasCredits: boolean, ): Promise<void> { const redisCache = new RedisCache(`enrichment-${source}`, svc.redis, svc.log) - if (hasCredits) { - await redisCache.set('hasRemainingCredits', 'true', 60) - } else { - await redisCache.set('hasRemainingCredits', 'false', 60) - } + await redisCache.set('hasRemainingCredits', hasCredits.toString(), 60) }
73-76
: Eliminate Redundant Existence CheckThe
hasRemainingCreditsExists
function separately checks for the existence of the key, butgetHasRemainingCredits
can inherently handle missing keys.Consider removing
hasRemainingCreditsExists
and modifyinghasRemainingCredits
to rely solely on the result ofgetHasRemainingCredits
:- if (await hasRemainingCreditsExists(source)) { + const cachedValue = await getHasRemainingCredits(source) + if (cachedValue !== null) { return cachedValue }
85-87
: Consider Making Cache Expiration Time ConfigurableThe expiration time for the cache is hardcoded to 60 seconds. Depending on how frequently the credit availability changes, a different duration might be more appropriate.
Introduce a configurable constant or parameter for the cache TTL:
+ const CACHE_TTL_SECONDS = 60; // Adjust as needed export async function setHasRemainingCredits( source: MemberEnrichmentSource, hasCredits: boolean, ): Promise<void> { // ... - await redisCache.set('hasRemainingCredits', hasCredits.toString(), 60) + await redisCache.set('hasRemainingCredits', hasCredits.toString(), CACHE_TTL_SECONDS) }This enhances maintainability and allows easy adjustments in the future.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (7)
services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts
(3 hunks)services/apps/premium/members_enrichment_worker/src/sources/clearbit/service.ts
(2 hunks)services/apps/premium/members_enrichment_worker/src/sources/progai/service.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/sources/serp/service.ts
(2 hunks)services/apps/premium/members_enrichment_worker/src/sources/serp/types.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/types.ts
(1 hunks)services/libs/redis/src/cache.ts
(1 hunks)
🔇 Additional comments (8)
services/apps/premium/members_enrichment_worker/src/sources/serp/service.ts (2)
12-16
: LGTM: Import changes are well-organized.
The new type import is properly grouped with related SERP API types.
21-21
: Verify the impact of increased activity threshold.
The threshold increase from 10 to 500 activities is substantial and could significantly affect member enrichment patterns. Let's verify the impact:
services/libs/redis/src/cache.ts (1)
46-50
: Verify the integration with credit management system.
Let's confirm how this new method is being used in the credit management system.
services/apps/premium/members_enrichment_worker/src/sources/clearbit/service.ts (1)
32-32
: LGTM! Enhanced member filtering criteria.
The SQL condition effectively combines activity threshold with email verification, implementing stricter filtering as intended in the PR objectives.
services/apps/premium/members_enrichment_worker/src/sources/progai/service.ts (1)
127-129
:
Implementation needs actual credit checking logic.
The current implementation of hasRemainingCredits
always returns true
, which effectively bypasses any credit management. This doesn't align with the PR's objective of implementing stricter filtering.
Consider implementing proper credit checking logic:
- Integrate with the Redis cache for credit tracking
- Add error handling for API/cache failures
- Include unit tests for various credit scenarios
Let's verify if other enrichment services have proper implementations:
services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts (3)
7-7
: Importing RedisCache for Caching Functionality
The addition of RedisCache
import is appropriate for implementing caching mechanisms in the subsequent functions.
31-31
: Ensuring Credit Availability Before Data Retrieval
The updated condition in getEnrichmentData
now checks (await hasRemainingCredits(source))
before fetching data. This ensures that enrichment data is only retrieved when there are sufficient credits, optimizing resource usage.
85-87
: Verify the Appropriateness of Cache TTL
The 60-second cache expiration might not align with the actual frequency of credit updates from the enrichment services.
Consider confirming the update interval of credit information from the enrichment services. If the credits update less frequently, extending the cache TTL can reduce unnecessary service calls.
Would you like assistance in determining an optimal cache duration based on the service's update frequency?
// does the source have credits to enrich members, if returned false the source will be skipped | ||
// response will be saved to redis for 60 seconds and will be used for subsequent calls | ||
hasRemainingCredits(): Promise<boolean> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Implementation check reveals potential issues with credit validation
The verification shows that:
- SERP API service has a proper implementation checking the API account status
- Both Clearbit and ProgAI services have placeholder implementations that always return
true
- This could lead to unnecessary API calls even when credits are exhausted
• services/apps/premium/members_enrichment_worker/src/sources/clearbit/service.ts
: Implement proper credit checking
• services/apps/premium/members_enrichment_worker/src/sources/progai/service.ts
: Implement proper credit checking
🔗 Analysis chain
Verify implementation across all enrichment services.
Let's ensure all enrichment services properly implement the new hasRemainingCredits
method.
Let me try a different approach to search for the implementations.
Let me check specifically for the hasRemainingCredits
implementation in these identified service files.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check implementation of hasRemainingCredits across enrichment services
# Test: Search for class implementations of enrichment services
echo "Searching for enrichment service implementations..."
ast-grep --pattern 'class $_ implements IEnrichmentService {
$$$
}'
# Test: Verify hasRemainingCredits implementation in each service
echo "Verifying hasRemainingCredits implementation..."
ast-grep --pattern 'async hasRemainingCredits(): Promise<boolean> {
$$$
}'
Length of output: 368
Script:
#!/bin/bash
# Search for files containing IEnrichmentService
echo "Files referencing IEnrichmentService:"
rg "IEnrichmentService" -l
# Search for class implementations and hasRemainingCredits
echo -e "\nImplementations and method details:"
rg "class.*implements.*IEnrichmentService" -A 20
Length of output: 8082
Script:
#!/bin/bash
# Check for hasRemainingCredits implementation in each service file
echo "Checking SERP implementation:"
rg "hasRemainingCredits" services/apps/premium/members_enrichment_worker/src/sources/serp/service.ts -A 5
echo -e "\nChecking Clearbit implementation:"
rg "hasRemainingCredits" services/apps/premium/members_enrichment_worker/src/sources/clearbit/service.ts -A 5
echo -e "\nChecking ProgAI implementation:"
rg "hasRemainingCredits" services/apps/premium/members_enrichment_worker/src/sources/progai/service.ts -A 5
Length of output: 1195
async hasRemainingCredits(): Promise<boolean> { | ||
const config = { | ||
method: 'get', | ||
url: `https://serpapi.com/account`, | ||
params: { | ||
api_key: process.env['CROWD_ENRICHMENT_SERP_API_KEY'], | ||
}, | ||
} | ||
|
||
const response: ISerpApiAccountUsageData = (await axios(config)).data | ||
|
||
return response.total_searches_left > 0 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling and caching for credit checks.
The current implementation has several potential issues:
- No error handling for API failures
- No caching of credit checks (could hit rate limits)
- Direct environment variable access without validation
- No rate limiting for credit checks
Consider implementing this improved version:
+ private static CACHE_KEY = 'serp-api-credits';
+ private static CACHE_TTL = 300; // 5 minutes
async hasRemainingCredits(): Promise<boolean> {
+ try {
+ // Check if API key is configured
+ const apiKey = process.env['CROWD_ENRICHMENT_SERP_API_KEY'];
+ if (!apiKey) {
+ this.log.error('SERP API key not configured');
+ return false;
+ }
+
+ // TODO: Add Redis caching here
+ // const cached = await redis.get(EnrichmentServiceSerpApi.CACHE_KEY);
+ // if (cached !== null) {
+ // return cached === 'true';
+ // }
+
const config = {
method: 'get',
url: `https://serpapi.com/account`,
params: {
- api_key: process.env['CROWD_ENRICHMENT_SERP_API_KEY'],
+ api_key: apiKey,
},
}
const response: ISerpApiAccountUsageData = (await axios(config)).data
+ const hasCredits = response.total_searches_left > 0;
+
+ // TODO: Cache the result
+ // await redis.set(
+ // EnrichmentServiceSerpApi.CACHE_KEY,
+ // hasCredits.toString(),
+ // 'EX',
+ // EnrichmentServiceSerpApi.CACHE_TTL
+ // );
+
- return response.total_searches_left > 0
+ return hasCredits;
+ } catch (error) {
+ this.log.error('Error checking SERP API credits:', error);
+ return false;
+ }
}
Would you like me to help implement the Redis caching layer? I can provide the complete implementation or create a GitHub issue to track this enhancement.
Committable suggestion skipped: line range outside the PR's diff.
async hasRemainingCredits(): Promise<boolean> { | ||
return true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Implement actual credit checking logic similar to SERP service
The current implementation always returns true
, while other services like SERP implement proper credit checking. For Clearbit service, consider implementing similar logic:
- async hasRemainingCredits(): Promise<boolean> {
- return true
- }
+ async hasRemainingCredits(): Promise<boolean> {
+ try {
+ const response = await axios.get('https://api.clearbit.com/v1/remaining_credits', {
+ auth: {
+ username: process.env['CROWD_ENRICHMENT_CLEARBIT_API_KEY'],
+ password: ''
+ }
+ });
+ return response.data.remaining > 0;
+ } catch (error) {
+ this.log.error('Failed to check Clearbit credits:', error);
+ return false;
+ }
+ }
- The SERP service properly checks remaining credits via their API
- The ProgAI service also needs proper implementation
- Credit checking is crucial as mentioned in the interface comments: "if returned false the source will be skipped"
🔗 Analysis chain
Implement actual credit checking logic.
The current implementation always returns true
, which defeats the purpose of credit management. This should be implemented to actually check Clearbit API credits.
Let's verify if other enrichment services have proper credit checking implemented:
Consider implementing proper credit checking:
- async hasRemainingCredits(): Promise<boolean> {
- return true
- }
+ async hasRemainingCredits(): Promise<boolean> {
+ try {
+ // Add actual credit checking logic here
+ // Example: Check Clearbit API quota/usage endpoints
+ return true
+ } catch (err) {
+ this.log.error(`Failed to check Clearbit credits: ${err}`)
+ return false
+ }
+ }
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for hasRemainingCredits implementations in other enrichment services
ast-grep --pattern 'async hasRemainingCredits(): Promise<boolean> {
$$$
}'
Length of output: 78
Script:
#!/bin/bash
# Let's try a broader search for hasRemainingCredits implementations
rg -l "hasRemainingCredits"
# Then let's look at the interface definition
rg "interface.*EnrichmentService" -A 10
Length of output: 1616
Script:
#!/bin/bash
# Let's examine the implementations in other services
rg "hasRemainingCredits.*Promise<boolean>" -A 5 services/apps/premium/members_enrichment_worker/src/sources/serp/service.ts services/apps/premium/members_enrichment_worker/src/sources/progai/service.ts
Length of output: 1554
Script:
#!/bin/bash
# Let's see the complete SERP implementation to understand the credit checking logic
rg "hasRemainingCredits.*Promise<boolean>" -A 15 services/apps/premium/members_enrichment_worker/src/sources/serp/service.ts
Length of output: 622
async hasRemainingCredits(): Promise<boolean> { | ||
return true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Align implementation with existing patterns.
To maintain consistency with the rest of the codebase, consider these improvements:
- Add error handling similar to
getData
methods - Use the logger for monitoring credit checks
- Consider adding credit threshold configuration via environment variables
Here's a suggested implementation:
async hasRemainingCredits(): Promise<boolean> {
- return true
+ try {
+ this.log.debug('Checking ProgAI remaining credits')
+ // Add credit checking logic here
+ const hasCredits = true // Replace with actual check
+ this.log.debug(`ProgAI credits available: ${hasCredits}`)
+ return hasCredits
+ } catch (err) {
+ this.log.error('Error checking ProgAI credits', err)
+ throw new Error(`Failed to check ProgAI credits: ${err.message}`)
+ }
}
Committable suggestion skipped: line range outside the PR's diff.
export async function getHasRemainingCredits(source: MemberEnrichmentSource): Promise<boolean> { | ||
const redisCache = new RedisCache(`enrichment-${source}`, svc.redis, svc.log) | ||
return (await redisCache.get('hasRemainingCredits')) === 'true' | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle Missing Cache Entries in getHasRemainingCredits
If the cache key does not exist, redisCache.get
may return null
, leading to unexpected behavior when comparing with 'true'
.
Modify the function to handle null
values explicitly:
export async function getHasRemainingCredits(source: MemberEnrichmentSource): Promise<boolean> {
const redisCache = new RedisCache(`enrichment-${source}`, svc.redis, svc.log)
- return (await redisCache.get('hasRemainingCredits')) === 'true'
+ const value = await redisCache.get('hasRemainingCredits')
+ if (value === 'true') {
+ return true
+ } else if (value === 'false') {
+ return false
+ } else {
+ return false // or consider throwing an error if appropriate
+ }
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
export async function getHasRemainingCredits(source: MemberEnrichmentSource): Promise<boolean> { | |
const redisCache = new RedisCache(`enrichment-${source}`, svc.redis, svc.log) | |
return (await redisCache.get('hasRemainingCredits')) === 'true' | |
} | |
export async function getHasRemainingCredits(source: MemberEnrichmentSource): Promise<boolean> { | |
const redisCache = new RedisCache(`enrichment-${source}`, svc.redis, svc.log) | |
const value = await redisCache.get('hasRemainingCredits') | |
if (value === 'true') { | |
return true | |
} else if (value === 'false') { | |
return false | |
} else { | |
return false // or consider throwing an error if appropriate | |
} | |
} |
export async function hasRemainingCredits(source: MemberEnrichmentSource): Promise<boolean> { | ||
const service = EnrichmentSourceServiceFactory.getEnrichmentSourceService(source, svc.log) | ||
|
||
if (await hasRemainingCreditsExists(source)) { | ||
return getHasRemainingCredits(source) | ||
} | ||
|
||
const hasCredits = await service.hasRemainingCredits() | ||
|
||
await setHasRemainingCredits(source, hasCredits) | ||
return hasCredits | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Optimize hasRemainingCredits
Function Logic
The current implementation of hasRemainingCredits
involves multiple function calls and checks, which can be streamlined for better performance and readability.
Refactor the function as follows:
export async function hasRemainingCredits(source: MemberEnrichmentSource): Promise<boolean> {
const cachedValue = await getHasRemainingCredits(source)
- if (await hasRemainingCreditsExists(source)) {
- return getHasRemainingCredits(source)
- }
+ if (cachedValue !== null) {
+ return cachedValue
+ }
const service = EnrichmentSourceServiceFactory.getEnrichmentSourceService(source, svc.log)
const hasCredits = await service.hasRemainingCredits()
await setHasRemainingCredits(source, hasCredits)
return hasCredits
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
export async function hasRemainingCredits(source: MemberEnrichmentSource): Promise<boolean> { | |
const service = EnrichmentSourceServiceFactory.getEnrichmentSourceService(source, svc.log) | |
if (await hasRemainingCreditsExists(source)) { | |
return getHasRemainingCredits(source) | |
} | |
const hasCredits = await service.hasRemainingCredits() | |
await setHasRemainingCredits(source, hasCredits) | |
return hasCredits | |
} | |
export async function hasRemainingCredits(source: MemberEnrichmentSource): Promise<boolean> { | |
const cachedValue = await getHasRemainingCredits(source) | |
if (cachedValue !== null) { | |
return cachedValue | |
} | |
const service = EnrichmentSourceServiceFactory.getEnrichmentSourceService(source, svc.log) | |
const hasCredits = await service.hasRemainingCredits() | |
await setHasRemainingCredits(source, hasCredits) | |
return hasCredits | |
} |
…hub.com:CrowdDotDev/crowd.dev into improvement/stricter-serp-api-enrichment-filter
…embers with most activities
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (8)
services/libs/types/src/premium/enrichment.ts (1)
28-32
: LGTM! Consider adding JSDoc comments.The new interface is well-structured and serves its purpose in aggregating activity data for member identities. The property types are appropriate for tracking activity metrics across platforms.
Consider adding JSDoc comments to document the interface's purpose and properties:
+/** + * Aggregates activity metrics for a member's identity on a specific platform. + */ export interface IEnrichableMemberIdentityActivityAggregate { + /** Number of activities performed by this identity */ activityCount: number + /** Username of the identity on the platform */ username: string + /** Platform where the activities were performed */ platform: string }services/apps/premium/members_enrichment_worker/src/activities/getMembers.ts (1)
14-14
: LGTM! Consider adding JSDoc documentation.The switch to cursor-based pagination with activity count is a good improvement. Consider adding JSDoc documentation to explain the cursor structure and its role in member prioritization.
+/** + * Fetches members eligible for enrichment, ordered by activity count. + * @param limit Maximum number of members to fetch + * @param sources Array of enrichment sources to consider + * @param afterCursor Pagination cursor containing activity count and member ID, or null for first page + * @returns Promise resolving to array of enrichable members + */ export async function getEnrichableMembers(services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts (2)
57-60
: Add safety check for empty members arrayWhile the current implementation works correctly with the defined batch size, consider adding a safety check before accessing the last array element to make the code more robust against potential configuration changes.
+ if (members.length === 0) { + return; + } await continueAsNew<typeof getMembersToEnrich>({ afterCursor: { memberId: members[members.length - 1].id, activityCount: members[members.length - 1].activityCount, }, })
Line range hint
35-52
: Well-structured workflow implementationThe workflow follows Temporal best practices with:
- Proper child workflow isolation using unique workflowIds
- Well-configured retry policies with exponential backoff
- Appropriate timeout settings
- Proper tenant isolation using searchAttributes
services/apps/premium/members_enrichment_worker/src/workflows/enrichMember.ts (1)
59-81
: Improve code structure while maintaining the enhanced logic.While the logic for selecting the most active GitHub identity is sound, the implementation could be more efficient and maintainable.
Consider this refactoring:
- // there can be multiple verified identities in github, we select the one with the most activities - const verifiedGithubIdentities = input.identities.filter( - (i) => - i.verified && - i.platform === PlatformType.GITHUB && - i.type === MemberIdentityType.USERNAME, - ) - - if (verifiedGithubIdentities.length > 1) { - const ghIdentityWithTheMostActivities = - await findMemberIdentityWithTheMostActivityInPlatform(input.id, PlatformType.GITHUB) - if (ghIdentityWithTheMostActivities) { - enrichmentInput.github = input.identities.find( - (i) => - i.verified && - i.platform === PlatformType.GITHUB && - i.type === MemberIdentityType.USERNAME && - i.value === ghIdentityWithTheMostActivities.username, - ) - } - } else { - enrichmentInput.github = verifiedGithubIdentities?.[0] || undefined - } + // Select the GitHub identity with the most activities + const verifiedGithubIdentities = input.identities.filter( + (i) => i.verified && + i.platform === PlatformType.GITHUB && + i.type === MemberIdentityType.USERNAME + ); + + try { + enrichmentInput.github = verifiedGithubIdentities.length > 1 + ? await findMemberIdentityWithTheMostActivityInPlatform(input.id, PlatformType.GITHUB) + .then(mostActive => mostActive + ? verifiedGithubIdentities.find(i => i.value === mostActive.username) + : verifiedGithubIdentities[0] + ) + : verifiedGithubIdentities[0] || undefined; + + console.debug( + `Selected GitHub identity: ${enrichmentInput.github?.value} from ${verifiedGithubIdentities.length} candidates` + ); + } catch (error) { + console.error('Error finding most active GitHub identity:', error); + enrichmentInput.github = verifiedGithubIdentities[0] || undefined; + }This refactoring:
- Simplifies the nested conditionals
- Adds error handling
- Includes logging for debugging
- Uses a more functional approach
services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts (1)
126-131
: Consider adding JSDoc for better documentationWhile the function name is descriptive, adding JSDoc would help document the purpose and parameters more clearly.
+/** + * Finds the member identity with the highest activity count for a given platform + * @param memberId - The ID of the member to search for + * @param platform - The platform to filter activities by + * @returns Promise resolving to the member identity with activity aggregate + */ export async function findMemberIdentityWithTheMostActivityInPlatform( memberId: string, platform: string, ): Promise<IEnrichableMemberIdentityActivityAggregate> {services/libs/data-access-layer/src/old/apps/premium/members_enrichment_worker/index.ts (2)
66-76
: Consider adding an index to optimize the JSON aggregationWhile the JSON aggregation of member identities is well-structured, this operation could be expensive for members with many identities.
Consider adding a composite index on (memberId, platform, value, type, verified) to optimize the aggregation performance.
Cursor-based pagination change requires attention
The verification revealed two different pagination implementations:
getEnrichableMembers
inactivities/getMembers.ts
correctly uses the new cursor-based pagination withactivityCount
andmemberId
getMembersForLFIDEnrichment
workflow still uses the old ID-based pagination withafterId
Issues found:
getMembersForLFIDEnrichment
workflow needs to be updated to use the new cursor structuregetLFIDEnrichableMembers
activity (referenced but implementation not shown) likely needs to be updated to match the new pagination pattern🔗 Analysis chain
Line range hint
19-91
: Verify API consumers are updated for the new cursor structureThe change from ID-based to cursor-based pagination is a breaking change. Let's verify that all consumers have been updated.
Let me try to find any references to this function and related files to better understand the API usage patterns.
I need to see how the cursor is being used in the actual implementations to verify if there are any compatibility issues.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for direct calls to fetchMembersForEnrichment rg -l 'fetchMembersForEnrichment.*afterId' # Search for any remaining ID-based pagination patterns rg -l 'afterId.*members.*enrichment'Length of output: 83
Script:
#!/bin/bash # Search for any usage of fetchMembersForEnrichment function rg -A 5 'fetchMembersForEnrichment' # Search for files containing 'members' and 'enrichment' in their path fd -t f 'members.*enrichment|enrichment.*members'Length of output: 2285
Script:
#!/bin/bash # Let's check the actual implementation of getMembers activity to see how it handles the cursor cat services/apps/premium/members_enrichment_worker/src/activities/getMembers.ts # Also check the workflow implementation that might show cursor handling cat services/apps/premium/members_enrichment_worker/src/workflows/lf-auth0/getMembersForLFIDEnrichment.tsLength of output: 2804
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (9)
services/apps/premium/members_enrichment_worker/src/activities.ts
(2 hunks)services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts
(4 hunks)services/apps/premium/members_enrichment_worker/src/activities/getMembers.ts
(2 hunks)services/apps/premium/members_enrichment_worker/src/types.ts
(2 hunks)services/apps/premium/members_enrichment_worker/src/workflows/enrichMember.ts
(2 hunks)services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts
(2 hunks)services/libs/data-access-layer/src/activities/sql.ts
(2 hunks)services/libs/data-access-layer/src/old/apps/premium/members_enrichment_worker/index.ts
(2 hunks)services/libs/types/src/premium/enrichment.ts
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- services/apps/premium/members_enrichment_worker/src/types.ts
🔇 Additional comments (14)
services/apps/premium/members_enrichment_worker/src/activities/getMembers.ts (1)
26-26
: Verify the impact of cursor-based pagination change.
This is a breaking change that requires updates in all code that calls this function. Let's verify that all callers have been updated to use the new cursor structure.
✅ Verification successful
Let me analyze the results and generate additional verification scripts to get more context about the changes.
Let me generate additional scripts to check the workflow implementation and any other potential callers.
Based on the comprehensive evidence gathered, I can now provide the final response:
The cursor-based pagination change is properly implemented across the codebase
The changes are consistent throughout the codebase:
- The
getEnrichableMembers
function signature change is properly reflected in its only caller, thegetMembersToEnrich
workflow - The workflow correctly handles the new cursor structure in both consuming and generating the cursor for continuation
- The underlying
fetchMembersForEnrichment
implementation in the data-access-layer also supports the new cursor structure - No other callers or legacy implementations were found that need updating
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Find all usages of getEnrichableMembers to verify they've been updated
# to use the new cursor structure
# Search for function calls
echo "Searching for getEnrichableMembers usage:"
rg -A 5 "getEnrichableMembers\("
# Search for potential old-style afterId usage that might need updating
echo -e "\nChecking for potential missed updates using old 'afterId':"
rg "afterId.*getEnrichableMembers"
Length of output: 1649
Script:
#!/bin/bash
# Let's verify the cursor structure changes by:
# 1. Looking at the git history of the file to see the changes
# 2. Checking all imports of getMembers.ts
# 3. Looking for any other pagination-related code
# Check git history for the file
echo "Git history of getMembers.ts:"
git log -p services/apps/premium/members_enrichment_worker/src/activities/getMembers.ts
# Find all files importing getMembers
echo -e "\nFiles importing getMembers:"
rg -l "from ['\"].*getMembers['\"]"
# Look for cursor-related code
echo -e "\nCursor-related code:"
rg -B2 -A2 "afterCursor|cursor" services/apps/premium/members_enrichment_worker/src/
Length of output: 13160
Script:
#!/bin/bash
# Let's check the workflow implementation and any other potential callers
# Check the workflow implementation
echo "Content of getMembersToEnrich workflow:"
cat services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts
# Look for any other potential callers of getEnrichableMembers
echo -e "\nSearching for any other imports or references to getEnrichableMembers:"
rg -l "getEnrichableMembers"
# Check the fetchMembersForEnrichment implementation for cursor handling
echo -e "\nLooking for fetchMembersForEnrichment implementation:"
rg -A 10 "fetchMembersForEnrichment.*\("
Length of output: 4389
services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts (2)
22-22
: LGTM: Cursor initialization looks good
The change from afterId
to afterCursor
with proper null fallback is implemented correctly.
29-29
: Verify getEnrichableMembers signature update
The function call has been updated to use the new cursor parameter. Let's verify that the corresponding function signature in the activities has been updated accordingly.
✅ Verification successful
Function signature is correctly updated with cursor parameter ✅
The getEnrichableMembers
function in services/apps/premium/members_enrichment_worker/src/activities/getMembers.ts
has the correct signature that matches the updated call:
getEnrichableMembers(
limit: number,
sources: MemberEnrichmentSource[],
afterCursor: { activityCount: number; memberId: string } | null,
)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the getEnrichableMembers function signature in activities
# Expected: Should find the updated function signature with afterCursor parameter
ast-grep --pattern 'getEnrichableMembers = async (limit: number, sources: MemberEnrichmentSource[], afterCursor: $_)'
Length of output: 119
Script:
#!/bin/bash
# Let's try a broader search to find the getEnrichableMembers function definition
# First, search for the file containing the function
rg -l "getEnrichableMembers"
# Then let's see the actual implementation with context
rg -A 10 "export const getEnrichableMembers"
Length of output: 302
Script:
#!/bin/bash
# Let's check the content of the files we found
cat services/apps/premium/members_enrichment_worker/src/activities/getMembers.ts
Length of output: 1086
services/apps/premium/members_enrichment_worker/src/workflows/enrichMember.ts (2)
22-22
: LGTM: Activity proxy addition is well-configured.
The new activity follows the established pattern with appropriate timeout and retry configuration.
59-81
: Verify the impact of GitHub identity selection changes.
The new logic for selecting GitHub identities based on activity count might affect downstream processes that expect consistent identity selection.
Let's verify the usage:
✅ Verification successful
Based on the search results, I can now provide a final response regarding the GitHub identity selection changes:
GitHub identity selection changes are safe and properly handled
The new logic for selecting GitHub identities based on activity count is well-contained within the enrichment workflow and follows existing patterns. The code shows that:
- GitHub identity is only used as an enrichment input source and doesn't affect downstream processes
- The identity selection is scoped to the enrichment process and stored in
enrichmentInput.github
- No other components directly depend on or make assumptions about which GitHub identity is selected
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for code that depends on GitHub identity selection
echo "Searching for downstream usage of enriched GitHub identities..."
# Look for references to github identity in enrichment results
rg -l "enrichmentInput\.github" --type ts
# Look for potential consumers of enriched data
rg -l "MemberEnrichmentSource\.GITHUB" --type ts
Length of output: 309
Script:
#!/bin/bash
# Let's search for broader patterns to understand the GitHub identity usage
# Search for references to github identity in type definitions and interfaces
rg -A 5 "github.*:" --type ts
# Look for any code that processes enriched member data
rg -A 5 "enriched.*member" -i --type ts
# Search for potential database operations or API endpoints using member data
rg -A 5 "(updateMember|saveMember|getMember)" --type ts
# Look for any workflow or process that might depend on GitHub identity
rg -A 5 "MemberIdentityType.USERNAME.*GITHUB" --type ts
Length of output: 263821
services/apps/premium/members_enrichment_worker/src/activities/enrichment.ts (4)
1-13
: LGTM: Clean implementation of credit check in enrichment flow
The addition of credit validation in getEnrichmentData
is well-placed, ensuring that enrichment only proceeds when both the source is enrichable and credits are available.
Also applies to: 36-38
61-71
: LGTM: Well-implemented Redis caching with TTL
The function properly manages credit status with a reasonable 60-second TTL to prevent stale data.
73-76
: Skip comment: Previous review feedback still applies
78-94
: Skip comment: Previous optimization suggestion still applies
services/libs/data-access-layer/src/old/apps/premium/members_enrichment_worker/index.ts (4)
19-19
: LGTM: Improved pagination with cursor-based approach
The change from simple ID-based pagination to cursor-based pagination with activity count is a good improvement. This ensures consistent pagination when ordering by activity count while maintaining stable results across pages.
21-23
: LGTM: Well-structured cursor filter with proper parameterization
The cursor filter correctly implements keyset pagination using both activity count and member ID, with proper SQL injection protection through parameterized queries.
78-89
: LGTM: Well-structured joins and ordering
The query structure shows good practices:
- Appropriate use of LEFT JOIN for activity summary allows including members without activity
- Proper ORDER BY clause ensures consistent pagination
- WHERE clause maintains proper filtering logic
91-91
: LGTM: Clean parameter handling
The parameter array handling is clean and properly matches the query parameters for both cursor and non-cursor cases.
services/libs/data-access-layer/src/activities/sql.ts (1)
11-11
: LGTM!
The type import is correctly placed and properly used in the new function.
@@ -1,5 +1,6 @@ | |||
import { | |||
findMemberEnrichmentCache, | |||
findMemberIdentityWithTheMostActivityInPlatform, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Function needs unit tests
The implementation of findMemberIdentityWithTheMostActivityInPlatform
was found in services/libs/data-access-layer/src/activities/sql.ts
, but no corresponding test files were discovered. Since this is a new backend functionality that executes SQL queries, it should be properly tested.
- Add unit tests for
findMemberIdentityWithTheMostActivityInPlatform
inservices/libs/data-access-layer/src/activities/sql.test.ts
- Test the SQL query execution with different scenarios
- Verify the correct aggregation of activity counts
- Check handling of edge cases (no activities, multiple usernames)
🔗 Analysis chain
LGTM! Changes follow established patterns.
The addition of findMemberIdentityWithTheMostActivityInPlatform
to imports and exports is clean and maintains consistent ordering.
Let's verify the test coverage for this new functionality:
Also applies to: 54-54
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if tests exist for the new function
# Test 1: Look for test files related to the function
fd -g "*test*" -g "*spec*" | rg -l "findMemberIdentityWithTheMostActivityInPlatform"
# Test 2: Check the implementation and its tests in data-access-layer
rg -A 5 "findMemberIdentityWithTheMostActivityInPlatform" "services/libs/data-access-layer/src"
rg -A 5 "findMemberIdentityWithTheMostActivityInPlatform" "services/libs/data-access-layer/test"
Length of output: 977
Script:
#!/bin/bash
# Let's try a different approach to find test files
# Search for test files in the activities directory
fd -g "*test*" -g "*spec*" -p "services/libs/data-access-layer"
# Check content of any test files in the activities directory
rg -A 5 "activities/sql" "services/libs/data-access-layer"
# Look for the function implementation to understand its complexity
ast-grep --pattern 'export async function findMemberIdentityWithTheMostActivityInPlatform($_) {
$$$
}'
Length of output: 280
Script:
#!/bin/bash
# Let's try one final approach to find tests
# Search for test files in data-access-layer
fd "test" "services/libs/data-access-layer"
# Look for SQL implementation details
cat services/libs/data-access-layer/src/activities/sql.ts
# Check if there are any test files in the activities folder
fd "test" "services/libs/data-access-layer/src/activities"
Length of output: 36040
export async function findMemberIdentityWithTheMostActivityInPlatform( | ||
qdbConn: DbConnOrTx, | ||
platform: string, | ||
memberId: string, | ||
): Promise<IEnrichableMemberIdentityActivityAggregate> { | ||
const query = ` | ||
SELECT count(a.id) AS "activityCount", a.platform, a.username | ||
FROM activities a | ||
WHERE a."memberId" = $(memberId) | ||
AND a.platform = $(platform) | ||
GROUP BY a.platform, a.username | ||
ORDER BY activity_count DESC | ||
LIMIT 1; | ||
` | ||
|
||
return qdbConn.oneOrNone(query, { | ||
memberId, | ||
platform, | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix SQL query issues and add proper type mapping
There are several issues in the implementation:
- The ORDER BY clause uses incorrect column name
activity_count
instead of"activityCount"
- Missing soft delete filter that's used in all other queries
- Inconsistent column name quoting
- Raw database result is returned without mapping to the interface type
Apply this diff to fix the issues:
export async function findMemberIdentityWithTheMostActivityInPlatform(
qdbConn: DbConnOrTx,
platform: string,
memberId: string,
): Promise<IEnrichableMemberIdentityActivityAggregate> {
const query = `
- SELECT count(a.id) AS "activityCount", a.platform, a.username
+ SELECT COUNT(a.id) AS "activityCount", a."platform", a."username"
FROM activities a
WHERE a."memberId" = $(memberId)
AND a.platform = $(platform)
+ AND a."deletedAt" IS NULL
- GROUP BY a.platform, a.username
- ORDER BY activity_count DESC
+ GROUP BY a."platform", a."username"
+ ORDER BY "activityCount" DESC
LIMIT 1;
`
- return qdbConn.oneOrNone(query, {
+ const result = await qdbConn.oneOrNone(query, {
memberId,
platform,
})
+
+ return result ? {
+ activityCount: parseInt(result.activityCount),
+ platform: result.platform,
+ username: result.username
+ } : null
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
export async function findMemberIdentityWithTheMostActivityInPlatform( | |
qdbConn: DbConnOrTx, | |
platform: string, | |
memberId: string, | |
): Promise<IEnrichableMemberIdentityActivityAggregate> { | |
const query = ` | |
SELECT count(a.id) AS "activityCount", a.platform, a.username | |
FROM activities a | |
WHERE a."memberId" = $(memberId) | |
AND a.platform = $(platform) | |
GROUP BY a.platform, a.username | |
ORDER BY activity_count DESC | |
LIMIT 1; | |
` | |
return qdbConn.oneOrNone(query, { | |
memberId, | |
platform, | |
}) | |
} | |
export async function findMemberIdentityWithTheMostActivityInPlatform( | |
qdbConn: DbConnOrTx, | |
platform: string, | |
memberId: string, | |
): Promise<IEnrichableMemberIdentityActivityAggregate> { | |
const query = ` | |
SELECT COUNT(a.id) AS "activityCount", a."platform", a."username" | |
FROM activities a | |
WHERE a."memberId" = $(memberId) | |
AND a.platform = $(platform) | |
AND a."deletedAt" IS NULL | |
GROUP BY a."platform", a."username" | |
ORDER BY "activityCount" DESC | |
LIMIT 1; | |
` | |
const result = await qdbConn.oneOrNone(query, { | |
memberId, | |
platform, | |
}) | |
return result ? { | |
activityCount: parseInt(result.activityCount), | |
platform: result.platform, | |
username: result.username | |
} : null | |
} |
Changes proposed ✍️
What
copilot:summary
copilot:poem
Why
How
copilot:walkthrough
Checklist ✅
Feature
,Improvement
, orBug
.Summary by CodeRabbit
New Features
Bug Fixes
Documentation