01 — About
Hi, I’m Rishab — an AI researcher and engineering student from BITS Pilani who accidentally fell in love with deep learning during my third year. That first internship was the spark: I saw what machines could actually do, and I’ve been hooked ever since.

These days, you’ll find me exploring the weird and wonderful world of AI architectures — from Transformers and Diffusion models to Mamba and beyond. I love digging into how models think: attention mechanisms, fine-tuning tricks like PEFT and distillation, and the magic behind text generation and image synthesis.

But I’m not just here for the theory; I genuinely enjoy building things that work. Whether it’s a production system like Moody.AI or a research project in medical imaging, I love the journey from idea to deployment.

Currently, I’m part of the Avatar team at FLAM, where I get to play with image generation — GANs, diffusion, normalizing flows — and even dabble in streaming tech like WebRTC (because why limit yourself?).

If you’re into AI — whether it’s research, engineering, or just staying up late debating whether attention really is all you need — I’d love to connect. Let’s geek out, collaborate, or simply share ideas. The best conversations start with curiosity.
02 — Experience
03 — Research
Lightweight Fourier Block Transformer achieving 88.41% accuracy for real-time osteoporosis detection directly on Android devices using knee X-ray sensor images.
IEEE · 2026
Novel multiband-frequency aware network achieving 92.22% accuracy on bone fracture detection benchmarks.
IET / Wiley · 2025
Published in "Non-stationary and nonlinear data processing for automated computer-aided medical diagnosis".
Elsevier · 2025
04 — Projects
Multimodal AI that analyzes both audio and visual streams from YouTube videos.
Memory-efficient image segmentation via sequential model loading.
Emotion recognition using DINOv2, Wav2Vec2, and DistilBERT for multimodal fusion.
Android app for osteoporosis classification achieving 90% accuracy.
Real-time facial expression recognition using TensorFlow Lite on Android.
Real-time gesture recognition using FastViT achieving 97.5% accuracy.
Document processing and Q&A pipeline using DeepSeek and Llama models.
Checkers game with a Minimax AI opponent with alpha-beta pruning.
Gender detection using InceptionV3 achieving 94.35% accuracy.
05 — Skills
06 — Education
Hyderabad, India · Currently Studying
07 — Writing
The first Vision Mamba for Generalized Medical Image Classification — what it is, how it works, and why it matters.
Medium · 2024
Flam (Flying Flamingoes Pvt. Ltd.) · Bangalore · Jan 2026 – Present
Working in the Avatar team on interactive talking‑head avatars for B2B products that enhance user experience.
Hamad Medical Corporation, Qatar · May 2025 – August 2025
Worked on emergency research within the Department of Surgery at Hamad Medical Corporation, one of the leading academic medical centers in the Middle East.
BITS Pilani, Hyderabad · Aug 2024 – Dec 2025
As a Research Assistant in the Department of ECE under Prof Rajesh Kumar Tripathy at BITS Pilani, I contributed to pioneering research in medical imaging with a focus on deep learning architectures for clinical diagnostics.
IGCAR Kalpakkam · May 2024 – Aug 2024
Research internship at IGCAR, focused on advancing computer vision methods for industrial and scientific inspection applications.
Automated Detection and Classification of Respiratory Diseases Using Chest X-Ray Analysis
Automated Bone Fracture Detection
IEEE Sensors Letters · Vol. 10, Issue 3 · March 2026
Beyond the Transcript: True Multimodal YouTube Intelligence 🎥
U-Tube AI is a revolutionary multimodal AI agent that transforms YouTube videos into comprehensive knowledge assets by analyzing both audio and visual streams. Unlike mainstream tools (NoteGPT, Notta, MyMap.AI) that rely solely on transcripts, U-Tube AI employs adaptive frame sampling and OCR to capture slides, diagrams, and code shown on screen. This research-backed approach achieves 70-90% cost reduction compared to direct VLM processing while maintaining complete visual context that transcript-only tools miss entirely.
----------------------------------------------------------------------------------------------------------------
GitHub Repository: https://github.com/Rishab27279/U-Tube-AI
----------------------------------------------------------------------------------------------------------------
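The adaptive frame sampling mentioned above can be sketched minimally: keep a frame only when it differs noticeably from the last kept frame, so static talking-head stretches cost nothing while slide changes are captured. The `threshold` value here is invented for illustration, not the production setting.

```python
import numpy as np

def sample_keyframes(frames, threshold=12.0):
    """Keep a frame only when it differs enough from the last kept frame.

    frames: iterable of HxW grayscale arrays (uint8 or float).
    threshold: mean absolute pixel difference that counts as "new content".
    """
    kept = []
    last = None
    for idx, frame in enumerate(frames):
        f = frame.astype(np.float32)
        if last is None or np.abs(f - last).mean() > threshold:
            kept.append(idx)
            last = f
    return kept

# Synthetic clip: five identical frames, then a sudden slide change.
static = np.zeros((4, 4), dtype=np.uint8)
slide = np.full((4, 4), 200, dtype=np.uint8)
clip = [static] * 5 + [slide] * 3
print(sample_keyframes(clip))  # → [0, 5]
```

Kept frames would then go to OCR; everything else rides on the transcript alone.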
U-Tube AI addresses the critical limitation of existing AI note-taking tools that are completely blind to visual content. The framework implements a sophisticated multimodal architecture combining:
The system operates through an intelligent dual-stage pipeline optimized for both quality and production-ready efficiency:
U-Tube AI demonstrates significant advantages over existing solutions while maintaining comprehensive understanding:
The framework excels across diverse educational and professional content types, addressing critical gaps in existing tools:
Not every video requires frame-by-frame analysis. U-Tube AI implements a confidence-based routing engine that optimizes resource allocation:
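The routing idea can be sketched as a small decision function; the threshold values and route names below are illustrative assumptions, not the engine's actual configuration.

```python
def route_video(transcript_confidence, visual_density,
                conf_threshold=0.85, density_threshold=0.3):
    """Decide how much visual processing a video needs.

    transcript_confidence: ASR confidence in [0, 1].
    visual_density: estimated fraction of frames carrying slides/diagrams/code.
    Thresholds are illustrative, not production values.
    """
    if transcript_confidence >= conf_threshold and visual_density < density_threshold:
        return "transcript_only"          # talking-head video: transcript suffices
    if visual_density >= density_threshold:
        return "full_visual_pipeline"     # lecture/tutorial: sample frames + OCR
    return "sparse_frame_sampling"        # uncertain: cheap middle path

print(route_video(0.95, 0.05))  # podcast-style video → "transcript_only"
print(route_video(0.90, 0.60))  # slide-heavy lecture → "full_visual_pipeline"
```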
U-Tube AI is my attempt at a paradigm shift in YouTube content analysis: treating visual information as first-class data rather than an afterthought. The framework combines cutting-edge computer vision research with practical cost efficiency, enabling students and professionals to digest complex technical content without losing the critical visual context that drives true understanding.
Making Advanced Image Segmentation Accessible to Everyone 🔍
EdgeSeg-AI is a revolutionary framework that makes advanced image segmentation accessible to everyone by introducing a novel, resource-efficient approach to prompt-based image segmentation. The framework addresses the computational limitations of cutting-edge segmentation methods by sequentially orchestrating three specialized models: Large Language Model (LLM), Fine-tuned VLM and Segment Anything Model (SAM). This innovative architecture achieves a 60-70% reduction in peak memory usage while maintaining high segmentation quality.
----------------------------------------------------------------------------------------------------------------
GitHub Repository: https://github.com/Rishab27279/EdgeSeg-AI
----------------------------------------------------------------------------------------------------------------
EdgeSeg-AI presents a unique architectural approach that sequentially orchestrates three specialized models, representing a fundamental shift from conventional approaches that load all models simultaneously:
The framework operates through a carefully designed three-stage pipeline that maximizes efficiency and accessibility:
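The one-model-at-a-time idea behind the memory savings can be illustrated with a toy memory-accounting sketch; the model sizes below are invented placeholders, not the real LLM/VLM/SAM footprints.

```python
import gc

# Illustrative sizes in GB; real LLM/VLM/SAM footprints vary by checkpoint.
MODEL_SIZES = {"llm": 6.0, "vlm": 4.0, "sam": 2.5}

def peak_memory_sequential(stages):
    """Load one model at a time, free it, and track the peak footprint."""
    peak = 0.0
    for name in stages:
        model = {"name": name, "size": MODEL_SIZES[name]}  # stand-in for loading weights
        peak = max(peak, model["size"])
        del model        # drop the reference to the finished stage ...
        gc.collect()     # ... and let the runtime reclaim it before the next load
    return peak

def peak_memory_simultaneous(stages):
    """Conventional approach: all models resident at once."""
    return sum(MODEL_SIZES[s] for s in stages)

pipeline = ["llm", "vlm", "sam"]
print(peak_memory_sequential(pipeline))    # 6.0 — only the largest model at a time
print(peak_memory_simultaneous(pipeline))  # 12.5 — everything at once
```

With real frameworks the `del`/`gc.collect()` step would also release GPU memory (e.g. via the framework's cache-clearing call) before the next stage loads.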
EdgeSeg-AI demonstrates significant improvements in resource efficiency while maintaining segmentation quality:
The framework demonstrates robust performance across diverse contexts, showcasing the potential for democratizing AI-powered image analysis:
This work builds upon the foundational contributions of the LLM-Seg paper by Junchi Wang and Lei Ke from ETH Zurich, adapting their approach to prioritize computational efficiency and accessibility. The research community's insights are invaluable for advancing this work further in the following areas:
EdgeSeg-AI represents a significant advancement in making sophisticated image segmentation technology accessible to a broader audience, combining cutting-edge AI research with practical resource efficiency. The framework's innovative sequential model loading approach opens new possibilities for deploying advanced computer vision capabilities on standard consumer hardware, democratizing access to powerful AI tools across various domains and applications.
🎭 Multimodal Emotion Recognition Engine 🚀
Moody.AI is a cutting-edge multimodal AI system that analyzes emotions from video content using computer vision, audio processing, and natural language processing. Powered by state-of-the-art deep learning models including DINOv2, Wav2Vec2, DistilBERT, and Whisper, the system provides comprehensive sentiment analysis with an intuitive web interface and achieves 61% accuracy on the challenging MELD dataset.
----------------------------------------------------------------------------------------------------------------
Docker Hub: https://hub.docker.com/r/rishab27279/moody-ai | GitHub: https://github.com/Rishab27279/MoodyAI
----------------------------------------------------------------------------------------------------------------
Moody.AI employs a sophisticated trimodal fusion architecture combining multiple AI models:
The breakthrough architecture combines all three modalities through advanced fusion techniques:
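A minimal late-fusion sketch of the trimodal idea: concatenate per-modality embeddings and apply a shared classification head. The 16-dimensional embeddings and random weights below are stand-ins (the real DINOv2/Wav2Vec2/DistilBERT embeddings are far larger); only the seven-class MELD label set is taken from the project description.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in embeddings; real per-modality dims are on the order of hundreds.
vision = rng.standard_normal(16)   # DINOv2-style visual embedding
audio = rng.standard_normal(16)    # Wav2Vec2-style audio embedding
text = rng.standard_normal(16)     # DistilBERT embedding of the Whisper transcript

def late_fusion_logits(vision, audio, text, weight, bias):
    """Concatenate per-modality embeddings and apply a linear emotion head."""
    fused = np.concatenate([vision, audio, text])  # shape (48,)
    return fused @ weight + bias                   # shape (num_emotions,)

num_emotions = 7  # MELD's emotion label set
W = rng.standard_normal((48, num_emotions)) * 0.1  # untrained stand-in weights
b = np.zeros(num_emotions)

logits = late_fusion_logits(vision, audio, text, W, b)
probs = np.exp(logits) / np.exp(logits).sum()      # softmax over emotions
print(probs.shape)  # (7,)
```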
This project demonstrates the advancement of multimodal AI on edge devices, combining computer vision, natural language processing, and audio analysis into a comprehensive emotion recognition system with state-of-the-art performance.
🦴 Next-Gen Bone Diagnostics: AI-Powered, Mobile-First, Research-Driven, Powered by Deep Learning 📱
OsteoDiagnosis.AI is an innovative Android application that leverages advanced signal processing and deep learning techniques for automated bone health assessment. The app utilizes a novel, lightweight neural network architecture combining signal processing (Fourier analysis) and deep learning to classify bone density into three categories: Osteoporosis, Osteopenia, and Normal bone density. This research-driven project represents a significant advancement in mobile healthcare AI, combining cutting-edge computer vision with clinical diagnostic applications.
----------------------------------------------------------------------------------------------------------------
APK: https://github.com/Rishab27279/OS_Detection_Binary_And_3_Class_DWT/releases/download/v1.0/app-debug.apk
----------------------------------------------------------------------------------------------------------------
This project was developed under academic supervision as part of ongoing research in medical AI diagnostics at BITS Pilani Hyderabad. While the complete technical methodology and results are currently confidential, as the research paper is in its final stages, the application demonstrates the successful integration of signal processing techniques with modern deep learning architectures for bone health assessment.
The core innovation lies in the fusion of traditional signal processing methodologies with state-of-the-art deep learning approaches:
Due to the ongoing nature of this research and pending publication, specific technical details regarding the model architecture, training protocols, and validation datasets remain confidential. The methodology represents a novel contribution to the field of medical AI, particularly in bone health diagnostics.
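While the actual architecture remains confidential, the general pattern of feeding Fourier-domain features into a classifier can be illustrated generically. This is NOT the published method: the `keep` parameter and the low-frequency feature layout are invented purely for illustration.

```python
import numpy as np

def fourier_features(image, keep=8):
    """Generic 2-D FFT feature extractor: low-frequency log-magnitude block.

    image: HxW grayscale array (e.g. a knee X-ray crop).
    Returns a flattened keep x keep block of the centered spectrum, where
    coarse texture frequencies live; a downstream network would classify it.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(image.astype(np.float32)))
    mag = np.log1p(np.abs(spectrum))           # compress the dynamic range
    h, w = mag.shape
    cy, cx = h // 2, w // 2                    # spectrum center after fftshift
    block = mag[cy - keep // 2: cy + keep // 2,
                cx - keep // 2: cx + keep // 2]
    return block.ravel()

x = np.random.default_rng(1).random((64, 64))  # stand-in for an X-ray patch
feats = fourier_features(x)
print(feats.shape)  # (64,)
```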
This project demonstrates the successful application of advanced AI techniques to critical healthcare challenges, showcasing the potential for mobile-deployed deep learning solutions in clinical diagnostics. The combination of academic rigor with practical implementation highlights the intersection of research innovation and real-world healthcare applications.
🎭 Feel the Mood. Frame by Frame. Powered by Deep Learning. 🎭
Expression.AI is a real-time facial expression recognition Android application that uses deep learning to detect and classify human emotions. Powered by a custom-made TensorFlow Lite model named ResInceptionCNN, the app is designed for fast on-device inference and an intuitive user experience. It combines the power of computer vision with emotion AI for mobile devices.
App APK -> https://github.com/Riiishaab/Expression.AI/releases/download/v1.0/ExpressionAI.apk
Expression.AI is built on a custom deep learning model trained for facial expression recognition:
The core of the project is the ResInceptionCNN model:
This project demonstrates the potential of deep learning on edge devices, blending AI, mobile development, and human emotion understanding into a seamless Android experience.
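The on-device inference loop around such a TensorFlow Lite model can be sketched as preprocessing plus logit decoding; the 48×48 input size and the emotion label order below are typical-FER assumptions, not the app's actual configuration.

```python
import numpy as np

# Illustrative label order; the real ResInceptionCNN output mapping may differ.
EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

def preprocess(frame, size=48):
    """Resize (nearest-neighbour, no external deps) and normalize a grayscale frame."""
    f = frame.astype(np.float32)
    ys = np.arange(size) * f.shape[0] // size
    xs = np.arange(size) * f.shape[1] // size
    return (f[np.ix_(ys, xs)] / 255.0)[None, ..., None]  # shape (1, size, size, 1)

def decode(logits):
    """Softmax the model output and return the top emotion label."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return EMOTIONS[int(probs.argmax())]

# On Android, the tensor would flow through tf.lite.Interpreter:
#   interpreter.set_tensor(input_idx, preprocess(frame))
#   interpreter.invoke()
#   label = decode(interpreter.get_tensor(output_idx)[0])
batch = preprocess(np.zeros((120, 160), dtype=np.uint8))
print(batch.shape)  # (1, 48, 48, 1)
print(decode(np.array([0.1, 0.0, 0.0, 2.0, 0.5, 0.0, 0.0])))  # "happy"
```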
🤟🏼 Real-time gesture recognition through FastViT Model ✌️
This project implements real-time hand gesture recognition using Apple's FastViT architecture, leveraging transfer learning on the HaGRID dataset to achieve 97.5% accuracy while maintaining efficiency for real-time applications.
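The transfer-learning recipe (freeze the pretrained features, train only a new gesture head) can be sketched as follows. The tiny convolutional module is a stand-in for the real `fastvit_t8.apple_in1k` backbone, which would be loaded via `timm`; the 18-class head matches HaGRID's gesture set.

```python
import torch
import torch.nn as nn

# Stand-in backbone; the real project would instead use
#   timm.create_model("fastvit_t8.apple_in1k", pretrained=True, num_classes=0)
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
feat_dim, num_gestures = 8, 18  # HaGRID defines 18 gesture classes

# Transfer learning: freeze the pretrained feature extractor ...
for p in backbone.parameters():
    p.requires_grad = False
# ... and train only the new classification head.
head = nn.Linear(feat_dim, num_gestures)

model = nn.Sequential(backbone, head)
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 18])
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)     # only the head's 8*18 + 18 = 162 parameters update
```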
The core intelligence is provided by a hybrid vision transformer:
fastvit_t8.apple_in1k pretrained model
FastViT was chosen for its efficiency advantages over ConvNeXT, offering high accuracy in resource-constrained environments.
Trained on the Hand Gesture Recognition Image Dataset (HaGRID) 150k subset:
Option 1: Google Colab
Option 2: Inference with Pretrained Model
sign_lang_model.pkl)
🚀 Intelligent Document Analysis and Summarization 🚀
Developed an AI-driven document processing system combining DeepSeek R1-1.5B for structured data extraction and Llama-7B for summarization. Achieved 92% accuracy in entity extraction and 88% ROUGE-L summarization scores, enabling efficient processing of legal and technical documents.
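The two-stage pipeline contract (extraction model first, summarization model second) can be sketched with stand-in functions; the regex extractor and truncating summarizer below merely substitute for the DeepSeek R1-1.5B and Llama-7B calls, and every name here is illustrative.

```python
import re

def extract_entities(text):
    """Stand-in for the DeepSeek R1-1.5B extraction stage: pull dated clauses.

    A real deployment would prompt the LLM for structured JSON; this regex
    stub only illustrates the pipeline contract (text in, records out).
    """
    return [{"date": d} for d in re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)]

def summarize(text, max_words=12):
    """Stand-in for the Llama-7B summarization stage: naive truncation."""
    words = text.split()
    return " ".join(words[:max_words]) + ("…" if len(words) > max_words else "")

def process_document(text):
    """Two-stage pipeline: structured extraction first, then summarization."""
    return {"entities": extract_entities(text), "summary": summarize(text)}

doc = "The agreement dated 2024-03-15 supersedes the draft of 2023-11-02."
result = process_document(doc)
print(result["entities"])  # [{'date': '2024-03-15'}, {'date': '2023-11-02'}]
```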
🎮 Where Strategic Intelligence Conquers the Classic Game 🎮
This code implements a comprehensive Checkers (Draughts) game featuring two AI players with contrasting strategies: a sophisticated Smart AI using the Minimax algorithm with alpha-beta pruning and a baseline Random AI that makes random legal moves.
The Minimax algorithm implementation represents the core intelligence of the Smart AI:
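A minimal generic minimax with alpha-beta pruning over an abstract game tree looks like this; the checkers-specific move generation and board evaluation are abstracted into the `children` and `evaluate` callables, and the toy tree at the bottom is purely illustrative.

```python
def minimax(state, depth, alpha, beta, maximizing, children, evaluate):
    """Minimax with alpha-beta pruning.

    children(state) yields successor states; evaluate(state) scores leaves
    from the maximizing player's perspective.
    """
    succ = children(state)
    if depth == 0 or not succ:
        return evaluate(state)
    if maximizing:
        best = float("-inf")
        for child in succ:
            best = max(best, minimax(child, depth - 1, alpha, beta, False,
                                     children, evaluate))
            alpha = max(alpha, best)
            if beta <= alpha:   # opponent will never allow this branch: prune
                break
        return best
    best = float("inf")
    for child in succ:
        best = min(best, minimax(child, depth - 1, alpha, beta, True,
                                 children, evaluate))
        beta = min(beta, best)
        if beta <= alpha:
            break
    return best

# Toy two-ply tree; leaves carry their own scores.
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
scores = {"a1": 3, "a2": 5, "b1": 2, "b2": 9}
val = minimax("root", 2, float("-inf"), float("inf"), True,
              lambda s: tree.get(s, []), lambda s: scores.get(s, 0))
print(val)  # → 3 (maximizer picks branch "a": min(3, 5) beats min(2, 9))
```

Note that branch "b2" is never evaluated: once "b1" scores 2, below branch "a"'s guaranteed 3, the cutoff fires.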
The Random AI provides a contrasting approach:
This implementation demonstrates advanced concepts in game AI development.
🔍 Advanced Facial Analysis System 🔍
Implemented a gender detection system using hybrid CNN-InceptionV3 architecture, achieving 94.35% accuracy on the CelebA dataset. Features include dynamic augmentation, adaptive learning rate scheduling, and quantized TensorFlow Lite deployment.
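The adaptive learning-rate scheduling mentioned above follows the familiar reduce-on-plateau pattern, sketched here framework-free; the starting rate, factor, and patience values are illustrative, not the training configuration actually used.

```python
def plateau_scheduler(val_losses, lr=1e-3, factor=0.5, patience=2, min_lr=1e-6):
    """Adaptive learning-rate schedule: cut the LR when validation loss
    stops improving for more than `patience` epochs (reduce-on-plateau logic)."""
    history, best, wait = [], float("inf"), 0
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0       # new best: reset the patience counter
        else:
            wait += 1
            if wait > patience:        # plateau persisted: reduce the LR
                lr, wait = max(lr * factor, min_lr), 0
        history.append(lr)
    return history

# Loss improves for three epochs, then plateaus: the LR is halved once the
# plateau outlasts the patience window.
lrs = plateau_scheduler([0.9, 0.7, 0.6, 0.6, 0.6, 0.6])
print(lrs)  # → [0.001, 0.001, 0.001, 0.001, 0.001, 0.0005]
```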