This repository provides a step-by-step guide to integrating Hadoop with MongoDB, so that a Hadoop MapReduce job can read data from MongoDB, process it, and write the results back to MongoDB.
Follow the instructions in this YouTube video: https://youtu.be/knAS0w-jiUk?si=1kRWjsqXUrt669Vl
Follow the instructions in this video: https://youtu.be/tC49Nzm6SyM?si=wiXDoVOhadirJ1yj
Create a free MongoDB account.
Note: Use your Google Account to create the MongoDB account.
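Download only the .jar files of the following libraries from Maven Central. The MongoDB JARs are under https://repo1.maven.org/maven2/org/mongodb/, and commons-lang is under https://repo1.maven.org/maven2/commons-lang/commons-lang/: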
- commons-lang-2.6.jar
- mongodb-driver-3.12.10.jar
- mongodb-driver-core-3.12.10.jar
- mongo-hadoop-core-2.0.2.jar
- bson-4.9.1.jar
- mongodb-driver-sync-4.9.1.jar
Note: Store these files in the following directory:
C:/hadoop/share/hadoop/common/lib/
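Each JAR above follows Maven's standard repository layout, so its download URL can be derived from its group ID, artifact ID, and version. The minimal Java sketch below prints one URL per file. MavenJarUrls and jarUrl are illustrative names, not part of this repository, and the coordinates listed in main are assumptions based on where these artifacts are published (commons-lang and mongo-hadoop-core use their own group IDs rather than plain org.mongodb), so verify each printed link before downloading.

```java
import java.util.List;

class MavenJarUrls {
    // Root of Maven Central, the repository linked in the download step.
    static final String CENTRAL = "https://repo1.maven.org/maven2/";

    // Maven repository layout:
    // <groupId with '.' replaced by '/'>/<artifactId>/<version>/<artifactId>-<version>.jar
    static String jarUrl(String groupId, String artifactId, String version) {
        return CENTRAL + groupId.replace('.', '/') + "/" + artifactId + "/" + version
                + "/" + artifactId + "-" + version + ".jar";
    }

    public static void main(String[] args) {
        // Assumed coordinates (groupId, artifactId, version) of the JARs listed above.
        List<String[]> jars = List.of(
                new String[] {"commons-lang", "commons-lang", "2.6"},
                new String[] {"org.mongodb", "mongodb-driver", "3.12.10"},
                new String[] {"org.mongodb", "mongodb-driver-core", "3.12.10"},
                new String[] {"org.mongodb.mongo-hadoop", "mongo-hadoop-core", "2.0.2"},
                new String[] {"org.mongodb", "bson", "4.9.1"},
                new String[] {"org.mongodb", "mongodb-driver-sync", "4.9.1"});
        for (String[] jar : jars) {
            System.out.println(jarUrl(jar[0], jar[1], jar[2]));
        }
    }
}
```

With Java 11 or later you can run it directly with java MavenJarUrls.java and open each printed URL to fetch the JAR.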
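In the Hadoop configuration properties below, replace the mongo.input.uri and mongo.output.uri values with your own MongoDB connection details. The first pair of URIs points to a local MongoDB server and the second pair is a template for a hosted cluster; keep only one definition of each property.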
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>mongo.input.uri</name>
<value>mongodb://localhost:27017/mydatabase.input_collection</value>
</property>
<property>
<name>mongo.output.uri</name>
<value>mongodb://localhost:27017/mydatabase.output_collection</value>
</property>
<property>
<name>mongo.input.uri</name>
<value>mongodb://<username>:<password>@<cluster-address>/<database>.<collection></value>
</property>
<property>
<name>mongo.output.uri</name>
<value>mongodb://<username>:<password>@<cluster-address>/<database>.<collection></value>
</property>
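The connector takes the target database and collection from the path of each URI, written as <database>.<collection>. The sketch below is a hypothetical helper (MongoUriNamespace is not part of this repository or of the connector) that splits a URI into those two parts, so you can sanity-check your values before editing the configuration. It assumes any special characters in the username or password are percent-encoded, as MongoDB connection strings require, and the hosted-cluster URI in main is a made-up example.

```java
class MongoUriNamespace {
    // Splits the path of a MongoDB URI such as
    // mongodb://localhost:27017/mydatabase.input_collection
    // into {database, collection}. Assumes credentials are percent-encoded,
    // so the first '/' after "://" starts the path.
    static String[] parse(String uri) {
        int scheme = uri.indexOf("://");
        if (scheme < 0) {
            throw new IllegalArgumentException("missing scheme: " + uri);
        }
        int slash = uri.indexOf('/', scheme + 3);
        if (slash < 0) {
            throw new IllegalArgumentException("missing /<database>.<collection>: " + uri);
        }
        // Connection options such as ?retryWrites=true are not part of the namespace.
        int query = uri.indexOf('?', slash);
        String ns = query < 0 ? uri.substring(slash + 1) : uri.substring(slash + 1, query);
        // Database names cannot contain '.', so the first dot separates the
        // database from the collection (collection names may contain dots).
        int dot = ns.indexOf('.');
        if (dot <= 0 || dot == ns.length() - 1) {
            throw new IllegalArgumentException("expected <database>.<collection>: " + uri);
        }
        return new String[] {ns.substring(0, dot), ns.substring(dot + 1)};
    }

    // '&' and '<' cannot appear literally in XML text, so escape them
    // before pasting a URI into a <value> element.
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }

    public static void main(String[] args) {
        // The first URI comes from the configuration above; the second is a
        // hypothetical hosted-cluster URI with connection options.
        String[] uris = {
            "mongodb://localhost:27017/mydatabase.input_collection",
            "mongodb+srv://user:secret@cluster0.example.net/testDB.usersNew?retryWrites=true&w=majority"
        };
        for (String uri : uris) {
            String[] ns = parse(uri);
            System.out.println("database=" + ns[0] + ", collection=" + ns[1]);
            System.out.println("  XML value: " + xmlEscape(uri));
        }
    }
}
```

For example, an option string such as ?retryWrites=true&w=majority has to appear as ?retryWrites=true&amp;w=majority inside the XML <value> element, otherwise the configuration file is not well-formed XML.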
Note: Make sure Hadoop is running before you continue.
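The jps tool that ships with the JDK lists running Java processes, which is a quick way to see whether Hadoop daemons such as NameNode and DataNode are up. As a rough programmatic alternative, the sketch below only checks whether anything accepts TCP connections on the ports from the configuration above: 9000 from fs.defaultFS and 27017 from the local mongo.input.uri. PortCheck is a hypothetical helper, and a successful connection does not prove that the listener is actually HDFS or MongoDB.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheck {
    // Returns true if something accepts a TCP connection on host:port
    // within timeoutMs milliseconds, false on refusal, timeout, or unknown host.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Port 9000 comes from fs.defaultFS (hdfs://localhost:9000);
        // port 27017 comes from the local mongo.input.uri.
        System.out.println("HDFS NameNode (localhost:9000): " + isListening("localhost", 9000, 2000));
        System.out.println("MongoDB (localhost:27017): " + isListening("localhost", 27017, 2000));
    }
}
```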
- Open IntelliJ and create a new project with Maven selected.
- Copy the dependencies and build plugins from the provided pom.xml file into your project's pom.xml.
- Create a file named MongoDBInsert.java in the org.example package and run it. It uploads the data to your MongoDB database.
- Open cmd and check the uploaded data with mongosh (replace the placeholder with your own MongoDB URI):
mongosh "<your MongoDB URI>"
use testDB
db.usersNew.find().pretty()
- Create three more files, MongoMapper.java, MongoReducer.java, and MongoHadoopJob.java, then run MongoHadoopJob.java. The job reads the data from MongoDB, processes it, and stores the results back in MongoDB.
- In mongosh, connected to the database the job writes to, run the following command to view the processed data:
db.processedUsersData.find().pretty()
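The processing logic inside the job is not described here, so the following is only a plain-Java sketch of the map, shuffle, and reduce phases that MongoMapper, MongoReducer, and MongoHadoopJob wire together. It assumes a hypothetical processing step (counting users per city) and hypothetical name and city fields; the real classes would extend Hadoop's Mapper and Reducer and exchange documents through the mongo-hadoop connector rather than in-memory maps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapReduceSketch {
    // In-memory stand-in for a user document. The "name" and "city" fields
    // are hypothetical; the real data is whatever MongoDBInsert.java uploaded.
    static Map<String, Object> user(String name, String city) {
        return Map.of("name", name, "city", city);
    }

    // Map phase (the MongoMapper role): emit a (city, 1) pair per document.
    static List<Map.Entry<String, Integer>> map(List<Map<String, Object>> docs) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Object city = doc.get("city");
            if (city != null) {
                pairs.add(Map.entry(city.toString(), 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group values by key. Hadoop does this itself between
    // the map and reduce phases; a TreeMap mimics its sorted keys.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase (the MongoReducer role): sum the counts for each key,
    // giving one result per key, analogous to one output document.
    static Map<String, Integer> reduce(Map<String, List<Integer>> groups) {
        Map<String, Integer> results = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            int sum = 0;
            for (int value : group.getValue()) {
                sum += value;
            }
            results.put(group.getKey(), sum);
        }
        return results;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
                user("Asha", "Pune"), user("Ben", "Leeds"), user("Chen", "Pune"));
        System.out.println(reduce(shuffle(map(docs)))); // {Leeds=1, Pune=2}
    }
}
```

In the real job the shuffle is not written by hand; it is spelled out here only to make the grouping between the mapper and the reducer visible.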