Cassandra Database

21. What is the syntax to create a keyspace in Cassandra?

The syntax for creating a keyspace in Cassandra is:

CREATE KEYSPACE <identifier> WITH <properties>
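
As a concrete illustration of the syntax above, the following sketch builds a complete statement as a string. The helper name `create_keyspace_cql` and the keyspace name `demo_ks` are hypothetical, and the sketch assumes the `SimpleStrategy` replication class, which suits single-datacenter setups. It only formats text and does not talk to a cluster.

```python
def create_keyspace_cql(name: str, replication_factor: int,
                        durable_writes: bool = True) -> str:
    """Build a CREATE KEYSPACE statement (hypothetical helper, SimpleStrategy only)."""
    # Rough sanity check on the name; real CQL identifier rules are stricter.
    if not name.isidentifier():
        raise ValueError(f"invalid keyspace name: {name!r}")
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    # The <properties> part: a replication map plus the durable_writes flag.
    replication = (
        "{'class': 'SimpleStrategy', "
        f"'replication_factor': {replication_factor}}}"
    )
    return (
        f"CREATE KEYSPACE {name} "
        f"WITH replication = {replication} "
        f"AND durable_writes = {str(durable_writes).lower()};"
    )


statement = create_keyspace_cql("demo_ks", 3)
print(statement)
```

The resulting string could be pasted into cqlsh or passed to a driver.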

22. What is a column family in Cassandra?  

In Cassandra, a collection of rows is referred to as a "column family". In CQL, a column family is called a table.

23. How does Cassandra perform a write?

Cassandra performs a write in two steps:
  • The write is first appended to the commit log on disk, and then applied to an in-memory structure known as the memtable.
  • When both steps have succeeded, the write is considered complete.
  • When a memtable fills up, its contents are flushed to disk as an SSTable (sorted string table).
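
The write path above can be sketched as a toy model. This is not Cassandra's implementation: the class `ToyNode`, its `memtable_limit` parameter, and the use of plain Python lists and dicts in place of on-disk files are all assumptions made for illustration.

```python
class ToyNode:
    """Toy model of the write path: commit log, memtable, SSTable flush."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # stands in for the append-only commit log on disk
        self.memtable = {}     # in-memory structure holding recent writes
        self.sstables = []     # immutable, key-sorted tables flushed from the memtable
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # step 1: record the write for recovery
        self.memtable[key] = value            # step 2: apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Persist the memtable as a key-sorted, immutable table.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs to be replayed

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable first
            for k, v in sstable:
                if k == key:
                    return v
        return None


node = ToyNode(memtable_limit=2)
node.write("b", 1)
node.write("a", 2)   # reaches the limit, so the memtable is flushed
node.write("c", 3)   # stays in the fresh memtable
```

After these three writes, `node.sstables` holds one table sorted by key, and the last write lives only in the memtable and the commit log.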

24. What are the management tools in Cassandra?  

DataStax OpsCenter: A web-based management and monitoring solution for Cassandra clusters and DataStax. It is free to download, and an additional edition of OpsCenter is also available.

SPM: SPM primarily monitors Cassandra metrics along with various OS and JVM metrics. Besides Cassandra, it also monitors Hadoop, Spark, Solr, Storm, ZooKeeper, and other big data platforms.

25. What are the main features of SPM in Cassandra?  

The main features of SPM are:
  • Correlation of events and metrics
  • Distributed transaction tracing
  • Creating real-time graphs with zooming
  • Detection and heartbeat alerting

26. When can you use ALTER KEYSPACE?  

ALTER KEYSPACE can be used to change keyspace properties such as the number of replicas and the durable_writes setting.

27. What is Cassandra cqlsh?

cqlsh is a command-line shell for communicating with a Cassandra database using CQL (Cassandra Query Language). With cqlsh you can:
  • Define a schema
  • Insert data
  • Execute a query

28. What are the differences between a node, a cluster, and datacenter in Cassandra?  

Node: A node is a single machine running Cassandra.

Cluster: A cluster is a collection of nodes that together store the data set. It can span one or more datacenters.

Datacenter: A datacenter is a grouping of nodes within a cluster. It is useful when serving customers in different geographical areas, because the nodes of a cluster can be grouped into different datacenters.

29. What is a Cassandra CQL collection?

A CQL collection is used to store multiple values in a single column. Cassandra supports the following collection types:

* List: A list is used when the order of the data needs to be maintained and a value may be stored multiple times (it can hold repeating elements).

* SET: A set is used to store a group of elements that is returned in sorted order (it holds unique elements).

* MAP: A map is a data type used to store key-value pairs of elements.
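
The behavior of the three collection types can be mimicked with Python's built-in containers. This is only an analogy, assuming text elements; the variable names are made up, and real collections are declared in CQL as column types such as list, set, and map.

```python
# List: keeps insertion order and allows the same value more than once.
events = ["login", "view", "login"]

# Set: duplicates collapse, and elements come back in sorted order.
tags = sorted({"nosql", "cassandra", "nosql"})

# Map: key-value pairs; entries come back ordered by key.
counters = dict(sorted({"views": 10, "clicks": 3}.items()))

print(events)    # order and duplicates preserved
print(tags)      # unique and sorted
print(counters)  # ordered by key
```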

30. What is the relationship between Apache Hadoop, HBase, Hive and Cassandra?  

Apache Hadoop: File storage and grid compute processing via MapReduce.

Apache Hive: A SQL-like interface on top of Hadoop.

Apache HBase: Column family storage built like Bigtable.

Apache Cassandra: Column family storage built like Bigtable, with Dynamo-style topology and consistency.

31. List some key features of Apache Cassandra.

It is scalable, fault-tolerant, and consistent.

It is a column-oriented database.

Its distribution design is based on Amazon’s Dynamo and its data model on Google’s Bigtable.

Created at Facebook, it differs sharply from relational database management systems.

Cassandra implements a Dynamo-style replication model with no single point of failure, but adds a more powerful “column family” data model.

Cassandra is used by some of the biggest companies, such as Facebook, Twitter, Cisco, Rackspace, eBay, Netflix, and more.

32. What do you understand by Data Replication in Cassandra?  

Database replication is the frequent electronic copying of data from a database on one computer or server to a database on another, so that all users share the same level of information.

Cassandra stores replicas on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines the nodes where replicas are placed. The total number of replicas across the cluster is referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row on one node. A replication factor of 2 means two copies of each row, where each copy is on a different node. All replicas are equally important; there is no primary or master replica. As a general rule, the replication factor should not exceed the number of nodes in the cluster. However, you can increase the replication factor and then add the desired number of nodes later.
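
The idea of a replication strategy choosing replication-factor-many nodes can be sketched with a simplified hash ring. This sketch makes several assumptions: one token per node, MD5 as a stand-in hash (chosen only so the toy is deterministic), and hypothetical names `token` and `replicas_for`. It does not reproduce Cassandra's partitioners or its datacenter-aware strategies.

```python
import hashlib
from bisect import bisect_right


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def replicas_for(key: str, nodes: list, replication_factor: int) -> list:
    """Pick the nodes holding copies of `key` by walking the ring clockwise."""
    if not nodes:
        raise ValueError("the cluster has no nodes")
    ring = sorted(nodes, key=token)
    tokens = [token(n) for n in ring]
    # First node whose token is past the key's token, wrapping around the ring.
    start = bisect_right(tokens, token(key)) % len(ring)
    # There cannot be more distinct replicas than there are nodes.
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]


cluster = ["node1", "node2", "node3", "node4"]
owners = replicas_for("user:42", cluster, replication_factor=2)
print(owners)  # two distinct nodes; no copy is a primary or master
```

Every replica in the returned list is equal in status, which matches the point above that there is no primary replica. Capping the count at the number of nodes reflects the rule that the replication factor should not exceed the cluster size.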

33. What do you understand by Node in Cassandra?  

A node is the place where data is stored.

34. What do you understand by Data center in Cassandra?  

A data center is a collection of related nodes.

35. What do you understand by Commit log in Cassandra?  

The commit log is a crash-recovery mechanism in Cassandra. Every write operation is written to the commit log.

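36. What is a bloom filter?

A bloom filter is an off-heap, probabilistic data structure used to check whether an SSTable may contain the requested data before any disk I/O is performed. It can report false positives but never false negatives, so a negative answer lets Cassandra skip that SSTable entirely.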
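
A minimal sketch of the data structure follows. The class `BloomFilter`, its default sizes, and the SHA-256 based hashing are illustrative assumptions, not the hashing or sizing Cassandra uses.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: answers 'definitely absent' or 'possibly present'."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means the key was never added; True means "maybe".
        return all(self.bits[pos] for pos in self._positions(key))


sstable_filter = BloomFilter()
sstable_filter.add("row-1")
print(sstable_filter.might_contain("row-1"))  # an added key is always reported
```

A read would consult `might_contain` first and touch the SSTable on disk only when the answer is "maybe".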

37. Explain zero consistency.

With zero consistency, write operations are handled asynchronously in the background. It is the fastest way to write data, but it gives no guarantee that the write has been applied.

38. What values are stored in a Cassandra column?

A Cassandra column stores three values:
  1. Column name
  2. Value
  3. Timestamp
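
The timestamp is what lets Cassandra decide between conflicting versions of the same column: the most recent write wins. A small sketch of that idea follows, with hypothetical names `Column` and `reconcile`. Tie-breaking on equal timestamps is simplified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # write time, e.g. microseconds since the epoch


def reconcile(a: Column, b: Column) -> Column:
    """Return the newer of two versions of the same column (last write wins)."""
    if a.name != b.name:
        raise ValueError("can only reconcile versions of the same column")
    # Simplification: on equal timestamps this sketch keeps `a`.
    return a if a.timestamp >= b.timestamp else b


old = Column("email", "old@example.com", timestamp=1000)
new = Column("email", "new@example.com", timestamp=2000)
winner = reconcile(old, new)
print(winner.value)
```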

39. What do you understand by Kundera?  

Kundera is an object-relational mapping (ORM) implementation for Cassandra that maps objects using Java annotations.

40. How do you start or stop Cassandra on a machine?

To start Cassandra, connect to the machine where it is installed using the proper security credentials, then invoke the cassandra executable from the installation's binary directory. For example, on a Mac:

sudo /Applications/Cassandra/apache-cassandra-1.1.1/bin/cassandra
